
fix classpath hashing #1832

Open · wants to merge 5 commits into main from fix/hashing
Conversation

kpodsiad
Collaborator

@kpodsiad kpodsiad commented Oct 5, 2022

  • remove the previous mechanism that tried to share work between concurrent requests - it caused too much mental overhead and was error-prone. The gains from it were negligible: workload was shared only when the same jar was being hashed at the same time, which is a rare situation and not that costly anyway. Results are cached after computation, so subsequent requests use the cache.
  • uncomment the already-existing tests & fix them
  • remove unused code
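The cache-after-computation behaviour described above can be sketched roughly like this (`FileHash`, the string cache key, and `hashOnce` are simplified, illustrative stand-ins, not Bloop's actual types or API):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for Bloop's FileHash; the real cache also
// takes file size and modification time into account.
final case class FileHash(file: String, hash: Int)

object HashCacheSketch {
  // Cache keyed by file path; computeIfAbsent guarantees at most one
  // computation per key even under concurrent access.
  private val cache = new ConcurrentHashMap[String, FileHash]()

  def hashOnce(file: String, compute: String => Int): FileHash =
    cache.computeIfAbsent(file, f => FileHash(f, compute(f)))

  def main(args: Array[String]): Unit = {
    var computations = 0
    def compute(f: String): Int = { computations += 1; f.hashCode }
    hashOnce("a.jar", compute)
    hashOnce("a.jar", compute) // served from the cache, no recomputation
    println(computations)
  }
}
```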

@kpodsiad kpodsiad requested a review from tgodzik October 5, 2022 05:42
backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
Comment on lines +585 to +599
def testAsyncT(name: String, maxDuration: Duration)(
    run: => Task[Unit]
): Unit = {
  test(name) {
    Await.result(run.runAsync(ExecutionContext.scheduler), maxDuration)
  }
}

Collaborator Author
new utility method which accepts a Task - no more Thread.sleep and Await.result in tests. Logic can be composed via Task's map/flatMap, and we need Await.result only at the very end.
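A minimal sketch of the same pattern, using scala.concurrent.Future as a stand-in for Bloop's Task (`runAsyncTest` is a hypothetical analogue of `testAsyncT`, not the real helper): logic composes with map/flatMap, and blocking happens exactly once at the end.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object TestAsyncSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Analogue of testAsyncT: the test body is an async value,
  // and we block only once, at the very end, with a timeout.
  def runAsyncTest(maxDuration: Duration)(body: => Future[Unit]): Unit =
    Await.result(body, maxDuration)

  def main(args: Array[String]): Unit = {
    runAsyncTest(5.seconds) {
      // no Thread.sleep: steps are chained via map instead
      Future(21).map(_ * 2).map(n => assert(n == 42))
    }
    println("ok")
  }
}
```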

Comment on lines +472 to +488
/** Creates an empty workspace where operations described as Tasks can happen. */
def withinWorkspaceT[T](fun: AbsolutePath => Task[T]): Task[T] = for {
  temp <- Task(Files.createTempDirectory("bloop-test-workspace").toRealPath())
  tempDirectory = AbsolutePath(temp)
  result <- fun(tempDirectory).doOnFinish(_ => Task(delete(tempDirectory)))
} yield result
Collaborator Author

similar to withinWorkspace, but the creation and deletion of the directory is bound to the Task's lifetime

project/Dependencies.scala (outdated)
backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
@kpodsiad kpodsiad force-pushed the fix/hashing branch 2 times, most recently from 456bf89 to 54160c6, October 5, 2022 06:32
@kpodsiad kpodsiad marked this pull request as draft October 5, 2022 07:32
@kpodsiad kpodsiad marked this pull request as ready for review October 5, 2022 15:06
@kpodsiad kpodsiad requested a review from dos65 October 5, 2022 15:06
Collaborator

@dos65 dos65 left a comment

Nice work! In general looks good, I have only one question.

Also, we need to ask @tgodzik to run performance tests.
It's the first occurrence of Task.parSequenceN on a large number of tasks - I have some doubts about its implementation 😸

@tgodzik
Contributor

tgodzik commented Oct 6, 2022

Nice work! In general looks good, I have only one question.

Also, we need to ask @tgodzik to run performance tests. It's the first occurrence of Task.parSequenceN on a large number of tasks - I have some doubts about its implementation 😸

Running them right now!

@bloopoid
Collaborator

bloopoid commented Oct 6, 2022

Performance test finished successfully.

Benchmarks are based on merging with master

@kpodsiad
Collaborator Author

kpodsiad commented Oct 7, 2022

Performance test finished successfully.

Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

@tgodzik
Contributor

tgodzik commented Oct 7, 2022

Performance test finished successfully.
Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

It seems to have run correctly but I don't see any results yet, I think there is some delay there :/

@tgodzik
Contributor

tgodzik commented Oct 10, 2022

Performance test finished successfully.
Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

It seems to have run correctly but I don't see any results yet, I think there is some delay there :/

Ok, I don't see any regressions, so we should be good.

@kpodsiad kpodsiad requested a review from dos65 October 12, 2022 08:07
@kpodsiad
Collaborator Author

@tgodzik @dos65 can we merge this PR then?

Contributor

@tgodzik tgodzik left a comment

Added some very minor comments, but I am wondering how much the deduplication of hashing was actually needed.

@Duhemm do you have any idea what kind of gains this might offer? Would you be able to test the difference on a larger codebase?

backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
backend/src/main/scala/bloop/task/Task.scala (outdated)
Collaborator

@dos65 dos65 left a comment

LGTM!

@tgodzik

@Duhemm do you have any idea what kind of gains this might offer?

From my side it looks more like a refactoring plus a fix for incorrect usage of an array across threads. Actually, I wanted to do the same while I was doing the monix upgrade.

Collaborator Author

@kpodsiad kpodsiad left a comment

Ok, I brought back workload sharing. The implementation is almost pure (only the concurrent hashmap update isn't) and should be more straightforward to understand.
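A rough sketch of promise-based workload sharing, assuming a ConcurrentHashMap of in-flight promises (scala.concurrent.Future/Promise stand in for Bloop's Task; `hash` and the latch-based demo are illustrative, not Bloop's actual code):

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object SharedHashingSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global
  private val inFlight = new ConcurrentHashMap[String, Promise[Int]]()

  // The first caller for a file wins the putIfAbsent race and computes
  // the hash; concurrent callers wait on the winner's promise.
  def hash(file: String, compute: String => Int): Future[Int] = {
    val p = Promise[Int]()
    val existing = inFlight.putIfAbsent(file, p)
    if (existing != null) existing.future
    else {
      p.completeWith(Future(compute(file)))
      // remove the promise once done so the entry doesn't leak
      p.future.onComplete(_ => inFlight.remove(file, p))
      p.future
    }
  }

  def main(args: Array[String]): Unit = {
    val computed = new AtomicInteger(0)
    val latch = new CountDownLatch(1)
    // block the computation so the second request provably joins it
    def compute(f: String): Int = { computed.incrementAndGet(); latch.await(); f.length }
    val f1 = hash("some.jar", compute)
    val f2 = hash("some.jar", compute) // joins the in-flight computation
    latch.countDown()
    assert(Await.result(f1, 10.seconds) == 8)
    assert(Await.result(f2, 10.seconds) == 8)
    println(computed.get) // the hash was computed only once
  }
}
```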


Task.gatherUnordered(classpath.map(readJar(_)))
object ClasspathHasher {
  final val global: ClasspathHasher = new ClasspathHasher
Collaborator Author

It was impossible to test ClasspathHasher because tests were sharing the global cache. I turned it into a class and created a global instance because:

  • I didn't want to touch the existing code which referred to ClasspathHasher as an object. It now uses ClasspathHasher.global, which is almost the same.
  • in tests, each test case creates a fresh instance of ClasspathHasher - no cache pollution
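The class-with-global-instance pattern can be sketched like this (`ClasspathHasherSketch` is a hypothetical, heavily simplified stand-in for the real class):

```scala
import scala.collection.concurrent.TrieMap

// Instances own their cache, so tests can create isolated hashers.
class ClasspathHasherSketch {
  private val cache = TrieMap.empty[String, Int]
  var computations = 0
  def hash(file: String): Int =
    cache.getOrElseUpdate(file, { computations += 1; file.hashCode })
}

object ClasspathHasherSketch {
  // Production code keeps using a single shared instance,
  // mirroring the old object-based API.
  val global: ClasspathHasherSketch = new ClasspathHasherSketch

  def main(args: Array[String]): Unit = {
    val a = new ClasspathHasherSketch
    val b = new ClasspathHasherSketch
    a.hash("x.jar"); a.hash("x.jar") // second call hits a's cache
    b.hash("x.jar")                  // b has its own cache, recomputes
    println(a.computations + b.computations)
  }
}
```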

Comment on lines +226 to +227
private sealed trait FileHashingResult {
  def task: Task[FileHash]
}
private final case class Computed(task: Task[FileHash]) extends FileHashingResult
private final case class FromPromise(task: Task[FileHash]) extends FileHashingResult
Collaborator Author

Is scaladoc good enough to explain why this is needed?

Contributor

In the description you wrote:

remove previous mechanism which tried to share work between concurrent requests - it was causing too much mental overhead and was error prone. 

should we not have only Computed, since we don't want to depend on another task?

@kpodsiad
Collaborator Author

I must have messed up cancelation somehow :/

@tgodzik
Contributor

tgodzik commented Oct 17, 2022

Looks like the tests are not finishing now 🤔

@tgodzik
Contributor

tgodzik commented Oct 17, 2022

ClasspathHasher.scala:196 this.hashingPromises.size(): 0 <- looks like no promises are ever added?

@kpodsiad
Collaborator Author

@tgodzik @dos65 you probably want to unsubscribe from this PR for a while, I'm going to push a lot of trash commits here to debug what's going on.

ClasspathHasher.scala:196 this.hashingPromises.size(): 0 <- looks like no promises are ever added?

This is actually ok, I think - every hash request for a file adds/removes a promise or just waits for one to complete. The logging is placed at the very end of the task, when all tasks have been resolved, so all promises should be completed & removed by then.

I'm a bit lost here at the moment, as locally there is no deadlock like the one on CI.

@Duhemm
Collaborator

Duhemm commented Oct 18, 2022

Hi! Thanks a lot for looking into this @kpodsiad! I did some testing on a small subset of our build, and it looks like there's a small performance regression for us:

                    v1.5.4-4-00074600   This PR
First full build    5m53s               6m15s
Second full build   4m25s               5m8s

Some info about the Bloop projects that are being built:

  • 709 projects
  • 3915 sources
  • Average classpath length is 2335
  • 7299 unique classpath entries

* remove previous mechanism which tried to share work between concurrent requests - it was causing too much mental overhead and was error prone
* uncomment already existing tests & fix them
* remove unused code
Comment on lines +11 to +16
def parSequenceN[A](n: Int)(in: Iterable[Task[A]]): Task[Vector[A]] = {
  if (in.isEmpty) {
    Task.now(Vector.empty)
  } else {
    // val isCancelled = new AtomicBoolean(false)
    Task.defer {
Collaborator Author

previously, the parSequenceN impl was fragile: one big task could halt the other tasks in its chunk from progressing:

val chunks = in.grouped(n).toList.map(group => Task.parSequence(group))
Task.sequence(chunks).map(_.flatten)

I ported the Monix implementation of parSequenceN, which starts N workers that constantly poll for the next job whenever they finish their current task.
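The worker-based approach can be sketched with plain Futures (a hand-rolled analogue for illustration, not Monix's actual implementation): N workers drain a shared queue, so a slow job occupies only one worker instead of stalling a whole fixed chunk.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParSequenceNSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // N workers each repeatedly take the next job from a shared queue;
  // results are written back by index to preserve input order.
  def parSequenceN[A](n: Int)(jobs: Seq[() => A]): Future[Vector[A]] = {
    val queue = new ConcurrentLinkedQueue[(Int, () => A)]()
    jobs.zipWithIndex.foreach { case (job, i) => queue.add((i, job)) }
    val results = new Array[Any](jobs.size)
    def worker(): Future[Unit] = Future {
      var next = queue.poll()
      while (next != null) {
        results(next._1) = next._2()
        next = queue.poll()
      }
    }
    val workers = math.max(1, math.min(n, jobs.size))
    Future
      .sequence(Vector.fill(workers)(worker()))
      .map(_ => results.toVector.asInstanceOf[Vector[A]])
  }

  def main(args: Array[String]): Unit = {
    val jobs = (1 to 10).map(i => () => i * i)
    val out = Await.result(parSequenceN(3)(jobs), 10.seconds)
    assert(out == (1 to 10).map(i => i * i).toVector)
    println(out.sum)
  }
}
```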

Contributor

@tgodzik tgodzik left a comment

Thanks for coming back to this! It looks good, though I have some questions and comments.

@@ -593,6 +593,7 @@ abstract class BspBaseSuite extends BaseSuite with BspClientTest {
// https://github.com/scalacenter/bloop/issues/281
super.ignore(name, "DISABLED")(fun)
} else {
pprint.log(name)
Contributor

remove?

@@ -159,7 +160,10 @@ object ResultsCache {
logger: Logger
): ResultsCache = {
val handle = loadAsync(build, cwd, cleanOrphanedInternalDirs, logger)
Await.result(handle.runAsync(ExecutionContext.ioScheduler), Duration.Inf)
Await.result(
Contributor

Looks like this is causing timeouts for DAP tests, any reason for the change?

}
}

/*
Contributor

Remove the commented out method?

@@ -533,6 +533,16 @@ abstract class BaseSuite extends TestSuite with BloopHelpers {
)
}

@nowarn("msg=parameter value run|maxDuration in method ignore is never used")
Contributor

Is this not removed to avoid too many changes?


6 participants