
fix classpath hashing #1832

Open · wants to merge 5 commits into main from fix/hashing
Conversation

kpodsiad
Collaborator

@kpodsiad kpodsiad commented Oct 5, 2022

  • remove the previous mechanism that tried to share work between concurrent requests - it caused too much mental overhead and was error-prone. The gains from it were negligible: workload was shared only when the same jar was being hashed at the same time, which is a rare situation and not that costly anyway. Results are cached after computation, so subsequent requests use the cache.
  • uncomment the already-existing tests & fix them
  • remove unused code
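The cache-after-computation behaviour described above can be sketched roughly like this (`FileHash`, the string cache key, and `hashOnce` are simplified, illustrative stand-ins, not Bloop's actual types or API):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for Bloop's FileHash; the real cache also
// takes file size and modification time into account.
final case class FileHash(file: String, hash: Int)

object HashCacheSketch {
  // Cache keyed by file path; computeIfAbsent guarantees at most one
  // computation per key even under concurrent access.
  private val cache = new ConcurrentHashMap[String, FileHash]()

  def hashOnce(file: String, compute: String => Int): FileHash =
    cache.computeIfAbsent(file, f => FileHash(f, compute(f)))

  def main(args: Array[String]): Unit = {
    var computations = 0
    def compute(f: String): Int = { computations += 1; f.hashCode }
    hashOnce("a.jar", compute)
    hashOnce("a.jar", compute) // served from the cache, no recomputation
    println(computations)
  }
}
```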

@kpodsiad kpodsiad requested a review from tgodzik October 5, 2022 05:42
backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
Comment on lines +585 to +599
def testAsyncT(name: String, maxDuration: Duration)(
    run: => Task[Unit]
): Unit = {
  test(name) {
    Await.result(run.runAsync(ExecutionContext.scheduler), maxDuration)
  }
}

Collaborator Author
new utility method which accepts a Task - no more Thread.sleep and Await.result in tests. Logic can be composed via Task's map/flatMap, and we need Await.result only at the very end.
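A minimal sketch of the same pattern, using scala.concurrent.Future as a stand-in for Bloop's Task (`runAsyncTest` is a hypothetical analogue of `testAsyncT`, not the real helper): logic composes with map/flatMap, and blocking happens exactly once at the end.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object TestAsyncSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Analogue of testAsyncT: the test body is an async value,
  // and we block only once, at the very end, with a timeout.
  def runAsyncTest(maxDuration: Duration)(body: => Future[Unit]): Unit =
    Await.result(body, maxDuration)

  def main(args: Array[String]): Unit = {
    runAsyncTest(5.seconds) {
      // no Thread.sleep: steps are chained via map instead
      Future(21).map(_ * 2).map(n => assert(n == 42))
    }
    println("ok")
  }
}
```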

Comment on lines +472 to +488
/** Creates an empty workspace where operations described as Tasks can happen. */
def withinWorkspaceT[T](fun: AbsolutePath => Task[T]): Task[T] = for {
  temp <- Task(Files.createTempDirectory("bloop-test-workspace").toRealPath())
  tempDirectory = AbsolutePath(temp)
  result <- fun(tempDirectory).doOnFinish(_ => Task(delete(tempDirectory)))
} yield result
Collaborator Author

similar to withinWorkspace, but the creation and deletion of the directory is bound to the Task's lifetime

project/Dependencies.scala (outdated)
backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
@kpodsiad kpodsiad force-pushed the fix/hashing branch 2 times, most recently from 456bf89 to 54160c6, October 5, 2022 06:32
@kpodsiad kpodsiad marked this pull request as draft October 5, 2022 07:32
@kpodsiad kpodsiad marked this pull request as ready for review October 5, 2022 15:06
@kpodsiad kpodsiad requested a review from dos65 October 5, 2022 15:06
Collaborator

@dos65 dos65 left a comment

Nice work! In general looks good, I have only one question.

Also, we need to ask @tgodzik to run performance tests.
It's the first occurrence of Task.parSequenceN on a large number of tasks - I have some doubts about its implementation 😸

@tgodzik
Contributor

tgodzik commented Oct 6, 2022

Nice work! In general looks good, I have only one question.

Also, we need to ask @tgodzik to run performance tests. It's the first occurrence of Task.parSequenceN on a large number of tasks - I have some doubts about its implementation 😸

Running them right now!

@bloopoid
Collaborator

bloopoid commented Oct 6, 2022

Performance test finished successfully.

Benchmarks are based on merging with master

@kpodsiad
Collaborator Author

kpodsiad commented Oct 7, 2022

Performance test finished successfully.

Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

@tgodzik
Contributor

tgodzik commented Oct 7, 2022

Performance test finished successfully.
Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

It seems to have run correctly but I don't see any results yet, I think there is some delay there :/

@tgodzik
Contributor

tgodzik commented Oct 10, 2022

Performance test finished successfully.
Benchmarks are based on merging with master

Gimme Gimme Gimme results from perf tests after midnight.

It seems to have run correctly but I don't see any results yet, I think there is some delay there :/

Ok, I don't see any regressions, so we should be good.

@kpodsiad kpodsiad requested a review from dos65 October 12, 2022 08:07
@kpodsiad
Collaborator Author

@tgodzik @dos65 can we merge this PR then?

Contributor

@tgodzik tgodzik left a comment

Added some very minor comments, but I am wondering how much the deduplication of hashing was actually needed.

@Duhemm do you have any idea what kind of gains this might offer? Would you be able to test the difference on a larger codebase?

backend/src/main/scala/bloop/io/ClasspathHasher.scala (outdated)
backend/src/main/scala/bloop/task/Task.scala (outdated)
Collaborator

@dos65 dos65 left a comment

LGTM!

@tgodzik

@Duhemm do you have any idea what kind of gains this might offer?

From my side it looks more like a refactoring plus a fix for incorrect usage of an array across threads. Actually, I wanted to do the same while I was doing the monix upgrade.

Collaborator Author

@kpodsiad kpodsiad left a comment

Ok, I brought back workload sharing. The implementation is almost pure (only the concurrent hashmap update isn't) and should be more straightforward to understand.
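A rough sketch of promise-based workload sharing, assuming a ConcurrentHashMap of in-flight promises (scala.concurrent.Future/Promise stand in for Bloop's Task; `hash` and the latch-based demo are illustrative, not Bloop's actual code):

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object SharedHashingSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global
  private val inFlight = new ConcurrentHashMap[String, Promise[Int]]()

  // The first caller for a file wins the putIfAbsent race and computes
  // the hash; concurrent callers wait on the winner's promise.
  def hash(file: String, compute: String => Int): Future[Int] = {
    val p = Promise[Int]()
    val existing = inFlight.putIfAbsent(file, p)
    if (existing != null) existing.future
    else {
      p.completeWith(Future(compute(file)))
      // remove the promise once done so the entry doesn't leak
      p.future.onComplete(_ => inFlight.remove(file, p))
      p.future
    }
  }

  def main(args: Array[String]): Unit = {
    val computed = new AtomicInteger(0)
    val latch = new CountDownLatch(1)
    // block the computation so the second request provably joins it
    def compute(f: String): Int = { computed.incrementAndGet(); latch.await(); f.length }
    val f1 = hash("some.jar", compute)
    val f2 = hash("some.jar", compute) // joins the in-flight computation
    latch.countDown()
    assert(Await.result(f1, 10.seconds) == 8)
    assert(Await.result(f2, 10.seconds) == 8)
    println(computed.get) // the hash was computed only once
  }
}
```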


Task.gatherUnordered(classpath.map(readJar(_)))
object ClasspathHasher {
  final val global: ClasspathHasher = new ClasspathHasher
Collaborator Author

It was impossible to test ClasspathHasher because tests were sharing the global cache. I turned it into a class and created a global instance because:

  • I didn't want to touch the existing code which referred to ClasspathHasher as an object. It now uses ClasspathHasher.global, which is almost the same.
  • in tests, each test case creates a fresh instance of ClasspathHasher - no cache pollution
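The class-with-global-instance pattern can be sketched like this (`ClasspathHasherSketch` is a hypothetical, heavily simplified stand-in for the real class):

```scala
import scala.collection.concurrent.TrieMap

// Instances own their cache, so tests can create isolated hashers.
class ClasspathHasherSketch {
  private val cache = TrieMap.empty[String, Int]
  var computations = 0
  def hash(file: String): Int =
    cache.getOrElseUpdate(file, { computations += 1; file.hashCode })
}

object ClasspathHasherSketch {
  // Production code keeps using a single shared instance,
  // mirroring the old object-based API.
  val global: ClasspathHasherSketch = new ClasspathHasherSketch

  def main(args: Array[String]): Unit = {
    val a = new ClasspathHasherSketch
    val b = new ClasspathHasherSketch
    a.hash("x.jar"); a.hash("x.jar") // second call hits a's cache
    b.hash("x.jar")                  // b has its own cache, recomputes
    println(a.computations + b.computations)
  }
}
```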

Comment on lines +226 to +227
private sealed trait FileHashingResult {
  def task: Task[FileHash]
}
private final case class Computed(task: Task[FileHash]) extends FileHashingResult
private final case class FromPromise(task: Task[FileHash]) extends FileHashingResult
Collaborator Author

Is scaladoc good enough to explain why this is needed?

Contributor

In the description you wrote:

remove previous mechanism which tried to share work between concurrent requests - it was causing too much mental overhead and was error prone. 

should we not have only Computed, since we don't want to depend on another task?

@kpodsiad
Collaborator Author

I must have messed up cancelation somehow :/

@tgodzik
Contributor

tgodzik commented Oct 17, 2022

Looks like the tests are not finishing now 🤔

@tgodzik
Contributor

tgodzik commented Oct 17, 2022

ClasspathHasher.scala:196 this.hashingPromises.size(): 0 <- looks like no promises are ever added?

@kpodsiad
Collaborator Author

@tgodzik @dos65 you probably want to unsubscribe from this PR for a while, I'm going to push a lot of trash commits here to debug what's going on.

ClasspathHasher.scala:196 this.hashingPromises.size(): 0 <- looks like no promises are ever added?

This is actually ok, I think - every hash request for a file adds/removes a promise or just waits for one to complete. The logging is placed at the very end of the task, when all tasks have been resolved, so all promises should be completed & removed by then.

I'm a bit lost here at the moment, as locally there is no deadlock like the one on CI.

@Duhemm
Collaborator

Duhemm commented Oct 18, 2022

Hi! Thanks a lot for looking into this @kpodsiad! I did some testing on a small subset of our build, and it looks like there's a small performance regression for us:

                    v1.5.4-4-00074600   This PR
First full build    5m53s               6m15s
Second full build   4m25s               5m8s

Some info about the Bloop projects that are being built:

  • 709 projects
  • 3915 sources
  • Average classpath length is 2335
  • 7299 unique classpath entries

* remove previous mechanism which tried to share work between concurrent requests - it was causing too much mental overhead and was error prone
* uncomment already existing tests & fix them
* remove unused code
Comment on lines +11 to +16
def parSequenceN[A](n: Int)(in: Iterable[Task[A]]): Task[Vector[A]] = {
  if (in.isEmpty) {
    Task.now(Vector.empty)
  } else {
    // val isCancelled = new AtomicBoolean(false)
    Task.defer {
Collaborator Author

previously, the parSequenceN impl was fragile: one big task could halt the other tasks in its chunk from progressing:

val chunks = in.grouped(n).toList.map(group => Task.parSequence(group))
Task.sequence(chunks).map(_.flatten)

I ported the Monix implementation of parSequenceN, which starts N workers that constantly poll for the next job whenever they finish their current task.
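The worker-based approach can be sketched with plain Futures (a hand-rolled analogue for illustration, not Monix's actual implementation): N workers drain a shared queue, so a slow job occupies only one worker instead of stalling a whole fixed chunk.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParSequenceNSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // N workers each repeatedly take the next job from a shared queue;
  // results are written back by index to preserve input order.
  def parSequenceN[A](n: Int)(jobs: Seq[() => A]): Future[Vector[A]] = {
    val queue = new ConcurrentLinkedQueue[(Int, () => A)]()
    jobs.zipWithIndex.foreach { case (job, i) => queue.add((i, job)) }
    val results = new Array[Any](jobs.size)
    def worker(): Future[Unit] = Future {
      var next = queue.poll()
      while (next != null) {
        results(next._1) = next._2()
        next = queue.poll()
      }
    }
    val workers = math.max(1, math.min(n, jobs.size))
    Future
      .sequence(Vector.fill(workers)(worker()))
      .map(_ => results.toVector.asInstanceOf[Vector[A]])
  }

  def main(args: Array[String]): Unit = {
    val jobs = (1 to 10).map(i => () => i * i)
    val out = Await.result(parSequenceN(3)(jobs), 10.seconds)
    assert(out == (1 to 10).map(i => i * i).toVector)
    println(out.sum)
  }
}
```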

Contributor

@tgodzik tgodzik left a comment

Thanks for coming back to this! It looks good, though I have some questions and comments.

@@ -593,6 +593,7 @@ abstract class BspBaseSuite extends BaseSuite with BspClientTest {
// https://github.com/scalacenter/bloop/issues/281
super.ignore(name, "DISABLED")(fun)
} else {
pprint.log(name)
Contributor

remove?

@@ -159,7 +160,10 @@ object ResultsCache {
logger: Logger
): ResultsCache = {
val handle = loadAsync(build, cwd, cleanOrphanedInternalDirs, logger)
Await.result(handle.runAsync(ExecutionContext.ioScheduler), Duration.Inf)
Await.result(
Contributor

Looks like this is causing timeouts for DAP tests, any reason for the change?

}
}

/*
Contributor

Remove the commented out method?

@@ -533,6 +533,16 @@ abstract class BaseSuite extends TestSuite with BloopHelpers {
)
}

@nowarn("msg=parameter value run|maxDuration in method ignore is never used")
Contributor

Is this not removed to avoid too many changes?


6 participants