Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TaskGraph implementation for mid-frame parallelism #5302

Merged
merged 16 commits into from
Nov 28, 2021

Conversation

sturnclaw
Copy link
Member

This PR opens up the ability to better take advantage of modern massively-multicore CPUs by adding in a new system to manage worker threads and dispatch work items to them. This replaces the existing AsyncJobQueue implementation (but is backwards-compatible with existing Jobs and provides a compatibility shim) with an implementation that is suitable for both "fire and forget" mid-frame tasks and long-running background tasks that need to be finalized on the main thread.

All threads are now capable of queuing tasks at any time, and there is an explicit API to allow a thread to "go wide" and split computation over all threads including the main thread (WaitForTaskSet). "Pinned" tasks are run at the end of the frame on the main thread (for tasks that need access to the OpenGL context or similar externally-synchronized workloads).

Jobs are now used as a low-priority background task system for longer (multi-millisecond) processing and will only be executed if there are no current Tasks to execute available to a worker thread.

I also fixed an issue where the profiler would generate invalid trace files due to the profiler recording the length of the profile before it "closed" the list of profile events, resulting in events being recorded into the trace file with timestamps after the "end" timestamp. (This showed up fairly regularly in heavily threaded code.)

Testing Notes

I'd like to get this branch tested somewhat widely to confirm there aren't any subtle threading errors or initialization issues; @nozmajner specifically I'd appreciate a test since you managed to catch the last batch of initialization issues when I touched the Application code. You're not looking for any specific problems, just confirming the game runs and hyperspace works correctly etc; if there's a problem you'll most likely get a crash or application deadlock.

Basic tests validated, latency + occupancy turning out good.
Further work required to fully integrate existing Jobs into the system.
Not currently hooked up to any game framework code.
TaskGraph now provides a JobQueue implementation object that can be used as a replacement to AsyncJobQueue

Slightly altered semantics of cancelled jobs - OnCancel is always run regardless of whether the job itself was run.

This prevents existing jobs from leaking allocated memory if they were cancelled.
The duration was being set before all thread data became immutable.
This resulted in incorrect profiling data when a zone was recorded on another thread during the readback period.
Disabled some unnecessarily verbose profile points; added more Job profiling.
@sturnclaw
Copy link
Member Author

Build-VS is failing as expected, added two new files (and another six to contrib/) it doesn't know about. Fluffy can at some point add those to the solution when he next has time to hack on Pioneer.

@Bodasey
Copy link

Bodasey commented Nov 6, 2021

Load a game -> click a bit in personal info -> launch request from station lobby -> the ship doesn't move, but the bulletin board is no longer available, as if you fly yet.

press ESC, close menu -> the ship lifts.

And I'm not sure if the autopilot is broken: scoped hydrogen on Jupiter, wanted to fly back, on the first try, I have chosen a target to land, on the second try, I told it to fly back in a low orbit, but both times, the autopilot flew deeper into Jupiter and the ship crashed on it's virtual surface.

@fluffyfreak
Copy link
Contributor

Fixed up the VS2019 projects here fluffyfreak@2b27096

it won't let me create a PR into your branch for some reason 🤷

fluffyfreak and others added 3 commits November 6, 2021 13:58
This decreases thread-kickoff latency significantly and resolves an issue with benchmark threads not acquring work items.
Now test for waiting and task queuing on worker threads, as well as queueing pinned tasks from other threads.
@sturnclaw
Copy link
Member Author

@fluffyfreak thanks! I'm still not sure why that's happening, but feel free to push directly to the PR branch in the future - I've cherry-picked in that commit so you don't have to do it now though 😄

@Bodasey

Load a game -> click a bit in personal info -> launch request from station lobby -> the ship doesn't move, but the bulletin board is no longer available, as if you fly yet.
press ESC, close menu -> the ship lifts.

I'm not able to reproduce this at all on a Mars start, but something I'd like you to double confirm is whether the game was paused after you requested the launch - games are loaded in Paused state, but closing the options window puts you back into 1x time mode. I'm guessing that's responsible for the behavior you're seeing here.

And I'm not sure if the autopilot is broken: scoped hydrogen on Jupiter, wanted to fly back, on the first try, I have chosen a target to land, on the second try, I told it to fly back in a low orbit, but both times, the autopilot flew deeper into Jupiter and the ship crashed on it's virtual surface.

Autopilot bugs like this one are well known issues by now; I'm going to consider this report unrelated to the PR specifically because autopilot navigation and gas giant body parameters aren't affected by any threaded code; the affected areas are primarily music loading on startup, terrain generation on terrestrial planets, and the initial generation of nearby sectors and systems after a game start or hyperjump to a new system.

@Bodasey
Copy link

Bodasey commented Nov 7, 2021

Yes, game is in pause when loaded, and keeps being paused when you launch from Station lobby.
(But, in contrary, switches to 1 time speed when you undock in main window - maybe these two should react the same way?)

@Bodasey
Copy link

Bodasey commented Nov 7, 2021

#5154 not (completely) fixed, still to see at Mexico City Starport. Not a big Problem here, you can land on partially blocked platforms.

@sturnclaw
Copy link
Member Author

Appreciate the bug hunting Bodasey, but neither of those are actually pertinent to this PR. 😄 They're all known issues that are existing behavior in master branch already - I'm looking specifically for new issues that are caused by this change and cannot be reproduced otherwise.

@sturnclaw
Copy link
Member Author

Going to merge this in during the weekend if no one's managed to find any showstopper bugs - I've added a bit more documentation to the TaskGraph implementation to hopefully make clear any nebulous points and I've not hit any issues in my testing.

@sturnclaw sturnclaw merged commit 40c7c5e into pioneerspacesim:master Nov 28, 2021
@Gliese852 Gliese852 mentioned this pull request Feb 20, 2022
@sturnclaw sturnclaw deleted the taskgraph branch May 14, 2022 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants