Analyzing Performance (Documenting Progress, Results, etc.) #5150

jdrueckert opened this issue Oct 22, 2023 · 5 comments

jdrueckert commented Oct 22, 2023

Assumptions

  • probable culprits:
    • chunk generation
    • lighting
  • we use multithreading
  • we don't support dynamic thread scaling
  • chunk generation phases:
    • Base terrain generation (noise)
    • Terrain Additions (facets?)
    • Chunk Mesh generation
    • Chunk Lighting calculation

Next Steps

  • clarify chunk generation phase assumption using code
  • measure duration of chunk generation phases (see the timing sketch after this list)
  • confirm which threads are "ours"
  • find out which threads are used for what
  • visualize chunk generation flow
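
For the "measure duration of chunk generation phases" step, a first pass could simply wrap each suspected phase in wall-clock timing. The following is a minimal sketch using System.nanoTime(); the phase names and the work passed in are placeholders for whatever the code inspection identifies as the real chunk generation steps (noise, facets, meshing, lighting), not actual engine methods:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: accumulates wall-clock time per named phase.
    public class PhaseTimer {
        private final Map<String, Long> totalNanosByPhase = new LinkedHashMap<>();

        public void time(String phase, Runnable work) {
            long start = System.nanoTime();
            work.run();
            totalNanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }

        public void report() {
            totalNanosByPhase.forEach((phase, nanos) ->
                    System.out.printf("%-25s %10.2f ms%n", phase, nanos / 1_000_000.0));
        }
    }

Usage would look roughly like phaseTimer.time("base terrain", () -> generateBaseTerrain(chunk)), where generateBaseTerrain and chunk are hypothetical stand-ins; report() could then be logged per chunk or aggregated across a session.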

Collected Insights

Past Insights

Related Issues

Concurrency Providers & Consumers

(current state as created by @skaldarnar for Reactor effort)
[image: diagram of concurrency providers & consumers]

Time-consuming tasks

(as compiled by @DarkWeird):
The most time is taken by:

  1. generating/loading chunks.
  2. Exposure node
  3. Shadow map.
  4. Nui

Most of the memory is taken by chunks, but there is little room to shrink them: byte-wise operations are time-consuming, and any object structure (like an octree) takes so much memory that the current implementation is close to optimal. Java modules could enable aggressive optimization if the chunk internals were hidden in a separate module (or they might not). An octree could also become more memory-efficient once Java implements compact object headers. (Or the chunk could be reimplemented in Rust.)
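
To make that memory trade-off concrete, a back-of-the-envelope comparison between a flat primitive array and one-object-per-block storage might look as follows. This is a sketch only: the 32x64x32 chunk dimensions and the per-object overhead figures are assumptions for illustration, not values taken from the engine.

    public class ChunkMemoryEstimate {
        public static void main(String[] args) {
            long blocks = 32L * 64 * 32;                // assumed chunk dimensions: 65,536 block positions
            long flatShortArray = blocks * Short.BYTES; // flat short[] of block ids: ~128 KiB
            long perBlockObjects = blocks * (16 + 8);   // one object per block: ~1.5 MiB minimum, assuming
                                                        // a ~16-byte object header plus an 8-byte reference
            System.out.printf("flat array: %d KiB, per-block objects: >= %d KiB%n",
                    flatShortArray / 1024, perBlockObjects / 1024);
        }
    }

The roughly order-of-magnitude gap is why the flat, byte-wise chunk layout is hard to beat.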

Most frequently called methods

(as compiled by @BenjaminAmos via JFR)

I don't know if this is right, but a quick JFR recording seems to indicate that void org.terasology.core.world.generator.facetProviders.DensityNoiseProvider.process(GeneratingRegion, float) is called an awful lot (24% of samples). That doesn't necessarily mean that it's a bottleneck though (sampling does not measure execution time).
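
(Side note on reproducing this kind of result: once a .jfr file has been captured, per-method sample counts can be pulled out with the jdk.jfr.consumer API or with the jfr print command-line tool. The sketch below counts the topmost frame of each jdk.ExecutionSample event; the recording path comes from the command line and nothing in it is Terasology-specific.)

    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordedFrame;
    import jdk.jfr.consumer.RecordingFile;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    // Counts how often each method appears as the top frame of a CPU execution sample
    // in a JFR recording, then prints the ten most frequent ones.
    public class TopSampledMethods {
        public static void main(String[] args) throws Exception {
            Map<String, Long> counts = new HashMap<>();
            for (RecordedEvent event : RecordingFile.readAllEvents(Path.of(args[0]))) {
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName())
                        || event.getStackTrace() == null
                        || event.getStackTrace().getFrames().isEmpty()) {
                    continue;
                }
                RecordedFrame top = event.getStackTrace().getFrames().get(0);
                String method = top.getMethod().getType().getName() + "." + top.getMethod().getName();
                counts.merge(method, 1L, Long::sum);
            }
            counts.entrySet().stream()
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                    .limit(10)
                    .forEach(entry -> System.out.println(entry.getValue() + "  " + entry.getKey()));
        }
    }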

Interestingly, on the slow server recording the most frequently sampled methods were:

  • com.google.common.collect.MapMakerInternalMap$HashIterator.nextInTable()
  • java.lang.invoke.VarHandleObjects$Array.getVolatile(VarHandleObjects$Array, Object, int)

The HashIterator method was generally (indirectly) called from:

  • void org.terasology.engine.rendering.logic.LightFadeSystem.update(float)
  • void org.terasology.engine.logic.behavior.BehaviorSystem.update(float)
  • void org.terasology.engine.logic.characters.CharacterSystem.update(float)
  • void org.terasology.engine.logic.common.lifespan.LifespanSystem.update(float)
  • void org.terasology.engine.logic.behavior.CollectiveBehaviorSystem.update(float)

In fact, it's those same systems for both of the frequently sampled methods.
Inside of those methods, the stack generally goes:

  • boolean com.google.common.collect.Iterators$ConcatenatedIterator.hasNext()
  • boolean java.util.Spliterators$1Adapter.hasNext()
  • boolean java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Consumer)
    Which implicates java.util.stream again (a known big issue in general).

DensityNoiseProvider is not as big of an issue on that machine though. It's only 1.52% of samples.
Could it possibly be related to

    public final Iterable<EntityRef> getEntitiesWith(Class<? extends Component>... componentClasses) {
        if (isWorldPoolGlobalPool()) {
            return Iterables.concat(globalPool.getEntitiesWith(componentClasses),
                    sectorManager.getEntitiesWith(componentClasses));
        }
        return Iterables.concat(globalPool.getEntitiesWith(componentClasses),
                getCurrentWorldPool().getEntitiesWith(componentClasses),
                sectorManager.getEntitiesWith(componentClasses));
    }

or

    public final Iterable<EntityRef> getEntitiesWith(Class<? extends Component>... componentClasses) {
        return () -> entityStore.keySet().stream()
                // Keep entities which have all of the required components
                .filter(id -> {
                    for (Class<? extends Component> component : componentClasses) {
                        if (componentStore.get(id, component) == null) {
                            return false;
                        }
                    }
                    return true;
                })
                .map(id -> getEntity(id))
                .iterator();
    }
- This does use Java streams.
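
For comparison, the second variant could be written without java.util.stream. The sketch below reuses the names from the snippet above (entityStore, componentStore, getEntity) and is assumed to live in the same class; it illustrates the stream-free shape rather than being a drop-in engine change (java.util.List and java.util.ArrayList imports assumed):

    // Sketch: builds the matching entity list eagerly instead of going through
    // Spliterators/WrappingSpliterator on every hasNext()/tryAdvance() call.
    public final Iterable<EntityRef> getEntitiesWith(Class<? extends Component>... componentClasses) {
        List<EntityRef> result = new ArrayList<>();
        for (var id : entityStore.keySet()) {
            boolean hasAll = true;
            for (Class<? extends Component> component : componentClasses) {
                if (componentStore.get(id, component) == null) {
                    hasAll = false;
                    break;
                }
            }
            if (hasAll) {
                result.add(getEntity(id));
            }
        }
        return result;
    }

Note that this trades the original's lazy iteration for an eager copy, so allocation behaviour changes; whether that is a net win would need to be measured.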

Multi-Threading

@BenjaminAmos found the following list of threads indicated by JFR (threads marked with '*' are assumed to be "ours"):

C1
C2
*Chunk-Processing-0
*Chunk-Processing-Reactor
*Chunk-Unloader-0
*Chunk-Unloader-1
*Chunk-Unloader-2
*Chunk-Unloader-3
Common-Cleaner
FileSystemWatchService
FileSystemWatchService
Finalizer
G1
Java2D
JFR
JFR
JFR
JFR:
Logging-Cleaner
*main
nioEventLoopGroup-2-1
nioEventLoopGroup-3-1
nioEventLoopGroup-3-2
nioEventLoopGroup-3-3
Reference
*Saving-0
Service
Signal
SIGTERM
StreamCloser
Sweeper
*Thread
*Thread-1
*Thread-2
VM
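
One way to confirm which of these threads are "ours" is to dump the live threads from inside the engine (for example from a debug command) and compare the names against the list above. Below is a minimal sketch using only the standard java.lang.Thread API; jcmd <pid> Thread.print gives the same information from outside the process:

    // Dumps every live thread's name, daemon flag, and state so the output can be
    // compared against the JFR thread list. Call this in-process; a standalone JVM
    // would only show its own threads.
    static void dumpLiveThreads() {
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            System.out.printf("%-35s daemon=%-5s state=%s%n",
                    thread.getName(), thread.isDaemon(), thread.getState());
        }
    }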

Code Areas with Longest Per-Call Durations

Based on the statistical info in https://benjaminamos.github.io/TerasologyPerformanceTracyView/tracy-profiler.html

TODO: Refactor the individual code areas to improve their performance and reduce their per-call run time.

References

Reactor Effort:

Potentially Helpful Tooling

  • Java Flight Recording
  • in-game Performance Monitor (F3 to open, F4 to cycle through individual tools)
    • Means Mode
    • Spikes Mode
    • Memory Allocations Mode
    • Running Threads Mode
    • World Renderer Mode
    • Rendering Execution Mode
  • Enable the Monitoring option in Settings->Autoconfig->System Settings to show this information in a separate window (requires restarting the game). The Performance tab will only function when the in-game performance monitor is open (F3) and F4 has been pressed once.
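
For Java Flight Recording specifically, a recording can be captured either by launching the game with -XX:StartFlightRecording=filename=recording.jfr or programmatically through the jdk.jfr API. Below is a minimal, self-contained sketch of the programmatic route; the file name, retention window, and the sleep standing in for "play for a while" are placeholders:

    import jdk.jfr.Configuration;
    import jdk.jfr.Recording;
    import java.nio.file.Path;
    import java.time.Duration;

    public class JfrCaptureSketch {
        public static void main(String[] args) throws Exception {
            // Use the JDK's built-in "default" event configuration; "profile" collects more detail.
            try (Recording recording = new Recording(Configuration.getConfiguration("default"))) {
                recording.setMaxAge(Duration.ofMinutes(5));      // keep at most the last 5 minutes of data
                recording.start();
                Thread.sleep(Duration.ofSeconds(60).toMillis()); // stand-in for the gameplay being profiled
                recording.dump(Path.of("terasology-profile.jfr"));
            }
        }
    }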

Information Sources

Performance-related issues:

Tooling-related issues:

Follow-Up Actions

  • improve documentation of in-game debug/analytics tooling
BenjaminAmos commented Oct 26, 2023

These methods appear to be consuming significant quantities of time (these are the highest execution times recorded). Results may be skewed by threading or other interruptions (such as GC collections).

Method                                                     Max Duration (ms)
AbstractStorageManager::loadCompressedChunk                             595
LocalChunkProvider::processReadyChunk (for all chunks)                  379
rendering/CoreRendering:worldReflectionNode                             291
rendering/CoreRendering:opaqueObjectsNode (MeshRenderer)                165
LocalChunkProvider::loadChunkStore                                      106
GameThread::processWaitingProcesses                                      42

BenjaminAmos commented:
Reducing the number of chunk threads appears to help with AbstractStorageManager::loadCompressedChunk.

LocalChunkProvider::processReadyChunk (per-chunk) behaves somewhat unpredictably in terms of performance. At times, a single call can take up to 150ms. At other times, the runtime is negligible.

BenjaminAmos commented:
VoxelWorldSystem::onNewChunk also appears to be quite expensive to run.

jdrueckert (issue author) commented:
@BenjaminAmos Added the 10 longest per-call duration findings (with more than 1000 calls) from the trace to the issue description.

jdrueckert (issue author) commented:
During today's playtest, we identified a few areas to look into in more detail:

  • ServerCharacterPredictionSystem - claiming to have "Received too much input"
  • CharacterSystem - claiming that "Player xyz seems to have cheated: It stated that it performed an action from ... but the predicted position is ..."
  • LocalChunkProvider and friends - stalling until flushing something, claiming "Chunk processing took too long this tick"
  • ChunkEventErrorLogger - claiming "Multiple loads of chunk ..." in client logs
