Subtick overclock is very TPS-expensive #2809

talchas2 · 2025-02-01T03:02:52Z

Checked for existing issues

I have checked for existing issues, and have found none.

Tested latest version

I have checked that this occurs on the latest version.

GregTech CEu Version

v1.6.3

Minecraft Version

1.20.1 Forge

Recipe Viewer Installed

EMI

Environment

Singleplayer

Cross-Mod Interaction

Yes

Other Installed Mods

Monifactory 0.11.3, Observable

Expected Behavior

Running a Large Material Press with ME Stocking Input Bus (and ~infinite cobblestone input) and ME Output Bus, in forge hammer cobblestone->gravel mode, should not be taking several milliseconds in RecipeLogic.onRecipeFinish->fullModifyRecipe->...->ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel->OverlayedItemHandler.insertStackedItemStack

Actual Behavior

Instead it takes 48ms/tick:

This is exacerbated by having a full 2^31 cobble due to testing, but super large inputs aren't totally unreasonable.

Large Material Press is at 268 million EU/t (2x 16A UIV energy hatches), and has a 256x parallel hatch (which I'm not sure is actually doing anything here).

Steps to Reproduce

Build one of the parallelization multiblocks, with stocking input bus, and ME output bus, give it enough power to hit subtick overclocking in OverclockingLogic.getModifier, give it a silly amount of input in the ME system.

Additional Information

There's a whole lot of places that come together to make this extra slow:

First of all, this is run every tick rather than being cached, I think because alwaysTryModifyRecipe is always true. (I think IRecipeLogicMachine's impl is never or almost never actually overridden?)

The simplest cause of the actual operation being slow for most practical uses is OverclockingLogic doing maxParallels = ParallelLogic.getParallelAmount(machine, recipe, Integer.MAX_VALUE); with Integer.MAX_VALUE rather than an (over)estimate of the maximum parallel that the overclock logic can actually support. That means that the input multiplier to ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel is only limited by the input count, which even outside of creative can easily be something silly with stocking input busses. So limitParallel does binary search starting at 33554432 rather than 512 or such.

These enormous multipliers are then extra slow because OverlayedItemHandler is doing actual work copying ItemStacks for each of the ME output bus's 32k slots (because it pays attention to stack size even though the output bus wants to ignore it), and because the code is careful to support recipes which might have the same item multiple times (and I mean, moni and maybe base GTm actually has those recipes, so). This could also be sped up by special casing "slot has been filled" rather than copying the itemstack to make a full stack, when there is no possible item that could be added to a full stack in a subsequent recipe output.

But also, is it really worth doing the computation to limit parallel to fit the actual space in the output in the first place? It's pretty much completely unwanted when using the ME output bus (which could have a special case); when using less infinite busses it could maybe be useful when some recipes wind up outputting nearly a stack baseline but it still seems pretty marginal to me.

The text was updated successfully, but these errors were encountered:

krossgg · 2025-02-01T03:40:19Z

This is being worked on - we know there are a lot of potential optimizations in our parallel logic. Thank you for your concerns.
To speak to your points:

The modified recipe cannot be cached - we cannot be certain of condition changes between recipe runs and have to attempt to remodify the base recipe again.
Yes, a potential heuristic could most likely be used by subtick overclocking.
Using the input amount as a base limit is usually the best case.
OverlayedItemHandler, as well as the entire item/fluid parallel logic, are poor-performing and are actively being worked on for the next major release.
Yes. People don't always empty their output buses, such as in passive lines, and not accounting for the output space means that outputs are voided, extra energy is used, and more time is wasted.

talchas2 · 2025-02-01T04:44:35Z

Good to know it's even already being worked on. For the last option, from my small amount of testing skipping the output space computation in ParallelLogic doesn't cause voiding when the output doesn't have room, it causes the machine not to run at all. Now this might still be confusing enough to avoid it, but for situations as late game as "full paralleled output doesn't fit in output bus" it doesn't seem too bad.

talchas2 added the type: bug Something isn't working label Feb 1, 2025

krossgg self-assigned this Feb 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subtick overclock is very TPS-expensive #2809

Subtick overclock is very TPS-expensive #2809

talchas2 commented Feb 1, 2025

krossgg commented Feb 1, 2025

talchas2 commented Feb 1, 2025

Subtick overclock is very TPS-expensive #2809

Subtick overclock is very TPS-expensive #2809

Comments

talchas2 commented Feb 1, 2025

Checked for existing issues

Tested latest version

GregTech CEu Version

Minecraft Version

Recipe Viewer Installed

Environment

Cross-Mod Interaction

Other Installed Mods

Expected Behavior

Actual Behavior

Steps to Reproduce

Additional Information

krossgg commented Feb 1, 2025

talchas2 commented Feb 1, 2025