I have checked for existing issues, and have found none.
Tested latest version
I have checked that this occurs on the latest version.
GregTech CEu Version
v1.6.3
Minecraft Version
1.20.1 Forge
Recipe Viewer Installed
EMI
Environment
Singleplayer
Cross-Mod Interaction
Yes
Other Installed Mods
Monifactory 0.11.3, Observable
Expected Behavior
Running a Large Material Press with ME Stocking Input Bus (and ~infinite cobblestone input) and ME Output Bus, in forge hammer cobblestone->gravel mode, should not be taking several milliseconds in RecipeLogic.onRecipeFinish->fullModifyRecipe->...->ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel->OverlayedItemHandler.insertStackedItemStack
Actual Behavior
Instead it takes 48 ms/tick.
This is exacerbated by having a full 2^31 cobble due to testing, but super large inputs aren't totally unreasonable.
Large Material Press is at 268 million EU/t (2x 16A UIV energy hatches), and has a 256x parallel hatch (which I'm not sure is actually doing anything here).
Steps to Reproduce
Build one of the parallelization multiblocks with a stocking input bus and an ME output bus, give it enough power to hit subtick overclocking in OverclockingLogic.getModifier, and give it a silly amount of input in the ME system.
Additional Information
There's a whole lot of places that come together to make this extra slow:
First of all, this is run every tick rather than being cached, I think because alwaysTryModifyRecipe is always true. (I think IRecipeLogicMachine's impl is never or almost never actually overridden?)
The simplest cause of the actual operation being slow for most practical uses is OverclockingLogic calling maxParallels = ParallelLogic.getParallelAmount(machine, recipe, Integer.MAX_VALUE); that is, passing Integer.MAX_VALUE rather than an (over)estimate of the maximum parallel that the overclock logic can actually support. That means the input multiplier passed to ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel is limited only by the input count, which even outside of creative can easily be something silly with stocking input buses. So limitParallel does its binary search starting at 33554432 rather than 512 or such.
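To illustrate the capping idea, here is a minimal standalone sketch, not GTCEu code. The class, the method names, and the `subtickFactor` parameter are all hypothetical; the only point is that clamping the upper bound before the search shrinks both the number of probes and the multiplier each probe has to simulate.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these names exist in GTCEu.
class ParallelCapSketch {

    /**
     * Largest n in [0, upper] for which fits(n) holds.
     * Assumes fits is monotone (true up to some point, then false) and fits(0) is true.
     */
    static int largestFitting(int upper, IntPredicate fits) {
        int lo = 0, hi = upper;
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // round up so the loop always makes progress
            if (fits.test(mid)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /**
     * Clamp the input-limited multiplier by an overestimate of what the machine can use.
     * hatchParallels * subtickFactor is an assumed stand-in for that overestimate.
     */
    static int cappedUpperBound(int inputLimited, int hatchParallels, int subtickFactor) {
        long cap = (long) hatchParallels * subtickFactor; // long avoids int overflow
        return (int) Math.min((long) inputLimited, cap);
    }

    public static void main(String[] args) {
        int[] probes = {0};
        // Stand-in for the expensive "does the output fit" simulation.
        IntPredicate fits = n -> { probes[0]++; return n <= 300; };

        int uncapped = largestFitting(33554432, fits);
        System.out.println("uncapped: result=" + uncapped + " probes=" + probes[0]);

        probes[0] = 0;
        int capped = largestFitting(cappedUpperBound(33554432, 256, 2), fits);
        System.out.println("capped:   result=" + capped + " probes=" + probes[0]);
    }
}
```

Both searches find the same answer whenever the true limit is below the cap. The capped one needs about 10 probes instead of about 25, and each of its probes simulates a multiplier of at most 512 rather than tens of millions.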
These enormous multipliers are then extra slow because OverlayedItemHandler is doing actual work copying ItemStacks for each of the ME output bus's 32k slots (because it pays attention to stack size even though the output bus wants to ignore it), and because the code is careful to support recipes which might output the same item multiple times (and moni, and maybe base GTm, actually has such recipes). This could also be sped up by special-casing "slot has been filled" rather than copying the ItemStack to make a full stack, when no item in a subsequent recipe output could possibly be added to a full stack.
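A rough sketch of what I mean by special-casing filled slots, again standalone and not the real OverlayedItemHandler. Items are represented by plain String keys, and every name here is made up for illustration. The simulation tracks counts per slot and marks full slots in a BitSet so that later insertions skip them without touching or copying anything.

```java
import java.util.BitSet;

// Hypothetical sketch: a simulated output inventory that never copies stacks.
class OverlaySketch {
    private final String[] itemKey;           // null means the slot is empty
    private final int[] count;                // simulated amount in each slot
    private final int slotLimit;              // assumed positive, same for all slots
    private final BitSet full = new BitSet(); // slots that can accept nothing more

    OverlaySketch(int slots, int slotLimit) {
        this.itemKey = new String[slots];
        this.count = new int[slots];
        this.slotLimit = slotLimit;
    }

    /** Simulates inserting amount of key; returns how much did not fit. */
    int insert(String key, int amount) {
        // nextClearBit jumps straight over slots already marked full.
        for (int slot = full.nextClearBit(0);
             slot < count.length && amount > 0;
             slot = full.nextClearBit(slot + 1)) {
            if (itemKey[slot] != null && !itemKey[slot].equals(key)) {
                continue; // slot holds a different item
            }
            int moved = Math.min(amount, slotLimit - count[slot]);
            itemKey[slot] = key;
            count[slot] += moved;
            amount -= moved;
            if (count[slot] == slotLimit) {
                full.set(slot); // never visited again by later insertions
            }
        }
        return amount;
    }
}
```

With two slots of 64, inserting 100 of one item and then 30 more of the same item leaves 2 that do not fit, and the second call only ever looks at the one slot that still has room.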
But also, is it really worth doing the computation to limit parallel to fit the actual space in the output in the first place? It's pretty much completely unwanted when using the ME output bus (which could have a special case); when using less infinite buses it could maybe be useful when some recipes wind up outputting nearly a stack per run at baseline, but it still seems pretty marginal to me.
This is being worked on - we know there are a lot of potential optimizations in our parallel logic. Thank you for your concerns.
To speak to your points:
The modified recipe cannot be cached - we cannot be certain of condition changes between recipe runs and have to attempt to remodify the base recipe again.
Yes, a potential heuristic could most likely be used by subtick overclocking.
Using the input amount as a base limit is usually the best case.
OverlayedItemHandler, as well as the entire item/fluid parallel logic, are poor-performing and are actively being worked on for the next major release.
Yes. People don't always empty their output buses, such as in passive lines, and not accounting for the output space means that outputs are voided, extra energy is used, and more time is wasted.
Good to know it's already being worked on. For the last point, from my small amount of testing, skipping the output space computation in ParallelLogic doesn't cause voiding when the output doesn't have room; it causes the machine not to run at all. That might still be confusing enough to be worth avoiding, but for situations as late game as "full paralleled output doesn't fit in output bus" it doesn't seem too bad.