Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subtick overclock is very TPS-expensive #2809

Open
2 tasks done
talchas2 opened this issue Feb 1, 2025 · 2 comments
Open
2 tasks done

Subtick overclock is very TPS-expensive #2809

talchas2 opened this issue Feb 1, 2025 · 2 comments
Assignees
Labels
type: bug Something isn't working

Comments

@talchas2
Copy link

talchas2 commented Feb 1, 2025

Checked for existing issues

  • I have checked for existing issues, and have found none.

Tested latest version

  • I have checked that this occurs on the latest version.

GregTech CEu Version

v1.6.3

Minecraft Version

1.20.1 Forge

Recipe Viewer Installed

EMI

Environment

Singleplayer

Cross-Mod Interaction

Yes

Other Installed Mods

Monifactory 0.11.3, Observable

Expected Behavior

Running a Large Material Press with ME Stocking Input Bus (and ~infinite cobblestone input) and ME Output Bus, in forge hammer cobblestone->gravel mode, should not be taking several milliseconds in RecipeLogic.onRecipeFinish->fullModifyRecipe->...->ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel->OverlayedItemHandler.insertStackedItemStack

Actual Behavior

Instead it takes 48ms/tick:

Image

This is exacerbated by having a full 2^31 cobble due to testing, but super large inputs aren't totally unreasonable.

Large Material Press is at 268 million EU/t (2x 16A UIV energy hatches), and has a 256x parallel hatch (which I'm not sure is actually doing anything here).

Steps to Reproduce

Build one of the parallelization multiblocks, with stocking input bus, and ME output bus, give it enough power to hit subtick overclocking in OverclockingLogic.getModifier, give it a silly amount of input in the ME system.

Additional Information

There's a whole lot of places that come together to make this extra slow:

First of all, this is run every tick rather than being cached, I think because alwaysTryModifyRecipe is always true. (I think IRecipeLogicMachine's impl is never or almost never actually overridden?)

The simplest cause of the actual operation being slow for most practical uses is OverclockingLogic doing maxParallels = ParallelLogic.getParallelAmount(machine, recipe, Integer.MAX_VALUE); with Integer.MAX_VALUE rather than an (over)estimate of the maximum parallel that the overclock logic can actually support. That means that the input multiplier to ParallelLogic.limitByOutputMerging->ItemRecipeCapability.limitParallel is only limited by the input count, which even outside of creative can easily be something silly with stocking input busses. So limitParallel does binary search starting at 33554432 rather than 512 or such.

These enormous multipliers are then extra slow because OverlayedItemHandler is doing actual work copying ItemStacks for each of the ME output bus's 32k slots (because it pays attention to stack size even though the output bus wants to ignore it), and because the code is careful to support recipes which might have the same item multiple times (and I mean, moni and maybe base GTm actually has those recipes, so). This could also be sped up by special casing "slot has been filled" rather than copying the itemstack to make a full stack, when there is no possible item that could be added to a full stack in a subsequent recipe output.

But also, is it really worth doing the computation to limit parallel to fit the actual space in the output in the first place? It's pretty much completely unwanted when using the ME output bus (which could have a special case); when using less infinite busses it could maybe be useful when some recipes wind up outputting nearly a stack baseline but it still seems pretty marginal to me.

@talchas2 talchas2 added the type: bug Something isn't working label Feb 1, 2025
@krossgg
Copy link
Contributor

krossgg commented Feb 1, 2025

This is being worked on - we know there are a lot of potential optimizations in our parallel logic. Thank you for your concerns.
To speak to your points:

  1. The modified recipe cannot be cached - we cannot be certain of condition changes between recipe runs and have to attempt to remodify the base recipe again.
  2. Yes, a potential heuristic could most likely be used by subtick overclocking.
  3. Using the input amount as a base limit is usually the best case.
  4. OverlayedItemHandler, as well as the entire item/fluid parallel logic, are poor-performing and are actively being worked on for the next major release.
  5. Yes. People don't always empty their output buses, such as in passive lines, and not accounting for the output space means that outputs are voided, extra energy is used, and more time is wasted.

@krossgg krossgg self-assigned this Feb 1, 2025
@talchas2
Copy link
Author

talchas2 commented Feb 1, 2025

Good to know it's even already being worked on. For the last option, from my small amount of testing skipping the output space computation in ParallelLogic doesn't cause voiding when the output doesn't have room, it causes the machine not to run at all. Now this might still be confusing enough to avoid it, but for situations as late game as "full paralleled output doesn't fit in output bus" it doesn't seem too bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants