-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NO MERGE: CAD-2907 ouroboros-consensus-shelley: computation stub module #3113
base: master
Are you sure you want to change the base?
Conversation
e74b9d3
to
f13beab
Compare
applyHelper $ | ||
-- Apply the BBODY transition using the ticked state | ||
withExcept BBodyError ..: SL.applyBlock | ||
|
||
reapplyLedgerBlock = runIdentity ...: | ||
seq (stubComputation stubComputationArg) $ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written, this thunk will only ever be evaluated once during the program execution, right? (And Common Subexpression Elimination likely even causes that one thunk to be shared between both applyLedgerBlock
and reapplyLedgerBlock
.)
You'd need to give stubComputation
a dummy argument seemingly dependent upon the context here (eg the arguments to reapplyLedgerBlock
). By "seemingly" I mean real enough that GHC won't float the thunk out from under reapplyLedgerBlock
lambda but not real enough that the value actually affects the computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer comparison seems like a fine way to do it.
My only follow-up that is that you might want to add the pointer comparison for each of the arguments, not just the first. I haven't double-checked, but it seems entirely reasonable that the first argument in particular could be captured in a partial application closure just once. Since this method body is so small and GHC tries pretty hard to inline dictionaries, I'd worry (at least a little bit :)) that even this thing will still be thunked.
If you repeat the same fake usage for all three arguments of these methods, I'd be more confident that the stub computation is always being repeated per invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nfrisby, this is sufficiently interesting that I'd be inclined to try diving into core analysis.
Before I seriously consider this -- do you think it would be plausible, or it is a pipe dream due to core volume / entanglement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seq (stubComputation (stubComputationArg
+ I# (reallyUnsafePtrEquality# x x))
+ I# (reallyUnsafePtrEquality# y y))
+ I# (reallyUnsafePtrEquality# z z))
)
I anticipate that would prevent the floating.
Also: it should be relatively clear whether there is sharing of the thunk between invocations. If that is happening, then you'll see essentially no difference compared to the baseline (ie no computational delay) run-time of a multi-block execution. If it isn't happening, then you should be able to (for large enough input stub arguments) see a marked difference in run-time.
So I don't think you need to Core dive here. But maybe you want to.
In that case, there are two basic steps I'd suggest starting with:
-
Find where the call-sites end up, after inlining, etc. EG If
applyLedgerBlock
occurs inFoo.hs
, that's a bit misleading if the function that contains that occurrence is itself always inlined into its own call-sites. And so on. -
Then inspect the Core at those final call-sites. I usually start with
-ddump-simpl --dsuppress-all
in anOPTIONS_GHC
pragma.
Because so much of the Consensus layer is (a) polymorphic in the block type and (b) has large enough RHSes that they won't be inlined, I don't think step 1 will actually involve that much work. The "root occurrences" are in only six core library files. Sounds manageable.
$ git grep -l applyLedgerBlock -- 'ouroboros-consensus/src/*hs' | wc -l
6
HTH!
81bdcd9
to
e7fbcad
Compare
@nfrisby, thank you for catching this! Does that look better now? P.S. Prelude GHC.Prim GHC.Exts> let a = 1 in I# (reallyUnsafePtrEquality# a a)
0 |
7d2fa58
to
d9f61a5
Compare
d9f61a5
to
2b6721c
Compare
2b6721c
to
1d1c837
Compare
1d1c837
to
e3cdafc
Compare
e3cdafc
to
20a7358
Compare
20a7358
to
3010bb6
Compare
3010bb6
to
8d095ef
Compare
This provides a stub computation in
ouroboros-consensus-shelley
'sapplyLedgerBlock
/reapplyLedgerBlock
that is configurable for its run time.Note that this requires calibration to be performed once, at system init time.