[rest_api][aptos_vm] Prevent running move code on too stale of a state #15588

igor-aptos · 2024-12-12T23:03:02Z

Description

Running code on very stale state (e.g. before the node has been able to state-sync on startup) leads to confusing outcomes, and also exercises paths that should otherwise never be hit.

For example, new prologue functions have been introduced and the VM expects them to exist, but the genesis framework doesn't have them.

Two places call AptosVM to execute code, /view and /transaction/simulate, and this PR gates both of them.

By default I set 1 day as the limit - which is long enough to not cause any issues if node temporarily goes out of sync, while short enough to not cross more than one release.

An alternative is to wait for a different signal, such as the first state sync completing, but that gets tricky if the node is suspended for extended periods of time.
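
As a rough illustration of the gate described above (not the code in this PR), the check boils down to comparing the latest committed ledger timestamp against the wall clock and refusing to run Move code when the gap exceeds the limit. The names `check_state_freshness` and `StaleStateError` are hypothetical, and the sketch assumes both timestamps are expressed in microseconds since the Unix epoch.

```rust
use std::time::Duration;

/// Hypothetical error returned when the node's state is too old to run Move code.
#[derive(Debug, PartialEq)]
pub struct StaleStateError {
    /// How far the latest committed state lags behind the current time.
    pub staleness: Duration,
    /// The configured maximum staleness (1 day by default in the description above).
    pub limit: Duration,
}

/// Returns `Ok(())` if the latest ledger timestamp is within `limit` of `now`.
/// Assumption of this sketch: both timestamps are microseconds since the Unix epoch.
pub fn check_state_freshness(
    ledger_timestamp_usecs: u64,
    now_usecs: u64,
    limit: Duration,
) -> Result<(), StaleStateError> {
    // saturating_sub: a ledger timestamp slightly ahead of the local clock
    // is treated as zero staleness rather than underflowing.
    let staleness = Duration::from_micros(now_usecs.saturating_sub(ledger_timestamp_usecs));
    if staleness > limit {
        Err(StaleStateError { staleness, limit })
    } else {
        Ok(())
    }
}
```

Under these assumptions, a handler for /view or /transaction/simulate would call the check with a 1 day limit before invoking the VM, and return an error to the client instead of executing against the stale state.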

Checklist

I have read and followed the CONTRIBUTING doc
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I identified and added all stakeholders and component owners affected by this change as reviewers
I tested both happy and unhappy path of the functionality
I have made corresponding changes to the documentation

trunk-io · 2024-12-12T23:03:06Z

⏱️ 16m total CI duration on this PR

Job	Cumulative Duration	Recent Runs
rust-move-tests	12m	🟩
rust-cargo-deny	2m	🟩
check-dynamic-deps	39s	🟩
general-lints	28s	🟩
semgrep/ci	23s	🟩
file_change_determinator	10s	🟩
permission-check	3s	🟩
permission-check	2s	🟩


vgao1996 · 2024-12-13T00:21:43Z

> By default I set 1 day as the limit - which is long enough to not cause any issues if node temporarily goes out of sync, while short enough to not cross more than one release.

No strong opinion myself, but I wonder how this compares to a wider window, let's say 3 or 6 months, closer to when we could drop replayability guarantees?

igor-aptos · 2024-12-18T16:55:04Z

@vgao1996 - the replay guarantee is that a transaction should execute the same way given the state as defined at that time.

There shouldn't be invariant violations when simulating on stale state, but the results could be useless if there has been a framework upgrade, a feature flag change, a binary rollout that removes a deprecated feature, etc.

So 1 day here is long enough not to cause issues on temporarily stale nodes, and shorter than the release cycle.

For our own fullnodes (i.e. https://fullnode.mainnet.aptoslabs.com/), maybe we should be even more aggressive, like 10 minutes, to avoid serving wrong data to users and to pick a different node instead (though the load balancers in the API gateway should do that already).

igor-aptos · 2024-12-18T17:00:51Z

@JoshLind, @banool - can I get your reviews here, not on the motivation but on the implementation, since you've touched the API code the most?

gregnazario

Okay, so basically you can't run something that's more than 24 hours old?

igor-aptos · 2025-01-29T22:06:45Z

You can't run simulations etc. if the full node's state is more than 24h stale.

JoshLind

Looks solid! Curious to see if we'll run into any edge cases, but I can't really imagine any 🤔


github-actions · 2025-01-31T17:21:53Z

✅ Forge suite `compat` success on `60f7ca8827f5d64a148c3b163dc4126b0879279b` ==> `87dd64cc33076a43b7a0608423a727b3233897f2`

Compatibility test results for 60f7ca8827f5d64a148c3b163dc4126b0879279b ==> 87dd64cc33076a43b7a0608423a727b3233897f2 (PR)
1. Check liveness of validators at old version: 60f7ca8827f5d64a148c3b163dc4126b0879279b
compatibility::simple-validator-upgrade::liveness-check : committed: 12388.11 txn/s, latency: 2520.35 ms, (p50: 2600 ms, p70: 2700, p90: 3000 ms, p99: 3600 ms), latency samples: 409240
2. Upgrading first Validator to new version: 87dd64cc33076a43b7a0608423a727b3233897f2
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 4450.90 txn/s, latency: 6986.25 ms, (p50: 7900 ms, p70: 8400, p90: 8600 ms, p99: 8900 ms), latency samples: 94320
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4438.32 txn/s, latency: 7678.83 ms, (p50: 8600 ms, p70: 8700, p90: 9000 ms, p99: 9000 ms), latency samples: 156740
3. Upgrading rest of first batch to new version: 87dd64cc33076a43b7a0608423a727b3233897f2
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 4219.02 txn/s, latency: 7384.65 ms, (p50: 8200 ms, p70: 8700, p90: 9200 ms, p99: 9300 ms), latency samples: 91180
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4262.30 txn/s, latency: 7983.00 ms, (p50: 9000 ms, p70: 9000, p90: 9300 ms, p99: 9300 ms), latency samples: 149860
4. upgrading second batch to new version: 87dd64cc33076a43b7a0608423a727b3233897f2
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 7704.02 txn/s, latency: 3983.94 ms, (p50: 4600 ms, p70: 4800, p90: 5000 ms, p99: 5200 ms), latency samples: 142500
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 4057.09 txn/s, submitted: 4057.26 txn/s, expired: 0.18 txn/s, latency: 4555.14 ms, (p50: 5000 ms, p70: 5000, p90: 5100 ms, p99: 5200 ms), latency samples: 251349
5. check swarm health
Compatibility test for 60f7ca8827f5d64a148c3b163dc4126b0879279b ==> 87dd64cc33076a43b7a0608423a727b3233897f2 passed
Test Ok

github-actions · 2025-01-31T17:22:01Z

✅ Forge suite `realistic_env_max_load` success on `87dd64cc33076a43b7a0608423a727b3233897f2`

two traffics test: inner traffic : committed: 14442.49 txn/s, latency: 2745.06 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3900 ms), latency samples: 5491400
two traffics test : committed: 99.98 txn/s, latency: 1485.73 ms, (p50: 1400 ms, p70: 1500, p90: 1600 ms, p99: 3000 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.548, avg: 1.399", "ConsensusProposalToOrdered: max: 0.310, avg: 0.295", "ConsensusOrderedToCommit: max: 0.446, avg: 0.418", "ConsensusProposalToCommit: max: 0.738, avg: 0.713"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.95s no progress at version 17651 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.62s no progress at version 2774853 (avg 0.62s) [limit 16].
Test Ok

github-actions · 2025-01-31T17:22:55Z

✅ Forge suite `framework_upgrade` success on `60f7ca8827f5d64a148c3b163dc4126b0879279b` ==> `87dd64cc33076a43b7a0608423a727b3233897f2`

Compatibility test results for 60f7ca8827f5d64a148c3b163dc4126b0879279b ==> 87dd64cc33076a43b7a0608423a727b3233897f2 (PR)
Upgrade the nodes to version: 87dd64cc33076a43b7a0608423a727b3233897f2
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1439.41 txn/s, submitted: 1443.65 txn/s, failed submission: 4.24 txn/s, expired: 4.24 txn/s, latency: 2020.81 ms, (p50: 1800 ms, p70: 2100, p90: 3000 ms, p99: 6300 ms), latency samples: 129000
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1123.52 txn/s, submitted: 1127.04 txn/s, failed submission: 3.52 txn/s, expired: 3.52 txn/s, latency: 2500.39 ms, (p50: 1500 ms, p70: 2100, p90: 3200 ms, p99: 13600 ms), latency samples: 102200
5. check swarm health
Compatibility test for 60f7ca8827f5d64a148c3b163dc4126b0879279b ==> 87dd64cc33076a43b7a0608423a727b3233897f2 passed
Upgrade the remaining nodes to version: 87dd64cc33076a43b7a0608423a727b3233897f2
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1513.41 txn/s, submitted: 1518.18 txn/s, failed submission: 4.78 txn/s, expired: 4.78 txn/s, latency: 2399.63 ms, (p50: 1500 ms, p70: 2100, p90: 3700 ms, p99: 12700 ms), latency samples: 133081
Test Ok

igor-aptos requested review from gregnazario, JoshLind, banool, vgao1996 and georgemitenkov December 12, 2024 23:03

igor-aptos requested a review from 0xmaayan as a code owner December 12, 2024 23:03

gregnazario approved these changes Jan 29, 2025

JoshLind approved these changes Jan 29, 2025

igor-aptos enabled auto-merge (squash) January 29, 2025 23:47

igor-aptos force-pushed the igor/prevent_api_running_move_code_on_stale_state branch from 3582826 to 1d9cdab Compare January 30, 2025 00:29

igor-aptos force-pushed the igor/prevent_api_running_move_code_on_stale_state branch from 1d9cdab to 87dd64c Compare January 31, 2025 16:53
