
simulate: resource population #6015

Open
wants to merge 52 commits into
base: master

joe-p
Contributor

@joe-p joe-p commented Jun 5, 2024

Summary

When a user calls simulate with UnnamedResources enabled, simulate should suggest how they can populate the resource arrays in their transactions so that the transaction group can be sent to the network successfully.

Test Plan

  • Test ResourcePopulator works with simple local (not group sharing) resources
  • Test ResourcePopulator with group sharing
  • Test ResourcePopulator resource limit detection with group sharing (i.e., it can find the correct transaction in which to place a resource)
  • Test Simulate with ResourcePopulator functionality
  • Test /simulate endpoint with ResourcePopulator functionality
  • Write smaller tests for better ledger/simulation/resources.go coverage

@joe-p joe-p changed the title from Feat/populate_resources to resource population Jun 5, 2024
@joe-p joe-p force-pushed the feat/populate_resources branch from 466fd50 to 5ba0a9a June 5, 2024 23:06
@joe-p joe-p changed the title from resource population to simulate: resource population Jun 5, 2024

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 88.07786% with 49 lines in your changes missing coverage. Please review.

Project coverage is 51.97%. Comparing base (8fce49c) to head (fd2c8dc).
Report is 8 commits behind head on master.

| File | Patch % | Lines missing coverage |
| --- | --- | --- |
| daemon/algod/api/server/v2/utils.go | 0.00% | 29 missing ⚠️ |
| ledger/simulation/resources.go | 95.67% | 10 missing and 6 partials ⚠️ |
| ledger/simulation/simulator.go | 66.66% | 2 missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6015      +/-   ##
==========================================
+ Coverage   51.79%   51.97%   +0.17%     
==========================================
  Files         644      644              
  Lines       86511    86922     +411     
==========================================
+ Hits        44805    45174     +369     
- Misses      38834    38873      +39     
- Partials     2872     2875       +3     


joe-p added 2 commits January 17, 2025 15:32
empty box ref is expected because the box size is 1025
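The commit message above reflects box I/O budgeting. The following is a minimal sketch of that arithmetic. It assumes Algorand's rule that each box reference grants 1,024 bytes of I/O budget, pooled across the group. Under that rule, a single 1,025-byte box needs ceil(1025 / 1024) = 2 references: one that names the box and one empty reference. The `boxRefs` helper and `bytesPerBoxRef` constant are hypothetical names used only for illustration.

```go
package main

import "fmt"

// bytesPerBoxRef is the assumed I/O budget granted by each box reference.
const bytesPerBoxRef = 1024

// boxRefs returns how many references must name a specific box (one per box
// accessed) and how many additional empty references are needed so that the
// pooled budget covers the total size of all accessed boxes.
func boxRefs(sizes []int) (named, empty int) {
	named = len(sizes)
	total := 0
	for _, s := range sizes {
		total += s
	}
	needed := (total + bytesPerBoxRef - 1) / bytesPerBoxRef // ceiling division
	if needed > named {
		empty = needed - named
	}
	return named, empty
}

func main() {
	named, empty := boxRefs([]int{1025})
	fmt.Println(named, empty) // prints: 1 1
}
```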
@joe-p
Contributor Author

joe-p commented Jan 21, 2025

After having this on the back burner for a while, I've come back to working on it and discovered why I was slow to make progress once I started implementing the endpoint. I was making two mistakes:

  1. I was not building algod before running the e2e tests. In hindsight this seems obvious, but I was used to go test picking up changes automatically. With the e2e tests, the built algod is spawned as a separate task, so any changes to algod need to be explicitly rebuilt.

  2. The test cache was not being properly invalidated, most likely because of the first problem, so I was running tests and getting incorrect cached results. This led me to make changes that actually broke things while I was under the impression that everything still worked. That made debugging harder, because I was breaking things without realizing it (see 035ef72, fixed by 41d63dd).

Now, with 41d63dd, all tests are passing, although I am experiencing an intermittent issue with database tables being locked during testing, which seems to be causing a tracked app to be missing:

--- FAIL: TestPopulatorWithGlobalResources (0.00s)
    resources_test.go:431: 
                Error Trace:    /Users/joe/git/algorand/go-algorand/ledger/simulation/resources_test.go:431
                Error:          elements differ
                            
                                extra elements in list B:
                                ([]interface {}) (len=1) {
                                 (basics.AppIndex) 3
                                }
                            
                            
                                listA:
                                ([]basics.AppIndex) (len=2) {
                                 (basics.AppIndex) 11,
                                 (basics.AppIndex) 5
                                }
                            
                            
                                listB:
                                ([]basics.AppIndex) (len=3) {
                                 (basics.AppIndex) 5,
                                 (basics.AppIndex) 11,
                                 (basics.AppIndex) 3
                                }
                Test:           TestPopulatorWithGlobalResources
time="2025-01-21T15:46:24.630756 -0500" level=warning msg="db.LoggedRetry: 5 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.630995 -0500" level=warning msg="db.LoggedRetry: 6 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.631008 -0500" level=warning msg="db.LoggedRetry: 7 retries (last err: database table is locked: accountbase)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171
time="2025-01-21T15:46:24.631220 -0500" level=warning msg="db.LoggedRetry: 8 retries (last err: database table is locked: acctrounds)" file=dbutil.go function=github.com/algorand/go-algorand/util/db.LoggedRetry line=171

Here is a gist showing the full output with 2/10 runs failing because of the above: https://gist.github.com/joe-p/860cf28908a99db2f58c5010cb378894

I have not yet tried to reproduce this with the e2e tests, but I was running them extensively last week and never saw this issue.

Once this issue is resolved, the only remaining work is to write some smaller unit tests for the "bad" cases to make sure things fail gracefully.

joe-p added 8 commits January 28, 2025 15:57
As per the comment, these checks should never be needed due to the logic
in eval context, but it felt safer to add them just in case
Previously, non-appl transactions weren't properly added to the populator,
meaning their fields were not accounted for when checking for availability.
This is probably fine in practice, since these sorts of duplicates should
be prevented by the logic in evalcontext, but as mentioned in previous
commits, it feels safer to check here just in case
@joe-p joe-p marked this pull request as ready for review January 31, 2025 12:03
@joe-p
Contributor Author

joe-p commented Jan 31, 2025

I believe all comments have been addressed at this point, and test coverage is near 100%. The only problem is that I'm still occasionally getting "database table is locked" errors when running tests locally. So far it has only happened with TestPopulatorWithGlobalResources. I tried running just this test and disabling parallel testing, but I'm still seeing the same error occasionally. I believe this is just a problem with the test harness, so I'm not sure whether it should be considered a blocker. I'd be interested to know if others can replicate it.

@joe-p
Contributor Author

joe-p commented Feb 19, 2025

As I was working on SDK support and testing for full coverage of the API, I realized I didn't have a test here that fully covers the API. I found a bug that seems to be related to extra-resource-arrays. I will write a test for proper coverage of the API and then mark this as ready for review.

Fixed in 791df53. See the commit body for details:

This was initially missed because the e2e test did not use the extra
resource arrays, and the coverage gap was easy to overlook because we
don't actually get handler coverage info for e2e tests (they are e2e,
and the test itself goes through libgoal)
@joe-p joe-p marked this pull request as ready for review February 20, 2025 13:05
@joe-p
Copy link
Contributor Author

joe-p commented Feb 20, 2025

The failing CI was caused by the non-deterministic behavior of the population algorithm. It turns out the "table is locked" message was correlated with the failures but did not cause them.

From the body of fd2c8dc:

The initial reason for desiring this was that the non-deterministic iteration order of maps made testing difficult. Thinking about it more, I realized that deterministic population will also improve the developer experience, since the same txn group will always get back the same populated resources. This change, however, does expose a potential problem inherent to resource population: order matters. As seen in the modified test, the deterministic order results in an extra resource in the extra resource array. In the future, steps could be taken to improve efficiency, but unless we try every permutation of resource ordering, there will always be cases where algorithmic population does not produce the most efficient resource packing.

@algorandskiy
Contributor

Relevant issue #5616

Contributor

@algorandskiy algorandskiy left a comment


LGTM, some minor suggestions/questions.

@@ -4302,6 +4339,13 @@
},
"unnamed-resources-accessed": {
"$ref": "#/definitions/SimulateUnnamedResourcesAccessed"
},
"extra-resource-arrays": {
"description": "Present if populate-resource-arrays is true in the request and additional tranactions are needed to name all the accessed resources.",
Contributor

Suggested change
"description": "Present if populate-resource-arrays is true in the request and additional tranactions are needed to name all the accessed resources.",
"description": "Present if populate-resource-arrays is true in the request and additional transactions are needed to name all the accessed resources.",

Comment on lines +567 to +568
func convertPopulatedResourceArrays(populatedResources simulation.PopulatedResourceArrays) *model.ResourceArrays {
// Convert the resources to the model structs
Contributor

Suggested change
func convertPopulatedResourceArrays(populatedResources simulation.PopulatedResourceArrays) *model.ResourceArrays {
// Convert the resources to the model structs
// convertPopulatedResourceArrays converts the resources to the model structs
func convertPopulatedResourceArrays(populatedResources simulation.PopulatedResourceArrays) *model.ResourceArrays {

maxAssets int
}

func (r *txnResources) getTotalRefs() int {
Contributor

Consider: add a test that uses reflection (or AST inspection) to check that every array in txnResources is included in the getTotalRefs sum. Alternatively, a randomized test could use reflection to populate all the arrays and check the value getTotalRefs returns; that one might be shorter and easier.

Comment on lines +1260 to +1262
if i >= p.groupSize && len(pop.Accounts)+len(pop.Assets)+len(pop.Apps)+len(pop.Boxes) == 0 {
break
}
Contributor

Could you explain this condition? It appears to check for the existence of extra resources?

5 participants