algod: Hot/Cold Data Directories and Resource Paths #5614
Codecov Report
@@ Coverage Diff @@
## master #5614 +/- ##
==========================================
+ Coverage 55.21% 55.51% +0.29%
==========================================
Files 473 473
Lines 66156 66244 +88
==========================================
+ Hits 36528 36774 +246
+ Misses 27164 26977 -187
- Partials 2464 2493 +29
... and 10 files with indirect coverage changes
I do not feel strongly about the OpenLedger interface change (see the suggestion), but the configuration must be well documented. Maybe even include a couple of examples there and in the PR description.
The more I think about the individual resource paths I exposed, the more I think they should be changed somewhat. I won't make the change presently, but I'm soliciting opinions from folks. I mention this in the description, but if a specific resource path is defined (like TrackerDBFilePath), no attempt is made to isolate it into a GenesisDir. I made this choice because, when a user selects a specific path to a resource, I didn't want to modify it away from what they expected/input. That, and the way pathing was wired into the ledger (and other components) didn't make this path customization very easy to support. However, not isolating files by GenesisID means that a config used twice with different genesis IDs will have resource collisions. I can't really think of any argument to defend that, other than to say that these are very fine-grained options, so the ability to hurt oneself makes sense. I think I can fix this by changing these individual resource paths to "individual resource datadirs":
In this way, any individually specified resource will be defined by a directory, and that directory will auto isolate by Genesis ID. An exception should be made for |
I am pro this change. Not separating genesis dirs does allow for collisions that might not be expected by the end-user not reading the source code. Are there downsides to this approach vs using filepaths that you see? Other than potential implementation complications? |
config v29 is in master, please rebase |
The current challenge is making sure that the ledger (and maybe other components too) is handed these paths in a way that doesn't change its behavior too fundamentally. For example, the Ledger doesn't take a genesisDir as an input; it takes a dbPathPrefix, which is a little different. So the goal is to format these paths in a way that is transparent to the component. But yes, I was basically sold that this improvement is required for this feature, so I've been chipping at it between reviews. It is indeed a little tricky. @algorandskiy thank you for the callout re v29 and more documentation in the config. On documentation: I plan on giving it a full pass once the names and specifics stop changing. Rebasing on v29, will do 👍 |
I'm still not sold on the at-use checks like below
// if the node's HotGenesisDir is not defined, set it to the root genesis dir
if node.genesisDirs.HotGenesisDir == "" {
node.genesisDirs.HotGenesisDir = node.genesisDirs.RootGenesisDir
}
...
blockDBPrefix := filepath.Join(dbPrefixes.ResolvedGenesisDirs.RootGenesisDir, dbPrefixes.DBFilePrefix)
if dbPrefixes.ResolvedGenesisDirs.HotGenesisDir != "" {
trackerDBPrefix = filepath.Join(dbPrefixes.ResolvedGenesisDirs.HotGenesisDir, dbPrefixes.DBFilePrefix)
}
versus having it resolved inside some cfg.EnsureDirs method. @winder / @cce / @bbroder-algo / @iansuvak opinions?
I agree that centralizing the resolution logic would be cleaner but don't feel super strongly about it since it's not being done in too many places. There doesn't seem to be a downside to centralizing it though? |
Nice work. LGTM.
Do not merge until the beta release is out |
@AlgoAxel please rebase into master. |
Generally LGTM and like the testing. I have one change suggestion for a missed c/p in config/localTemplate.go and one potentially stray comment
Co-authored-by: Ian Suvak <[email protected]>
What
This PR adds a collection of configuration variables for use by node runners to better organize their file resources when running algod.
How
Two types of configuration variables have been added:
1 - HotDataDir/ColdDataDir
If Hot or Cold Dirs are set, they will be provided to the downstream components of the node (only implemented for MakeFull at the moment). If they are not provided, the typical DataDir is provided. Either/Or may be specified.
Note, some resources like config.json or minor runtime files may continue to exist only in the originally specified -d dataDir.
2 - Specific Resource FilePaths and Dirs
A collection of resource paths are exposed now in the config to give relay operators specific control over the location of some artifacts. These resources are:
Documenting our Storage Paths, Before and After
Previously
dataDir
dataDir is a path provided by node runners which effectively seeds the operation of the node. It is expected to contain a config.json, and unless otherwise specified, a genesis.json file is also expected. dataDir is specified at runtime through either the -d flag or the ALGORAND_DATA OS environment variable. The node's operation will create files in the dataDir, like node.log or file locks.
genesisDir
genesisDir is a path that is referenced by most of the tools built by this repo. Once a genesis file is loaded, the genesis.ID is used as a subdirectory in the dataDir. This is done so that one node can operate on potentially many networks. Subcomponents of the node use the genesisDir to hold network-specific data like wallets and ledger/block databases.
Now
All directories [Root, Hot, Cold, Tracker ...], with the exception of Log paths, are "Ensured and Resolved" to Genesis Directories as part of the server initialization.
cfg.EnsureAndResolveGenesisDirs(myRoot, myGenesisID) will resolve every given directory to an absolute path, will attach a GenesisDirectory, and will attempt to ensure the path exists.
Once a ResolvedGenesisDirectories is made, it is handed down to the MakeFull/MakeFollower node, which consults it while starting up its components. A LedgerDirsAndPrefix structure is also used to attach the ledger prefix so that the ledger components can handle their pathing with the "ledger" prefix like they usually do.
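The three steps named above (make the path absolute, attach the genesis ID, ensure the directory exists) can be sketched as follows. This is only an illustration and not the PR's implementation: the helper names resolveGenesisDir and ensureGenesisDir, the 0o700 permission, and the sample genesis ID are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveGenesisDir (hypothetical helper) makes dir absolute and appends the
// genesis ID as a subdirectory. It does not touch the filesystem.
func resolveGenesisDir(dir, genesisID string) (string, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return "", err
	}
	return filepath.Join(abs, genesisID), nil
}

// ensureGenesisDir (hypothetical helper) resolves dir as above and then
// creates it, including parents, if it does not exist yet.
// The 0o700 mode is an assumption made for this sketch.
func ensureGenesisDir(dir, genesisID string) (string, error) {
	resolved, err := resolveGenesisDir(dir, genesisID)
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(resolved, 0o700); err != nil {
		return "", err
	}
	return resolved, nil
}

func main() {
	// Use a throwaway directory as a stand-in for the -d root.
	root, err := os.MkdirTemp("", "algod-root-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// "testnet-v1.0" is only a sample genesis ID.
	dir, err := ensureGenesisDir(root, "testnet-v1.0")
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved genesis dir:", dir)
}
```

Because the genesis ID is appended to every resolved directory, the same config can be reused across networks without two genesis IDs sharing a path.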
Additionally, the cadaver dir in agreement will fall back to using cfg.ColdDataDir. Previously, this had no fallback and would simply default to a blank string. This intentionally does not change that behavior (nor does it append a genesis dir, as that wasn't happening before).
Testing
Manual Testing
The manual testing is out of date, but I had confirmed that specifying directories worked as expected.
Unit Tests
For Config: ensureAbsPath and EnsureAndResolveGenesisDirs on the config object, for success and error.
These tests did not confirm that logging configuration is functional, because that happens at the Server level. These tests also don't confirm the catchpoint directory because that path is only created once a catchpoint file is created.
Use Guide
Here's a quick rundown of how to use this feature-set.
Default
Don't set anything new, and your node is unaffected. Bits still land in your -d specified root, in whatever pathing they've always had.
Simple Disk Tuning
If you have a spare NVMe or other fast disk lying around, you could set the HotDataDir to a path on it. By doing that, fast-access resources will get created there, and anything else continues to live in your -d specified root.
Or perhaps you have a slower, larger drive that you want us to use; you could set ColdDataDir to a path on it. By doing that, slower/infrequent resources will get created there, and anything else continues to live in your -d root.
Or, you could specify both of those, and now the node's resources are separated into Hot/Cold, and located on those paths you specified.
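As a rough illustration of what such a config.json fragment might look like, the sketch below renders the two keys named above as JSON. The dirOverrides struct is a hypothetical stand-in rather than the node's real config type, and the mount points are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dirOverrides is a hypothetical stand-in for the two config.json keys
// discussed above; it is not the node's real config struct.
type dirOverrides struct {
	HotDataDir  string `json:"HotDataDir,omitempty"`
	ColdDataDir string `json:"ColdDataDir,omitempty"`
}

// renderConfig returns an indented JSON fragment for the given overrides.
// Unset (empty) directories are omitted from the output.
func renderConfig(o dirOverrides) (string, error) {
	b, err := json.MarshalIndent(o, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Both paths are placeholders for a fast and a large disk.
	out, err := renderConfig(dirOverrides{
		HotDataDir:  "/mnt/nvme/algod",
		ColdDataDir: "/mnt/hdd/algod",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Setting only one of the two fields yields a fragment with a single key, which corresponds to the "either/or" usage described above.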
Advanced Disk Tuning
If you would like to fully customize pathing, you can specify any one of the new resource dirs, like TrackerDBDir. The related resource will be hosted there now.
Mix and Match
You can choose to set as many of the individual resource paths as you like. Any unspecified paths will resolve to their most appropriate fallback. Slow resources will use ColdDataDir, and Fast resources will use HotDataDir. If either Hot or Cold aren't set, they fall back to your -d specified directory.
A good first change to make to get the most disk benefit is to specify HotDataDir to point at a dedicated high speed disk. This would allow the most critical persisted resources to spend less time on IO, potentially increasing your node's performance.
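The fallback order described above (resource-specific dir, then Hot or Cold, then the -d root) can be sketched roughly like this. The dirs struct and its methods are hypothetical and the paths are placeholders; the genesis ID subdirectory is left out for brevity.

```go
package main

import "fmt"

// dirs mirrors a few of the config keys discussed above. The struct itself
// is hypothetical and exists only for this sketch.
type dirs struct {
	Root         string // the -d data directory
	HotDataDir   string // optional
	ColdDataDir  string // optional
	TrackerDBDir string // optional per-resource override for a fast resource
}

// firstNonEmpty returns the first non-empty candidate, or "" if none is set.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

// hot resolves a fast resource: its own dir, then HotDataDir, then Root.
func (d dirs) hot(resourceDir string) string {
	return firstNonEmpty(resourceDir, d.HotDataDir, d.Root)
}

// cold resolves a slow resource: its own dir, then ColdDataDir, then Root.
func (d dirs) cold(resourceDir string) string {
	return firstNonEmpty(resourceDir, d.ColdDataDir, d.Root)
}

func main() {
	// Only HotDataDir is set, as in the "good first change" suggestion.
	d := dirs{Root: "/var/lib/algorand", HotDataDir: "/mnt/nvme/algod"}

	fmt.Println("tracker DB dir:", d.hot(d.TrackerDBDir))
	fmt.Println("a cold resource with no override:", d.cold(""))
}
```

With only HotDataDir set, the fast resource lands on the fast disk while the cold resource stays under the -d root.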