Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

algod: Hot/Cold Data Directories and Resource Paths #5614

Merged
merged 51 commits into from
Aug 28, 2023

Conversation

AlgoAxel
Copy link
Contributor

@AlgoAxel AlgoAxel commented Jul 26, 2023

What

This PR adds a collection of configuration variables for use by node runners to better organize their file resources when running algod.

How

Two types of configuration variables have been added:

1 - HotDataDir/ColdDataDir

If Hot or Cold Dirs are set, they will be provided to the downstream components of the node (only implemented for MakeFull at the moment). If they are not provided, the typical DataDir is provided. Either/Or may be specified.
Note, some resources like config.json or minor runtime files may continue to exist only in the originally specified -d dataDir.

2 -Specific Resource FilePaths and Dirs

A collection of resource paths are exposed now in the config to give relay operators specific control over the location of some artifacts. These resources are:

  • Tracker DB Path - the path to your tracker .sqlite file, or root kv directory if using pebble storage engine
  • Block DB Path - the path to your block .sqlite file
  • Catchpoint Directory - the path to your catchpoint folder if using catchpoint tracking
  • Log File Path - the path to your log file!
  • Log Archive Dir - the path to your Log Archive Directory, as LogArchiveName is already in the config
  • StateproofDir - the directory for a stateproof worker to create its resources
  • CrashFilePath - the path for crashes :)

Documenting our Storage Paths, Before and After

Previously

dataDir

dataDir is a path provided by node runners which effectively seeds the operation of the node. It is expected to contain a config.json, and unless otherwise specified, a genesis.json file is also expected. dataDir is specified at runtime through either the -d flag, or the ALGORAND_DATA OS Environment Variable. The nodes operation will create files in the dataDir like node.log, or file locks.

genesisDir

genesisDir is a path that is referenced by most of the tools built by this repo. Once a genesis file is loaded, the genesis.ID is used as a subdirectory in the dataDir. This is done so that one node can operate on potentially many networks. Subcomponents of the node use the genesisDir to hold network specific data like wallets and ledger/block databases.

Now

All directories [Root, Hot, Cold, Tracker ...], with the exception of Log paths, are "Ensured and Resolved" to Genesis Directories as part of the server initialization.

cfg.EnsureandResolveGenesisDirs(myRoot, myGenesisID) will resolve every given directory to absolute, will attach a GenesisDirectory, and will attempt to ensure the path exists.

Once a ResolvedGenesisDirectories is made, it is handed down to MakeFull/MakeFollower node, who consults it while starting up its components. A LedgerDirsAndPrefix structure is also used to attach the ledger prefix so that the ledger components can handle their pathing with "ledger" prefix like they usually do.

additionally the cadaver dir in agreement will fall back to using cfg.ColdDataDir. previously, this had no fallback and would simply default to a blank string. This intentionally does not change that behavior (nor does it append a genesis dir, as that wasn't happening before)

Testing

  • Make Install
  • Existing Tests

Manual Testing

Manual testing is old, but I had confirmed that specifying directories worked as expected

Unit Tests

  • Node Tests (Follower and Full):
  • Without configuration all resources are in expected locations
  • With Hot/Cold configuration, resources are in the expected hot/cold locations
  • With resource paths included, resources are in the expected locations with genesisDir
  • Removed a test for node regarding failure to create the genesis dir, as that is no longer the node's responsibility

For Config:

  • A test demonstrating behavior of ensureAbsPath
  • Tests demonstrating behavior of EnsureAndResolveGenesisDirs on the config object, for success and error.

These tests did not confirm that logging configuration is functional, because that happens at the Server level. These tests also don't confirm the catchpoint directory because that path is only created once a catchpoint file is created.

Use Guide

Here's a quick rundown of how to use this feature-set.

Default

Don't set anything new, and your node is unaffected. Bits still land in your -d specified root, in whatever pathing they've always had

Simple Disk Tuning

If you have a spare NVME or other fast-disk lying around, you could set the HotDataDir to a path on it. By doing that, fast-access resources will get created there, and anything else continues to live in your -d specified root.

Or perhaps you have a slower, larger drive that you want us to use -- you could set ColdDataDir to a path on it. By doing that, slower/infrequent resources will get created there, and anything else continues to live in your -d root.

Or, you could specify both of those, and now the nodes resources are separated into Hot/Cold, and located on those paths you specified.

Advanced Disk Tuning

If you would like to fully customize pathing, you can specify any one of the new resource dirs, like TrackerDBDir. The related resource will be hosted there now.

Mix and Match

You can choose to set as many of the individual resource paths as you like. Any unspecified paths will resolve to their most appropriate fallback. Slow resources will use ColdDataDir, and Fast resources will use HotDataDir. If either Hot or Cold aren't set, they fall back to your -d specified directory.

A good first change to make to get the most disk benefit is to specify HotDataDir to point at a dedicated high speed disk. This would allow the most critical persisted resources to spend less time on IO, potentially increasing your node's performance.

config/localTemplate.go Outdated Show resolved Hide resolved
daemon/algod/server.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Merging #5614 (e6a8f6c) into master (ca420de) will increase coverage by 0.29%.
The diff coverage is 65.64%.

@@            Coverage Diff             @@
##           master    #5614      +/-   ##
==========================================
+ Coverage   55.21%   55.51%   +0.29%     
==========================================
  Files         473      473              
  Lines       66156    66244      +88     
==========================================
+ Hits        36528    36774     +246     
+ Misses      27164    26977     -187     
- Partials     2464     2493      +29     
Files Changed Coverage Δ
cmd/algod/main.go 0.00% <0.00%> (ø)
daemon/algod/server.go 4.07% <0.00%> (+0.01%) ⬆️
netdeploy/remote/deployedNetwork.go 18.49% <0.00%> (-0.07%) ⬇️
node/follower_node.go 45.16% <66.66%> (+11.05%) ⬆️
config/localTemplate.go 69.12% <67.85%> (-1.65%) ⬇️
node/node.go 22.60% <68.75%> (+18.91%) ⬆️
ledger/ledger.go 69.57% <83.33%> (+0.48%) ⬆️
agreement/service.go 86.48% <100.00%> (+0.37%) ⬆️
data/ledger.go 30.96% <100.00%> (ø)
ledger/catchpointtracker.go 61.52% <100.00%> (+1.04%) ⬆️

... and 10 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@AlgoAxel AlgoAxel marked this pull request as ready for review July 27, 2023 19:55
Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel strong about OpenLedger interface change (see the suggestion) but the configuration must be well documented. Maybe even include couple examples there and in PR description.

config/localTemplate.go Outdated Show resolved Hide resolved
ledger/ledger.go Outdated Show resolved Hide resolved
@AlgoAxel
Copy link
Contributor Author

The more I think about the individual resource paths I exposed, the more I think they should be changed somewhat. I won't make the change presently, but soliciting opinions from folks:

I mention this in the description, but if a specific resource path is defined (like TrackerDBFilePath), no attempt to isolate it into a GenesisDir is made. This choice was made because when selecting a specific path to the a resource, I didn't want to modify it away from what the user expected/input. That, and the way pathing was wired into the ledger (and other components) didn't make this path customization very easy to support.

However, not isolating files by GenesisID means that a config used twice with different genesis will have resource collisions. I can't really think of any argument to defend that other than to say that these are some very fine-grained options, so the ability to hurt oneself makes sense.

I think I can fix this by changing these individual resource paths to "individual resource datadirs":

  • Change any of the new "FilePath" config options to "Dir" options
  • The server/node when ensuring these data paths are configured, also ensure a genesis directory is made in each one
  • Related changes to pass down any sub-pathing or filepath.joins

In this way, any individually specified resource will be defined by a directory, and that directory will auto isolate by Genesis ID. An exception should be made for node.log, which already is shared by all genesis.

@iansuvak
Copy link
Contributor

I think I can fix this by changing these individual resource paths to "individual resource datadirs":

  • Change any of the new "FilePath" config options to "Dir" options
  • The server/node when ensuring these data paths are configured, also ensure a genesis directory is made in each one
  • Related changes to pass down any sub-pathing or filepath.joins

In this way, any individually specified resource will be defined by a directory, and that directory will auto isolate by Genesis ID. An exception should be made for node.log, which already is shared by all genesis.

I am pro this change. Not separating genesis dirs does allow for collisions that might not be expected by the end-user not reading the source code. Are there downsides to this approach vs using filepaths that you see? Other than potential implementation complications?

@algorandskiy algorandskiy changed the title feature: Hot/Cold Data Directories and Resource Paths node: Hot/Cold Data Directories and Resource Paths Jul 28, 2023
@algorandskiy algorandskiy changed the title node: Hot/Cold Data Directories and Resource Paths alfod: Hot/Cold Data Directories and Resource Paths Jul 28, 2023
@algorandskiy algorandskiy changed the title alfod: Hot/Cold Data Directories and Resource Paths algod: Hot/Cold Data Directories and Resource Paths Jul 28, 2023
@algorandskiy
Copy link
Contributor

config v29 is in master, please rebase

@AlgoAxel
Copy link
Contributor Author

AlgoAxel commented Jul 28, 2023

I am pro this change. Not separating genesis dirs does allow for collisions that might not be expected by the end-user not reading the source code. Are there downsides to this approach vs using filepaths that you see? Other than potential implementation complications?

@iansuvak

The current challenge is making sure that the ledger (maybe other components too) are handed these paths in a way that doesn't change their behavior too fundamentally. For example, the Ledger doesn't take a genesisDir as an input, it takes a dbPathPrefix, which is a little different. So, making sure these paths are formatted in a way that is transparent to the component is the goal.

But yes, I was basically sold that this improvement is required for this feature, so I've been chipping at it between reviews. It is indeed a little tricky.

@algorandskiy thank you for the callout re v29 and more documentation in the config. //Documentation I plan on giving a full pass once the names and specifics stop changing. Rebasing on V29, will do 👍

config/config_test.go Show resolved Hide resolved
config/localTemplate.go Outdated Show resolved Hide resolved
config/localTemplate.go Outdated Show resolved Hide resolved
config/localTemplate.go Outdated Show resolved Hide resolved
ledger/ledger.go Outdated Show resolved Hide resolved
ledger/ledger.go Show resolved Hide resolved
ledger/spverificationtracker_test.go Outdated Show resolved Hide resolved
node/follower_node.go Outdated Show resolved Hide resolved
Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sold on the at-use checks like below

	// if the node's HotGenesisDir is not defined, set it to the root genesis dir
	if node.genesisDirs.HotGenesisDir == "" {
		node.genesisDirs.HotGenesisDir = node.genesisDirs.RootGenesisDir
	}

...
	blockDBPrefix := filepath.Join(dbPrefixes.ResolvedGenesisDirs.RootGenesisDir, dbPrefixes.DBFilePrefix)
	if dbPrefixes.ResolvedGenesisDirs.HotGenesisDir != "" {
		trackerDBPrefix = filepath.Join(dbPrefixes.ResolvedGenesisDirs.HotGenesisDir, dbPrefixes.DBFilePrefix)
	}

versus having it resolved inside some some cfg.EnsureDirs method. @winder / @cce / @bbroder-algo / @iansuvak opinions?

@iansuvak
Copy link
Contributor

I'm still not sold on the at-use checks like below

	// if the node's HotGenesisDir is not defined, set it to the root genesis dir
	if node.genesisDirs.HotGenesisDir == "" {
		node.genesisDirs.HotGenesisDir = node.genesisDirs.RootGenesisDir
	}

...
	blockDBPrefix := filepath.Join(dbPrefixes.ResolvedGenesisDirs.RootGenesisDir, dbPrefixes.DBFilePrefix)
	if dbPrefixes.ResolvedGenesisDirs.HotGenesisDir != "" {
		trackerDBPrefix = filepath.Join(dbPrefixes.ResolvedGenesisDirs.HotGenesisDir, dbPrefixes.DBFilePrefix)
	}

versus having it resolved inside some some cfg.EnsureDirs method. @winder / @cce / @bbroder-algo / @iansuvak opinions?

I agree that centralizing the resolution logic would be cleaner but don't feel super strongly about it since it's not being done in too many places.

There doesn't seem to be a downside to centralizing it though?

config/localTemplate.go Outdated Show resolved Hide resolved
config/localTemplate.go Outdated Show resolved Hide resolved
algorandskiy
algorandskiy previously approved these changes Aug 18, 2023
winder
winder previously approved these changes Aug 18, 2023
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. LGTM.

@algorandskiy
Copy link
Contributor

Do not merge until the beta release is out

@algorandskiy
Copy link
Contributor

@AlgoAxel please rebase into master.

algorandskiy
algorandskiy previously approved these changes Aug 28, 2023
Copy link
Contributor

@iansuvak iansuvak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM and like the testing. I have one change suggestion for a missed c/p in config/localTemplate.go and one potentially stray comment

config/localTemplate.go Outdated Show resolved Hide resolved
config/localTemplate.go Outdated Show resolved Hide resolved
@algorandskiy algorandskiy merged commit 9ab1cc3 into algorand:master Aug 28, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Multiple Disk Drives on Algorand Node
5 participants