Don't set the Spark checkpoint directory to the tmp directory #181

Open
riley-harper opened this issue Dec 13, 2024 · 0 comments
Labels
configuration (Related to configuration and its syntax)
type: bug (Something isn't working)
Milestone
v4.0.0

Comments

@riley-harper
Contributor

This is a bug: the checkpoint directory must be on shared storage, but the tmp directory should not be. Fixing this might require some poking around in the SparkConnection class and changing it, and it might also be a good time to simplify some of the configuration loading code, which is really messy and makes this confusing.

@riley-harper added the type: bug and configuration labels Dec 13, 2024
@riley-harper added this to the v4.0.0 milestone Dec 13, 2024
riley-harper added a commit that referenced this issue Dec 13, 2024
This eliminates the need to set a new "conf_path" attribute on the
configuration dictionary before returning it.
riley-harper added a commit that referenced this issue Dec 13, 2024
Instead of using this function to get the config and add attributes to it, we
now separately get the config with load_conf_file() and pass attributes to
Spark. I've translated some of the tests for load_conf() to tests for
load_conf_file().
riley-harper added a commit that referenced this issue Dec 13, 2024
Previously we always set the checkpoint directory to be the same as
spark.local.dir, which we call "tmp_dir". However, this doesn't make sense
because tmp_dir should be on a disk local to each executor, and the checkpoint
directory has to be on shared storage to work correctly.
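
For illustration, the separation described in the commit message above could look roughly like the sketch below. This is not the project's actual code: `build_spark_settings` and `create_session` are hypothetical names, and the string comparison is only a simple guard. The PySpark calls themselves (`SparkSession.builder.config`, `SparkContext.setCheckpointDir`) and the `spark.local.dir` setting are standard.

```python
def build_spark_settings(tmp_dir: str, checkpoint_dir: str) -> dict[str, str]:
    """Return Spark config for local scratch space (hypothetical helper).

    tmp_dir should be on a disk local to each executor; checkpoint_dir
    should be on shared storage, so the two must not be the same path.
    """
    if tmp_dir.rstrip("/") == checkpoint_dir.rstrip("/"):
        raise ValueError(
            "checkpoint_dir must be on shared storage and must differ from tmp_dir"
        )
    # Only the scratch directory is passed as a Spark config setting here.
    return {"spark.local.dir": tmp_dir}


def create_session(tmp_dir: str, checkpoint_dir: str):
    """Create a SparkSession with separate tmp and checkpoint directories."""
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("checkpoint-dir-sketch")
    for key, value in build_spark_settings(tmp_dir, checkpoint_dir).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    # The checkpoint directory is set on the SparkContext, independently
    # of spark.local.dir.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return spark
```

A call such as `create_session("/tmp/spark", "/shared/checkpoints")` would then keep scratch files on local disk while writing checkpoints to the shared location. Both paths here are placeholders.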