Nest default joblib cache dir into .dspy_cache #7633
Merged
This brings dspy's various caching functionality under one parent directory (by default) and avoids creating a non-hidden `cachedir_joblib` directory in the user's home folder.
Implements #7628.
This is the most minimal change to address the above issue.
I do believe that there might be a benefit in trying to clean up the organization of the cache directories in a more general sense. Just from doing a closer investigation, it seems that there are three areas within this library where some kind of disk caching occurs:
1. `.dspy_cache/cachedir_joblib`
2. `.dspy_cache/finetune`
3. `.dspy_cache` (the directory itself is created)
Of these, 1 and 3 both initialize on import of the top-level `dspy` module, so even if you aren't doing anything that needs the cache, you are still creating the various directories. You also have to go hunting to find out how these caches are configured within the library, as they are tucked into various similarly named files, e.g. `cache_utils.py`, `/utils/caching.py`, etc.
A potentially nice reorganization could do module-level configuration for the "parent" cache directory (which `.dspy_cache` is now, but mostly hardcoded in three different areas within the codebase), and then other components could just build off of this (such as with a `from dspy.cache import DSPY_CACHE_PATH` or something). Something like this would let future components that require caching keep it in a centralized location.
If that idea sounds useful at all, I'd be happy to take a closer look at implementing something (otherwise just implementing this PR resolves my original issue, so I'm happy either way 👍 )
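For illustration, a minimal sketch of what such a centralized module might look like. Everything here is an assumption rather than existing dspy code: the module location (something like `dspy/cache.py`), the `DSPY_CACHE_DIR` environment variable, and the `cache_subdir` helper are hypothetical names. Only `DSPY_CACHE_PATH` comes from the suggestion above. The sketch also defers directory creation until a component actually asks for its cache directory, which would avoid creating directories on import.

```python
import os
from pathlib import Path

# Hypothetical module (e.g. dspy/cache.py); the names below are assumptions
# made for illustration, not dspy's actual API.
# Computing the path touches nothing on disk, so importing is side-effect free.
DSPY_CACHE_PATH = Path(os.environ.get("DSPY_CACHE_DIR", Path.home() / ".dspy_cache"))


def cache_subdir(name: str, root: Path = DSPY_CACHE_PATH) -> Path:
    """Return a component's directory under the parent cache dir.

    The directory is created lazily, on the first call, rather than
    when the module is imported.
    """
    path = Path(root) / name
    path.mkdir(parents=True, exist_ok=True)
    return path
```

A component such as the joblib cache could then call `cache_subdir("cachedir_joblib")` the first time it needs disk storage, instead of hardcoding its own path.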