Feature/caching #1922
base: v2.6
Conversation
…d and reset methods, test with multiple threads
dspy/__init__.py
Outdated
@@ -64,6 +66,13 @@
configure = settings.configure
context = settings.context

CACHE_DIR = os.environ.get("DSPY_CACHEDIR") or os.path.join(Path.home(), ".dspy_cache")
Note to self: `.dspy_cache` is already used for litellm, so this PR's directories should be swapped (it currently uses `.dspy_litellm_cache` for the existing cache, but that cache doesn't exist at that new path).
Also, the size limit should be 30 GB.
dspy/clients/lm.py
Outdated
    num_retries=num_retries,
)

from dspy import settings
@dspy_cache_decorator(settings.cache)
hmm this is a strange pattern; is the inner function redefined every time? does this mean the decorator... works? I guess it does, but yeah it's a bit overkill probably?
It does work but definitely a weird pattern. Not 100% sure if it has any negative side effects.
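For readers following along, here is a minimal sketch of the pattern in question (the names are illustrative, not the exact PR code): the inner function and its wrapper are rebuilt on every call to the outer function, so the decorator always sees the current `settings.cache`, at the cost of a little per-call overhead.

```python
# Illustrative sketch only -- not the PR's actual lm.py code.
def request_cache(cache):
    """Hypothetical decorator factory: memoize a request-taking function in `cache`."""
    def decorator(fn):
        def wrapper(request):
            key = str(sorted(request.items()))
            if (hit := cache.get(key)) is not None:
                return hit
            result = fn(request)
            cache[key] = result
            return result
        return wrapper
    return decorator

def completion(request, cache_registry):
    # The decorator is applied inside the function body, so `inner` (and its
    # cached wrapper) is redefined on every call. It works, because each new
    # wrapper closes over the same cache object, but the redefinition is
    # repeated work that a module-level decorator would avoid.
    @request_cache(cache_registry)
    def inner(req):
        return {"echo": req}  # stand-in for the real LiteLLM call

    return inner(request)
```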
dspy/clients/lm.py
Outdated
def cached_litellm_text_completion_inner(request: Dict[str, Any], num_retries: int):
    return litellm_text_completion(
        request,
        cache={"no-cache": False, "no-store": False},
This passes cache to the function below. But for some reason we removed the cache kwarg below? Why?
The default (both kwargs true) should be added back to the `litellm_text_completion` function (I forgot to fix this).
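A sketch of what restoring that default might look like, assuming `litellm_text_completion` keeps a `cache` keyword argument whose default sets both flags to True (the exact signature in the PR may differ):

```python
from typing import Any, Dict, Optional

def litellm_text_completion(
    request: Dict[str, Any],
    num_retries: int,
    cache: Optional[Dict[str, bool]] = None,
):
    # Default restored: both flags True means LiteLLM neither reads from nor
    # writes to its cache unless a caller (such as the cached wrapper above)
    # overrides them with False/False.
    cache = cache if cache is not None else {"no-cache": True, "no-store": True}
    ...  # the actual LiteLLM completion call is omitted in this sketch
```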
dspy/utils/cache.py
Outdated
    key = self.cache_key(request)
except Exception:
    return None
with self.lock:
Hmm do we need to lock just to read?
If we can remove this lock, we may want to do something like `if key := self.memory_cache.get()` to avoid the double access.
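For illustration, the single-lookup read could look like the sketch below (assuming a `cachetools.LRUCache`; the function names are made up just to contrast the two approaches):

```python
from cachetools import LRUCache

memory_cache = LRUCache(maxsize=1024)

def lookup_twice(key):
    # Two operations on the cache: a membership check, then the read. Without
    # a lock, another thread could evict the entry between the two steps.
    if key in memory_cache:
        return memory_cache[key]
    return None

def lookup_once(key):
    # One operation via the walrus operator, as suggested above: hit or miss
    # is decided by a single .get() call, so no lock is needed just to keep
    # the check and the read consistent (assuming None is never a cached value).
    if (value := memory_cache.get(key)) is not None:
        return value
    return None
```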
I would think reads are thread-safe, but the docs aren't clear.
Recording some thoughts here.
…re/caching_7Dec2024
""" | ||
|
||
self.memory_cache = LRUCache(maxsize=mem_size_limit) | ||
self.fanout_cache = FanoutCache(shards=16, timeout=2, directory=directory, size_limit=disk_size_limit) |
@hmoazam As I understand it, part of the motivation for this PR is to enable users to dump the current state of the cache for a particular DSPy session.
Is the `FanoutCache` introduced for that purpose? If so, since `FanoutCache` shares the same directory as the `litellm.cache`, what extra utility is it providing?
Separately, I think we should define the behavior for dumping / loading a cache.
For example, consider the case where I've already run "Program 1" as a Python script / notebook. Now, I'm running "Program 2" in a separate notebook / Python script invocation. I'm using the default DSPy cache directory for both program executions.
If I dump the cache after I run Program 2, do I also want to include cache contents from Program 1? Or do I only want to include cache contents from Program 2, since that's my "current session"? Should users have a choice?
Once we align on the desired semantics, we can figure out how to make adjustments to the implementation.
cc @okhat
@dbczumar For disk caching, we will migrate from LiteLLM's caching to FanoutCache. The former has proved limiting when users need many read/write processes and threads in parallel.
For a short period of time, we might keep the LiteLLM cache as well. This means that highly active users that upgrade versions regularly will have an implicit seamless migration of their caches, because LiteLLM's requests will remain cached and will implicitly transfer to the FanoutCache. (This behavior is not required, and in principle there are other ways to more intentionally allow migration, but explicit migration is almost never worthwhile for caches.)
Cache dumping is fairly simple. It saves the in-memory cache from the current Python session, unless the user has triggered a reset of the in-memory cache. (The current in-memory session draws on saved caches on disk, as usual.)
In short: one single caching interface with two levels of cache, an in-memory LRU cache and a FanoutCache on disk.
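A rough sketch of that shape, assuming `cachetools` and `diskcache`; the class and method names here are illustrative, not necessarily the PR's final API:

```python
import json
import os
from hashlib import sha256
from pathlib import Path

from cachetools import LRUCache
from diskcache import FanoutCache

CACHE_DIR = os.environ.get("DSPY_CACHEDIR") or os.path.join(Path.home(), ".dspy_cache")


class Cache:
    """One interface, two levels: an in-memory LRU cache in front of an on-disk FanoutCache."""

    def __init__(self, directory=CACHE_DIR, mem_size_limit=1_000_000,
                 disk_size_limit=30 * 1024**3):  # 30 GB, per the review note above
        self.memory_cache = LRUCache(maxsize=mem_size_limit)
        self.fanout_cache = FanoutCache(
            shards=16, timeout=2, directory=directory, size_limit=disk_size_limit
        )

    def cache_key(self, request: dict) -> str:
        # Stable key over the request payload; the real implementation may
        # normalize or drop non-serializable fields first.
        return sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def get(self, request: dict):
        try:
            key = self.cache_key(request)
        except Exception:
            return None
        if (value := self.memory_cache.get(key)) is not None:
            return value
        if (value := self.fanout_cache.get(key)) is not None:
            self.memory_cache[key] = value  # promote disk hits into memory
        return value

    def put(self, request: dict, value) -> None:
        try:
            key = self.cache_key(request)
        except Exception:
            return
        self.memory_cache[key] = value
        self.fanout_cache[key] = value

    def reset_memory_cache(self) -> None:
        # Only the in-memory layer is session-local; the on-disk cache persists
        # across sessions, which matches the dump/reset discussion above.
        self.memory_cache.clear()
```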