Querying SQLRegistry when allow_cache=True hangs when cache expires #4898

Open
aloysius-lim opened this issue Jan 6, 2025 · 0 comments

Expected Behavior

When SQLRegistry is used with caching enabled and a query is run with allow_cache=True, the registry should refresh the cache if necessary and return the results with minimal latency.

Current Behavior

Once the cache expires, any method call with allow_cache=True hangs indefinitely and has to be terminated with Ctrl+C. The stack trace shows that it always gets stuck on the line that acquires self._refresh_lock:

File ~/mambaforge/envs/nudgerank/lib/python3.12/site-packages/feast/infra/registry/caching_registry.py:433, in CachingRegistry._refresh_cached_registry_if_necessary(self)
    431 def _refresh_cached_registry_if_necessary(self):
    432     if self.cache_mode == "sync":
--> 433         with self._refresh_lock:
    434             if self.cached_registry_proto == RegistryProto():
    435                 # Avoids the need to refresh the registry when cache is not populated yet
    436                 # Specially during the __init__ phase
    437                 # proto() will populate the cache with project metadata if no objects are registered
    438                 expired = False

The same hang occurs when calling refresh(), so the registry can never be refreshed and the cache is effectively unusable. Only calls with allow_cache=False work, which adds latency to every query.
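The report does not establish why the lock is never acquired. One pattern that produces this exact symptom in a single thread is re-acquiring a non-reentrant `threading.Lock` that the same thread already holds. The sketch below illustrates only that pattern; `SketchRegistry` and its methods are hypothetical stand-ins, and whether Feast 0.40.1 actually follows this call path is an assumption that would need to be checked against the source.

```python
import threading


class SketchRegistry:
    """Hypothetical stand-in for illustration; this is NOT Feast's CachingRegistry."""

    def __init__(self, lock):
        self._refresh_lock = lock
        self.refresh_count = 0

    def refresh(self, timeout=0.2):
        # A timeout is used so the demo reports the deadlock instead of hanging.
        if not self._refresh_lock.acquire(timeout=timeout):
            return False  # the lock could not be acquired
        try:
            self.refresh_count += 1
            return True
        finally:
            self._refresh_lock.release()

    def refresh_if_necessary(self):
        with self._refresh_lock:
            # refresh() is called while this thread already holds the lock.
            return self.refresh()


# A plain Lock cannot be re-acquired by its holder, so the nested acquire times out.
print(SketchRegistry(threading.Lock()).refresh_if_necessary())   # False
# An RLock allows the owning thread to acquire it again.
print(SketchRegistry(threading.RLock()).refresh_if_necessary())  # True
```

Without the timeout, the first call would block forever, which would match the observed behaviour if the real code has a similar nested acquisition.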

Steps to reproduce

Set up an SQLRegistry and enable caching. Then, run refresh() or any other method with allow_cache=True.

import time

from feast import Entity, FeatureStore, RepoConfig

# Set up Feature Store with SQLRegistry

repo_config = RepoConfig(
    project="my_project",
    registry={
        "registry_type": "sql",
        "path": "<db_url>",
        # Set short TTL for testing.
        "cache_ttl_seconds": 5,
    },
    ...
)
feature_store = FeatureStore(config=repo_config)

# Create and register entity
driver = Entity(name="driver", join_keys=["driver_id"])
feature_store.apply(driver)

# Get the entity with the registry cache allowed (second argument).
# This succeeds if it runs before the TTL elapses.
feature_store.get_entity("driver", True)

# Let cache expire
time.sleep(6)

# This runs forever
feature_store.get_entity("driver", True)

# Any of these also run forever
feature_store.refresh_registry()
# feature_store.get_*(..., True)  # i.e. any other getter called with the cache allowed
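To observe the hang without having to press Ctrl+C, the blocking call can be run in a daemon thread and joined with a timeout. This is a minimal sketch using only the standard library; `call_with_timeout` is a hypothetical helper, not part of Feast.

```python
import threading


def call_with_timeout(fn, timeout_seconds, *args, **kwargs):
    """Run fn in a daemon thread and return (finished, result).

    If fn does not return within timeout_seconds, finished is False and the
    thread is left running in the background (it is a daemon, so it does not
    keep the interpreter alive).
    """
    outcome = {}

    def runner():
        try:
            outcome["result"] = fn(*args, **kwargs)
        except Exception as exc:  # surfaced to the caller below
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_seconds)

    if "error" in outcome:
        raise outcome["error"]
    return (not worker.is_alive()), outcome.get("result")


# Usage with the reproduction above (assumes `feature_store` is already set up):
# finished, entity = call_with_timeout(feature_store.get_entity, 10, "driver", True)
# if not finished:
#     print("get_entity did not return within 10 seconds")
```

A daemon thread is used instead of a thread pool so that a call that never returns does not block interpreter shutdown.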

Specifications

  • Version: 0.40.1
  • Platform: macOS 14.6.1
  • Subsystem:

Possible Solution

This only seems to happen when the cache is refreshed synchronously. Async cache refresh (cache_mode="thread") does not seem to run into this issue, and the cache is refreshed successfully (e.g. when entity definition is modified and get_entity("driver", True) is run again after the TTL). So the current workaround is to specify cache_mode="thread" in the registry config.
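As a sketch of that workaround, the registry section from the reproduction would gain one extra key. The `cache_mode` key and its `"thread"` value are taken from the description above; the remaining values are the placeholders from the reproduction.

```python
# Registry section of the RepoConfig with the workaround applied.
registry_config = {
    "registry_type": "sql",
    "path": "<db_url>",  # placeholder, as in the reproduction
    "cache_ttl_seconds": 5,
    # Refresh the cache in a background thread instead of synchronously.
    "cache_mode": "thread",
}
```

This dictionary would be passed as `registry=registry_config` when constructing `RepoConfig`, in place of the inline dictionary used in the reproduction.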

However, there are use cases where we want to guarantee that the cache is refreshed after some changes before executing more code, so synchronous refresh is still useful.
