Querying SQLRegistry when `allow_cache=True` hangs when cache expires #4898

aloysius-lim · 2025-01-06T04:59:40Z

Expected Behavior

When SQLRegistry is used with caching enabled and a query is run with allow_cache=True, the registry should refresh the cache (if necessary) and return the results with minimal latency.

Current Behavior

When the cache expires, any method call with allow_cache=True runs indefinitely and has to be terminated with Ctrl+C. The stack trace shows that it always gets stuck on this line:

File ~/mambaforge/envs/nudgerank/lib/python3.12/site-packages/feast/infra/registry/caching_registry.py:433, in CachingRegistry._refresh_cached_registry_if_necessary(self)
    431 def _refresh_cached_registry_if_necessary(self):
    432     if self.cache_mode == "sync":
--> 433         with self._refresh_lock:
    434             if self.cached_registry_proto == RegistryProto():
    435                 # Avoids the need to refresh the registry when cache is not populated yet
    436                 # Specially during the __init__ phase
    437                 # proto() will populate the cache with project metadata if no objects are registered
    438                 expired = False

This also happens when calling refresh(), which means that the registry can never be refreshed, and the cache is not usable at all. Only allow_cache=False works, which adds latency to every query.
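
The traceback stops while acquiring self._refresh_lock. One pattern that produces this symptom is re-acquiring a non-reentrant threading.Lock in a thread that already holds it. Whether that is what happens inside Feast is not confirmed here; the sketch below is only an illustration of that pattern. SketchRegistry and its methods are hypothetical names, not Feast's API, and the inner acquire uses a timeout so the demo returns instead of hanging.

```python
import threading


class SketchRegistry:
    """Hypothetical stand-in for illustration; this is not Feast's CachingRegistry."""

    def __init__(self, lock):
        self._refresh_lock = lock
        self.refreshed = False

    def refresh(self):
        # A timeout is used so the demo reports the blocked acquire
        # instead of hanging the way the real call does.
        if not self._refresh_lock.acquire(timeout=0.2):
            return False
        try:
            self.refreshed = True
            return True
        finally:
            self._refresh_lock.release()

    def refresh_if_necessary(self):
        # Holds the lock, then calls refresh(), which wants the same lock.
        with self._refresh_lock:
            return self.refresh()


# Non-reentrant lock: the inner acquire can never succeed in the same thread.
blocked = SketchRegistry(threading.Lock()).refresh_if_necessary()
# Reentrant lock: the same thread may acquire it again.
reentrant = SketchRegistry(threading.RLock()).refresh_if_necessary()
print(blocked, reentrant)
```

If the hang does come from this pattern, a reentrant lock or a refresh path that does not take the lock twice would avoid it. That is an assumption to verify against caching_registry.py.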

Steps to reproduce

Set up an SQLRegistry and enable caching. Then, run refresh() or any other method with allow_cache=True.

import time

from feast import Entity, FeatureStore, RepoConfig

# Set up Feature Store with SQLRegistry

repo_config = RepoConfig(
    project="my_project",
    registry={
        "registry_type": "sql",
        "path": "<db_url>",
        # Set short TTL for testing.
        "cache_ttl_seconds": 5,
    },
    # ... (remaining RepoConfig fields omitted)
)
feature_store = FeatureStore(config=repo_config)

# Create and register entity
driver = Entity(name="driver", join_keys=["driver_id"])
feature_store.apply(driver)

# Get entity, and use cache. This succeeds if it is run before the TTL.
feature_store.get_entity("driver", True)

# Let cache expire
time.sleep(6)

# This runs forever
feature_store.get_entity("driver", True)

# Each of these also runs forever
feature_store.refresh_registry()
# feature_store.get_*(..., True)  # any other getter called with the cache enabled

Specifications

Version: 0.40.1
Platform: macOS 14.6.1
Subsystem:

Possible Solution

This only seems to happen when the cache is refreshed synchronously. Async cache refresh (cache_mode="thread") does not seem to run into this issue, and the cache is refreshed successfully (e.g. when entity definition is modified and get_entity("driver", True) is run again after the TTL). So the current workaround is to specify cache_mode="thread" in the registry config.
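
As a sketch of that workaround, the registry section from the reproduction above would look roughly like this. Only the cache_mode entry is new; the variable name registry_config is just for illustration, and the dict would be passed as the registry argument of RepoConfig.

```python
# Registry section of the RepoConfig from the reproduction, with the workaround applied.
registry_config = {
    "registry_type": "sql",
    "path": "<db_url>",  # placeholder, as in the reproduction
    # Short TTL for testing.
    "cache_ttl_seconds": 5,
    # Refresh the cache in a background thread instead of synchronously.
    "cache_mode": "thread",
}
```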

However, there are use cases where we want to guarantee that the cache is refreshed after some changes before executing more code, so synchronous refresh is still useful.

aloysius-lim added kind/bug priority/p2 labels Jan 6, 2025
