Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Fast enum cache updates #1094

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

feat: Fast enum cache updates #1094

wants to merge 7 commits into from

Conversation

tushar-composio
Copy link
Contributor

@tushar-composio tushar-composio commented Dec 27, 2024

Important

Introduces fast enum cache updates by adding list_enums() methods, refactoring fetch_and_cache(), and enhancing cache refresh logic.

  • Behavior:
    • Adds list_enums() method to Apps and Actions classes in collections.py to fetch enum names from the server.
    • Refactors fetch_and_cache() in action.py to use create_action() for creating ActionData.
    • Updates check_cache_refresh() in utils.py to handle cache updates more efficiently, including checking for replaced_by field and refreshing cache if necessary.
  • Functions:
    • Adds create_action() in base.py to encapsulate ActionData creation logic.
    • Modifies get() method in Actions class to use t.Collection instead of t.Sequence for parameters.
  • Misc:
    • Removes redundant code in enum.py related to local cache fetching.
    • Adds logging for cache update operations in utils.py.

This description was created by Ellipsis for 1bb796b. It will automatically update as commits are pushed.

Copy link

vercel bot commented Dec 27, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
composio ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 13, 2025 11:07am

logger.debug("Stale actions: %s", actions_to_delete)

for action_name in actions_to_delete:
(enums.base.ACTIONS_CACHE / action_name).unlink()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling for the case where the file unlink operation fails. This could happen due to permission issues or if the file is locked by another process.

(enums.base.ACTIONS_CACHE / action_name).unlink()

if actions_to_update:
# TODO: handle page size limit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment about page size limit is important. Consider adding a maximum batch size for actions_to_update and implementing pagination to avoid potential memory issues with large sets of actions.

@@ -1392,6 +1392,14 @@ def search_for_a_task(
for task in response.json().get("items", [])
]

def list_enums(self) -> list[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a docstring to the list_enums method explaining its purpose, return type, and any potential exceptions it might raise.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 6c30c56 in 1 minute and 41 seconds

More details
  • Looked at 251 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. python/composio/client/collections.py:1084
  • Draft comment:
    The change from t.Sequence to t.Collection for the actions, apps, and tags parameters is appropriate as it allows for more flexible input types. t.Collection is a more general type than t.Sequence.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The change from t.Sequence to t.Collection is appropriate for the get method parameters since t.Collection is more general and includes t.Sequence. This change allows for more flexibility in the types of collections that can be passed to the method.
2. python/composio/client/collections.py:1395
  • Draft comment:
    The list_enums method is a good addition for fetching a list of enums from the server, ensuring up-to-date data is retrieved.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The list_enums method in Actions class is a new addition that fetches a list of enums from the server. This method is used in the Enum.iter method to replace the local cache reading logic, which is a good change for fetching up-to-date data.
3. python/composio/client/enums/base.py:127
  • Draft comment:
    The create_action function centralizes the creation of ActionData objects, reducing code duplication and improving maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The create_action function centralizes the creation of ActionData objects, which is a good refactoring step. It reduces code duplication and makes the codebase easier to maintain.
4. python/composio/client/enums/enum.py:93
  • Draft comment:
    The iter method now uses list_enums from the Actions class to fetch enum names from the server, ensuring up-to-date information.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The iter method in the Enum class has been updated to use the list_enums method from the Actions class. This change ensures that the enum names are fetched from the server, providing more up-to-date information compared to relying solely on local cache.
5. python/composio/client/utils.py:215
  • Draft comment:
    The check_cache_refresh function now ensures local cache consistency with server data by updating or deleting actions as necessary, which is a good practice for data integrity.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The check_cache_refresh function now includes logic to compare local and API actions, updating or deleting as necessary. This ensures the local cache is consistent with the server data, which is a good practice for maintaining data integrity.

Workflow ID: wflow_XZhCV0jv5aZDVc00


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@shreysingla11
Copy link
Collaborator

Code Review Summary

Overall, this PR introduces good improvements to the enum caching system with selective updates and better performance monitoring. Here's a breakdown:

Strengths:

✅ Improved caching mechanism with selective updates
✅ Good use of type hints and better parameter types
✅ Performance monitoring with timing measurements
✅ Clean code organization with centralized action creation logic

Areas for Improvement:

  1. Error Handling:

    • Add error handling for file operations
    • Consider handling API failures during cache updates
  2. Documentation:

    • Add docstrings to new methods
    • Document potential exceptions
  3. Performance:

    • Implement pagination for large action sets
    • Consider adding cache invalidation strategy

The changes look solid and improve the system's efficiency. Once the suggested improvements are addressed, this PR will be ready for merge.

Comment on lines 72 to 78
if "appName" not in response:
return None

replaced_by = replacement_action_name(
response["description"], response["appName"]
)
return ActionData( # type: ignore
name=response["name"],
app=response["appName"],
tags=response["tags"],
no_auth=(
client.http.get(url=str(client.apps.endpoint / response["appName"]))
.json()
.get("no_auth", False)
),
is_local=False,
is_runtime=False,
shell=False,
path=self.storage_path,
replaced_by=replaced_by,
)
return create_action(client, response, self.storage_path)

@property
def name(self) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential Issue:

Ensure create_action Functionality Matches Original Logic
The refactoring introduces a call to create_action, replacing a block of code that constructs an ActionData object. This change is high-risk as it may introduce logical errors if create_action does not replicate the original behavior, especially concerning the no_auth and replaced_by fields.

Actionable Steps:

  • Review create_action Implementation: Ensure it correctly handles the no_auth and replaced_by fields as the original code did.
  • Test Edge Cases: Verify that all edge cases covered by the original logic are addressed in the new function.
  • Documentation: Update any relevant documentation to reflect changes in logic or function usage.

This will help maintain the integrity of the application and prevent potential data inconsistencies. 🛠️


Comment on lines +1404 to +1411
def list_enums(self) -> list[str]:
"""Get just the action names on the server"""
response = self._raise_if_required(
response=self.client.http.get(
str(self.endpoint / "list" / "enums"),
)
)
return response.text.split("\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above. response.text.split is not robust and should be replaced with json.loads if the server returns a JSON array.

📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def list_enums(self) -> list[str]:
"""Get just the action names on the server"""
response = self._raise_if_required(
response=self.client.http.get(
str(self.endpoint / "list" / "enums"),
)
)
return response.text.split("\n")
def list_enums(self) -> list[str]:
"""Get just the action names on the server"""
response = self._raise_if_required(
response=self.client.http.get(
str(self.endpoint / "list" / "enums"),
)
)
return json.loads(response.text)

Comment on lines +268 to +298
api_apps = client.apps.list_enums()
breakpoint()
apps_to_update = set(api_apps) - set(local_apps)
apps_to_delete = set(local_apps) - set(api_apps)
logger.debug("Apps to fetch: %s", apps_to_update)
logger.debug("Stale apps: %s", apps_to_delete)

# for app_name in apps_to_delete:
# (enums.base.APPS_CACHE / app_name).unlink()

# if apps_to_update:
# apps_data = client.http.get(
# str(client.apps.endpoint(queries={"apps": ",".join(apps_to_update)}))
# ).json()
# for app_data in apps_data["items"]:
# storage_path = enums.base.APPS_CACHE / app_data["name"]
# AppData(name=app_data["name"], path=storage_path, is_local=False).store()

# if actions_to_update:
# actions_data = client.http.get(
# str(
# client.actions.endpoint(
# queries={"actions": ",".join(actions_to_update)}
# )
# )
# ).json()
# for action_data in actions_data["items"]:
# storage_path = enums.base.ACTIONS_CACHE / action_data["name"]
# create_action(
# client, response=action_data, storage_path=storage_path
# ).store()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_cache_refresh function contains commented-out code blocks and a breakpoint. These should be removed before merging.

📝 Committable Code Suggestion

‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
api_apps = client.apps.list_enums()
breakpoint()
apps_to_update = set(api_apps) - set(local_apps)
apps_to_delete = set(local_apps) - set(api_apps)
logger.debug("Apps to fetch: %s", apps_to_update)
logger.debug("Stale apps: %s", apps_to_delete)
# for app_name in apps_to_delete:
# (enums.base.APPS_CACHE / app_name).unlink()
# if apps_to_update:
# apps_data = client.http.get(
# str(client.apps.endpoint(queries={"apps": ",".join(apps_to_update)}))
# ).json()
# for app_data in apps_data["items"]:
# storage_path = enums.base.APPS_CACHE / app_data["name"]
# AppData(name=app_data["name"], path=storage_path, is_local=False).store()
# if actions_to_update:
# actions_data = client.http.get(
# str(
# client.actions.endpoint(
# queries={"actions": ",".join(actions_to_update)}
# )
# )
# ).json()
# for action_data in actions_data["items"]:
# storage_path = enums.base.ACTIONS_CACHE / action_data["name"]
# create_action(
# client, response=action_data, storage_path=storage_path
# ).store()
api_apps = client.apps.list_enums()
apps_to_update = set(api_apps) - set(local_apps)
apps_to_delete = set(local_apps) - set(api_apps)
logger.debug("Apps to fetch: %s", apps_to_update)
logger.debug("Stale apps: %s", apps_to_delete)
for app_name in apps_to_delete:
(enums.base.APPS_CACHE / app_name).unlink()
if apps_to_update:
apps_data = client.http.get

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 1bb796b in 1 minute and 5 seconds

More details
  • Looked at 122 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. python/composio/client/collections.py:357
  • Draft comment:
    The list_enums method is duplicated in collections.py and enums/base.py. Consider refactoring to avoid code duplication.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment points out potential code duplication, but I can only see the new code in collections.py, not the supposedly duplicated code in enums/base.py. Without seeing both files, I cannot verify if there is actually problematic duplication. The two list_enums() methods in collections.py serve different purposes - one gets app names and one gets action names.
    I could be wrong about the two list_enums() methods serving different purposes - they have very similar implementations. Maybe the duplication is actually between these two methods within the same file?
    While the implementations are similar, they hit different endpoints (/apps/list/enums vs /actions/list/enums) and return different data (app names vs action names). This appears to be an appropriate separation of concerns between the Apps and Actions classes.
    The comment should be deleted because 1) I cannot verify the claimed duplication with enums/base.py since it's not in the diff, and 2) the two list_enums() methods in collections.py appropriately serve different purposes despite similar implementations.
2. python/composio/client/collections.py:1402
  • Draft comment:
    The list_enums method is duplicated in collections.py and enums/base.py. Consider refactoring to avoid code duplication.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_5Gx7bUxnl1XQsapL


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

local_apps = list(path.stem for path in enums.base.APPS_CACHE.iterdir())

api_apps = client.apps.list_enums()
breakpoint()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the breakpoint() call. It seems to be left over from debugging and is not suitable for production code.

logger.debug("Apps to fetch: %s", apps_to_update)
logger.debug("Stale apps: %s", apps_to_delete)

# for app_name in apps_to_delete:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider removing or uncommenting the code for handling apps_to_delete and apps_to_update. Leaving it commented can lead to confusion and clutter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants