Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix default cache dataloader raise key error on non-existing key #3569

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Dartt0n
Copy link

@Dartt0n Dartt0n commented Jul 11, 2024

Description

Right now, calling the .clean(key) method on a non-cached key will raise a KeyError exception. There are several ways to solve this problem:

  1. Create a custom CustomDefaultCache class and override the delete method to first check if the key exists.
  2. Check if the key is cached before deleting it each time the .clean method is called.
  3. Cache the key before calling .clean.
  4. Set the default behavior to be exception-free.
    Among these options, the fourth solution seems to be the best, so these changes are proposed.

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Documentation

Issues Fixed or Closed by This PR

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

Summary by Sourcery

This pull request fixes a bug where calling .clean(key) on a non-cached key in the default dataloader would raise a KeyError. The fix ensures that the method no longer throws an exception in such cases. Documentation and tests have been updated accordingly.

  • Bug Fixes:
    • Fixed issue where calling .clean(key) on a non-cached key in the default dataloader would raise a KeyError.
  • Documentation:
    • Added a note in RELEASE.md to document the change in behavior for the .clean(key) method in the default dataloader.
  • Tests:
    • Added a test case to ensure that calling .clean(key) on a non-cached key does not raise an error.

Copy link
Contributor

sourcery-ai bot commented Jul 11, 2024

Reviewer's Guide by Sourcery

This pull request addresses a bug where calling the .clean(key) method on a non-cached key would raise a KeyError. The delete method in strawberry/dataloader.py was modified to use pop with a default value to avoid this error. Corresponding test cases were added to tests/test_dataloaders.py to ensure the new behavior is correctly implemented. Additionally, release notes were added in RELEASE.md to document this patch change.

File-Level Changes

Files Changes
strawberry/dataloader.py
tests/test_dataloaders.py
Updated the delete method in dataloader.py to prevent KeyError and added corresponding test cases in test_dataloaders.py.

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Dartt0n - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Complexity: all looks good
  • 🟡 Documentation: 3 issues found

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

tests/test_dataloaders.py Outdated Show resolved Hide resolved
RELEASE.md Outdated
@@ -0,0 +1,12 @@
Release type: patch

Calling `.clean(key)` on default dataloader with non-existing `key` will not throw `KeyError` error anymore. Example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (documentation): Redundant phrase 'KeyError error'

Consider changing 'KeyError error' to just 'KeyError' to avoid redundancy.

async def load_data(keys):
return [str(key) for key in keys]

dataloader = DataLoader(load_fn=load_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (documentation): Mention import statement for DataLoader

Consider adding a comment or a line to indicate that DataLoader should be imported from the relevant module to avoid confusion.

Suggested change
dataloader = DataLoader(load_fn=load_data)
```suggestion
```python
from some_module import DataLoader
dataloader = DataLoader(load_fn=load_data)

RELEASE.md Outdated
dataloader.clean(42) # does not throw KeyError anymore
```

This is a patch release, so no breaking changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (documentation): Improve sentence flow

Consider rephrasing to 'This is a patch release with no breaking changes.'

Suggested change
This is a patch release, so no breaking changes.
This is a patch release with no breaking changes.

@botberry
Copy link
Member

botberry commented Jul 11, 2024

Thanks for adding the RELEASE.md file!

Here's a preview of the changelog:


Calling .clear(key) on default dataloader with non-existing key will not throw KeyError anymore. Example:

from strawberry.dataloader import DataLoader


async def load_data(keys):
    return [str(key) for key in keys]


dataloader = DataLoader(load_fn=load_data)
dataloader.clear(42)  # does not throw KeyError anymore

This is a patch release with no breaking changes.

Here's the tweet text:

🆕 Release (next) is out! Thanks to Anton Kudryavtsev for the PR 👏

Get it here 👉 https://strawberry.rocks/release/(next)

@botberry
Copy link
Member

Thanks for adding the RELEASE.md file!

Here's a preview of the changelog:


Calling .clean(key) on default dataloader with non-existing key will not throw KeyError error anymore. Example:

async def load_data(keys):
    return [str(key) for key in keys]


dataloader = DataLoader(load_fn=load_data)
dataloader.clean(42)  # does not throw KeyError anymore

This is a patch release, so no breaking changes.

Here's the tweet text:

🆕 Release (next) is out! Thanks to Anton Kudryavtsev for the PR 👏

Get it here 👉 https://strawberry.rocks/release/(next)

Copy link

codecov bot commented Jul 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.02%. Comparing base (bea5cac) to head (6b0a704).
Report is 93 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3569      +/-   ##
==========================================
+ Coverage   96.58%   97.02%   +0.43%     
==========================================
  Files         524      503      -21     
  Lines       33632    33464     -168     
  Branches     5577     5619      +42     
==========================================
- Hits        32485    32468      -17     
+ Misses        912      790     -122     
+ Partials      235      206      -29     

Copy link

codspeed-hq bot commented Jul 11, 2024

CodSpeed Performance Report

Merging #3569 will not alter performance

Comparing Dartt0n:main (6b0a704) with main (56172dc)

Summary

✅ 15 untouched benchmarks

@Dartt0n
Copy link
Author

Dartt0n commented Jul 11, 2024

Hmm, seems like type linting fails to resolve:

strawberry/dataloader.py:87: error: No overload variant of "pop" of "dict" matches argument types "Hashable", "None"  [call-overload]

However, a overload exist:

strawberry/dataloader.py:87: note: Possible overload variants:
strawberry/dataloader.py:87: note:     def pop(self, Hashable, /) -> Future[T]
strawberry/dataloader.py:87: note:     def pop(self, Hashable, Future[T], /) -> Future[T]
strawberry/dataloader.py:87: note:     def [_T] pop(self, Hashable, _T, /) -> Future[T] | _T # this one

The type is def [None] pop(self, Hashable, NoneType, /) -> Future[T] | None

Will change code to

if key in self.cache_map:
    del self.cache_map[key]

to satisfy linter

@Dartt0n
Copy link
Author

Dartt0n commented Jul 11, 2024

Seems thats all check passed, waiting for the review

@patrick91
Copy link
Member

@Dartt0n in what cases would this be useful? 😊

wouldn't an error be a useful indication that something was wrong somewhere else?

@Dartt0n
Copy link
Author

Dartt0n commented Jul 14, 2024

@Dartt0n in what cases would this be useful? 😊

wouldn't an error be a useful indication that something was wrong somewhere else?

For example, some services modify an object, and this service needs to invalidate the cache for the modified object. However, the service does not know whether the object has already been loaded into the cache. This is a common problem for larger applications.

I think most developers who face this issue either use try-catch blocks or create their own wrapper (like original issue suggest). Therefore, I believe that making this behavior non-exceptional would be beneficial for the developer's user experience.

Additionally, as an example, the Redis DEL command does not throw an error when invalidating a non-existent cache entry.
https://redis.io/docs/latest/commands/del/

If you think that this explicit behavior would be more appropriate for the project, I suggest including a clear statement about it in the documentation in the "Cache Invalidation" section.
https://strawberry.rocks/docs/guides/dataloaders#cache-invalidation

@garyd203
Copy link
Contributor

@Dartt0n in what cases would this be useful? 😊

wouldn't an error be a useful indication that something was wrong somewhere else?

Hey @patrick91 , I would find this useful too (I came here to do a similar PR but found this instead), for similar reasons to those described by @Dartt0n .

My reasoning is that cache invalidation is usually about ensuring an object doesn't exist in the cache. Strict remove-or-error semantics don't make sense because the nature of a data-loader is that the cache entries could be added from anywhere, and with arbitrary concurrency.

For example, my current use-case is that an internal helper (set_site_users()) modifies the list of objects attached to a user via foreign key. The list of objects is loaded for resolvers via a data-loader (load_users_at_site()), and hence is cached. After modifying the list of objects in set_site_users(), we should invalidate the cache for the load_users_at_site() data-loader for that key - but depending which mutation calls that helper (and the input to that mutation), the data-loader may or may not have been called already, and hence .clear() may succeed or may error. There's no benefit in having the set_site_users() helper track whether the load_users_at_site() data-loader has been called and then handle the error specially (because both scenarios are legitimate), we just want to unilaterally clear that key from the data-loader cache.

Copy link
Contributor

@garyd203 garyd203 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this.

I believe you made a typo, the function is called .clear() rather than .clean()

RELEASE.md Outdated Show resolved Hide resolved
RELEASE.md Outdated Show resolved Hide resolved
Comment on lines 304 to 307
try:
loader.clean(1) # no effect on non-cached values
except Exception as e:
raise AssertionError("clean non-cached values does not raise KeyError")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what your intention is here. Normally a test will implement desired behaviour, and if it works without errors then (that part of) the test passes. So we could simplify to this ...

Suggested change
try:
loader.clean(1) # no effect on non-cached values
except Exception as e:
raise AssertionError("clean non-cached values does not raise KeyError")
loader.clear(1) # no effect on non-cached values

...which is similar to the other calls below to loader.clear() / loader.clear_many() / etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also add this to the test_clear() test function above, so that we can exercise the .clear() functionality with an underlying cache

See https://github.com/strawberry-graphql/strawberry/pull/3569/files#diff-9e2920b06c9fb1a3a76b9f675328e820421756fa44783ebdd45ca2c7a047c8a7R266-R279

Dartt0n and others added 2 commits October 21, 2024 10:31
Co-authored-by: Gary Donovan <[email protected]>
Co-authored-by: Gary Donovan <[email protected]>
…chemap deletion behaviour

These tests call `.clear()` on a non-existent cache key.
Previously, this would have resulted in an exception being thrown.
The new behaviour should be to throw no exceptions.
@Dartt0n
Copy link
Author

Dartt0n commented Oct 21, 2024

Hi @garyd203! Thank you for bringing this up! The pull request remained incomplete because I eventually implemented my class, which inherits from AbstractCacheMap and used it for the project (failed to implement Redis cache due to other issues related to strawberry dataloaders, so ended up storing everything in app memory :( ). I have updated the code based on your comments, if you have free time, please review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants