memcache write errors in the LMS ("object too large for cache") #877
@UsamaSadiq @iamsobanjaved please consider this a discovery ticket to try and find a root cause for this, instead of simply bumping up the max threshold. Thanks!

@dianakhuang can you confirm that Arbi-BOM can fit this work into their current schedule? Thanks!
During my work to replace the deprecated python-memcache library with pymemcache, I identified this issue related to cache size limitations. It was discussed in this Slack thread (link), and I also created an SRE ticket to address the cache size limitation (DOS-3846). However, after consulting with Robert, it was determined that increasing the cache size would have a significant impact: it would require restarting the memcache server, which would result in the loss of active user sessions. As a result, this approach was not pursued.

The root cause of the issue is straightforward: we are attempting to save data to the cache that exceeds the predefined item size limit, which is set to 2MB in the production environment. As demonstrated in the attached screenshots, data is successfully cached when it falls within this limit.

Previously, when using python-memcache, such errors were not encountered because the library silently handled this scenario without raising exceptions. This behavior masked the issue, whereas pymemcache explicitly raises an error when the data exceeds the cache size limit, making the problem more apparent.
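As a rough illustration of the behavior difference described above, here is a minimal sketch of a size guard that restores python-memcache's "fail silently" semantics while making the skipped writes visible in logs. The helper name and the guard itself are hypothetical (not from the codebase); the 2MB limit is the production value mentioned above, and actual serialized size on the wire may differ from this estimate.

```python
# Hypothetical sketch: guard oversized cache writes instead of letting
# pymemcache raise MemcacheServerError. python-memcache silently dropped
# such writes; this makes the drop explicit and logged.
import logging
import pickle

log = logging.getLogger(__name__)

# Production memcached in this deployment rejects items over 2MB.
MAX_ITEM_SIZE = 2 * 1024 * 1024


def safe_cache_set(cache, key, value, timeout=None):
    """Skip (and log) writes that memcached would reject as too large.

    Returns True if the value was handed to the cache, False if skipped.
    Size is estimated by pickling, which approximates what the Django
    memcached backend serializes.
    """
    size = len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
    if size > MAX_ITEM_SIZE:
        log.warning(
            "Skipping cache set for %r: %d bytes exceeds %d-byte limit",
            key, size, MAX_ITEM_SIZE,
        )
        return False
    cache.set(key, value, timeout)
    return True
```

Whether silently skipping is the right behavior is exactly what this discovery ticket should answer; the sketch only shows where such a decision would live.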
Additional thoughts:
The vast majority of a certain class of memcache calls to set a key are failing with the error "object too large for cache". These can be identified with

`@error.message:"b'object too large for cache'" error.type:pymemcache.exceptions.MemcacheServerError`

in a Datadog query.

Notes:

- The failing spans are `operation_name:memcached.command resource_name:set`. These come from the memcache library integration.
- These failing writes do not propagate their error upwards, which is for the best but does mean that querying is a little complicated; to get more information about what memcache operation was attempted, you'll need to look at their parent spans, which are `operation_name:django.cache`. You'll need to do an `a => b` trace search.
- The parent spans' resource names take the form `django.core.cache.backends.memcached.OPERATION KEY_PREFIX` (note the space). There are three key prefixes in effect: `default`, `course_structure`, and (uncommonly) `general`.
- Most of the failing writes are `set` on `course_structure`; a few of the errors come from `default`. Here's a status breakdown for those resources: `course_structure` sets are largely failing, while almost all `default` sets are succeeding. They are of roughly equal number.