[BUG] Opaque issue in L2 due to (de)serialization #391

gerwim · 2025-02-12T09:59:35Z

Describe the bug

The L1 cache does not use any serialization, L2 does. Because of abstract types (or interfaces) the values retrieved from L2 are broken.

To Reproduce

Here's a MRE (Minimal Reproducible Example) of the issue:

Clone https://github.com/gerwim/FusionCacheL2Issue
Make sure you have Redis running locally (so we have a distributed cache). If you don't have Redis running, start it through Docker: docker run -p 6379:6379 --rm -it redis
Run the console application twice, it will crash the second run when the data is retrieved from L2.

Expected behavior

I would expect the crash would also happen on line 17. Because there is no serialization in L1, everything is OK. I acknowledge adding serialization in L1 would cause additional overhead, so I'm not sure if that's the way to go.

Versions

I've encountered this issue on:

FusionCache version: 2.1.0
.NET version: 8

Additional context

Adding TypeNameHandling.Objects to the FusionCacheNewtonsoftJsonSerializer settings, fixes this specific case (but not the overall issue).

The text was updated successfully, but these errors were encountered:

jodydonetti · 2025-02-12T16:18:36Z

Hi @gerwim , thanks for the MRE.

This is not a bug, you just need to instruct your choosen serializer (in this case NewtonsoftJson) how you want it to handle deserialization of abstract classes. It's not something that FusionCache can know how to do, and not even the serializer itself can know it, you have to configure it in some way.
Regarding the serializer adapter, there's a way to specify the settings for NewtonsoftJson via this overload of the FusionCacheNewtonsoftJsonSerializer class (the adapter), by specifying anyour own configured instance of JsonSerializerSettings.

Hope this helps, let me know.

PS: one thing I'd like to point out for your convenience is that you don't need a Redis instance to try out stuff like this, you can easily use an instance of MemoryDistributedCache (the abstraction is IDistributedCache anyway, so it wouldn't make a difference regarding a serialization problem.

gerwim · 2025-02-12T16:40:54Z

@jodydonetti I've used Redis as it's a persistent (out of the console app) L2 cache. Because this issue only happens when you restart the application (you need to run it twice, as per "how to reproduce).

But I do still think it's a bug or maybe atleast an issue? Why? Let me try to explain:

Let's say you have two instances (you are horizontally scaled). I'll call these instances 1 and 2.

1 writes an entry (with an abstract type, as per MRE) to the cache and it works pretty well (as the entry is in the L1 cache).
Then, a new request comes in, and it is served from instance 2 (which will cause an exception), because instance 2 doesn't have it in the L1 cache, so it retrieves a value from the L2 cache and can not deserialize the value.

So basically requests to instance 1 will be fine, and requests to instance 2 will throw an exception.

This is due how the serialization is done. So I'm not saying it's FusionCache's fault, I'm just saying that the issue is not directly clear because the L1 cache does not suffer from this problem. In my case, I was implementing FusionCache in our solution and everything worked fine. But when deployed to our staging environment, there were these issues.

jodydonetti · 2025-02-12T18:03:07Z

Hi @gerwim , I'm commuting now, will answer as soon as I can!

gerwim · 2025-02-12T18:16:03Z

@jodydonetti No problem, no stress. Take your time.

I have to thank you for your time and effort into making this great package. Open source maintainers have no obligation to us whatsoever. 😀

jodydonetti · 2025-02-15T12:44:34Z

Hi @gerwim , sorry for the delay, back on this.

So basically requests to instance 1 will be fine, and requests to instance 2 will throw an exception.

I see your point, and I agree that it would be nice to have some sort of "help" in catching these sorts of errors sooner: for example I did something like this with a warning that is written in the logs when you setup a FusionCache instance with a Backplane but without an L2 and the DefaultEntryOptions are set in a potentially incorrect way. I like these kinds of "quality of life" checks.

Having said that, here's a counterpoint: I can list multiple different situations where there can be problems related to L2 or the database or the backplane and so on, and FusionCache cannot do anything about it.

For example:

changing models: if you are working on your models, say a Product class with a string Code property, save on of them in L2, then change the property to int Code, restart/redeploy the app, now deserialization will fail (see here for more)
if you have 2 different nodes with clocks that, for some infrastructure-related reason, are 10s out-of-sync
if you use L2 without a backplane for some reason, and you are in a multi-node scenario, there will be delays to get all the nodes in-sync
and so on

In cases like these FusionCache can't do anything, because it cannot possibly know that you are now using only L1 but in a different environment you'll use L1+L2 or L1+L2+backplane. It also cannot know if the serializer you'll be maybe using in another environment can or cannot handle a certain type (eg: Protobuf-net does natively support DateTime but not DateTimeOffset) or a certain kind of classes (eg: abstract classes) or if you'll do some refactoring in your code and change the type of a property like in the example above, etc.

In my opinion (but let me knowwhat you think) this is what tests are for: for example I'd create a test that uses a MemoryDistributedCache ( so, no need for a Redis instance), create 2 FusionCache instances connected to the same (in-memory) L2, and loop over all the types that you'll handle in your app, create an instance, save it from cache1 and get it from cache2.
This is what I frequently do in FusionCache tests, for example.

This is due how the serialization is done. So I'm not saying it's FusionCache's fault, I'm just saying that the issue is not directly clear because the L1 cache does not suffer from this problem.

I agree about this, and getting back to the idea of helping discovering such issues early, do you have an idea for how to catch such issues? I would gladly do something, if possible at all.

Thanks.

gerwim · 2025-02-15T14:02:53Z

@jodydonetti Thanks Jody.

Yes, you have fair points. There are a lot of cases where FusionCache can't do anything, but I do think this is a case where a small quality of life feature can help.

I think there could be two ways to improve DX, I'd like to know what you think of them. Both options should only be enabled in debug / development mode and not for production. "Production mode" should be as is.

First option: add an option to add serialization support in L1, which would be enabled in debug / development mode only.
Second option: Allow bypass of L1 cache and only use L2 cache (which would be backed by the MemoryDistributedCache), again only in debug / development mode.

In both cases the developer (consumer) of the NuGet package should be able to configure this explicitly. FusionCache should not contain any logic to determine whether it's in debug / development mode.

jodydonetti · 2025-02-15T14:40:43Z

First option: add an option to add serialization support in L1

You already can, by using AutoClone.

Second option: Allow bypass of L1 cache and only use L2 cache (which would be backed by the MemoryDistributedCache)

You already can, by using SkipMemoryCacheRead/SkipMemoryCacheWrite entry options, as always both call-by-call or globally for the entire cache by setting them in the DefaultEntryOptions.

Let me know what you think.

gerwim · 2025-02-15T14:55:38Z

Wow, really? I will respond hopefully Monday, I'll look into this.

Thanks!

jodydonetti closed this as completed Feb 12, 2025

jodydonetti reopened this Feb 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Opaque issue in L2 due to (de)serialization #391

[BUG] Opaque issue in L2 due to (de)serialization #391

gerwim commented Feb 12, 2025 •

edited

Loading

jodydonetti commented Feb 12, 2025

gerwim commented Feb 12, 2025 •

edited

Loading

jodydonetti commented Feb 12, 2025

gerwim commented Feb 12, 2025

jodydonetti commented Feb 15, 2025

gerwim commented Feb 15, 2025

jodydonetti commented Feb 15, 2025

gerwim commented Feb 15, 2025

[BUG] Opaque issue in L2 due to (de)serialization #391

[BUG] Opaque issue in L2 due to (de)serialization #391

Comments

gerwim commented Feb 12, 2025 • edited Loading

Describe the bug

To Reproduce

Expected behavior

Versions

Additional context

jodydonetti commented Feb 12, 2025

gerwim commented Feb 12, 2025 • edited Loading

jodydonetti commented Feb 12, 2025

gerwim commented Feb 12, 2025

jodydonetti commented Feb 15, 2025

gerwim commented Feb 15, 2025

jodydonetti commented Feb 15, 2025

gerwim commented Feb 15, 2025

gerwim commented Feb 12, 2025 •

edited

Loading

gerwim commented Feb 12, 2025 •

edited

Loading