Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Simpfilied cache #4

Open
marcj opened this issue May 13, 2016 · 114 comments
Open

[RFC] Simpfilied cache #4

marcj opened this issue May 13, 2016 · 114 comments
Labels

Comments

@marcj
Copy link
Member

marcj commented May 13, 2016

Initial RFC idea: php-cds/php-cds#7

All discussions about a "advanced" version please here: [RFC] Advanced cache

@CarwynNelson
Copy link

Back on the discussion of a simple Caching interface, what does everybody think should be covered in such an interface?

I'm thinking it could be as simple as functions for get, set and delete?

@marcj
Copy link
Member Author

marcj commented May 13, 2016

It think we should work on a simple cache interface that covers > 95% use cases.

We could work on additional more advanced interface as new recommendation on top of that simple cache interface later.

CDR-simple-cache = we discuss now
CDR-advanced-cache = we discuss later (which will be much more work I guess)

Although I'm really not sure about advanced as its too abstract (could contain trees, invalidations, bulks or just has method), but we should discuss that when simple-cache is done :)

@CarwynNelson
Copy link

@marcj Agreed. Does anybody have any objections to there being just get, set and delete methods? Are there any flaws to this that might not be immediately apparent?

@Isinlor
Copy link

Isinlor commented May 13, 2016

@CarwynNelson All the flaws were already stated. Race conditions and how to recognize a miss. I don't think that there is consensus on this methods.

@CarwynNelson
Copy link

@Isinlor I thought it was agreed / discussed in depth that dealing with race conditions would be down to the implementation, and not the interface to solve?

As to how to recognise a miss, I don't think I fully understand that discussion, so I think I'll leave that to people who know what they are talking about :)

@Isinlor
Copy link

Isinlor commented May 13, 2016

@CarwynNelson Interface that you propose is preventing you from dealing with race conditions. This interface makes it impossible, so it is not implementation detail.

@marcj
Copy link
Member Author

marcj commented May 13, 2016

@CarwynNelson: All the flaws were already stated. Race conditions and how to recognize a miss.

Race conditions should be up to the implementation. If the implementation wants to provide transaction secureness they can add additional methods/subclass the simple cache interface.

Recognizing a miss is possible with get('foo') === null.

@samdark
Copy link

samdark commented May 13, 2016

I think php-cds/php-cds#7 (comment) is a good solution. Even less methods so it's perfectly simple and with handy syntax. Also implementing libraries are able to deal with all the issues w/o changing interface.

@CarwynNelson
Copy link

@Isinlor You seem to know much more about caching than I do, so I'll trust that you are correct. Could you explain what about simple get, set and delete in the interface does not allow the implementation to deal with race conditions?

@Isinlor
Copy link

Isinlor commented May 13, 2016

@CarwynNelson After a delete consumer of the interface have no way of knowing if the value is not already being computed by other process. This requires locking. You could delay get in all places that are asking for the data and let one process handle computation inside get, but then get method needs to know how to make the computations and this leads to php-cds/php-cds#7 (comment)

@marcj
Copy link
Member Author

marcj commented May 13, 2016

@Isinlor: How do we reach consensus on which solution to progress?

If someone feels the need to create another discussion issue regarding a different "simple cache" idea or a advanced cache anyone is free to create those. People will reply and you'll see if it was a good idea :P
I'd rather want to introduce as less bureaucracy as possible, so it becomes a self-adjusting process.

However my 2 cents: I think we are still in the middle of a discussion, so I think making votes for a particular interface doesn't make sense yet.

@Isinlor
Copy link

Isinlor commented May 13, 2016

@marcj At some point this process will need to reach conclusion or we will end up with two or more interfaces trying to achieve the same. Taking all propositions and allowing to vote on them in one place without noise of discussion seems like the best idea in my opinion and will lead fastest to the one interface being obviously preferred. But I don't mind to wait to see if it can be done just by discussion. It's an interesting social experiment :) .

@marcj
Copy link
Member Author

marcj commented May 13, 2016

You are probably right. :) Maybe like here: php-cds/php-cds#4 But honestly, I'm not sure how to organise it "the perfect way". I believe in the community, that someone has an outstanding idea :P

@CarwynNelson
Copy link

CarwynNelson commented May 13, 2016

@marcj @Isinlor Are we able to reach any conclusions at this point in the discussion? Many people seem to like the idea proposed by @Isinlor, so possibly we could focus discussion on something similar to that?

Apologies if I seem to have missed something from before, just trying to focus the discussion on maybe getting closer to an interface proposal.

@geggleto
Copy link

@marcj I think it's important to represent all view-points as I tried to do in that other thread and simply let the community decide what they want. I can try to throw something together today to generate more feedback....

My initial thought's are an issue-thread discussing each implementation. The votes are tallied in 1 master voting thread on a comment that is linked to each issue-thread.

@syrm
Copy link

syrm commented May 13, 2016

@marcj Recognizing a miss is possible with get('foo') === null.

Only if you consider never put null in cache.
It can be useful to mark an object not found in database.

So a CacheItem is a possible solution php-cds/issues/7#issuecomment-218978013

@marcj
Copy link
Member Author

marcj commented May 13, 2016

@OXman, I disagree. null is no value and should not be used to indicate something in a cache as not existent on user site. Store a false instead.

I my opinion: As convention this interface should define that using null as value for set() should only be used to tell all consumer this key does not exist and need re-computing. It should not be used to have any other state. This makes it possible to have a very simple straightforward API without the need for additional objects.

@syrm
Copy link

syrm commented May 13, 2016

Null alone is no value, but null with key ?
User#23 = null.
Means object User with id 23 don't exist.

@CarwynNelson
Copy link

I'm gathering that from @marcj's comment in your situation @OXman User#23 would become false. I think this is perfectly acceptable.

Generally if you are retrieving a result from a database and a record does not exist then the database layer usually returns false, so whilst it's not entirely relevant it also seems to match up from that perspective.

@syrm
Copy link

syrm commented May 13, 2016

Ok, and now i want put in cache an heavy computation, which announce if i need to process data or not.
NeedHeavyComputation = false

CacheItem is very simple, it's a ValueObject.

@CarwynNelson
Copy link

@OXman Whilst I acknowledge and agree that a ValueObject might be a better solution in general, for most use cases (which is what It seems a SimpleCacheInterface is trying to solve) returning null seems perfectly fine.

Maybe a more complex solution would return a value object (PSR-6?), but from reading others comments in the previous thread it seems that null is perfectly fine for 90 - 95% of use cases.

Thoughts?

@marcj
Copy link
Member Author

marcj commented May 13, 2016

If you want to have a state for user-23 is computed as not-existent then you should use not null, but another value like false, empty array, UserNotFoundException or something. Having null in a state machine combined with a cache system to indicate some state is im my opinion a terrible idea. Not only because it destroys our idea having null for get() as a way to tell miss|needs re-computation but also because I'm a believer in using as less null as possible. null is really hard to reason about and definitely needs always clarification, which I always want to avoid. Also null in the nature of data structures is almost always a way to tell does not exist, which is exactly what we need for that cache interface.

Ok, and now i want put in cache an heavy computation, which announce if i need to process data or not.
NeedHeavyComputation = false

What do you mean? I don't understand it.

@syrm
Copy link

syrm commented May 13, 2016

In my opinion using CacheItem is still very simple and solve the null/false problem very easily.

@CarwynNelson
Copy link

CarwynNelson commented May 13, 2016

I guess to flip the discussion on its head a little bit, are there any direct criticisms over having a CacheItem interface? Best to look at it from both sides.

@samdark
Copy link

samdark commented May 13, 2016

@CarwynNelson bit of extra complexity (extra class), extra method call to access actual value, a bit more memory and computation time to have an object instance. I think that's it.

@marcj
Copy link
Member Author

marcj commented May 13, 2016

What you basically propose with a CacheItem is to change from

$cache->set('user-42', null);
$miss = null === $cache->get('user-42');//true

$cache->set('user-42', new User());
$userExists = $cache->get('user-42') instanceof User;

to

$cache->set('user-42', null);
$miss = $cache->has('user-42');//false

$cache->set('user-42', new User());
if ($cache->has('user-42')) {
    //here a race condition can happen, and it could lead to unexpected behavior when `get` returns null or throw an exception because it does not exist.
    $cacheItem = $cache->get('user-42');
    $userExists = $cacheItem->value() instanceof User;
}


//alternative 
$cacheItem = $cache->get('user-42');
if ($cacheItem->exists()) {
    $userExists = $cacheItem->value() instanceof User;
}

Which makes

  1. the API more verbose and thus more complicated
  2. race conditions possible
  3. it necessary to have has
  4. a tiny bit slower

However, without that CacheItem you can still indicate user-42 as computed as not-existent in this way:

$cache->set('user-42', false); //recommended
$cache->set('user-42', new NotFound());
$cache->set('user-42', '');
$cache->set('user-42', 0);

As a result CacheItem makes anything more complicated without even solve a problem, as you could solve the initial issue also with just using a different value than null.


Also the whole idea of caching and its nature is to avoid re-computation on things that have already been computed. So using it as state machine is not the way cache was intended.

To make this whole idea of caching more present I actually very like the idea from php-cds/php-cds#7 (comment)

public function cache($key, callable $callable, $ttl = null);

Which makes it pretty clear:

$user23 = $cache->cache('user-23', function() {
     return UserQuery::create()->findById(23);
}))

Of course $callable needs a way to tell the cache system nothing to compute, let your cache empty.
This is what null means. Thus: If the $callable returns null should not mean user-23 has the value null but this cache is null, recompute next time. If you want to have a state of user-23 being computed as not-existent then you should introduce either a new value so you can use the cache system as state machine or you use a separat state machine system and use cache is it was intended: to avoid re-computation over and over again :)

@Isinlor
Copy link

Isinlor commented May 13, 2016

@marcj

Of course $callable needs a way to say the cache system nothing to compute, let your cache empty.

Could you elaborate on that? Why callable needs to say to the cache system nothing to compute? Callable needs to compute and return result of computation. Therefore IMHO it's the other way around. Callable should not know that it will be cached. The consumer of the cache interface should know that result of the computation will or will not be cached as he is in control of that anyway.

@marcj
Copy link
Member Author

marcj commented May 13, 2016

@Isinlor because it makes the whole code in user land much more simple and readable. Having to detect whether $callable returns something valid which is worth to cache or not makes it more complex.

Example with user-42 from above.

When $callable is aware that it can trigger a caching or not:

$user23 = $cache->cache('user-23', function() {
     return UserQuery::create()->findById(23); //assume it returns null when not in the database
}));

When $callable doesn't know anything from the cache interface:

$user23 = $cache->cache('user-23', function() {
     return UserQuery::create()->findById(23);
}));

//we don't want to cache users that do not exist, because we do not clear those caches
//when they become available somewhere. So we need to remove that cache again.
if (null === $user23) {
    $cache->delete('user-23');
}

//alternatively you need to build `user-23` when they become available, which you need to keep in mind.

IMHO more complex and not necessary to force anyone to do that.

@marcj
Copy link
Member Author

marcj commented May 13, 2016

Actually, as I stated above, I think that when a computation method returns null then it means haven't computed anything, so if this is telling the cache system nothing to cache then it's fine for me. If the cache system does not interpret the result then its up to the code outside whether it should be cached or not, which makes the whole cache method with callable a bit useless as you don't save lines of code :)

@danielgsims
Copy link

@OddGreg I agree that the concerns about race conditions with a probing method can be taken care of in the implementation. I would rather have the discussion be around whether or not the function has merit.

If we wanted to look at apc, memache and redis as examples again, then apc and redis have an exists function, but memcache only has a getter.

@tedivm
Copy link

tedivm commented May 15, 2016

They can't be taken care of in the implementation (at least not one that doesn't introduce a host of complexities), and I would like to see someone demonstrate a counter example if they really believe it is possible. This is an API level "bug" that requires and API level solution.

That is another good point though- if memcache doesn't support it then what is the expected behavior of a memcache implementation of this interface?

@OddGreg
Copy link

OddGreg commented May 15, 2016

@tedivm

Are you asking this question of me? I assume not. And though I have been scanning through the various php memcache implementations, the extension is written in C. Perhaps someone could point me to the php source you are using for this discussion, or that you consider significant. Sorry.

I'll leave further comment until I can see the source code -- but I doubt very much that anyone involved in memcache (I may be wrong) cares at all about what we are doing. Not that that is the point.

PS: There is a lot of code to wade through (in the extension), so it may take some time.

@tedivm
Copy link

tedivm commented May 16, 2016

I was replying to @danielgsims

@OddGreg
Copy link

OddGreg commented May 16, 2016

@tedivm

Excellent. Carry on.

@CarwynNelson
Copy link

@peter-mw This seems like a bit of an edge case. Again I'm not saying that a cache library shouldn't implement this method, but I also don't think it should be a part of a standard / recommended interface.

@peter-mw
Copy link

@CarwynNelson , I dont think my case is an edge case. Generating a $key is more common that you may think. See for example 1, 2, 3, 4

@CarwynNelson
Copy link

@peter-mw Sorry, I should've been more specific. I think flushing the whole entire cache from the application code is a bit of an edge case.

@geggleto geggleto added the [RFC] label May 16, 2016
@Isinlor
Copy link

Isinlor commented May 16, 2016

@peter-mw We can always address your specific case later on with:

FlushCacheInterface {
    public function flush();
}

And you can type-hint on that.

IMHO has()is the same thing - we can add it later on.

@peter-mw
Copy link

@Isinlor @CarwynNelson IMHO many systems use some kind of automatic generation keys and it will be very hard for them to keep track of all cache keys and delete the keys one by one when they want fresh data. Thus the flush function is absolutely necessary. I don't see why we need to make separate interface for just 1 method.

Also the famous libs/frameworks have flush function and this function is there because its really needed by people like me. See Doctrine, Laravel, Stash and others...

@Isinlor
Copy link

Isinlor commented May 18, 2016

@peter-mw Your need for flush is of course reasonable and I agree that this is a common need.

Nevertheless I think that the simple interface should not target common use cases, but the smallest practical use case and keep the limitation enforced by the design of the API to the minimum. No client should be forced to depend on methods it does not use. If we decide on interface that has no issues with race conditions and does not limit stored values, we can extend it as we want, to as complex use cases as we want. Flush, tags, nested keys - sky is the limit. There is really no issue with implementing many interfaces. We can create the simple interface, many one (or couple) method interfaces and then we can create one advanced interface by extending the simple one and many one method ones. You would be able to type hint on the simplest one and the methods that you like, or take the advanced one.

Also PSR-6 is already addressing vast majority of common use cases very well.

@peter-mw
Copy link

peter-mw commented May 18, 2016

@Isinlor as i understand you want the interface's implementation to look something like this

interface SimpleCacheInterface
{
    public function get($key);
    public function set($key, $value, $ttl = null);
    public function delete($key);
}

interface FlushCacheInterface
{
    public function flush();
}


class MyCache implements SimpleCacheInterface, FlushCacheInterface {
    public function get($key){}
    public function set($key, $value, $ttl = null){}
    public function delete($key){}
    public function flush(){}
}

This looks OK to me and its easy to add and remove features

edit: i added $ttl parameter to the set method

@odan
Copy link

odan commented May 18, 2016

Sorry, but the word "flush" reminds me of a toilet
The function name "clear" would be better.

interface FlushCacheInterface
{
    public function clear();
}

@CarwynNelson
Copy link

@odan flush is more consistent with the terminology used by most caching tech from what I'm aware. I think we should stick with flush.

@danielgsims
Copy link

@odan I think you should embrace the toilet humor.

@danielgsims
Copy link

I would be a fan if we could namespace this in such a way that they can be bundled together, because I expect they will be used together often. Can we bundle these interfaces under one Cache related namespace? Hopefully these could be used for the other interfaces in the advanced cache talk.

@Isinlor
Copy link

Isinlor commented May 19, 2016

@peter- Yes, that is the idea. However my proposition of SimpleCacheInterface is little bit different - I would say even simpler - and address two concerns that set/get can not address. It is race conditions and how to represent a miss. They are important if we want to extend simple cache to advanced cache. Moreover there is simply no technical reason to prevent implementations from handling all values and condition races. Get/set interface is preventing that.

@peter-mw
Copy link

peter-mw commented May 19, 2016

hi @Isinlor your proposal is also good here are some thoughts on it

Pros:

  • Simple - just 2 methods
  • Handles race condition (this actually can be part of the "Advanced cache" proposal)

Cons:

  • enforcing a callable parameter, which imo should be part of the implementation not the interface
  • hard to adopt (out of the box) by the existing implementations that have separate get/set methods, see the links in my comment above, they all have separated get and set methods

@Isinlor
Copy link

Isinlor commented May 20, 2016

Enforcing anything in implementation, that is not enforced in interface is going against Liskov Substitution Principle.

Regarding already existing implementations. Seems like Laravel is already using idea that I propose - see remember and rememberForever. @tedivm is maintainer of Stash and he seems to have no issues with implementing the idea. I don't know what about Doctrine folks, but I'm not sure they are interested in any simple interface. See Doctrine representative proposition of cache interfaces.

Besides get/set methods can coexists with cache method without issues - Laravel is doing that. I would be more worried about delete method. I don't remember if PHP is allowing to implement two interfaces with conflicting method names and sharing the same signature.

@icambridge
Copy link

@Isinlor your interface would pretty much result in people having their entire method logic essentially inside a parameter for the caching logic.

Also what if I just want to set the value? I would have to make two calls instead of one.

In my opinion this simple interface results in complex/ugly code.

@Isinlor
Copy link

Isinlor commented May 20, 2016

@icambridge Putting logic into something that can be executed inside cache method is the whole idea.

If you want to just store some value you could do:

$storage->cache('key', new ValueObject($value)); // ValueObject implements __invoke()

However it is not common use case - it is needed only for warm up. One should not relay on value being stored in cache, therefore retrieving code must know how to retrieve value without cache. So, the full, proper use case look like that:

$value = $cache->cache('heavyQuery', function() {
     return HeavyQuery::create()->execute();
}));

or like that:

$value = $cache->get('heavyQuery');

if($value === null) {
    $value = HeavyQuery::create()->execute();
    $cache->set('heavyQuery', $value);
}

Ugliness or cleanliness of one or another depends on personal taste. However first solution allows to avoid race conditions and to store null values. This are objective advantages.

I'm not sure what you mean by:

I would have to make two calls instead of one.

About what calls are you speaking?

edit - Added clarification about value object.

@icambridge
Copy link

$storage->cache('key', new ValueObject($value));

looks very different

$cache->cache('heavyQuery', function() {
     return HeavyQuery::create()->execute();
}));

So the interface wouldn't only take callables but the class implementing it must decide what the use wants to do? This seems like it would lead to bugs.

Also it would mean we're giving two responsibilties to one method reading and writing. Personally I think this is generally a bad move. As if you look at the two samples you have to look at the code and the context to understand what is going on instead of looking at the method name.

About what calls are you speaking?

Calls to the caching storage system. Memcache/Redis/etc.

@odan
Copy link

odan commented May 20, 2016

@Isinlor You like to put logic (callback) into a cache from outside? Isn't it part of the implementation to implement this internal logic?

@mathroc
Copy link

mathroc commented May 20, 2016

@odan not sure what you are refering too. If i'm understanding correctly, you give the cache driver a recipe that it can use to compute the value if it's not in the cache pool. but you are not storign the function in the cache

@icambridge I agree with you; not a fan of the callable (set if not present) or value (force set) approach.
also it feels weird not being able to just ask if a key exists without setting it's value if it does not

@peter-mw
Copy link

peter-mw commented May 20, 2016

If we want to satisfy the needs of all we can have yet another interface for the callable method.

For example:

// file: SimpleCacheInterface.php
interface SimpleCacheInterface
{
    public function get($key);
    public function set($key, $value, $ttl = null);
    public function delete($key);
}

// file: ClearCacheInterface.php
interface ClearCacheInterface
{
    public function clear();
}

// file: CallableCacheInterface.php
interface CallableCacheInterface
{
    public function call($key, callable $value, $ttl = null);
}


// file: BaseCacheInterface.php
interface BaseCacheInterface extends SimpleCacheInterface, ClearCacheInterface, CallableCacheInterface {}

// file: MyCache.php
class MyCache implements BaseCacheInterface {
    public function get($key){}
    public function set($key, $value, $ttl = null){}
    public function delete($key){}
    public function call($key, callable $value, $ttl = null){}
    public function clear(){}
}

I don't know if its good or bad idea to fragmetize across so many interfaces.

Also should we be worry that the above example requires 4 file reads just to make the interface or its not big deal?

Initializing my class costs total of 5 file reads. If we split the interfaces too much this can lead to 5-10 files per standard. Is this an issue or no?

@Isinlor
Copy link

Isinlor commented May 21, 2016

@icambridge @mathroc I added a clarification - ValueObject can simply implement __invoke that return a value.

Also it would mean we're giving two responsibilties to one method reading and writing. Personally I think this is generally a bad move.

Single responsibility principle refers to classes/modules, not methods. Moreover there is only one responsibility of both interfaces - caching.

As if you look at the two samples you have to look at the code and the context to understand what is going on instead of looking at the method name.

$value = $cache->get('heavyQuery');

if($value === null) {
    // do something
}

if($value === false) {
    // do something
}

How do I know what nullmeans in this context? Is it a saved null from query or null because of miss? How do I know what false means in this context? Is it a saved false from query or false because of miss? cache method encapsulates representation of miss and does not leak this information.

@peter-mw Segregation of interfaces is well established practice in computer science, it has even a name - interface segregation principle. As I was saying before I like the idea. It allows clients to not depend on things it does not need. Number of files is not a concern. Build in PHP Opcache will cache retrieval of them for you. You can also always put them in one file if you wish, final proposition should not enforce structure of files in which interfaces would be put. Number of files should be really the last concern.

I would even move public function delete($key); to a separate interface.

@peter-mw
Copy link

Lets make our lives easier and NOT store null in the cache, null can simply mean a cache miss.

Is there a valid usecase for storing a null value? I haven't encountered such need in my practice so far, but some of you may have.

@odan
Copy link

odan commented May 28, 2016

@peter-mw Maybe somebody need to cache values from a database and the column is allowed to contain null "values". A null in this context means: "This value is not defined" and even this could be a importand information for somebody. For example: A Column updated_at contains null as default if the row is not updated, because there was only a insert an no update operation for this row.

A key cannot be null, but a value can be.

If you have to prevent null values we could add a default paramerter to the get function. If the value is null then the default value will be returned. If the key not exists we could return the default value. If the key is null a "ArgumentNullException" should be thrown.

namespace Pcs\Cache;

interface SimpleCacheInterface
{
    public function get($key, $default = null);
    public function set($key, $value, $ttl = null);
    public function remove($key);
    public function has($key);
    public function clear();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests