Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagination #42

Open
blitzmann opened this issue Feb 9, 2014 · 19 comments
Open

Pagination #42

blitzmann opened this issue Feb 9, 2014 · 19 comments

Comments

@blitzmann
Copy link
Contributor

I know there is a notice that PHPRedmin doesn't support pagination yet. As far as I'm aware, it's been there since I started using it (way back when those little pills were still used for navigation).

Just curious where development is on this, if it has even started yet? What might be involved with implementing it?

@eugef
Copy link
Contributor

eugef commented Feb 9, 2014

The main problem is that Redis KEY command (http://redis.io/commands/keys) that is used to search for keys doesn't support any pagination.

So if you have 100000 keys - all of them will be returned at once.

The only solution is to make pagination on application level, but then memory consumption
problems should be solved, because having lot of keys in PHP array is not a good idea.

I can propose to use redis-cli by PHP exec() function and save its output to the temporary "keys file". Then we can iterate through the lines of that file to create pagination.

In this solution the weak place is that after any delete/add/rename/move operation
"keys file' should be recreated.

Instead of using redis-cli we can try to use php worker for gearman, but using of redis-cli looks easier for me (and phpredmin won't depend on gearman so much)

@blitzmann
Copy link
Contributor Author

I figured it had something to do with redis and gathering all the keys.

Not too familiar with gearman, but I can see the usefulness of creating a file cache of keys. Would this file be updated everytime a search takes place? On a cronjob? If the file is generated on-the-fly we may not need to worry too much about moving / deleting operations. (I am soon going to propose a change as to how this may work anyway)

@sasanrose
Copy link
Owner

Actually I don't like the idea of depending on exec() neither. Since, a lot of administrators disable it for security reason. Therefore, it is not a good idea to expose the first and most important functionality of the application to the danger of not working at all. However, I believe, as @eugef mentioned, we have to handle this (If we are planning to do) in application layer. But, within PHP's functionality.

@blitzmann
Copy link
Contributor Author

How does PHPRedmin currently fetch all keys from database? I assume it does not rely on exec() - would it be possible to simply feed that result to a file and then feed the file to a pager?

@eugef
Copy link
Contributor

eugef commented Feb 9, 2014

Now all keys are fetched using KEY command to the PHP array. But, as i mentioned above, this could lead to memory allocation error in case of huge databases.

Also list of keys should be saved somewhere between requests to paginate it. I propose to use file for this (because file can be read line by line very quickly).

Draft of workflow:

  • for each search request we geneare unique request_id and save keys in temporary file with "request_id" in its name
  • pagination link will contain page number and request_id - so we can read appropriate "key file" and return part from it

@blitzmann
Copy link
Contributor Author

That sounds like a pretty standard and elegant solution. So every search will be a snapshot of the keys in the database at that time. Curious: Are keys returned in a specific and predicable order?

I'm not an expert on how Redis works by any means (all my DBs are simple key=>value strings). But I've done a quick check with Google, and it seems you can accomplish something similar to pagination within sorted sets. Perhaps we can dump the existing keys to a new sorted set and use the features of the redis commands to sort of page through the keys. Maybe even give PHPRedmin it's own database to use to store these things.

This doesn't address the issue with loading the keys into php first. I've looked through the redis docs and there doesn't seem to be a Redis command to dump all keys somewhere. Would be extremely easy to bypass php limitations with exec()...

Perhaps check to see if exec() is available and, if so, use that to do the dumping. If not, warn user that PHP will be handling it and that memory allocation may be a problem for very large databases.

Or crazy idea: maybe it would be possible to read from the RDB directly.

RDB Specs

Obviously this will require the user to actually use persistent saves, and have php able to read the RDB which may not be possible in some setup. It's fun to think about anyway.

@eugef
Copy link
Contributor

eugef commented Feb 10, 2014

Every search we will create snapshot of the keys that meet search criteria. Keys are returned in no order (i think in order they are stored in memory) so the next step should be to sort them (actually phpredmin does it right now by PHP asort() function).

I also like the idea to save "keys list" in Redis list - it would be faster than saving/reading from the file (in case operations will be done by PHP script). For now phpredmin already uses "phpredmin:*" sorted sets to save statistics data that is displayed in graphs. The benefit of storing in Redis is that we can define TTL for each "keys list" to expire it automatically.

Here is example workflow of the search process:

  1. KEY command in PHP script is used to get all the keys matching search criteria.
  2. All the keys are sorted alphabetically by PHP asort() fucntion (to speed up whole process this operation could be omitted)
  3. Unique search_id is generated
  4. All keys are stored as values of phpredmin:search:<search_id> list using RPUSH command. (multiple values can be inserted in one command)
  5. With LRANGE command we could get subset of keys to display on specific page, by default return 1000 (maybe less?) keys per page
    5.1 TTL for phpredmin:search:<search_id> is set for 1 hour (maybe less?)

Pagination workflow:
6. Pagination link contains search criteria, search_id and page number
7. We check if phpredmin:search:<search_id> still exists if yes - go to step 5, if no - to step 1

I am pretty sure that this workflow will be fine if less then 100000 keys will match the search criteria. In other case bigger memory_limit for PHP script should be set. We can pay attention on this in documentation.

The only drawback of this approach is that if all available databases are already used, then phpredmin will store its data in one DB with user data - this what i want to avoid and that is why i've first proposed to use temporary files.

Regarding reading from RDB - it would be definitely much slower than using KEY command because:

  • reading from disk is slower then reading from memory
  • iterating through big file by PHP script is much slower than reading the memory by compiled program

@blitzmann
Copy link
Contributor Author

The only drawback of this approach is that if all available databases are already used, then phpredmin will store its data in one DB with user data - this what i want to avoid and that is why i've first proposed to use temporary files.

How many people use all available databases? Perhaps people that don't control their own environment, they may get one or two. I don't think it's inherently bad form to use the user databases to store metadata pertaining to PHPRedmin, but I think it would be a good idea to choose between user DB and a DB dedicated to PHPRedmin.

As for memory, simply check if exec() is available. If it is, use that to dump the keys into redis. If not, give user a warning that PHP might not be able to handle large DBs. Try it, and if we run out of memory we can catch that and let the user know to limit the search further (and possibly set a blacklist of searches that won't work).

@sasanrose
Copy link
Owner

Can you both tell me a scenario in which pagination is needed? Since, I have never needed it myself. Besides, I believe there is a reason that it is not implemented in Redis either. You have to name your keys properly. If you do so you can always limit your search criteria.
I am saying this because from my point of view every method that we use has a drawback. Therefore, I like to know if it really worth to implement it or not

@eugef
Copy link
Contributor

eugef commented Feb 11, 2014

Simple scenario - if you have more than 10000 search results it is better to paginate them or browser can crash. Pagination is needed to prevent such an issue even if user by mistake searched for "*"

If we care about good user experience we should implement this feature.

@sasanrose
Copy link
Owner

First, in the aforementioned scenario I'd rather the browser to crash than using up a great deal of server's memory and process. Furthermore, one should name the keys in a proper way to be able to limit the searching criteria. A case in this point is PHPRedmin itself. As you said we store all the PHPRedmins keys with a phpredmin: prefix. Therefore, you can view all the associated keys using phpredmin_. Likewise, if you want to locate stats keys you can search phpredmin:stats_. From my point of view, what you are suggesting is in contrast to the concept of a key-value store.

@eugef
Copy link
Contributor

eugef commented Feb 11, 2014

First, You can't guarantee that end-users will use your software in a "proper" way and won't use search for the all keys on a big dataset.

Second, for example now i have more that 10000 "hash:user:*" keys in my database and i can hardly use phpredmin to look through them. Pagination will help with this

@blitzmann
Copy link
Contributor Author

It's just plain good design. Whenever you have a large dataset, people expect pagination. Even if you have 200 keys (which is tiny), the list can be very long in the browser. Sure, you should always try to narrow it down, but as @eugef said you cannot be sure the end user names keys properly, or uses applications (that they may not be able to control) that names keys properly.

I have a 67,000 key database that all begins with emdr-1- then has a 4 to 6 digit integer for one of my applications. There's no way of narrowing this down without knowing this integer. Furthermore, the integer is random. 1234 can represent a horse, whereas 1235 could represent the universe. So I can't use 123* and expect my results to be limited by group or like data. Sometimes I would just like to browse this data with *.

You have to cover your bases and try to account for use-cases that you can't think of.

I don't think the drawbacks are too big for this type of situation. Especially if we use Redis to store the pagination results and paginate from there. The only issue would be deciding between gathering all keys in PHP and worrying about memory or using the shell and dumping directly to and from Redis. You can even enable/disable pagination in config so that production environments are not at too much risk of using up a lot of memory.

Side note: how big does the database even have to be to risk running into an average max memory allocation?

@sasanrose
Copy link
Owner

What I am saying is that we have to see whether the advantages of pagination outweigh its disadvantages. I'd rather not to make PHPRedmin a tool that server administrators avoid to use it. At the company that I work we have approximately 2 million keys in our Redis database. From my point of view there is absolutely no way to handle this much of data in application level without consuming server's resources. By implementing this we provide end users with a way to perform a kind of DOS attack. Every time they search '*' we use up a lot of resources to handle it. However, I like the idea of making it configurable. I mean as @blitzmann suggested disable it by default in the config. Consequently, system administrators can make sure that they are using a safe tool.

@eugef
Copy link
Contributor

eugef commented Feb 11, 2014

Hi @sasanrose, i do not agree with you. Because namely without pagination phpredmin can cause DDOS attack on the server.

Now if you have 2 million keys and search for "*" you will definetly hang the server, not because of memory usage, but because for each key phpredmin makes about 6 requests to load additional data such as ttl (see https://github.com/sasanrose/phpredmin/blob/master/views/keys/search.php#L52).

If we implement pagination then additional information will be loaded only for keys listed on one page (no more than 1000 keys).

@blitzmann
Copy link
Contributor Author

I agree with what @eugef says.

Furthermore, having pagination will allow us to return to search results after navigating away form the page (for example with the current master, editing a key, however I plan to propose a change to this). If we navigate away from the search results, we can always return with the pagination data without any extra stress to the server.

@blitzmann
Copy link
Contributor Author

Could we use php's memory wrapper to help with memory problems associated with this? I don't know if it's affected by memory_limit, but it might help even still since it should automatically switch over to a file if memory reaches a specific size.

php://memory and temp

@eugef
Copy link
Contributor

eugef commented Feb 13, 2014

To have some estimation on memory usage i have created several test scripts:

http://3v4l.org/6tIHi - for 100000 keys
http://3v4l.org/5PlNa -for 150000 keys

Average key length is 20 characters.

As you can see:

  • 100000 keys occupy ~18MB.
  • 150000 keys occupy ~ 28MB.

So for 1 million keys i can assume that memory usage will be around 160 - 200MB
If memory_limit is set to 256MB there won't be any issues.

I think we should describe this limitation in phpredmin documentation and if user has DB with more that 1 million keys he should be able to change memory_limit in php.ini.

Another possible solution is to use ini_set('memory_limit', '256MB') before executing KEY command if DB has more than 1 million of keys (we assume that all of them could match the search criteria)

@emilianobonassi
Copy link

What do you think using SCAN on Redis >= 2.8?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants