API to get only row keys #12
Comments
No, this is not possible, and functionality like that is not in the Thrift API either. I think you should rethink your design. Scanning rows like you suggested is horribly inefficient, since it results in a lot of useless I/O on the region servers (the data is still read from disk, even though it will not be used). A better option is to keep aggregate counters when inserting data (use Table.counter_inc() for that) and build your pagination using that information. |
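A rough sketch of the counter idea, for anyone reading along. It assumes a happybase-style `table` object; `Table.counter_inc()` and `Table.counter_get()` are happybase methods, but the helper names and the counter row and column (`meta:row_count`, `stats:total`) are made up for illustration.

```python
# Hypothetical location of the running total; pick names that fit your schema.
COUNTER_ROW = b'meta:row_count'
COUNTER_COLUMN = b'stats:total'


def put_and_count(table, row_key, data):
    """Store a row and bump the running total; returns the new total."""
    table.put(row_key, data)
    # counter_inc() increments the counter cell and returns its new value.
    return table.counter_inc(COUNTER_ROW, COUNTER_COLUMN)


def page_count(table, page_size):
    """Number of pages needed for the rows counted so far (ceiling division)."""
    total = table.counter_get(COUNTER_ROW, COUNTER_COLUMN)
    return -(-total // page_size)
```

Note that this sketch counts every put, so overwriting an existing row key would inflate the total; a real implementation would need to handle that case.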
Oh, and for the 'next page' link you should remember the last row key from the current page and scan from that row onwards. |
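One way to sketch the 'next page' approach, assuming a happybase-style `table` whose `scan()` accepts `row_start` and `limit`. The `fetch_page` helper is hypothetical. It asks for one row more than the page size, so the extra row's key can serve as the start of the following page (`row_start` is inclusive).

```python
def fetch_page(table, start_row=None, page_size=100):
    """Return (rows, next_start) for one page of a key-ordered scan.

    next_start is the row key to pass as start_row for the following page,
    or None when there are no more rows.
    """
    # Request one extra row; if it comes back, it begins the next page.
    rows = list(table.scan(row_start=start_row, limit=page_size + 1))
    page = rows[:page_size]
    next_start = rows[page_size][0] if len(rows) > page_size else None
    return page, next_start
```

The 'next page' link then only needs to carry `next_start`; no row keys have to be collected up front.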
Hi, out of curiosity: is your problem solved? |
Not really. My problem is getting the row keys within a given range, not all row keys, so it would look like get_all_row_keys(start_row, end_row). I was looking at KeyOnlyFilter() (http://hbase.apache.org/book/thrift.html), but that returns column keys too. |
Have you looked at the part about scanners in the tutorial? That can be used to specify start and stop keys. Combine it with FirstKeyOnlyFilter to avoid sending complete rows (but only a single cell per row) over the wire. I think it's not going to get any better than that with the current Thrift API (and not with the Java API either). |
I missed FirstKeyOnlyFilter. It looks like it should return only the row keys plus the first column key and value, which is definitely better than getting all column keys (and values). |
Code example (untested):
scanner = table.scan(row_start=b'aaa', row_stop=b'bbb', filter=b'FirstKeyOnlyFilter()')
row_keys = [key for key, data in scanner] |
Yeah, I tested using my data and it works |
Cool thanks!! |
I have just opened issue #14. Ideas and patches welcome. :) |
Haha! Sure, but I like what you have currently: a thin client which acts as a "pass-through" to the Thrift service on the HBase server. This way, you don't have to update the Python client API whenever the Thrift service updates the list of commands/filters it supports. I will add some examples on how to construct proper filter string expressions. |
Great, thanks. I agree with you about keeping up to date, but some helper functions might be useful nonetheless, mostly for properly escaping binary data and so on. |
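A possible shape for such a helper, as a sketch only. It assumes the HBase filter language convention that string arguments are wrapped in single quotes and that a literal single quote inside an argument is written as two single quotes; please verify that against the HBase reference guide for your version. Both function names are hypothetical and not part of happybase.

```python
def quote_filter_arg(value):
    """Quote a value for use as a string argument in a filter expression.

    Assumption: a single quote inside the argument is escaped by doubling it.
    """
    if isinstance(value, str):
        value = value.encode('utf-8')
    return b"'" + value.replace(b"'", b"''") + b"'"


def prefix_filter(prefix):
    """Build a PrefixFilter expression for the given row key prefix."""
    return b"PrefixFilter(" + quote_filter_arg(prefix) + b")"
```

The resulting bytes could then be passed as the `filter` argument of `table.scan()`.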
Just figured out that |
Complete answer/example:
scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)
for row_key, data in scanner:
    pass  # do something with row_key |
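Wrapped up as the function asked for earlier in the thread, this could look roughly like the sketch below. `get_row_keys` is a hypothetical helper, not something happybase provides, and `table` is assumed to be a happybase-style table. As with `row_stop` in `scan()`, `end_row` is exclusive.

```python
# Compound filter from this thread: one cell per row, with its value stripped.
KEYS_ONLY_FILTER = b'KeyOnlyFilter() AND FirstKeyOnlyFilter()'


def get_row_keys(table, start_row=None, end_row=None):
    """Return the row keys in [start_row, end_row) without fetching cell values."""
    scanner = table.scan(
        row_start=start_row,
        row_stop=end_row,
        filter=KEYS_ONLY_FILTER,
    )
    return [row_key for row_key, _ in scanner]
```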
Yeah, I probably should have updated this thread. But I was already using the above compound filter |
I assumed so, but I posted it anyway for posterity and for others on the internet who may stumble upon this issue. :) |
table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()') works like a charm! |
table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()') This is the most efficient approach I have found so far. Thanks, wbolster. |
Is there a way to get just the row keys (without the column data) from HBase using happybase?
Use case:
I am trying to implement pagination on rows. My row keys are random integers; they are unique but not sequential.
The closest to efficient pagination I could think of is:
a. Get all the row keys
b. Loop through row keys (in batch of 100) and get the column data, when needed
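For reference, step (b) could be sketched like this, assuming `row_keys` is a list that is already in hand and `table` is a happybase-style table. `Table.rows()` is a happybase method that fetches several rows by key in one call; the `iter_rows_in_batches` name is made up for illustration.

```python
def iter_rows_in_batches(table, row_keys, batch_size=100):
    """Yield lists of (row_key, data) tuples, batch_size keys at a time."""
    for i in range(0, len(row_keys), batch_size):
        batch = row_keys[i:i + batch_size]
        # One multi-row fetch per batch instead of one request per key.
        yield table.rows(batch)
```

Because this is a generator, a batch is only fetched when the caller actually asks for it, which matches the "when needed" part of step (b).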