API to get only row keys #12

vskr · 2013-01-16T23:47:33Z

Is there a way to get just the row keys (without the column data) from HBase using happybase?

Use case:
I am trying to implement pagination on rows. My row keys are random integers; they are unique but not sequential.

The closest thing to efficient pagination I could think of is:

a. Get all the row keys
b. Loop through row keys (in batch of 100) and get the column data, when needed

wbolster · 2013-01-17T08:18:43Z

No, this is not possible, and functionality like that is not in the Thrift API either.

I think you should rethink your design. Scanning rows like you suggested is horribly inefficient, since it results in a lot of useless I/O on the region servers (the data is still read from disk, even though it will not be used). A better option is to keep aggregate counters when inserting data (use Table.counter_inc() for that) and build your pagination using that information.
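A minimal sketch of the counter idea, assuming the running total lives in a separate counters table. The function name, the counter row and the counter column are hypothetical; only Table.put() and Table.counter_inc() are happybase calls.

```python
def insert_and_count(data_table, counter_table, row_key, data,
                     counter_row=b'row-count', counter_column=b'stats:rows'):
    """Store one row and bump a running total of inserted rows.

    counter_row and counter_column are hypothetical names for this sketch.
    The put and the increment are two separate calls, so together they are
    not atomic, and re-inserting an existing key is counted a second time.
    """
    data_table.put(row_key, data)
    # counter_inc() adds 1 by default and returns the new counter value.
    return counter_table.counter_inc(counter_row, counter_column)
```

Dividing the returned total by the page size would give an approximate page count without scanning the data table.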

wbolster · 2013-01-17T08:20:44Z

Oh, and for the 'next page' link you should remember the last row key from the current page and scan from that row onwards.
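A rough sketch of that 'next page' approach, assuming happybase's Table.scan() with its row_start and limit arguments. The function name and the trick of fetching one extra row are my own illustration, not something from the library.

```python
def fetch_page(table, start_row=None, page_size=100):
    """Return (rows, next_start) for one page of a table scan.

    One extra row is requested; if it exists, its key becomes the
    start row of the next page (row_start is inclusive).
    """
    scanner = table.scan(row_start=start_row, limit=page_size + 1)
    rows = list(scanner)
    if len(rows) > page_size:
        next_start = rows[page_size][0]  # key of the first row on the next page
        rows = rows[:page_size]
    else:
        next_start = None  # no further pages
    return rows, next_start
```

The 'next page' link then only needs to carry next_start; passing it back as start_row resumes the scan where the previous page ended.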

wbolster · 2013-01-25T23:00:18Z

Hi, out of curiosity: is your problem solved?

vskr · 2013-01-25T23:46:54Z

Not really. My problem is getting the row keys within a given range, not all row keys.

So it would look like get_all_row_keys(start_row, end_row)
and return [row_key_1, row_key_2, ..., row_key_last_index].

I was looking at KeyOnlyFilter() http://hbase.apache.org/book/thrift.html but that gives column keys too

wbolster · 2013-01-25T23:55:20Z

Have you looked at the part about scanners in the tutorial? That can be used to specify start and stop keys. Combine it with FirstKeyOnlyFilter to avoid sending complete rows (but only a single cell per row) over the wire. I think it's not going to get any better than that with the current Thrift API (and not with the Java API either).

vskr · 2013-01-25T23:58:10Z

I missed FirstKeyOnlyFilter. It looks like that should return only the row keys plus the first column key and value, which is definitely better than getting all column keys (and values).

wbolster · 2013-01-25T23:58:33Z

Code example (untested):

scanner = table.scan(row_start=b'aaa', row_stop=b'bbb', filter=b'FirstKeyOnlyFilter()')
row_keys = [key for key, data in scanner]

vskr · 2013-01-25T23:59:29Z

Yeah, I tested using my data and it works

vskr · 2013-01-25T23:59:37Z

Cool thanks!!

wbolster · 2013-01-26T00:01:19Z

I have just opened issue #14. Ideas and patches welcome. :)

vskr · 2013-01-26T00:06:05Z

haha! Sure, but I like what you have currently. Thin client which acts as "pass through" to Thrift service on hbase server. This way, you don't have to update python-client-api whenever Thrift service updates list of commands/filters it supports.

I will add some examples on how to construct proper filter string expressions.

wbolster · 2013-01-26T00:08:04Z

Great, thanks. I agree with you about keeping up to date, but some helper functions might be useful nonetheless, mostly for properly escaping binary data and so on.
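To illustrate the kind of helper meant here, a small sketch. Both function names are hypothetical, and the quoting rule (wrap the argument in single quotes and double any embedded single quote) is my reading of the HBase filter language documentation, so check it against your HBase version. Whether arbitrary binary bytes survive the trip through Thrift is not something this sketch verifies.

```python
def quote_filter_arg(value):
    """Wrap a byte string in single quotes for use in a filter string.

    Embedded single quotes are doubled, which is assumed to be the
    escaping rule of the HBase filter language.
    """
    if not isinstance(value, bytes):
        raise TypeError('filter arguments must be byte strings')
    return b"'" + value.replace(b"'", b"''") + b"'"


def prefix_filter(prefix):
    """Build a PrefixFilter expression for the given row key prefix."""
    return b'PrefixFilter(' + quote_filter_arg(prefix) + b')'
```

The result could then be passed as table.scan(filter=prefix_filter(b'user-')).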

wbolster · 2013-01-27T13:58:18Z

Just figured out that filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()' is even better for your use case (counting rows).

wbolster · 2013-01-29T18:42:24Z

Complete answer/example:

scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)

for row_key, data in scanner:
    pass  # do something with row_key
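The same scan wrapped into the helper that was asked for earlier in this thread, as a sketch. The names iter_row_keys and count_rows are made up for illustration; the filter is passed as a byte string, as in the example above. FirstKeyOnlyFilter keeps only the first cell of each row and KeyOnlyFilter strips the cell values, so very little data should cross the wire.

```python
KEYS_ONLY = b'KeyOnlyFilter() AND FirstKeyOnlyFilter()'


def iter_row_keys(table, start_row=None, end_row=None):
    """Yield row keys from start_row (inclusive) up to end_row (exclusive)."""
    scanner = table.scan(row_start=start_row, row_stop=end_row,
                         filter=KEYS_ONLY)
    for row_key, _data in scanner:
        yield row_key


def count_rows(table, start_row=None, end_row=None):
    """Count rows in the range without collecting the keys in memory."""
    return sum(1 for _ in iter_row_keys(table, start_row, end_row))
```

Usage would be along the lines of row_keys = list(iter_row_keys(table, b'aaa', b'bbb')).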

vskr · 2013-01-29T20:33:06Z

Yeah, I probably should have updated this thread, but I was already using the above compound filter.

wbolster · 2013-01-29T20:44:02Z

I assumed so, but I posted it anyway for posterity and for others on the internet who may stumble upon this issue. :)

ghost · 2015-02-02T16:11:24Z

table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')

works like a charm!
Thank you @wbolster!

bhalgat20 · 2015-04-03T07:33:05Z

table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')

This is the most efficient approach I have found so far. Thanks wbolster

vskr closed this as completed Jan 17, 2013

wbolster reopened this Jan 25, 2013

wbolster mentioned this issue Jan 26, 2013

Write nice support functions to construct filter strings for table.scan(filter=...) #14

Open

wbolster closed this as completed Jan 29, 2013

python-happybase deleted a comment from UmfintechWtc Mar 6, 2020
