Skip to content

garry-t/experimental-pagination

 
 

Repository files navigation

Pagination (for large table)

What is large table?

  • More then 10m records
  • Test table has 109 987 200 (~110m) records

Pagination approaches

  • Traditional (rails way)
  • Keyset pagination or pagination with relative cursors

Traditional (Rails) approach

  • Using Limit-Offset-Count
    • Works fine for the first pages (~1000-2000)
  • Works fine for small tables (most of railsway/startup cases)

Examples

  • Get all data
    • first page 109 987 200 records
      === Render data: index === (20.9ms)
      === Render pagination: traditional index === (25756.6ms)
      
      caused by count
    • first page (w/o count) 109 987 200 records
      === Render data: wo_count === (4.9ms)
      === Render pagination: traditional wo_count === (0.6ms)
      
    • page #500000 109 987 200 records
      === Render data: index === (1135.1ms)
      === Render pagination: traditional index === (26448.8ms)
      
      caused by count and offset
    • page #500000 (w/o count) 109 987 200 records
      === Render data: wo_count === (871.4ms)
      === Render pagination: traditional wo_count === (1.2ms)
      
      caused by offset
  • Search by fields
    • indexed fields 44 records (by created_at)
      === Render data: index === (6.2ms)
      === Render pagination: traditional index === (4.0ms)
      
      thanks to index
    • non-indexed fields 44 records (by updated_at)
      === Render data: index === (30307.6ms)
      === Render pagination: traditional index === (30863.7ms)
      
      caused by count and offset, and lack of index
  • When scope is big but w/ index
    • first page 15 811 244 records
      === Render data: index === (34.8ms)
      === Render pagination: traditional index === (884.9ms)
      
    • page #500000 15 811 244 records
      === Render data: index === (1345.3ms)
      === Render pagination: traditional index === (647.4ms)
      
      caused by offset
  • When scope is big but w/o index
    • first page 15 811 244 records
      === Render data: index === (270.2ms)
      === Render pagination: traditional index === (25992.6ms)
      
    • page #500000 15 811 244 records
      === Render data: index === (1309.3ms)
      === Render pagination: traditional index === (25278.8ms)
      
      caused by offset

Keyset pagination or pagination with relative cursors

Examples

  • Get all data
    • first page 109 987 200 records
      === Render data: index === (0.5ms)
      === Render pagination: keyset index === (0.7ms)
      
    • page #500000 109 987 200 records
      === Render data: index === (1.6ms)
      === Render pagination: keyset index === (0.3ms)
      
  • When scope is big but w/ index
    • first page 15 811 244 records
      === Render data: index === (0.7ms)
      === Render pagination: keyset index === (0.4ms)
      
    • page #500000 15 811 244 records
      === Render data: index === (0.9ms)
      === Render pagination: keyset index === (0.4ms)
      
  • When scope is big but w/o index
    • first page
      === Render data: index === (1831.6ms)
      === Render pagination: keyset index === (1.5ms)
      
    • second page
      === Render data: index === (1939.9ms)
      === Render pagination: keyset index === (1.5ms)
      

References

Gems

Traditional vs Keyset

page 500001

Solution

Tricks with index

Getting 25 records out of 41 444

  • indexed field
    EXPLAIN ANALYZE SELECT "miner_data".* FROM "miner_data"
        WHERE
        (created_at >= '2017-01-01 00:00:00')
        AND (created_at <= '2017-01-01 23:01:26')
        ORDER BY "miner_data"."id" ASC
        limit 25
    
    "Limit  (cost=0.57..7.89 rows=25 width=61) (actual time=2167.103..2167.108 rows=25 loops=1)"
    "  ->  Index Scan using miner_data_pkey on miner_data  (cost=0.57..4655891.37 rows=15893334 width=61) (actual time=2167.101..2167.106 rows=25 loops=1)"
    "        Filter: ((created_at >= '2017-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-01-01 00:01:26'::timestamp without time zone))"
    "        Rows Removed by Filter: 15811200"
    "Planning Time: 6.494 ms"
    "Execution Time: 2167.160 ms"
    
    • non indexed field
    EXPLAIN ANALYZE SELECT "miner_data".* FROM "miner_data"
        WHERE
        (updated_at >= '2017-01-01 00:00:00')
        AND (updated_at <= '2017-01-01 23:01:26')
        ORDER BY "miner_data"."id" ASC
        limit 25
    
    "Limit  (cost=0.57..2774.57 rows=25 width=61) (actual time=2383.758..2383.765 rows=25 loops=1)"
    "  ->  Index Scan using miner_data_pkey on miner_data  (cost=0.57..4655891.37 rows=41960 width=61) (actual time=2383.757..2383.762 rows=25 loops=1)"
    "        Filter: ((updated_at >= '2017-01-01 00:00:00'::timestamp without time zone) AND (updated_at <= '2017-01-01 23:01:26'::timestamp without time zone))"
    "        Rows Removed by Filter: 15811200"
    "Planning Time: 0.343 ms"
    "Execution Time: 2383.820 ms"
    

Getting 25 records out of 39 644

  • indexed field
    EXPLAIN ANALYZE SELECT "miner_data".* FROM "miner_data"
        WHERE
        (created_at >= '2017-01-01 00:00:00')
        AND (created_at <= '2017-01-01 22:01:26')
        ORDER BY "miner_data"."id" ASC
        limit 25
    "Limit  (cost=147.83..147.89 rows=25 width=61) (actual time=7.288..7.297 rows=25 loops=1)"
    "  ->  Sort  (cost=147.83..152.87 rows=2017 width=61) (actual time=7.287..7.290 rows=25 loops=1)"
    "        Sort Key: id"
    "        Sort Method: top-N heapsort  Memory: 28kB"
    "        ->  Index Scan using index_miner_data_on_created_at on miner_data  (cost=0.57..90.91 rows=2017 width=61) (actual time=2.381..6.060 rows=1844 loops=1)"
    "              Index Cond: ((created_at >= '2017-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2017-01-01 01:01:26'::timestamp without time zone))"
    "Planning Time: 0.338 ms"
    "Execution Time: 7.364 ms"
    
    • non indexed field
    EXPLAIN ANALYZE SELECT "miner_data".* FROM "miner_data"
        WHERE
        (updated_at >= '2017-01-01 00:00:00')
        AND (updated_at <= '2017-01-01 22:01:26')
        ORDER BY "miner_data"."id" ASC
        limit 25
    "Limit  (cost=0.57..2894.44 rows=25 width=61) (actual time=2064.903..2064.909 rows=25 loops=1)"
    "  ->  Index Scan using miner_data_pkey on miner_data  (cost=0.57..4655891.37 rows=40222 width=61) (actual time=2064.902..2064.906 rows=25 loops=1)"
    "        Filter: ((updated_at >= '2017-01-01 00:00:00'::timestamp without time zone) AND (updated_at <= '2017-01-01 22:01:26'::timestamp without time zone))"
    "        Rows Removed by Filter: 15811200"
    "Planning Time: 0.359 ms"
    "Execution Time: 2064.947 ms"
    

Solutions

  • Reduce nubmer of records :)

Conclustion

  • Count is expensive procedure
  • Offset is expensive procedure

General references

Count

https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql/7945274#7945274

SELECT count(*) FROM "miner_data";
=> 109987200
29.075 sec
SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint
FROM   pg_class
WHERE  oid = 'miner_data'::regclass;
=> 109987240
00.087 sec
SELECT (CASE WHEN c.reltuples < 0 THEN NULL       -- never vacuumed
             WHEN c.relpages = 0 THEN float8 '0'  -- empty table
             ELSE c.reltuples / c.relpages END
     * (pg_catalog.pg_relation_size(c.oid)
      / pg_catalog.current_setting('block_size')::int)
       )::bigint
FROM   pg_catalog.pg_class c
WHERE  c.oid = 'public.miner_data'::regclass;
=> 109987240
0.052 sec

About

Experiment with different variants of pagination

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Ruby 76.4%
  • HTML 16.6%
  • Haml 3.9%
  • CSS 2.3%
  • JavaScript 0.8%