Add dump/restore support for Hypercore TAM #7356

erimatnor · 2024-10-17T07:32:15Z

Add support for dumping and restoring hypertables that have chunks that use the Hypercore TAM.

Dumping a Hypercore table requires special consideration because its data is internally stored in two separate relations: one for compressed data and one for non-compressed data. The TAM returns data from both relations, but they may be dumped as separate tables. This risks dumping the compressed data twice: once via the TAM and once via the compressed table in compressed format.

The pg_dump tool uses COPY TO to create dumps of each table, and, to avoid data duplication when used on Hypercore tables, this change introduces a GUC that allows selecting one of these two behaviors:

1. A COPY TO on a Hypercore table returns all data via the TAM, including data stored in the compressed relation. A COPY TO on the internal compressed relation returns no data.
2. A COPY TO on a Hypercore table returns only non-compressed data, while a COPY TO on the compressed relation returns compressed data. A SELECT still returns all the data as normal.

The second approach is the default because it is consistent with how compression behaves when Hypercore TAM is not used. It produces a pg_dump archive that includes data in compressed form (if the data was compressed when dumped). Conversely, option (1) produces an archive that looks identical to a dump from a non-compressed table.
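To make the two behaviors concrete, here is a minimal toy model in Python. It is only a sketch: the class, enum, and function names are hypothetical and are not taken from the TimescaleDB code, and rows are modeled as plain integers, whereas the real compressed relation holds data in compressed form. The point is that under either setting, dumping both relations yields each row exactly once.

```python
from enum import Enum


class CopyToBehavior(Enum):
    # Hypothetical names for the two GUC settings described above.
    ALL_DATA = 1            # option 1: the TAM returns everything
    NO_COMPRESSED_DATA = 2  # option 2 (default): relations dumped separately


class HypercoreModel:
    """Toy stand-in for a Hypercore table backed by two relations."""

    def __init__(self, non_compressed, compressed):
        self.non_compressed = list(non_compressed)
        self.compressed = list(compressed)

    def select_all(self):
        # A SELECT always sees both relations, regardless of the setting.
        return self.non_compressed + self.compressed

    def copy_to_table(self, behavior):
        # COPY TO on the Hypercore table itself.
        if behavior is CopyToBehavior.ALL_DATA:
            return self.select_all()
        return list(self.non_compressed)

    def copy_to_compressed_relation(self, behavior):
        # COPY TO on the internal compressed relation.
        if behavior is CopyToBehavior.ALL_DATA:
            return []
        return list(self.compressed)


def dump(table, behavior):
    # pg_dump issues a COPY TO per relation; model that as two calls.
    return table.copy_to_table(behavior) + table.copy_to_compressed_relation(behavior)
```

In this model, `dump(HypercoreModel([1, 2], [3, 4]), behavior)` contains the four rows once each for both values of `behavior`; what differs is which relation they come from.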

There are pros and cons of each dump format. A non-compressed archive is a platform-agnostic logical dump that can be restored to any platform and architecture, while a compressed archive includes data that is compressed in a platform-dependent way and needs to be restored to a compatible system.

A test is added that covers both settings, including dumping and restoring with each.

Disable-check: force-changelog-file

codecov · 2024-10-17T11:28:30Z

Codecov Report

Attention: Patch coverage is 83.95062% with 13 lines in your changes missing coverage. Please review.

Project coverage is 82.51%. Comparing base (59f50f2) to head (304334f).
Report is 548 commits behind head on main.

Files with missing lines	Patch %	Lines
tsl/src/hypercore/hypercore_handler.c	76.66%	2 Missing and 5 partials ⚠️
tsl/src/process_utility.c	85.29%	3 Missing and 2 partials ⚠️
tsl/src/compression/compression.c	83.33%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7356      +/-   ##
==========================================
+ Coverage   80.06%   82.51%   +2.44%     
==========================================
  Files         190      228      +38     
  Lines       37181    42562    +5381     
  Branches     9450    10682    +1232     
==========================================
+ Hits        29770    35120    +5350     
- Misses       2997     3169     +172     
+ Partials     4414     4273     -141


A truncate on a Hypercore TAM table is executed across both compressed and non-compressed data. This caused an issue when recompressing because the recompression also tried to truncate the compressed data. Fix this issue by introducing a flag that allows truncating only the non-compressed data. Another issue related to cache invalidation is also fixed. Since a recompression sometimes creates a new compressed relation, and the compressed relid is cached in the Hypercore TAM's relcache entry, the cache needs to be invalidated during recompression. This was not done previously, which led to an error. It is fixed by adding a relcache invalidation during recompression. Finally, compression using an index scan is disabled for Hypercore TAM since the index also covers compressed data (in the recompression case). While the index could be used when compressing for the first time (when only non-compressed data is indexed), it is still disabled completely for Hypercore TAM given that index scans are not used by default anyway. Tests are added to cover all of the issues described above.
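The cache-invalidation problem described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the actual fix: `Catalog`, `RelCache`, and `recompress` are hypothetical Python stand-ins for the catalog, the relcache entry that holds the compressed relid, and the recompression path.

```python
class Catalog:
    """Toy catalog: maps a table name to its current compressed relid."""

    def __init__(self):
        self.compressed_relid = {}
        self.next_relid = 1000

    def create_compressed_relation(self, table):
        # Allocate a fresh relid and make it the table's compressed relation.
        relid = self.next_relid
        self.next_relid += 1
        self.compressed_relid[table] = relid
        return relid


class RelCache:
    """Toy cache that remembers the compressed relid per table."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.entries = {}

    def lookup(self, table):
        # Populate the entry from the catalog on first use only.
        if table not in self.entries:
            self.entries[table] = self.catalog.compressed_relid[table]
        return self.entries[table]

    def invalidate(self, table):
        # Drop the entry so the next lookup rereads the catalog.
        self.entries.pop(table, None)


def recompress(catalog, cache, table, invalidate=True):
    # Recompression may create a new compressed relation; unless the cache
    # entry is invalidated, later lookups keep returning the old relid.
    catalog.create_compressed_relation(table)
    if invalidate:
        cache.invalidate(table)
```

Calling `recompress(..., invalidate=False)` in this model leaves `cache.lookup()` pointing at the old relid, which mirrors the stale-entry situation the commit fixes.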

Replace the scankey flag used to skip compressed data when starting a Hypercore scan with a function that sets this option on the scan descriptor. Internally, use the scan flags instead of scankey flags to convey this setting. Overriding scankey flags was not ideal since they are meant to be per-column settings rather than settings for the scan as a whole. It is possible to set the scan flags when calling table_beginscan(), but the current table scan functions do not expose flags and instead have a separate function for each flag setting. Hypercore could define its own beginscan function to do the same, but this is left for the future.
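As a rough illustration of the design described above, the following Python sketch keeps the "skip compressed data" option as a scan-wide flag on the scan descriptor and sets it through a dedicated function. All names here (`ScanFlags`, `ScanDesc`, `scan_set_skip_compressed`, `scan_rows`) are hypothetical and chosen for this example; they are not the identifiers used in the C code.

```python
from dataclasses import dataclass
from enum import IntFlag


class ScanFlags(IntFlag):
    # Scan-wide options, as opposed to per-column scankey flags.
    NONE = 0
    SKIP_COMPRESSED = 1 << 0


@dataclass
class ScanDesc:
    """Toy scan descriptor carrying scan-wide flags."""
    flags: ScanFlags = ScanFlags.NONE


def scan_set_skip_compressed(scan, skip):
    # Set or clear the option on an already created scan descriptor.
    if skip:
        scan.flags |= ScanFlags.SKIP_COMPRESSED
    else:
        scan.flags &= ~ScanFlags.SKIP_COMPRESSED


def scan_rows(scan, non_compressed, compressed):
    # Return non-compressed rows, plus compressed rows unless skipped.
    rows = list(non_compressed)
    if not (scan.flags & ScanFlags.SKIP_COMPRESSED):
        rows += list(compressed)
    return rows
```

A caller would create the descriptor first and then call `scan_set_skip_compressed(scan, True)` before reading, which matches the "begin the scan, then set the option" shape described in the commit message.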


erimatnor added the hypercore label Oct 17, 2024

erimatnor requested review from fabriziomello, mkindahl and antekresic October 17, 2024 07:32

erimatnor force-pushed the hyperstore-pgdump branch from 669a90e to d7fedcb Compare October 17, 2024 07:33

erimatnor changed the title ~~Add dump/restore support for Hypercore~~ Add dump/restore support for Hypercore TAM Oct 17, 2024

erimatnor force-pushed the hyperstore-pgdump branch 3 times, most recently from 098490f to ab64ed9 Compare October 17, 2024 11:17

erimatnor added 3 commits October 18, 2024 12:48

erimatnor force-pushed the hyperstore-pgdump branch from ab64ed9 to 304334f Compare October 18, 2024 12:21
