Add dump/restore support for Hypercore TAM #7356

Draft
wants to merge 3 commits into
base: main
from

Conversation

erimatnor
Contributor

@erimatnor erimatnor commented Oct 17, 2024

Add support for dumping and restoring hypertables that have chunks that use the Hypercore TAM.

Dumping a Hypercore table requires special consideration because its data is internally stored in two separate relations: one for compressed data and one for non-compressed data. The TAM returns data from both relations, but they may be dumped as separate tables. This risks dumping the compressed data twice: once via the TAM and once via the compressed table in compressed format.

The pg_dump tool uses COPY TO to create dumps of each table, and, to avoid data duplication when used on Hypercore tables, this change introduces a GUC that allows selecting one of these two behaviors:

  1. A COPY TO on a Hypercore table returns all data via the TAM, including data stored in the compressed relation. A COPY TO on the internal compressed relation returns no data.

  2. A COPY TO on a Hypercore table returns only non-compressed data, while a COPY TO on the compressed relation returns compressed data. A SELECT still returns all the data as normal.

The second approach is the default because it is consistent with compression when Hypercore TAM is not used. It will produce a pg_dump archive that includes data in compressed form (if data was compressed when dumped). Conversely, option (1) will produce an archive that looks identical to a dump from a non-compressed table.
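The two behaviors can be illustrated with a toy model. This is only a sketch, not the actual implementation: the names `CopyToBehavior`, `copy_to`, and `rows_in_dump` are hypothetical, and "compression" is modeled as grouping rows into tuples.

```python
from enum import Enum


class CopyToBehavior(Enum):
    # Hypothetical names for the two settings described above.
    ALL_DATA = 1            # option (1)
    NO_COMPRESSED_DATA = 2  # option (2), the default


def decompress(batch):
    # Stand-in for real decompression: a batch is just a tuple of rows.
    return list(batch)


def copy_to(relation, table, behavior):
    """Return what COPY TO would emit for one relation of a toy table.

    `relation` is "hypercore" (the TAM table) or "compressed" (the
    internal compressed relation).
    """
    if relation == "hypercore":
        rows = list(table["non_compressed"])
        if behavior is CopyToBehavior.ALL_DATA:
            for batch in table["compressed"]:
                rows.extend(decompress(batch))
        return rows
    # Internal compressed relation: empty under option (1),
    # compressed batches under option (2).
    if behavior is CopyToBehavior.ALL_DATA:
        return []
    return list(table["compressed"])


def rows_in_dump(table, behavior):
    """Count logical rows across both COPY TO outputs."""
    main = copy_to("hypercore", table, behavior)
    internal = copy_to("compressed", table, behavior)
    return len(main) + sum(len(batch) for batch in internal)


table = {"non_compressed": [5, 6], "compressed": [(1, 2), (3, 4)]}
```

In this model, either setting accounts for each of the six logical rows exactly once, which is the property the GUC is meant to guarantee for pg_dump.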

There are pros and cons of each dump format. A non-compressed archive is a platform-agnostic logical dump that can be restored to any platform and architecture, while a compressed archive includes data that is compressed in a platform-dependent way and needs to be restored to a compatible system.

A test is added that covers both settings, including the corresponding dump and restore.

Disable-check: force-changelog-file

@erimatnor erimatnor changed the title from "Add dump/restore support for Hypercore" to "Add dump/restore support for Hypercore TAM" Oct 17, 2024
@erimatnor erimatnor force-pushed the hyperstore-pgdump branch 3 times, most recently from 098490f to ab64ed9 Compare October 17, 2024 11:17

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 83.95062% with 13 lines in your changes missing coverage. Please review.

Project coverage is 82.51%. Comparing base (59f50f2) to head (304334f).
Report is 548 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tsl/src/hypercore/hypercore_handler.c | 76.66% | 2 Missing and 5 partials ⚠️ |
| tsl/src/process_utility.c | 85.29% | 3 Missing and 2 partials ⚠️ |
| tsl/src/compression/compression.c | 83.33% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7356      +/-   ##
==========================================
+ Coverage   80.06%   82.51%   +2.44%     
==========================================
  Files         190      228      +38     
  Lines       37181    42562    +5381     
  Branches     9450    10682    +1232     
==========================================
+ Hits        29770    35120    +5350     
- Misses       2997     3169     +172     
+ Partials     4414     4273     -141     


A truncate on a Hypercore TAM table is executed across both compressed
and non-compressed data. This caused an issue when recompressing
because it also tries to truncate the compressed data. Fix this issue
by introducing a flag that allows truncating only the non-compressed
data.
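
As a rough illustration of the truncate flag, here is a toy model. The class `ToyHypercoreTable` and the parameter `only_non_compressed` are hypothetical names, not identifiers from the patch.

```python
class ToyHypercoreTable:
    """Toy stand-in for a Hypercore relation plus its compressed relation."""

    def __init__(self, non_compressed, compressed):
        self.non_compressed = list(non_compressed)
        self.compressed = list(compressed)

    def truncate(self, only_non_compressed=False):
        # By default a truncate clears both kinds of data; the
        # (hypothetical) flag restricts it to the non-compressed rows.
        self.non_compressed.clear()
        if not only_non_compressed:
            self.compressed.clear()


def recompress(table):
    # Fold the non-compressed rows into a new batch, then clear only
    # those rows so the existing compressed batches survive.
    if table.non_compressed:
        table.compressed.append(tuple(table.non_compressed))
    table.truncate(only_non_compressed=True)
```

Without the flag, the truncate at the end of `recompress` would also discard the batches that were just written.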

Another issue related to cache invalidation is also fixed. Since a
recompression sometimes creates a new compressed relation, and the
compressed relid is cached in the Hypercore TAM's relcache entry, the
cache needs to be invalidated during recompression. However, this
wasn't done previously, leading to an error. This is fixed by adding a
relcache invalidation during recompression.
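
The stale-cache problem can be sketched with a toy cache. `ToyRelcache` and `recompress` are hypothetical stand-ins, and the catalog is reduced to a dictionary from a Hypercore relid to its compressed relid.

```python
class ToyRelcache:
    """Toy cache of the compressed relid for each Hypercore relid."""

    def __init__(self, catalog):
        self.catalog = catalog  # maps hypercore relid -> compressed relid
        self.entries = {}

    def compressed_relid(self, relid):
        # Fill the cache entry on first access, then reuse it.
        if relid not in self.entries:
            self.entries[relid] = self.catalog[relid]
        return self.entries[relid]

    def invalidate(self, relid):
        self.entries.pop(relid, None)


def recompress(catalog, cache, relid, new_compressed_relid, invalidate=True):
    # Recompression may point the table at a new compressed relation.
    catalog[relid] = new_compressed_relid
    if invalidate:
        # Drop the cached entry so the next lookup sees the new relid.
        cache.invalidate(relid)
```

Calling `recompress` with `invalidate=False` leaves the cache returning the old relid, which mirrors the error described above.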

Finally, compression using an index scan is disabled for Hypercore TAM
since the index also covers compressed data (in the recompression
case). While the index could be used when compressing the first time
(when only non-compressed data is indexed), it is still disabled
completely for Hypercore TAM given that index scans are not used by
default anyway.

Tests are added to cover all of the issues described above.

Replace the scankey flag used to skip compressed data when starting a
Hypercore scan with a function that sets this option on the scan
descriptor. Internally, use the scan flags instead of scankey flags to
convey this setting.

Overriding scankey flags was not ideal since they are supposed to be
per-column settings and not overall scan settings.

It is possible to set the scan flags when calling table_beginscan(),
but current table scan functions do not expose flags and instead have
a separate function for each flag setting. Hypercore could define its
own beginscan function to do the same, but this is left for the
future.
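
The difference between per-column scankey flags and a scan-wide setting can be sketched as follows. All names here (`ScanFlag`, `ScanKey`, `ScanDesc`, `scan_set_skip_compressed`) are hypothetical and only model the idea, not the actual C structures.

```python
from dataclasses import dataclass
from enum import IntFlag


class ScanFlag(IntFlag):
    NONE = 0
    SKIP_COMPRESSED = 1  # hypothetical scan-wide option


@dataclass
class ScanKey:
    # Per-column condition; it no longer carries scan-wide options.
    column: str
    value: int


@dataclass
class ScanDesc:
    keys: list
    flags: ScanFlag = ScanFlag.NONE


def scan_set_skip_compressed(scan, skip):
    """Set or clear the scan-wide option on the scan descriptor."""
    if skip:
        scan.flags |= ScanFlag.SKIP_COMPRESSED
    else:
        scan.flags &= ~ScanFlag.SKIP_COMPRESSED
```

Keeping the option on the descriptor means the scan keys stay untouched whichever way the option is set.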