Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/changefeedccl: add compression options for webhook sink #138872

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

massimo-ua
Copy link

Added gzip compression option to the changefeed webhook sink to reduce both network bytes and storage bytes for the customer, leading to cost savings and potentially better performance. A minimum configuration to support for gzip.

Jira issue: CRDB-42915

Epic CRDB-39392

Issue #132279

@massimo-ua massimo-ua requested a review from a team as a code owner January 11, 2025 15:01
@massimo-ua massimo-ua requested review from rharding6373 and removed request for a team January 11, 2025 15:01
Copy link

blathers-crl bot commented Jan 11, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

Before a member of our team reviews your PR, I have some potential action items for you:

  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jan 11, 2025

CLA assistant check
All committers have signed the CLA.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Jan 11, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link

blathers-crl bot commented Jan 11, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch from 90f7858 to 9ab5e87 Compare January 11, 2025 23:16
Copy link

blathers-crl bot commented Jan 11, 2025

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch 3 times, most recently from dc150e3 to 3c59821 Compare January 13, 2025 00:19
@wenyihu6 wenyihu6 requested a review from a team January 13, 2025 15:46
@wenyihu6 wenyihu6 added A-cdc Change Data Capture T-cdc labels Jan 13, 2025
@asg0451 asg0451 self-requested a review January 13, 2025 17:55
Copy link
Contributor

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @massimo-ua! I have a few nits and a few questions, but this looks very good overall.

pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_v2.go Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_test.go Outdated Show resolved Hide resolved
Copy link

blathers-crl bot commented Jan 15, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua requested a review from asg0451 January 15, 2025 12:30
Copy link

blathers-crl bot commented Jan 15, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

pkg/ccl/changefeedccl/changefeed_test.go Show resolved Hide resolved
// since we are using decompression only for reading error response body, we can use default reader
return pgzip.NewReader(src)
case sinkCompressionZstd:
// zstd reader does not implement io.Closer interface, so we need to wrap it
Copy link
Contributor

@asg0451 asg0451 Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zstd reader does implement Close(), which we should call.

It may also be worth caching the readers instead of making a new one every time (see my next comment on doing the same for writers)... but that might be unnecessary for this case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zstd reader implements func (d *[Decoder] Close() void
but io.ReadCloser requires
func (io.Closer) Close() error with error
that's why I had to wrap it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but your wrapper does not call the void Close method, where it should.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/ccl/changefeedccl/sink_webhook_v2.go Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
@massimo-ua
Copy link
Author

@asg0451 Thanks for the review. I'll work through the comments and update you once I'm done.

Copy link

blathers-crl bot commented Jan 15, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

1 similar comment
Copy link

blathers-crl bot commented Jan 15, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link

blathers-crl bot commented Jan 16, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch from dace026 to 4866761 Compare January 17, 2025 17:23
Copy link

blathers-crl bot commented Jan 17, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

1 similar comment
Copy link

blathers-crl bot commented Jan 18, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua requested a review from asg0451 January 18, 2025 08:13
Copy link

blathers-crl bot commented Jan 18, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch from 622e518 to e8616aa Compare January 18, 2025 23:53
Copy link

blathers-crl bot commented Jan 19, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua
Copy link
Author

I see that in the webhook sink

req, err := http.NewRequestWithContext(sc.ctx, http.MethodPost, sc.url.String(), bytes.NewReader(body))
, every time makePayloadForBytes is called, it creates an HTTP request object. The request configuration is static, except for the body and ContentLength. We could easily create a request template once, then clone its static configuration and just update the dynamic parts each time. WDYT?

@asg0451
Copy link
Contributor

asg0451 commented Jan 21, 2025

I see that in the webhook sink

req, err := http.NewRequestWithContext(sc.ctx, http.MethodPost, sc.url.String(), bytes.NewReader(body))

, every time makePayloadForBytes is called, it creates an HTTP request object. The request configuration is static, except for the body and ContentLength. We could easily create a request template once, then clone its static configuration and just update the dynamic parts each time. WDYT?

That's a good idea, but let's leave it out of here just to keep the scope down.

Copy link
Contributor

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when it's ready for review again (just commenting so this gets removed from my assigned reviews list for now)

Copy link

blathers-crl bot commented Jan 22, 2025

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch 3 times, most recently from cb4fa0f to 505533e Compare January 23, 2025 14:56
@massimo-ua
Copy link
Author

massimo-ua commented Jan 23, 2025

@asg0451 Please review recent changes
Updated encoder/decoders pool implementation

@massimo-ua massimo-ua requested a review from asg0451 January 23, 2025 15:34
pkg/ccl/changefeedccl/compression.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/compression.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/compression.go Outdated Show resolved Hide resolved
pkg/ccl/changefeedccl/compression.go Outdated Show resolved Hide resolved
@@ -887,6 +887,7 @@ func (f *cloudStorageSinkFile) flushToStorage(
if err := f.codec.Close(); err != nil {
return err
}
f.codec = nil // Set to nil to prevent double close
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reasoning behind this? did this issue come up in the tests? maybe it's worthwhile to make the wrappers more robust to double closing if this is an issue

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this came up during the tests. TestCloudStorageSinkFastGzip was panicking with "invalid memory address or nil pointer dereference" while attempting to invoke (*encWrapper).Close. This helped to fix the issue. As far as I understand, the issue is with cloudStorageSinkFile and its codec reference management. It closes the writer but doesn't reset the reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some guidance here. Not sure we can address that on an encWrapper level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could have a closed bool flag in the wrappers, set it on Close, and panic/err if it's used while closed. Reset could reset it. This could also prevent incorrect usage (like using it after closing it, which would be bad)

Copy link
Author

@massimo-ua massimo-ua Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about returning nil in case somebody is trying to close a closed wrapper but return error when Write or Read is called?
Returning nil on multiple attempts to Close the same reader is a default behaviour for gzip Writer and Reader

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added bool flag to indicate the underlying encoder/decoder is closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good

pkg/ccl/changefeedccl/sink_webhook_v2.go Outdated Show resolved Hide resolved
@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch 2 times, most recently from 2648d6e to 3889c4f Compare January 23, 2025 22:43
// is recycled after closing rather than being discarded.
func (e *encWrapper) Close() error {
var err error
if fErr := e.encoder.Flush(); fErr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call Flush here. Did we do it before? is Close not enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. There is no need to call flush since according to the docs close flushes unwritten data to reader. I'll remove it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

pkg/ccl/changefeedccl/compression.go Show resolved Hide resolved
pkg/ccl/changefeedccl/compression.go Show resolved Hide resolved
@@ -887,6 +887,7 @@ func (f *cloudStorageSinkFile) flushToStorage(
if err := f.codec.Close(); err != nil {
return err
}
f.codec = nil // Set to nil to prevent double close
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could have a closed bool flag in the wrappers, set it on Close, and panic/err if it's used while closed. Reset could reset it. This could also prevent incorrect usage (like using it after closing it, which would be bad)

@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch 3 times, most recently from 230a0d8 to ed1168d Compare January 25, 2025 15:10
@massimo-ua massimo-ua requested a review from asg0451 January 25, 2025 15:18
Release note (sql change): Added compression support for
changefeed webhook sinks. This reduces network bandwidth and
storage usage, improving performance and lowering costs. Users
can enable compression by setting the compression=<algorithm>
option. Supported algorithms are gzip and zstd.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-39392
Resolved: https://cockroachlabs.atlassian.net/browse/CRDB-42915
Issue: cockroachdb#132279
@massimo-ua massimo-ua force-pushed the add-gzip-compression-to-changefeed-webhook-sink branch from ed1168d to 5bbc73a Compare January 25, 2025 17:41
@massimo-ua
Copy link
Author

massimo-ua commented Jan 25, 2025

I noticed the race issues during Bazel extended CI. I don't believe there is a connection to the changes I've made. The race appears when spanInner.isNoop() and Span.reset() in pkg/utils/tracing are attempting to read and write to a shared resource. Do you think I should ignore this?

@asg0451
Copy link
Contributor

asg0451 commented Jan 27, 2025

I noticed the race issues during Bazel extended CI. I don't believe there is a connection to the changes I've made. The race appears when spanInner.isNoop() and Span.reset() in pkg/utils/tracing are attempting to read and write to a shared resource. Do you think I should ignore this?

Feel free to ignore bazel extended ci.

@massimo-ua
Copy link
Author

I noticed the race issues during Bazel extended CI. I don't believe there is a connection to the changes I've made. The race appears when spanInner.isNoop() and Span.reset() in pkg/utils/tracing are attempting to read and write to a shared resource. Do you think I should ignore this?

Feel free to ignore bazel extended ci.

Then that's it. There are no more changes left in this PR. Thank you for the review. Let me know if there is anything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture O-community Originated from the community T-cdc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants