Make output of encrypt action compression friendly #309

Closed
kristofmartens opened this issue Oct 29, 2020 · 2 comments

kristofmartens commented Oct 29, 2020

We would like to use the AWS Encryption SDK to encrypt individual entries in a database. When we encrypt typical values (10 to 20 characters), the SDK typically inflates the size by a factor of about 20. This potentially has a huge impact on storage costs, and we would like to keep it under control. Most of this size increase comes from metadata.

In our specific use case we would like to store this data in (snappy) compressed parquet files, but even with compression enabled we still typically see an 8x increase in the size of the compressed values, which is still too much from our point of view. When data is encrypted using the cache, we would assume the metadata to be mostly the same, and therefore we would expect it to compress efficiently, but we don't see this in our tests.

When we run experiments where we don't use the AWS Encryption SDK and manage the metadata ourselves, we theoretically only need to store the following items to be able to decrypt our data:

  • Encrypted value
  • KMS/CMK metadata
  • Encrypted data key

As the metadata is mostly the same for each row in a specific column (depending on how often you reuse your data key), it compresses quite nicely if we structure the metadata in a predictable way. Done this way, the column only grows by a factor of 1.5 to 2 in compressed parquet.
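The effect described above can be sketched with the Python standard library alone. This is only an illustration under loud assumptions: it does not call the AWS Encryption SDK or KMS, pseudo-random bytes stand in for ciphertext and for the encrypted data key, `zlib` stands in for snappy, and the field sizes and the `KEY_ID` placeholder are made up for the example rather than measured.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_bytes(n):
    """Pseudo-random bytes used as a stand-in for ciphertext (assumption)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


ROWS = 1000
VALUE_LEN = 16 + 28   # assumed: 16-byte value plus 12-byte IV and 16-byte auth tag
KEY_BLOB_LEN = 184    # assumed size of one encrypted data key
KEY_ID = b"arn:aws:kms:region:account:key/example"  # placeholder, not a real ARN

shared_key_blob = rand_bytes(KEY_BLOB_LEN)


def column_with_shared_metadata():
    # Every row repeats the same key ID and encrypted data key (data key reuse).
    return b"".join(
        KEY_ID + shared_key_blob + rand_bytes(VALUE_LEN) for _ in range(ROWS)
    )


def column_with_unique_metadata():
    # Every row carries its own encrypted data key, so little repeats.
    return b"".join(
        KEY_ID + rand_bytes(KEY_BLOB_LEN) + rand_bytes(VALUE_LEN) for _ in range(ROWS)
    )


def compressed_size(data):
    return len(zlib.compress(data))


raw_values = ROWS * VALUE_LEN
shared = compressed_size(column_with_shared_metadata())
unique = compressed_size(column_with_unique_metadata())

print("encrypted values only:      ", raw_values)
print("compressed, shared metadata:", shared)
print("compressed, unique metadata:", unique)
```

With a reused data key, the compressor can replace the repeated metadata with short back-references, so the compressed column stays close to the size of the encrypted values themselves. Anything that differs per row is effectively incompressible.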

As we have huge storage requirements, limiting the size is an important (cost) aspect. We would hate to have to abandon the AWS Encryption SDK just because its output compresses badly due to metadata overhead.

Would it be possible to also output encrypted results that are more compression friendly? Or do you have proposals on how to achieve this with the current implementation? We mostly use the Python encryption SDK, but you can imagine that this would need to be supported by all the other supported languages as well.

robin-aws (Contributor) commented Nov 30, 2020

(also copy-and-pasting this response on the other copy of this issue: aws/aws-encryption-sdk-java#230)

Hi there!

I'll first just point out that whenever you have the choice, it's always better to compress data before encrypting it, rather than vice versa. I can appreciate that given your values are so small this doesn't really help here, but I mention it because in most cases the extra metadata the AWS Encryption SDK adds is not significant compared to the size of the payload. By design, encrypted data doesn't have the structure that compression takes advantage of, so making the message format "compression friendly" isn't going to help very many people.
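A minimal sketch of why the order matters, using only the Python standard library. The `toy_encrypt` function is a hypothetical stand-in written for this example (a SHA-256 counter keystream XORed with the data); it is not secure and is not how the ESDK encrypts. It only serves to produce output without exploitable structure, as a real cipher would.

```python
import hashlib
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a SHA-256 counter keystream.

    NOT secure; for illustration only. Applying it twice with the same key
    returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


key = b"demo-key-not-for-real-use"
# Highly repetitive plaintext, the kind of structure compression relies on.
plaintext = b"customer_id=12345;status=active;region=eu-west-1\n" * 200

compress_then_encrypt = toy_encrypt(key, zlib.compress(plaintext))
encrypt_then_compress = zlib.compress(toy_encrypt(key, plaintext))

print("plaintext:            ", len(plaintext))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

Compressing first shrinks the repetitive plaintext before it is encrypted, whereas compressing the ciphertext finds nothing to exploit and leaves the size essentially unchanged.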

The deeper point is that the AWS Encryption SDK (ESDK) is really designed for safe encryption of independent, unstructured data, whereas your use case is really classic database encryption. In general, the ESDK message format is designed to guard against accidental misuse that can lead to security issues, but certainly at the cost of increasing the payload size. The message format is documented in detail here: https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/message-format.html.

You might have better success compressing and then encrypting entire parquet files instead of individual values, but that may not be possible depending on your architecture and whether it meets your security needs to use a single wrapping key for all entries in a single file.

We're definitely aware that there is an opportunity to support your use case better, and we are considering it carefully as we plan future products. I've cut an issue on the specification repo to track this generic question on the ESDK, outside of any particular implementation: awslabs/aws-encryption-sdk-specification#230

acioc commented Dec 1, 2020

Closing this issue in favor of awslabs/aws-encryption-sdk-specification#230.

Please re-open this issue or cut a new issue if you have any other questions.

acioc closed this as completed Dec 1, 2020