Attachments in transcripts #11

Open
rxdn opened this issue May 25, 2022 · 0 comments

rxdn (Member) commented May 25, 2022

Our solution to attachments in transcripts will be to re-upload them to Discord as a single attachment on the transcript message.

Requirements:

  • Attachments must be encrypted
  • GDPR requests affect uploaded attachments
  • Bandwidth usage is a priority

Attachment Storage Implementation:

  • Two new microservices must be built for storing and retrieving attachments
    • Must be run outside of main infrastructure, on a host with a high bandwidth cap
  • An encrypted tar.gz file will be attached to the transcript message within the server
    • Use AES-128 instead of AES-256. It is still extremely secure, but uses fewer rounds (10 vs 14)
  • When closing a ticket and iterating over its messages, we must select a set of attachments that fits within Discord's attachment size limit. The size of each attachment is provided in the message data.
    • Backtracking: For each attachment, add it to the tarball, gzip and see if the compressed size is below the size limit. However, this will waste a lot of CPU cycles.
    • Next fit decreasing: Use a next fit decreasing bin packing algorithm, adding an attachment to the tarball if (current_size + attachment_size) < size_limit. However, we may be able to fit more files in after compression, so this can leave space unused.
    • Next fit decreasing + Backtracking: Assume some compression ratio for each file type. Add to tarball if (current_size + attachment_size * compression_ratio) < size_limit. Compress at the end. If greater than size limit, backtrack and remove some files. However, this is a more complex solution and would still waste some CPU cycles.
    • Note: Encryption with AES won't increase the file size significantly. However, AES has a block size of 16 bytes, so the encrypted size will round up to the next multiple of 16.
    • This may require a temporary "Generating transcript" message, as this may take >3s
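
The "next fit decreasing + backtracking" option above could be sketched roughly as follows. This is a minimal illustration and not the planned implementation: the `Attachment` type, the `ASSUMED_RATIO` table and its values, and the helper names are all hypothetical. It also assumes attachments are already in memory and that the cipher uses PKCS#7 padding (which always adds 1 to 16 bytes). Encryption itself is not performed here; only its size overhead is accounted for.

```python
import io
import tarfile
from dataclasses import dataclass

AES_BLOCK = 16  # AES block size in bytes


@dataclass
class Attachment:  # hypothetical type for this sketch
    name: str
    data: bytes


# Assumed compressed/original ratios per file extension (illustrative values only).
ASSUMED_RATIO = {"txt": 0.3, "log": 0.3, "png": 1.0, "jpg": 1.0, "zip": 1.0}
DEFAULT_RATIO = 0.9


def padded_size(n: int) -> int:
    """Ciphertext size, assuming PKCS#7 padding (always adds 1 to 16 bytes)."""
    return (n // AES_BLOCK + 1) * AES_BLOCK


def estimate(att: Attachment) -> float:
    """Estimated compressed size of one attachment."""
    ext = att.name.rsplit(".", 1)[-1].lower()
    return len(att.data) * ASSUMED_RATIO.get(ext, DEFAULT_RATIO)


def build_targz(atts: list[Attachment]) -> bytes:
    """Build a gzip-compressed tarball in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for att in atts:
            info = tarfile.TarInfo(name=att.name)
            info.size = len(att.data)
            tar.addfile(info, io.BytesIO(att.data))
    return buf.getvalue()


def select_attachments(atts: list[Attachment], size_limit: int):
    """Pick attachments largest-first by estimated size, then verify by compressing."""
    chosen: list[Attachment] = []
    est_total = 0.0
    for att in sorted(atts, key=lambda a: len(a.data), reverse=True):
        est = estimate(att)
        if est_total + est < size_limit:
            chosen.append(att)
            est_total += est
    # Backtrack: if the real archive is too large, drop the smallest chosen file and retry.
    while chosen:
        archive = build_targz(chosen)
        if padded_size(len(archive)) <= size_limit:
            return chosen, archive
        chosen.pop()
    return [], b""
```

In the common case the archive is compressed only once; extra compression passes happen only when the assumed ratios turn out to be too optimistic.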

Microservice Implementation:

  • The worker selects the attachments to store and generates an encryption key. It sends the attachment URLs, the encryption key, the IDs for the archive message (which must already have been sent before this process completes) and, for whitelabel bots, the bot token to the microservice over HTTP. The microservice fetches the attachments and creates the encrypted tarball using the method above.
  • The microservice then edits the archive message using the bot token (either received over HTTP or read from an environment variable) to upload the encrypted tarball.
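
As a rough illustration of the worker-to-microservice request described above, the payload might look something like this. The field names, the JSON shape, and the choice of channel and message IDs as "the IDs for the archive message" are assumptions made for this sketch; nothing here is a defined API.

```python
import base64
import json
import secrets
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StoreRequest:  # hypothetical payload shape
    attachment_urls: list[str]
    encryption_key: str  # base64-encoded 16-byte key (AES-128)
    channel_id: str  # IDs sent as strings to avoid 64-bit precision loss in JSON
    message_id: str
    bot_token: Optional[str] = None  # only set for whitelabel bots


def new_store_request(urls, channel_id, message_id, bot_token=None) -> StoreRequest:
    """Generate a fresh 128-bit key and bundle it with the message details."""
    key = secrets.token_bytes(16)
    return StoreRequest(
        attachment_urls=list(urls),
        encryption_key=base64.b64encode(key).decode("ascii"),
        channel_id=str(channel_id),
        message_id=str(message_id),
        bot_token=bot_token,
    )


def to_json(req: StoreRequest) -> str:
    """Serialise the request, omitting the token when the default bot is used."""
    body = asdict(req)
    if body["bot_token"] is None:
        del body["bot_token"]
    return json.dumps(body)
```

When no token is included, the microservice would fall back to the token from its environment, as described in the second bullet.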

To retrieve attachments:

  • The API generates a token for the user and sends it to the microservice, along with the encryption key and the CDN URL for the attachments
  • The microservice stores this for 1 minute. When the user's browser requests the transcript, the token is checked, and the encrypted tarball is fetched from the Discord CDN, then decrypted and decompressed.
  • The microservice now has an unencrypted, uncompressed tarball of all attachments. This can be sent to the user's browser, which can then extract the files and insert them into the archive using untar-js
    • This will require modification of discord-chat-replica
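
The one-minute token handling above could be sketched as a small in-memory store. This is only an illustration: the class and field names are hypothetical, and making tokens single-use is an assumption that the description above does not state.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TOKEN_TTL_SECONDS = 60  # "stores this for 1 minute"


@dataclass
class PendingDownload:  # hypothetical record type
    encryption_key: bytes
    cdn_url: str
    expires_at: float


class TokenStore:
    """Maps API-issued tokens to the data needed to serve one download."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._pending: dict[str, PendingDownload] = {}

    def register(self, token: str, encryption_key: bytes, cdn_url: str) -> None:
        """Called when the API hands over a token, key and CDN URL."""
        expires = self._clock() + TOKEN_TTL_SECONDS
        self._pending[token] = PendingDownload(encryption_key, cdn_url, expires)

    def redeem(self, token: str) -> Optional[PendingDownload]:
        """Called when the browser presents a token; returns None if unknown or expired."""
        entry = self._pending.pop(token, None)  # assumption: tokens are single-use
        if entry is None or self._clock() >= entry.expires_at:
            return None
        return entry
```

On the API side, the token itself could be generated with something like `secrets.token_urlsafe(32)`. A production store would also need to purge entries that are never redeemed.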

GDPR:

  • For removal, if the user is the only one who sent an attachment, then we can simply delete the encryption key rather than having to update the tarball
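
The removal rule above amounts to a simple check on who authored the stored attachments. The sketch below is illustrative only; `StoredAttachment`, `removal_action`, and the returned action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredAttachment:  # hypothetical record of what went into a tarball
    filename: str
    author_id: int


def removal_action(attachments: list[StoredAttachment], user_id: int) -> str:
    """Decide how to honour a removal request for one transcript."""
    authors = {a.author_id for a in attachments}
    if user_id not in authors:
        return "none"  # nothing from this user is stored
    if authors == {user_id}:
        return "delete_key"  # sole author: discarding the key makes the tarball unreadable
    return "update_tarball"  # mixed authors: the tarball itself must be updated
```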

rxdn changed the title from "[DRAFT] Attachments in transcripts" to "Attachments in transcripts" on May 25, 2022