Attachments in transcripts #11

rxdn · 2022-05-25T18:23:09Z

Our solution to attachments in transcripts will be to re-upload them to Discord as a single attachment on the transcript message.

Requirements:

Attachment Storage Implementation:

Two new microservices are needed for storing and retrieving attachments
- Must be run outside of main infrastructure, on a host with a high bandwidth cap
An encrypted tar.gz file will be attached to the transcript message within the server
- Use AES-128 instead of AES-256. Still extremely secure, but fewer rounds (10 vs 14)
When closing a ticket and iterating messages, we must select attachments that can be saved within Discord's attachment size limit. Size of attachments is provided in the message data.
- Backtracking: For each attachment, add it to the tarball, gzip and see if the compressed size is below the size limit. However, this will waste a lot of CPU cycles.
- Next fit decreasing: Use a next fit decreasing bin packing algorithm, add attachment to tarball if (current_size + attachment_size) < size_limit. However, we may be able to fit more files in after compression, leading to wasted space.
- Next fit decreasing + Backtracking: Assume some compression ratio for each file type. Add to tarball if (current_size + attachment_size * compression_ratio) < size_limit. Compress at the end. If greater than size limit, backtrack and remove some files. However, this is a more complex solution and would still waste some CPU cycles.
- Note: Encryption with AES won't increase file size much. However, AES has a block size of 16 bytes, so with a padded mode the output is rounded up to the next multiple of 16 bytes.
- This may require a temporary "Generating transcript" message, as this may take >3s
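
A rough sketch of the third option (estimate, pack, then backtrack), to make the trade-off concrete. Everything named here is an assumption for illustration: the `Attachment` type, the `ASSUMED_RATIOS` table and its values, and the use of PKCS#7 padding in `padded_size`. The selection pass is a greedy largest-first variant in the spirit of next fit decreasing, and it only needs the sizes from the message data; the real bytes are only needed for the final compression.

```python
import io
import tarfile
from dataclasses import dataclass

AES_BLOCK_SIZE = 16  # bytes

# Hypothetical guesses for compressed_size / original_size per file type.
ASSUMED_RATIOS = {".txt": 0.3, ".log": 0.3, ".png": 1.0, ".jpg": 1.0, ".zip": 1.0}
DEFAULT_RATIO = 0.9


@dataclass
class Attachment:
    name: str
    data: bytes

    @property
    def size(self) -> int:
        return len(self.data)


def padded_size(n: int) -> int:
    """Ciphertext size for n plaintext bytes, assuming PKCS#7 padding."""
    return (n // AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE


def estimated_size(att: Attachment) -> float:
    """Size of the attachment scaled by the assumed ratio for its extension."""
    ext = "." + att.name.rsplit(".", 1)[-1].lower() if "." in att.name else ""
    return att.size * ASSUMED_RATIOS.get(ext, DEFAULT_RATIO)


def build_tarball(attachments) -> bytes:
    """Write the attachments into an in-memory tar.gz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for att in attachments:
            info = tarfile.TarInfo(name=att.name)
            info.size = att.size
            tar.addfile(info, io.BytesIO(att.data))
    return buf.getvalue()


def select_attachments(attachments, size_limit):
    # Pass 1: largest first, add while the estimated total stays under the limit.
    chosen, estimate = [], 0.0
    for att in sorted(attachments, key=lambda a: a.size, reverse=True):
        if estimate + estimated_size(att) < size_limit:
            chosen.append(att)
            estimate += estimated_size(att)
    # Pass 2: compress for real; if too big, drop the smallest chosen file and retry.
    while chosen:
        archive = build_tarball(chosen)
        if padded_size(len(archive)) <= size_limit:
            return chosen, archive
        chosen.pop()
    return [], b""
```

Each backtracking step recompresses the whole archive, which is the CPU cost mentioned above; better ratio estimates mean fewer retries.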

Microservice Implementation:

The worker selects the attachments to store and generates an encryption key. Attachment URLs, the encryption key, IDs for the archive message (which must already have been sent before this process is completed) and the bot token, if whitelabel, are sent to the microservice via HTTP. The microservice fetches the attachments and creates the encrypted tarball via the above method.
The microservice then edits the archive message using the bot token (either received over HTTP or from an environment variable) to upload the encrypted tarball.
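
For illustration, the request body the worker sends might look something like the sketch below. The field names, the hex encoding of the key, and the choice of channel ID plus message ID as the "IDs for the archive message" are all assumptions, not a decided wire format. IDs are kept as strings here since Discord snowflakes do not fit safely in a JSON number for every client.

```python
import json
import secrets
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class StoreRequest:
    # All field names are hypothetical.
    attachment_urls: List[str]
    encryption_key: str          # hex-encoded 128-bit key
    channel_id: str              # channel containing the archive message
    message_id: str              # the already-sent archive message
    bot_token: Optional[str] = None  # only set for whitelabel bots


def new_store_request(urls, channel_id, message_id, bot_token=None) -> StoreRequest:
    key = secrets.token_bytes(16)  # 16 bytes = 128 bits, for AES-128
    return StoreRequest(list(urls), key.hex(), channel_id, message_id, bot_token)


def to_json(req: StoreRequest) -> str:
    # Omit bot_token entirely when the microservice should use its own
    # token from the environment.
    body = {k: v for k, v in asdict(req).items() if v is not None}
    return json.dumps(body)
```

The worker would keep its own copy of the key (for later retrieval) and POST the JSON to the storage microservice.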

To retrieve attachments:

The API generates a token for the user, which is sent to the microservice along with the encryption key for the attachments and the CDN URL for the attachments.
The microservice stores this for 1 minute. When the user's browser requests the transcripts, the token is checked, and the encrypted tarball is fetched from the Discord CDN, then decrypted and decompressed.
The microservice now has a tarball of all attachments, with no encryption or compression. This can be sent to the user's browser, which can then retrieve the files and insert them into the archive using untar-js.
- This will require modification of discord-chat-replica
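
The 1 minute token storage could be sketched roughly as below. The class and method names are hypothetical, the store is in-memory only, and making tokens single-use is an extra assumption that the proposal above does not state.

```python
import time
from dataclasses import dataclass

TOKEN_TTL_SECONDS = 60  # "stores this for 1 minute"


@dataclass
class PendingDownload:
    encryption_key: bytes
    cdn_url: str
    expires_at: float


class TokenStore:
    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._pending = {}

    def register(self, token: str, encryption_key: bytes, cdn_url: str) -> None:
        """Called when the API hands over a token, key and CDN URL."""
        expires_at = self._clock() + TOKEN_TTL_SECONDS
        self._pending[token] = PendingDownload(encryption_key, cdn_url, expires_at)

    def redeem(self, token: str):
        """Return the pending download for a valid token, else None.

        The entry is removed either way, so a token works at most once
        (an assumption, see above).
        """
        entry = self._pending.pop(token, None)
        if entry is None or self._clock() >= entry.expires_at:
            return None
        return entry
```

On a successful `redeem`, the microservice would fetch `cdn_url`, decrypt with `encryption_key`, gunzip, and stream the plain tarball to the browser. A real deployment would also need to sweep expired entries that are never redeemed.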

GDPR:

For removal, if the user is the only one who sent an attachment, then we can simply delete the encryption key rather than having to update the tarball.
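
The check itself is small. A sketch, assuming we record the uploader's user ID for each stored attachment (the function name and that bookkeeping are hypothetical):

```python
def can_delete_key_only(uploader_ids, user_id) -> bool:
    """True if every stored attachment was uploaded by user_id.

    In that case deleting the encryption key makes the whole tarball
    unreadable, so the archive itself does not need to be rewritten.
    """
    return set(uploader_ids) == {user_id}
```

If this returns False, other users' attachments share the tarball, and it would have to be rebuilt without the requesting user's files.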

rxdn changed the title ~~[DRAFT] Attachments in transcripts~~ Attachments in transcripts May 25, 2022