
Create an AWS pipeline in seconds that reads email attachments (PDFs, images) and extracts and classifies their content as text. Scales horizontally and automatically.


t-systems/aws-email-app-content-recognition

 
 


Email App Content Recognition

This AWS serverless application was originally created to read any number of emails (EML format) from an S3 bucket and save all of their attachments to a second S3 bucket, where the binary attachments (PDFs or images) are kept for your own access. The attachments are then processed and their content is extracted with the ML models behind Amazon Textract (https://aws.amazon.com/textract/faqs/). Because the architecture is serverless, the application is inexpensive to run and scales automatically.
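The two core steps described above (pulling attachments out of an EML file, then reading text out of a Textract result) can be sketched with the Python standard library alone. This is a minimal illustration and not the code in this repository: the function names `extract_attachments` and `lines_from_textract`, and the `ALLOWED_TYPES` set, are hypothetical. The second function assumes the response shape returned by Textract's `DetectDocumentText` operation, a `Blocks` list whose `LINE` entries carry a `Text` field. S3 reads and writes and the Textract call itself are left out.

```python
from email import message_from_bytes, policy
from typing import List, Tuple

# Assumption: only these attachment types are forwarded for text extraction.
ALLOWED_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def extract_attachments(raw_eml: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (filename, content_type, payload) for each PDF/image attachment."""
    # policy.default makes the parser return an EmailMessage,
    # which provides iter_attachments().
    msg = message_from_bytes(raw_eml, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        content_type = part.get_content_type()
        if content_type not in ALLOWED_TYPES:
            continue  # skip text or other non-binary attachments
        filename = part.get_filename() or "unnamed"
        # decode=True reverses the transfer encoding (usually base64).
        found.append((filename, content_type, part.get_payload(decode=True)))
    return found


def lines_from_textract(response: dict) -> List[str]:
    """Collect the text of every LINE block in a DetectDocumentText-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

In a deployed pipeline, the bytes passed to `extract_attachments` would come from the emails bucket, and each returned payload would be written to the content bucket before being sent to Textract.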

See Architecture Design here

Dependencies

  • cdk
  • python
  • aws-cli

Setup

virtualenv .env && source .env/bin/activate && \
    python -m pip install -r requirements.aws.txt && \
    python -m pip install -r requirements.app.development.txt && \
    python -m pip install -r requirements.app.txt

Deploy

export AWS_ACCOUNT_ID=<UPDATE>
export AWS_DEFAULT_REGION=<UPDATE>
export EMAILS_S3_BUCKET=<UPDATE>
export CONTENT_S3_BUCKET=<UPDATE>

npm run deploy
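Because the deploy step depends on four environment variables, a small preflight check can catch a missing or unedited value before `npm run deploy` runs. The sketch below is an optional helper, not part of this repository; the function name `missing_deploy_vars` is hypothetical, and treating the literal `<UPDATE>` placeholder as unset is an assumption.

```python
import os
from typing import List, Mapping, Optional

# The four variables listed in the Deploy section.
REQUIRED_VARS = (
    "AWS_ACCOUNT_ID",
    "AWS_DEFAULT_REGION",
    "EMAILS_S3_BUCKET",
    "CONTENT_S3_BUCKET",
)


def missing_deploy_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset, empty, or still a placeholder."""
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_VARS
        if not env.get(name) or env.get(name) == "<UPDATE>"
    ]


if __name__ == "__main__":
    missing = missing_deploy_vars()
    if missing:
        raise SystemExit("Set these variables before deploying: " + ", ".join(missing))
    print("All deploy variables are set.")
```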
