
STILL A DRAFT BUT SLIGHTLY BETTER NOW

My Little Image Sharer: Clouds Are Magic

(Image: Rainbow Dash)

Due: Wednesday, January 29th
Submissions: https://courses.cs.sfu.ca/2014sp-cmpt-474-d1/+a2

Overview

TNI (Ted n Izaak) has hired you to finish their half-completed project: My Little Image Sharer (you guessed it: an image sharing service). The previous developers were terrible and were fired, so TNI brought you on board; you've been left with partially completed server and worker services. Users of My Little Image Sharer start by uploading an image to the server service (the only publicly accessible component). Upon receiving an image, the server (i) saves the image in S3, and (ii) sends a notification to the worker service via SQS so the worker can do further processing. When the worker receives such a notification, it computes several scaled thumbnails and saves each of them in S3. The worker service is asynchronous: the server simply sends the notification and does not wait for the worker to complete. This means the user has no guarantee on how quickly the scaled versions will be retrievable from S3 after the original image is available there. In the terms we will consider Friday, January 24, the latency of the scaled images is unbounded.
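
To make that flow concrete, here is a minimal sketch of the server's half using boto. The region, bucket name, queue name, key layout, and message format are all assumptions; use whatever common.py already defines.

    import json
    import uuid

    import boto.s3
    import boto.sqs
    from boto.s3.key import Key
    from boto.sqs.message import Message

    REGION = 'us-west-2'          # assumption: match whatever common.py uses
    BUCKET = 'my-little-images'   # assumption: your bucket name
    QUEUE = 'my-little-work'      # assumption: your queue name

    def handle_upload(image_data, content_type):
        """Store the original image in S3 and notify the worker via SQS."""
        image_id = uuid.uuid4().hex

        # (i) Save the original image in S3, publicly readable.
        s3 = boto.s3.connect_to_region(REGION)
        bucket = s3.get_bucket(BUCKET)
        key = Key(bucket, image_id + '/original')
        key.set_contents_from_string(image_data,
                                     headers={'Content-Type': content_type},
                                     policy='public-read')

        # (ii) Tell the worker which image to process.
        sqs = boto.sqs.connect_to_region(REGION)
        queue = sqs.get_queue(QUEUE)
        message = Message()
        message.set_body(json.dumps({'id': image_id, 'key': key.key}))
        queue.write(message)

        return image_id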

Part 1: Into the Clouds

The server is mostly complete: it can receive images and store them in S3. The worker has only basic image-scaling functionality implemented. None of the communication between the server and the worker has been written. To complete the service, you must (a sketch of the worker's side follows this list):

  • modify the server to communicate the arrival of a new image by putting a message in the work queue, and
  • modify the worker to listen for that message; when the worker receives the message, it needs to retrieve the existing image that was uploaded by the server, resize it to the requested thumbnail sizes, and then save those thumbnails to S3.
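
The worker's half might look something like the sketch below. Again, the region, bucket, queue, key layout, and JPEG output are assumptions; adapt them to whatever server.py and common.py actually use.

    import json
    import time
    from StringIO import StringIO

    import boto.s3
    import boto.sqs
    from boto.s3.key import Key
    from PIL import Image

    REGION = 'us-west-2'          # assumption: match common.py
    BUCKET = 'my-little-images'   # assumption: your bucket name
    QUEUE = 'my-little-work'      # assumption: your queue name
    SIZES = {'small': (100, 100), 'medium': (300, 300), 'large': (600, 600)}

    def run():
        bucket = boto.s3.connect_to_region(REGION).get_bucket(BUCKET)
        queue = boto.sqs.connect_to_region(REGION).get_queue(QUEUE)

        while True:
            messages = queue.get_messages(num_messages=1)
            if not messages:
                time.sleep(5)   # nothing to do yet; avoid hammering SQS
                continue

            for message in messages:
                job = json.loads(message.get_body())

                # Fetch the original image the server stored in S3.
                original = bucket.get_key(job['key'])
                image = Image.open(StringIO(original.get_contents_as_string()))

                # Produce each thumbnail and store it next to the original.
                for name, size in SIZES.items():
                    thumb = image.convert('RGB')
                    thumb.thumbnail(size)
                    out = StringIO()
                    thumb.save(out, 'JPEG')
                    key = Key(bucket, job['id'] + '/' + name)
                    key.set_contents_from_string(out.getvalue(),
                                                 headers={'Content-Type': 'image/jpeg'},
                                                 policy='public-read')

                # Delete the message only after processing succeeded.
                queue.delete_message(message)

    if __name__ == '__main__':
        run()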

Part 2: And Over the Rainbow

Do not start this part before finishing Part 1 or your life will be filled with regret!!!!

Unfortunately, simply completing the code for My Little Image Sharer won't be enough to save it from failing when hordes of emotionally-underdeveloped cloud system developers suddenly upload or view all the images of love, tolerance and distributed systems they've been waiting to share with the world. TNI needs to have a service that scales (meets its SLA even as the number of users grows quickly). There are (at least) two scaling issues present: the worker service could become under-provisioned (too many images, not enough resizing power) and the data delivery network could become under-provisioned (too many requests, not enough network bandwidth).

To ensure that there is enough resizing power available, you need to be able to spin up more workers as necessary. When to spin up new workers is left to your discretion, and you have many options to choose from in CloudWatch; a boto-based sketch follows the list below. This magic can be achieved by:

  • creating an AMI with the worker embedded into it,
  • and configuring CloudWatch to automatically spin up new instances of that AMI.
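
One way to wire this up from Python with boto is sketched below; doing the same thing through the AWS console is equally fine. The region, AMI id, instance type, key pair, group names, and the 70% CPU threshold are all assumptions — choosing the actual metric and threshold is up to you.

    import boto.ec2.autoscale
    import boto.ec2.cloudwatch
    from boto.ec2.autoscale import LaunchConfiguration, AutoScalingGroup, ScalingPolicy
    from boto.ec2.cloudwatch import MetricAlarm

    REGION = 'us-west-2'        # assumption
    AMI_ID = 'ami-xxxxxxxx'     # the AMI with your worker baked into it
    GROUP = 'worker-group'      # assumption

    autoscale = boto.ec2.autoscale.connect_to_region(REGION)

    # Launch configuration: which AMI and instance type new workers use.
    config = LaunchConfiguration(name='worker-config', image_id=AMI_ID,
                                 instance_type='t1.micro', key_name='my-key')
    autoscale.create_launch_configuration(config)

    # Auto scaling group: how many worker instances may exist at once.
    group = AutoScalingGroup(group_name=GROUP, launch_config=config,
                             min_size=1, max_size=2,
                             availability_zones=['us-west-2a'],
                             connection=autoscale)
    autoscale.create_auto_scaling_group(group)

    # Scaling policy: add one instance when triggered.
    scale_up = ScalingPolicy(name='scale-up', adjustment_type='ChangeInCapacity',
                             as_name=GROUP, scaling_adjustment=1, cooldown=300)
    autoscale.create_scaling_policy(scale_up)
    scale_up = autoscale.get_all_policies(as_group=GROUP, policy_names=['scale-up'])[0]

    # CloudWatch alarm: fire the policy when average CPU stays above 70%.
    cloudwatch = boto.ec2.cloudwatch.connect_to_region(REGION)
    alarm = MetricAlarm(name='worker-cpu-high', namespace='AWS/EC2',
                        metric='CPUUtilization', statistic='Average',
                        comparison='>', threshold=70,
                        period=60, evaluation_periods=2,
                        alarm_actions=[scale_up.policy_arn],
                        dimensions={'AutoScalingGroupName': GROUP})
    cloudwatch.create_alarm(alarm)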

To ensure that there is enough network bandwidth available, you need to have your images available from many different network locations. This can be achieved by using a content distribution network to host your images (a boto sketch follows the list):

  • set up a CloudFront distribution to serve data from your S3 bucket,
  • and update the server to return CloudFront URLs instead of S3 ones.
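
A sketch of both steps with boto, assuming the bucket name and that plain (unsigned) CloudFront URLs are acceptable:

    import boto
    from boto.cloudfront.origin import S3Origin

    BUCKET = 'my-little-images'   # assumption: your bucket name

    # One-time setup: create a distribution whose origin is the S3 bucket.
    cloudfront = boto.connect_cloudfront()
    origin = S3Origin('%s.s3.amazonaws.com' % BUCKET)
    distribution = cloudfront.create_distribution(origin=origin, enabled=True,
                                                  comment='My Little Image Sharer')
    print(distribution.domain_name)   # something like dxxxxxxxxxxxxx.cloudfront.net

    # In server.py, build URLs from the CloudFront domain instead of S3.
    CLOUDFRONT_DOMAIN = distribution.domain_name

    def url_for(key_name):
        return 'http://%s/%s' % (CLOUDFRONT_DOMAIN, key_name)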

Part 3: The Finale

Even after you've gotten through the blood, sweat and tears, TNI, being the horribly managed company it is, wants a brief overview of what your app promises to deliver so they can immediately convince some poor soul to buy it and become millionaires. Some thought-provoking questions you might wish to answer are as follows:

  • How are these terms relevant or related to your app:
      • data center
      • virtual machine
      • virtualization
      • provisioning
      • overprovisioned
      • underprovisioned
      • elastic computing
      • utilization
      • throughput
      • latency
      • API
  • What does the Amazon SLA mean you can expect from your app?
  • How does or doesn't your ID generation algorithm prevent conflicts?
  • What platform-level, cluster-level, and application-level software is being used in your app?
  • How does your app scale?
  • How might it fail? (Hint: server.py)
  • How could you change it to scale better?
  • What other existing apps might use a similar platform? Why? (Hint: Video.)
  • What metric did you choose for your AutoScaler/CloudWatch alarm? Why?
  • If a worker fails while encoding an image, what happens? Can your system recover?

Important: TNI has intentionally omitted documentation in several areas, both to avoid spoon-feeding you everything (e.g. how do I make an AMI? how do I get a Python script to start at boot? how do I use CloudWatch?) and to encourage exploration, discussion and collaboration. Feel free to share your knowledge with others by posting your issues or resources on our friendly neighbourhood subreddit or swinging by IRC.

Goals (Tests)

  • common.py (a connectivity-check sketch follows this list)
      • You can connect to AWS (no errors on connect_to_region)
      • You have a valid bucket (no errors on get_bucket / check AWS console for bucket)
      • You have a valid queue (no errors on get_queue / check AWS console for queue)
  • server.py
      • Generate an id for new images (Print id to console from Python)
      • Put message into SQS with the new id (Check AWS console for messages)
      • Ensure your S3 bucket is public (Have others view links to stuff in your bucket)
      • View image from the generated url (Download image with curl)
  • worker.py
      • Read messages from SQS (Print messages to console from Python)
      • Delete messages from SQS (Check AWS console to see messages are being removed)
      • Save resized images back into S3 (Check AWS console and view items in your bucket)
  • CloudWatch
      • The worker.py script must start at boot. (Reboot your instance, "ps aux | grep worker.py")
      • You must have an AMI with your worker embedded in it. (Spin up a new instance manually with your AMI)
      • Ensure a second instance spins up when the first one becomes overloaded (Use "stress" to create CPU spikes on the first instance and then monitor both your EC2 instances and CloudWatch metrics in the AWS console)
  • CloudFront
      • Be able to download data from your S3 images bucket over CloudFront (Test generated CloudFront URLs)
      • Modify server.py to return CloudFront URLs (Download image with curl)
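
For the common.py goals above, a quick connectivity check from a Python shell might look like this (region, bucket, and queue names are assumptions):

    import boto.s3
    import boto.sqs

    REGION = 'us-west-2'          # assumption
    BUCKET = 'my-little-images'   # assumption
    QUEUE = 'my-little-work'      # assumption

    s3 = boto.s3.connect_to_region(REGION)     # should not raise
    bucket = s3.get_bucket(BUCKET)             # raises S3ResponseError if missing

    sqs = boto.sqs.connect_to_region(REGION)
    queue = sqs.get_queue(QUEUE)               # returns None if the queue is missing
    assert queue is not None, 'queue does not exist'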

Deliverables

Requirements

  • Students must:
      • Work in groups of size 0 < n < 5
  • Stack:
      • AWS
          • EC2 (Boto)
          • S3 (Boto)
          • SQS (Boto)
          • CloudWatch
          • CloudFront
      • Python
          • json
          • Boto
          • Pillow
      • curl
  • Image sizes must be:
      • Small: 100x100
      • Medium: 300x300
      • Large: 600x600
      • Original: Unspecified
  • POST / must (an example test script follows this list):
      • respond 400 Bad Request on non-existent or invalid image
      • respond 405 Method Not Allowed on anything other than POST
      • respond 406 Not Acceptable for anything other than Accept: application/json
      • respond 413 Request Entity Too Large for images over 1MB
      • respond 202 Accepted for a valid image file named "image"
      • return with Content-Type "application/json"
      • return a JSON object of the format { original: "url...", small: "url...", ... }
  • GET "url..." must:
      • respond with 200 OK
      • respond with the correct Content-Type header (e.g. image/jpeg)
      • respond with an image file of the correct size (small/medium/large/original)
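
As a complement to curl, here is a sketch of how the POST / and GET contracts could be exercised from Python. The local address, port, test image, and the requests library are assumptions (requests is not part of the required stack).

    import requests

    # Assumes the server is running locally on port 8080 and test.jpg is under 1MB.
    response = requests.post('http://localhost:8080/',
                             headers={'Accept': 'application/json'},
                             files={'image': open('test.jpg', 'rb')})

    assert response.status_code == 202
    assert response.headers['Content-Type'].startswith('application/json')
    urls = response.json()
    assert set(['original', 'small', 'medium', 'large']) <= set(urls)

    # Each returned URL should serve an image with the right Content-Type.
    image = requests.get(urls['small'])
    assert image.status_code == 200
    assert image.headers['Content-Type'].startswith('image/')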

Recommendations

  • Focus on core requirements first
  • Use "curl" to write simple tests
  • Start today
  • Ask lots of questions
  • Share knowledge

Marking Rubric

  • 50% competency - demonstrate your understanding of core concepts
  • 20% understanding - articulate both the breadth and depth of your knowledge
  • 10% dedication - show a not inconsiderable level of effort
  • 10% innovation - solve problems with an uncommon degree of creativity
  • 10% technical - provide solutions that care about the details others overlook