tedkirkpatrick edited this page Jan 22, 2014 · 23 revisions

STILL A DRAFT BUT SLIGHTLY BETTER NOW

My Little Image Sharer: Clouds Are Magic



Overview

TNI (acronym never defined) has hired you to finish their half-completed project: My Little Image Sharer (you guessed it: an image sharing service). The previous developers were terrible and have been fired; you've been left with partially complete server and worker subservices. Users send images to the server, the user-facing service. Upon receiving an image, the server (i) saves the image in S3 and (ii) sends the image to a worker subservice for further processing. When the worker receives an image, it computes several scaled versions and saves each of them in S3. The worker subservice is asynchronous: the server simply sends the image and does not wait for the worker to finish. This means the user has no guarantee of how quickly the scaled versions will be retrievable from S3 after the original image is available there. In the terms we will consider Friday, the latency of the scaled images is unbounded.

The server is mostly complete: it can receive images and store them in S3. The worker has only its basic image processing functionality done. None of the communication between the worker and the server has been written. To complete the service, you must modify the server to announce the arrival of a new image by putting a message in the work queue (SQS). You must modify the worker to listen for that message; when the worker receives it, the worker must retrieve the image the server uploaded, scale it to the required thumbnail sizes, and save the thumbnails to S3.
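The handoff described above can be sketched with in-memory stand-ins so the control flow is visible without AWS. This is a minimal sketch, not the project's code: the dict `bucket` stands in for the S3 bucket, `queue.Queue` stands in for the SQS work queue, and the `<id>/<size>` key layout, the JSON message body and all helper names are hypothetical. It also assumes each required size is a bounding box the thumbnail must fit inside while keeping its aspect ratio; a real worker would do the pixel work with Pillow (for example `Image.thumbnail`) and upload the bytes with Boto.

```python
import json
import queue
import uuid

# Hypothetical stand-ins: a dict plays the S3 bucket, queue.Queue plays SQS.
bucket = {}
work_queue = queue.Queue()

# Target bounding boxes from the requirements (assumed to mean "fit inside").
THUMBNAIL_SIZES = {"small": (100, 100), "medium": (300, 300), "large": (600, 600)}


def fit_within(width, height, box):
    """Shrink (width, height) to fit inside box, keeping the aspect ratio.

    Images already smaller than the box keep their original size.
    """
    box_w, box_h = box
    scale = min(box_w / width, box_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def server_handle_upload(image_bytes, width, height):
    """Save the original, announce it on the queue, and return immediately."""
    image_id = uuid.uuid4().hex
    bucket[image_id + "/original"] = {"data": image_bytes, "size": (width, height)}
    # Only the ID travels through the queue; the worker fetches the image itself.
    work_queue.put(json.dumps({"id": image_id}))
    return image_id


def worker_process_one():
    """Handle one queued message: read the original and record each thumbnail."""
    message = work_queue.get_nowait()
    image_id = json.loads(message)["id"]
    original = bucket[image_id + "/original"]
    width, height = original["size"]
    for name, box in THUMBNAIL_SIZES.items():
        # A real worker would resize the pixels (e.g. with Pillow) and upload
        # the result; this sketch only records the computed dimensions.
        bucket[image_id + "/" + name] = {"size": fit_within(width, height, box)}
    # Stand-in for deleting the SQS message, done only after all saves succeed.
    work_queue.task_done()
    return image_id
```

Under the fit-inside assumption, an 800x600 upload yields a 100x75 small thumbnail; if TNI actually wants exact squares, `fit_within` would be replaced by a crop. Note that the thumbnails do not exist until the worker runs, which is the unbounded latency described earlier. With SQS, deleting a message only after every thumbnail is saved means a message whose worker died becomes visible again once its visibility timeout expires.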

DO NOT START THIS PART BEFORE FINISHING THE FIRST PART OR YOUR LIFE WILL BE FILLED WITH REGRET!!!! Unfortunately, simply completing the code for My Little Image Sharer won't be enough to save it from failing when hordes of emotionally underdeveloped cloud system developers suddenly upload or view all the images of love, tolerance and distributed systems they've been waiting to share with the world. TNI needs an app that scales, and there are (at least) two scaling issues: the worker system could become underprovisioned (too many images, not enough resizing power), and the data delivery network could become underprovisioned (too many requests, not enough network bandwidth). Ensure that your system avoids these scenarios by automatically spinning up more workers when the existing ones are overwhelmed and by distributing your images over a CDN (CloudFront) for viewing.
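In the deployed system, a CloudWatch alarm on a worker metric drives an Auto Scaling policy, and AWS evaluates it for you. The decision it makes can be modelled roughly as follows: add one worker when every one of the last few datapoints is above a high threshold, remove one when all are below a low threshold, and always stay within the group's minimum and maximum size. This is a simplified sketch, not how CloudWatch is implemented; the thresholds, number of periods and group limits are placeholder values, and which metric to feed it is left to you.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Placeholder parameters for a simplified scale-out/scale-in rule."""
    high: float = 70.0   # e.g. average CPU percent that counts as "overloaded"
    low: float = 20.0    # value below which the workers count as idle
    periods: int = 2     # consecutive datapoints that must all agree
    min_size: int = 1    # never fewer workers than this
    max_size: int = 4    # never more workers than this


def desired_capacity(current, datapoints, rule):
    """Return the worker count after evaluating the most recent datapoints."""
    recent = datapoints[-rule.periods:]
    if len(recent) < rule.periods:
        return current  # not enough data yet; leave the group alone
    if all(value > rule.high for value in recent):
        return min(current + 1, rule.max_size)  # overloaded: add a worker
    if all(value < rule.low for value in recent):
        return max(current - 1, rule.min_size)  # idle: remove a worker
    return current
```

With the defaults, `desired_capacity(1, [10, 95, 97], ScalingRule())` returns 2, which is the kind of reaction a `stress` run on the first instance should provoke. Requiring several consecutive datapoints keeps a single brief spike from launching an instance.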

Even after you've gotten through the blood, sweat and tears, TNI, being the horribly managed company it is, wants a brief overview of what your app promises to deliver and what it doesn't, so the next people they hire can continue developing My Little Image Sharer.

Important: TNI has intentionally omitted documentation in several areas, both to avoid spoon-feeding you everything (e.g. how do I make an AMI? how do I get a Python script to start at boot? how do I use CloudWatch?) and to encourage exploration, discussion and collaboration. Feel free to share your knowledge with others by posting your issues or resources on our friendly neighbourhood subreddit or swinging by IRC.

Goals (Tests)

  • common.py
      • You can connect to AWS (no errors on connect_to_region)
      • You have a valid bucket (no errors on get_bucket / check the AWS console for the bucket)
      • You have a valid queue (no errors on get_queue / check the AWS console for the queue)
  • server.py
      • Generate an ID for new images (print the ID to the console from Python)
      • Put a message into SQS with the new ID (check the AWS console for messages)
      • Ensure your S3 bucket is public (have others view links to objects in your bucket)
      • View the image from the generated URL (download the image with curl)
  • worker.py
      • Read messages from SQS (print messages to the console from Python)
      • Delete messages from SQS (check the AWS console to see that messages are being removed)
      • Save resized images back into S3 (check the AWS console and view the items in your bucket)
  • CloudWatch
      • The worker.py script must start at boot (reboot your instance, then run "ps aux | grep worker.py")
      • You must have an AMI with your worker embedded in it (spin up a new instance manually from your AMI)
      • Ensure a second instance spins up when the first one becomes overloaded (use "stress" to create CPU spikes on the first instance, then monitor both EC2 instances and the CloudWatch metrics in the AWS console)
  • CloudFront
      • Be able to download data from your S3 images bucket over CloudFront (test the generated CloudFront URLs)
      • Modify server.py to return CloudFront URLs (download the image with curl)
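Two of the server.py goals boil down to small pieces of logic: generating a fresh ID per upload and turning that ID into one public URL per size. The sketch below is one possible approach, not TNI's code: it assumes random UUIDs as IDs and a hypothetical `<id>/<size>` key layout, and the base URL in the example is a placeholder. Switching from S3 URLs to CloudFront URLs then only changes the base URL passed in.

```python
import uuid

SIZE_NAMES = ("original", "small", "medium", "large")


def new_image_id():
    """Return a random 32-character hex ID (uuid4 carries 122 random bits)."""
    return uuid.uuid4().hex


def image_urls(image_id, base_url):
    """Map each size name to the public URL of its object under base_url.

    base_url is either the bucket's public S3 endpoint or the CloudFront
    distribution's domain; the <id>/<size> key layout is hypothetical.
    """
    base = base_url.rstrip("/")
    return {name: "{}/{}/{}".format(base, image_id, name) for name in SIZE_NAMES}
```

For instance, `image_urls("abc123", "https://example.cloudfront.net")["small"]` gives `"https://example.cloudfront.net/abc123/small"`. Because these keys carry no file extension, the Content-Type should be set explicitly on each object when it is uploaded so that GET requests return the right header.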

Deliverables

Requirements

  • Students must:
      • Work in groups of size 0 < n < 5
  • Stack:
      • Amazon EC2 (Boto)
      • Amazon S3 (Boto)
      • Amazon SQS (Boto)
      • Amazon CloudWatch
      • Amazon CloudFront
      • Python
      • Boto
      • Pillow
      • curl
  • Image sizes must be:
      • Small: 100x100
      • Medium: 300x300
      • Large: 600x600
      • Original: Unspecified
  • POST / must:
      • respond 400 Bad Request for a missing or invalid image
      • respond 405 Method Not Allowed for any method other than POST
      • respond 406 Not Acceptable for any Accept header other than application/json
      • respond 413 Request Entity Too Large for images over 1MB
      • respond 202 Accepted for a valid image file named "image"
      • respond with Content-Type "application/json"
      • return a JSON object of the format { "original": "url...", "small": "url...", ... }
  • GET "url..." must:
      • respond with 200 OK
      • respond with the correct Content-Type header (e.g. image/jpeg)
      • respond with an image file of the correct size (small/medium/large/original)
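The POST / rules can be read as a sequence of checks. Here is a minimal sketch of that decision as a pure function, independent of any web framework. Assumptions: the spec does not say which check wins when several fail, so the order here is a choice; "1MB" is taken to mean 1 MiB; the Accept header is compared literally, as the rule is written; and "valid image" is approximated by checking JPEG, PNG and GIF file signatures, where a real server might instead try opening the upload with Pillow.

```python
MAX_IMAGE_BYTES = 1024 * 1024  # "1MB", assumed here to mean 1 MiB

# Leading bytes of JPEG, PNG and GIF files; a cheap stand-in for real validation.
IMAGE_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def choose_status(method, accept, files):
    """Pick the status code for a request to POST /.

    method: HTTP method string, e.g. "POST"
    accept: value of the Accept header, or None if absent
    files:  mapping of uploaded form-field names to raw bytes
    """
    if method != "POST":
        return 405  # Method Not Allowed
    if accept != "application/json":
        return 406  # Not Acceptable
    data = files.get("image")
    if data is None:
        return 400  # Bad Request: no file named "image"
    if len(data) > MAX_IMAGE_BYTES:
        return 413  # Request Entity Too Large
    if not data.startswith(IMAGE_SIGNATURES):
        return 400  # Bad Request: not a recognisable image
    return 202  # Accepted: thumbnails will be produced asynchronously
```

Checking the size before the contents means an oversized upload is rejected without inspecting it. Real clients often send Accept headers listing several types (e.g. `application/json, */*`); a more lenient server might parse that list instead of comparing the whole string.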

Questions

  • How are these terms relevant or related to your app:
      • data center
      • virtual machine
      • virtualization
      • provisioning
      • overprovisioned
      • underprovisioned
      • elastic computing
      • utilization
      • throughput
      • latency
      • API
  • What does the Amazon SLA mean for what you can expect from your app?
  • How does (or doesn't) your ID generation algorithm prevent conflicts?
  • What platform-level, cluster-level, and application-level software is used in your app?
  • How does your app scale?
  • How might it fail? (Hint: server.py)
  • How could you change it to scale better?
  • What other existing apps might use a similar platform? Why? (Hint: video.)
  • What metric did you choose for your Auto Scaling/CloudWatch alarm? Why?
  • If a worker fails while encoding an image, what happens? Can your system recover?

Recommendations

  • Focus on core requirements first
  • Use "curl" to write simple tests
  • Start today
  • Ask lots of questions
  • Share knowledge

Marking Rubric

  • 50% competency - demonstrate your understanding of core concepts
      • server.py
      • worker.py
  • 20% understanding - articulate both the breadth and depth of your knowledge
  • 10% dedication - show a level of not inconsiderate effort
  • 10% innovation - solve problems with an uncommon degree of creativity
  • 10% technical - provide solutions that care about the details others overlook