CS5229 - Big Data Analytics Technologies
This guide walks you through setting up an Amazon EMR cluster, connecting to its master node over SSH, and uploading a dataset to Amazon S3. Amazon EMR is a web service that provides a managed cluster platform for processing large amounts of data. Before you start, you will need:
- An AWS Learner Lab account
- Basic familiarity with the AWS Management Console
- Basic knowledge of terminal commands
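Step 1: Create an EMR cluster
- Go to AWS Learner Lab.
- Click "Start Lab".
- Click "AWS" to open the AWS Management Console.
- Search for and open "EMR".
- Click "Create cluster".
- Set the instance type to "m4.large".
- Untick "Auto Termination" so the cluster is not shut down when it is idle.
- For EC2 key pair, choose "vockey". Download the matching private key, labsuser.pem, from the Learner Lab page (AWS Details >> Download PEM).
- Click "Create cluster".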
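The cluster-creation steps above can also be scripted. The following is a minimal sketch using the boto3 EMR client's run_job_flow call. The cluster name, release label, applications, instance count, region (us-east-1) and IAM role names (EMR_DefaultRole, EMR_EC2_DefaultRole) are assumptions, not values from this guide, so check them against what your Learner Lab account offers. The request builder runs without AWS access; only create_cluster needs boto3 and the Learner Lab credentials (from AWS Details) configured locally.

```python
"""Sketch: request an EMR cluster like the console steps, via boto3.

Assumptions (not from the guide): cluster name, release label,
applications, instance count, region and IAM role names.
"""


def build_cluster_request(
    name: str = "cs5229-cluster",        # hypothetical cluster name
    instance_type: str = "m4.large",     # same type chosen in the console
    key_name: str = "vockey",            # Learner Lab key pair
    release_label: str = "emr-6.15.0",   # assumption: any current EMR release
    instance_count: int = 3,             # assumption: 1 master + 2 core nodes
) -> dict:
    """Return keyword arguments for the EMR RunJobFlow API."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,
            # Keep the cluster running even though no steps are submitted.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # No "AutoTerminationPolicy" key: this mirrors unticking
        # "Auto Termination", so the cluster is not stopped when idle.
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster ID."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_cluster_request())
    return response["JobFlowId"]
```

Calling create_cluster() returns an ID such as the one shown on the cluster's Summary page; keeping the request in a plain dictionary makes it easy to inspect or adjust before anything is sent to AWS.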
Step 2: Allow SSH access to the master node
- Go to the cluster's Summary >> Security and access >> Security groups for Master.
- Select the security group for the master node.
- In the bottom pane, choose the "Inbound rules" tab, and then choose "Edit inbound rules".
- At the bottom of the page, choose "Add rule", and then configure it for SSH:
  - Type: Choose "SSH".
  - Source: Choose "Anywhere-IPv4" (or "My IP" to accept connections only from your current address).
- Choose "Save rules".
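The inbound-rule steps above amount to a single EC2 API call. In this minimal sketch the master security group ID is looked up with the EMR describe_cluster call (Ec2InstanceAttributes.EmrManagedMasterSecurityGroup) and the rule is added with authorize_security_group_ingress. The function names, the default region and the rule description are illustrative assumptions.

```python
"""Sketch: add the SSH inbound rule to the master node's security group."""

import ipaddress


def ssh_ingress_permissions(cidr: str = "0.0.0.0/0") -> list:
    """Return EC2 IpPermissions allowing TCP port 22 from `cidr`.

    "0.0.0.0/0" matches the console's "Anywhere-IPv4" choice; a /32 range
    such as "203.0.113.7/32" limits access to a single address.
    """
    network = ipaddress.IPv4Network(cidr)  # raises ValueError on a bad CIDR
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": str(network), "Description": "SSH"}],
        }
    ]


def allow_ssh(cluster_id: str, cidr: str = "0.0.0.0/0", region: str = "us-east-1") -> str:
    """Find the master security group of `cluster_id` and open port 22."""
    import boto3  # imported lazily; needs AWS credentials

    emr = boto3.client("emr", region_name=region)
    ec2 = boto3.client("ec2", region_name=region)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    group_id = cluster["Ec2InstanceAttributes"]["EmrManagedMasterSecurityGroup"]
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=ssh_ingress_permissions(cidr)
    )
    return group_id
```

Calling allow_ssh a second time with the same CIDR fails, because EC2 rejects a rule that already exists (error code InvalidPermission.Duplicate).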
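Step 3: Connect to the master node
- Open a terminal on your computer.
- Change to the folder that contains the key: cd ~/Downloads
- Make the key readable only by you (ssh rejects private keys that other users can read): chmod 400 labsuser.pem
- Connect as the hadoop user, replacing <master-public-DNS> with the Master public DNS shown on the cluster's Summary page: ssh -i labsuser.pem hadoop@<master-public-DNS>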
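For reference, the chmod and ssh commands above map onto the Python standard library as follows. This is a sketch for macOS or Linux (on Windows, os.chmod only toggles the read-only flag); the helper names are hypothetical, and the host must be the Master public DNS from the cluster's Summary page.

```python
"""Sketch: the chmod 400 and ssh steps, done from Python."""

import os
import stat
import subprocess
from pathlib import Path


def restrict_key(key_path: str) -> int:
    """Make the key readable only by its owner (same as `chmod 400`)."""
    os.chmod(key_path, stat.S_IRUSR)  # S_IRUSR == 0o400
    return stat.S_IMODE(os.stat(key_path).st_mode)


def ssh_command(key_path: str, host: str, user: str = "hadoop") -> list:
    """Build the ssh argument list; `hadoop` is EMR's login user."""
    return ["ssh", "-i", str(Path(key_path).expanduser()), f"{user}@{host}"]


def connect(key_path: str, host: str) -> int:
    """Restrict the key, then open an interactive SSH session."""
    restrict_key(os.path.expanduser(key_path))
    return subprocess.run(ssh_command(key_path, host)).returncode
```

For example, connect("~/Downloads/labsuser.pem", "<master-public-DNS>") with the placeholder filled in behaves like running the two terminal commands above.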
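Step 4: Upload the dataset to Amazon S3
- Go to Amazon S3.
- Create a bucket (bucket names must be globally unique).
- In the bucket's Permissions tab, turn off "Block all public access" (it is on by default and prevents public bucket policies). Then edit the bucket policy and add the following to allow public read access, replacing bucket-test-dilanka-003 with your bucket name:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucket-test-dilanka-003/*"
    }
  ]
}
- Create a folder in the bucket.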
- Upload the dataset to the folder.
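The S3 steps above can be scripted with boto3 as well. This minimal sketch builds the public-read policy for any bucket name (the "*" principal means anyone, and the "/*" suffix makes the resource every object in the bucket), relaxes Block Public Access only as far as a public policy needs, and then creates a folder and uploads a file. S3 has no real folders: the console's "Create folder" writes an empty object whose key ends in "/", which the sketch imitates with put_object. The region (us-east-1), function names and folder/file names are assumptions.

```python
"""Sketch: create a bucket, make it publicly readable, and upload a dataset."""

import json
import os


def public_read_policy(bucket: str) -> str:
    """Return a bucket policy that lets anyone read (s3:GetObject) every object."""
    return json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": "*",
                    "Action": "s3:GetObject",
                    # "/*" targets the objects, not the bucket itself.
                    "Resource": f"arn:aws:s3:::{bucket}/*",
                }
            ],
        }
    )


def folder_key(folder: str, filename: str) -> str:
    """Join a 'folder' prefix and a file name into an S3 object key."""
    return f"{folder.strip('/')}/{filename}"


def setup_bucket(bucket: str, folder: str, local_file: str, region: str = "us-east-1") -> str:
    """Create the bucket, allow public reads, and upload `local_file` into `folder`."""
    import boto3  # imported lazily; needs AWS credentials

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(Bucket=bucket)  # us-east-1 needs no LocationConstraint
    # Block Public Access is on by default and rejects public bucket policies.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": False,
            "RestrictPublicBuckets": False,
        },
    )
    s3.put_bucket_policy(Bucket=bucket, Policy=public_read_policy(bucket))
    s3.put_object(Bucket=bucket, Key=folder.strip("/") + "/")  # empty folder marker
    key = folder_key(folder, os.path.basename(local_file))
    s3.upload_file(local_file, bucket, key)
    return key
```

The ACL-related settings stay blocked because the sketch grants access only through the bucket policy.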