
alexrovner/homebrew-cdh

homebrew-cdh

Homebrew formulae for CDH

Formulae

  • cdh-hadoop
  • cdh-mr1

Usage

brew tap hammer/cdh

You will probably also need the Command Line Tools for Xcode.
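A minimal pre-flight check, assuming a POSIX shell on macOS. It looks for `brew` on the PATH and runs `xcode-select -p`, which prints the active developer directory and fails if neither Xcode nor the Command Line Tools are installed. The function names `have_cmd` and `check_prereqs` are hypothetical and are not part of the formulae.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the cdh formulae.

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Reports each missing prerequisite on stderr and returns non-zero if any is missing.
check_prereqs() {
  missing=0
  if ! have_cmd brew; then
    echo "Homebrew (brew) not found on PATH" >&2
    missing=1
  fi
  # xcode-select -p prints the active developer directory and fails
  # when no developer tools (Xcode or Command Line Tools) are installed.
  if ! have_cmd xcode-select || ! xcode-select -p >/dev/null 2>&1; then
    echo "Developer tools not found; try: xcode-select --install" >&2
    missing=1
  fi
  return "$missing"
}
```

For example: `check_prereqs && brew tap hammer/cdh`.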

CDH 4: HDFS 2 with MapReduce 1

Install CDH Hadoop

(Don't worry about the error Homebrew reports when linking cdh-mr1.)

brew install cdh-hadoop
brew install cdh-mr1
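The commands in this README spell out `brew --cellar`/<formula>/4.2.1/... each time. A small helper can keep the version in one place. This is a sketch, assuming you want to override the version through a hypothetical `CDH_VERSION` variable (defaulting to the 4.2.1 used here) and optionally pin the Cellar through a hypothetical `CELLAR` variable.

```shell
#!/bin/sh
# Hypothetical helpers; CDH_VERSION and CELLAR are assumed override variables.
CDH_VERSION="${CDH_VERSION:-4.2.1}"

# Print the Homebrew Cellar directory, honoring an explicit CELLAR override.
cellar() {
  if [ -n "${CELLAR:-}" ]; then
    printf '%s\n' "$CELLAR"
  else
    brew --cellar
  fi
}

# Print the versioned install directory of a formula, e.g. cdh_prefix cdh-hadoop.
cdh_prefix() {
  printf '%s/%s/%s\n' "$(cellar)" "$1" "$CDH_VERSION"
}
```

With these loaded, the HDFS format step becomes `"$(cdh_prefix cdh-hadoop)/bin/hdfs" namenode -format`.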

No need to edit configuration files

The cdh-hadoop formula uses inreplace to make the following changes, so you don't need to do them manually. These changes suppress some annoying warning messages and configure your cluster to run in pseudo-distributed mode.

  • etc/hadoop/hadoop-env.sh: Append java.security.krb5.realm and java.security.krb5.kdc to HADOOP_OPTS
  • etc/hadoop/core-site.xml: Set hadoop.tmp.dir and fs.default.name
  • etc/hadoop/hdfs-site.xml: Set dfs.replication
  • etc/hadoop/log4j.properties: Set log4j.logger.org.apache.hadoop.util.NativeCodeLoader log level to "ERROR"
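To confirm that the cdh-hadoop edits listed above are in place, a sketch that greps each file for the property or option name it should now mention. It only checks that the name is present, not its value. The function names are hypothetical, and the exact location of etc/hadoop under the Cellar is an assumption; `find "$(brew --cellar)/cdh-hadoop" -name core-site.xml` will locate it.

```shell
#!/bin/sh
# Hypothetical check that the inreplace edits listed above are present.
# Usage: check_hadoop_conf <dir containing the formula's etc/hadoop files>

# Succeeds if file $1 contains the fixed string $2.
has_text() {
  grep -qF -- "$2" "$1" 2>/dev/null
}

check_hadoop_conf() {
  dir=$1
  status=0
  # Each line: <file> <name that the formula's edit should have added>
  while read -r file needle; do
    if ! has_text "$dir/$file" "$needle"; then
      echo "missing $needle in $dir/$file" >&2
      status=1
    fi
  done <<EOF
hadoop-env.sh java.security.krb5.realm
hadoop-env.sh java.security.krb5.kdc
core-site.xml hadoop.tmp.dir
core-site.xml fs.default.name
hdfs-site.xml dfs.replication
log4j.properties log4j.logger.org.apache.hadoop.util.NativeCodeLoader
EOF
  return "$status"
}
```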

The cdh-mr1 formula uses inreplace to make the equivalent changes to its own configuration files, for the same reasons:

  • etc/hadoop/hadoop-env.sh: Append java.security.krb5.realm and java.security.krb5.kdc to HADOOP_OPTS
  • etc/hadoop/core-site.xml: Set hadoop.tmp.dir and fs.default.name
  • etc/hadoop/mapred-site.xml: Set mapred.job.tracker
  • etc/hadoop/log4j.properties: Set log4j.logger.org.apache.hadoop.util.NativeCodeLoader log level to "ERROR"
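To see which value a formula actually wrote (for example `mapred.job.tracker` in mapred-site.xml), a line-based sketch with awk. It assumes each `<name>` and `<value>` element sits on a single line and is not a general XML parser; the function name `get_prop` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a named <property> in a Hadoop
# *-site.xml file. Assumes <name> and <value> elements each fit on one line.
# Returns non-zero if the property is not found.
get_prop() {
  awk -v want="$2" '
    /<property>/ { name = "" }
    /<name>/ {
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ {
      val = $0
      sub(/.*<value>[ \t]*/, "", val)
      sub(/[ \t]*<\/value>.*/, "", val)
      if (name == want) { print val; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, `get_prop path/to/etc/hadoop/mapred-site.xml mapred.job.tracker` prints whatever host:port the formula configured.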

Enable SSH to localhost

sudo systemsetup -f -setremotelogin on
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
for host_id in localhost 0.0.0.0; do
  ssh-keyscan $host_id >> ~/.ssh/known_hosts
done
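Running the SSH commands above a second time appends duplicate lines to authorized_keys and known_hosts, and ssh-keygen asks whether to overwrite the existing key. A re-runnable sketch, assuming the same ~/.ssh/id_rsa key path; `append_once` and `setup_local_ssh` are hypothetical names. Enabling Remote Login with systemsetup is still a separate step.

```shell
#!/bin/sh
# Hypothetical re-runnable variant of the SSH-to-localhost steps.

# Append line $1 to file $2 unless an identical line is already there.
append_once() {
  touch "$2"
  grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

setup_local_ssh() {
  mkdir -p "$HOME/.ssh"
  chmod 0700 "$HOME/.ssh"
  # Only create a key pair if one does not already exist.
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  append_once "$(cat "$HOME/.ssh/id_rsa.pub")" "$HOME/.ssh/authorized_keys"
  chmod 0600 "$HOME/.ssh/authorized_keys"
  for host_id in localhost 0.0.0.0; do
    # ssh-keyscan may print one line per key type; add each line at most once.
    ssh-keyscan "$host_id" 2>/dev/null | while IFS= read -r key; do
      append_once "$key" "$HOME/.ssh/known_hosts"
    done
  done
}
```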

Format and start HDFS, then check that all processes are running

`brew --cellar`/cdh-hadoop/4.2.1/bin/hdfs namenode -format
`brew --cellar`/cdh-hadoop/4.2.1/libexec/sbin/start-dfs.sh
jps
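Rather than reading the `jps` listing by eye, a sketch that checks for specific process names. It assumes the default pseudo-distributed layout, where start-dfs.sh runs NameNode, DataNode and SecondaryNameNode and start-mapred.sh runs JobTracker and TaskTracker; the function name `require_daemons` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <MainClass>" per line) on
# stdin and check that each named process is listed. Prints any missing
# names on stderr and returns non-zero if one or more are missing.
require_daemons() {
  procs=$(awk '{ print $2 }')
  status=0
  for d in "$@"; do
    if ! printf '%s\n' "$procs" | grep -qxF -- "$d"; then
      echo "not running: $d" >&2
      status=1
    fi
  done
  return "$status"
}
```

Use `jps | require_daemons NameNode DataNode SecondaryNameNode` here, and `jps | require_daemons JobTracker TaskTracker` after starting MapReduce.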

Set HADOOP_MAPRED_HOME, start MapReduce, then check that all processes are running

export HADOOP_MAPRED_HOME=`brew --cellar`/cdh-mr1/4.2.1/libexec
`brew --cellar`/cdh-mr1/4.2.1/bin/start-mapred.sh
jps

Add some data to HDFS

`brew --cellar`/cdh-hadoop/4.2.1/bin/hadoop fs -mkdir input
`brew --cellar`/cdh-hadoop/4.2.1/bin/hadoop fs -put `brew --cellar`/cdh-mr1/4.2.1/libexec/conf/*.xml input

Run a MapReduce job

`brew --cellar`/cdh-mr1/4.2.1/bin/hadoop jar `brew --cellar`/cdh-mr1/4.2.1/libexec/hadoop-examples-2.0.0-mr1-cdh4.2.1.jar grep input output 'dfs[a-z.]+'
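The grep example counts every match of the regular expression across its input files and writes count/match pairs, highest count first. A rough local equivalent built from standard Unix tools can sanity-check the job's output. This is a sketch under two assumptions: the pattern means the same in POSIX extended regex as in Java regex (true for `dfs[a-z.]+`), and ties may be ordered differently than in the Hadoop output. `local_grep` is a hypothetical name, and `grep -o` is a GNU/BSD extension available on macOS.

```shell
#!/bin/sh
# Hypothetical local approximation of the Hadoop grep example:
# count every match of an extended regex in the given files and print
# "count<TAB>match", highest count first (ties sorted by match).
local_grep() {
  pattern=$1
  shift
  cat -- "$@" |
    grep -oE -- "$pattern" |
    sort |
    uniq -c |
    awk '{ print $1 "\t" $2 }' |
    sort -k1,1nr -k2,2
}
```

For example: `local_grep 'dfs[a-z.]+' "$(brew --cellar)"/cdh-mr1/4.2.1/libexec/conf/*.xml | head`.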

Read results of MapReduce job

`brew --cellar`/cdh-hadoop/4.2.1/bin/hadoop fs -cat output/part-00000 | head

When you're done:

Stop MapReduce and HDFS, remove temporary data

`brew --cellar`/cdh-mr1/4.2.1/bin/stop-mapred.sh
`brew --cellar`/cdh-hadoop/4.2.1/libexec/sbin/stop-dfs.sh
rm -rf ~/hadoop-store
unset HADOOP_MAPRED_HOME
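The shutdown steps can be wrapped in one function. A sketch, assuming the 4.2.1 Cellar paths used above and that ~/hadoop-store is the temporary data directory; `CELLAR`, `CDH_VERSION` and `HADOOP_STORE` are hypothetical override variables and `cdh_teardown` is a hypothetical name. Source the file (`. ./teardown.sh`) rather than executing it, so the `unset` affects your current shell.

```shell
#!/bin/sh
# Hypothetical teardown mirroring the manual steps: stop MapReduce, stop HDFS,
# remove temporary data, and clear HADOOP_MAPRED_HOME in the calling shell.
cdh_teardown() {
  cellar=${CELLAR:-$(brew --cellar)}
  version=${CDH_VERSION:-4.2.1}
  store=${HADOOP_STORE:-$HOME/hadoop-store}
  # Stop MapReduce first, then HDFS.
  "$cellar/cdh-mr1/$version/bin/stop-mapred.sh"
  "$cellar/cdh-hadoop/$version/libexec/sbin/stop-dfs.sh"
  rm -rf -- "$store"
  unset HADOOP_MAPRED_HOME
}
```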
