Set up Hadoop on OSX

Configure Java

Add the following to your ~/.bash_profile:

export JAVA_HOME=`/usr/libexec/java_home`
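
To confirm the variable resolves to a JDK, reload the profile and check it; the exact version string will depend on the JDK you have installed:

$ source ~/.bash_profile
$ echo $JAVA_HOME
$ java -version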

Install Hadoop with Homebrew

$ brew install hadoop
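
Homebrew normally links the hadoop executables onto your PATH, so a quick sanity check after the install is to print the version:

$ hadoop version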

Set up SSH keys

Hadoop uses SSH to communicate with the worker nodes. To set up password-less access to the nodes (including localhost), add each machine’s public key to your authorized_keys file.

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
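
Before moving on, you can verify that password-less login works; this should open a shell on the same machine without asking for a password (you may need to accept the host key the first time):

$ ssh localhost
$ exit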

Running Hadoop Without SSH

If your admin does not allow SSH access to localhost, change all occurrences of

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh"

to

"$HADOOP_PREFIX/sbin/hadoop-daemon.sh"

in the following scripts (under /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/):

  1. start-dfs.sh
  2. stop-dfs.sh
  3. start-yarn.sh
  4. stop-yarn.sh

(Note the trailing “s”: hadoop-daemons.sh uses SSH to start the daemons on every node, while hadoop-daemon.sh only starts the local one.)
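
If you would rather not edit the four scripts by hand, something like the following should also work; it is only a sketch, assuming the directory above and BSD sed’s in-place syntax as shipped with OSX:

$ cd /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/
$ sed -i '' 's/hadoop-daemons\.sh/hadoop-daemon.sh/g' start-dfs.sh stop-dfs.sh start-yarn.sh stop-yarn.sh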

Configurations

Hadoop can run in three modes: standalone, pseudo-distributed, and fully-distributed.

[…]

Launching the daemons

Add the script directory to your $PATH:

$ export PATH="/usr/local/Cellar/hadoop/2.7.2/sbin/:$PATH"
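
You can confirm the scripts are now reachable; this should print the location added above:

$ which start-dfs.sh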

To launch YARN, HDFS, or both:

start-yarn.sh
start-dfs.sh
start-all.sh

and to stop:

stop-yarn.sh
stop-dfs.sh
stop-all.sh
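
Once the daemons are started, jps (bundled with the JDK) lists the running Java processes; in a pseudo-distributed setup you should see entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

jps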

HDFS

To format the HDFS filesystem, run the following once, before launching the HDFS daemons for the first time.

hdfs namenode -format

Some common hdfs commands:

hdfs dfs -ls /path/to/dir          # list a directory
hdfs dfs -cat /path/to/file        # print a file's contents
hdfs dfs -rm /path/to/file         # delete a file
hdfs dfs -rm -r /path/to/dir       # delete a directory recursively (replaces the deprecated -rmr)
hdfs dfs -mkdir /path/to/dir       # create a directory
hdfs dfs -put /path/from /path/to  # copy from the local filesystem into HDFS
hdfs dfs -get /path/from /path/to  # copy from HDFS to the local filesystem
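
As a quick end-to-end check once the daemons are running, you can round-trip a small file through HDFS; the /tmp/hadoop-test path below is just an arbitrary example:

echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /tmp/hadoop-test
hdfs dfs -put /tmp/hello.txt /tmp/hadoop-test/
hdfs dfs -cat /tmp/hadoop-test/hello.txt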

Monitoring

ResourceManager: http://localhost:8088/
NameNode: http://localhost:50070/
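
From the command line, curl can confirm the web UIs are responding; a 200 status means the daemon’s HTTP server is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/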