Set up Hadoop on OS X
Configure Java
Add the following to your ~/.bash_profile:
$ export JAVA_HOME=`/usr/libexec/java_home`
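Before relying on that export, it can be worth confirming that the helper actually resolves to a JDK. A small guarded sketch (the fallback branch only fires on systems without the macOS helper):

```shell
# macOS ships /usr/libexec/java_home, which prints the active JDK's path.
# Guarded so the snippet degrades gracefully elsewhere.
if [ -x /usr/libexec/java_home ]; then
  JAVA_HOME="$(/usr/libexec/java_home)"
  msg="JAVA_HOME=$JAVA_HOME"
else
  msg="no /usr/libexec/java_home here (not macOS?)"
fi
echo "$msg"
```

If the first branch prints an empty path, no JDK is installed and Hadoop will not start.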
Install Hadoop with Homebrew
$ brew install hadoop
Set up SSH keys
Hadoop uses SSH to communicate with the worker nodes. To set up password-less access to the nodes (including localhost), add the machines’ public keys to your authorized_keys file:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
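The same flow can be rehearsed harmlessly in a scratch directory first. Everything below is illustrative: the temp dir and the id_demo name are made up, and RSA is used here only because some newer OpenSSH builds reject DSA keys.

```shell
# Throwaway demo of the key-based setup (hypothetical paths);
# the real setup writes to ~/.ssh/id_dsa and ~/.ssh/authorized_keys.
tmpdir="$(mktemp -d)"
ssh-keygen -t rsa -b 2048 -N '' -f "$tmpdir/id_demo" -q
# Appending the public key is what authorizes the matching private key:
cat "$tmpdir/id_demo.pub" >> "$tmpdir/authorized_keys"
grep -c '^ssh-rsa ' "$tmpdir/authorized_keys"
```

The final `grep -c` prints 1: one authorized entry, corresponding to the one key pair generated.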
Running Hadoop Without SSH
If your admin does not allow ssh access to localhost, change all occurrences of "$HADOOP_PREFIX/sbin/hadoop-daemons.sh"
to
"$HADOOP_PREFIX/sbin/hadoop-daemon.sh"
in the following config files (under /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/):
- start-dfs.sh
- stop-dfs.sh
- start-yarn.sh
- stop-yarn.sh
(Note the “s” at the end of hadoop-daemons.sh; the only change is dropping it.)
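The four edits can be scripted with sed. The sketch below runs against a stand-in file in a temp dir rather than the real scripts under /usr/local/Cellar/hadoop/2.7.2/libexec/sbin/; point the loop at that directory to apply it for real.

```shell
# Rewrite hadoop-daemons.sh -> hadoop-daemon.sh in place.
# (-i.bak keeps a backup and works with both GNU and BSD sed.)
workdir="$(mktemp -d)"
printf '"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" start namenode\n' \
  > "$workdir/start-dfs.sh"
for f in "$workdir"/*.sh; do
  sed -i.bak 's|hadoop-daemons\.sh|hadoop-daemon.sh|g' "$f"
done
cat "$workdir/start-dfs.sh"
```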
Configurations
Hadoop can run in three modes: standalone, pseudo-distributed, and fully-distributed.[…]
Launching the daemons
Add the script directory to your $PATH:
$ export PATH="/usr/local/Cellar/hadoop/2.7.2/sbin:$PATH"
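Why the prepend works: any executable in a directory listed in $PATH becomes callable by bare name. A self-contained illustration, using a hypothetical temp directory and script instead of Hadoop's sbin:

```shell
# Demonstrate PATH resolution with a throwaway script (names are made up).
bindir="$(mktemp -d)"
printf '#!/bin/sh\necho started\n' > "$bindir/start-demo.sh"
chmod +x "$bindir/start-demo.sh"
PATH="$bindir:$PATH"
start-demo.sh   # resolved via PATH, no ./ prefix needed
```

The call prints `started`; the Hadoop start/stop scripts below become callable the same way once sbin is on PATH.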
To launch Yarn, HDFS, or both:
start-yarn.sh
start-dfs.sh
start-all.sh
and to stop:
stop-yarn.sh
stop-dfs.sh
stop-all.sh
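To check which daemons actually came up, the JDK's jps tool lists running Java processes. It is guarded here since it only exists where a JDK is installed:

```shell
# jps ships with the JDK and lists Java processes (expect NameNode,
# DataNode, ResourceManager, NodeManager after a successful start-all.sh).
if command -v jps >/dev/null 2>&1; then
  daemons="$(jps)"
else
  daemons="jps not found; check that the JDK's bin directory is on PATH"
fi
echo "$daemons"
```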
HDFS
To format the HDFS partition, run the following before launching the HDFS daemon:
hdfs namenode -format
Some common hdfs commands:
hdfs dfs -ls /path/to/dir
hdfs dfs -cat /path/to/file
hdfs dfs -rm /path/to/file
hdfs dfs -rm -r /path/to/dir
hdfs dfs -mkdir /path/to/dir
hdfs dfs -put /path/from /path/to
hdfs dfs -get /path/from /path/to
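Put together, a typical round trip looks like the sketch below. It is guarded on the hdfs CLI being on PATH, and the /demo and /tmp/local.txt paths are arbitrary examples:

```shell
# End-to-end HDFS round trip (illustrative paths); no-op without the CLI.
if command -v hdfs >/dev/null 2>&1; then
  echo "hello hdfs" > /tmp/local.txt
  hdfs dfs -mkdir -p /demo                 # create a directory
  hdfs dfs -put /tmp/local.txt /demo/      # copy the local file in
  out="$(hdfs dfs -cat /demo/local.txt)"   # read it back
  hdfs dfs -rm -r /demo                    # clean up
else
  out="hdfs not on PATH; see the install steps above"
fi
echo "$out"
```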