Hadoop is everywhere and getting a lot of attention. This article does not explain what it is or how it works, because there are plenty of good resources for that. Instead, I’m going to help you go a step further and deploy a multi-node Hadoop cluster on Ubuntu. Pretty interesting, right? If you follow the steps below you can get it done in about 15 minutes. Let’s start.

Prerequisites

All you need is:

  • Java 1.7 installed.
  • 5 nodes. In my case they are 192.168.7.87, 192.168.7.88, 192.168.7.89, 192.168.7.90 and 192.168.7.91.

1. Configure Environment

  1. Let’s create a dedicated user for Hadoop called hduser.
    useradd -m -d /home/hduser -s /bin/bash hduser
  2. Configure password-less SSH
    First decide which node is going to be the master, which the secondary master and which the slaves. Then make sure the master node can SSH to all the slaves and to the secondary master without a password. If you don’t know how to set up password-less SSH, refer to this article.
  3. Edit /etc/hosts and add the entries below. Also comment out the IPv6 entries.
    192.168.7.87 master
    192.168.7.88 master2
    192.168.7.89 slave1
    192.168.7.90 slave2
    192.168.7.91 slave3
  4. Edit the hostname file
    On the master node, open the hostname file.

    vim /etc/hostname

    Replace its content with master. Repeat this on the other nodes, setting the hostname to master2, slave1, slave2 and slave3 respectively.
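
For step 2 above, here is a minimal sketch of the password-less SSH setup, meant to be run as hduser on the master. It assumes OpenSSH’s ssh-keygen and ssh-copy-id are installed and that hduser already exists on every node. The NODES list and the DRY_RUN switch are illustrative names of my own, not part of Hadoop. The master itself is included on the assumption that the start scripts also reach the local node over SSH.

```shell
# Hostnames as defined in /etc/hosts (step 3).
NODES="master master2 slave1 slave2 slave3"

# With DRY_RUN=1 (the default here) commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  # Create a key pair with an empty passphrase if none exists yet.
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Install the public key on every node (asks for hduser's password once per node).
  for node in $NODES; do
    run ssh-copy-id "hduser@$node"
  done
}

setup_passwordless_ssh
```

Once the printed commands look right, run it again with DRY_RUN=0. Afterwards, ssh hduser@slave1 from the master should log in without a password prompt.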

2. Download Hadoop

Let’s download Hadoop 2.x from here. We are going to use version 2.7.1. Extract it with the command below. I suggest extracting it into hduser’s HOME folder.

tar -zxvf hadoop-2.7.1.tar.gz

3. Configure PATH variables

Edit hduser’s .bashrc using the command below.

vim ~/.bashrc

Add the content below to the end of the file. Adjust the paths if you didn’t extract Hadoop into hduser’s HOME.

#Set JAVA_HOME (adjust to wherever your JDK is installed)
export JAVA_HOME=/home/hduser/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$JAVA_HOME/lib:$PATH
#Set HADOOP_HOME
export HADOOP_HOME=$HOME/hadoop-2.7.1
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_COMMON_HOME/bin
export PATH=$PATH:$HADOOP_COMMON_HOME/sbin

Apply them to the current shell using the command below. New shells pick them up from .bashrc automatically.

source ~/.bashrc
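
Before moving on, it can help to confirm that the variables actually took effect. The helper below is a small sketch of my own (check_hadoop_env is not a Hadoop command). It only checks that four of the variables exported above are non-empty in the current shell.

```shell
# Report any of the variables from .bashrc that are still empty.
check_hadoop_env() {
  missing=0
  for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR YARN_CONF_DIR; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name"
      missing=1
    fi
  done
  return "$missing"
}

check_hadoop_env && echo "environment looks complete"
```

If nothing is reported as missing, running hadoop version should also work from any directory, since $HADOOP_HOME/bin is now on the PATH.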

4. Edit hadoop-env.sh

Edit etc/hadoop/hadoop-env.sh under Hadoop’s home and set JAVA_HOME as follows.

export JAVA_HOME=/home/hduser/jdk1.7.0_55

5. Create Hadoop tmp

Create a tmp folder in HADOOP_HOME.

mkdir -p $HADOOP_HOME/tmp

If you want to know why you can read more here.

6. Edit Hadoop config files

core-site.xml

vim $HADOOP_CONF_DIR/core-site.xml

Add the following between the <configuration> tags.

<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop-2.7.1/tmp</value>
</property>
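
If you are unsure where the two properties above go, the sketch below writes a complete core-site.xml containing exactly those properties. The function name write_core_site is my own. Note that it overwrites any existing core-site.xml in the target directory, so treat it as an illustration of the file layout and not as the required way to edit the file.

```shell
# Write a complete core-site.xml into the given configuration directory.
# WARNING: this replaces an existing core-site.xml in that directory.
write_core_site() {
  conf_dir="$1"
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/hadoop-2.7.1/tmp</value>
  </property>
</configuration>
EOF
}

# Example: write_core_site "$HADOOP_CONF_DIR"
```

In Hadoop 2.x, fs.default.name is a deprecated alias of fs.defaultFS. Both names work in 2.7.1, but you may see a deprecation warning in the logs.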

hdfs-site.xml

vim $HADOOP_CONF_DIR/hdfs-site.xml

Add the following between the <configuration> tags.

<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>master2:50090</value>
<description>Enter your Secondary NameNode hostname</description>
</property>

mapred-site.xml

Let’s create a mapred configuration file from the template given.

cp /home/hduser/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop-2.7.1/etc/hadoop/mapred-site.xml

Now let’s edit it.

vim $HADOOP_CONF_DIR/mapred-site.xml

Add the following between the <configuration> tags.

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework.</description>
</property>

yarn-site.xml

Okay, we are almost there. Hang on! Let’s configure YARN now.

vim $HADOOP_CONF_DIR/yarn-site.xml

Add the following between the <configuration> tags.

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>

slaves

One last configuration.

vim $HADOOP_CONF_DIR/slaves

Add the content below.

slave1
slave2
slave3

7. Repeat

Okay, everything we have done so far (steps 1 to 6) has to be done on the secondary master and on all the slaves as well. It’s time to repeat the steps. Boring, right? You can use rsync to copy the files located in $HADOOP_HOME/etc/hadoop to all nodes. If you don’t know much about rsync, it’s time to start reading this.
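
As a rough sketch of the rsync approach, the loop below pushes the master’s configuration directory to the other four nodes. It assumes password-less SSH is working, that Hadoop is extracted at the same path on every node, and that rsync is installed on both ends. TARGETS and the RSYNC override are names I made up for this example. By default it only prints the commands.

```shell
# Nodes that need a copy of the master's configuration.
TARGETS="master2 slave1 slave2 slave3"

sync_hadoop_conf() {
  conf_dir="${HADOOP_CONF_DIR:-$HOME/hadoop-2.7.1/etc/hadoop}"
  for node in $TARGETS; do
    # -a keeps permissions and timestamps, -z compresses in transit.
    # The trailing slashes copy the directory contents, not the directory itself.
    # Unless RSYNC is set, the command is printed instead of executed.
    ${RSYNC:-echo rsync} -az "$conf_dir/" "hduser@$node:$conf_dir/"
  done
}

sync_hadoop_conf
```

To perform the copy for real, call it as RSYNC=rsync sync_hadoop_conf. This only covers the configuration files. The user, /etc/hosts, hostname, Hadoop download and .bashrc steps still have to be done on each node.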

8. Format the namenode

Go back to the master node and execute the command below to format the NameNode.

hdfs namenode -format

9. Start Hadoop

Time to start the cluster (HDFS and YARN), and I wish you all the best. On the master node, the two scripts below reside in the sbin folder of Hadoop, so go to Hadoop home and then into sbin.

./start-dfs.sh
./start-yarn.sh

Alternatively, you can use the script below, which is deprecated.

./start-all.sh
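
Once the start scripts return, the JDK’s jps tool lists the Java processes running on a node, which is a quick way to see whether the daemons came up. The helper below is a sketch of my own (check_daemons is not part of Hadoop). It reads jps output on standard input and reports whether each daemon name you pass is present.

```shell
# Read `jps` output on stdin and report which expected daemons are missing.
# Usage: jps | check_daemons NameNode ResourceManager
check_daemons() {
  running="$(cat)"
  status=0
  for daemon in "$@"; do
    # jps prints "<pid> <main class>", one JVM per line.
    if echo "$running" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "ok: $daemon"
    else
      echo "missing: $daemon"
      status=1
    fi
  done
  return "$status"
}
```

With the layout used in this article, I would expect NameNode and ResourceManager on master, SecondaryNameNode on master2, and DataNode and NodeManager on each slave. For example, run jps | check_daemons DataNode NodeManager on slave1.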

Hope you are good.

10. Testing

On the master node, execute the command below.

hdfs dfsadmin -report

You should get output similar to this. It may differ depending on your configuration.

[Screenshot: output of hdfs dfsadmin -report]
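
If you would rather check the report with a script than by eye, the sketch below pulls the number of live DataNodes out of it. It assumes the report contains a line of the form "Live datanodes (N):", which is what I have seen Hadoop 2.7 print. Other versions may word it differently. live_datanodes is a helper name of my own.

```shell
# Read `hdfs dfsadmin -report` output on stdin and print the live DataNode count.
# Usage: hdfs dfsadmin -report | live_datanodes
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1
}
```

With the three slaves configured in this article, the expected count is 3. An empty result means the line was not found in the report.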

Let’s do another test. Let’s check the list of nodes now.

yarn node -list

[Screenshot: output of yarn node -list]

Last but not least, let’s look at Hadoop’s web UI. Fire up a browser and open the URL below.

http://192.168.7.87:50070/

Replace 192.168.7.87 with your NameNode IP. You should get something like the page below, which is the web UI of the NameNode.

[Screenshot: web UI of the NameNode]

If you got this far you are in good shape. That’s about it. If you have any questions, let me know in the comments below. Your feedback is highly appreciated.

 


1 Comment

  1. Chrisao January 1, 2017 at 10:59 pm

    Hi Dasun,

    Just a query wrt JAVA_HOME … this is usually installed on /usr/lib/jvm/java-7-openjdk-i386..I don’t get setting JAVA_HOME to /home/hduser/jdk1.7.0_55

    Would you be kind enough to explain this? Thanks.
