Hadoop – pseudo-distributed mode installation – second time

I was waiting for a computing machine for Hadoop. Unfortunately I couldn’t get one for the past two months due to multiple commitments. Kannan and I visited one of the local computer stores two weeks ago. I selected a Dell tower desktop, but for the desired config (i7/16 GB RAM/500 GB) it went over my budget.

I lost hope and postponed the plan. Then I got an older-model laptop with a high config at a local expo. It doesn’t have modern features like a touch screen or an SSD, but I’m OK with that. I named it after Jeyamohan’s novel on Krishna – Neelam! (Neelam = Blue)

 

Here are the steps I followed to create the Hadoop pseudo-distributed environment. This is my second post on the environment setup, and it is more precise than my earlier one.

System Specs

  • OS: Ubuntu 64 bit/VMware Workstation Player
  • RAM: 8 GB
  • CPU: 4
  • Java: 1.8
  • Hadoop: 2.6

Update Ubuntu

Let’s update Ubuntu first before starting the process. This may take a while, depending on how recently you last updated.

The following command will update the package definitions.

pandian@kunthi:~$ sudo apt-get update
...
...
Fetched 1,646 kB in 8s (204 kB/s)
AppStream cache update completed, but some metadata was ignored due to errors.
Reading package lists... Done

The following command upgrades the installed packages.

pandian@kunthi:~$ sudo apt-get dist-upgrade
...
...
355 upgraded, 5 newly installed, 0 to remove and 0 not upgraded.
Need to get 295 MB/465 MB of archives.
After this operation, 279 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
...
...

<This step is time-consuming. Take a break.>

Installing JDK

Following http://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04, install the JDK with the commands below.

pandian@kunthi:~$ sudo apt-add-repository ppa:webupd8team/java
pandian@kunthi:~$ sudo apt-get update
pandian@kunthi:~$ sudo apt-get install oracle-java8-installer
pandian@kunthi:~$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) Client VM (build 25.101-b13, mixed mode)
pandian@kunthi:~$ whereis java
java: /usr/bin/java /usr/share/java /usr/share/man/man1/java.1.gz
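If a later script needs to guard on the Java major version, it can be parsed out of the `java -version` output. A minimal sketch, using the version string captured above as sample input:

```shell
# Extract "1.8" from a java-version line (sample string taken from the session above)
v='java version "1.8.0_101"'
echo "$v" | sed -n 's/.*"\([0-9]*\.[0-9]*\)\..*/\1/p'
```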

Create User and User Group

Let’s run Hadoop with its own user and user group.

pandian@kunthi:~$ sudo groupadd -g 599 hadoop
pandian@kunthi:~$ sudo useradd -u 599 -g 599 hadoop

Directory structure

Let’s create the directory structure

pandian@kunthi:~$ sudo mkdir -p /var/lib/hadoop/journaldata
pandian@kunthi:~$ sudo chown hadoop:hadoop -R /var/lib/hadoop/journaldata

User access and sudo privilege

We are still doing Linux housekeeping; we haven’t touched the Hadoop part yet.

pandian@kunthi:~$ sudo passwd hadoop
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
pandian@kunthi:/opt/software/hadoop$ sudo su
root@kunthi:/home/pandian# cp /etc/sudoers /etc/sudoers.20160820
root@kunthi:~# vi /etc/sudoers

I added the hadoop entry to the user privilege specification:

# User privilege specification
root ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL

root@kunthi:~# cd /opt

root@kunthi:~# wget http://download.nus.edu.sg/mirror/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz

root@kunthi:~# gunzip hadoop-2.6.4.tar.gz

root@kunthi:~# tar -xvf hadoop-2.6.4.tar.gz
root@gandhari:/opt# ln -s /opt/hadoop-2.6.4 hadoop
root@gandhari:/opt# chown hadoop:hadoop hadoop
root@gandhari:/opt# chown hadoop:hadoop -R hadoop-2.6.4
root@gandhari:/opt# usermod -d /opt/hadoop hadoop
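The version-suffixed directory plus a `hadoop` symlink makes a future upgrade a one-line switch: extract the new release and repoint the link. A throwaway sketch of the same pattern (scratch paths, not the real /opt):

```shell
# Demonstrate the versioned-dir + symlink pattern in a scratch directory
base=$(mktemp -d)
mkdir "$base/hadoop-2.6.4"
ln -s "$base/hadoop-2.6.4" "$base/hadoop"
readlink "$base/hadoop"      # prints the versioned path the link resolves to
rm -rf "$base"
```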

root@kunthi:~# exit
pandian@kunthi:~$ su - hadoop
$ pwd
/opt/hadoop
$ bash
hadoop@kunthi:~$ id
uid=1001(hadoop) gid=599(hadoop) groups=599(hadoop)

Hadoop

Let’s create the configuration directory for Hadoop.
hadoop@kunthi:~$ sudo mkdir -p /etc/hadoop/conf
Create softlinks to the config files under /etc/hadoop/conf
hadoop@kunthi:~$ sudo ln -s /opt/hadoop/hadoop-2.6.4/etc/hadoop/** /etc/hadoop/conf/

SSH Keys creation

Hadoop needs key-based SSH login. Create the .ssh directory and key files first.
hadoop@kunthi:~$ mkdir ~/.ssh
hadoop@kunthi:~$ cd ~/.ssh/
hadoop@kunthi:~/.ssh$ touch authorized_keys
hadoop@kunthi:~/.ssh$ touch known_hosts
hadoop@kunthi:~/.ssh$ chmod 700 ~/.ssh/ && chmod 600 ~/.ssh/*
hadoop@gandhari:/opt/hadoop-2.6.4$ ssh gandhari
The authenticity of host 'gandhari (192.168.0.169)' can't be established.
ECDSA key fingerprint is SHA256:Y/ed5Le/5xqY1ImoVZBsSF7irydJRUn2TNwPBow4uSA.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'gandhari,192.168.0.169' (ECDSA) to the list of known hosts.
hadoop@gandhari's password:
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-34-generic x86_64)
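sshd typically refuses key-based logins when ~/.ssh is group- or world-accessible, which is why the 700/600 permissions above matter. A quick check of the same scheme on throwaway files:

```shell
# Reproduce the .ssh permission scheme on a scratch dir and read it back
d=$(mktemp -d)
mkdir "$d/.ssh"
touch "$d/.ssh/authorized_keys"
chmod 700 "$d/.ssh" && chmod 600 "$d/.ssh/authorized_keys"
stat -c '%a' "$d/.ssh" "$d/.ssh/authorized_keys"   # prints 700 then 600
rm -rf "$d"
```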

Bash profile – Environmental variables

Since I created the Unix user’s home folder manually, it has no bash profile. I’ll copy a working one from another user.
hadoop@kunthi:~$ sudo cp /home/pandian/.bash* .
Then I added the following environment variables to .bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export HADOOP_HOME=/opt/hadoop/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_PREFIX=$HADOOP_HOME
export JAVA_HOME HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME PATH HADOOP_LOG_DIR

Let’s apply the changes to the current session
hadoop@kunthi:~$ source ~/.bashrc
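It is worth confirming that the PATH additions expanded the way you expect before moving on. A minimal sketch (the values are the ones assumed in the .bashrc above):

```shell
# Verify the hadoop bin dir landed on PATH (values taken from the .bashrc above)
HADOOP_HOME=/opt/hadoop/hadoop
JAVA_HOME=/usr/lib/jvm/java-8-oracle/
PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin on PATH" ;;
  *) echo "missing" ;;
esac
```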

Hadoop env config

Let’s specify JAVA_HOME
hadoop@kunthi:~/hadoop/etc/hadoop$ cd $HADOOP_HOME/etc/hadoop/
hadoop@kunthi:~/hadoop/etc/hadoop$ cp hadoop-env.sh hadoop-env.sh.20160821

I made the following changes to hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/

Setup passwordless ssh login

hadoop@kunthi:~/hadoop$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/hadoop/.ssh/id_rsa):
Your identification has been saved in /opt/hadoop/.ssh/id_rsa.
Your public key has been saved in /opt/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UXGO3tnfK9K8DayD0/jc+T/WgZetCHOuBAcssUw3gBo hadoop@kunthi
The key's randomart image is:
+---[RSA 2048]----+
| .+.o o.. |
| E .o = o + |
| o + + . . |
| . . + . o |
| S o o o o|
| oo o. =o|
| ==o+..=|
| =.+=+=oo|
| +=o+=++|
+----[SHA256]-----+
hadoop@kunthi:~/hadoop$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
hadoop@kunthi:~/hadoop$ sudo /etc/init.d/ssh restart
[ ok ] Restarting ssh (via systemctl): ssh.service.
hadoop@kunthi:~/hadoop$ ssh hadoop@gandhari
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-34-generic x86_64)

Temp folders for Hadoop

hadoop@gandhari:/opt/hadoop-2.6.4$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/name
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/data
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/dfs/namesecondary
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo mkdir -p /var/lib/hadoop/cache/hadoop/mapred/local
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop /var/lib/hadoop/cache
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chmod 750 /var/lib/hadoop/cache/
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/name
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/data
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/dfs/namesecondary/
hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop /var/lib/hadoop/cache/hadoop/mapred/local/

hadoop@gandhari:/opt/hadoop-2.6.4$ sudo chown hadoop:hadoop -R /etc/hadoop/
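The whole tree can also be created in one loop. A sketch against a scratch root (substitute /var/lib/hadoop/cache for the real run):

```shell
# Build the same dfs/mapred tree under a scratch root in one pass
root=$(mktemp -d)   # stand-in for /var/lib/hadoop/cache
for d in dfs/name dfs/data dfs/namesecondary mapred/local; do
  mkdir -p "$root/hadoop/$d"
done
find "$root" -mindepth 1 -type d | sort   # list what was created
rm -rf "$root"
```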

Define the slave name

Add the slave hostname to the slaves file. In pseudo-distributed mode it is the same as the local hostname.
hadoop@kunthi:~/hadoop$ cat /etc/hadoop/conf/slaves
kunthi

core-site.xml

Make the appropriate changes to core-site.xml

hadoop@kunthi:~/hadoop$ cat etc/hadoop/core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://gandhari:9000</value>
        </property>
</configuration>
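To double-check the value the daemons will pick up, you can grep it back out of the file. A minimal sketch against a scratch copy of the config above (point `$f` at the real core-site.xml instead):

```shell
# Pull fs.defaultFS back out of a core-site.xml (scratch copy for illustration)
f=$(mktemp)
cat > "$f" <<'EOF'
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://gandhari:9000</value>
        </property>
</configuration>
EOF
grep -A1 '<name>fs.defaultFS</name>' "$f" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
rm -f "$f"
```

Once the environment is in place, `hdfs getconf -confKey fs.defaultFS` reports the effective value directly.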

hadoop executable

Check whether the hadoop command works. It is located in the $HADOOP_HOME/bin folder.

hadoop@kunthi:~$ cd $HADOOP_HOME
hadoop@kunthi:~/hadoop$ hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /opt/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar

hdfs-site.xml

hadoop@kunthi:~/hadoop$ cp etc/hadoop/hdfs-site.xml etc/hadoop/hdfs-site.xml.20160820

I made the following changes

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/var/lib/hadoop/cache/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/var/lib/hadoop/cache/hadoop/dfs/data</value>
        </property>
</configuration>
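One note: `dfs.name.dir` and `dfs.data.dir` are the old Hadoop 1.x property names; Hadoop 2.x still honors them but logs deprecation warnings. The current equivalents would be:

```xml
<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/var/lib/hadoop/cache/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/var/lib/hadoop/cache/hadoop/dfs/data</value>
        </property>
</configuration>
```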

Formatting and starting the namenode

hadoop@kunthi:~/hadoop$ hadoop namenode -format
.....
16/08/20 09:15:09 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = gandhari/192.168.0.169
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.4
....
16/08/20 09:15:10 INFO common.Storage: Storage directory /var/lib/hadoop/cache/hadoop/dfs/name has been successfully formatted.
....
16/08/20 09:15:10 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gandhari/192.168.0.169
************************************************************/
hadoop@kunthi:~/hadoop/sbin$ start-dfs.sh
hadoop@kunthi:~/hadoop/sbin$ start-yarn.sh
hadoop@kunthi:~/hadoop/sbin$ jps
6290 DataNode
6707 NodeManager
6599 ResourceManager
6459 SecondaryNameNode
6155 NameNode
7003 Jps
hadoop@kunthi:~/hadoop/sbin$ ./mr-jobhistory-daemon.sh start historyserver
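With five daemons expected, a one-liner can confirm they are all up. A sketch run against the jps listing captured above:

```shell
# Count Hadoop daemons in a jps listing (sample pasted from the session above)
jps_out='6290 DataNode
6707 NodeManager
6599 ResourceManager
6459 SecondaryNameNode
6155 NameNode
7003 Jps'
printf '%s\n' "$jps_out" | grep -cE 'NameNode|DataNode|NodeManager|ResourceManager'   # 5
```

On the live box, pipe `jps` itself into the grep; anything less than 5 means a daemon did not start, so check the logs under $HADOOP_LOG_DIR.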

Access the Job History server, NameNode, DataNode, and YARN web UIs from your browser as listed below.

Job History: http://gandhari:19888/


Name Node: http://gandhari:50070/


Data Node: http://gandhari:50075/


All Applications (YARN): http://gandhari:8088/cluster
