Hive installation & configuration

After the Hadoop pseudo-distributed mode installation (second time), this is our next ICT task as part of my course: let's install and test Hive. This is a continuation of the Hadoop installation, so I'll follow the folder structures and usernames given in the previous post.

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. While developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.


Download and Install

hadoop@gandhari:/opt/hadoop-2.6.4$ wget http://download.nus.edu.sg/mirror/apache/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
hadoop@gandhari:/opt/hadoop-2.6.4$ gunzip apache-hive-2.1.0-bin.tar.gz
hadoop@gandhari:/opt/hadoop-2.6.4$ tar -xvf apache-hive-2.1.0-bin.tar
hadoop@gandhari:/opt/hadoop-2.6.4$ ln -s apache-hive-2.1.0-bin/ hive

Setup Environment – .bashrc changes

Add the following to the .bashrc file:
#HIVE VARIABLES
export HIVE_HOME=/opt/hadoop/apache-hive-2.1.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
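To pick the new variables up in the current session, the fragment above can be applied and sanity-checked like this (paths assumed from this install; adjust to your layout):

```shell
# Apply the Hive environment (paths from the .bashrc fragment above)
export HIVE_HOME=/opt/hadoop/apache-hive-2.1.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin

# Sanity check: conf dir derives from HIVE_HOME and bin/ is on PATH
echo "$HIVE_CONF_DIR"
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "hive bin on PATH" ;;
esac
```

Alternatively, run `source ~/.bashrc` after editing the file so the variables apply to your login shell.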

Setup Environment – Creating directory structure

hadoop@gandhari:~$ hadoop fs -mkdir /tmp
mkdir: Call From gandhari/192.168.0.169 to gandhari:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

DFS and YARN must be running before setting up Hive.

hadoop@gandhari:~$ start-dfs.sh
hadoop@gandhari:~$ start-yarn.sh
hadoop@gandhari:~$ hadoop fs -mkdir /tmp
mkdir: `/tmp': File exists
hadoop@gandhari:~$ hadoop fs -mkdir /user
hadoop@gandhari:~$ hadoop fs -mkdir /user/hive
hadoop@gandhari:~$ hadoop fs -mkdir /user/hive/warehouse
hadoop@gandhari:~$ hadoop fs -chmod g+w /tmp
hadoop@gandhari:~$ hadoop fs -chmod g+w /user/hive/warehouse
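The directory and permission steps above can be sketched as a small loop. This is a dry run that only prints the commands (note `-mkdir -p`, which avoids the "File exists" error seen above); remove the echo, or pipe the output to sh, once DFS and YARN are up:

```shell
# Dry run: collect the HDFS setup commands instead of executing them.
setup_cmds=""
for d in /tmp /user /user/hive /user/hive/warehouse; do
  setup_cmds="$setup_cmds
hadoop fs -mkdir -p $d"
done
# Group-writable dirs Hive needs for scratch space and the warehouse
for d in /tmp /user/hive/warehouse; do
  setup_cmds="$setup_cmds
hadoop fs -chmod g+w $d"
done
echo "$setup_cmds"
```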

Install MySQL Server

hadoop@gandhari:~$ sudo apt-get install mysql-server
hadoop@gandhari:~$ sudo /etc/init.d/mysql start
[ ok ] Starting mysql (via systemctl): mysql.service.
hadoop@gandhari:~$ sudo apt-get install mysql-client
hadoop@gandhari:~$ sudo apt-get install libmysql-java
hadoop@gandhari:~$ cp /usr/share/java/mysql.jar $HIVE_HOME
hadoop@gandhari:~$ cp /usr/share/java/mysql-connector-java-5.1.38.jar /opt/hadoop/hive/lib/
hadoop@gandhari:~$ /usr/bin/mysql_secure_installation

Creating the Hive metastore database

hadoop@gandhari:~/apache-hive-2.1.0-bin$ mysql -u root -p
Enter password:

mysql> CREATE DATABASE metastore;
Query OK, 1 row affected (0.00 sec)
mysql> USE metastore;
Database changed

mysql> SOURCE /opt/hadoop-2.6.4/hive/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql
mysql> CREATE USER 'hive'@'gandhari' IDENTIFIED BY 'P@ssw0rd';
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'gandhari';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'gandhari';
mysql> FLUSH PRIVILEGES;
mysql> GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON *.* TO 'hive' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'gandhari' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'192.168.0.169' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> FLUSH PRIVILEGES;

Grant all permissions to Hive user

mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'gandhari' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> FLUSH PRIVILEGES;
mysql> exit;
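The grants boil down to one statement per host the hive user may connect from. A small sketch that generates that SQL (hosts taken from this setup; adjust to yours before feeding it to mysql):

```shell
# Generate the metastore GRANT statements for each client host.
# Hosts are the ones used in this walkthrough; '%' matches any host.
grant_sql=""
for host in localhost 127.0.0.1 gandhari '%'; do
  grant_sql="$grant_sql
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'$host' IDENTIFIED BY 'P@ssw0rd';"
done
grant_sql="$grant_sql
FLUSH PRIVILEGES;"
echo "$grant_sql"
```

The output can be reviewed and then piped into `mysql -u root -p`.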

Creating Hive config file

hadoop@gandhari:~/hive/conf$ cp hive-default.xml.template hive-site.xml

hadoop@gandhari:~/hive/conf$ vi hive-site.xml

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>P@ssw0rd</value>
</property>

<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>

<property>
<name>hive.stats.autogather</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>

<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/operation_logs</value>
</property>
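As a quick sanity check that the edits landed, you can grep hive-site.xml for the properties Hive actually needs for the metastore connection. A self-contained sketch against a scratch copy (filename assumed; point the grep at your real conf/hive-site.xml):

```shell
# Write a minimal metastore config to a scratch file and verify the
# JDBC settings are present, as hive-site.xml should be after editing.
cat > /tmp/hive-site-check.xml <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
EOF
grep -q 'jdbc:mysql://localhost/metastore' /tmp/hive-site-check.xml \
  && echo "ConnectionURL OK"
```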

We'll be launching Hive shortly. Let's make sure the daemons are running:

hadoop@gandhari:~$ jps
7410 ResourceManager
6931 NameNode
7254 SecondaryNameNode
7046 DataNode
7527 NodeManager
7817 Jps
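The daemon check can be scripted: scan the jps listing for the five processes Hive depends on. A sketch, using the transcript above as sample input (in practice, pass it `"$(jps)"`):

```shell
# Print the names of any required Hadoop daemons missing from a jps listing.
check_daemons() {
  # $1: output of `jps`
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$1" | grep -q "$daemon" || echo "$daemon"
  done
}

# Sample transcript from above; in practice: check_daemons "$(jps)"
sample='7410 ResourceManager
6931 NameNode
7254 SecondaryNameNode
7046 DataNode
7527 NodeManager
7817 Jps'

missing=$(check_daemons "$sample")
[ -z "$missing" ] && echo "all daemons up"
```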

Creating a demo table and testing

Launch the Hive shell by running hive, then create a demo table and query it:

hive> CREATE TABLE demo1 (id int, name string);
OK
Time taken: 1.448 seconds
hive> SHOW TABLES;
OK
demo1
Time taken: 0.195 seconds, Fetched: 1 row(s)
hive> select count(*) from demo1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160823145925_c4271279-c5c0-4948-a1c3-fb6f79718b5d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-08-23 14:59:29,802 Stage-1 map = 0%,  reduce = 100%
Ended Job = job_local1827679072_0001
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
0
Time taken: 4.525 seconds, Fetched: 1 row(s)
hive>

I faced a few issues along the way.
