
Adding local libraries to Maven

How do you add user-defined .jar files or in-house libraries to the Maven repository so that your Maven project builds correctly? Here are the steps I took to add two jar files, c:\eg_agent.jar and c:\eg_util.jar, to the local Maven repository.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_agent.jar -DgroupId=com.eg -DartifactId=agent -Dversion=6.1.2 -Dpackaging=jar

Once it is added to the repository, I added the dependency as given below.

<dependency>
<groupId>com.eg</groupId>
<artifactId>agent</artifactId>
<version>6.1.2</version>
</dependency>

Here is the same procedure for the second jar.

F:\sts-bundle\apache-maven-3.3.9-bin\apache-maven-3.3.9\bin>mvn install:install-file -Dfile=c:\eg_util.jar -DgroupId=com.eg -DartifactId=util -Dversion=6.1.2 -Dpackaging=jar

<dependency>
<groupId>com.eg</groupId>
<artifactId>util</artifactId>
<version>6.1.2</version>
</dependency>
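
If both installs succeeded, the jars should appear in the local repository (by default %USERPROFILE%\.m2\repository on Windows, ~/.m2/repository on Linux), laid out by groupId/artifactId/version. A quick check, assuming the default repository location:

dir %USERPROFILE%\.m2\repository\com\eg\agent\6.1.2
dir %USERPROFILE%\.m2\repository\com\eg\util\6.1.2

A rebuild of the project (mvn clean package) should now resolve both dependencies without touching a remote repository.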

Ref http://stackoverflow.com/questions/29330577/maven-3-3-1-eclipse-dmaven-multimoduleprojectdirectory-system-propery-is-not-s


Oozie mkdistro fails with mvn: command not found

Oozie installation is not as straightforward as that of other applications. When I executed the mkdistro script, it failed with the error given below.

./mkdistro.sh
./mkdistro.sh: line 71: mvn: command not found

Maven is a build tool for Java, and it was not installed in my Ubuntu VM. Hence we need to install it using the command below to make the script work.

hadoop@gandhari:~/oozie/bin$ sudo apt-get install maven
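
Once Maven is installed, a quick sanity check is to confirm mvn is on the PATH and then re-run the distro build:

hadoop@gandhari:~/oozie/bin$ mvn -version
hadoop@gandhari:~/oozie/bin$ ./mkdistro.sh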

 


Sqoop Installation and Configuration


Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase. Exports can be used to put data from Hadoop into a relational database. Sqoop gets its name from SQL + Hadoop. It became a top-level Apache project in March 2012.

Download and Extract

hadoop@gandhari:~$ wget http://download.nus.edu.sg/mirror/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

hadoop@gandhari:~$ gunzip sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

hadoop@gandhari:~$ tar -xvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar

hadoop@gandhari:~$ ln -s sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop

.bashrc and other environment changes

Add the following environment variables to your ~/.bashrc, then reload it:

#SQOOP VARIABLES
export SQOOP_HOME=/opt/hadoop/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

hadoop@gandhari:~$ source ~/.bashrc
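
A quick check that the new variables are visible in the current shell:

hadoop@gandhari:~$ echo $SQOOP_HOME
hadoop@gandhari:~$ which sqoop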

Sqoop config

hadoop@gandhari:~$ cd sqoop
hadoop@gandhari:~/sqoop$ cd conf/
hadoop@gandhari:~/sqoop/conf$ ls
oraoop-site-template.xml  sqoop-env-template.sh    sqoop-site.xml
sqoop-env-template.cmd    sqoop-site-template.xml

hadoop@gandhari:~/sqoop/conf$ cp sqoop-env-template.sh sqoop-env.sh

Edit sqoop-env.sh and set the Hadoop home directories:

export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop

Copy the MySQL JDBC connector into Sqoop's lib directory so that MySQL imports and exports can find the driver:

hadoop@gandhari:~/sqoop/conf$ cp /usr/share/java/mysql-connector-java-5.1.38.jar /opt/hadoop/sqoop/lib/

Execution

hadoop@gandhari:~/sqoop/conf$ cd ..
hadoop@gandhari:~/sqoop$ sqoop-version
Warning: /opt/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/08/24 15:24:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
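
With the version check passing, a first import can be attempted. A minimal sketch, assuming a local MySQL database named testdb with an employees table (both hypothetical; substitute your own database, table and credentials):

# testdb, employees and /sqoopdata/employees are placeholders; adjust to your setup
sqoop import --connect jdbc:mysql://localhost/testdb --username hive -P --table employees --target-dir /sqoopdata/employees -m 1

The -m 1 flag runs a single mapper, which keeps the test simple on a pseudo-distributed cluster.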

 

 

 


Pig Installation and Configuration


Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

This is a continuation of my Hadoop series and follows the folder structure created earlier. Please refer to the previous posts in the series.

Download and Extract

hadoop@gandhari:~$ wget http://download.nus.edu.sg/mirror/apache/pig/pig-0.16.0/pig-0.16.0.tar.gz
hadoop@gandhari:~$ gunzip pig-0.16.0.tar.gz
hadoop@gandhari:~$ tar -xvf pig-0.16.0.tar
hadoop@gandhari:~$ ln -s pig-0.16.0 pig

.bashrc Changes

Add the following environment variables to your ~/.bashrc:

#PIG VARIABLES
export PIG_CONF_DIR=/etc/hadoop/conf
export PIG_CLASSPATH=/etc/hadoop/conf
export PIG_HOME=/opt/hadoop/pig
export PATH=$PATH:/opt/hadoop/pig/bin
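
Reload the shell configuration so the new variables take effect, and confirm Pig is on the PATH:

hadoop@gandhari:~$ source ~/.bashrc
hadoop@gandhari:~$ pig -version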

Sample Pig Job

Let's copy a file into HDFS and read it with Pig.

hadoop@gandhari:~$ ls -alt>test.txt
hadoop@gandhari:~$ hadoop fs -mkdir /pigdata
hadoop@gandhari:~$ hadoop fs -put test.txt /pigdata

Let's ensure the Hadoop daemons are running.

hadoop@gandhari:~$ jps
7156 ResourceManager
6788 DataNode
6998 SecondaryNameNode
7846 Jps
7276 NodeManager
6671 NameNode

hadoop@gandhari:~$ pig

grunt> a = LOAD '/pigdata/test.txt' as (name:chararray);

grunt> dump a;

2016-08-24 14:01:27,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(total 452532)
(drwxr-xr-x 15 hadoop hadoop      4096 Aug 24 06:00 .)
(-rw-rw-r--  1 hadoop hadoop         0 Aug 24 06:00 text.txt)
(drwxrwxr-x  3 hadoop hadoop      4096 Aug 24 05:58 logs)
(-rw-rw-r--  1 hadoop hadoop       139 Aug 23 16:09 .hivehistory)
(-rw-r--r--  1 hadoop hadoop      4567 Aug 23 15:59 .bashrc)
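
Beyond a plain dump, the loaded relation can be filtered and aggregated in Pig Latin. A small sketch using the single chararray field loaded above (the MATCHES pattern is only an example):

-- the pattern below is just an illustration; use whatever fits your data
grunt> b = FILTER a BY name MATCHES '.*hadoop.*';
grunt> c = GROUP b ALL;
grunt> d = FOREACH c GENERATE COUNT(b);
grunt> dump d;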

 

 

 


Hive installation & configuration

After the Hadoop pseudo-distributed mode installation (second time), this is our next task as part of my course. Let's install and test Hive. This is a continuation of the Hadoop installation, so I'll be following the folder structure and usernames given in the previous post.

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive that is included in Amazon Elastic MapReduce on Amazon Web Services.


Download and Install

hadoop@gandhari:/opt/hadoop-2.6.4$ wget http://download.nus.edu.sg/mirror/apache/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
hadoop@gandhari:/opt/hadoop-2.6.4$ gunzip apache-hive-2.1.0-bin.tar.gz
hadoop@gandhari:/opt/hadoop-2.6.4$ tar -xvf apache-hive-2.1.0-bin.tar
hadoop@gandhari:/opt/hadoop-2.6.4$ ln -s apache-hive-2.1.0-bin/ hive

Setup Environment – .bashrc changes

Add the following to your ~/.bashrc file:

#HIVE VARIABLES
export HIVE_HOME=/opt/hadoop/apache-hive-2.1.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin

Setup Environment – Creating directory structure

hadoop@gandhari:~$ hadoop fs -mkdir /tmp
mkdir: Call From gandhari/192.168.0.169 to gandhari:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

DFS and YARN must be running before setting up Hive.

hadoop@gandhari:~$ start-dfs.sh
hadoop@gandhari:~$ start-yarn.sh
hadoop@gandhari:~$ hadoop fs -mkdir /tmp
mkdir: `/tmp': File exists
hadoop@gandhari:~$ hadoop fs -mkdir /user
hadoop@gandhari:~$ hadoop fs -mkdir /user/hive
hadoop@gandhari:~$ hadoop fs -mkdir /user/hive/warehouse
hadoop@gandhari:~$ hadoop fs -chmod g+w /tmp
hadoop@gandhari:~$ hadoop fs -chmod g+w /user/hive/warehouse
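
A quick recursive listing confirms the warehouse directory exists with group write permission:

hadoop@gandhari:~$ hadoop fs -ls -R /user/hive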

Install MySQL Server

hadoop@gandhari:~$ sudo apt-get install mysql-server
hadoop@gandhari:~$ sudo /etc/init.d/mysql start
[ ok ] Starting mysql (via systemctl): mysql.service.
hadoop@gandhari:~$ sudo apt-get install mysql-client
hadoop@gandhari:~$ sudo apt-get install libmysql-java
hadoop@gandhari:~$ cp /usr/share/java/mysql.jar $HIVE_HOME
hadoop@gandhari:~$ cp /usr/share/java/mysql-connector-java-5.1.38.jar /opt/hadoop/hive/lib/
hadoop@gandhari:~$ /usr/bin/mysql_secure_installation

Creating the Hive database

hadoop@gandhari:~/apache-hive-2.1.0-bin$ mysql -u root -p
Enter password:

mysql> CREATE DATABASE metastore;
Query OK, 1 row affected (0.00 sec)
mysql> USE metastore;
Database changed

mysql> SOURCE /opt/hadoop-2.6.4/hive/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql
mysql> CREATE USER hive@gandhari IDENTIFIED BY 'P@ssw0rd';
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM hive@gandhari;
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO hive@gandhari;
mysql> FLUSH PRIVILEGES;
mysql> GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT SELECT,INSERT,UPDATE,SELECT ON *.* TO 'hive' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'gandhari' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'192.168.0.169' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON metastore.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON *.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON metastore.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON *.* TO '%'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON *.* TO '*'@'*' IDENTIFIED BY 'P@ssw0rd';
mysql> FLUSH PRIVILEGES;

Grant all permissions to the Hive user

mysql> GRANT ALL privileges ON metastore.* TO 'hive'@'127.0.0.1' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON metastore.* TO 'hive'@'gandhari' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL privileges ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'P@ssw0rd';
mysql> FLUSH PRIVILEGES;
mysql> exit;

Creating Hive config file

hadoop@gandhari:~/hive/conf$ cp hive-default.xml.template hive-site.xml

hadoop@gandhari:~/hive/conf$ vi hive-site.xml

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>P@ssw0rd</value>
</property>

<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>

<property>
<name>hive.stats.autogather</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>

<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/operation_logs</value>
</property>
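
With hive-site.xml in place, Hive 2.x can also initialize the metastore schema with its schematool, as an alternative to sourcing the SQL script by hand (optional; it reads the JDBC settings configured above):

# optional alternative to manually sourcing the schema script; uses hive-site.xml's JDBC settings
schematool -dbType mysql -initSchema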

We'll be launching Hive shortly. Let's make sure the daemons are running.

hadoop@gandhari:~$ jps
7410 ResourceManager
6931 NameNode
7254 SecondaryNameNode
7046 DataNode
7527 NodeManager
7817 Jps

Creating a demo table and testing it

hive> CREATE TABLE demo1 (id int, name string);
OK
Time taken: 1.448 seconds
hive> SHOW TABLES;
OK
demo1
Time taken: 0.195 seconds, Fetched: 1 row(s)
hive> select count(*) from demo1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160823145925_c4271279-c5c0-4948-a1c3-fb6f79718b5d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2016-08-23 14:59:29,802 Stage-1 map = 0%,  reduce = 100%
Ended Job = job_local1827679072_0001
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
0
Time taken: 4.525 seconds, Fetched: 1 row(s)
hive>
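
To confirm that writes work as well, a couple of rows can be inserted and read back. A small sanity check; the values are arbitrary:

-- the two rows below are arbitrary sample values
hive> INSERT INTO TABLE demo1 VALUES (1, 'alpha'), (2, 'beta');
hive> SELECT * FROM demo1;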

I faced the following issues:


Error while starting Hive – Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

I got this error while starting Hive for the first time. With reference to "java.net.URISyntaxException when starting HIVE" and the AdminManual Configuration guide, I made the following changes to fix it:


<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/operation_logs</value>
</property>


Error while Starting Hive – DatastoreDriverNotFoundException

Here is a scary exception thrown when I started Hive.

org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

I forgot to copy the MySQL driver to the Hive lib folder. Here is the command to copy it:

cp /usr/share/java/mysql-connector-java-5.1.38.jar /opt/hadoop/hive/lib/
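
A quick listing confirms the connector jar is now on Hive's classpath (the path assumes the layout used above):

hadoop@gandhari:~$ ls /opt/hadoop/hive/lib/ | grep mysql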