Pig Installation and Configuration

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming away from the Java MapReduce idiom into a higher-level notation, much as SQL does for relational databases. Pig Latin can be extended with User Defined Functions (UDFs), which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

This post continues my Hadoop series and follows the folder structure created in the earlier posts. Please refer to those posts first.

Download and Extract

hadoop@gandhari:~$ wget http://download.nus.edu.sg/mirror/apache/pig/pig-0.16.0/pig-0.16.0.tar.gz
hadoop@gandhari:~$ gunzip pig-0.16.0.tar.gz
hadoop@gandhari:~$ tar -xvf pig-0.16.0.tar
hadoop@gandhari:~$ ln -s pig-0.16.0 pig
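
The separate gunzip and tar steps above can also be combined, since tar's -z flag decompresses gzip on the fly. Below is a minimal sketch, assuming the tarball is already in the current directory and unpacks into a directory named after the file (as pig-0.16.0.tar.gz does). The function name install_pig is a hypothetical helper of mine, not something shipped with Pig.

```shell
# Hypothetical helper: extract a Pig tarball and point a stable "pig" symlink at it.
# Usage: install_pig pig-0.16.0.tar.gz
install_pig() {
    tarball="$1"
    # Assumption: the archive unpacks into a directory named like the
    # tarball without its .tar.gz suffix.
    dir="$(basename "$tarball" .tar.gz)"
    # -x extract, -z gunzip on the fly, -f read from the named file.
    tar -xzf "$tarball" || return 1
    # -s symbolic link, -f replace an existing link,
    # -n treat an existing "pig" link as a file instead of following it.
    ln -sfn "$dir" pig
}
```

Because the symlink is replaced rather than appended to, rerunning the helper with a newer tarball moves the pig link to the new version without touching PIG_HOME.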

.bashrc Changes

Add the following environment variables to your ~/.bashrc, then reload it with source ~/.bashrc.

#PIG VARIABLES
export PIG_CONF_DIR=/etc/hadoop/conf
export PIG_CLASSPATH=/etc/hadoop/conf
export PIG_HOME=/opt/hadoop/pig
export PATH=$PATH:/opt/hadoop/pig/bin
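
To confirm that the variables took effect in the current shell, something like the following could be used. This is only a sketch: check_pig_env is a hypothetical function name, and it checks nothing beyond the three variables and the PATH entry listed above.

```shell
# Hypothetical check: confirm the Pig variables from ~/.bashrc are visible in this shell.
check_pig_env() {
    status=0
    for name in PIG_HOME PIG_CONF_DIR PIG_CLASSPATH; do
        # eval reads the variable whose name is stored in $name
        # (a portable form of indirect lookup).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    # PATH must contain the bin directory so the pig launcher resolves.
    case ":$PATH:" in
        *":${PIG_HOME:-/nonexistent}/bin:"*) ;;
        *) echo "missing: \$PIG_HOME/bin on PATH"; status=1 ;;
    esac
    return $status
}
```

After sourcing ~/.bashrc, running check_pig_env && pig -version should print the Pig version if everything is wired up.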

Sample Pig Job

Let’s copy a file into HDFS using hadoop and read it with Pig.

hadoop@gandhari:~$ ls -alt>test.txt
hadoop@gandhari:~$ hadoop fs -mkdir /pigdata
hadoop@gandhari:~$ hadoop fs -put test.txt /pigdata

Let’s ensure the Hadoop daemons are running.

hadoop@gandhari:~$ jps
7156 ResourceManager
6788 DataNode
6998 SecondaryNameNode
7846 Jps
7276 NodeManager
6671 NameNode
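
Reading the jps listing by eye works, but the check can also be scripted. Here is a minimal sketch, assuming a single-node setup where all five daemons from the listing above run on the same machine. check_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: read `jps` output on stdin and report missing Hadoop daemons.
# The daemon list mirrors the processes shown in the listing above.
check_daemons() {
    listing="$(cat)"
    status=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <class name>"; compare the second field exactly
        # so that "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "not running: $daemon"
            status=1
        fi
    done
    return $status
}
```

Usage: jps | check_daemons. The function prints nothing and returns 0 when every daemon is present.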

hadoop@gandhari:~$ pig

grunt> a = LOAD '/pigdata/test.txt' as (name:chararray);

grunt> dump a;

2016-08-24 14:01:27,845 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(total 452532)
(drwxr-xr-x 15 hadoop hadoop      4096 Aug 24 06:00 .)
(-rw-rw-r--  1 hadoop hadoop         0 Aug 24 06:00 text.txt)
(drwxrwxr-x  3 hadoop hadoop      4096 Aug 24 05:58 logs)
(-rw-rw-r--  1 hadoop hadoop       139 Aug 23 16:09 .hivehistory)
(-rw-r--r--  1 hadoop hadoop      4567 Aug 23 15:59 .bashrc)
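
The same two statements can also be run non-interactively by saving them to a script file and passing it to pig. This is a sketch under two assumptions: the file sits at /pigdata/test.txt, where the hadoop fs -put step placed it, and the script name read_test.pig is an arbitrary choice of mine.

```shell
# Write the two Grunt statements into a script file (the name is arbitrary).
cat > read_test.pig <<'EOF'
-- Load each line of the uploaded listing as a single chararray field.
-- The default loader splits on tabs; ls output has none, so the whole
-- line lands in the first field.
a = LOAD '/pigdata/test.txt' AS (name:chararray);
DUMP a;
EOF

# Run the script only if the pig launcher is on PATH.
if command -v pig >/dev/null 2>&1; then
    pig read_test.pig
else
    echo "pig not found on PATH; check the .bashrc changes"
fi
```

Batch mode is handy once a job grows past a couple of statements, since the script can be kept under version control and rerun as-is.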

 

 

 
