# Apache Pig – Operators and Functions with Examples

Here you go! Click the below given page for the examples

# Lab 24: Joins using Pig

Let’s make some joins today, dear hadoopers.

### Join

We have a table for employee.

```grunt> dump employee;
(1, Dharma,45, Sr Manager)
(2, Bheema,43, Cook)
(3, Arjuna,41, Instructor)
(4, Nakula,35, Jr Instructor)

We have another table salary as given below.

```grunt> dump salary;

(1,2000)
(2,1500)
(3,1000)
(4,500)
(5,500)```

Let’s join salary table with employee table on emp_id field.

```grunt> salary_table = JOIN employee by emp_id, salary by emp_id;

grunt> dump salary_table;
Input(s):

Output(s):
Successfully stored 5 records (16134806 bytes) in: "hdfs://gandhari:9000/tmp/temp-2101978857/tmp596113376"

Counters:
Total records written : 5
Total bytes written : 16134806
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

(1, Dharma,45, Sr Manager,1,2000)
(2, Bheema,43, Cook,2,1500)
(3, Arjuna,41, Instructor,3,1000)
(4, Nakula,35, Jr Instructor,4,500)

### Inner Join

The employee table looks like this.

```(1, dharma,45, Sr Manager)
(2, Bheema,43, Cook)
(3, Arjuna,41, Instructor)
(4, Nakula,35, Jr Instructor)

grunt> employee = load 'employee.csv' using PigStorage (',') as (emp_id:int,emp_name:chararray,emp_age:int,emp_desig:chararray);

grunt> describe employee;
employee: {emp_id: int,emp_name: chararray,emp_age: int,emp_desig: chararray}```

There is a change in salary table. No salary for Arjuna this month 🙂 Id 3 is missing in salary table.

```grunt> dump salary;
(1,2000)
(2,1500)
(4,500)
(5,500)

grunt> salary = load 'salary.csv' using PigStorage (',') as (emp_id:int,emp_salary:int);

grunt> describe salary;
salary: {emp_id: int,emp_salary: int}```

```grunt> employee_salary = join employee by emp_id, salary by emp_id;

grunt> describe employee_salary;
employee_salary: {employee::emp_id: int,employee::emp_name: chararray,employee::emp_age: int,employee::emp_desig: chararray,salary::emp_id: int,salary::emp_salary: int}

grunt> illustrate employee_salary;

--------------------------------------------------------------------------------------------------
| employee     | emp_id:int    | emp_name:chararray    | emp_age:int    | emp_desig:chararray    |
--------------------------------------------------------------------------------------------------
|              | 5             |  Sahadeva             | 33             |  Jr Instructor         |
|              | 5             |  Sahadeva             | 33             |  Jr Instructor         |
--------------------------------------------------------------------------------------------------
--------------------------------------------------
| salary     | emp_id:int    | emp_salary:int    |
--------------------------------------------------
|            | 5             | 500               |
|            | 5             | 500               |
--------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| employee_salary     | employee::emp_id:int    | employee::emp_name:chararray    | employee::emp_age:int    | employee::emp_desig:chararray    | salary::emp_id:int    | salary::emp_salary:int    |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                     | 5                       |  Sahadeva                       | 33                       |  Jr Instructor                   | 5                     | 500                       |
|                     | 5                       |  Sahadeva                       | 33                       |  Jr Instructor                   | 5                     | 500                       |
|                     | 5                       |  Sahadeva                       | 33                       |  Jr Instructor                   | 5                     | 500                       |
|                     | 5                       |  Sahadeva                       | 33                       |  Jr Instructor                   | 5                     | 500                       |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

grunt> dump employee_salary;

(1, dharma,45, Sr Manager,1,2000)
(2, Bheema,43, Cook,2,1500)
(4, Nakula,35, Jr Instructor,4,500)

### Left Outer Join

```grunt> left_outer = join employee by emp_id left outer, salary by emp_id;

grunt> describe left_outer;
left_outer: {employee::emp_id: int,employee::emp_name: chararray,employee::emp_age: int,employee::emp_desig: chararray,salary::emp_id: int,salary::emp_salary: int}

grunt> dump left_outer;

(1, dharma,45, Sr Manager,1,2000)
(2, Bheema,43, Cook,2,1500)
(3, Arjuna,41, Instructor,,)
(4, Nakula,35, Jr Instructor,4,500)

We get all the records from employee table with null value for those value is not given in salary table.

### Right Outer Join

```grunt> right_outer = join salary by emp_id right, employee by emp_id;

grunt> describe right_outer;
right_outer: {salary::emp_id: int,salary::emp_salary: int,employee::emp_id: int,employee::emp_name: chararray,employee::emp_age: int,employee::emp_desig: chararray}

grunt> dump right_outer;

(1,2000,1, dharma,45, Sr Manager)
(2,1500,2, Bheema,43, Cook)
(,,3, Arjuna,41, Instructor)
(4,500,4, Nakula,35, Jr Instructor)

We could see all the records from right table even if there is no match in the left table.

### Full outer join

Last topic of this blog post – Lets do a full outer join, to record from each table when there is at least a single match.

```grunt> full_outer = join employee by emp_id full outer, salary by emp_id;
grunt> describe full_outer;
full_outer: {employee::emp_id: int,employee::emp_name: chararray,employee::emp_age: int,employee::emp_desig: chararray,salary::emp_id: int,salary::emp_salary: int}

grunt> dump full_outer;

(1, dharma,45, Sr Manager,1,2000)
(2, Bheema,43, Cook,2,1500)
(3, Arjuna,41, Instructor,,)
(4, Nakula,35, Jr Instructor,4,500)

Can we join by two keys? yes. join employee by (emp_id, emp_name)!

### Cross

Lets do a cross to bring the X relationship between tables. To make it simpler, I have taken less data in our employee table.

```grunt> employee = load 'employee.csv' using PigStorage (',') as (emp_id:int,emp_name:chararray,emp_age:int,emp_desig:chararray);

grunt> dump employee;

(1, Dharma,45, Sr Manager)
(2, Bheema,43, Cook)

grunt> leave = load 'leave.csv' using PigStorage (',') as (emp_id:int,emp_leave_date:chararray);

grunt> dump leave;

(1,2016-10-11)
(2,2016-10-12)

grunt> cross_data = cross employee, leave;

grunt> describe cross_data;
cross_data: {employee::emp_id: int,employee::emp_name: chararray,employee::emp_age: int,employee::emp_desig: chararray,leave::emp_id: int,leave::emp_leave_date: chararray}

grunt> dump cross_data;

(2, Bheema,43, Cook,2,2016-10-12)
(2, Bheema,43, Cook,1,2016-10-11)
(1, Dharma,45, Sr Manager,2,2016-10-12)
(1, Dharma,45, Sr Manager,1,2016-10-11)```

See you in another interesting post.