Hadoop EcoSystem

Here comes another important concept after the 5 Vs: the Hadoop ecosystem, yet another interesting part of the Big Data paradigm.

Inserting your data:

Sqoop/Flume – These tools bring data into the file system from various sources. Sqoop transfers bulk data between relational databases and Hadoop, while Flume collects streaming data such as logs.


HDFS – The Hadoop Distributed File System, which splits huge volumes of data into blocks (128 MB by default in Hadoop 2 and later) and stores replicated copies of them across multiple nodes or servers.

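HBase – This complements HDFS where HDFS falls short. HDFS is built for large, sequential batch reads and writes; HBase is a column-oriented NoSQL store on top of it that offers random, real-time reads and updates.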

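MapReduce / YARN – MapReduce is the programming model and set of APIs used to process the data in parallel and aggregate it into the desired result. YARN is the resource manager that schedules those jobs across the cluster.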

HCatalog – This is the table and storage management layer for Hadoop, acting like a ‘directory’ for the data in HDFS. It exposes a shared schema and table view of the data, so tools such as Hive, Pig and MapReduce can access it in a standard way.

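Hive/Pig – Analytics tools with scripting. Hive offers a SQL-like query language (HiveQL), and Pig offers a data-flow scripting language (Pig Latin).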


Oozie – This is used to define and schedule workflows of Hadoop jobs.

Ambari – This is used to provision, manage and monitor the different components of the Hadoop ecosystem so that they work as a coherent whole.
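
To make the MapReduce idea from the list above a little more concrete, here is a minimal sketch of the map, shuffle and reduce steps as a word count. It is plain Python running on one machine, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would run these steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values by key, which the framework
    normally does between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas"]))
```

In Hadoop, each map task would typically read one block of a file from HDFS, and YARN would decide which nodes run the map and reduce tasks.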

Let’s talk about each one of them in detail later, if possible!
