Hadoop Ecosystem

After the 5 Vs, here is another important and interesting concept in the Big Data paradigm: the Hadoop ecosystem, the set of tools that work together to ingest, store, process, and manage data.

Inserting your data:

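Sqoop/Flume – These tools bring data into the file system from various sources. Sqoop transfers bulk data between relational databases and Hadoop, while Flume collects and moves streaming data such as logs and events.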

Storing and processing your data:

HDFS – The Hadoop Distributed File System, which splits huge files into blocks and stores those blocks, with replicas, across multiple nodes or servers.
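To make the idea of blocks spread across nodes concrete, here is a minimal sketch in Python. It is not how HDFS is implemented: the 128 MB block size and replication factor of 3 are assumed values (both are configurable per cluster), the node names are made up, and the round-robin placement is a simplification of the real rack-aware placement policy. The function name plan_blocks is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster
REPLICATION = 3      # assumed replication factor; also configurable


def plan_blocks(file_size_mb, nodes):
    """Split a file into blocks and assign each block to REPLICATION nodes.

    Placement here is plain round-robin, which is a simplification:
    real HDFS placement also takes racks and node load into account.
    Returns a list of (block_index, size_mb, replica_nodes) tuples.
    """
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    plan = []
    for i in range(n_blocks):
        # The last block only holds whatever is left of the file.
        size = min(BLOCK_SIZE_MB, file_size_mb - i * BLOCK_SIZE_MB)
        replicas = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
        plan.append((i, size, replicas))
    return plan


if __name__ == "__main__":
    # Hypothetical 300 MB file on a hypothetical four-node cluster.
    for block in plan_blocks(300, ["node1", "node2", "node3", "node4"]):
        print(block)
```

Because every block lives on several nodes, losing one node does not lose the data, and different nodes can work on different blocks of the same file in parallel.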

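HBase – This complements HDFS where HDFS has handicaps. HDFS is built for large, sequential, write-once reads; HBase is a column-oriented NoSQL store on top of it that offers random, real-time reads and updates.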

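MapReduce / YARN – MapReduce is the programming model and set of APIs used to process the data in parallel and collate it into the desired result. YARN is the resource manager that schedules those jobs and allocates cluster resources to them.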
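The MapReduce idea can be sketched in a few lines of plain Python using the classic word-count example. This is only an in-memory illustration of the map, shuffle, and reduce steps, not the Hadoop API; the function names are hypothetical, and a real job would run these steps distributed across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return (key, sum(values))


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map step runs on the nodes that hold the data blocks, and only the grouped intermediate pairs travel over the network to the reducers.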

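HCatalog – This is the ‘Directory’ service for data in HDFS: a table and metadata layer that records where data sets live and what their schema is. It standardizes data access, so tools such as Hive, Pig, and MapReduce can read the same data without each needing to know its location or format.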

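Hive/Pig – Analytics tools with scripting. Hive offers a SQL-like query language (HiveQL), and Pig offers a data-flow scripting language (Pig Latin).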

Wiring:

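Oozie – This is used to create and schedule workflows, chaining jobs (for example Sqoop, MapReduce, Hive, and Pig jobs) so they run in the right order.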

Ambari – This is used to provision, manage, and monitor the different components of the Hadoop ecosystem so that they form a coherent operation.

Let’s talk about each one of them in detail later, if possible!
