After the 5 Vs, here is another important and interesting concept in the Big Data paradigm: the components of the Hadoop ecosystem.
Inserting your data:
Sqoop/Flume – These tools are responsible for ingesting data into the file system from various sources: Sqoop handles bulk transfers from relational databases, and Flume collects streaming log and event data.
HDFS – The Hadoop Distributed File System, which splits huge volumes of data into blocks (128 MB each by default in recent Hadoop versions) and stores replicated copies of them across multiple nodes or servers.
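HBase – This complements HDFS where HDFS falls short. HDFS is built for large, sequential, write-once access, while HBase, a column-oriented NoSQL database that runs on top of it, offers random, real-time reads and updates.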
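MapReduce / YARN – MapReduce is the programming model and set of APIs used to process the data in parallel and collate it into the desired result. YARN is the resource manager that schedules those jobs across the cluster.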
HCatalog – This is the ‘directory’ (table and metadata) service for data stored in HDFS. It lets tools such as Hive, Pig, and MapReduce find and read the data on the data nodes through a shared schema, which helps us standardize data access.
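Hive/Pig – Analytics tools driven by scripting: Hive uses a SQL-like language (HiveQL), and Pig uses a data-flow language (Pig Latin).
Oozie – This is used to define and schedule workflows of Hadoop jobs.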
Ambari – This is used to provision, manage, and monitor the different components of the Hadoop ecosystem so that they operate as a coherent whole.
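To make the MapReduce idea from the list above a little more concrete, here is a minimal sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process, whereas a real cluster would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical in-process driver)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

Each line is mapped independently of the others, which is what allows a framework to run the map step on many nodes at once. Only the shuffle step needs to bring together values that share a key.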
Let’s talk about each one of them in detail later, if possible!