5 Vs of Big Data

Big Data is entering the market at full speed. When I started looking at it a few years ago, it had begun to show its ability to handle enterprise data with the introduction of YARN and other ecosystem products. We had a meeting with one of our existing customers, who runs a typical ERP and MS products. When they say they have finished exploring Big Data and have decided to implement it, it shows that Big Data will soon become a generic skill set.

Having said that, I was amazed to realize how Hadoop handles the 5 Vs of Big Data. Before discussing how Hadoop manages those Vs, let's look at what each of them means.

1. Volume – We are talking about data that is huge in size, from terabytes (TB) up to zettabytes (ZB). For a small-scale enterprise, implementing such a complex system may be too much effort.

2. Variety – RDBMSs exist to handle structured data. Here we are talking about a variety of data from different sources in different formats. It may be XML from RSS feeds, XLS files, CSV files from real-time market data, etc.

3. Velocity – This is the speed at which we do the data analytics. As a simple example, consider a data analytics engine that processes real-time market data at high speed.

4. Veracity/Verification – Facing bad data during the ETL process is common. Either the data does not arrive from the expected source at the expected time, or the data received does not satisfy our constraints. In my earlier days of ETL, I used to add many conditions to the staging tables so that my aggregation process would run as expected. Later I realized I was omitting a lot of data during ETL, because those rows failed my constraints. My coworker, who is an Oracle expert, used to advise inserting as much data as possible into the ETL tables. If the aggregation process fails, we can fine-tune it so that we don't miss any data. Let's see how Big Data handles this.

5. Value – Okay, we have data. But how does it make sense to me or my customer? I may have 100 TB of unused data blocking my data center space, and yet have no room to accommodate 10 TB of business-critical data. How am I going to handle these situations?
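To make the Veracity point a little more concrete, here is a minimal sketch in Python of the "load everything, validate later" idea my coworker suggested. The feed, its column names and the helper functions are all hypothetical, made up only for illustration; this is not how Hadoop or any particular ETL tool does it.

```python
import csv
import io

# Hypothetical raw feed: some rows are incomplete or malformed.
RAW_FEED = """symbol,price,volume
ABC,101.5,2000
XYZ,,1500
DEF,abc,300
GHI,55.0,
"""


def load_staging(text):
    """Load every row into staging without rejecting anything."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_price(row):
    """Return the price as a float, or None if it is missing or malformed."""
    try:
        return float(row["price"])
    except (TypeError, ValueError):
        return None


def aggregate(staging_rows):
    """Average the valid prices and keep the rejected rows for later review."""
    valid, rejected = [], []
    for row in staging_rows:
        price = parse_price(row)
        if price is None:
            rejected.append(row)
        else:
            valid.append(price)
    average = sum(valid) / len(valid) if valid else None
    return average, rejected


staging = load_staging(RAW_FEED)
average_price, rejected_rows = aggregate(staging)
print(len(staging), average_price, len(rejected_rows))
```

Nothing is thrown away at load time here. The bad rows are set aside during aggregation, so they are still available if we later decide to fine-tune the rules.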

Let’s discuss these in future posts under the 5vs tag.

I’ll meet you in another post.

Happy Independence Day!
