Wednesday, August 13, 2014

Big Data - The king of Kings

There was a time when people used to say that the "customer is king". You may be puzzled why I say it in the past tense. Well, this is the golden era of data. You heard it right! The customer may or may not be the king, depending on how well the customer invests in your projects. The real king is data, and Big Data is the king of kings. In this article I shall give you an overview of Big Data.


Definition

Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.


Characteristics of Big Data:


Volume
• Data Volume
• Expected 44x increase from the year 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
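
To put the volume figures above in perspective, here is a small back-of-the-envelope calculation in Python. It only uses the two numbers quoted in the list (0.8 and 35 zettabytes); the idea of a constant yearly growth rate is my own simplifying assumption, not something the forecast states.

```python
# Figures quoted above: 0.8 ZB in 2009, 35 ZB expected in 2020.
start_zb, end_zb = 0.8, 35.0
start_year, end_year = 2009, 2020

growth_factor = end_zb / start_zb          # overall multiple over the whole period
years = end_year - start_year              # 11 years

# Assumption: the same growth rate applies every year (compound growth).
annual_rate = growth_factor ** (1 / years) - 1

print(f"Overall growth: {growth_factor:.2f}x")
print(f"Implied yearly growth: {annual_rate:.0%}")
```

The overall multiple works out to 43.75, which is where the rounded "44x" comes from. Under the constant-rate assumption, that corresponds to data growing by roughly 41% every year.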



Velocity
• Data is generated fast and needs to be processed fast
• New Online Data Analytics
• Late decisions --> missing opportunities
• Examples
– E-Promotions: based on your current location, your purchase history and what you like -> send promotions right now for the store next to you
– Healthcare monitoring: sensors monitoring your activities and body -> any abnormal measurement requires an immediate reaction
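
The healthcare monitoring example can be sketched in a few lines of Python. The point is that each reading is checked the moment it arrives instead of being stored for a later batch report. The heart-rate limits, the patient ids and the sample readings are all made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical safe range for a heart-rate sensor, in beats per minute.
LOW_BPM, HIGH_BPM = 40, 150


def abnormal_readings(stream: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield each (patient_id, bpm) reading outside the safe range as it arrives."""
    for patient_id, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield patient_id, bpm


# A made-up stream; in practice this would be an endless feed from sensors.
readings = [("p1", 72), ("p2", 190), ("p1", 75), ("p3", 35)]

alerts = list(abnormal_readings(readings))
for patient_id, bpm in alerts:
    print(f"ALERT: patient {patient_id} has heart rate {bpm}")
```

Because `abnormal_readings` is a generator, it never needs the whole data set in memory, which is the property that matters when data keeps arriving.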

Variety
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data

• A single application can be generating/collecting many types of data
• To extract knowledge, all these types of data need to be linked together / integrated
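
As a rough illustration of "linking together", the sketch below merges three differently shaped data sets into one profile per customer. The records, field names and the idea of joining on a customer id are assumptions made for the example; real integration work is usually far messier.

```python
from collections import defaultdict

# Made-up records from three sources, each in a different shape.
purchases = [{"customer": "c1", "amount": 40.0}, {"customer": "c2", "amount": 15.5}]
posts = [{"customer": "c1", "text": "Love the new store!"}]
locations = [("c1", "Chennai"), ("c2", "Mumbai")]

# One combined profile per customer, keyed by the shared customer id.
profiles = defaultdict(dict)

for p in purchases:
    profile = profiles[p["customer"]]
    profile["total_spent"] = profile.get("total_spent", 0.0) + p["amount"]

for post in posts:
    profiles[post["customer"]].setdefault("posts", []).append(post["text"])

for customer, city in locations:
    profiles[customer]["city"] = city

for customer, profile in sorted(profiles.items()):
    print(customer, profile)
```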



Big Data Customers:

Web and e-tailing

• Recommendation Engines
• Ad Targeting
• Search Quality
• Sentiment Analysis
• Abuse and Click Fraud Detection




Telecommunications

• Customer Churn Prevention
• Network Performance Optimization
• Call Detail Record (CDR) Analysis
• Analyzing Network to Predict Failure


Government

• Fraud Detection
• Cyber Security
• Welfare
• Justice


Healthcare & Life Sciences

• Health information exchange
• Gene sequencing
• Healthcare improvements
• Drug Safety



Big Data Myths

• Always means data in the terabyte range or above
• Is always about social media. Doesn't apply to me.
• Will replace Enterprise Data Warehouse
• Is just a buzzword. No practical applications!
• Is a new concept
• Is something for the future
• Is Expensive
• Is only for data scientists. Or is magic.
• We have enough hardware. Don't need any more.
• We will build it when we need it.
• Is all about Hadoop.

WHAT IS HADOOP?

• Named after a cute little yellow toy elephant
• Framework to handle Big Data
• Open Source - Apache
• Powerful, Popular & Supported
• For reliable, scalable, distributed computing
• Created by Doug Cutting (of Yahoo) and Mike Cafarella
• Built for the Nutch search engine project
• Written in Java

CORE OF HADOOP?

HDFS - Hadoop Distributed File System (Storage)

Stores multiple copies of file parts on multiple machines.
1. Distributed across “nodes”
2. Fault tolerant & high throughput
3. Runs on low-cost hardware
4. NameNode tracks block locations
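
To make the storage idea concrete, here is a toy model in Python. It is not how HDFS is implemented: `ToyCluster`, the 4-byte block size and the round-robin placement are all invented for illustration (real HDFS uses much larger blocks and its own placement policy). What it does show is the principle: a file is cut into blocks, each block is copied to several machines, and a NameNode-like table remembers where the copies live.

```python
from typing import Dict, List

BLOCK_SIZE = 4    # bytes; tiny on purpose so the example is easy to follow
REPLICATION = 3   # number of copies kept of every block


class ToyCluster:
    """Toy model: DataNodes hold blocks, the NameNode only records where they are."""

    def __init__(self, node_names: List[str]) -> None:
        self.datanodes: Dict[str, Dict[int, bytes]] = {n: {} for n in node_names}
        self.namenode: Dict[int, List[str]] = {}  # block id -> nodes holding a copy

    def write(self, data: bytes) -> None:
        names = list(self.datanodes)
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for block_id, block in enumerate(blocks):
            # Assumption: simple round-robin placement, not HDFS's real policy.
            targets = [names[(block_id + k) % len(names)] for k in range(REPLICATION)]
            for node in targets:
                self.datanodes[node][block_id] = block
            self.namenode[block_id] = targets

    def fail(self, node: str) -> None:
        """Simulate a machine dying: everything stored on it is lost."""
        del self.datanodes[node]

    def read(self) -> bytes:
        out = b""
        for block_id in sorted(self.namenode):
            # Read from the first replica that is still alive.
            alive = [n for n in self.namenode[block_id] if n in self.datanodes]
            out += self.datanodes[alive[0]][block_id]
        return out


cluster = ToyCluster(["node1", "node2", "node3", "node4"])
cluster.write(b"big data is the king of kings")
cluster.fail("node2")
print(cluster.read())
```

Even after `node2` is lost, the file can still be read in full, because every block has copies on other nodes. That is the fault tolerance the list above refers to.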

MapReduce engine (Processing)

Executes your logic on multiple computers in parallel.
1. Splits a task across processors / nodes “near” the data & assembles the results
2. Self-healing, high-bandwidth clustered storage
3. JobTracker manages the TaskTrackers
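
The classic way to explain the processing model is a word count. The sketch below imitates the three steps (map, shuffle, reduce) in plain Python. It does not use Hadoop at all: real Hadoop jobs are normally written in Java against Hadoop's own API, and here a pool of threads on one machine stands in for the separate nodes. The function names and input chunks are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


# Each chunk plays the role of a file part stored on a different node.
chunks = ["data is king", "big data is the king of kings", "big data"]

# Threads stand in for separate machines; each one maps its own chunk.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(map_phase, chunks))

word_counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
print(word_counts)
```

The map step never needs to see the whole input, only its own chunk. That is what lets the work be sent to wherever the data already sits, with the results assembled afterwards.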




Components






Conclusion


Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make the business more agile, and to answer questions that were previously considered beyond human reach. 

