Successful companies owe much of their survival to big data—those trillions of data points collected across their organizations that can be mined for useful insights. Big data helps their leaders improve their services or products. But, “big data,” said Pearl Zhu, author of the Digital Master book series, “is the starting point, not the end.”
To maximize its value, decision makers need to be aware of its challenges, also known as its five V’s. These are its sheer amount of volume, its exponential velocity, its variety, the need to verify it, and the issue on how to extract value from its content.
The 5 V’s of Big Data
Understanding the challenges of Big Data and knowing which BI tools to use to solve those issues can help you answer questions that were previously considered beyond your reach.
Volume
Volume refers to the colossal amount of data that inundates organizations. We’re well past the days when companies resourced their data internally and stored it in local servers. Companies of 15 years ago handled terabytes of data.
Today, data has grown to petabytes if not exabytes of bytes (that’s 1,000–1 million TB) that come from sources such as transaction processing systems, emails, social networks, customer databases, website lead captures, monitoring devices and mobile apps.
To handle all of this data, managers use data lakes and warehouses or data management systems. They store it on clouds or use service providers such as Google Cloud. And as global data grows from two zettabytes at the beginning of the decade to 181 zettabytes a day by 2025, even these may be insufficient.
An example of data volume
Walmart operates approximately 10,500 stores in 24 countries, handling more than 1 million customer transactions every hour. The result? Walmart imports more than 2.5 petabytes of data per hour, storing it internally on what happens to be the world’s biggest private cloud.
How software can help: Apache’s Hadoop splits big data into chunks, saving it across clusters of servers. Check out Cloudera Enterprise 4.0 with its high-availability features that makes Java-encoded Hadoop more secure.
Velocity
Big data grows fast. Consider that, according to Zettaspere, there are around 3,400,000 emails, 4,595 SMS, 740,741 WhatsApp messages, almost 69,000 Google searches, 55,000 Facebook posts, and 5,700 tweets made per minute.
Around five years ago, data scientists measured incoming data with computerized batch processing that read large files and generated reports. Today, batch processes are unable to handle the continuous rush of real-time data from a growing number of sources.
More critical still, data ages fast. As Walmart’s former senior statistical analyst Naveen Peddamail said, “If you can’t get insights until you’ve analyzed your sales for a week or a month, then you’ve lost sales within that time.”
Competitive companies need some capable business intelligence (BI) tools to make timely decisions.
An example of data velocity
Using real-time alerting, Walmart sales analysts noted that a particular, rather popular, Halloween novelty cookie was not selling in two stores. A quick investigation showed that, due to a stocking oversight, those cookies hadn’t been put on the shelves. By receiving automated alerts, Walmart was quickly able to rectify the situation and save its sales.
How software can help: Splunk Enterprise monitors operational flows in real time, helping business leaders make timely decisions. That said, it’s expensive for large data volumes.
Variety
Variety refers to the different types of digitized data that inundate organizations and how to process and mine these various types of data for insights. At one time, organizations mostly gained their information from structured data that fit into internal databases like Excel.
Today, you also have unstructured information that evades management and comes in diverse forms such as emails, customer comments, SMS, social media posts, sensor data, audio, images, and video. Companies struggle with digesting, processing, and analyzing this type of data and doing so in real time.
An example of data variety
Walmart tracks each one of its 145 million American consumers individually, resulting in accrued data per hour that’s equivalent to 167 times the books in America’s Library of Congress. Most of that is unstructured data that comes from its videos, tweets, Facebook posts, call-center conversations, closed-circuit TV footage, mobile phone calls and texts, and website clicks.
How software can help: Walmart uses a 250-node Hadoop. For small to midsize companies, Tableau is ideal since it is also designed for non-technical users.
Veracity
Veracity is arguably the most important factor of all the five Vs because it serves as the premise for business success. You can only generate business profit and impact change with thorough and correct information.
Data can only help organizations if it’s clean. That’s if it’s accurate, error-free, reliable, consistent, bias-free, and complete. Contaminating factors include:
- Statistical data that misrepresents the information of a particular market
- Meaningless information that creeps into and distorts the data
- Outliers in the dataset that make it deviate from the normal behavior
- Bugs in software that produce distorted information
- Software vulnerabilities that could cause bad actors to hack into and hijack data
- Human agents that make mistakes in reading, processing, or analyzing data, resulting in incorrect information
Example of data veracity
According to Jaya Kolhatkar, vice president of global data for Walmart labs, Walmart’s priority is making sure its data is correct and of high quality. Clean data helps with privacy issues, ensuring sensitive details are encrypted while customer contact information is segregated.
How software can help: Multilingual and scalable Apache Spark is good for quick queries across data sizes. However, it’s expensive and has latency issues.
Value
Big data is the new competitive advantage. But, that’s only if you convert your information into useful insight.
Users can capture value from that data through:
- Making their enterprise information transparent for trust
- Making better management decisions by collecting more accurate and detailed performance information across their business
- Fine-tuning their products or services to narrowly segmented customers
- Minimizing risks and unearthing hidden insights
- Developing the next generation of products and services
Example of data value
Walmart uses its big data to make its pharmacies more efficient, help it improve store checkout, personalize its shopping experience, manage its supply chain, and optimize product assortment among other ends.
How software can help: Splunk enterprise helps businesses analyze data from different points of view and has advanced monitoring features that come at a price.
Using Big Data Management Tools to Optimize the 5 V’s
Savvy companies use robust big data management tools to maximize the value of their big data. Tools like Hadoop help companies store that massive data, clean it, and rapidly process it real time. Such data management tools also help leaders handle unstructured data and extract insights to benefit their companies.