Every day, the world creates 2.5 quintillion bytes of data. So much, in fact, that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data has come to be known as Big Data.
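To put the daily figure above into more familiar units, here is a minimal arithmetic sketch. It assumes decimal (SI) units, where one exabyte is 10^18 bytes and one terabyte is 10^12 bytes, and it assumes the data is spread evenly over the day; the function name is purely illustrative.

```python
# Illustrative arithmetic only: converts the "2.5 quintillion bytes per day"
# figure quoted above into exabytes per day and terabytes per second.
BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_summary(bytes_per_day=BYTES_PER_DAY):
    """Return (exabytes per day, terabytes per second), using SI units."""
    exabytes_per_day = bytes_per_day / 10**18
    # Assumes an even spread across the day, which real traffic will not follow.
    terabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 10**12
    return exabytes_per_day, terabytes_per_second

exabytes, tb_per_second = daily_volume_summary()
print(f"{exabytes:.1f} EB per day, roughly {tb_per_second:.1f} TB per second")
```

Under these assumptions the figure works out to 2.5 exabytes per day, or a little under 29 terabytes every second.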
Big Data can be defined more precisely as: “Data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.” Big Data comprises two types of information. About 10% of Big Data is classified as structured data. This is the data that is already stored in databases across multiple networks. Unstructured data is of the most concern and accounts for the remaining 90% of Big Data. Unstructured information is “human information” such as emails, videos, tweets, Facebook posts, call-center conversations, closed-circuit TV footage, mobile phone calls and texts, and website clicks.
Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach.
When trying to capture and analyze unstructured information, the criteria are often grouped into four categories called the Four Vs. The Four Vs are:
1. Volume – Defined as the total number of bytes associated with the data. By other estimates, unstructured data accounts for 70-85% of the data in existence, and the overall volume of data is rising. Benefits of this category include:
- Turning 12 terabytes of Tweets created each day into improved product sentiment analysis
- Converting 350 billion annual meter readings to better predict power consumption
2. Velocity – Defined as the pace at which the data is to be consumed. As volumes rise, the value of individual data points tends to diminish more rapidly over time. Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. For example:
- Scrutinize 5 million trade events created each day to identify potential fraud
- Analyze 500 million daily call detail records in real time to predict customer churn faster
3. Variety – Defined as the complexity of the data in this class. This complexity defies traditional means of analysis.
Big data is any type of data – structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. With Variety you can:
- Monitor hundreds of live video feeds from surveillance cameras to target points of interest
- Exploit the 80% data growth in images, video and documents to improve customer satisfaction
4. Variability – Defined as the differing ways in which the data may be interpreted. Differing questions require differing interpretations.
One in three business leaders doesn’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
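The Velocity point above, acting on data as it streams in rather than after the fact, can be illustrated with a small sliding-window sketch. Everything here is an assumption made for illustration: the class name `SlidingWindowMonitor`, the account identifiers, the 120-second window (echoing the "2 minutes" remark), and the threshold of three events. It also assumes timestamps for each account arrive in non-decreasing order. It is not a real fraud-detection method, only a sketch of evaluating each event the moment it arrives.

```python
from collections import defaultdict, deque

class SlidingWindowMonitor:
    """Hypothetical example: flag an account that produces more than
    `max_events` events within the last `window_seconds` seconds."""

    def __init__(self, window_seconds=120, max_events=3):
        self.window_seconds = window_seconds
        self.max_events = max_events
        # One queue of recent timestamps per account.
        self._recent = defaultdict(deque)

    def observe(self, account_id, timestamp):
        """Record one event and return True if the account is now over the limit.

        Assumes timestamps for a given account never go backwards.
        """
        window = self._recent[account_id]
        window.append(timestamp)
        # Discard events that have fallen out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_events

# Made-up event stream of (account, seconds since start).
events = [
    ("acct-1", 0), ("acct-1", 10), ("acct-2", 15),
    ("acct-1", 20), ("acct-1", 30), ("acct-1", 500),
]
monitor = SlidingWindowMonitor(window_seconds=120, max_events=3)
flags = [monitor.observe(account, ts) for account, ts in events]
print(flags)
```

Each event is judged as soon as it is observed, and only the recent timestamps for each account are kept in memory, which is the general idea behind processing data at velocity instead of storing everything first and analyzing it later.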
As the population of the internet grows, so does the amount of data people create. Big Data is quickly becoming a giant resource for companies that are able to capture it, analyze it, and find ways to monetize the output. Big Data applications that take advantage of unstructured data are becoming more readily available. Before too long, ordinary data warehousing will be a thing of the past, and Big Data will be king.