How to Get Your Arms Around Big Data
As solution providers’ customers increasingly employ business intelligence tools, dealing with “big data” is becoming a greater challenge. It’s an ideal opportunity for solution providers, as many business execs are unfamiliar with mining this type of information. In fact, a recent CompTIA study found that only 37 percent of IT and business executives report being very familiar or mostly familiar with the concept. Even so, approximately one in five businesses said they have a big data initiative underway, and 36 percent plan to embark on one in the next 12 months. Here, Tom Chew, National General Manager of Slalom Consulting, offers advice on getting a handle on all that data. (Jennifer Bosavage, editor)
“Big data” is a term with two small words that describes a very large issue looming for many businesses. The term means exactly what it says: huge volumes of data. Big data is usually measured in terabytes or petabytes, often consolidated from multiple sources into a central location—or sometimes it’s unstructured data or information that companies must keep because they don’t yet know how they might use it (i.e., “gray data”).
Big data is a fact of life for a rapidly increasing range of solution providers’ customers. Consider, for example, the billions of images and posts that a popular social network must retain and organize, or the analytical power required of scientists to decode genomes. For most businesses, however, big data becomes a concern in their data warehouses when considering the business intelligence functions that are essential to operations, forecasting, and understanding the marketplace.
Technology developers have responded to the emergence of big data with a range of solutions that can store, manage, and streamline the trillions of bytes of data that many businesses need to analyze. How a solution provider should apply that technology in any particular organization depends on five conditions, the Five Vs, that indicate whether an enterprise can benefit from a big-data solution in a cost-effective way:
1. Volume: Whether they deal with incoming or outgoing requests, companies with exceptionally large amounts of data always look for faster, more efficient, and lower-cost solutions for data storage and access requirements.
2. Velocity: A high rate of data arriving from multiple, disparate sources in various formats requires solutions that can rapidly process queries against large data sets and can acquire and retain incoming data just as quickly.
3. Variety: Traditionally, companies have analyzed only data in structured formats and have either struggled to generate value from unstructured data or confined their analysis to the structured part of the overall picture. Today’s technology, such as “Not Only SQL” (NoSQL) platforms, lets businesses combine structured data with unstructured and semi-structured data to answer questions spanning all of their managed data.
4. Value: IT departments have had to make tough decisions about which data to keep and how long to keep it, and the processing power required to perform large and complex ad hoc analysis often has been beyond the department’s capacity and budget. Big-data solutions can provide value through insights gained by combining larger sets of data than were previously possible to manage. Now, companies can harvest more external data on market conditions, customer satisfaction, and competitive analysis, performing what-if scenarios for new insights.
5. Variability: The variability in data structure and how users want to interpret that data in the short and long term are considerations that may help a solution provider steer an organization toward a big data solution. Often the initial structure and content of data can change over time, and similar data from different sources can exhibit wide variability in structure and format. Big data solutions allow data to be stored in its original form and transformed for in-depth analysis when a user queries the data.
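The idea behind the Variety and Variability points, storing data in its original form and applying structure only when a query runs, can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the record layouts, field names (customer, cust_id, amount, total), and helper functions are all hypothetical.

```python
import json

# Raw records are stored exactly as they arrived; their shapes differ by source.
# All field names and values here are made up for illustration.
RAW_STORE = [
    '{"customer": "A17", "amount": 120.5, "currency": "USD"}',
    '{"cust_id": "B02", "total": "80", "notes": "phone order"}',
    '{"customer": "A17", "amount": 30}',
]


def read_order(raw_line):
    """Apply a schema at query time: map differing field names to one shape."""
    record = json.loads(raw_line)
    customer = record.get("customer") or record.get("cust_id")
    amount = record.get("amount", record.get("total", 0))
    return {"customer": customer, "amount": float(amount)}


def total_by_customer(raw_lines):
    """Answer a query by transforming the raw records on the fly."""
    totals = {}
    for line in raw_lines:
        order = read_order(line)
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["amount"]
    return totals


print(total_by_customer(RAW_STORE))
```

Because nothing is reshaped when the data is stored, a source that later changes its format only requires a change to the hypothetical read_order function, not a reload of the stored data.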
For all those reasons, a big data solution may help organizations make better sense and better use of their data. Such solutions involve any of the three primary architectures:
• Symmetric multiprocessing (SMP)
• Massively parallel processing (MPP) data warehousing appliances
• NoSQL platforms
SMP solutions form the foundation of most traditional data warehouse/business intelligence environments, and today’s offerings are updated versions of that design. SMP systems use multiple processors that share a common operating system (OS) and memory. They are therefore limited by the capacity of the OS to manage the architecture, which typically confines them to 16 to 32 processors.
Today’s big data technology, like the Microsoft SQL Server 2008 R2 Fast Track Data Warehouse platform, is specifically designed to manage large data sets with a vast increase in the performance capability of SMP. Such platforms entail shorter implementation timelines, are less costly to deploy and support, and offer a lower acquisition price and total cost of ownership. They are ideal for handling data in the 5 to 50 terabyte range.
MPP systems harness numerous processors working on different parts of an operation in a coordinated way. Each processor has its own operating system and memory, so MPP systems can grow horizontally simply by adding more processors. MPP solutions often contain 50 to 200 processors or more. MPP pure data-warehousing appliances offer both hardware and software in a single package, while more broadly based appliances provide software with the option of different hardware configurations. Microsoft’s Parallel Data Warehouse solution furnishes the full performance capability of a data-warehouse appliance while permitting the organization to select from hardware options based on its current and future needs.
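To make the MPP idea concrete, here is a rough, single-process simulation in Python of processors that each work only on their own slice of the data while a coordinator combines the results. It is a sketch under stated assumptions, not how any particular appliance is built: the round-robin distribution, the function names, and the sample sales rows are all hypothetical.

```python
def partition(rows, node_count):
    """Distribute rows across nodes round-robin; each node owns its own slice."""
    slices = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        slices[index % node_count].append(row)
    return slices


def node_partial_sum(local_rows):
    """Work done independently on one node, using only its local data."""
    partial = {}
    for region, sales in local_rows:
        partial[region] = partial.get(region, 0) + sales
    return partial


def coordinator_merge(partials):
    """Combine the per-node partial results into the final answer."""
    final = {}
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] = final.get(region, 0) + subtotal
    return final


# Made-up sample data: (region, sales) pairs.
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
partials = [node_partial_sum(s) for s in partition(rows, node_count=3)]
print(coordinator_merge(partials))
```

The final totals do not depend on the node count, which is the property that lets a system of this kind grow by adding processors; in a real MPP system each slice would be processed on separate hardware rather than in a loop.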
NoSQL platforms are currently a hot topic. They increase performance at a lower cost, with linear scalability, true commodity hardware, a schema-free structure, and more relaxed data-consistency validation. NoSQL solutions, like Hadoop, perform well with either extremely high data volumes or high levels of unstructured data content, such as documents, multimedia files, and social media content. Microsoft offers a Windows-based, cloud-hosted version of Hadoop on Microsoft Azure that enables organizations to explore the benefits of a Hadoop platform with minimal initial startup time and investment.
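Hadoop's processing model is MapReduce, which the article does not describe. As a hedged illustration of how such a platform can work through unstructured text, the sketch below imitates the map, shuffle, and reduce steps in plain Python. It does not use the Hadoop API; the function names and sample documents are hypothetical, and a real cluster would run each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one unstructured document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up sample documents standing in for unstructured content.
documents = ["Big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

No schema is declared anywhere in the sketch: the map step decides what to extract from each document, which is why this style of processing suits content that does not fit neatly into tables.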