How to Get Your Arms Around Big Data
As solution providers' customers increasingly employ business intelligence
tools, dealing with "big data" is becoming a greater challenge. It's an ideal
opportunity for solution providers, as many business execs are unfamiliar with
mining this type of information. In fact, a recent CompTIA study found only 37
percent of IT and business executives report being very familiar or mostly
familiar with the concept. Even so, approximately one in five businesses said
they have a big data initiative underway, and 36 percent plan to embark on one in
the next 12 months. Here, Tom Chew, National General Manager of Slalom
Consulting, offers advice on getting a handle on all that data.—Jennifer
Bosavage, editor
“Big data” is a term with two small words that describes a very large issue
looming for many businesses. The term means exactly what it says: huge volumes
of data. Big data is usually measured in terabytes or petabytes, often
consolidated from multiple sources into a central location—or sometimes it’s
unstructured data or information that companies must keep because they don’t yet
know how they might use it (i.e., “gray data”).
Big data is a fact of life for a rapidly increasing range of solution providers'
customers. Consider, for example, the billions of images and posts that a
popular social network must retain and organize, or the analytical power
required of scientists to decode genomes. For most businesses, however, big data
becomes a concern in their data warehouses when considering the business
intelligence functions that are essential to operations, forecasting, and
understanding the marketplace.
Technology developers have responded to the emergence of big data with a range
of solutions that can store, manage, and streamline the trillions of bytes of
data that many businesses need to analyze. How that technology should be applied
by a solution provider in any particular organization depends on five
conditions, the Five Vs, that indicate whether an enterprise can benefit from a
big-data solution in a cost-effective way:
1. Volume: Whether they deal with incoming or outgoing requests, companies with
exceptionally large amounts of data always look for faster, more efficient, and
lower-cost solutions for data storage and access requirements.
2. Velocity: A high rate of data arriving from multiple, disparate sources in
various formats requires solutions that rapidly process queries against large
data sets and that acquire and retain incoming data just as quickly.
3. Variety: Traditionally, companies have analyzed only data in structured
formats and have either struggled to generate value from unstructured data or
confined their analysis to the structured part of the overall picture. Today’s
technology, such as “Not Only SQL” (NoSQL) platforms, lets businesses combine
structured data with unstructured and semi-structured data to answer questions
spanning all of their managed data.
4. Value: IT departments have had to make tough decisions about which data to
keep and how long to keep it, and the processing power required to perform large
and complex ad hoc analysis often has been beyond the department’s capacity and
budget. Big-data solutions can provide value through insights gained by
combining larger sets of data than were previously possible to manage. Now,
companies can harvest more external data on market conditions, customer
satisfaction, and competitive analysis, performing what-if scenarios for new
insights.
5. Variability: The variability in data structure and how users want to
interpret that data in the short and long term are considerations that may help
a solution provider steer an organization toward a big data solution. Often the
initial structure and content of data can change over time, and similar data
from different sources can exhibit wide variability in structure and format. Big
data solutions allow data to be stored in its original form and transformed for
in-depth analysis when a user queries the data.
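The idea in point 5, keeping data in its original form and shaping it only when someone asks a question, is often called schema-on-read. The sketch below is a minimal illustration in Python, not any vendor's implementation. The record layouts, field names, and the helpers `read_sale` and `total_by_customer` are all hypothetical, chosen only to show two sources describing the same kind of sale in different formats.

```python
import json

# Hypothetical raw records from different sources, stored exactly as received.
RAW_STORE = [
    '{"customer": "acme", "amount": "120.50", "ts": "2012-03-01"}',
    '{"cust_name": "Acme", "total_cents": 9900, "date": "2012-03-02"}',
    '{"customer": "globex", "amount": "75"}',
]


def read_sale(raw_line):
    """Interpret one raw record at query time (assumed field names)."""
    rec = json.loads(raw_line)
    # Different sources name the customer field differently.
    name = rec.get("customer") or rec.get("cust_name")
    # One source reports a decimal string, the other integer cents.
    if "amount" in rec:
        amount = float(rec["amount"])
    else:
        amount = rec["total_cents"] / 100.0
    return {"customer": name.lower(), "amount": amount}


def total_by_customer(raw_lines):
    """Answer a query by transforming the raw records on the fly."""
    totals = {}
    for line in raw_lines:
        sale = read_sale(line)
        totals[sale["customer"]] = totals.get(sale["customer"], 0.0) + sale["amount"]
    return totals


if __name__ == "__main__":
    print(total_by_customer(RAW_STORE))
```

Because nothing is reshaped on the way in, a new source with yet another layout only requires a change to the read step, and the stored data stays untouched.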
For all those reasons, a big data solution may help organizations make better
sense and better use of their data. Such solutions are built on one of three
primary architectures:
• Symmetric multiprocessing (SMP)
• Massively parallel processing (MPP) data warehousing appliances
• NoSQL platforms
SMP
The SMP approach updates the traditional symmetric multiprocessing systems
that form the foundation of most data warehouse/business intelligence
environments. SMP systems use multiple processors that share a common operating
system (OS) and memory. They are therefore limited by the capacity of the OS to
manage the architecture, which typically confines them to 16 to 32 processors.
Today’s SMP-based big data technology, such as the Microsoft SQL Server 2008 R2
Fast Track Data Warehouse platform, is specifically designed to manage large
data sets and vastly increases the performance of SMP. Such platforms entail
shorter implementation timelines, are less costly to deploy and support, and
offer a lower acquisition price and total cost of ownership. They are ideal for
handling data in the 5 to 50 terabyte range.
MPP
MPP systems harness numerous processors working on different parts of an
operation in a coordinated way. Each processor has its own operating system and
memory, so MPP systems can grow horizontally simply by adding more processors.
MPP solutions often contain 50 to 200 processors or more. MPP pure
data-warehousing appliances offer both hardware and software in a single
package, while more broadly based appliances provide software with the option of
different hardware configurations. Microsoft’s Parallel Data Warehouse solution
furnishes the full performance capability of a data-warehouse appliance while
permitting the organization to select from hardware options based on its
current and future needs.
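To make the MPP idea concrete, the sketch below mimics how a query can be split across processors that each hold their own slice of the data, with a coordinator merging the partial answers. It is a simplified illustration under stated assumptions, not how any particular appliance works: Python threads stand in for separate nodes, rows are hypothetical (region id, sales amount) pairs, and the function names are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Assign each row to a node by region id (simple modulo distribution)."""
    parts = [[] for _ in range(n_nodes)]
    for region_id, amount in rows:
        parts[region_id % n_nodes].append((region_id, amount))
    return parts


def local_sum(part):
    """Work done by one node, using only its own slice of the data."""
    totals = {}
    for region_id, amount in part:
        totals[region_id] = totals.get(region_id, 0) + amount
    return totals


def parallel_sum(rows, n_nodes=4):
    """Coordinator: fan the work out, then merge the partial results."""
    parts = partition(rows, n_nodes)
    # Threads are a stand-in here for independent processors with private memory.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    merged = {}
    for partial in partials:
        for region_id, subtotal in partial.items():
            merged[region_id] = merged.get(region_id, 0) + subtotal
    return merged


if __name__ == "__main__":
    sales = [(1, 10), (2, 5), (1, 7), (5, 3), (2, 1)]
    print(parallel_sum(sales, n_nodes=4))
```

Because no node needs another node's rows to compute its partial totals, raising `n_nodes` spreads the same work over more workers, which is the horizontal growth described above.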
NoSQL
NoSQL platforms are currently a hot topic. They increase performance at a lower
cost, with linear scalability, true commodity hardware, a schema-free structure,
and more relaxed data-consistency validation. NoSQL solutions, like Hadoop,
perform well with either extremely high data volumes or high levels of
unstructured data content, such as documents, multimedia files, and social media
content. Microsoft offers a Windows-based version of Hadoop, hosted in the cloud
on Microsoft Azure, that enables organizations to explore the benefits of a
Hadoop platform with minimal initial startup time and investment.