Big Data – Quick Overview
Definition of Big Data Analytics: Big Data analytics is the process of examining Big Data to uncover hidden patterns and correlations that support better decisions, using technologies such as NoSQL databases, Hadoop and MapReduce.
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, prevent diseases, combat crime and so on.”
Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”. What is considered “big data” varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. Big Data is a moving target; what is considered “Big” today will not be considered so in the years ahead.
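The "massively parallel" style of processing mentioned above is often organized as a map step followed by a reduce step. The following is only a minimal single-machine sketch of that pattern in Python, not Hadoop itself: the record layout (caller, call duration), the function names and the sample data are all hypothetical, and a thread pool stands in for what would be many servers in a real deployment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, n_partitions):
    """Deal the records round-robin into n_partitions smaller lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def map_partition(partition):
    """Map step: count the records per caller inside one partition."""
    return Counter(caller for caller, _duration in partition)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_calls_per_caller(records, n_partitions=4):
    """Run the map step on each partition in parallel, then reduce."""
    partitions = split_into_partitions(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical call-detail records: (caller, duration in seconds).
records = [
    ("alice", 120), ("bob", 45), ("alice", 30),
    ("carol", 10), ("bob", 5), ("alice", 60),
]
totals = count_calls_per_caller(records, n_partitions=3)
print(totals)
```

Because each partition is counted independently and the partial counts are merged afterwards, the same two functions could in principle be run on separate machines, which is the property that frameworks such as Hadoop build on.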
Examples of Big Data Success in the Real World
Big Data is more than just a buzzword. It is a proven model for leveraging existing information sources to make smarter, more immediate decisions that result in better business outcomes. And it is already being used today by companies across vertical market segments to improve top and bottom line performance.
Examples of Big Data success in the real world include:
- Telecom providers gaining high-value insight from massive volumes of call-detail records, logs and other data to optimize customer capture, retention and margins.
- Utility companies tapping meter data to create smart grids that deliver pinpoint intelligence on usage, failures and theft.
- Consumer product companies aggregating social data and enterprise CRM resources to continually improve their marketing strategies and budget allocations.
- Financial services companies capturing and analyzing large numbers of transactions to prevent fraud, understand risk, perform forensics and ensure compliance.
Characteristics of Big Data
Big data can be described by the following characteristics:
- Volume – The quantity of data that is generated. The size of the data determines its value and potential, and whether it can be considered Big Data at all. The name ‘Big Data’ itself refers to size, hence this characteristic.
- Variety – The category to which the data belongs. Data analysts need to know this in order to use the data effectively and to their advantage.
- Velocity – The speed at which data is generated and processed to meet the demands and challenges of growth and development.
- Variability – The inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively and can be a problem for those who analyze it.
- Veracity – The quality of the data being captured can vary greatly. Accuracy of analysis depends on the veracity of the source data.
- Complexity – Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the ‘complexity’ of Big Data.
Big Data Drivers and Hurdles
Two factors have combined to make Big Data especially appealing now. One is that so many potentially valuable data resources have come into existence. These sources include the telemetry generated by today’s smart devices, the digital footprints left by people who are increasingly living their lives online, and the rich sources of information commercially available from specialized data vendors. Add to this the tremendous wealth of data — structured and unstructured, historical and real-time — that has come to reside in diverse systems across the enterprise, and it is clear that Big Data offers hugely appealing opportunities to those who can unlock its secrets.
The other factor contributing to Big Data’s appeal is the emergence of powerful technologies for effectively exploiting it. IT organizations can now take advantage of tools such as Hadoop, NoSQL and Gephi to rationalize, analyze and visualize Big Data in ways that enable them to quickly separate the actionable insight from the massive chaff of raw input. As an added bonus, many of these tools are available free under open source licensing. This promises to help keep the cost of Big Data implementation under control.
On the other hand, a variety of obstacles can also seriously impede Big Data adoption. These obstacles typically include:
- Insufficient in-house expertise: Most IT organizations don’t have an army of data scientists on staff to lead the design and implementation of Big Data solutions. Nor do they tend to find the prospect of building Big Data systems from scratch very practical or appealing. This lack of in-house experience and expertise in Big Data technologies and their implementation can greatly delay time-to-benefit and add unacceptable risk to the IT project portfolio.
- A confusing technology landscape: In the rush to cash in on Big Data fever, developers and vendors have introduced a confusing array of tools and technologies that are often longer on hype than they are on clarity. This confusion makes life harder for IT by requiring decision makers to engage in extensive evaluation processes that consume resources and further delay time-to-benefit.
- Uncertainty about ROI: Many IT organizations find it difficult to prioritize use cases, project the economic benefits for those use cases, and right-size their investments accordingly. How much data will IT really have to ingest and analyze? What new storage and/or security issues will arise from the move to Big Data? Are there ways to defer capital expense? IT has to answer these questions and others to deliver the required return on investment (ROI) and avoid budget-busting
These three obstacles can make the path to Big Data slower, harder, riskier and more expensive. Companies must overcome these obstacles to quickly, efficiently and confidently reap the substantial business value and potential competitive differentiation offered by Big Data analytics.
“The effective use of Big Data can mean the difference between market leadership and ‘also-ran’ status for many companies,” says the Vice President and Global Head, Cisco Business Unit, Wipro Technologies. “The challenge is how to maximize ROI and accelerate time-to-benefit, given the limited financial and human resources companies face in the real world.”
Critical Outcomes for a Big Data World
By taking advantage of the kind of Big Data partnership described above, companies can gain several high-value outcomes around Big Data. These include:
- Rapid, reliable discovery of new, high-impact business insights. Big Data has been clearly proven to help companies improve marketing, deliver an enhanced customer experience, drive operational efficiencies, pinpoint fraud and waste, avoid compliance failures and achieve other outcomes that directly affect top- and bottom-line business performance. By working with the right partner, companies can achieve these positive outcomes with substantially greater certainty and
- Reduced technology implementation and ownership costs. One of the keys to maximizing Big Data ROI is to drive down the “I” as much as is reasonably possible. The right partner can help companies accomplish this investment reduction on several fronts: by sparing them the ramp-up costs associated with extensive technology evaluations, re-skilling and multivendor engagements; by rightsizing and/or hosting compute infrastructure; and by ensuring that the end-to-end solution stack is properly architected for reasonable, predictable total cost of ownership (TCO).
- Repeatable success. “Many companies focus so intently on achieving some initial Big Data proof-of-concept success that they lose sight of the fact that Big Data is much more a long-term strategic play than it is a single project or set of short-term deliverables. As a result, they wind up facing many of the same issues and costs on their second project as they did on their first.” Engagement with the right partner eliminates this problem by bringing consistency and repeatability to successive Big Data deliverables, enabling companies to gain economies of scale, continuously accelerate time-to-benefit and propagate shared Big Data services across the organization.
As people and businesses do more of what they do in an always-on digital environment, as a growing number of intelligent devices capture and transmit a growing volume of useful data, and as unstructured data becomes an increasingly rich and pervasive source of business intelligence, Big Data will continue to play a more strategic role in enterprise IT. Companies that recognize this reality, and that act on it in a technologically, operationally and economically optimized way, will gain sustainable competitive advantages over those that don’t. Any company pursuing those advantages will substantially benefit by engaging the right Big Data partner.
Using Data Analytics to Cut Losses, Save Costs
Data analytics also helps companies prevent and combat fraud. In real-time surveillance of things like stock market activity or credit card transactions, a company may have no more than a second to respond to a situation. Data analytics could help in some of those situations, but it would be more effective in prevention, according to him. In fraud detection scenarios, the ability to correlate disparate sets of information offers new opportunities, he says, citing the case of an individual who had submitted disability insurance claims while posting pictures of himself skiing at Canadian mountain resorts on his Facebook page. The insurance company discovered those Facebook pictures and decided he clearly didn’t qualify for disability insurance. Similarly, some U.S. retailers are using non-traditional sources of information to identify employees who have a track record of stealing, as a way to combat internal theft.
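Correlating disparate sets of information, as in the disability-claim case above, amounts to joining records from unrelated sources on a shared key and checking whether they conflict. The sketch below illustrates that idea only; the field names (`person_id`, `claim_id`, `start`, `end`, `posted_on`), the function name and all of the sample records are hypothetical and do not describe any real insurer's system.

```python
from datetime import date


def flag_conflicting_claims(claims, activity_posts):
    """Return ids of claims whose holder posted activity during the claim period."""
    # Index the posts by person so each claim only scans relevant posts.
    posts_by_person = {}
    for post in activity_posts:
        posts_by_person.setdefault(post["person_id"], []).append(post)

    flagged = []
    for claim in claims:
        for post in posts_by_person.get(claim["person_id"], []):
            if claim["start"] <= post["posted_on"] <= claim["end"]:
                flagged.append(claim["claim_id"])
                break  # one conflicting post is enough to flag the claim
    return flagged


# Hypothetical disability claims and public activity posts.
claims = [
    {"claim_id": "C1", "person_id": 1,
     "start": date(2014, 1, 1), "end": date(2014, 3, 31)},
    {"claim_id": "C2", "person_id": 2,
     "start": date(2014, 2, 1), "end": date(2014, 2, 28)},
]
activity_posts = [
    {"person_id": 1, "posted_on": date(2014, 2, 14), "activity": "skiing"},
    {"person_id": 2, "posted_on": date(2014, 6, 1), "activity": "hiking"},
]
flagged = flag_conflicting_claims(claims, activity_posts)
print(flagged)
```

A flag produced this way is only a prompt for human review: a post date inside a claim period does not by itself establish fraud.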
One retailing company he is acquainted with used analytics to improve workforce management. While the conventional approach to increasing sales volumes is to proportionately increase staff levels, analytics demonstrated a different logic. “If your footfalls are more between 2 p.m. and 5 p.m., you need more people at that time,” he says. “You could do such optimization of sales forces if you conduct a deeper analysis.” The retailer found it could gain “hundreds of millions of dollars” each year if it redeployed its staff along the lines the analytics suggested. Labor-related issues didn’t allow it to fully exploit that opportunity, but it did manage to realize a quarter of the projected savings, or about $30 million annually.
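The footfall reasoning above can be sketched as a proportional allocation: keep the total staffing fixed and distribute it across time slots in proportion to observed customer traffic. This is a minimal illustration under assumed inputs, not the retailer's actual method; the time slots, footfall counts and function name are hypothetical, and leftover staff after rounding down are assigned by the largest-remainder rule.

```python
def allocate_staff(footfall_by_slot, total_staff):
    """Split total_staff across time slots in proportion to footfall."""
    total_footfall = sum(footfall_by_slot.values())
    if total_footfall <= 0:
        raise ValueError("footfall must be positive in at least one slot")

    # Exact (fractional) share of staff each slot would ideally receive.
    exact = {slot: total_staff * count / total_footfall
             for slot, count in footfall_by_slot.items()}

    # Round every share down, then hand out the leftover staff one by one
    # to the slots with the largest fractional remainders.
    allocation = {slot: int(share) for slot, share in exact.items()}
    leftover = total_staff - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s],
                          reverse=True)
    for slot in by_remainder[:leftover]:
        allocation[slot] += 1
    return allocation


# Hypothetical footfall counts per time slot for one store.
footfall = {"10-12": 100, "12-14": 150, "14-17": 450, "17-20": 300}
plan = allocate_staff(footfall, total_staff=20)
print(plan)
```

With these assumed numbers the 2 p.m. to 5 p.m. slot receives the largest share of the same 20 staff, which is the kind of redeployment the quote describes, as opposed to raising headcount across the whole day.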
Conclusion
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data “size” is a constantly moving target, ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale.
The right Big Data partner will be able to accommodate all of these approaches with a modular engagement model that empowers companies to choose the right mix of solutions for their immediate and long-term objectives.
IT can enhance the speed, certainty and ROI of Big Data initiatives by availing itself of three types of partner resources, beginning with a proven, configurable multitier technology stack. Big Data implementations essentially require four tiers of technology: compute infrastructure, data management, analytics, and applications for their industry-specific use cases.