When Steve Phillpott took over as CIO at HGST (formerly Hitachi Global Storage Technologies) in February 2013, he quickly identified a number of important initiatives through which IT could create value at the company. One item at the top of the list was to radically improve how development, quality, and manufacturing operations used the vast data produced during the creation and servicing of hard drives.
It quickly became clear during this process that achieving big data results would require close collaboration between IT and the business.
Building a roadmap to capitalize on "big data" provides an opportunity for business and technology leaders to come together, thereby avoiding a common challenge played out in many organizations: the estrangement of IT and business groups. Working together, the teams at HGST identified and prioritized opportunities to use data about HGST's hard drives to improve yield, enhance testing, and better serve customers.
To achieve the best results, HGST's business leadership approved a Big Data Platform (BDP) to serve each of these business groups, with commitment from senior management to support change, break down data silos, and measure business impact.
Hadoop, the foundation of HGST's BDP, is particularly well suited to breaking through data silos. Traditional relational databases store their data in well-defined table structures, and therefore require detailed data modeling before a single row of data can be loaded. Hadoop, on the other hand, simply stores its data as files on its distributed file system, greatly streamlining the data loading process. Because so much of HGST’s data comes from legacy databases, it is stored in the Avro file format, which preserves the structure of the data -- and accounts for schema changes.
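To make the point about schema changes concrete, here is a minimal sketch in plain Python. It does not use the Avro library; it only imitates three of Avro's schema-resolution rules (a missing field takes the reader's default, an undeclared field is ignored, and a missing field with no default is an error). The schema, field names, and records are hypothetical illustrations, not HGST's actual data model.

```python
import json

# Hypothetical reader schema, written in Avro's JSON schema notation.
# The field names are illustrative, not HGST's actual data model.
READER_SCHEMA = {
    "type": "record",
    "name": "DriveTest",
    "fields": [
        {"name": "serial", "type": "string"},
        {"name": "test_name", "type": "string"},
        {"name": "passed", "type": "boolean"},
        # Added in a later schema version; the default covers older records.
        {"name": "firmware", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, reader_schema):
    """Project a stored record onto the reader schema.

    Imitates Avro's resolution rules: a field the writer never recorded
    takes the reader's default, a field the reader does not declare is
    dropped, and a missing field without a default raises an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"record is missing required field {name!r}")
    return resolved


# Records as two generations of a legacy system might have written them.
old_record = json.loads('{"serial": "A1", "test_name": "seek", "passed": true}')
new_record = json.loads(
    '{"serial": "B2", "test_name": "seek", "passed": false,'
    ' "firmware": "2.1", "operator": "line-4"}'
)

rows = [resolve(r, READER_SCHEMA) for r in (old_record, new_record)]
for row in rows:
    print(row)
```

Both generations of records come out with the same four fields, so files written before and after a schema change can be analyzed together without reloading or remodeling the older data.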
Six months later, the joint IT and business strategy at HGST has begun to transform how R&D, quality, and manufacturing teams use data in their daily work. Engineers are no longer hamstrung by systems that limit the ability to access and analyze the volumes of detailed data required to develop and refine products and to resolve issues quickly. With the BDP, data on the entire “DNA” of a hard drive -- from development to manufacturing to reliability testing -- is available and accessible at any time.
In addition, the BDP is opening the door for new avenues of yield and operational improvements, by allowing engineers to run large-scale analyses on years of detailed hard drive data. For example, engineers have started to run analytics across test data for millions of hard drives to provide finer-grain understanding of the drive’s components.
As business leaders have come to understand the potential of Hadoop through these early successes, excitement has grown, leading to a slew of new use cases proposed by business teams. Managers, for example, can now analyze device data to understand process delays in test and manufacturing.
Data analytics also makes more relevant information available for people to act on in real time. HGST engineers can now access data about any device or component at whatever level of detail they require. Previously, they needed data search parties to hunt for data scattered across diverse databases and tape backups, and they were limited to summaries of manufacturing data collected earlier.
To use big data to produce better business results, companies are instituting a number of organizational changes that require close collaboration between business and technology:
Identify data that’s potentially useful for the business, whether it's available internally or externally. Access to internal data often requires IT to move from limiting access for security to encouraging sharing while still governing access to data sets like web logs, customer profiles, and product usage data. Using external data such as online interests, demographics, web crawl, or social activity data requires an investment in purchasing data and a commitment to test and learn how outside data can help improve the business.
Use data science to understand the signal -- the ability to better predict behavior and identify the impact on your business of acting on it -- that's buried in the noise (complex data sets). For instance, my company, Think Big Analytics, has worked with some large hedge funds and financial payment providers to find the signal in novel data sets like news, reports, social activity, and consumer financial transactions in order to detect fraud and improve returns in financial investing. The data scientist is truly an interpreter and facilitator between the complex world of new data and the needs of the business.
Continuously improve operational analytics across the business. The promise of big data is to have technology experts create new capabilities that the business can use to explore and help generate revenue and competitive advantage.
Use automated models that detect data patterns for both strategic and tactical responses. This means using machine learning to continuously update the best automatic response to events such as user or device activity. Moving to a culture of using models instead of human intuition is a small technological step but a big organizational one.
For example, online advertising has migrated from humans defining what ads should run where to a world where participants on online exchanges use automated models to bid in real time for the right to serve ads.
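As a rough illustration of a model that keeps updating its best response as events arrive, consider the sketch below. It uses an epsilon-greedy strategy, which is one simple technique chosen here for brevity and not necessarily what ad exchanges or any company mentioned above actually run. The class name, the response labels, and the simulated click rates are all hypothetical.

```python
import random


class EpsilonGreedyResponder:
    """Picks a response per event and learns from the observed reward.

    Epsilon-greedy: with probability `epsilon` try a random response,
    otherwise use the response with the best average reward so far.
    """

    def __init__(self, responses, epsilon=0.1, seed=None):
        self.responses = list(responses)
        self.epsilon = epsilon
        self.counts = {r: 0 for r in self.responses}
        self.mean_reward = {r: 0.0 for r in self.responses}
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.responses)  # explore
        # Exploit: ties go to the response listed first.
        return max(self.responses, key=lambda r: self.mean_reward[r])

    def update(self, response, reward):
        # Incremental running mean, so no event history has to be stored.
        self.counts[response] += 1
        n = self.counts[response]
        self.mean_reward[response] += (reward - self.mean_reward[response]) / n


def simulate(model, click_rates, rounds, seed=0):
    """Feed the model simulated events; click_rates are made-up numbers."""
    rng = random.Random(seed)
    for _ in range(rounds):
        response = model.choose()
        clicked = rng.random() < click_rates[response]
        model.update(response, 1.0 if clicked else 0.0)
    return model


if __name__ == "__main__":
    # Hypothetical click-through rates used only to drive the simulation.
    rates = {"ad_a": 0.02, "ad_b": 0.05, "ad_c": 0.03}
    model = simulate(EpsilonGreedyResponder(rates, epsilon=0.1, seed=1), rates, 5000)
    for name in rates:
        print(name, model.counts[name], round(model.mean_reward[name], 4))
```

The code itself is short, which echoes the point that the technology step is small: the harder part is the organizational decision to let `choose` act without a person approving each response.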