Improving Customer Support using Hadoop and Device Data Analytics

March 6, 2013

L – Dave Henry, Pentaho | R – Ben Lloyd, NetApp

At Strata 2013 last week, Pentaho had the privilege of hosting a speaking session with Ben Lloyd, Sr. Program Manager, AutoSupport (ASUP) at NetApp. Ben leads a project called ASUP.Next, whose goal is to implement a mission-critical data infrastructure for a worldwide customer support program for NetApp’s storage appliances. With design and development assistance from Think Big Analytics and Accenture, NetApp has reached the “go-live” milestone for ASUP.Next and will go into production this month.

A Big Data Problem

More than 250,000 NetApp devices are deployed worldwide. They “phone home” with device statistics and diagnostic information, producing a continuously growing collection of structured data that must be reliably captured, parsed, interpreted and aggregated to support a large collection of use cases. Ben’s presentation highlighted the business and IT challenges of the legacy AutoSupport environment:

  • The total cost of processing, storing and managing data represents a major ongoing expense ($15M / year). The storage required for ASUP-related data doubles every 16 months; by the end of 2013 NetApp will have more than 1 PB of ASUP-related data available for analysis
  • The legacy ETL (PL/SQL) and data warehouse-based approach has resulted in increased latency and missed SLAs. Integrated data for reporting and analysis is typically available only 72 hours after the receipt of device messages
  • For NetApp Customer Support, the information required to resolve support cases is not easily available in the time required
  • For NetApp Professional Services, it’s difficult or impossible to aggregate the volume of performance data needed to provide valuable recommendations
  • For Product Engineering, failure analysis and defect signatures over long time periods are impossible to identify
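
The storage figures in the first bullet imply simple exponential growth. The following is a minimal sketch of that arithmetic, assuming a baseline of roughly 1 PB at the end of 2013 and a constant 16-month doubling period. The function name and the projection horizons are illustrative and do not come from the presentation.

```python
def projected_storage_pb(baseline_pb: float, months_elapsed: float,
                         doubling_months: float = 16.0) -> float:
    """Project storage volume, assuming it doubles every `doubling_months`."""
    return baseline_pb * 2 ** (months_elapsed / doubling_months)


# Baseline: roughly 1 PB at the end of 2013 (the figure quoted in the post).
# The horizons below are arbitrary multiples of the doubling period.
for months in (0, 16, 32, 48):
    size = projected_storage_pb(1.0, months)
    print(f"{months:2d} months after baseline: {size:.1f} PB")
```

Under these assumptions, a 16-month doubling period works out to roughly 68% growth per year, which is why a fixed-capacity data warehouse falls behind quickly.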

Cloudera Hadoop: at the Core of NetApp’s Solution

The ASUP.Next project aims to address these issues by eliminating data volume constraints and building a Hadoop-centered infrastructure that will scale to support projected volumes. Ben discussed the new architecture in detail during his presentation. It enables a complete end-to-end workflow including:

  • Receipt of ASUP device messages via HTTP and e-mail
  • Message parsing and ingestion into HDFS and HBase
  • Distribution of messages to case-generation processes and downstream ASUP consumers
  • Long-term storage of messages
  • Reporting and analytic access to structured and unstructured data
  • RESTful services that provide access to AutoSupport data and processes
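
The first few workflow stages (receive, parse, ingest, distribute) can be sketched in a few lines. This is a toy illustration only, not NetApp's implementation: the message format, the `AsupMessage` fields, the `Pipeline` class and the case-generation rule are all hypothetical, and an in-memory dictionary stands in for HDFS and HBase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AsupMessage:
    """Hypothetical parsed form of a device message (fields are assumptions)."""
    device_id: str
    stats: Dict[str, str]


def parse_message(raw: str) -> AsupMessage:
    """Parse a made-up 'device_id|key=value;key=value' payload."""
    device_id, _, body = raw.partition("|")
    stats = dict(pair.split("=", 1) for pair in body.split(";") if "=" in pair)
    return AsupMessage(device_id=device_id.strip(), stats=stats)


@dataclass
class Pipeline:
    # In-memory stand-in for long-term storage (HDFS/HBase in the real system).
    store: Dict[str, List[AsupMessage]] = field(default_factory=dict)
    # Downstream consumers, for example a case-generation process.
    consumers: List[Callable[[AsupMessage], None]] = field(default_factory=list)

    def receive(self, raw: str) -> AsupMessage:
        message = parse_message(raw)                                  # parse
        self.store.setdefault(message.device_id, []).append(message)  # ingest
        for consumer in self.consumers:                               # distribute
            consumer(message)
        return message


cases: List[str] = []


def open_case_on_failure(message: AsupMessage) -> None:
    """Hypothetical case-generation rule: open a case for failed devices."""
    if message.stats.get("status") == "failed":
        cases.append(message.device_id)


pipeline = Pipeline(consumers=[open_case_on_failure])
pipeline.receive("dev-001|status=ok;temp=41")
pipeline.receive("dev-002|status=failed;temp=77")
print(cases)
```

Keeping storage and distribution behind one `receive` call mirrors the idea of a single orchestrated workflow; in the real system each stage is a separate, scalable component.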

Pentaho’s Data Integration platform (PDI) is used in ASUP.Next for overall orchestration of this workflow as well as implementation of transformation logic using Pentaho’s visual development solution for MapReduce. Pentaho’s main value to NetApp comes from shortening the development cycle and providing ETL and job control capabilities that span the entire data infrastructure, from HDFS, HBase and MapReduce to Oracle and SAP. Pentaho also worked closely with Cloudera to ensure compatibility with the latest CDH client libraries.

NetApp’s use of Hadoop as a scalable infrastructure for ETL is increasingly common. Pentaho is seeing this use case across a variety of industries including capital markets, government, telecommunications, energy and digital publishing. In general, the reasons these customers use PDI with Hadoop include:

  • Leveraging existing team members for rapid development and ongoing maintenance of the solution. Most organizations have a core ETL team that can bring a decade or more of subject matter expertise to the table. By removing the requirement to use Java, a scripting language or raw XML, team members are able to actively help with the build-out of jobs and transformations. This also lessens the need to recruit, hire and orient outside developers
  • Increasing the “logic density” of transformations. As you can see in the demo example below, it’s possible to express a lot of transformation logic in a single mapper or reducer task. This makes it possible to reduce the number of unique jobs that must be run to achieve a complete workflow. In addition to improving performance, this can result in designs that are easier to document and explain


  • Focusing on the “what”, not the “how” of MapReduce development. I was surprised (actually shocked) to see how many of the speakers at Strata were still walking through code examples to illustrate a development technique. The typical organization has no desire and little ability to turn itself into a software development shop. The language-based approach may work for the Big Data “Titans”, but not for businesses that need to implement Big Data solutions quickly and with minimal risk
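
For readers who want to see what “logic density” means in code terms, here is a minimal sketch in plain Python rather than PDI or Hadoop. It assumes a made-up CSV record layout (`device_id,metric,value`); the function names are illustrative. The point is that validation, filtering, type conversion and unit derivation all happen in one mapper pass instead of four separate jobs.

```python
from typing import Dict, Iterable, Iterator, Tuple


def dense_mapper(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Validate, filter, convert and derive in a single pass over the input."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:              # validate: drop malformed records
            continue
        device_id, metric, raw_value = fields
        if metric != "latency_ms":        # filter: keep one metric of interest
            continue
        try:
            value = float(raw_value)      # convert: text to number
        except ValueError:
            continue                      # drop records with non-numeric values
        yield device_id, value / 1000.0   # derive: milliseconds to seconds


def sum_reducer(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the mapped values per key."""
    totals: Dict[str, float] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0.0) + value
    return totals


records = [
    "a,latency_ms,250",
    "a,latency_ms,750",
    "b,iops,900",
    "bad line",
    "b,latency_ms,oops",
]
totals = sum_reducer(dense_mapper(records))
print(totals)
```

In PDI the same steps would be drawn as a visual transformation and run as a mapper; the sketch only shows why folding them into one task reduces the number of jobs in the workflow.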

Key Takeaways

Since this was a Pentaho-sponsored session, Ben summarized his experience working with the Pentaho Services and Engineering teams. His main points are illustrated in the photo above. Most of his points revolve around how Pentaho provided support during early development and testing. A large number of Pentaho employees contributed their time, energy and brain-power to ensure the project’s success. Many enhancements in PDI 4.4 are a direct result of improvements needed to support ASUP.Next use cases.

What has Pentaho learned from this project? Pentaho gained a number of valuable insights:

  • Big Data architectures to support low-latency use cases can be complex. Not only are multiple functional components needed, but they must integrate with existing systems such as enterprise data warehouses. These architectures demand a high degree of flexibility
  • Big Data projects require customers, system integrators and technology providers to “plumb the last 5%” as the solution is being developed. Inevitably, new capabilities are used for the first time and need to be fine-tuned to support real-world use cases, data volumes and encoding formats. A good example is PDI’s support for AVRO. Although we anticipated needing to adapt the existing AVRO Input Step to work with NetApp’s schemas, we only understood the full set of requirements after seeing their actual data during an early system test
  • Pentaho’s plugin-based architecture isolates the core “engines” from the layer where point-functionality is implemented. Pentaho was able to implement all of the required enhancements without a single architectural change. The AVRO enhancements and other improvements (such as HTableInput format support for MapReduce jobs) were all coded and field-deployed via updates to plugins, which avoided the risk of introducing defects into PDI’s data flow engine
  • Open source is a significant “enabler”, making it easy for everyone to understand how integration works. It’s hard to overestimate the importance of code transparency. It allows the customer, the system integrators and the technology partners to get right to the point and experiment quickly with different designs

It’s been a pleasure working with NetApp and its partners on the ASUP.Next solution. We look forward to continuing our work with NetApp as their use of device data evolves to exploit new opportunities not previously possible with their legacy application.


Dave Henry, SVP Enterprise Solutions

Exploring deep into the code

August 20, 2010

I often joke with my colleagues that Pentaho is the world’s largest “BI Operating System.” When you think about the number of core services that are required for any application platform – from authentication/authorization to content persistence to presentation across multiple user interfaces – it’s easy to see how architecture can become an obsession.

I’ve always felt that Pentaho has the best architectural implementation of any BI suite on the market and that the decisions and investments we made in the early years (ca. 2005-2007) would serve our customers and the community well. This has clearly been the case, and the proof is in the velocity and number of enhancements that have been delivered in the last 12 months. In particular, the tremendous improvements in reporting and analysis have provided tangible benefits and made us much more competitive vis-à-vis proprietary BI products. The platform team is now hard at work adding Java Content Repository (JCR) support (just as they did with PDI 4.0) and this will pay huge dividends in 2011 and beyond.

There’s a quiet – you might even say invisible – side to Pentaho’s business which has emerged over the last two years. Pentaho is being embedded into mission-critical web applications by name-brand enterprises and ISVs. From the very moment they contact Pentaho Services, these companies challenge us with interesting technical scenarios. “Yes, we know, Pentaho was designed from the very beginning to be integrated into other applications, but how?” To answer their questions we’ve organized a team of senior Pentaho software engineers and architects who engage directly with their peers. We’ve developed best practices, created code samples and documented many of the useful software interfaces that embedders need to understand in order to deploy Pentaho within their apps. This body of work is now available to the general public in the form of a new public training course.

Developed by Pentaho’s own Gretchen Moran, the Pentaho Architect’s Bootcamp is designed to teach developers how to explore, customize and extend Pentaho solutions and the Pentaho BI Server to meet the needs of customers and community members who are looking for functionality that goes beyond Pentaho’s “out of the box” capabilities. It represents a 5-day immersion into the inner workings of Pentaho action sequences and API extension points. Do you have detailed questions about CDF, CDA and controlling user interaction with your content? Need to call or consume web services, integrate with a single sign-on server or deploy Mondrian in a multi-tenanted environment? If so, this is your class.

For more information and to register for an Architect’s Bootcamp class in Orlando, London or San Francisco, visit

Dave Henry
Vice President, Services
Pentaho Corporation

p.s. If you already understand these topics, check out our new job openings for Enterprise Architects in North America

