Achieving Equilibrium Between Business and IT

October 16, 2014

Last week, Pentaho unveiled a Big Data Orchestration Platform instrumental to a data-driven future – one where analytics and data are embedded into the fabric of the organization to drive real-time business decisions. Fundamental to Pentaho’s strategy is the concept of governed data delivery – the ability to blend trusted and timely data to power analytics at scale for all users and in all environments.

Today, at Strata + Hadoop World, Pentaho announced new automated data modeling and publishing capabilities to help organizations establish a Streamlined Data Refinery solution architecture – empowering business users while still meeting IT requirements.

Pentaho is putting governed, blended data from across the enterprise at business users’ fingertips. You can watch the video demo here:

Empowering business users with the ability to blend and refine data for business and regulatory purposes, while supporting IT with governed orchestration processes and data delivery methods, is the basis of a Streamlined Data Refinery. Here are two of our customers achieving the business and IT balance.

Tim Garnto, SVP of Product Engineering at edo Interactive, explains, “Having lightning-fast data analytics is a matter of survival for our team. In early 2013, we hit a wall due in part to the sheer amount of data we could access. We just couldn’t process the data fast enough with our existing SQL database; as a result, we couldn’t get the right offers to the right people quickly enough. Leveraging the Streamlined Data Refinery architecture with Pentaho, we were able to extract, integrate and analyze 25 million transactions a day consisting of over 50TB of data. As a result, we cut the processing window from 29 hours to under four, all while growing the amount of data processed by 974 percent.”

Robert Clark, VP of Development, 4SightBI shares, “In order for our customer base of agents, brokers, actuaries and managers to deliver quality reports that result in faster sales closings, they require quick turnaround and mobile visibility into the data, without having to wait on IT. With the automation of a Streamlined Data Refinery, our customers planning to use Hadoop will soon be able to see a further reduction in the time it takes for us to produce actionable insights by minimizing time spent on preparing data, all while still delivering relevant data sets with governance.”

Donna Prlich
VP, Product Marketing
Pentaho


Pentaho 5.2, Marching Towards Governed Data Delivery

October 9, 2014

In our hyper-competitive global economy, companies are urgently seeking to harness the value from all their data to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. They need a platform that catalyzes collaboration between IT and business, meeting both IT’s demand for governance and the business’s need for timely, comprehensive data.

Today at PentahoWorld we’ve committed to leading the charge towards a truly data-driven future, starting with governed data delivery.

In his keynote, Pentaho Chairman and CEO Quentin Gallivan lifted the veil on a big data orchestration platform that will be instrumental in delivering this vision by overcoming three persistent big data challenges: the need for strong data governance, diverse data blending, and delivery of analytics embedded at the point of impact. Fundamental to Pentaho’s strategy is the concept of governed data delivery, defined as the ability to blend trusted and timely data to power analytics at scale for all users in all environments.

Also announced today, Pentaho 5.2 includes new innovations to simplify delivery of blended data sets with IT trust, planting new guideposts on the road to governed data delivery.

Automated Streamlined Data Refinery

IT can hand the power of data-driven analytics to the business with an automated Streamlined Data Refinery. Hadoop serves as a central processing hub where analytics-ready data sets can be blended, refined, auto-modeled, and automatically published directly to analytic databases like HP Vertica for high-performance interactive analytics using Pentaho Analyzer.
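
To make the flow concrete, here is a minimal sketch of the refinery pattern written in modern PySpark, used purely as a stand-in for the cluster-side processing: blend raw sources landed in Hadoop, refine them into an analytics-ready set, and publish the result to Vertica over JDBC. The paths, table names, schema, and credentials are illustrative assumptions, not part of the Pentaho product.

    # Hypothetical Streamlined Data Refinery sketch (PySpark as a stand-in):
    # blend and refine data in the cluster, then publish to an analytic
    # database (HP Vertica over JDBC). All names and credentials are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("refinery-sketch").getOrCreate()

    # Blend: read raw transaction and customer data landed in Hadoop.
    transactions = spark.read.parquet("hdfs:///raw/transactions")
    customers = spark.read.json("hdfs:///raw/customers")

    # Refine: join, filter, and aggregate into an analytics-ready data set.
    refined = (transactions.join(customers, "customer_id")
               .where("amount > 0")
               .groupBy("customer_id", "region")
               .sum("amount")
               .withColumnRenamed("sum(amount)", "total_spend"))

    # Publish: write the refined set to Vertica for interactive analytics.
    (refined.write
     .format("jdbc")
     .option("url", "jdbc:vertica://vertica-host:5433/analytics")
     .option("driver", "com.vertica.jdbc.Driver")
     .option("dbtable", "public.customer_spend")
     .option("user", "dbadmin")
     .option("password", "changeme")
     .mode("overwrite")
     .save())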

Advanced Data Security in Hadoop

Pentaho 5.2 extends the adaptive big data layer with advanced Kerberos security support for major Hadoop distributions, including Cloudera, Hortonworks and MapR. This ensures that only users with proper credentials can access Hadoop cluster resources as part of their data orchestration.

Pentaho 5.2 also includes additional features that simplify the user experience for both embedded and direct customers.

Learn more about Pentaho 5.2 and Governed Data Delivery – watch the video and read the whitepaper at www.pentaho.com/product/governed-data-delivery.

Follow PentahoWorld live on Twitter using the hashtag #PWorld2014.

Chuck Yarbrough
Product Marketing, Big Data
Pentaho


Spark on Fire! Integrating Pentaho and Spark

June 30, 2014

One of Pentaho’s great passions is to empower organizations to take advantage of amazing innovations in Big Data to solve new challenges using the existing skill sets they have in their organizations today.  Our Pentaho Labs’ innovations around natively integrating data engineering and analytics with Big Data platforms like Hadoop and Storm have already led dozens of customers to deploy next-generation Big Data solutions. Examples of these solutions include optimizing data warehousing architectures, leveraging Hadoop as a cost effective data refinery, and performing advanced analytics on diverse data sources to achieve a broader 360-degree view of customers.

Not since the early days of Hadoop have we seen so much excitement around a new Big Data technology as we see right now with Apache Spark.  Spark is a Hadoop-compatible computing system that makes big data analysis drastically faster, through in-memory computation, and simpler to write, through easy APIs in Java, Scala and Python.  With the second annual Spark Summit taking place this week in San Francisco, I wanted to share some of the early work Pentaho Labs and our partners over at Databricks are collaborating on to deeply integrate Pentaho and Spark for delivering high-performance Big Data analytics solutions.
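
For readers who haven’t tried Spark yet, a minimal PySpark example gives a feel for the kind of concise, in-memory computation described above; the input path is a placeholder.

    # Minimal PySpark word count: intermediate results stay in memory
    # between steps, and the whole job fits in a few lines.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount-sketch")

    counts = (sc.textFile("hdfs:///data/sample.txt")   # placeholder path
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):
        print(word, count)

    sc.stop()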

Big Data Integration on Spark

At the core of Pentaho Data Integration (PDI) is a portable ‘data machine’ for ETL which today can be deployed as a stand-alone Pentaho cluster or inside your Hadoop cluster through MapReduce and YARN.  The Pentaho Labs team is now taking this same concept and working on the ability to deploy inside Spark for even faster Big Data ETL processing.  The benefit for ETL designers is the ability to design, test and tune ETL jobs in PDI’s easy-to-use graphical design environment, and then run them at scale on Spark.  This dramatically lowers the skill sets required, increases productivity, and reduces maintenance costs when taking advantage of Spark for Big Data Integration.

Advanced Analytics on Spark

Last year Pentaho Labs introduced a distributed version of Weka, Pentaho’s machine learning and data mining platform. The goal was to develop a platform-independent approach to using Weka with very large data sets by taking advantage of distributed environments like Hadoop and Spark. Our first implementation proved out this architecture by enabling parallel, in-cluster model training with Hadoop.

We are now working on a similar level of integration with Spark that includes data profiling and evaluating classification and regression algorithms in Spark.  The early feedback from Pentaho Labs confirms that developing solutions on Spark is faster and easier than with MapReduce. In just a couple of weeks of development, we have demonstrated the ability to perform in-cluster Canopy clustering and are very close to having k-means++ working in Spark as well!
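
Pentaho’s distributed Weka work isn’t public yet, but Spark’s own MLlib clustering API gives a feel for what in-cluster k-means on Spark looks like; MLlib’s default “k-means||” initialization is a parallel variant of k-means++. The data path and parameters below are illustrative assumptions.

    # Hypothetical in-cluster clustering sketch using MLlib (not Pentaho's
    # Weka integration). Each input line is assumed to hold comma-separated
    # numeric features.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="kmeans-sketch")

    points = (sc.textFile("hdfs:///data/features.csv")
                .map(lambda line: [float(x) for x in line.split(",")]))

    # "k-means||" is MLlib's parallel variant of k-means++ initialization.
    model = KMeans.train(points, k=5, maxIterations=20,
                         initializationMode="k-means||")

    print(model.clusterCenters)
    sc.stop()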

Next up: Exploring Data Science Pack Integration with MLlib

MLlib is already one of the most popular technologies for performing advanced analytics on Big Data.  By integrating Pentaho Data Integration with Spark and MLlib, Data Scientists will benefit by having an easy-to-use environment (PDI) to prepare data for use in MLlib-based solutions.  Furthermore, this integration will make it easier for IT to operationalize the work of the Data Science team by orchestrating the entire end-to-end flow, from data acquisition, to data preparation, to execution of MLlib-based jobs, to sharing the results, all in one simple PDI job flow.  To get a sense for how this integration might work, I encourage you to look at a similar integration with R we recently launched as part of the Data Science Pack for Pentaho Business Analytics 5.1.
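
The PDI orchestration itself isn’t shown here, but a rough PySpark sketch of the end-to-end flow such a job would coordinate (acquire raw data, prepare it, execute an MLlib model, and share the results) looks something like the following. The file layout, schema, and parameters are assumptions.

    # Hypothetical end-to-end MLlib flow: acquire, prepare, execute, share.
    # Paths and the CSV layout (label first, features after) are assumptions.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    sc = SparkContext(appName="mllib-flow-sketch")

    # 1. Acquire: raw CSV with the label in the first column.
    raw = sc.textFile("hdfs:///data/labeled.csv")

    # 2. Prepare: drop malformed rows and build LabeledPoints.
    def parse(line):
        parts = line.split(",")
        return LabeledPoint(float(parts[0]), [float(x) for x in parts[1:]])

    prepared = raw.filter(lambda line: "," in line).map(parse)

    # 3. Execute: train a classifier in the cluster.
    model = LogisticRegressionWithSGD.train(prepared, iterations=100)

    # 4. Share: score the records and write the results back to HDFS.
    scored = prepared.map(lambda p: (p.label, model.predict(p.features)))
    scored.saveAsTextFile("hdfs:///output/scores")

    sc.stop()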

Experiment Today with Pentaho and Spark!

You can experiment with Pentaho and Spark today for both ETL and Reporting.  In conjunction with our partners at Databricks, we recently certified the following use cases combining Pentaho and Spark:

  • Reading data from Spark as part of an ETL workflow by using Pentaho Data Integration’s Table Input step with Apache Shark (the Hive SQL layer running on Spark), as sketched below
  • Reporting on Spark data using Pentaho Reporting against Apache Shark
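
Under the covers, both certified paths boil down to issuing Hive-compatible SQL against the Shark server. A rough Python equivalent using a generic JDBC bridge is sketched below; the driver class, connection URL, jar location, and query are assumptions about a HiveServer1-style Shark endpoint, not Pentaho configuration.

    # Hypothetical sketch: querying Shark (Hive-compatible SQL on Spark)
    # over JDBC, roughly what a Table Input step does internally.
    # Driver class, URL, credentials, and jar path are assumptions.
    import jaydebeapi

    conn = jaydebeapi.connect(
        "org.apache.hadoop.hive.jdbc.HiveDriver",    # HiveServer1-era driver
        "jdbc:hive://shark-host:10000/default",
        ["", ""],                                    # user / password
        "/path/to/hive-jdbc-with-dependencies.jar",
    )

    try:
        cursor = conn.cursor()
        cursor.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
        for row in cursor.fetchall():
            print(row)
    finally:
        conn.close()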

We are excited about this first step in what we both hope to be a collaborative journey towards deeper integration.

Jake Cornelius
Sr. Vice President, Product Management
Pentaho

 


Blueprints to Big Data Success

March 21, 2014

Seems like these days, everyone falls into one of three categories.  You are either:

  1. Executing a big data strategy
  2. Implementing a big data strategy
  3. Taking a wait and see attitude

If you’re in one of the first two categories, good for you!  If you’re in the third, you might want to rethink your strategy. Companies that get left behind will have a huge hill to climb just to catch up to the competition.

If one of your concerns with moving forward with big data has been a lack of solid guidance to help pave the path to success, then you are in luck!  Pentaho has recently released four Big Data Blueprints to help guide you through the process of executing a strategy.  These are four use cases that Pentaho has seen customers execute successfully.  So, this isn’t just marketing fluff.  These are real architectures that support real big data business value.  The four blueprints now available on Pentaho’s website include:

  • Optimize the data warehouse
  • Streamlined data refinery
  • Customer 360-degree view
  • Monetize my data

These blueprints will help you understand the basic architectures that will support your efforts and achieve your desired results.  If you are like many companies just getting started with big data, these are great tools to guide you through the murky waters that lie ahead.  Here is my quick guide to the four Blueprints, where you may want to get started and why.

The Big Data Blueprints

1.    Optimize the Data Warehouse
Data warehouse optimization (sometimes referred to as data warehouse offloading, or DWO) is a great starter use case for gaining experience and expertise with big data, while reducing costs and improving the analytic opportunities for end users.  The idea is to increase the amount of data being stored, not by shoving it all into the warehouse but by adding Hadoop to house the additional data.  Once you have Hadoop in the mix, Pentaho makes it easy to move data into Hadoop from external sources, move data bi-directionally between the warehouse and Hadoop, and process data in Hadoop.  Again, this is a great place to start.  It’s not as transformative to your business as the other use cases can be, but it will build expertise and save you money.
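
As a rough illustration of the offload pattern (using PySpark as a stand-in for the cluster-side engine rather than PDI), the sketch below pulls a cold table out of the warehouse over JDBC and lands it in Hadoop as Parquet. The connection string, driver, and table names are placeholders.

    # Hypothetical data warehouse offload sketch: pull an archival table out
    # of the warehouse over JDBC and land it in Hadoop as Parquet, where it
    # stays queryable without consuming warehouse capacity.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dwo-offload-sketch").getOrCreate()

    archive = (spark.read
               .format("jdbc")
               .option("url", "jdbc:postgresql://warehouse-host:5432/edw")  # placeholder
               .option("driver", "org.postgresql.Driver")
               .option("dbtable", "sales_history_2009")                     # placeholder
               .option("user", "etl_user")
               .option("password", "changeme")
               .load())

    # Land the data in Hadoop, partitioned for downstream processing.
    (archive.write
     .partitionBy("sale_month")
     .parquet("hdfs:///offload/sales_history_2009"))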

2.    Streamlined Data Refinery
The idea behind the refinery is to provide a way to stream transaction, customer, machine, and other data from their sources through a scalable big data processing hub, where Hadoop is used to run transformations, store data, and produce analytics-ready output that can then be sent to an analytic model for reporting and analysis.  Working with several customers, we have seen this as a great next step after the DWO.

3.    Customer 360-Degree View
This blueprint is perhaps the most transformative of all the potential big data use cases. The idea here is to gain greater insight into what your customer is doing, seeing, feeling and purchasing.  All with the idea that you can then serve and retain that customer better, and attract more customers into your fold.  This blueprint lays out the architecture needed to start understanding your customer better.  It will require significant effort in accessing all the appropriate customer touch points, but the payoff can be huge.  Don’t worry too much about getting the full 360-degree view at first; starting with even one small slice can drive huge revenue and retention rates.

4.    Monetize My Data
What do you have locked up in your corporate servers, or in machines you own?  This blueprint can be as transformative as the Customer 360, in that it can create new revenue streams that you may not have ever thought about before.  In some cases, it could create a whole new business opportunity.  Whatever your strategy, take time to investigate where and how you can drive new business by leveraging your data.

There are other blueprints that have been defined and developed by Pentaho, but these are four that typically make the most sense for organizations to leverage first.  Feel free to reach out to us for more information about any of these blueprints or to learn more about how Pentaho helps organizations be successful with big data.

Find out more about the big data blueprints at http://www.pentaho.com/big-data-blueprints.

Please let me know what you think @cyarbrough.

Thanks!
Chuck Yarbrough
Product Marketing, Big Data

