See you in Vegas at Splunk .conf2013

September 24, 2013


Splunk has been leading the way in the new world of Big Data for some time, and if you look at their recent earnings you will see that they are matching the marketing with solid growth. Their recent announcements like HUNK also show they are building a solid roadmap for their platform that will enable the Enterprise to embrace both the Splunk platform and the newer disruptive technologies like Hadoop well into the future.  I am looking forward to seeing more of this in Vegas later this month at .conf2013.  And yes, I do love Vegas.

We recently developed an integration with the Splunk analytics platform, available via Splunkbase, and can now deliver a number of business-critical BI and visualization capabilities to Splunk customers. True to Pentaho’s mission – to deliver the future of analytics – our partnership with Splunk, the leading software platform for real-time operational intelligence, is another step on our path to that mission. Our alliance and integration make it easy and cost-effective for the global Splunk customer base to adopt Pentaho’s comprehensive business analytics and visualization platform to manage and gain business-class insight from this highly valuable source of data.  By using our integration, business users in the Enterprise will now be able to bring Splunk data to life in our rich visualizations, whether through pixel-perfect reporting, rich and dynamic dashboards, or a full BI self-service experience.  Business users need this capability in a format they can consume, and Pentaho can deliver that.


Critically, Pentaho can also combine Splunk data with data from other repositories for enriched context and value. Enterprises can now leverage the integration of Pentaho and Splunk to extend their big data analytics by unlocking and harnessing machine data generated by websites, applications, servers, storage, network, mobile and other devices, all the way down to network and system sensors. Data can be brought into the Splunk platform from many sources, be they EDWs or newer data repositories like Hadoop and NoSQL. This can also bring proprietary database data in the Enterprise into a place where it can be used to deliver business value.

By using our Pentaho ETL capabilities, we can extend Splunk’s DB Connect to enable a full data integration experience for EDW teams.

We love working with the Splunk product team and are really looking forward to the Splunk .conf2013 September 30 – October 3. Besides the trip to the Hard Rock that will surely happen, we will be demonstrating our solution running on Splunk with Splunk data to the customers who will be attending this great event (look for the Pentaho booth in the sponsor hall). We will also be supporting two Splunk sessions –  I am looking forward to participating in the following panels:

  1. CXO Panel: Addressing Big Data Challenges and Opportunities, led by Eddie Satterly, Chief Big Data Evangelist, Office of the CTO, Splunk – October 1, 4:30-5:30, Big Data track
  2. Big Data Ecosystem: Partner Panel, led by Brett Sheppard, Director, Product Marketing for Big Data, Splunk – October 2, 4:30-5:30, Big Data track

What is even more exciting is that we are already working with the Splunk sales teams in the field and at the customers demonstrating and proving our joint capabilities. This is where the rubber hits the road as they say and we look forward to a long and prosperous relationship with the Splunk team.

Hope to see you in Vegas!

Eddie White
EVP Business Development

Pentaho 5 has arrived with something for everyone!

September 18, 2013

I am tremendously excited to announce that Pentaho Business Analytics 5 is available for download!  This release represents the culmination of over 30 man-years of engineering effort and contains over 250 new features and improvements.  There truly is something for everyone in Pentaho 5.  Whether you are an end user, administrator, executive or developer, I want to share with you what I think are the top 3 areas of improvement:

  1. Improving productivity for end users and administrators
  2. Empowering organizations to easily and accurately answer questions using blended big data sets
  3. Simplifying the experience for developers integrating with or embedding Pentaho Business Analytics

Improving Productivity for End Users and Administrators



18 months ago, we challenged ourselves to think deeply about the different profiles of users working with the Pentaho suite and identify the top areas where we could significantly improve our ease-of-use.  Based on the feedback from countless customer interviews and usability studies, the first thing you will notice about Pentaho 5 is a dramatically overhauled User Console.  Beyond the fresh, new, modern look and feel, we’ve introduced a new concept called “perspectives” making it easier than ever for end users to:

  • navigate between open documents
  • browse the repository
  • manage scheduled activities

Throughout the User Console, end users will enjoy numerous improvements and better feedback for common workflows such as designing dashboards or scheduling the execution of a parameterized report. Administrators will appreciate that we have consolidated all Administration capabilities directly into the User Console, enhanced security with the ability to create more specific role types that control the types of actions users can perform, and bundled a comprehensive audit mart providing out-of-the-box answers to common questions about usage patterns, performance and errors.

Analytics-ready Big Data Blending


In the dawn of the Big Data era, a wide range of new storage and processing technologies have flooded the market, each bringing specialized characteristics to help solve the next wave of data challenges.   Pentaho has long been a leader and innovator in delivering an end-to-end platform for designing scalable and easily maintainable Big Data solutions.  Powered by the Pentaho Adaptive Big Data Layer, we’ve dramatically expanded our support for Hadoop with all new certifications for the latest distributions from Cloudera, Hortonworks, MapR and Intel.  Furthermore, we’ve integrated our complete analytics platform for use with Cloudera Impala.  Other Big Data highlights in Pentaho 5 include new integration with Splunk and dramatic ease-of-use improvements when working with NoSQL platforms such as MongoDB and Cassandra.

As organizations large and small map out their next generation data architectures, we see best practice design patterns emerging that help organizations target the appropriate data technology for each use case.  Evident in all of these design patterns is the fact that Big Data technologies are rarely information silos.  Solving common use cases such as optimizing your data warehousing architecture or performing 360 degree analysis on a customer requires that all data be accessible and blended in an accurate way.  Pentaho Data Integration provides the connectivity and design ease-of-use to implement all of these emerging patterns, and with Pentaho 5 I’m excited to announce the world’s first SQL (JDBC) driver for runtime transformation.  This integration empowers data integration designers to accurately design blended data sets from across the enterprise, and put them directly in the hands of end users using tools they are already familiar with – reporting, dashboards and visual discovery – as well as predictive analytics.

Simplified Platform for OEMs and Embedders

Finally, I’d like to highlight how this release further solidifies the Pentaho suite as the best platform for enterprises and OEMs who want to enrich their applications with better data processing or business analytics.  Pentaho 5 delivers a more customizable User Console providing developers with complete control over the menu bar and toolbar, improvements to the underlying theming engine and an all new plugin layer for adding custom perspectives.  Furthermore, we’ve dramatically simplified our service architecture by introducing a brand new REST-based API along with a rich library of integration samples and documentation to get you started.

These enhancements are just a few of the many great improvements in Pentaho 5. If you want a more in-depth overview and demonstration, register for the Pentaho 5.0 webinar on September 24th, offered at two times: North America/LATAM and EMEA. You can also access great resources from videos to solutions briefs at

Jake Cornelius

SVP Products

Gearing up businesses for every stage of their journey to becoming truly data-driven

September 12, 2013

Today I’m incredibly proud to announce the availability of Pentaho Business Analytics version 5.0 – a milestone that’s as much a statement of our long-term vision as a software upgrade. As businesses battle the headwinds of global competition, they increasingly rely on data to improve customer service, grow revenue and become more efficient. As a result, demands on IT have never been greater and many are buckling under the relentless pace of big data proliferation and technological change. UK industry analyst Clive Longbottom of Quocirca said in a recent meeting with us, “the big data market is the ‘wild west’ and someone has got to go in and be the Sheriff that calms everything down!”

When we say in today’s press release that we have ‘geared up our analytics platform for the future of big data,’ we mean that we enable businesses at every stage in their journey to becoming truly data-driven. We see these stages as:

  • Storage – most companies start by moving data in and out of Hadoop stores to learn how to capture big data cost-effectively and enable early experimentation.  Many customers are using Pentaho to support data warehouse optimization projects, moving some data out of high-performance databases into cost-effective Hadoop or MongoDB stores.
  • Visualizing and Reporting – some customers are visualizing or reporting from those big data stores, whether Hadoop or NoSQL. Pentaho 5.0 includes the industry’s first analytics platform with full support for MongoDB that doesn’t require coding, making it available to more business users.
  • Data blending – true ‘big picture’ insights happen when operational data sources are blended with big data sources. Companies that compete largely on service, in industries like telecommunications and financial services, see big data blending’s potential to help them gain market-share by providing the most personalized and interactive customer experience. Pentaho 5.0 is the first analytics platform to enable analysts to easily blend all data types ‘at the source’ and immediately report, visualize and explore for greater insights.
  • Predicting – it’s been talked about for many years, but I don’t think predictive analysis has yet reached its potential, partly because of tool complexity and partly the historical inability to easily integrate big data sources to go beyond data sampling. We are finally starting to see broader predictive use cases that involve data blending and the ability to tap into huge volumes of non-relational data in near real-time. We will continue to invest in predictive capabilities to build on Pentaho 5.0 as this is a main focus of Pentaho Labs.
  • The data-driven business – ultimately we expect innovative customers of all sizes to embed analytics into every business process and application, so analytics becomes an integral part of operations rather than a separate system they need to access for decision-making.

And finally, it’s always been our view that bringing analytics ‘to the masses’ can only be achieved when the simultaneous needs of business users and the IT departments that serve them are met. Pentaho 5.0 improves the analytics experience for everyone in modern, big data-driven businesses from CIOs responsible for data governance to business analysts exploring customer data to executives viewing dashboards on their mobile devices.

We hope you like what you see in Pentaho 5.0 and look forward to your feedback!


Pentaho 5.0 blends right in!

September 12, 2013

Dear Pentaho friends,

Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago) we have been looking for ways to create more synergy across this complete software stack.  That is why today I’m exceptionally happy to be able to announce not just version 5.0 of Pentaho Data Integration, but also a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 Commercial Edition.

Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools).  Traditionally data is delivered to these tools through a relational database. However, there are cases where that can be inconvenient, for example when the volume of data is just too high or when you can’t wait until the database tables are updated.  This for example leads to a new kind of big data architecture with many moving parts:

Evolving Big Data Architectures

From what we can see in use at major deployments with our customers, mixing Big Data, NoSQL and classical RDBMS technologies is more the rule than the exception.

So, how did we solve this puzzle?

The main problem we faced early on was that the default language used under the covers, in just about any business intelligence user facing tool, is SQL.  At first glance it seems that the worlds of data integration and SQL are not compatible.  In DI we read from a multitude of data sources, such as databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more.  However, SQL itself is a mini-ETL environment on its own as it selects, filters, counts and aggregates data.  So we figured that it might be easiest if we would translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, not directed by manually designed transformations but by SQL.  This is at the heart of the Pentaho Data Blending solution.
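To make the "SQL is a mini-ETL environment" observation concrete, here is a minimal Python sketch of how the clauses of a simple query correspond to individual processing steps. It is an illustration only, not how Pentaho Data Integration is implemented: the sample rows, the function names (`filter_rows`, `group_and_aggregate`, `run_query`) and the query itself are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, standing in for the output of any data source
# (a database, a spreadsheet, a NoSQL store, a web service, ...).
ROWS = [
    {"country": "US", "sales": 120},
    {"country": "US", "sales": 80},
    {"country": "BE", "sales": 200},
    {"country": "BE", "sales": 150},
]


def filter_rows(rows, predicate):
    """WHERE clause: keep only the rows that match the predicate."""
    return (row for row in rows if predicate(row))


def group_and_aggregate(rows, key, value):
    """GROUP BY with COUNT(*) and SUM(value): one output row per key."""
    groups = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        group = groups[row[key]]
        group["count"] += 1
        group["total"] += row[value]
    # Sorting the keys plays the role of ORDER BY on the grouping column.
    return [
        {key: k, "count": g["count"], "total": g["total"]}
        for k, g in sorted(groups.items())
    ]


def run_query(rows):
    """Step-by-step equivalent of:

    SELECT country, COUNT(*), SUM(sales)
    FROM rows WHERE sales >= 100
    GROUP BY country ORDER BY country
    """
    filtered = filter_rows(rows, lambda row: row["sales"] >= 100)
    return group_and_aggregate(filtered, "country", "sales")


result = run_query(ROWS)
```

Each SQL clause maps to one small, composable step, which is the same shape a data integration transformation has. That is why translating from one to the other is plausible at all.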


The internals of Data Blending

In other words: we made it possible for you to create a virtual “database” with “tables” where the data actually comes from a transformation step.
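The idea of a virtual table can be sketched in a few lines of Python. This is a conceptual toy under stated assumptions, not the Pentaho driver or its API: `VirtualDatabase`, `register`, `select` and the `blended_customers` "transformation" are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class VirtualDatabase:
    """Maps table names to functions that produce rows on demand."""

    def __init__(self) -> None:
        self._tables: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, step: Callable[[], Iterable[Row]]) -> None:
        """Expose the output of a row-producing step under a table name."""
        self._tables[name] = step

    def select(
        self, name: str, where: Callable[[Row], bool] = lambda row: True
    ) -> List[Row]:
        """Run the step at query time and keep the rows matching `where`."""
        return [row for row in self._tables[name]() if where(row)]


def blended_customers() -> Iterator[Row]:
    """Hypothetical 'transformation' that blends two in-memory sources."""
    crm = {1: "Acme", 2: "Globex"}          # stand-in for an operational source
    clicks = [(1, 40), (2, 15), (1, 5)]     # stand-in for a big data source
    for customer_id, count in clicks:
        yield {"customer": crm[customer_id], "clicks": count}


db = VirtualDatabase()
db.register("customers", blended_customers)
busy = db.select("customers", where=lambda row: row["clicks"] > 10)
```

Nothing is stored in the "table": the rows are produced each time a query runs, so the caller always sees the current output of the step rather than a previously loaded copy.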

To ensure that the “automatic” part of the data chain doesn’t become an impossible-to-figure-out “black box”, we once again made good use of existing PDI technologies.  We’re logging all executed queries on the Data Integration server (or Carte server) so you have a full view of all the work being done:

Data Blending Transparency

In addition to this, the statistics from the queries can be logged and viewed in the operations data mart giving you insights into which data is queried and how often.

We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!



If you want to learn more about the new features in this 5.0 release, Pentaho is hosting a webinar and demonstration on September 24th, with two options to register: EMEA and North America time zones.

Matt Casters
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)

Mondrian in Action – win a free copy

September 4, 2013


Mondrian fans, your wait is over – the book, Mondrian in Action, is complete and available to purchase. We sat down with the three authors to get the inside scoop about the book. Below is our interview with:

  • William (Bill) D. Back @billbackbi – Enterprise Architect and Director of Pentaho Services
  • Nicholas (Nick) Goodman @nagoodman – Business Intelligence pro who has authored training courses on OLAP and Mondrian
  • Julian Hyde @julianhyde – Mondrian Founder and the project’s lead developer

Read below to learn how to win a FREE copy of Mondrian in Action and for a special discount offer from Manning Publications.

1. What inspired you to write Mondrian in Action? 

Bill:  I was assigned to be the Mondrian “expert” on my team and started looking at the documentation.  I discovered that it was pretty spread out and some functionality was in people’s heads.  I started pulling together a set of personal documentation and thought it would be a good resource for others.  I don’t think the book will answer every edge question someone might have about Mondrian, but I do think it does a very good job of covering Mondrian and how it is used for Analysis.

Nick: It was a long overdue book and someone had to write it!   A full decade after Julian began work on it, there still wasn’t a comprehensive guide addressing the practical soup to nuts of a Mondrian project.  When Bill floated the idea, I was happy to join the project.

Julian: As Nick says, this book is long overdue. It’s my way of saying thank you to the community who have helped Mondrian and Pentaho become successful. In the early days, people were taking a risk when they used a product with little documentation beyond what they could find on the developer mailing list. Yet many people did use it, and their projects were successful, and they wrote blog posts and articles that inspired others to try it too.

2. What is the main goal of the book? What do you aim to bring across?

Bill:  I wanted to create a single resource that people can turn to for answers to their questions about Mondrian.  This includes understanding how it is used by organizations to analyze their business data, how it is used by various tools, such as Pentaho Analyzer and Reporting, and how to integrate Mondrian into a user’s application.

Nick: The goal of the book was to bring together practical knowledge, from 3 of the top practitioners who have worked with Mondrian at over a hundred customers.  The mix of skills between the 3 authors meant we got diverse topics to make a comprehensive and practical book.  Julian is the code expert, Bill knows the integration/Pentaho aspects, and I have extensive customer modeling/configuration experience.  Readers should benefit from our varied experience with Mondrian!

3. What do you like so much about Mondrian to make you write a book about it? 

Bill:  I love how easy it is for someone with limited technical knowledge to analyze large amounts of data.  Reports are great, but they often provide too much data and are difficult to change to do analysis.  Mondrian lets users quickly analyze their data and make information-based decisions.  It’s a powerful concept for any business in today’s highly competitive environment.

Julian: I was inspired to write Mondrian because I wanted to bring powerful BI to a wide audience. I had used a few multidimensional databases and loved how they enabled business users to ask sophisticated questions of large data sets. But they were expensive, not open source, difficult to install and integrate with, and especially difficult to use from within a Java environment. With Mondrian, I set out to address all of these issues. The book is the logical next step. We can bring BI to a wider audience if we gather everything they need to know about Mondrian into one place.

4. Tell us more about the cover of the book – who is the man on the front and what is he doing? 

Bill:  He’s a man from Konavle, a town southeast of Dubrovnik, Croatia.  Manning uses a variety of historical figures to demonstrate the diversity in technology.

Julian: I think the “man from Konavle” story is just a cover. I think it’s actually Bill in a false mustache and one of his more elaborate Halloween costumes!

5. What’s next?

Bill:  I’m still trying to figure this out.  I’ve got a huge backlog of things I want to research that I’ve been putting off for the last year and a half.  I have some analysis that I want to do with Mondrian based on topics I’m interested in.  I’m also interested in looking at the emerging predictive and Big Data technologies that take analysis to the next level with machine assisted analysis.

Nick: I’m taking some time off from the Big Data hype machines and staying home with my 6-month-old daughter for the next year.  After that, I’ll look at new data/analytics technologies and jump back in – Mondrian will certainly be relevant when I return. I’m looking forward to all the great stuff Julian is doing with Optiq and the possible synergy between the two projects.  Also, I’m tinkering with a detailed time-based analysis of my infant’s eating and sleeping schedule and am considering building a “baby-mart.”  Just kidding!

Julian: First of all, I am working on the next Mondrian release. There are lots of new features and architectural improvements, most of which are described in the book. The whole Mondrian dev team is really excited to get it to market.

Outside of Mondrian, in my Optiq project, I am also developing some ideas that will allow fast, easy access to data wherever it resides. It is easy to build an adapter to a new data source, so several projects are already using it, including Apache Drill and Cascading. Optiq could be used to extend Mondrian’s capabilities: to run natively on NoSQL databases and Hadoop just as it currently does on relational data, and to improve Mondrian’s performance by pushing more work into its distributed cache. But Optiq can solve a lot of data management problems besides OLAP, so an option I am considering is to build Optiq into a full product.


Win a free copy of Mondrian in Action

Like Pentaho on Facebook and leave a comment here about which chapter(s) you think will be most useful for you and why (you can see the full index in the book here). You also have the chance to win on Twitter by following Pentaho and tweeting your comment with the hashtag #MondrianIA. Nick, Bill & Julian will pick their favorite comment to win. Deadline to leave a comment is Monday, Sept 9 at 12pm/EST.

Manning Publications is offering an exclusive 37% discount off Mondrian in Action when you purchase through At the shopping cart, simply enter the discount code mlbackp.

