Top 10 Reasons Behind Pentaho’s Success

September 2, 2011

To continue our revival of old blog posts, today we have our #2 most popular blog from last July. Pentaho is now 7 years old, with sales continually moving up and to the right. In a crazy economy, many are asking, “What is the reason behind your growth and success?” Richard Daley reflected on this question after reporting on quarterly results in 2010.

*****Originally posted on July 20, 2010*****

Today we announced our Q2 results. In summary Pentaho:

  • More than doubled new Enterprise Edition Subscriptions from Q2 2009 to Q2 2010.
  • Exceeded goals resulting in Q2 being the strongest quarter in company history and most successful for the 3rd quarter in a row.
  • Became the only vendor that lets customers choose the best way to access BI: on-site, in the cloud, or on the go using an iPad.
  • Led the industry with a series of market firsts including delivering on Agile BI.
  • Expanded globally, received many industry recognitions and added several stars to our executive bench.

How did this happen? Mostly because of our laser focus over the past 5 years to build the leading end-to-end open source BI offering. But if we really look closely over the last 12-18 months there are some clear signs pointing to our success (my top ten list):

Top 10 reasons behind Pentaho’s success:

1.     Customer Value – This is the top of my list. Recent analyst reports explain how we surpassed the $2 billion mark during Q2 in terms of cumulative customer savings on business intelligence and data integration license and maintenance costs. In addition, we ranked #1 in terms of value for price paid and quality of consulting services amongst all Emerging Vendors.

2.     Late 2008-Early 2009 Global Recession – this was completely out of our control but it helped us significantly by forcing companies to look for lower cost BI alternatives that could deliver the same or better results than the high priced mega-vendor BI offerings, making #1 more attractive to companies worldwide.

3.     Agile BI – we announced our Agile BI initiative in Nov 2009 and received an enormous amount of press and positive reception from the community, partners, and customers. We’ve been showing previews and releasing RCs in Q1-Q2 2010 and put PDI 4.0 in GA at the end of Q2 2010.

4.     Active Community – A major contributing factor to our massive industry adoption is our growing number of developer stars (the Pentaho army) that continue to introduce Pentaho into new BI and data integration projects. Our community triples the amount of work of our QA team, contributes leading plug-ins like CDA and PAT, writes best-selling books about our technologies and self-organizes to spread the word.

5.    BI Suite 3.5 & 3.6 – 3.5 was a huge release for the company and helped boost adoption and sales in Q3-Q4 2009. This brought our reporting up to and beyond that of competitors. In Q2 2010 the Pentaho BI Suite 3.6 GA brought this to another level, including enhancements and new functionality for enterprise security, content management and team development as well as the new Enterprise Edition Data Integration Server. The 3.6 GA also includes the new Agile BI integrated ETL, modeling and data visualization environment.

6.     Analyzer – the addition of Pentaho Analyzer to our product lineup in Sept-Oct 2009 was HUGE for our users – the best web-based query and reporting product on the market.

7.     Enterprise Edition 30-Day Free Evaluation – we started this “low-touch/hassle free” approach in March 2009 and it has eliminated the pains that companies used to have to go thru in order to evaluate software.

8.     Sales Leadership – Lars Nordwall officially took over Worldwide Sales in June 2009 and, by a combination of building upon the existing talent and hiring great new team members, he has put a world-class team and best practices in place.

9.     Big Data Analytics – we launched this in May 2010 and have received very strong support and interest in this area. We currently have a Pentaho-Hadoop beta program with over 40 participants. There is a large and unfulfilled requirement for Data Integration and Analytic solutions in this space.

10.   Whole Product & Team – #1-#9 wouldn’t work unless we had all of the key components necessary to succeed – doc, training, services, partners, finance, qa, dev, vibrant community, IT, happy customers and of course a sarcastic CTO ;-)

Thanks to the Pentaho team, community, partners and customers for this great momentum. Everyone should be extremely proud of the fact that we are making history in the BI market. We have a great foundation on which to continue this rapid growth, and with the right team and passion, we’ll push thru our next phase of growth over the next 6-12 months.

Quick story to end the note: I was talking and whiteboarding with one of my sons a few weeks ago (yes, I whiteboard with my kids) and he was asking questions about our business (how do we make money, why are we different from our competitors, etc.). I explained at a high level how we are basically “on par and in many cases better” than the Big Guys (IBM, ORCL, SAP) with regard to product, and how we provide superior support/services yet cost about 10% as much as they do. To which my son replied, “Then why doesn’t everyone buy our product?” Exactly.

Richard
CEO, Pentaho


High availability and scalability with Pentaho Data Integration

March 31, 2011

“Experts often possess more data than judgment.” – Colin Powell. Hmmm, those experts surely are not using a scalable Business Intelligence solution to optimize that data and help them make better decisions. :-)

Data is everywhere! The amount of data being collected by organizations today is experiencing explosive growth. In general, ETL (Extract, Transform, Load) tools have been designed to move, cleanse, integrate, normalize and enrich raw data to make it meaningful and available for knowledge workers and decision support systems. Only once data has been “optimized” can it be turned into “actionable” information using the appropriate business applications or Business Intelligence software. This information could then be used to discover how to increase profits, reduce costs or even write a program that suggests what your next movie on Netflix should be. The capability to pre-process this raw data before making it available to the masses becomes increasingly vital to organizations that must collect, merge and create a centralized repository containing “one version of the truth.” Having an ETL solution that is always available, extensible and highly scalable is an integral part of processing this data.

Pentaho Data Integration

Pentaho Data Integration (PDI) can provide such a solution for many varying ETL needs. Built upon an open Java framework, PDI uses a metadata-driven design approach that eliminates the need to write, compile or maintain code. It provides an intuitive design interface with a rich library of prepackaged, pluggable design components. ETL developers with skill sets that range from the novice to the Data Warehouse expert can take advantage of the robust capabilities available within PDI immediately with little to no training.

The PDI Component Stack

Creating a highly available and scalable solution with Pentaho Data Integration begins with understanding the PDI component stack.

● Spoon – IDE for creating Jobs and Transformations, including the semantic layer for the BI platform
● Pan – command line tool for executing Transformations modeled in Spoon
● Kitchen – command line tool for executing Jobs modeled in Spoon
● Carte – lightweight ETL server for remote execution
● Enterprise Data Integration Server – remote execution, version control repository, enterprise security
● Java API – write your own plug-ins or integrate into your own applications

Spoon is used to create the ETL design flow in the form of a Job or Transformation on a developer’s workstation. A Job coordinates and orchestrates the ETL process with components that control file movement, scripting, conditional flow logic and notification, as well as the execution of other Jobs and Transformations. The Transformation is responsible for the extraction, transformation and loading or movement of the data. The flow is then published or scheduled to the Carte or Data Integration Server for remote execution. Kitchen and Pan can be used to call PDI Jobs and Transformations from your external command line shell scripts or 3rd party programs. There is also a complete Java SDK available to integrate and embed these processes into your Java applications.
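
As a rough illustration of calling Kitchen and Pan from an external program, the Python sketch below builds the command lines and hands them to the operating system. The install directory, file paths and parameter names are hypothetical, and the option spellings (`-file`, `-level`, `-param`) are assumptions based on the commonly documented Kitchen and Pan options, so check them against your PDI version before relying on them.

```python
import subprocess
from pathlib import Path

# Hypothetical install location; adjust to wherever PDI is unpacked.
PDI_HOME = Path("/opt/pentaho/data-integration")


def build_command(tool, path, level="Basic", params=None):
    """Build the argument list for kitchen.sh (Jobs) or pan.sh (Transformations)."""
    if tool not in ("kitchen", "pan"):
        raise ValueError("tool must be 'kitchen' or 'pan'")
    cmd = [str(PDI_HOME / f"{tool}.sh"), f"-file={path}", f"-level={level}"]
    # Named parameters are passed one option at a time (assumed syntax).
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


def run(tool, path, **kwargs):
    """Run the tool and return its exit code; non-zero is treated as a failure."""
    return subprocess.run(build_command(tool, path, **kwargs)).returncode


if __name__ == "__main__":
    # Only prints the commands; call run(...) on a machine with PDI installed.
    print(" ".join(build_command("kitchen", "/etl/load_sales.kjb",
                                 params={"LOAD_DATE": "2011-03-31"})))
    print(" ".join(build_command("pan", "/etl/clean_customers.ktr")))
```

Keeping the command construction separate from the execution makes it easy to log or review exactly what a scheduler will run before it touches any data.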

Figure 1: Sample Transformation that performs some data quality and exception checks before loading the cleansed data

PDI Remote Execution and Clusters

The core of a scalable/available PDI ETL solution involves the use of multiple Carte or Data Integration servers defined as “Slaves” in the ETL process. The remote Carte servers are started on different systems in the network infrastructure and listen for further instructions. Within the PDI process, a Cluster Scheme can be defined with one Master and multiple Slave nodes. This Cluster Scheme can be used to distribute the ETL workload in parallel appropriately across these multiple systems. It is also possible to define Dynamic Clusters where the Slave servers are only known at run-time. This is very useful in cloud computing scenarios where hosts are added or removed at will. More information on this topic including load statistics can be found here in an independent consulting white paper created by Nick Goodman from Bayon Technologies, “Scaling Out Large Data Volume Processing in the Cloud or on Premise.”

Figure 2: Cx2 means these steps are executed clustered on two Slave servers
All other steps are executed on the Master server
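
To make the idea of spreading work across Slave servers concrete, here is a small Python sketch that deals rows out to a list of slaves in round-robin fashion. It is only a conceptual model under stated assumptions: the slave names are made up, and in PDI the distribution is configured in the Cluster Scheme rather than coded by hand.

```python
from collections import defaultdict
from itertools import cycle


def distribute_round_robin(rows, slaves):
    """Assign each row to the next slave in turn and return {slave: [rows]}."""
    if not slaves:
        raise ValueError("at least one slave server is required")
    assignment = defaultdict(list)
    for row, slave in zip(rows, cycle(slaves)):
        assignment[slave].append(row)
    return dict(assignment)


if __name__ == "__main__":
    slaves = ["slave-1", "slave-2"]  # hypothetical Carte hosts
    for slave, batch in distribute_round_robin(range(1, 8), slaves).items():
        print(slave, batch)
```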

The Concept of High Availability, Recover-ability and Scalability

Building a highly available, scalable, recoverable solution with Pentaho Data Integration can involve a number of different parts, concepts and people. It is not a check box that you simply toggle when you want to enable or disable it. It involves careful design and planning to prepare for and anticipate the events that may occur during an ETL process. Did the RDBMS go down? Did the Slave node die? Did I lose network connectivity during the load? Was there a data truncation error at the database? How much data will be processed at peak times? The list can go on and on. Fortunately PDI arms you with a variety of components, including complete ETL metric logging, web services and dynamic variables, that can be used to build recover-ability, availability and scalability scenarios into your PDI ETL solution.

For example, Jens Bleuel, Managing Consultant in EMEA, developed a PDI implementation of the popular Watchdog concept. A solution built on the Watchdog concept includes checks that monitor whether everything is on track while it executes its tasks and events. Visit the link above for more information on this implementation.
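
The Watchdog idea can be sketched in a few lines of Python. This is a generic illustration, not Jens Bleuel's PDI implementation: the class name, task names and timeout are hypothetical. Each running task reports a heartbeat, and the watchdog flags any task that has been silent for longer than the allowed timeout.

```python
import time


class Watchdog:
    """Tracks heartbeats and reports tasks that have gone silent."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def heartbeat(self, task):
        """Record that the task is still alive."""
        self.last_seen[task] = self.clock()

    def finished(self, task):
        """Stop watching a task that completed normally."""
        self.last_seen.pop(task, None)

    def stalled(self):
        """Return the tasks whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(t for t, seen in self.last_seen.items()
                      if now - seen > self.timeout)


if __name__ == "__main__":
    dog = Watchdog(timeout_seconds=0.1)
    dog.heartbeat("load_customers")
    time.sleep(0.2)
    dog.heartbeat("load_orders")
    print("stalled:", dog.stalled())
```

In an ETL setting the reaction to a stalled task (a notification, a restart, a fallback to another server) would be decided by the surrounding Job; the sketch only covers the detection step.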

Putting it all together – (Sample)

Diethard Steiner, active Pentaho Community member and contributor, has written an excellent tutorial that explains how to set up PDI ETL remote execution using the Carte server. He also provides a complete tutorial (including sample files provided by Matt Casters, Chief Architect and founder of Kettle) on setting up a simple “available” solution to process files, using Pentaho Data Integration. You can get it here. Please note that advanced topics such as this are also covered in greater detail (designed by our Managing Consultant Jens Bleuel – EMEA) in our training course available here.

Summary

When attempting to process the vast amounts of data collected on a daily basis, it is critical to have a Data Integration solution that is not only easy to use but also easily extendable. Pentaho Data Integration achieves this extensibility with its open architecture, component stack and object library, which can be used to build a scalable and highly available ETL solution without exhaustive training and with no code to write, compile or maintain.

Happy ETLing.

Regards,

Michael Tarallo
Senior Director of Sales Engineering
Pentaho

This blog was originally published on the Pentaho Evaluation Sandbox, a comprehensive resource for evaluating and testing Pentaho BI.


Happy 1 year anniversary Pentaho Agile BI!

December 8, 2010

Today we celebrate the one year anniversary of launching our Pentaho Agile BI initiative. Over the past year the Pentaho Agile BI initiative has received tremendous traction and praise throughout the industry and, more importantly, has impacted organizations in a positive way.

We thought an appropriate gift for our community is something that you would find insightful. For you, we have three new white papers that cover Pentaho Agile BI from different perspectives:

  • Pentaho Agile BI: An iterative methodology for fast, flexible and cost-effective Business Intelligence, by Pentaho Chief Geek/CTO, James Dixon
  • Business Intelligence at the Crossroads: The Need for Lean, Agile and Effective End-User Solutions, by Joshua Greenbaum, principal, Enterprise Applications Consulting
  • Realizing the Agile BI Opportunity, by Joshua Greenbaum, principal, Enterprise Applications Consulting

You can download the complimentary Agile BI white papers here.


Now available: Pentaho Kettle Solutions

September 21, 2010

Congrats to Matt Casters, Roland Bouman and Jos van Dongen on their new book, available today: Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration. You can buy it on Amazon or Wiley (they also have free excerpts).

If their names sound familiar it is because Roland and Jos are also the authors of the Amazon best seller – Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL, published August 2009. Matt is the Kettle Project founder and Chief Architect of Pentaho Data Integration.

My copy is ordered and I can’t wait to read it.

Congrats
Doug Moran
Pentaho Community Guy


Where is Pentaho this July?

July 12, 2010

This July, Pentaho continues the Worldwide Techcast Series demonstrating how Pentaho’s Agile BI initiative will help you speed development of new BI applications and better ensure that these applications meet the needs of your business users. Learn about Pentaho Data Integration 4.0 and Agile BI in six languages: Italian, German, Portuguese, Spanish, French and Norwegian. Sign up for a live techcast this month or watch the series on-demand.

We are also holding an executive breakfast in the city where high-tech meets southern hospitality, Raleigh, North Carolina, on July 20th at EvoApp Live. If you are in the ‘Triangle’ make sure to reserve your seat for this interactive panel discussion about how business analytics are driving top performing companies to make better, faster, more informed decisions.

Pentaho Featured Events

The highlighted events for the month of July are webcasts for those looking to learn more about OSBI and simple ways to get started. These webcasts are free; however, early registration is recommended.

Comparing the Cost of Business Intelligence – Proprietary Vs. Open Source.
With leading analyst Mark Madsen
July 13 at 14:00 EDT (18:00 GMT)
Register Now

* If you cannot attend the webcast on July 13, register for the July 22 webcast.
* For additional background to this webcast download Mark Madsen’s latest white paper, Lowering the cost of Business Intelligence with Open Source.

BI On-Demand – See your data in a dashboard in just 3 days!
Wednesday, July 28, 2010 14:00 EDT (18:00 GMT)
Register Now

Visit our events page for more details and updated events.
Follow us on Twitter or Facebook for event reminders and updates.


Top 3 reasons to download PDI 4.0 and BI Suite 3.6

June 10, 2010

Today we announced General Availability of Pentaho Data Integration 4.0 and Pentaho BI Suite 3.6. This marks a major step forward on our mission to empower users with the most comprehensive, easy-to-use, and integrated Business Intelligence suite on the market today. All too often, new BI projects end up on IT’s cutting room floor due to a variety of factors:

  • Licensing costs – it’s too expensive to add the additional users/CPUs for the project to reach its intended audience.  A single BI project can have licensing impacts on several software products including data integration (ETL), reporting, dashboards, analytics, database, and on and on…
  • Lengthy time to ROI – a classic pitfall of new BI projects is spending weeks or months ‘getting the data right’ before business users have an opportunity to interact and provide feedback.  Inevitably, this leads to missed requirements, delayed rollouts, and blown budgets.
  • Technical Resources – do you have the right technical resources available and are they knowledgeable in all of the tools and technology involved?  Am I beholden to availability of IT, or can I accomplish this myself?

Our Agile BI initiative is breaking down these barriers by delivering a BI Suite with:

  • All of the functionality you actually need at 20% the cost of a comparable solution from the big guys… bye bye bloat-ware
  • Integrated design environment combining all aspects of a BI solution from Data Integration (ETL) through data visualization, thereby encouraging collaborative, cross-team interactions between solution architects and end users and faster iterations… compare that with a hybrid Informatica/Data Stage – Oracle/Business Objects/Cognos solution
  • A modern, standards based architecture that deploys in minutes and is easy to customize or extend to meet the changing needs of your business… guaranteed 97% less super glue and duct tape under the hood than comparable proprietary BI Suites

So you’re thinking… enough with the marketing bullets.  Why should I download or upgrade to these new releases? The top 3 reasons:

1. Design Perspectives – New perspectives in PDI’s designer (Spoon) providing one click visualization of data and simple, drag-and-drop metadata modeling for OLAP and reporting metadata

2. PDI Enterprise Edition Data Integration Server – All the execution and clustering capabilities of our core offering plus integrated scheduling, advanced security, and enhanced content management including complete revision history on all of your jobs and transformations

3. User Console improvements – There are numerous improvements to our visualization plugins including:

  • Support for scheduling, emailing, member properties and dashboard filter integration in Analyzer
  • Configurable auto-refresh intervals on dashboards and dashboard widgets and integration of Dashboard filters with Pentaho metadata data sources

Get started today by downloading your copy from http://www.pentaho.com/download/

Jake Cornelius
Director of Product Management
Pentaho Corporation


Keep it simple

June 9, 2010

Top 5 advances in ETL and BI in the past 15 years:

  1. Near Real-time Data Warehouses
  2. Dashboards
  3. Drag and Drop ETL
  4. Ad hoc Analytics
  5. Metadata Driven ETL

What is missing from the list above? That would be the fact that BI use has not penetrated into everyday business life like we would have expected.  But wasn’t that the point that we were all heading towards, that of BI for the masses?  That goal of information transparency that everyone was looking for seems to have gotten lost by most vendors in the race to have the biggest and baddest feature list.  Very powerful BI technology exists today to get at pretty much any data and slice and dice it in ways that we could have only imagined years ago. Yet, the complexity and disparity of that technology continues to be a major roadblock to getting everyday business managers the information they need to best run their businesses.  Now, add in the ever-changing, dynamic nature of business today, and we find the chasm between the IT department and business users largely intact, with business users seeking more and more self-service BI on their own.

That is why we set out on our Agile BI initiative, to solve these obvious problems that other vendors ignore.  The market and competitive response to our Agile BI initiative has been fun to watch.  Suddenly, lots of competitors are talking about how to make their technology more agile and industry analysts are again writing about agile as well.

Unfortunately, competitors miss the point.  Pentaho’s Agile BI initiative looks to make things simpler, not more complex.  This isn’t about adding more technology to the mix; this is about using the technology that we already have in more agile and elegant ways so that we can bridge the chasm between IT and business users.  It’s not about long-winded explanations of technology infrastructure that only technology geniuses can understand; it’s about opening up the BI process so that IT and business users can collaborate and deploy business-relevant BI applications quicker.

Isn’t that the real point of BI, getting at more information quicker?

Keep it simple folks.

Joe Nicholson
VP, Product Marketing
Pentaho Corporation


Six reasons why Pentaho’s support of Apache Hadoop is great news for ‘big data’

May 19, 2010

Earlier today Pentaho announced support for Apache Hadoop – read about it here.

There are many reasons we are doing this:

  1. Hadoop lacks graphical design tools – Pentaho provides pluggable design tools.
  2. Hadoop is Java -  Pentaho’s technologies are Java.
  3. Hadoop needs embedded ETL – Pentaho Data Integration is easy to embed.
  4. Pentaho’s open source model enables us to provide technology with great price/performance.
  5. Hadoop lacks visualization tools – Pentaho has those
  6. Pentaho provides a full suite of ETL, Reporting, Dashboards, Slice ‘n’ Dice Analysis, and Predictive Analytics/Machine Learning

The thing is, taking all of these in combination, Pentaho is the only technology that satisfies all of these points.

You can see a few of the upcoming integration points in the demo video (above). The ones shown in the video are only a few of the many integration points we are going to deliver.

Most recently I’ve been working on integrating the Pentaho suite with the Hive database. This enables desktop and web-based reporting, integration with the Pentaho BI platform components, and integration with Pentaho Data Integration. Between these use cases, hundreds of different components and transformation steps can be combined in thousands of different ways with Hive data. I had to make some modifications to the Hive JDBC driver and we’ll be working with the Hive community to get these changes contributed. These changes are the minimal changes required to get some of the Pentaho technologies working with Hive. Currently the changes are in a local branch of the Hive codebase. More specifically they are a ‘Short-term Rapid-Iteration Minimal Patch’ fork – a SHRIMP Fork.

Technically, I think the most interesting Hive-related feature so far is the ability to call an ETL process within a SQL statement (as a Hive UDF). This enables all kinds of complex processing and data manipulation within a Hive SQL statement.

There are many more Hadoop-related ETL and BI features and tools to come from Pentaho.  It’s gonna be a big summer.

James Dixon
Chief Geek
Pentaho Corporation

Learn more - watch the demo



You are not alone

March 31, 2010

I’m a happy guy today. Not only was I able to make it onto the lake this morning for some early morning water skiing, I also just reviewed the results and feedback from Pentaho’s PDI 4.0 launch yesterday. This release was a record setter for us on a number of fronts. Here are a few highlights:

  • Over 2,600  PDI 4.0 EE downloads in one day
  • We had 1,850* people registered for the live WebEx today, ‘Pentaho Defines a Better Way to Build BI Solutions’ – the most people we’ve ever had!
  • Pentaho.com reached an all time high for unique visitors.
  • Forum activity is up 251% over normal days
  • Demo.pentaho.com is up 192%
  • Great coverage and positive response from industry press and the community on Twitter.

To me this proves that the market is hungry for a new approach to an old idea. If you are one of the people looking for a BI solution that is quicker to build, easier to adapt and faster to deliver value, you can see that you are not alone.

*If you were not one of the 1,850 who attended the live webinar, ‘Pentaho Defines a Better Way to Build BI Solutions,’ you can watch the replay here and access additional resources and recordings.

If you were one of the 1,850 who attended the webinar, what did you think? What were some of your key takeaways?

