Pentaho’s first mover status in the big data space

February 8, 2012

I am very excited to share the news that Pentaho is cited as the only ‘Strong Performer’ in The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012 (February 2012).

The best way to summarize Pentaho’s position in the market is straight from the report by James G. Kobielus, Senior Analyst, Forrester Research:

“Pentaho is a strong performer with an impressive Hadoop data integration tool. Among data integration vendors that have added Hadoop functionality to their products over the past year, it has the richest functionality and the most extensive integration with open source Apache Hadoop.”

We believe that the inclusion of the Pentaho Kettle data integration product in the first Forrester Wave: Enterprise Hadoop Solutions report is a strong testimonial to Pentaho’s first mover status in this space. The scores awarded to Pentaho reflect the fact that Pentaho is helping companies operationalize Big Data by addressing critical pain points around Hadoop data integration and easy analysis of Big Data.

I encourage you to access the full report and see why Pentaho was named a strong performer and how we stack up against other vendors.

Richard

About the report
The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012 assessed 13 enterprise Hadoop solution providers. The firm examined past research and user needs assessments and conducted vendor and expert interviews to formulate a comprehensive set of 15 evaluation criteria, which Forrester grouped into three high-level buckets: current offering, strategy and market presence.


Top 10 Reasons Behind Pentaho’s Success

September 2, 2011

To continue our revival of old blog posts, today we have our #2 most popular blog from last July. Pentaho is now 7 years old, with sales continuing to move up and to the right. In a crazy economy, many are asking, “What is the reason behind your growth and success?” Richard Daley reflected on this question after reporting on quarterly results in 2010.

*****Originally posted on July 20, 2010*****

Today we announced our Q2 results. In summary Pentaho:

  • More than doubled new Enterprise Edition Subscriptions from Q2 2009 to Q2 2010.
  • Exceeded goals, making Q2 the strongest quarter in company history and our third record quarter in a row.
  • Became the only vendor that lets customers choose the best way to access BI: on-site, in the cloud, or on the go using an iPad.
  • Led the industry with a series of market firsts including delivering on Agile BI.
  • Expanded globally, received many industry recognitions and added several stars to our executive bench.

How did this happen? Mostly because of our laser focus over the past 5 years on building the leading end-to-end open source BI offering. But if we look closely at the last 12-18 months, there are some clear signs pointing to our success (my top ten list):

Top 10 reasons behind Pentaho’s success:

1.     Customer Value – This is the top of my list. Recent analyst reports explain how we surpassed the $2 billion mark during Q2 in terms of cumulative customer savings on business intelligence and data integration license and maintenance costs. In addition, we ranked #1 in terms of value for price paid and quality of consulting services amongst all Emerging Vendors.

2.     Late 2008-Early 2009 Global Recession – this was completely out of our control, but it helped us significantly by forcing companies to look for lower cost BI alternatives that could deliver the same or better results than the high-priced mega-vendor BI offerings, making #1 more attractive to companies worldwide.

3.     Agile BI – we announced our Agile BI initiative in Nov 2009 and received an enormous amount of press and a positive reception from the community, partners, and customers. We showed previews and released RCs in Q1-Q2 2010 and put PDI 4.0 into GA at the end of Q2 2010.

4.     Active Community – A major contributing factor to our massive industry adoption is our growing number of developer stars (the Pentaho army) who continue to introduce Pentaho into new BI and data integration projects. Our community effectively triples the output of our QA team, contributes leading plug-ins like CDA and PAT, writes best-selling books about our technologies and self-organizes to spread the word.

5.    BI Suite 3.5 & 3.6 – 3.5 was a huge release for the company and helped boost adoption and sales in Q3-Q4 2009, bringing our reporting up to and beyond that of competitors. In Q2 2010 the Pentaho BI Suite 3.6 GA took this to another level, with enhancements and new functionality for enterprise security, content management and team development, as well as the new Enterprise Edition Data Integration Server. The 3.6 GA also included the new Agile BI integrated ETL, modeling and data visualization environment.

6.     Analyzer – the addition of Pentaho Analyzer to our product lineup in Sept-Oct 2009 was HUGE for our users – the best web-based query and reporting product on the market.

7.     Enterprise Edition 30-Day Free Evaluation – we started this “low-touch/hassle free” approach in March 2009 and it has eliminated the pains that companies used to have to go through in order to evaluate software.

8.     Sales Leadership – Lars Nordwall officially took over Worldwide Sales in June 2009. By building upon the existing talent and hiring great new team members, he has put together a world-class team and put best practices in place.

9.     Big Data Analytics – we launched this in May 2010 and have received very strong support and interest in this area. We currently have a Pentaho-Hadoop beta program with over 40 participants. There is a large, unmet need for data integration and analytic solutions in this space.

10.   Whole Product & Team – #1-#9 wouldn’t work unless we had all of the key components necessary to succeed – doc, training, services, partners, finance, qa, dev, vibrant community, IT, happy customers and of course a sarcastic CTO ;-)

Thanks to the Pentaho team, community, partners and customers for this great momentum. Everyone should be extremely proud of the fact that we are making history in the BI market. We have a great foundation on which to continue this rapid growth, and with the right team and passion, we’ll push through our next phase of growth over the next 6-12 months.

Quick story to end the note: I was talking and whiteboarding with one of my sons a few weeks ago (yes, I whiteboard with my kids), and he was asking certain questions about our business (how do we make money, how are we different from our competitors, etc.). I explained at a high level how we are basically “on par and in many cases better” than the Big Guys (IBM, ORCL, SAP) with regards to product, we provide superior support/services, yet we cost about 10% as much as they do. To which my son replied, “Then why doesn’t everyone buy our product?” Exactly.

Richard
CEO, Pentaho


Facebook and Pentaho Data Integration

July 15, 2011

Social Networking Data

Recently, I have been asked about Pentaho’s product interaction with social network providers such as Twitter and Facebook. The data stored within these “social graphs” can provide content owners with critical metrics around their content. By analyzing trends in user growth and demographics, as well as in the consumption and creation of content, owners and developers are better equipped to improve their business with Facebook and Twitter. Social networking data can already be viewed and analyzed using existing tools such as FB Insights, or even purchasable 3rd party software packages created specifically for this purpose.

Pentaho Data Integration, in its traditional sense, is an ETL (Extract, Transform, Load) tool. It can be used to extract data from these services and merge or consolidate it with other related company data. However, it can also be used to automatically push information about a company’s product or service to the social network platforms. You see this in action today if you have ever “Liked” something a company had to offer on Facebook: at regular intervals, you will sometimes notice unsolicited product offers and advertisements posted to your wall from those companies. A great and cost-effective way to advertise to the masses.

Application Programming Interface

Interacting with these systems is made possible because they provide an API (Application Programming Interface). To keep it simple, a developer can write a program in “some language” to run on one machine that communicates with the social networking system on another machine. The API can be used from a 3GL such as Java or JavaScript or, even simpler, through RESTful services. At times, software developers/vendors will write connectors against the native API that can be distributed and used in many software applications. These connectors can offer a quicker and easier approach than writing code alone. An out-of-the-box Facebook and/or Twitter transformation step may appear in a future release of Pentaho Data Integration, but until then the RESTful APIs provided work just fine with the simple HTTP POST step. Using Pentaho Data Integration with this out-of-the-box component allows quick access to social network graph data. It can also push content to applications such as Facebook and Twitter without writing any code or purchasing a separate connector.
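To make this concrete for the developers in the audience, below is a minimal sketch in plain Java of the kind of HTTP POST the PDI step performs when pushing a message to a Facebook wall. The page ID, access token and message are placeholders, and the endpoint and parameter names follow Facebook’s public Graph API documentation; treat it as an illustration of the mechanics, not as a Pentaho connector.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    // Illustrative only: the page ID and access token are placeholders.
    public class GraphApiPostExample {
        public static void main(String[] args) throws Exception {
            String pageId = "YOUR_PAGE_ID";           // hypothetical page
            String accessToken = "YOUR_ACCESS_TOKEN"; // hypothetical token
            String message = "Hello from Pentaho Data Integration!";

            // Form-encoded body, the same payload an HTTP POST step would send
            String body = "message=" + URLEncoder.encode(message, "UTF-8")
                        + "&access_token=" + URLEncoder.encode(accessToken, "UTF-8");

            URL url = new URL("https://graph.facebook.com/" + pageId + "/feed");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes("UTF-8"));
            }

            // The Graph API answers with a small JSON document (e.g. the new post's id)
            System.out.println("HTTP status: " + conn.getResponseCode());
        }
    }

In Pentaho Data Integration the same request is modeled visually, with the HTTP POST step issuing the call, so no Java is required.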

The Facebook Graph API

Both Facebook and Twitter provide a number of APIs; one worth mentioning is the Facebook Graph API (don’t worry, Twitter, I’ll get back to you in my next blog entry).

The Graph API is a RESTful service that returns a JSON response. Simply stated, an HTTP request can initiate a connection with the FB systems and publish or return data, which can then be parsed with a programming language or, better yet, without programming, using Pentaho Data Integration and its JSON Input step.

Since the FB Graph API provides both data access and publish capabilities across a number of objects (photos, events, statuses, people, pages) supported in the FB social graph, one can leverage both automated push and pull capabilities.
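As a rough sketch of the pull side, the snippet below requests a public object from the Graph API and prints the raw JSON. The object ID and access token are placeholders; in PDI you would let an HTTP step fetch the document and the JSON Input step parse it instead of writing any of this.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Illustrative sketch: object ID and token are placeholders.
    public class GraphApiPullExample {
        public static void main(String[] args) throws Exception {
            String objectId = "SOME_PAGE_OR_USER_ID";  // hypothetical object
            String accessToken = "YOUR_ACCESS_TOKEN";  // hypothetical token

            URL url = new URL("https://graph.facebook.com/" + objectId
                    + "?access_token=" + accessToken);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            StringBuilder json = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    json.append(line);
                }
            }

            // Raw JSON describing the object (name, category, likes, ...)
            System.out.println(json);
        }
    }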

If you are interested in giving this a try or seeing this in action, take a look at this tutorial available on the Pentaho Evaluation Sandbox.

Kind Regards,

Michael Tarallo
Director of Enterprise Solutions
Pentaho


What does the economic buyer of Agile BI know that you don’t?

May 18, 2011

Agile is a familiar term to product managers and software developers who are looking to build rapid prototypes and iterate through software development life cycles fairly quickly and effectively.

With recent market trends, Agile has now made it to the agenda of the economic BI buyer. If you are a CFO, CIO, or CEO, and have been hearing about Agile BI in the industry, you are probably looking to quantify the benefits of Agile BI in terms of direct cost savings.

As a CxO you know that your Business Intelligence costs are mainly driven by these 4 areas:

  1. License acquisition costs
  2. Skill development and training
  3. Project deployment duration and man hours
  4. Ongoing cost of change management once the solution is deployed

The question is whether Agile BI can save you costs in any of these categories. While Agile BI most obviously implies faster deployment of the BI solution (#3 above), at Pentaho we add value in all four areas. Here is how:

  1. Consolidation of licenses: Any BI implementation requires some form of Data Integration, Data Warehousing/Data Mart development, and Data Visualization (Reports, Analysis, and Dashboards). Current BI vendors in the market have disparate products for each of these areas, and offer each product at a separate license acquisition and maintenance cost. Pentaho provides great value here because it includes all of these components in “one” subscription to Pentaho BI Suite Enterprise Edition, giving you an ultimate price tag that is a fraction of the cost of other BI tools in the market.
  2. Collapsing skill sets into one: Each specialized tool mentioned above also requires highly trained staff. In a traditional BI project, a crew of ETL Developers, DBAs, Data Modelers, and BI Developers is involved, each building one piece of the big puzzle. An all-in-one tool such as Pentaho BI Suite EE offers “one” single integrated tool set for all areas of BI development. This enables organizations to collapse diverse skill sets into one. This level of self-sufficiency reduces the amount of IT staff needed to build and maintain a successful BI program.
  3. Rapid deployment – Pentaho offers an Agile Development Environment as part of its BI Suite EE. This integrated data integration and business intelligence development environment turns data into decisions in a matter of days as opposed to months or years. Interactive data explorations and visualizations for slicing and dicing data across multiple sources are instantly auto-generated using this tool. Unlike a waterfall approach, this tool allows business and technical teams to build quick prototypes and iterate on them, all within a unified workspace that empowers sharing, collaboration, and rapid results.
  4. Rapid change management – The need for quick turnarounds when adding new business metrics or changing existing ones is a reality in BI deployments. When disparate tools are used, adding a new data source or changing a metric can take a long time. With the Agile BI Development Environment, unique to Pentaho, any change to ETL flows or to the business semantic layer is automatically reflected in the visualization layer (Reporting, Analysis, Dashboards). This helps organizations quickly incorporate changes and adjust their BI solution to current business requirements, without long wait times and IT bottleneck delays.

Ready to start saving? How about this: try the Agile BI functionality of Pentaho BI Suite or Pentaho Data Integration for FREE (30-day supported enterprise evaluation). Ready now?

Farnaz Erfan
Product Marketing Manager
Pentaho Corporation


High availability and scalability with Pentaho Data Integration

March 31, 2011

“Experts often possess more data than judgment.” – Colin Powell. Hmmm, those experts surely are not using a scalable Business Intelligence solution to optimize that data and help them make better decisions. :-)

Data is everywhere! The amount of data being collected by organizations today is experiencing explosive growth. In general, ETL (Extract, Transform, Load) tools have been designed to move, cleanse, integrate, normalize and enrich raw data to make it meaningful and available to knowledge workers and decision support systems. Only once data has been “optimized” can it be turned into “actionable” information using the appropriate business applications or Business Intelligence software. This information could then be used to discover how to increase profits, reduce costs or even write a program that suggests what your next movie on Netflix should be. The capability to pre-process this raw data before making it available to the masses becomes increasingly vital to organizations that must collect, merge and create a centralized repository containing “one version of the truth.” Having an ETL solution that is always available, extensible and highly scalable is an integral part of processing this data.

Pentaho Data Integration

Pentaho Data Integration (PDI) can provide such a solution for many varying ETL needs. Built upon an open Java framework, PDI uses a metadata-driven design approach that eliminates the need to write, compile or maintain code. It provides an intuitive design interface with a rich library of prepackaged, pluggable design components. ETL developers with skill sets ranging from novice to Data Warehouse expert can take advantage of the robust capabilities available within PDI immediately, with little to no training.

The PDI Component Stack

Creating a highly available and scalable solution with Pentaho Data Integration begins with understanding the PDI component stack.

● Spoon – IDE for creating Jobs and Transformations, including the semantic layer for the BI platform
● Pan – command line tool for executing Transformations modeled in Spoon
● Kitchen – command line tool for executing Jobs modeled in Spoon
● Carte – lightweight ETL server for remote execution
● Enterprise Data Integration Server – remote execution, version control repository, enterprise security
● Java API – write your own plug-ins or integrate into your own applications

Spoon is used to create the ETL design flow in the form of a Job or Transformation on a developer’s workstation. A Job coordinates and orchestrates the ETL process with components that control file movement, scripting, conditional flow logic and notification, as well as the execution of other Jobs and Transformations. The Transformation is responsible for the extraction, transformation and loading or movement of the data. The flow is then published or scheduled to the Carte or Data Integration Server for remote execution. Kitchen and Pan can be used to call PDI Jobs and Transformations from your external command line shell scripts or 3rd party programs. There is also a complete Java SDK available to integrate and embed these processes into your Java applications, as sketched below.
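As a rough illustration of that Java SDK, the snippet below runs a transformation file from an embedding application. It assumes the Kettle/PDI libraries are on the classpath; the transformation path is a hypothetical example.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    // Minimal embedding sketch using the Kettle (PDI) Java API.
    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                              // initialize the PDI engine
            TransMeta meta = new TransMeta("/etl/load_sales.ktr"); // load the transformation metadata (path is illustrative)
            Trans trans = new Trans(meta);                         // create a runnable transformation
            trans.execute(null);                                   // start it with no extra arguments
            trans.waitUntilFinished();                             // block until all steps complete
            if (trans.getErrors() > 0) {
                throw new RuntimeException("Transformation finished with errors");
            }
        }
    }

The same pattern applies to Jobs via the JobMeta and Job classes.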

Figure 1: Sample Transformation that performs some data quality and exception checks before loading the cleansed data

PDI Remote Execution and Clusters

The core of a scalable/available PDI ETL solution involves the use of multiple Carte or Data Integration servers defined as “Slaves” in the ETL process. The remote Carte servers are started on different systems in the network infrastructure and listen for further instructions. Within the PDI process, a Cluster Scheme can be defined with one Master and multiple Slave nodes. This Cluster Scheme can be used to distribute the ETL workload in parallel appropriately across these multiple systems. It is also possible to define Dynamic Clusters where the Slave servers are only known at run-time. This is very useful in cloud computing scenarios where hosts are added or removed at will. More information on this topic including load statistics can be found here in an independent consulting white paper created by Nick Goodman from Bayon Technologies, “Scaling Out Large Data Volume Processing in the Cloud or on Premise.”

Figure 2: Cx2 means these steps are executed clustered on two Slave servers; all other steps are executed on the Master server.

The Concept of High Availability, Recoverability and Scalability

Building a highly available, scalable, recoverable solution with Pentaho Data Integration can involve a number of different parts, concepts and people. It is not a check box that you simply toggle when you want to enable or disable it. It involves careful design and planning to prepare for and anticipate the events that may occur during an ETL process. Did the RDBMS go down? Did the Slave node die? Did I lose network connectivity during the load? Was there a data truncation error at the database? How much data will be processed at peak times? The list can go on and on. Fortunately, PDI arms you with a variety of components, including complete ETL metric logging, web services and dynamic variables, that can be used to build recoverability, availability and scalability scenarios into your PDI ETL solution.

For example, Jens Bleuel, Managing Consultant in EMEA, developed a PDI implementation of the popular Watchdog concept: a solution that includes checks to monitor whether everything is on track as its tasks and events execute. Visit the link above for more information on this implementation.
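In its simplest form, a watchdog just checks that each slave server is alive before (or while) work is dispatched to it. Here is a minimal sketch of such a check against a Carte slave’s status page; the host, port and credentials are assumptions for illustration, so substitute the values from your own Carte configuration.

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;

    // Illustrative watchdog-style check of a Carte slave server.
    public class CarteStatusCheck {
        public static void main(String[] args) throws Exception {
            // Assumed credentials and address; change to match your Carte configuration
            String auth = Base64.getEncoder()
                    .encodeToString("cluster:cluster".getBytes("UTF-8"));
            URL url = new URL("http://localhost:8081/kettle/status/");

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setConnectTimeout(5000);

            int status = conn.getResponseCode();
            if (status == 200) {
                System.out.println("Carte slave is up and responding.");
            } else {
                System.out.println("Carte slave returned HTTP " + status + " - investigate!");
            }
        }
    }

A check like this can be scheduled alongside the ETL run and combined with PDI’s logging tables to decide whether a failed load should be restarted.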


Putting it all together – (Sample)

Diethard Steiner, active Pentaho Community member and contributor, has written an excellent tutorial that explains how to set up PDI ETL remote execution using the Carte server. He also provides a complete tutorial (including sample files provided by Matt Casters, Chief Architect and founder of Kettle) on setting up a simple “available” solution to process files, using Pentaho Data Integration. You can get it here. Please note that advanced topics such as this are also covered in greater detail (designed by our Managing Consultant Jens Bleuel – EMEA) in our training course available here.

Summary

When attempting to process the vast amounts of data collected on a daily basis, it is critical to have a Data Integration solution that is not only easy to use but also easily extensible. Pentaho Data Integration achieves this extensibility with its open architecture, component stack and object library, which can be used to build a scalable and highly available ETL solution without exhaustive training and with no code to write, compile or maintain.

Happy ETLing.

Regards,

Michael Tarallo
Senior Director of Sales Engineering
Pentaho

This blog was originally published on the Pentaho Evaluation Sandbox, a comprehensive resource for evaluating and testing Pentaho BI.


Sex & Sizzle – not without plumbing

November 16, 2010

What sells BI software? Sex and Sizzle! What makes BI projects successful? All of the data work done before any grids or graphs are ever produced. It’s the side of the business most BI vendors don’t talk about as they’d rather just woo and wow people with flash charts and glossy dashboards. Not that there is anything wrong with that – who doesn’t like great looking output? But, if the backend plumbing is either too complicated or non-existent, then it doesn’t matter how sexy this stuff is.

Today Pentaho announced the Pentaho Enterprise Data Services Suite to help make the “plumbing” as easy and efficient as possible. We’ve enabled people to iteratively get from raw data–from virtually any source–all the way through to metadata and onto visualization in less than an hour. We’ve enabled a new set of users to accomplish this by taking away many of the complexities.

In about 80% of the use cases we encounter, our customers want to quickly create and perform analytics on the fly, do this in an iterative approach, and, when satisfied, put their projects into production. You shouldn’t need a Ph.D. in Data Warehousing to accomplish this, yet many tools require extensive knowledge of DW methodologies and practices. It is fine to demand this knowledge for larger Enterprise DWs (EDWs), but why make everyone pay the price, both in terms of software cost and the experience/training required?

Now it would be one thing to provide data integration with RDBMSs, another thing to integrate with ROLAP, and yet another to integrate with Big Data like Hadoop, but how nice would it be to have a single Data Integration and Business Intelligence platform to work for all of these? Almost as nice as the Florida Gators winning a national championship but we won’t have to worry about that in 2010…had to digress for a moment.

A big part of our product release today centers around Pentaho for Hadoop integration, including the GA of Pentaho Data Integration and BI Suite for Hadoop. Big Data and the whole “data explosion” trend is just starting, so if you aren’t there today, give it time and know that Pentaho is already positioned to help in these use cases.

Pentaho allows you to start down an easy path with Agile BI and then scale up to EDW when and if necessary with enterprise data services. Our engineering team and community have spent significant time and effort to bring these services to market, and today is the official release. Please take a few minutes to read up on the new Pentaho Enterprise Data Services Suite and attend the launch webcast. Or, go ahead and download the Pentaho Enterprise Data Services Suite and start making easier, faster, better decisions.

Richard


Data, Data, Data

October 12, 2010

It’s everywhere and expanding exponentially every day. But it might as well be a pile of %#$& unless you can turn all of that data into information, and do so in a timely, efficient and cost-effective manner. The old-school vendors don’t operate in a timely (everything is slow), efficient (everything is over-engineered, over-analyzed, over-staffed, etc.) or cost-effective mode (the bloated supertanker needs feeding and the customer gets to pay for those inefficiencies), so that means new technologies and business models will drive innovation, which ultimately serves the customers and communities.

Back to Data, Data, Data – enter open source technologies like Hadoop and Pentaho BI/DI to drive next-gen big data analytics to the market. Hadoop and Pentaho have both been around about 5 years, are both driven by very active communities, and have both been experiencing explosive growth over the last 18 months. Our community members are the ones who came up with the original integration points for the two technologies, not because it was a fun science project but because they had real business pains they were trying to solve. This all started in 2009: we began development in ’09, launched our beta program in June 2010 (and had to cap enrollment at 60), ran a Pentaho for Hadoop roadshow (which was oversubscribed) and are now announcing the official release of Pentaho Data Integration and BI Suite for Hadoop.

I’m in NYC today at Hadoop World and we’re making four announcements:

  1. Pentaho for Hadoop – our Pentaho BI Suite and Pentaho Data Integration are now both integrated with Hadoop
  2. Partnership with Amazon Web Services – Pentaho for Hadoop now supports Amazon Elastic Map Reduce (EMR) and S3
  3. Partnership with Cloudera – Pentaho for Hadoop will support certified versions of Cloudera’s Distribution for Hadoop (CDH)
  4. Partnership with Impetus – a major Solutions Provider (over 1,000 employees) with a dedicated Large Data Analytics practice.

Consider this phase I of building out the ecosystem.

We’re all about making Hadoop easy and accessible. Now you can take on those mountains of data and turn them into value. Download Pentaho for Hadoop.

Richard

