Six reasons why Pentaho’s support of Apache Hadoop is great news for ‘big data’

Earlier today Pentaho announced support for Apache Hadoop – read about it here.

There are many reasons we are doing this:

  1. Hadoop lacks graphical design tools – Pentaho provides pluggable design tools.
  2. Hadoop is Java – Pentaho’s technologies are Java.
  3. Hadoop needs embedded ETL – Pentaho Data Integration is easy to embed.
  4. Pentaho’s open source model enables us to provide technology with great price/performance.
  5. Hadoop lacks visualization tools – Pentaho has those.
  6. Pentaho provides a full suite of ETL, Reporting, Dashboards, Slice ‘n’ Dice Analysis, and Predictive Analytics/Machine Learning.

The thing is, Pentaho is the only technology that satisfies all six of these points in combination.

You can see a few of the upcoming integration points in the demo video (above). They are only a small sample of the many integration points we are going to deliver.

Most recently I’ve been working on integrating the Pentaho suite with the Hive database. This enables desktop and web-based reporting, integration with the Pentaho BI platform components, and integration with Pentaho Data Integration. Between these use cases, hundreds of different components and transformation steps can be combined in thousands of different ways with Hive data. I had to make some modifications to the Hive JDBC driver and we’ll be working with the Hive community to get these changes contributed. These changes are the minimal changes required to get some of the Pentaho technologies working with Hive. Currently the changes are in a local branch of the Hive codebase. More specifically they are a ‘Short-term Rapid-Iteration Minimal Patch’ fork – a SHRIMP Fork.
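To give a feel for what reaching Hive over JDBC looks like from Java, here is a minimal sketch. It is not the patched driver described above. It assumes a current HiveServer2-style driver on the classpath and the `jdbc:hive2://` URL scheme, which differs from the driver available when this post was written. The host, port, database, table, and column names are all placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class HiveJdbcSketch {

    // Builds a HiveServer2-style JDBC URL (assumed scheme; host, port,
    // and database are placeholders supplied by the caller).
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs a query through plain JDBC and returns the first column of
    // every row as a string. Needs a Hive JDBC driver on the classpath.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", 10000, "default");
        try {
            // "customers" and "name" are hypothetical table and column names.
            for (String v : firstColumn(url, "SELECT name FROM customers LIMIT 10")) {
                System.out.println(v);
            }
        } catch (SQLException e) {
            // Without a driver or a running server this branch is taken.
            System.err.println("Could not query Hive at " + url + ": " + e.getMessage());
        }
    }
}
```

Because this is ordinary JDBC, any reporting or ETL tool that speaks JDBC can in principle issue the same kind of query, which is why the driver is the natural integration point.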

Technically, I think the most interesting Hive-related feature so far is the ability to call an ETL process within a SQL statement (as a Hive UDF). This enables all kinds of complex processing and data manipulation within a Hive SQL statement.

There are many more Hadoop-related ETL and BI features and tools to come from Pentaho. It’s gonna be a big summer.

James Dixon
Chief Geek
Pentaho Corporation



