Pentaho’s support of EMC Greenplum HD – what it means and why you should care

May 12, 2011

On Monday we announced our support for the EMC Greenplum distribution of Hadoop called EMC Greenplum HD. You can read about all the details in our press release, Pentaho Makes Hadoop Faster, More Affordable and Easier to Use with EMC.

This week we have been at EMC World in Las Vegas as a sponsor in booth 211 (if you are at the conference, come visit us). We’ve had a great crowd and strong interest in Pentaho BI Suite for Hadoop, Pentaho Data Integration for Hadoop and our new native support for the Greenplum Database GPLoad high performance bulk loader. Two questions that attendees keep asking are: “How is Pentaho supporting EMC Greenplum HD?” and “Why should I care?” You can read my answers below and more details about our announcement in the press release and Pentaho & EMC web page.

How Pentaho supports EMC Greenplum for Hadoop
Pentaho is the only EMC Greenplum partner to provide a complete BI solution from data integration through to reporting, analysis, dashboarding and data mining, from a single BI platform with shared metadata. Pentaho’s support and certification complement the Greenplum distribution of Hadoop by providing an end-to-end data integration and BI suite with the cost advantages of open source that enables:

  • An easy-to-use, graphical ETL environment for input, transformation, and output of Hadoop data;
  • Massively scalable deployment of ETL processing across the Hadoop cluster;
  • Coordination and execution of Hadoop tasks by enabling them to be managed from within the Pentaho management console;
  • Easy spinning off of high performance data marts for interactive analysis;
  • Integration of data from Hadoop with data from other sources for interactive analysis.

Why this is a good thing and how it changes the industry
EMC Greenplum, in combination with key technology partners, for the first time is giving the industry an integrated, supported and certified data management and BI stack that includes storage, a MapReduce framework for processing unstructured data, an analytic database, predictive analytics and business intelligence.

By combining Pentaho’s powerful BI suite with the strength of EMC Greenplum’s storage and data management domain expertise, the industry benefits from maximum data throughput and significantly shorter implementation cycles for new Hadoop deployments.

Already an industry leader in data and storage, EMC is now well-positioned to play a pivotal role in commercializing Hadoop and giving businesses a simpler, more cost-effective way to perform advanced analytics at massive scale. For Hadoop to truly get to the next level, it needs to be as easy to install and use as off-the-shelf software.

If you are interested in evaluating Pentaho BI Suite and Pentaho Data Integration for the EMC Greenplum distribution of Hadoop, contact us at Pentaho_EMC@pentaho.com.

Ian Fyfe
Chief Technology Evangelist
Pentaho

Photos from the Pentaho booth at EMC World this week



Thoughts on last week’s Strata big data conference

February 8, 2011

Last week I attended O’Reilly’s Strata Conference in Santa Clara, California, where Pentaho was an exhibitor. I gave a 5-minute lightning talk during the preceding Big Data Camp “un-conference” on the topic, The importance of the hybrid data model for Hadoop driven analytics. The talk focused on the importance of combining big data analytic results with the data elements already in a firm’s existing systems, giving business units answers to questions that were previously not possible or not economical to answer (something that of course Pentaho now makes possible). I also sat down for an interview with Mac Slocum, Online Managing Editor at O’Reilly. You can see the video below, where we discuss what kinds of businesses can benefit from big data technologies such as Hadoop, and what the tipping point is for adopting big data technologies.


The high quality of attendees and activity at this sell-out conference further confirms, I think, that although development work on solutions for big data has been happening for a few years, this area is undergoing a quantum leap in adoption at businesses both large and small. Simply put, this technology allows them to glean “information” from enormous quantities of often unstructured or semi-structured data, something that in the past was simply not possible or was eye-wateringly expensive to achieve using conventional relational database technologies.

I found that attendees’ understanding of “Big Data” varied widely. Questions spanned the entire spectrum, from a few people asking things like “What is Hadoop?” to many along the lines of “Exactly how does Pentaho integrate with Hadoop’s Map-Reduce Framework, HDFS, and Hive?” Some attendees were clearly still in the discovery and learning phase, but many were confidently moving forward with the idea of leveraging big data, and were looking for solutions that make it easier to work with big data technologies such as Hadoop to deliver new information and insights to their businesses. In fact, it is clear that a new type of database professional, the data scientist, is rapidly becoming mainstream. This person combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.

Ian Fyfe
Chief Technology Evangelist
Pentaho Corporation

Here are some in-action photos of our booth at the Strata Conference


Pentaho’s week of Hadoop

October 15, 2010

Congrats to our new partner Cloudera for putting on a great event this week. Hadoop World 2010 was a huge success – with over 900 attendees. It was great to talk to companies using Hadoop and those looking to solve their big data problems. We were also excited to have such a great showing at our presentation at the very end of the day with standing room only!

On Wednesday, following Hadoop World, the Pentaho Agile BI Tour arrived in NYC. Pentaho and its partner Project Leadership Associates presented a special half-day seminar focused on Agile BI and Big Data. We also hosted the first of three special OEM Power Lunches for companies interested in embedding Pentaho.

For an insider’s look at our Week of Hadoop check out our slide show below.

If you missed our four announcements on Tuesday about the availability of Pentaho Data Integration and Pentaho BI Suite for Hadoop and our new partnerships, you can read what the press and analysts have to say:

Hadoop pitched for business intelligence
ITWorld.com,  Joab Jackson

Pentaho Adds Hadoop Support
CTO Edge, Mike Vizard

Pentaho Brings Business Intelligence to Hadoop
ECRMGuide, Paul Shread

Pentaho brings BI, integration to Hadoop
Computer Business Review, Jason Stamper

You may be thinking, what is Hadoop? If so, I recommend checking out the videos by our Chief Geek, James Dixon. The five videos are short, to the point and very informative.

Have a great weekend!
Rebecca Shomair
Director, Corporate Communications
Pentaho Corporation



Pentaho, Hadoop, and Data Lakes

October 15, 2010

Earlier this week, at Hadoop World in New York,  Pentaho announced availability of our first Hadoop release.

As part of the initial research into the Hadoop arena I talked to many companies that use Hadoop. Several common attributes and themes emerged from these meetings:

  • 80-90% of companies are dealing with structured or semi-structured data (not unstructured).
  • The source of the data is typically a single application or system.
  • The data is typically sub-transactional or non-transactional.
  • There are some known questions to ask of the data.
  • There are many unknown questions that will arise in the future.
  • There are multiple user communities that have questions of the data.
  • The data is of a scale or daily volume such that it won’t fit technically and/or economically into an RDBMS.

In the past the standard way to handle reporting and analysis of this data was to identify the most interesting attributes, and to aggregate these into a data mart. There are several problems with this approach:

  • Only a subset of the attributes is examined, so only pre-determined questions can be answered.
  • The data is aggregated, so visibility into the lowest levels is lost.

Based on the requirements above and the problems of the traditional solutions we have created a concept called the Data Lake to describe an optimal solution.

If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.
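To make the contrast concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Pentaho or Hadoop code: the event records, field names, and both helper functions are hypothetical. The "data mart" keeps hit counts for two pre-selected attributes, while the raw events (the "lake") can still answer a question nobody planned for.

```python
from collections import defaultdict

# Hypothetical raw events, as they might stream in from a single application.
RAW_EVENTS = [
    {"day": "2010-10-11", "country": "US", "browser": "Firefox", "page": "/home", "ms": 120},
    {"day": "2010-10-11", "country": "US", "browser": "IE", "page": "/download", "ms": 340},
    {"day": "2010-10-11", "country": "UK", "browser": "Firefox", "page": "/download", "ms": 95},
    {"day": "2010-10-12", "country": "US", "browser": "Chrome", "page": "/home", "ms": 80},
    {"day": "2010-10-12", "country": "UK", "browser": "IE", "page": "/download", "ms": 410},
]


def build_data_mart(events):
    """Aggregate to the pre-selected attributes only: hits per (day, country)."""
    mart = defaultdict(int)
    for event in events:
        mart[(event["day"], event["country"])] += 1
    return dict(mart)


def slow_downloads_by_browser(events, threshold_ms):
    """An unplanned question: which browsers see slow /download pages?"""
    counts = defaultdict(int)
    for event in events:
        if event["page"] == "/download" and event["ms"] > threshold_ms:
            counts[event["browser"]] += 1
    return dict(counts)


mart = build_data_mart(RAW_EVENTS)

# The mart answers the question it was designed for...
print(mart[("2010-10-11", "US")])

# ...but browser, page and timing were dropped during aggregation,
# so the new question can only be answered from the raw events.
print(slow_downloads_by_browser(RAW_EVENTS, 300))
```

The mart is small and fast to query, but its keys contain only day and country. Any question involving the other attributes has to go back to the unaggregated data, which is the case the data lake is meant to serve.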

For more information on this concept you can watch a presentation on it here: Pentaho’s Big Data Architecture

Cheers,
James Dixon
Chief Geek
Pentaho Corporation

Originally posted on James Dixon’s blog http://jamesdixon.wordpress.com/


Data, Data, Data

October 12, 2010

It’s everywhere and expanding exponentially every day. But it might as well be a pile of %#$& unless you can turn all of that data into information. And do so in a timely, efficient and cost-effective manner. The old-school vendors don’t operate in a timely (everything is slow), efficient (everything is over-engineered, over-analyzed, over-staffed, etc.) or cost-effective mode (the bloated supertanker needs feeding and the customer gets to pay for those inefficiencies), so that means new technologies and business models will drive innovation which ultimately serves the customers and communities.

Back to Data, Data, Data – Enter open source technologies like Hadoop and Pentaho BI/DI to drive next gen big data analytics to the market. Hadoop and Pentaho have both been around about 5 years, are both driven by very active communities, and have both been experiencing explosive growth over the last 18 months. Our community members are the ones who came up with the original integration points for the two techs, not because it was a fun, science project thing to do but because they had real business pains they were trying to solve. This all started in 2009 – we started development in 09, we launched our beta program in June 2010 (had to cap enrollment in the beta program at 60), launched a Pentaho for Hadoop roadshow (which was oversubscribed) and are now announcing the official release of Pentaho Data Integration and BI Suite for Hadoop.

I’m in NYC today at Hadoop World and we’re making four announcements:

  1. Pentaho for Hadoop – our Pentaho BI Suite and Pentaho Data Integration are now both integrated with Hadoop
  2. Partnership with Amazon Web Services – Pentaho for Hadoop now supports Amazon Elastic Map Reduce (EMR) and S3
  3. Partnership with Cloudera – Pentaho for Hadoop will support certified versions of Cloudera’s Distribution for Hadoop (CDH)
  4. Partnership with Impetus – a major Solutions Provider (over 1,000 employees) with a dedicated Large Data Analytics practice.

Consider this as phase I of building out the ecosystem.

We’re all about making Hadoop easy and accessible. Now you can take on those mountains of data and turn them into value. Download Pentaho for Hadoop.

Richard


Pentaho in October: It’s a Hadoop world, we’re just living in it

October 5, 2010

Big things are happening at Pentaho this month, with an emphasis on Big Data. We are headlining Hadoop events in New York, London and San Diego. If you are attending Hadoop World in New York City on October 12th, make sure to stop by our booth and attend Richard Daley’s session, ‘Putting Analytics in Big Data Analysis’. Then stay around until the 13th for a special FREE half-day seminar, ‘Agile BI Meets Big Data’, with our partners, Project Leadership Associates.

All the while, we’re still on the road with the Agile BI Tour hitting full force in October. We’ll be visiting 10 more cities around the world with these info-packed seminars and training sessions. Directly following the Agile BI events in New York, San Mateo, and Houston, Pentaho will be hosting a special OEM Power Lunch, where we will explore Pentaho’s architecture, hear some specific OEM use cases, and introduce you to the Pentaho OEM team.

Hadoop Events

Join us to see how we’re simplifying the complexities of Big Data Analytics with Pentaho’s biggest initiative to date, Pentaho for Hadoop.

Agile BI Tour : Data to Dashboards in Minutes

Business and technical users alike will benefit from these information-packed half-day seminars. We will demonstrate and provide step-by-step training on the Pentaho BI Suite Enterprise Edition.

OEM Power Lunch Series : Enhance your Product with Modern Business Intelligence

Join us for lunch as we explore Pentaho’s easily embeddable architecture, review specific OEM partner case studies, and introduce you to our OEM team. These lunch sessions will take place from 12:00 – 2:00pm, directly following the Pentaho Agile BI Tour stops in New York, San Mateo, and Houston.

It’s an action-packed month and we look forward to seeing you somewhere along the way.  Wishing you a Happy Halloween from Pentaho!


Pentaho’s BIG, Fast, and Agile August

August 4, 2010

Pentaho is hitting the road this month to show you the world’s first BI integration for Hadoop with our three-city roadshow, ‘Harnessing Hadoop for Big Data’. Next, prepare to see blazing fast business intelligence when we pair Ingres VectorWise with Pentaho’s Agile BI initiative.

BIG – We’re rolling into town to show you how Pentaho, as the face of Hadoop, can leverage the power of business intelligence and data integration for your Big Data analysis needs.  These live seminars are free but space is limited, so be sure to register now.

  • Harnessing Hadoop for Big Data – Live Seminar Series

FAST & AGILE – See what is possible when you combine the power of Agile BI with Ingres VectorWise, the next generation of analytic database technology, during this live webcast.

  • Blazing Fast, Agile BI with Ingres VectorWise and Pentaho
    • Webcast:  Thursday, August 12th 2010

Want to learn more about Pentaho and meet the team?  This month we will be holding Classroom Training classes in Buenos Aires, Argentina and here on the home front in Orlando, Florida.

Where else can you find Pentaho?  This month and every month, we invite you to join the conversation with us on Twitter, Facebook, and LinkedIn.

Visit our Events page for more details and updated events.  Here’s to a BIG, Fast, and Agile August!

