Pentaho 5.0 blends right in!

September 12, 2013

Dear Pentaho friends,

Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago), we have been looking for ways to create more synergy across this complete software stack. That is why today I'm exceptionally happy to announce not just version 5.0 of Pentaho Data Integration, but also a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 Commercial Edition.

Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools). Traditionally, data is delivered to these tools through a relational database. However, there are cases where that is inconvenient, for example when the volume of data is just too high or when you can't wait for the database tables to be updated. This leads, for example, to a new kind of big data architecture with many moving parts:

Evolving Big Data Architectures

From what we can see in use at major deployments with our customers, mixing Big Data, NoSQL and classical RDBMS technologies is more the rule than the exception.

So, how did we solve this puzzle?

The main problem we faced early on was that the default language used under the covers, in just about any user-facing business intelligence tool, is SQL. At first glance, the worlds of data integration and SQL seem incompatible. In DI we read from a multitude of data sources: databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more. However, SQL is a mini-ETL environment in its own right: it selects, filters, counts and aggregates data. So we figured that it might be easiest to translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration does what it does best, directed not by manually designed transformations but by SQL. This is at the heart of the Pentaho Data Blending solution.

The internals of Data Blending

In other words: we made it possible for you to create a virtual “database” with “tables” where the data actually comes from a transformation step.
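
To see what this virtual “database” looks like from the outside, here is a minimal sketch of querying a transformation-backed “table” from Java via the thin Kettle JDBC driver. Treat the URL, the credentials and the virtual table name customer_blend as illustrative assumptions; the exact connection details depend on how your Data Integration (or Carte) server is set up.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DataBlendingQuery {
        public static void main(String[] args) throws Exception {
            // Register the thin Kettle JDBC driver (5.0-era class name;
            // verify against your installation).
            Class.forName("org.pentaho.di.core.jdbc.ThinDriver");

            // Hypothetical DI server URL; "customer_blend" is an illustrative
            // virtual table whose rows are produced by a transformation step.
            String url = "jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di";

            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT country, COUNT(*) FROM customer_blend GROUP BY country")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getLong(2));
                }
            }
        }
    }

Any JDBC-capable BI tool issues plain SQL in exactly the same way; the engine translates that SQL into a transformation that selects, filters and aggregates on the fly.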

To ensure that the “automatic” part of the data chain doesn't become an impenetrable “black box”, we once more made good use of existing PDI technologies. We log all executed queries on the Data Integration server (or Carte server), so you have a full view of all the work being done:

Data Blending Transparency

In addition, the statistics from these queries can be logged and viewed in the operations data mart, giving you insight into which data is queried and how often.

We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!

Enjoy!

Matt

If you want to learn more about the new features in this 5.0 release, Pentaho is hosting a webinar and demonstration on September 24th, with two registration options for the EMEA and North America time zones.

Matt Casters
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)


Rackspace brings ETL to the Cloud with Pentaho: Hadoop Summit Q&A

June 27, 2013

This week Pentaho has been meeting with the movers and shakers of the Apache Hadoop community in San Jose at the 6th annual Hadoop Summit. Pentaho and Rackspace are drawing attention on this final day of the show with the announcement of a partnership that brings ETL to the cloud. We're introducing Rackspace Big Data, a powerful enterprise-grade Hadoop as a Service solution. As the industry leader in cost-effective data integration for Hadoop, Pentaho is proud to team with Rackspace, the industry leader in enterprise IaaS, to deliver this new era of big data in the cloud.

(L) Eddie White, EVP of business development, Pentaho | (R) Sean Anderson, product marketing manager for cloud big data solutions, Rackspace Hosting

To learn more about the news, we’re talking today with Pentaho’s Eddie White, executive vice president of business development.

Give us a quick overview of this Rackspace news, and how Pentaho is involved.

Rackspace Big Data is an exciting Hadoop as a Service offering with full enterprise features. This is the next evolution in the big data ecosystem, delivering the ongoing structure to allow enterprise customers to choose a variety of consumption models over time. Customers can choose managed dedicated servers, and public, private or hybrid cloud options. Pentaho was chosen as the only Hadoop ETL / Data integration partner for this Cloud Tools Hadoop offering.

So is this a solution for enterprise customers looking to grow their big data operations?

Yes, absolutely. Hadoop as a Service is an attractive alternative for customers that need enterprise-level infrastructure support. Pentaho gives Rackspace a partner with the skills and talent on board to deliver big data for production environments, along with the support and stability that Rackspace customers demand from their service-level agreements. Enterprises are looking for a cloud partner with an enterprise-grade infrastructure to support running their business, not just test and development efforts.

What makes up this Hadoop as a Service model?

Together, Rackspace, Hortonworks and Pentaho have delivered an offering that facilitates ease of use and ease of adoption of Hadoop as a Service. Rackspace Big Data includes the Hortonworks Data Platform for Hadoop, Pentaho Business Analytics for ETL / Big Data integration, and Karmasphere for Hadoop analytics.

Rackspace excels at the enterprise IaaS model, and now they’ve partnered with Hortonworks and Pentaho to introduce an easy-to-use, consume-as-you-scale Hadoop as a Service offering – so customers can get started today, confident their solution will scale along with their big data needs. Rackspace chose to partner with Pentaho because it is the industry-leading Hadoop ETL and Big Data Analytics platform. Rackspace Big Data offers a range of models to meet any organization’s changing needs, from dedicated to hybrid, and for private and public clouds. And the offering ensures the ability to bi-directionally move data in and out of enterprise clusters, with minimal technical effort and cost.

What does Pentaho Data Integration bring to Rackspace Big Data?

Rather than speak for our partner, I’ll let Sean Anderson, Rackspace Hosting’s product marketing manager for cloud big data solutions, answer that. He sums up what Pentaho brings to the partnership nicely:

“Pentaho Data Integration is all about easing adoption and enhancing utilization of Rackspace big data platforms, with native, easy-to-use data integration. Pentaho is leading the innovation of Hadoop Integration and Analytics, and the upcoming cloud offering with Rackspace reduces the barriers to instant success with Hadoop, so customers can adopt and deploy quickly, delivering faster ROI,” said Anderson.

“Pentaho’s powerful data integration engine serves as a platform, enabling delivery of that content right into an enterprise’s pre-existing business intelligence and analytics tools,” continued Anderson. “Rackspace Big Data customers who require multiple data stores can leverage the ease of operation inherent in the visual ETL tool Pentaho provides. Customers will be able to complement their platform offering by adding the validated Pentaho tool via the Cloud Tools Marketplace.”

A key takeaway is that Rackspace Big Data customers may choose to bridge to the Pentaho Business Analytics platform. As an example, Pentaho’s full suite can be used where a Rackspace customer wants to use both Hortonworks and ObjectRocket. We bring the data in both of these databases to life for the Rackspace customer.

Why is Pentaho excited about this announcement?

This is exciting news because it is Pentaho's first strategic cloud partnership. As the big data market has matured, it's now time for production workloads to move to Big Data service offerings. Rackspace is the recognized leader providing the enterprise with IaaS, with an enterprise-grade support model. We are market leaders in our respective categories, with proven experience that enterprises trust for service, reliability, scalability and support. As the market for Hadoop and Big Data develops and matures, we see Rackspace as the natural strategic partner for Pentaho as we begin providing Big Data / Hadoop as a Service.

How can organizations buy Rackspace Big Data?

For anyone looking to leverage Hadoop as a Service, Rackspace Big Data is available directly from Rackspace. For more information and pricing visit: www.rackspace.com/big-data. Pentaho will also be in the Rackspace Cloud Tools marketplace.


“There is nothing more constant than change” — Heraclitus, c. 535 BC

June 26, 2013


Change and more change. It's been incredible watching the evolution of and innovation in the big data market. A few years ago we were helping customers understand Hadoop and the value it could bring in analyzing large volumes of unstructured data. Flash-forward to today: as we attend our third Hadoop Summit in San Jose, we see the advances customers have made in adopting these technologies in their production big data environments.

It's the value of a continuum of innovation. As the market matures, we are limited only by what we don't leave ourselves open to. Think for a minute about the next “big data,” because there will be one. We can't anticipate what it will look like, where it will come from or how much of it will be of value, in the same way we couldn't predict the advent of Facebook or Twitter.

We do know that innovation is a constant. Today’s big data will be tomorrow’s “traditional” data.

Pentaho's announcements today of an adaptive big data layer and Pentaho Labs anticipate just this type of change. We've made it simpler for Pentaho and our customers to leverage current and new big data technologies like Hadoop, NoSQL and specialized big data stores.

In the spirit of innovation (which stems from our open source history), we've established Pentaho Labs: our place for free-thinking innovation that leads to new capabilities in our platform, in areas like real-time and predictive analytics.

Being a leader at the forefront of a disruptive and ever-changing market means embracing change and innovation. That's the future of analytics.

Donna Prlich
Senior Director, Product Marketing, Pentaho


Informatica jumps on the Pentaho bandwagon

June 12, 2013

You know that a technology megatrend has truly arrived when the large vendors start to jump on the bandwagon. Informatica recently announced Informatica Vibe™ – its new virtual data machine (VDM), an embeddable data management engine that allows developers to “Map Once, Deploy Anywhere,” including into Hadoop, without generating or writing code. According to Informatica, developers can instantly become Hadoop developers without having to acquire new skills. Sound familiar?

I applaud Informatica's efforts – but not for innovating or changing the landscape in data integration. What I applaud them for is recognizing that the landscape for data integration has indeed changed, and that it was time for them to join the party. “Vibe” itself may be new, but it is not a new concept, nor is it unique in the industry. In fact, Pentaho recognized the need for a modern, agile, adaptive approach to data integration for OEMs and customers years ago: we pioneered the Kettle “design once, run anywhere” embeddable virtual data engine back in 2005. And let's set the record straight – Pentaho extended its lightweight data integration capabilities to Hadoop over three years ago, as noted in this 2010 press release.

Over the past three years, Pentaho has delivered on Big Data Integration with many successful Hadoop customers, such as BeachMint, MobileThink, TravelTainment and Travian Games and continued our innovation — with not only Hadoop but also NoSQL, Analytical Engines, and other specialized Big Data stores. We have added test, deploy and real time monitoring functionality.  The Pentaho engine is embedded in multiple SaaS, Cloud, and customer applications today such as Marketo, Paytronix, Sharable Ink and Soliditet, with many more on the horizon. Our VDM is completely customer extensible and open. We insulate customers from changes in their data volumes, types, sources, computing platforms, and user types.  In fact, what Informatica states as intention and direction with Vibe, Pentaho Data Integration delivers today, and we continue to lead in this new landscape.


The Data Integration market has changed: the old, heavyweight, proprietary infrastructure players must adapt to current market demands. Agile, extensible, open, embeddable engines with pluggable infrastructures are the base, but it doesn't end there. Companies of all sizes and verticals are demanding shorter development cycles, broad and deep big data ecosystem support, attractive price points and rich functionality, all without vendor lock-in. Informatica is adapting to play in the big data integration world by rebranding its products and signaling a new direction. Tony Baer, principal analyst at Ovum, summarizes this adaptation in his blog, “Informatica aims to get its vibe back.”

The game is on and Pentaho is at the forefront. We have very exciting big data integration news in store for you at the Hadoop Summit in San Jose on June 26-27 that, unfortunately, I have to keep the lid on for now. Stay tuned!

Richard

Richard Daley

Co-founder and chief strategy officer


The Road to Success with Big Data – A Closer Look at Expectations vs. the Reality

June 5, 2013

Big Data is complex. The technologies in Big Data are rapidly maturing, but they are still in many ways in an adolescent phase. While Hadoop dominates the charts for Big Data technologies, in recent years we have seen a variety of technologies born out of the early starters in this space, such as Google, Yahoo, Facebook and Cloudera. To name a few:

  • MapReduce: Programming model in Java for parallel processing of large data sets in Hadoop clusters (see the word-count sketch after this list)
  • Pig: A high-level scripting language for creating data flows to and from Hadoop
  • Hive: SQL-like access to data in Hadoop
  • Impala: SQL query engine that runs inside Hadoop for faster query response times
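
To give a feel for the gap these abstraction layers close, here is the classic word count written against the Hadoop MapReduce Java API (a sketch using the Hadoop 2.x org.apache.hadoop.mapreduce classes; input and output paths come from the command line). The Hive equivalent is roughly one line: SELECT word, COUNT(*) FROM words GROUP BY word.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Mapper: emit (word, 1) for every whitespace-separated token.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        ctx.write(word, ONE);
                    }
                }
            }
        }

        // Reducer (also used as the combiner): sum the counts per word.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }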

It's clear that the spectrum of interaction and interfacing with Hadoop has matured beyond pure programming in Java into abstraction layers that look and feel like SQL. Much of this is due to the lack of resources and talent in big data, and therefore the mantra of “the more we make Big Data feel like structured data, the better adoption it will gain.”

But wait, not so fast: you can make Hadoop act like a SQL data store, but there are consequences, as Chris Deptula from OpenBI explains in his blog, A Cautionary Tale for Becoming too Reliant on Hive. You forgo flexibility and speed if you choose Hive for a more complex query, as opposed to pure programming or using a visual interface to MapReduce.

This goes to show that there are numerous areas of advancement in Hadoop that have yet to be achieved, in this case better performance optimization in Hive. I come from a relational world, namely DB2, where we spent a tremendous amount of time making a high-performance transactional database that was developed in the '70s even more powerful in the 2000s, and that journey continues today.

Granted, the rate of innovation is much faster today than it was 10, 20, 30 years ago, but we are not yet at the finish line with Hadoop. We need to understand the realities of what Hadoop can and cannot do today, while we forge ahead with big data innovation.

Here are a few areas of opportunity for innovation in Hadoop and strategies to fill the gap:

  • High-Performance Analytics: Hadoop was never built to be a high-performance data interaction platform. Although there are newer technologies that are cracking the nut on real-time access and interactivity with Hadoop, fast analytics still need multi-dimensional cubes, in-memory and caching technology, analytic databases or a combination of them.
  • Security: There are security risks within Hadoop. It would not be in your best interest to open the gates for all users to access information within Hadoop. Until this gap is closed further, a data access layer can help you extract just the right data out of Hadoop for interaction.
  • APIs: Business applications have lived a long time on relational data sources. However, with web, mobile and social applications, there is a need to read, write and update data in NoSQL data stores such as Hadoop. Instead of direct programming, APIs can simplify this effort for the millions of developers building the next generation of applications.
  • Data Integration, Enrichment, Quality Control and Movement: While Hadoop stands strong at storing massive amounts of unstructured and semi-structured data, it is not the only infrastructure in today's data management environments. Therefore, easy integration with other data sources is critical for long-term success.

The road to success with Hadoop is full of opportunities and obstacles, and it is important to understand what is possible today and what to expect next. With all the hype around big data, it is easy to expect Hadoop to do anything and everything. However, successful companies are those that choose the combination of technologies that works best for them.

What are your Hadoop expectations?

- Farnaz Erfan, Product Marketing, Pentaho


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data... but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series will bring a new, content-rich webinar helping organizations find the right track to understand, recognize value and cost-effectively deploy big data analytics.

All webinars will be held at 8 am PT / 11 am ET / 16:00 GMT. To register, follow the links below; for more information, contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, increasing costs of the Enterprise Data Warehouse (EDW) and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop's economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can sometimes be a challenge. To help our customers, Dell has developed a Hadoop Reference Architecture, best-practice documentation and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It's easy to put a working script together as part of an R&D project, but it's not cost-effective to maintain it through an ever-growing stream of user change requests and system and product updates. Watch the third webinar in the series to learn how choosing the right technologies and tools can give you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive Analytics from Hadoop

While unlocking data trapped in large volumes of semi-structured data is the first step of a project, the next step is to analyze that data and proactively identify new opportunities that will grow your bottom line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.



Pentaho and Cloudera Impala in 5 words

April 29, 2013

Today our big data partner Cloudera joined us in continuing to deliver innovative, open technologies that bring real business value to customers. Pentaho and Cloudera share a common history and approach to simplifying complex but powerful technologies to integrate and analyze big data. Our common open source heritage means that we can innovate at the speed of our customers' businesses.

What is Cloudera's latest innovation? Cloudera Impala powers Cloudera Enterprise RTQ (Real-time Query), the first data management solution that takes Hadoop beyond batch to enable real-time data processing and analysis of any type of data (unstructured and structured) within a centralized, massively scalable system. Impala dramatically improves the economics and performance of large-scale enterprise data management.

Pentaho and Cloudera Impala in 5 words = Affordable scalability meets fast analytics. Cloudera Impala enables any JDBC-enabled product to get fast results from Hadoop, making Hadoop an ideal component of a data warehouse strategy. Customers no longer have to pay for an expensive proprietary DBMS or analytical database to house their entire data warehouse.
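
To make “JDBC-enabled” concrete, the sketch below queries Impala from Java through the HiveServer2-compatible JDBC driver that Impala accepts. The host name, the default port 21050 and the weblogs table are assumptions for illustration; driver classes and ports vary by distribution and version.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ImpalaQuery {
        public static void main(String[] args) throws Exception {
            // Impala speaks the HiveServer2 protocol, so the Hive JDBC driver
            // can connect to it (21050 is Impala's default JDBC port).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://impala-host:21050/;auth=noSasl"; // hypothetical host

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT page, COUNT(*) FROM weblogs GROUP BY page")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }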

Cloudera's innovation makes it even easier for customers to use common analytic tools to access and analyze data in all of these formats. What does this really mean? It means you don't have to buy expensive, proprietary products that can't work across all of your data platforms.

With Pentaho and Cloudera you can quickly analyze large volumes of disparate data significantly faster with Impala than with Hive. Take a look at how Cloudera Impala is driving a major evolutionary step in the growth of the company’s Platform for Big Data, Cloudera Enterprise, and the Apache Hadoop ecosystem as a whole.

Richard Daley


How to Get to Big Data Value Faster

March 18, 2013

Summary: Everyone talks about how big data is the key to business success, but the process of getting value from big data is time-intensive and complex. Examining the big data analytics workflow provides clues to getting big data results faster.


Most organizations recognize that big data analytics is key to their future business success, but efforts to implement are often slowed due to operational procedures and workflow issues.

At the heart of the issue is the big data analytics workflow: loading, ingesting, manipulating, transforming, accessing, modeling and, finally, visualizing and analyzing data. Each step requires manual intervention by IT, with a great amount of hand coding and tools that invite mistakes and delays. New technologies such as Hadoop and NoSQL databases also require specialized skills. Once the data is prepared, business users often have new requests to IT for additional data sources, and the linear process begins again.

Given the potential problems that can crop up in managing and incorporating big data into decision-making processes, organizations need easy-to-use solutions that can address today’s challenges, with the flexibility to adapt to meet future challenges. These solutions require data integration with support for structured and unstructured data and tools for visualization and data exploration that support existing and new big data sources.

A single, unified business analytics platform with tightly coupled data integration and business analytics, such as Pentaho Business Analytics, is ideal. Pentaho supports the entire big data analytics flow, with visual tools that simplify development and remove complexity for developers, and powerful analytics that allow a broad set of users to easily access, visualize and explore big data. By dramatically improving developer productivity and offering significant performance advantages, Pentaho significantly reduces time to big data value.

- Donna Prlich
Senior Director, Product and Solution Marketing, Pentaho

This blog originally appeared on GigaOM at http://gigaom.com/2012/12/06/how-to-reduce-complexity-and-get-to-big-data-value-faster/


Looking for the perfect match

February 28, 2013


I’m at the O’Reilly Strata Big Data Conference in Santa Clara, CA this week where there’s lots of buzz about the value and reality of big data. It’s a fun time to be part of a hot new market in technology. But, of course, a hot new market brings a new set of challenges.

After talking to several attendees, I would not be surprised if someone took out an advertisement in the San Francisco Guardian that reads:

SEEKING BDT (Big Data Talent)

“Middle-aged attractive company seeks hot-to-trot data geek for mutually enjoyable discrete relationship, mostly involving analytics. Must enjoy long discussions about wild statistical models, short walks to the break room and large quantities of caffeine.”

The feedback from the presentations and attendees at Strata mimics the results of a Big Data survey that Pentaho released last week, showing a lack of skills to address new big data technologies such as Hadoop, both among existing staff and more generally in the market. This is good news for folks looking for jobs in Big Data and a good indication for others who want to learn new skills.

The market has created the perfect storm: hot new technology mixed with a myriad of very complex systems, plus highly complicated statistical models and calculations. This storm is preventing the typical IT generalist or BI expert from applying. More experienced data scientists, who can spin models on their head with a twist of a mouse, are in high demand. The need to garner value quickly from Big Data means there is little time to look for the “perfect match.”

It seems like new companies and technologies pop up almost every week, each with the promise of business benefits, but with the added cost of high complexity.  Shouldn’t things get easier with new technologies?

Pentaho's Visual MapReduce is a prime example of things getting easier. Getting data out of Hadoop quickly can be a challenge. With Visual MapReduce, however, any IT professional can pull the right information from a Hadoop cluster, improve the performance of a MapReduce job and make the results available in the optimal format for business users.

New technologies might need new talent, but in the case of Pentaho Visual MapReduce, new technologies might only need new tools to help address them.

Looks like Pentaho is the perfect match.

Chuck Yarbrough
Technical Solutions Marketing


Going mobile this year? What’s your biggest big data challenge?

November 16, 2012

We received insightful responses to the polls from our “Mobile and Big Data go Instant and Interactive” webinars about the challenges users of all types face with business analytics. The complexity of data integration, the lack of skills and resources, and the need to analyze unstructured data were identified as the most significant big data challenges by over 80% of attendees. Half of our attendees either have a mobile BI solution in place today or plan to implement one in the future.

What does this mean for the future of analytics? Whether mobilizing your sales force or empowering data analysts to discover meaning from data in Hadoop, a complete business analytics solution must address the business pressures of a continual inundation of data and the need to access and interact with that data instantly in simple, familiar ways.

It's not surprising, then, that the response to Pentaho Business Analytics 4.8 has been overwhelmingly positive: the best of analytics offered up in a mobile-optimized experience for business users, and Instaview broadening big data access to data analysts for data discovery.

If you missed out on our webinar, access the on-demand recording at:

Watch the Pentaho 4.8 On-Demand Webinar

Data Integration and business analytics in a single, unified, modern platform — Pentaho is the future of analytics.

Let me know what you think about Pentaho 4.8.

Donna Prlich

Director, Product Marketing

Pentaho

