Pentaho 5.0 blends right in!

September 12, 2013

Dear Pentaho friends,

Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago) we have been looking for ways to create more synergy across this complete software stack.  That is why today I’m exceptionally happy to be able to announce not just version 5.0 of Pentaho Data Integration, but a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 Commercial Edition.

Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools).  Traditionally data is delivered to these tools through a relational database. However, there are cases where that can be inconvenient, for example when the volume of data is just too high or when you can’t wait until the database tables are updated.  This for example leads to a new kind of big data architecture with many moving parts:

Evolving Big Data Architectures

From what we can see in use at major deployments with our customers, mixing Big Data, NoSQL and classical RDBMS technologies is more the rule than the exception.

So, how did we solve this puzzle?

The main problem we faced early on was that the default language used under the covers, in just about any business intelligence user facing tool, is SQL.  At first glance it seems that the worlds of data integration and SQL are not compatible.  In DI we read from a multitude of data sources, such as databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more.  However, SQL itself is a mini-ETL environment on its own as it selects, filters, counts and aggregates data.  So we figured that it might be easiest if we would translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, not directed by manually designed transformations but by SQL.  This is at the heart of the Pentaho Data Blending solution.

The internals of Data Blending

In other words: we made it possible for you to create a virtual “database” with “tables” where the data actually comes from a transformation step.
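To make the idea a bit more concrete, here is a minimal sketch in Python. It is not how Pentaho Data Integration implements Data Blending; it only illustrates the principle that a "table" can be backed by the output of a transformation step, and that the SELECT / WHERE / GROUP BY parts of a query map onto ordinary pipeline steps. The names `sales_step`, `VIRTUAL_TABLES` and `run_query` are hypothetical, and the query is passed in already parsed, so SQL parsing itself is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]


def sales_step() -> Iterator[Row]:
    """Hypothetical transformation step: in practice the rows could come
    from files, NoSQL stores, web services and so on."""
    yield {"region": "EMEA", "product": "A", "amount": 120}
    yield {"region": "EMEA", "product": "B", "amount": 80}
    yield {"region": "NA", "product": "A", "amount": 200}
    yield {"region": "NA", "product": "B", "amount": 50}


# The virtual "database": each "table" name points at a step that produces rows.
VIRTUAL_TABLES: Dict[str, Callable[[], Iterator[Row]]] = {"sales": sales_step}


def run_query(
    table: str,
    where: Optional[Callable[[Row], bool]] = None,
    group_by: Optional[str] = None,
    sum_field: Optional[str] = None,
) -> List[Row]:
    """Roughly: SELECT group_by, SUM(sum_field) FROM table WHERE ... GROUP BY group_by."""
    rows = VIRTUAL_TABLES[table]()                 # FROM: read the step output
    if where is not None:
        rows = (r for r in rows if where(r))       # WHERE: a filter step
    if group_by is None:
        return list(rows)                          # plain SELECT *
    totals: Dict[object, int] = defaultdict(int)   # GROUP BY + SUM: an aggregate step
    for r in rows:
        totals[r[group_by]] += r[sum_field]
    return [{group_by: key, "total": value} for key, value in sorted(totals.items())]


result = run_query(
    "sales",
    where=lambda r: r["amount"] >= 80,
    group_by="region",
    sum_field="amount",
)
print(result)
```

With the sample rows above, the NA row with amount 50 is filtered out, so both regions end up with a total of 200.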

To ensure that the “automatic” part of the data chain doesn’t become an impenetrable “black box”, we once again made good use of existing PDI technologies.  We’re logging all executed queries on the Data Integration server (or Carte server) so you have a full view of all the work being done:

Data Blending Transparency

In addition to this, the statistics from the queries can be logged and viewed in the operations data mart giving you insights into which data is queried and how often.

We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!

Enjoy!

Matt

–If you want to learn more about the new features in this 5.0 release, Pentaho is hosting a webinar and demonstration on September 24th – Two options to register:  EMEA & North America time zones.

Matt Casters
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)


Awards Are Rolling In for Pentaho

July 11, 2013

Pentaho and its customers have been honored with a bevy of awards over the past few months, and I thought I’d share them here. We’re grateful to be recognized so widely as a leader in big data and software development.

Here’s a synopsis of the awards Pentaho has won in the past three months alone.

  • Red Herring 100 – Red Herring identified Pentaho as one of the 100 most promising private technology start-ups in North America for 2013. According to Red Herring, companies are judged based on “financial performance, technology innovation, quality of management, execution of strategy, and integration into their respective industries.”
  • CRN 2013 Big Data 100: Pentaho was named to CRN’s inaugural Big Data 100 list, which identifies “vendors that have demonstrated an ability to innovate in bringing to market products and technologies that help businesses manage big data.”
  • SD Times 100: ‘Best in Show’ in Software Development: This annual award recognizes top leaders and innovators in the software development industry. Pentaho was included in the new Big Data and BI category, which names vendors the editors believe have “tackled the giant data problem with aplomb.”
  • 2013 SIIA Software CODiE Award Finalist for Best Big Data Solution: This award recognizes the vendors that best enable the “capture, analysis, and mining of large data sets.” Pentaho was the only full data integration and analytics provider among the five finalists.
  • IE Big Data Innovation Award Winner 2013: Pentaho won in the Big Data Technology Provider Category, and accepted the award at the Big Data Innovation Summit in San Francisco. The award is designed to recognize “organizations that have pioneered Big Data initiatives that have or will have an overall positive impact within the Big Data community.”
  • Database Trends and Applications DBTA 100: This inaugural award recognizes “the companies that matter most in data.”

Several of our customers were also honored for their creative deployment of Pentaho’s software to provide business solutions.

  • Apparel Magazine 2013 Top Innovator (ModCloth): ModCloth, an e-commerce retailer, was recognized as a “Top Innovator” for its Pentaho deployment. ModCloth “wanted to transition away from creating reports in Excel and migrate to a more robust platform with richer features,” Jessica Binns wrote in Apparel Magazine. “Pentaho’s reporting software is helping ModCloth to improve the speed and quality of the decisions it makes, [and] get a closer look at customer trends by collecting social media data such as loves, shares and reviews.”
  • Nucleus Research 2013 ROI Awards (STRATO): This award recognizes companies that have made the “best technology deployments.” STRATO, a storage and web hosting company, received the award for its use of Pentaho, which generated a 392% return on investment. According to Nucleus Research, using Pentaho helped STRATO to lower hardware and software costs, increase manager and employee productivity, put off capital investments, and increase revenue.
  • IT Europa’s Government Solution of the Year Finalist (BNova for Italy’s Chamber of Commerce): BNova, an Italian consulting firm, was recognized for a highly innovative application it designed based on Pentaho’s platform. The application will help Italy’s chambers of commerce to forecast economic trends and provide better local support to businesses.
  • Campus Technology Innovators Award 2013: Each year, the Campus Technology Innovators Awards recognize exemplary colleges and universities, their visionary technology project leadership, and their innovative technology vendor partners that have taken technology to extraordinary new heights to meet today’s challenges on campus. Marist College in Poughkeepsie, NY was selected for their Open Academic Analytics Initiative using Pentaho.

It means a lot to us that our work is being recognized by so many big names in the technology world, and across such a range of categories. 
The customer awards are especially meaningful to our team. They prove that Pentaho’s analytics and data integration technology can transform businesses, while delivering high-value ROI. So, don’t be shy. If you’re happy to brag about how transformative big data/business analytics is for your company and customers, send us a note. We’re happy to work together, shining the spotlight on any novel deployments.

Rebecca Shomair
Director of Corporate Communications


Rackspace brings ETL to the Cloud with Pentaho: Hadoop Summit Q&A

June 27, 2013

This week Pentaho has been meeting with the movers and shakers of the Apache Hadoop community in San Jose, at the 6th annual Hadoop Summit. Pentaho and Rackspace are drawing attention on this final day of the show with the announcement of a partnership that brings ETL to the cloud. We’re introducing Rackspace Big Data, a powerful enterprise-grade Hadoop as a Service solution. As the industry leader in cost-effective data integration for Hadoop, Pentaho is proud to team with Rackspace, the industry leader in enterprise IaaS, to deliver this new era of big data in the cloud.

(L) Eddie White, EVP business development, Pentaho | (R) Sean Anderson, product marketing manager for cloud big data solutions, Rackspace Hosting

To learn more about the news, we’re talking today with Pentaho’s Eddie White, executive vice president of business development.

Give us a quick overview of this Rackspace news, and how Pentaho is involved.

Rackspace Big Data is an exciting Hadoop as a Service offering with full enterprise features. This is the next evolution in the big data ecosystem, delivering the ongoing structure to allow enterprise customers to choose a variety of consumption models over time. Customers can choose managed dedicated servers, and public, private or hybrid cloud options. Pentaho was chosen as the only Hadoop ETL / Data integration partner for this Cloud Tools Hadoop offering.

So is this a solution for enterprise customers looking to grow their big data operations?

Yes, absolutely. Hadoop as a Service is an attractive alternative for customers that need enterprise-level infrastructure support. Pentaho gives Rackspace a partner with the skills and talent on-board to deliver big data for production environments, along with the support and stability that Rackspace customers demand from their service-level agreements. Enterprises are looking for a Cloud partner with an enterprise-grade infrastructure to support running their business; not just test and development efforts.

What makes up this Hadoop as a Service model?

Together, Rackspace, Hortonworks and Pentaho have jointly delivered an offering that facilitates ease of use and ease of adoption of Hadoop as a Service. Rackspace Big Data includes the Hortonworks Data Platform for Hadoop; Pentaho Business Analytics as the ETL / Big Data Integration partner; and Karmasphere providing Hadoop analytics.

Rackspace excels at the enterprise IaaS model, and now they’ve partnered with Hortonworks and Pentaho to introduce an easy-to-use, consume-as-you-scale Hadoop as a Service offering – so customers can get started today, confident their solution will scale along with their big data needs. Rackspace chose to partner with Pentaho because it is the industry-leading Hadoop ETL and Big Data Analytics platform. Rackspace Big Data offers a range of models to meet any organization’s changing needs, from dedicated to hybrid, and for private and public clouds. And the offering ensures the ability to bi-directionally move data in and out of enterprise clusters, with minimal technical effort and cost.

What does Pentaho Data Integration bring to Rackspace Big Data?

Rather than speak for our partner, I’ll let Sean Anderson, Rackspace Hosting’s product marketing manager for cloud big data solutions, answer that. He sums up what Pentaho brings to the partnership nicely:

“Pentaho Data Integration is all about easing adoption and enhancing utilization of Rackspace big data platforms, with native, easy-to-use data integration. Pentaho is leading the innovation of Hadoop Integration and Analytics, and the upcoming cloud offering with Rackspace reduces the barriers to instant success with Hadoop, so customers can adopt and deploy quickly, delivering faster ROI,” said Anderson.

“Pentaho’s powerful data integration engine serves as a platform, enabling delivery of that content right into an enterprise’s pre-existing business intelligence and analytics tools,” continued Anderson. “Rackspace Big Data customers who require multiple data stores can leverage the ease of operation inherent in their visual ETL tool Pentaho provides. Customers will be able to complement their platform offering by adding the validated Pentaho tool via the Cloud Tools Marketplace.”

A key takeaway is that Rackspace Big Data customers may choose to bridge to the Pentaho Business Analytics platform. As an example, Pentaho’s full suite can be used where a Rackspace customer wants to use both Hortonworks and ObjectRocket. We bring the data in both of these databases to life for the Rackspace customer.

Why is Pentaho excited about this announcement?

This is exciting news because it is Pentaho’s first strategic cloud partnership. As the big data market has matured, it’s now time for production workloads to be moved over to Big Data Service offerings. Rackspace is the recognized leader providing the enterprise with IaaS, with an enterprise-grade support model. We see Rackspace as a natural partner for us to make our move into this space. We are market leaders in our respective categories with proven experience that enterprises trust for service, reliability, scalability and support. As the market for Hadoop and Big Data is developing and maturing, we see Rackspace as the natural strategic partner for Pentaho to begin providing Big Data / Hadoop as a Service.

How can organizations buy Rackspace Big Data?

For anyone looking to leverage Hadoop as a Service, Rackspace Big Data is available directly from Rackspace. For more information and pricing visit: www.rackspace.com/big-data. Pentaho will also be in the Rackspace Cloud Tools marketplace.


Customers Speak out – Wisdom of the Crowds Business Intelligence Study, 2013

May 21, 2013

“Responsiveness”, “professionalism”, “knowledge” and “experience” – these are just a few of the words our customers used in giving Pentaho the honor of being recently named a top business intelligence technology vendor in the third annual independent Wisdom of Crowds® Business Intelligence Market Study conducted by Dresner Advisory Services, LLC. The report recognizes Pentaho as a “High Growth BI Software” company with a critical mass of customers growing well above the average.

We have made and continue to make significant investments in simplifying and delivering real value in big data integration and analytics and our customers’ satisfaction. Being named a ‘high growth vendor’ validates that we are experiencing high growth in concert with the big data market, but not at the expense of our customers.

Pentaho earned high marks from its customers on multiple metrics, standing out specifically in product, support, consulting and integrity.  This independent research comes straight from the voice of our customers, which is the best possible acknowledgement that we are indeed delivering the future of analytics.

I encourage you to download the full Wisdom of Crowds report to learn how the top vendors stack up and the top BI trends.

Donna Prlich
Senior Director, Product Marketing


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data... but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series will bring a new, content-rich webinar helping organizations find the right track to understand, recognize value and cost-effectively deploy big data analytics.

All webinars will be held 8 am PT / 11 am ET / 16:00 GMT. To register follow the links below and for more information contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, increasing costs of the Enterprise Data Warehouse (EDW) and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop’s economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can sometimes be a challenge. To help its customers, Dell has developed a Hadoop Reference Architecture, best-practice documentation and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It’s easy to put a working script together as part of an R&D project, but it’s not cost-effective to maintain it through an ever-growing stream of user change requests and system and product updates.  Watch the third webinar in the series to learn how choosing the right technologies and tools can provide you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive from Hadoop

While unlocking data trapped in large and semi-structured data is the first step of a project, the next step is to begin to analyze and proactively identify new opportunities that will grow your bottom-line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

 


Pentaho and Cloudera Impala in 5 words

April 29, 2013

Today our big data partner, Cloudera, joined us in continuing to deliver innovative, open technologies that bring real business value to customers. Pentaho and Cloudera share a common history and approach to simplifying complex, but powerful technologies to integrate and analyze big data. Our common open source heritage means that we can innovate at the speed of our customers’ businesses.

What is Cloudera’s latest Innovation? Cloudera Impala powers Cloudera Enterprise RTQ (Real-time Query), the first data management solution that takes Hadoop beyond batch to enable real-time data processing and analysis on any type of data (unstructured and structured) within a centralized, massively scalable system. Impala dramatically improves the economics and performance of large-scale enterprise data management.

Pentaho and Cloudera Impala in 5 words = Affordable scalability meets fast analytics. Cloudera Impala enables any product that is JDBC-enabled to get fast results from Hadoop, making Hadoop an ideal component for a data warehouse strategy. Customers no longer have to pay for expensive proprietary DBMS or analytical DBs to house their entire data warehouse.

Cloudera’s innovation makes it even easier for customers to use common analytic tools that can access and analyze data in all of these formats. What does this really mean? It means you don’t have to buy expensive, proprietary products that can’t work across all of your data platforms.

With Pentaho and Cloudera you can analyze large volumes of disparate data significantly faster with Impala than with Hive. Take a look at how Cloudera Impala is driving a major evolutionary step in the growth of the company’s Platform for Big Data, Cloudera Enterprise, and the Apache Hadoop ecosystem as a whole.

Richard Daley


Ensure Your Big Data Integration and Analytics Tools are Optimized for Hadoop

March 27, 2013

Existing data integration and business analytics tools are generally built for relational and structured file data sources, and aren’t architected to take advantage of Hadoop’s massively scalable, but high-latency, distributed data management architecture. Here’s a list of requirements for tools that are truly built for Hadoop.

A data integration and data management tool built for Hadoop must:

  1. Run In-Hadoop: fully leverage the power of Hadoop’s distributed data storage and processing. It should do this via native integration with the Hadoop Distributed Cache, to automate distribution across the cluster. Generating inefficient Pig scripts doesn’t count.
  2. Maximize resource usage on each Hadoop node: each node is a computer, with memory and multiple CPU cores. Tools must fully leverage the power of each node, through multi-threaded parallelized execution of data management tasks and high-performance in-memory caching of intermediate results, customized to the hardware characteristics of nodes.
  3. Leverage Hadoop ecosystem tools: tools must natively leverage the rapidly growing ecosystem of Hadoop add-on projects. For example, using Sqoop for bulk loading of huge datasets or Oozie for sophisticated coordination of Hadoop job workflows.

The widely distributed nature of Hadoop means accessing data can take minutes, or even hours. Data visualization and analytics tools built for Hadoop must mitigate this high data access latency:

  1. Provide end-users direct access to data in Hadoop: and after initial access, provide instant speed-of-thought response times.  It must be done in a way that is simple and intuitive for end users, while providing IT with the controls they need to streamline and manage data access for end users.
  2. Create dynamic data marts: make it easy and quick to spin-off Hadoop data into marts and warehouses for longer-lived high-performance analysis of data from Hadoop.
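The "dynamic data mart" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pentaho's implementation: `slow_hadoop_extract` is a hypothetical stand-in for a high-latency read from Hadoop, the sample data is made up, and an in-memory SQLite database plays the role of the mart. The point is that the slow source is read once, after which repeated queries hit the small, fast mart.

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple


def slow_hadoop_extract() -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for a high-latency Hadoop read (made-up data)."""
    yield from [("web", 3), ("mobile", 5), ("web", 7), ("store", 2)]


def build_mart(conn: sqlite3.Connection) -> None:
    """Read the slow source once, pre-aggregate, and load the result into the mart."""
    conn.execute(
        "CREATE TABLE channel_totals (channel TEXT PRIMARY KEY, events INTEGER)"
    )
    totals: Dict[str, int] = {}
    for channel, events in slow_hadoop_extract():
        totals[channel] = totals.get(channel, 0) + events
    conn.executemany("INSERT INTO channel_totals VALUES (?, ?)", totals.items())
    conn.commit()


def top_channels(conn: sqlite3.Connection, limit: int) -> List[Tuple[str, int]]:
    """Interactive query: served from the mart, never from the slow source."""
    cursor = conn.execute(
        "SELECT channel, events FROM channel_totals ORDER BY events DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_mart(conn)
print(top_channels(conn, 2))
```

In a real deployment the mart would more likely be an analytical database than SQLite, and it would need a refresh policy; both are left out here to keep the sketch short.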

Learn how big data analytics provider Pentaho is optimized for Hadoop at www.pentahobigdata.com.

- Ian Fyfe, Pentaho

This blog originally appeared on GigaOM at http://gigaom.com/2012/12/11/ensure-your-big-data-integration-and-analytics-tools-are-optimized-for-hadoop/

