Good news, your data scientist just got a personal assistant

June 3, 2014

If you are, or have, a data scientist in house, you're in for good news.

Today at Hadoop Summit in San Jose, Pentaho unveiled a toolkit built specifically for data scientists to simplify the messy, time-consuming data preparation, cleansing and orchestration of analytic data sets. Don’t just take it from us…

The Ventana Research Big Data Analytics Benchmark Research estimates that the two most time-consuming big data tasks are preparing data for integration (52%) and solving data quality and consistency issues (46%). That's a whopping amount of time spent just getting data prepped and cleansed, not to mention the time spent post-processing results. Imagine if the time spent preparing, managing and orchestrating these processes could be handed off to a personal assistant, leaving more time to focus on analyzing data and applying advanced and predictive algorithms (i.e., doing what a data scientist is paid to do).

Enter the Pentaho Data Science Pack, the personal assistant to the data scientist. Built to help operationalize advanced analytic models as part of a big data flow, the Data Science Pack leverages familiar tools: R, one of the most widely used tools among data scientists, and Weka, a popular open source collection of machine learning algorithms. There are no new tools to learn. In the words of our own customer, Ken Krooner, President at ESRG: "There was a gap in the market until now, and people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics."

Pentaho is at the forefront of solving big data integration challenges, and we know advanced and predictive analytics are core ingredients for success. Find out how close at hand your data science personal assistant is and take a closer look at the Data Science Pack.

Chuck Yarbrough
Director, Big Data Product Marketing


Announcing Pentaho with Storm and YARN

February 11, 2014

One of Pentaho’s core beliefs is that you can’t prepare for tomorrow with yesterday’s tools. In June of 2013, amidst waves of emerging big data technologies, Pentaho established Pentaho Labs to drive innovation through the incubation of these new technologies. Today, one of our Labs projects hatches.  At the Strata Conference in Santa Clara, we announced native integration of Pentaho Data Integration (PDI) with Storm and YARN. This integration enables developers to process big data and drive analytics in real-time, so businesses can make critical decisions on time-sensitive information.

Read the announcement here.

Here is what people are saying about Pentaho with Storm and YARN:

Pentaho Customer
Bryan Stone, Cloud Platform Lead, Synapse Wireless: "As an M2M leader in the Internet of Everything, our wireless solutions require innovative technology to bring big data insights to business users. The powerful combination of Pentaho Data Integration, Storm and YARN will allow my team to immediately leverage real-time processing, without the delay of batch processing or the overhead of designing additional transformations. No doubt this advancement will have a big impact on the next generation of big data analytics."

Leading Big Data Industry Analyst
Matt Aslett, Research Director, Data Management and Analytics, 451 Research: “YARN is enabling Hadoop to be used as a flexible multi-purpose data processing and analytics platform. We are seeing growing interest in Hadoop not just as a platform for batch-based MapReduce but also rapid data ingestion and analysis, especially using Apache Storm. Native support of YARN and Storm from companies like Pentaho will encourage users to innovate and drive greater value from Hadoop.”

Pentaho founder and Pentaho Labs Leader
Richard Daley, Founder and Chief Strategy Officer, Pentaho: "Our customers are facing fast technology iterations from the relentless evolution of the big data ecosystem. With Pentaho's Adaptive Big Data Layer and Big Data Analytical Platform, our customers are 'future proofed' against the rapid pace of evolution in the big data environment. In 2014, we're leading the way in big data analytics with Storm, YARN, Spark and predictive analytics, making it easy for customers to leverage these innovations."

Learn more about the innovation of Pentaho Data Integration for Storm on YARN in Pentaho Labs at pentaho.com/storm.

If you are at the O’Reilly Strata Conference in Santa Clara this week, February 11-13, stop by Booth 710 to see a live demo of Pentaho Data Integration with Storm and YARN. The Pentaho team of technologists, data scientists and executives will be on hand to share the latest big data innovations from Pentaho Labs.

Donna Prlich
Senior Director, Product Marketing
Pentaho


edo optimizes data warehouse, increases loyalty and targets new customers

February 10, 2014


What do you do when you need to track, store, blend, and analyze over 6 billion financial data transactions, with millions more arriving daily? edo Interactive, Inc. is a digital advertising company that leverages payment networks to connect brands with consumers. Its legacy data integration and analysis system took more than 27 hours to run, meaning that meeting daily Service Level Agreements was nearly impossible. However, only a few weeks after implementing a Hadoop data distribution with Pentaho for data integration, edo Interactive was able to reduce its processing time to less than 8 hours, and often as little as 2 hours.

Minimum time savings of 70% quickly translated into cost savings. With an optimized data warehouse, edo and its clients also spend less time navigating IT barriers. Pentaho's graphical user interface removes the cumbersome coding of batch process jobs, enabling sophisticated yet simplified conversion of data from PostgreSQL to Hadoop, Hive and HBase. edo and its clients quickly gain insight into customer preferences, refine marketing strategies and provide their customers with an improved experience and greater satisfaction.

edo Interactive successfully navigated many of the obstacles faced when implementing a big data environment and created a lasting, scalable solution. Its vision to give end users a better view of their customers has helped shape a new data architecture and embedded analytics capabilities.

To learn more about edo's Big Data vision and success, read their customer success overview and case study on Pentaho.com. We are excited to announce that Tim Garnto, SVP of Product Engineering at edo, will share his story live when he presents at O'Reilly Strata on Thursday, February 13th in Santa Clara (11:30AM, Ballroom G).

Strata Santa Clara is already sold out! If you are interested in learning more about edo's big data deployment, leave your questions in the comments section below and we will ask Tim during his speaking session at Strata.

Ben Mayer
Customer Marketing
Pentaho


Matt Casters on DM Radio – Future of ETL

March 20, 2012

Pentaho's Matt Casters, Chief Architect of Pentaho Data Integration and founder of the Kettle project, was featured last week on DM Radio in a broadcast titled "On the Move: Why ETL is Here to Stay."

Listen to Matt's interview with hosts Eric Kavanagh and Jim Ericson, along with panelists Nimitt Desai of Deloitte, Geoff Malafsky of Phasic Systems and Josh Rogers of Syncsort.

Starting at 13:33, listen to Matt talk about:

  • How big data and ETL intersect and what that means
  • Points to keep in mind when starting to work with data moving in and out of Hadoop
  • How to keep track of changing technologies and architectures
  • Why it's important not to do data integration just for data integration's sake
  • Why there's a lack of best practices
  • What Matt's seeing: the need for a high level of metadata and modeled ETL generation

Access both Matt’s segment and the full podcast here: http://www.information-management.com/dmradio//-10022068-1.html


Facebook and Pentaho Data Integration

July 15, 2011

Social Networking Data

Recently, I have been asked about Pentaho's product interaction with social network providers such as Twitter and Facebook. The data stored within these "social graphs" can provide its owners with critical metrics around their content. By analyzing trends in user growth and demographics, as well as in the consumption and creation of content, owners and developers are better equipped to improve their business with Facebook and Twitter. Social networking data can already be viewed and analyzed using existing tools such as Facebook Insights, or purchasable third-party software packages created specifically for this purpose. Now, Pentaho Data Integration in its traditional sense is an ETL (Extract, Transform, Load) tool: it can be used to extract data from these services and merge or consolidate it with other related company data. However, it can also be used to automatically push information about a company's product or service to the social network platforms. You see this in action today if you have ever "Liked" something a company had to offer on Facebook: at regular intervals, you will notice unsolicited product offers and advertisements posted to your wall from those companies. A great and cost-effective way to advertise to the masses.

Application Programming Interface

Interacting with these systems is made possible because they provide an API (Application Programming Interface). To keep it simple, a developer can write a program in some language on one machine that communicates with the social networking system on another machine. The API can be called from a 3GL such as Java or JavaScript or, even simpler, through RESTful services. At times, software developers and vendors will write connectors against the native API that can be distributed and used in many software applications. These connectors can offer a quicker and easier approach than writing code alone. An out-of-the-box Facebook and/or Twitter transformation step may be developed in a future release of Pentaho Data Integration, but until then the RESTful APIs work just fine with the simple HTTP POST step. Using Pentaho Data Integration with this out-of-the-box component allows quick access to social network graph data. It can also push content to applications such as Facebook and Twitter without writing any code or purchasing a separate connector.
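To make the push side concrete, here is a minimal sketch in Python of the kind of HTTP POST that PDI's HTTP POST step performs for you graphically. The endpoint path and the access token are illustrative placeholders, not a working configuration.

    # A minimal sketch of publishing content over HTTP POST, analogous to
    # what PDI's HTTP POST step does. Endpoint and token are placeholders.
    import urllib.parse
    import urllib.request

    ACCESS_TOKEN = "YOUR_OAUTH_ACCESS_TOKEN"  # hypothetical; obtained via the provider's OAuth flow

    def post_status(message: str) -> bytes:
        """POST a status update to the social graph and return the raw response body."""
        url = "https://graph.facebook.com/me/feed"
        form = urllib.parse.urlencode({"message": message, "access_token": ACCESS_TOKEN})
        request = urllib.request.Request(url, data=form.encode("utf-8"))  # supplying data= makes this a POST
        with urllib.request.urlopen(request) as response:
            return response.read()

    if __name__ == "__main__":
        print(post_status("Hello from an ETL job!"))

In PDI, the same request is simply configured in the HTTP POST step (URL plus form fields), so no code is required.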

The Facebook Graph API

Both Facebook and Twitter provide a number of APIs, one worth mentioning is the Facebook Graph API (don’t worry Twitter, I’ll get back to you in my next blog entry).

The Graph API is a RESTful service that returns a JSON response. Simply stated, an HTTP request can initiate a connection with the FB systems and publish or return data, which can then be parsed with a programming language, or better yet, without programming, using Pentaho Data Integration and its JSON Input step.
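For readers curious what the pull side looks like outside of PDI, here is a minimal sketch, again in Python, of requesting a Graph API object and parsing its JSON response; the object ID and fields are illustrative examples, since the objects and fields available depend on the API version and your permissions.

    # A minimal sketch of the pull side: GET a social-graph object over HTTP
    # and parse the JSON response, much as PDI's JSON Input step would.
    import json
    import urllib.parse
    import urllib.request

    def fetch_object(object_id: str, access_token: str) -> dict:
        """GET a social-graph object and return the parsed JSON as a dict."""
        query = urllib.parse.urlencode({"access_token": access_token})
        url = "https://graph.facebook.com/" + object_id + "?" + query
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    if __name__ == "__main__":
        page = fetch_object("somepage", "YOUR_OAUTH_ACCESS_TOKEN")  # hypothetical id and token
        print(page.get("name"), page.get("likes"))

In a PDI transformation, the equivalent flow pairs an HTTP request with the JSON Input step, so the parsing happens graphically rather than in code.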

Since the FB Graph API provides both data access and publish capabilities across a number of objects (photos, events, statuses, people, pages) supported in the FB social graph, one can leverage both automated push and pull capabilities.

If you are interested in giving this a try or seeing this in action, take a look at this tutorial available on the Pentaho Evaluation Sandbox.

Kind Regards,

Michael Tarallo
Director of Enterprise Solutions
Pentaho

