Pentaho 5.2, Marching Towards Governed Data Delivery

October 9, 2014

In our hyper-competitive global economy, companies are urgently seeking to harness the value of all their data to find new revenue streams, operate more efficiently, deliver outstanding service and minimize risk. They need a platform that catalyzes collaboration between IT and the business, meeting IT's demand for governance and the business's need for timely, comprehensive data.

Today at PentahoWorld we’ve committed to leading the charge towards a truly data-driven future, starting with governed data delivery.

In his keynote, Pentaho Chairman and CEO Quentin Gallivan lifted the veil on a big data orchestration platform that will be instrumental in delivering this vision by overcoming three persistent big data challenges: the need for strong data governance, diverse data blending, and delivery of analytics embedded at the point of impact. Fundamental to Pentaho's strategy is the concept of governed data delivery, defined as the ability to blend trusted and timely data to power analytics at scale for all users in all environments.

Also announced today, Pentaho 5.2 includes new innovations that simplify the delivery of blended data sets IT can trust, planting new guideposts on the road to governed data delivery.


Automated Streamlined Data Refinery

IT can hand the power of data-driven analytics to the business with an automated Streamlined Data Refinery. Hadoop serves as a central processing hub where analytics-ready data sets are blended, refined, auto-modeled and automatically published directly to analytic databases like HP Vertica for high-performance interactive analytics using Pentaho Analyzer.
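To make the pattern concrete, here is a minimal Python sketch of the refinery flow, assuming illustrative table and column names. sqlite3 stands in for an analytic database such as HP Vertica so the example runs anywhere; in the product itself these steps are built visually in Pentaho Data Integration rather than hand-coded.

```python
# A conceptual sketch of the refinery pattern only -- not the Pentaho
# implementation. sqlite3 stands in for an analytic database such as
# HP Vertica; table and column names are illustrative.
import sqlite3
import pandas as pd

# "Blend": raw transaction events, as they might land in the processing hub.
raw = pd.DataFrame({
    "customer": ["a", "a", "b"],
    "amount":   [10.0, 5.0, 7.5],
})

# "Refine": aggregate the raw records into an analytics-ready shape.
refined = raw.groupby("customer", as_index=False)["amount"].sum()

# "Publish": write the refined data set where analysts can query it.
with sqlite3.connect("refinery.db") as conn:
    refined.to_sql("customer_spend", conn, if_exists="replace", index=False)
```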

Advanced Data Security in Hadoop

Pentaho 5.2 extends the adaptive big data layer with advanced Kerberos authentication support for major Hadoop distributions, including Cloudera, Hortonworks and MapR. This ensures that only users with the proper credentials can access Hadoop cluster resources during data orchestration.

Pentaho 5.2 also includes additional features that simplify the user experience for both embedded and direct customers.

Learn more about Pentaho 5.2 and Governed Data Delivery – watch the video and read the whitepaper at www.pentaho.com/product/governed-data-delivery.

Follow PentahoWorld live on Twitter using the hashtag #PWorld2014.

Chuck Yarbrough
Product Marketing, Big Data
Pentaho


World Cup, Twitter sentiment and equity prices…any correlation?

June 25, 2014

I heard a news story on the radio today about stock markets going quiet during World Cup events, especially when the home country is on the field. This made me think about how live activities affect the major markets. My colleague Bo Borland at Pentaho posed an interesting question on this topic just yesterday at MongoDB World in New York: “Do real-time tweets have an effect on the stock markets?” Working for a Big Data integration and analytics company, Bo of course used Pentaho tools to see if there was indeed a correlation. A cool idea, but what resulted was even cooler than I’d imagined….

Using Pentaho Data Integration, Bo easily pulled minute-by-minute stock tick data, which is highly structured, and blended it with unstructured Twitter data. Next, he pushed the blended data into a MongoDB collection to take advantage of its flexibility. (Note: Bo is also the author of Pentaho Analytics for MongoDB). Taking the integration and analysis a step further, he scored the tweet sentiment by including a Weka predictive algorithm as part of the data ingestion process from Twitter. Once the data was in place, he used one of the cool new features in Pentaho 5.1 to “slice and dice” the data stored in MongoDB.

It’s worth pointing out that the ability to analyze data directly from MongoDB with no coding is a first-to-market feature. Pentaho designed and delivered native integration with MongoDB’s Aggregation Framework, allowing business users and analysts to immediately access, analyze and visualize MongoDB data for superior insight and governance.
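For a feel of what that integration does under the covers, here is a hypothetical query using MongoDB's Aggregation Framework via pymongo. The collection and field names (blended, symbol, minute, price, sentiment) are illustrative rather than taken from Bo's demo, and in Pentaho 5.1 the equivalent query is generated for you with no coding.

```python
# A hypothetical slice-and-dice query with MongoDB's Aggregation Framework.
# Collection and field names are illustrative, not from Bo's actual demo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
blended = client["market"]["blended"]

# Average stock price and tweet sentiment per symbol per minute.
pipeline = [
    {"$group": {
        "_id": {"symbol": "$symbol", "minute": "$minute"},
        "avg_price": {"$avg": "$price"},
        "avg_sentiment": {"$avg": "$sentiment"},
    }},
    {"$sort": {"_id.minute": 1}},
]
for row in blended.aggregate(pipeline):
    print(row)
```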

Here’s Bo’s process simplified:

Pentaho Data Integration

  • Ingest data from external data source (TickData) into MongoDB
  • Ingest data from Twitter using public API into MongoDB
  • Execute a Weka Scoring step during the ingestion process to score the incoming tweets and calculate sentiment (see the sketch after this outline)

Connect Pentaho Analytics to the Mongo Collection(s)

  • Start analyzing data
  • Slice and dice large amounts of data quickly
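Below is a rough, hypothetical Python sketch of the ingestion side of that outline. It is not Bo's actual Pentaho Data Integration transformation: score_sentiment() is just a stand-in for the Weka Scoring step, and the field names are invented for illustration.

```python
# A rough, hypothetical sketch of the ingestion steps above -- not Bo's
# actual Pentaho Data Integration transformation. Tick and tweet documents
# land in one blended MongoDB collection, and each tweet is scored for
# sentiment on the way in. score_sentiment() stands in for the Weka model.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
blended = client["market"]["blended"]

def score_sentiment(text):
    """Placeholder for the Weka model: return a score in [-1, 1]."""
    return 1.0 if "up" in text.lower() else -1.0

def ingest_tick(tick):
    blended.insert_one(tick)                             # structured tick data

def ingest_tweet(tweet):
    tweet["sentiment"] = score_sentiment(tweet["text"])  # score as it arrives
    blended.insert_one(tweet)                            # unstructured, now scored

ingest_tick({"symbol": "TSLA", "minute": "2014-06-24T14:01", "price": 229.5})
ingest_tweet({"symbol": "TSLA", "minute": "2014-06-24T14:01", "text": "Tesla up!"})
```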

Here’s what the process looks like:

[Diagram: Pentaho data flow into MongoDB]

If you want to see this slicing and dicing directly on data in MongoDB, check out this video.

Bo presented this demo live yesterday at MongoDB World to a standing-room-only crowd, using Tesla data. You can access his slides here:

So the question still remains, “Does Twitter sentiment correlate to equity prices?” I’ll let you take a look and decide, but I’ve got some stocks to research….

Chuck Yarbrough
Director, Big Data Product Marketing
Pentaho

 


Good news, your data scientist just got a personal assistant

June 3, 2014

If you are, or have, a data scientist in house, you’re in for good news.

Today at Hadoop Summit in San Jose, Pentaho unveiled a toolkit built specifically for data scientists to simplify the messy, time-consuming data preparation, cleansing and orchestration of analytic data sets. Don’t just take it from us…

The Ventana Research Big Data Analytics Benchmark Research estimates that the two most time-consuming big data tasks are preparing data for integration (52%) and solving data quality and consistency issues (46%). That’s a whopping amount of time spent just getting data prepped and cleansed, not to mention the time spent post-processing results. Imagine if the time spent preparing, managing and orchestrating these processes could be handed off to a personal assistant, leaving more time to focus on analyzing data and applying advanced and predictive algorithms to it (i.e. doing what a data scientist is paid to do).

Enter the Pentaho Data Science Pack, the personal assistant to the data scientist. Built to help operationalize advanced analytic models as part of a big data flow, the Data Science Pack leverages familiar tools: R, the most widely used tool among data scientists, and Weka, a popular open source collection of machine learning algorithms. There are no new tools to learn. In the words of our customer Ken Krooner, President at ESRG: “There was a gap in the market until now and people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics.”

Pentaho is at the forefront of solving big data integration challenges, and we know advanced and predictive analytics are core ingredients for success. Find out how close at hand your data science personal assistant is and take a closer look at the Data Science Pack.

Chuck Yarbrough
Director, Big Data Product Marketing


Blueprints to Big Data Success

March 21, 2014

Seems like these days, everyone falls into one of three categories. You are either:

  1. Executing a big data strategy
  2. Implementing a big data strategy
  3. Taking a wait-and-see attitude

If you’re in one of the first two categories, good for you! If you’re in the third, you might want to re-think your strategy. Companies that get left behind will have a huge hill to climb just to catch up to the competition.

If one of your concerns about moving forward with big data has been a lack of solid guidance to help pave the path to success, then you are in luck! Pentaho has recently released four Big Data Blueprints to help guide you through the process of executing a strategy. These are four use cases that Pentaho has seen customers execute successfully, so this isn’t just marketing fluff: these are real architectures that support real big data business value. The four blueprints now available on Pentaho’s website are:

  • Optimize the data warehouse
  • Streamlined data refinery
  • Customer 360-degree view
  • Monetize my data

These blueprints will help you understand the basic architectures that will support your efforts and achieve your desired results.  If you are like many companies just getting started with big data, these are great tools to guide you through the murky waters that lie ahead.  Here is my quick guide to the four Blueprints, where you may want to get started and why.

The Big Data Blueprints

1.    Optimize the Data Warehouse
Data warehouse optimization (sometimes referred to as data warehouse offloading, or DWO) is a great starter use case for gaining experience and expertise with big data while reducing costs and improving the analytic opportunities for end users. The idea is to increase the amount of data being stored, not by shoving it all into the warehouse, but by adding Hadoop to house the additional data. Once Hadoop is in the mix, Pentaho makes it easy to move data into Hadoop from external sources, move data bi-directionally between the warehouse and Hadoop, and process data in Hadoop. Again, this is a great place to start. It’s not as transformative to your business as the other use cases can be, but it will build expertise and save you money.

2.    Streamlined Data Refinery
The idea behind the refinery is to stream transaction, customer, machine and other data from their sources through a scalable big data processing hub, where Hadoop is used to process transformations, store data and run analytics that can then be sent to an analytic model for reporting and analysis. Working with several customers, we have seen this as a great next step after the DWO.

3.    Customer 360-Degree View
This blueprint is perhaps the most transformative of all the potential big data use cases. The idea here is to gain greater insight into what your customer is doing, seeing, feeling and purchasing, so that you can serve and retain that customer better and attract more customers into your fold. This blueprint lays out the architecture needed to start understanding your customer better. It will require significant effort to access all the appropriate customer touch points, but the payoff can be huge. Don’t worry too much about getting the full 360-degree view at first; starting with even one small slice can drive big gains in revenue and retention.

4.    Monetize My Data
What do you have locked up in your corporate servers, or in machines you own? This blueprint can be as transformative as the Customer 360, in that it can create new revenue streams you may never have thought about before. In some cases, it could create a whole new business opportunity. Whatever your strategy, take time to investigate where and how you can drive new business by leveraging your data.

There are other blueprints that have been defined and developed by Pentaho, but these are four that typically make the most sense for organizations to leverage first.  Feel free to reach out to us for more information about any of these blueprints or to learn more about how Pentaho helps organizations be successful with big data.

Find out more about the big data blueprints at http://www.pentaho.com/big-data-blueprints.

Please let me know what you think @cyarbrough.

Thanks!
Chuck Yarbrough
Product Marketing, Big Data


Weka goes BIG

December 4, 2013

The beakers are bubbling more violently than usual at Pentaho Labs, and this time predictive analytics is the focus. The lab-coat, pocket-protector and taped-glasses clad scientists have turned their attention to the Weka machine learning software.

Weka, a collection of machine learning algorithms for predictive analytics and data mining, has a number of useful applications. Examples include scoring credit risk, predicting machine downtime and analyzing sentiment in social feeds. The technology can be used to facilitate automatic knowledge discovery by uncovering hidden patterns in complex datasets, or to develop accurate predictive models for forecasting.

Organizations have been building predictive models to aid decision making for a number of years, but the recent explosion in the volume of data being recorded (aka “Big Data”) provides unique challenges for data mining practitioners. Weka is efficient and fast when running against datasets that fit in main memory, but larger datasets often require sampling before processing. Sampling can be an effective mechanism when samples are representative of the underlying problem, but in some cases the loss of information can negatively impact predictive performance.

To combat information loss, and scale Weka’s wide selection of predictive algorithms to large data sets, the folks at Pentaho Labs developed a framework to run Weka in Hadoop. Now the sort of tasks commonly performed during the development of a predictive solution – such as model construction, tuning, evaluation and scoring – can be carried out on large datasets without resorting to down-sampling the data. Hadoop was targeted as the initial distributed platform for the system, but the Weka framework contains generic map-reduce building blocks that can be used to develop similar functionality in other distributed environments.
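To illustrate the pattern behind those map-reduce building blocks, here is a toy sketch. It assumes a trivially simple "model" (per-class feature means standing in for a real Weka learner) and is not Pentaho's code; it just shows the general idea of fitting partial results per data chunk and merging them, so no single chunk has to hold the whole dataset.

```python
# A toy illustration of the map-reduce pattern described above -- not the
# Pentaho Labs implementation. Each "mapper" fits partial statistics on one
# data chunk; the "reducer" merges the partials into a single model. A simple
# per-class mean stands in for a real Weka learner.
from collections import defaultdict

def map_fit(chunk):
    """Mapper: compute per-class feature sums and counts for one chunk."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, label in chunk:
        counts[label] += 1
        sums[label] += feature
    return sums, counts

def reduce_merge(partials):
    """Reducer: merge the partial sums/counts and finish the per-class means."""
    total_sums, total_counts = defaultdict(float), defaultdict(int)
    for sums, counts in partials:
        for label in counts:
            total_sums[label] += sums[label]
            total_counts[label] += counts[label]
    return {label: total_sums[label] / total_counts[label] for label in total_counts}

# Two "splits" of (feature, label) pairs processed independently, then merged,
# so no chunk ever has to fit in memory alongside the others.
chunks = [[(1.0, "up"), (2.0, "down")], [(3.0, "up"), (4.0, "down")]]
print(reduce_merge(map_fit(c) for c in chunks))  # {'up': 2.0, 'down': 3.0}
```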

If you’re a predictive solution developer or a data scientist, the new Weka framework is a much faster path to solution development and deployment.  Just think of the questions you can ask at scale!

For more technical details about the Weka Hadoop framework, I suggest reading the blog post Weka and Hadoop Part 1 by Mark Hall, Weka core developer at Pentaho.

Also, check out Pentaho Labs to learn more about Predictive Analytics from Pentaho, and to see some of the other cool things the team has brewing.

Chuck Yarbrough
Technical Solutions Marketing


Looking for the perfect match

February 28, 2013


I’m at the O’Reilly Strata Big Data Conference in Santa Clara, CA this week where there’s lots of buzz about the value and reality of big data. It’s a fun time to be part of a hot new market in technology. But, of course, a hot new market brings a new set of challenges.

After talking to several attendees, I would not be surprised if someone took out an advertisement in the San Francisco Guardian that reads:

SEEKING BDT (Big Data Talent)

“Middle-aged attractive company seeks hot-to-trot data geek for mutually enjoyable discreet relationship, mostly involving analytics. Must enjoy long discussions about wild statistical models, short walks to the break room and large quantities of caffeine.”

The feedback from the presentations and attendees at Strata mirrors the results of a big data survey Pentaho released last week, which found a lack of skills to address new big data technologies such as Hadoop, both among existing staff and in the broader job market. This is good news for folks looking for jobs in big data, and a good sign for others who want to learn new skills.

The market has created the perfect storm: hot new technology mixed with a myriad of very complex systems, plus highly complicated statistical models and calculations. This storm is preventing the typical IT generalist or BI expert from applying. More experienced data scientists who can spin models on their head with a twist of a mouse are in high demand, and the need to garner value quickly from big data means there is little time to look for the “perfect match.”

It seems like new companies and technologies pop up almost every week, each with the promise of business benefits, but with the added cost of high complexity.  Shouldn’t things get easier with new technologies?

Pentaho’s Visual MapReduce is a prime example of things getting easier. Getting data out of Hadoop quickly can be a challenge, but with Visual MapReduce any IT professional can pull the right information from a Hadoop cluster, improve the performance of a MapReduce job and make the results available in the optimal format for business users.
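For a sense of what that abstracts away, here is a toy, single-process stand-in for a hand-coded word-count job, showing just the mapper/reducer split; a real Hadoop job would also need cluster submission, input/output formats and tuning on top of this, all of which Visual MapReduce handles graphically.

```python
# A toy, single-process stand-in for a hand-coded MapReduce job (the classic
# word count) -- shown only to illustrate the mapper/reducer split that
# Visual MapReduce lets you build graphically instead.
import sys
from itertools import groupby

def mapper(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce: sum the counts for each word (pairs are sorted by key first)."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    for word, total in reducer(mapper(sys.stdin)):
        print(f"{word}\t{total}")
```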

New technologies might need new talent, but in the case of Pentaho Visual MapReduce, new technologies might only need new tools to help address them.

Looks like Pentaho is the perfect match.

Chuck Yarbrough
Technical Solutions Marketing

