Preview of PentahoWorld Speakers – Cloudera, Forrester, NASDAQ

July 25, 2014
PentahoWorld

We’ve been busy curating compelling keynotes and general sessions for PentahoWorld that will help you develop a roadmap for scale, growth and success. Before the big reveal of all of our speakers, we wanted to give you a preview of some of the big data leaders and mavericks that will be speaking at PentahoWorld in Orlando, October 8-10.

We’ve selected keynote speakers who are big data innovators and leaders, people who’ve had the vision to see the potential of big data well ahead of the curve, while our general sessions will focus on more tactical information as you work to leverage the latest technologies, solve your biggest data challenges, and get the most out of your Pentaho implementation.

Preview of the First Wave of Speakers
Quentin Gallivan, Chairman and Chief Executive Officer, Pentaho
Mike Olson, Chief Strategy Officer, Cloudera
Christopher Dziekan, Chief Product Officer, Pentaho
Mike Gualtieri, Principal Analyst, Forrester Research
Michael Weiss, Sr. Software Engineer, NASDAQ
John Dinh, Lead Product Manager, NASDAQ

Next steps for you:

  1. Check the Agenda page often as we continue to add to the agenda
  2. Make sure to register today! Early bird pricing ends August 31st (30% savings!)
  3. Are you using Pentaho in an interesting or innovative way? Apply for the Pentaho Excellence Awards – Deadline to submit is July 31st.

 


Attention Retail Banks, It’s Time for Change!

July 18, 2014

[Image: Walhalla (1896) by Max Brückner]

Retail banks, which have been wracked by scandals relating to PPI fraud, LIBOR rigging, unpopular bonus schemes and IT failures, need to think beyond upselling and cross-selling and consider how big data analytics can repair trust and improve the whole customer experience. In the article, Monetising Big Data in Retail Banks Starts with a Better Customer Experience, Davy Nys, VP of EMEA & APAC at Pentaho, shares how retail banks can achieve the ‘Valhalla’ of customer value pricing (CVP): maximising the total value of a customer to a bank throughout all interactions and transactions. He explains how big data integration and analytics support CVP in five ways:

  1. Supporting a two-way, 360-degree customer view
  2. Lowering costs
  3. Making smarter offers
  4. Enabling customer-friendly fraud detection
  5. Measuring customer sentiment

To learn more about how to achieve the ‘Valhalla’ of CVP, read the full article here, and register for the live webinar featuring Forrester analyst Martha Bennett on the topic, Making the Most of your Data in the Financial Sector, on July 22nd at 11am GMT.


Pentaho 5.1 in LEGO

July 16, 2014

Two weeks ago we launched Pentaho Business Analytics 5.1. The new capabilities in Pentaho 5.1 support our ongoing strategy to make the hardest aspects of big data analytics faster, easier and more accessible to all. In honor of our Chief Architect, Will Gorman (also a LEGO Master Builder), we decided to have some fun with LEGO and now present to you the LEGO explanation of new features and functionality in Pentaho 5.1:

[LEGO image: Pentaho 5.1 overview]

Direct Analytics on MongoDB – Unleash the value of MongoDB analytics for IT and Business Analysts with no coding required.

[LEGO image: Direct analytics on MongoDB]

Data Science Pack – Operationalize predictive models for R and Weka, drastically reducing data preparation time and effort.

[LEGO image: Data Science Pack with R and Weka]

Full YARN Support – Reduce complexity for big data developers while leveraging the full power of Hadoop.

[LEGO image: Full YARN support]

Visit the 5.1 landing page to learn more about this release and access resources such as videos, data sheets, customer profiles and downloads.

 


World Cup Dashboard 2014 – in 15 minutes

July 3, 2014

[Image: World Cup dashboard]

Are you caught up in the World Cup craze? Two of my passions are English football and analytics (hence I’m a presales engineer at Pentaho based in London). So when it came time for this year’s World Cup, naturally I combined my passions to analyse who is going to win and what makes a winning team.

It turns out that a Big Data Analytics team in Germany tried to predict the winners based on massive data sets. Thus far three of their top five predicted teams have faltered. So what went wrong? Is Big Data not accurate? Are analytics not the answer?

Fret not. Exploring their methodology, I found their analysis was based on only one source of data. At Pentaho, we believe that the strongest insights come from blended data. We don’t just connect to large data sets; we make connecting to all data easy, regardless of format or location.

So why is my little World Copa Dashboard different (pictured above)? It’s simple, intuitive and quick to create using Pentaho to access and organize the information I wanted, taking only 15 minutes. Yes, raw data to dashboard in 15 minutes! Here is how I did it:

I took 2010 FIFA World Cup data in CSV format and turned it into three panels of information in a format I wanted to see. It was my view into the action. I wanted more information beyond the basic winners and losers; I was interested in looking behind the scenes (the how and why) of one of the most spectacular sporting events on the planet. With analytics, I wanted to explore the following questions:

  1. How did strikers compare in the final outcome?
  2. Across a team, how hard did they work?
  3. What was the strategic approach of the winning teams versus the most industrious ones?
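For readers who want to poke at the same questions outside a dashboard, here is a rough sketch in Python with pandas. The CSV columns below are invented for illustration (the post doesn’t show the real file’s layout), and PDI does this preparation graphically rather than in code:

```python
import io

import pandas as pd

# Hypothetical player-level data standing in for the 2010 FIFA World Cup CSV;
# the real file's columns are not shown in the post.
csv_data = io.StringIO("""player,team,position,goals,shots,distance_km
Villa,Spain,Forward,5,21,62.1
Sneijder,Netherlands,Midfielder,5,18,70.4
Forlan,Uruguay,Forward,5,22,65.3
Mueller,Germany,Forward,5,17,59.8
""")
players = pd.read_csv(csv_data)

# Q1: how did strikers compare? Rank forwards by shot conversion.
forwards = players[players["position"] == "Forward"].copy()
forwards["conversion"] = forwards["goals"] / forwards["shots"]

# Q2: how hard did each team work? Total distance covered per team.
work_rate = players.groupby("team")["distance_km"].sum()

print(forwards.sort_values("conversion", ascending=False)[["player", "conversion"]])
print(work_rate.sort_values(ascending=False))
```

The same two aggregations are what drive the map panel (work rate by country) and the pivot table panel described below.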

Using Pentaho’s built-in mapping features I could visualise the countries with the ‘busiest’ forwards. This told me something about the work rate of the formations that country’s manager/coach was using. I could see the efficiency of the forward position players.

[Image: leading forward players]

With the pivot table I could conditionally format the data to give me an at-a-glance view of each team’s goalkeeping performance, individual defending and attacking maneuvers.

[Image: player stats]

 

Embedding the Official FIFA website into my dashboard also gave me access to FIFA’s own ‘trending’ players of this tournament. I could see historic performance versus the latest information.

[Image: official FIFA website]

One of the most vital aspects of analytics is offering the information in the right context for the user. With the rich visualisations of Pentaho Analyser it was simple for me to divide up the teams into a scatter plot showing me which ones worked the hardest for the duration they were in the tournament and which balance took the successful teams furthest.

[Image: team stats scatter plot]

In summary, I got a behind-the-scenes look at my World Cup data in four different visualisations, all combined into one dashboard, easily, quickly and efficiently, using the Pentaho User Console. And I was able to rapidly prepare the data into a format of my choosing using Pentaho Data Integration.

Turn your data into your own personal viewpoint with the power of Pentaho Business Analytics 5.1.

GO ENGLAND…for the next World Cup, perhaps? ;)

Zaf Khan
Presales Engineer
Pentaho


Spark on Fire! Integrating Pentaho and Spark

June 30, 2014

One of Pentaho’s great passions is to empower organizations to take advantage of amazing innovations in Big Data to solve new challenges using the existing skill sets they have in their organizations today.  Our Pentaho Labs’ innovations around natively integrating data engineering and analytics with Big Data platforms like Hadoop and Storm have already led dozens of customers to deploy next-generation Big Data solutions. Examples of these solutions include optimizing data warehousing architectures, leveraging Hadoop as a cost effective data refinery, and performing advanced analytics on diverse data sources to achieve a broader 360-degree view of customers.

Not since the early days of Hadoop have we seen so much excitement around a new Big Data technology as we see right now with Apache Spark.  Spark is a Hadoop-compatible computing system that makes big data analysis drastically faster, through in-memory computation, and simpler to write, through easy APIs in Java, Scala and Python.  With the second annual Spark Summit taking place this week in San Francisco, I wanted to share some of the early work Pentaho Labs and our partners over at Databricks are collaborating on to deeply integrate Pentaho and Spark for delivering high performance, Big Data Analytics solutions.

Big Data Integration on Spark

At the core of Pentaho Data Integration (PDI) is a portable ‘data machine’ for ETL which today can be deployed as a stand-alone Pentaho cluster or inside your Hadoop cluster through MapReduce and YARN.  The Pentaho Labs team is now taking this same concept and working on the ability to deploy inside Spark for even faster Big Data ETL processing.  The benefit for ETL designers is the ability to design, test and tune ETL jobs in PDI’s easy-to-use graphical design environment, and then run them at scale on Spark.  This dramatically lowers the skill sets required, increases productivity, and reduces maintenance costs when taking advantage of Spark for Big Data Integration.
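The core idea, designing the transformation once and handing it to whichever engine is available, can be illustrated in miniature. This is a conceptual sketch in Python, not Pentaho code; PDI transformations are built graphically, and the step functions below are invented:

```python
# A PDI-style transformation expressed as an ordered list of steps. The same
# step list could, in principle, be handed to a local runner, a MapReduce
# translator, or a Spark translator; only a local runner is shown here.

def run_local(steps, rows):
    """Apply each step to the row stream in order, like a local PDI run."""
    for step in steps:
        rows = step(rows)
    return list(rows)

steps = [
    # Filter step: drop rows with no quantity.
    lambda rows: (r for r in rows if r["qty"] > 0),
    # Calculator step: add a derived 'total' field.
    lambda rows: ({**r, "total": r["qty"] * r["price"]} for r in rows),
]

rows = [{"qty": 2, "price": 3.0}, {"qty": 0, "price": 9.9}]
print(run_local(steps, rows))  # [{'qty': 2, 'price': 3.0, 'total': 6.0}]
```

The value of the separation is exactly what the paragraph above describes: the designer works against the abstract step list, and the execution engine (stand-alone cluster, MapReduce, YARN, or Spark) is a deployment decision.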

Advanced Analytics on Spark

Last year Pentaho Labs introduced a distributed version of Weka, Pentaho’s machine learning and data mining platform. The goal was to develop a platform-independent approach to using Weka with very large data sets by taking advantage of distributed environments like Hadoop and Spark. Our first implementation proved out this architecture by enabling parallel, in-cluster model training with Hadoop.

[Image: Advanced Analytics on Spark]

We are now working on a similar level of integration with Spark that includes data profiling and evaluating classification and regression algorithms in Spark.  The early feedback from Pentaho Labs confirms that developing solutions on Spark is faster and easier than with MapReduce. In just a couple weeks of development, we have demonstrated the ability to perform in-cluster Canopy clustering and are very close to having k-means++ working in Spark as well!
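For the curious, the distinctive part of k-means++ is seeding: each new initial center is sampled with probability proportional to its squared distance from the nearest center chosen so far. Here is a plain-Python miniature of that seeding step (an illustration only; the in-cluster version would distribute the distance computations across Spark workers):

```python
import random

def kmeanspp_seeds(points, k, rng):
    """Pick k initial centers for k-means. Each new center is sampled with
    probability proportional to its squared distance from the nearest
    already-chosen center, so far-apart clusters tend to get a seed each."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min((p - c) ** 2 for c in centers) for p in points]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for p, w in zip(points, weights):
            acc += w
            if acc >= r:  # weighted sampling by cumulative sum
                centers.append(p)
                break
    return centers

# Two well-separated 1-D clusters; the seeds almost always land one in each.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(kmeanspp_seeds(points, 2, random.Random(7)))
```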

Next up: Exploring Data Science Pack Integration with MLlib

MLlib is already one of the most popular technologies for performing advanced analytics on Big Data.  By integrating Pentaho Data Integration with Spark and MLlib, data scientists will benefit from having an easy-to-use environment (PDI) to prepare data for use in MLlib-based solutions.  Furthermore, this integration will make it easier for IT to operationalize the work of the data science team by orchestrating the entire end-to-end flow, from data acquisition, to data preparation, to execution of MLlib-based jobs, to sharing the results, all in one simple PDI job flow.  To get a sense for how this integration might work, I encourage you to look at a similar integration with R we recently launched as part of the Data Science Pack for Pentaho Business Analytics 5.1.

Experiment Today with Pentaho and Spark!

You can experiment with Pentaho and Spark today for both ETL and reporting.  In conjunction with our partners at Databricks, we recently certified the following use cases combining Pentaho and Spark:

  • Reading data from Spark as part of an ETL workflow by using Pentaho Data Integration’s Table Input step with Apache Shark (a Hive-compatible SQL layer that runs on Spark)
  • Reporting on Spark data using Pentaho Reporting against Apache Shark

We are excited about this first step in what we both hope to be a collaborative journey towards deeper integration.

Jake Cornelius
Sr. Vice President, Product Management
Pentaho

 


World Cup, Twitter sentiment and equity prices…any correlation?

June 25, 2014

I heard a news story on the radio today about stock markets going quiet during World Cup events, especially when the home country is on the field. This made me think about how live activities affect the major markets. My colleague Bo Borland at Pentaho posed an interesting question on this topic just yesterday at MongoDB World in New York: “Do real-time tweets have an effect on the stock markets?” Working for a big data integration and analytics company, Bo of course used Pentaho tools to see if there was indeed a correlation. A cool idea, but what resulted was even cooler than I’d imagined…

Using Pentaho Data Integration, Bo easily pulled minute-by-minute stock tick data which is highly structured, and blended it with unstructured Twitter data. Next, he pushed the blended data into a MongoDB collection to take advantage of its flexibility. (Note: Bo is also the author of Pentaho Analytics for MongoDB). Taking the integration and analysis a step further, he scored the tweet sentiment by including a Weka predictive algorithm as part of the data ingestion process from Twitter. Once the data was in place, he used one of the cool new features in Pentaho 5.1 to “slice and dice” the data stored in MongoDB.

It’s worth pointing out that the ability to analyze data directly in MongoDB with no coding is a first-to-market feature. Pentaho designed and delivered native integration with MongoDB’s Aggregation Framework, allowing business users and analysts to immediately access, analyze and visualize MongoDB data for superior insight and governance.

Here’s Bo’s process simplified:

Pentaho Data Integration

  • Ingest data from external data source (TickData) into MongoDB
  • Ingest data from Twitter using public API into MongoDB
  • Execute a Weka scoring step during the ingestion process to score the incoming tweets and calculate sentiment

Connect Pentaho Analytics to the Mongo Collection(s)

  • Start analyzing data
  • Slice and dice large amounts of data quickly
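Stripped of the PDI and MongoDB machinery, the scoring-at-ingest idea can be sketched in a few lines of Python. The word-list scorer below is a toy stand-in for the Weka model, and every field name is invented for the sketch:

```python
POSITIVE = {"up", "beat", "strong", "rally"}
NEGATIVE = {"down", "miss", "weak", "selloff"}

def score_tweet(text):
    """Toy stand-in for the Weka scoring step: +1/-1 per sentiment word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def ingest(tweets, ticks):
    """Blend scored tweets with per-minute tick data, keyed on (symbol, minute),
    producing documents ready to insert into a MongoDB collection."""
    docs = []
    for t in tweets:
        key = (t["symbol"], t["minute"])
        docs.append({
            "symbol": t["symbol"],
            "minute": t["minute"],
            "sentiment": score_tweet(t["text"]),
            "price": ticks.get(key),  # structured tick data joined in at ingest
        })
    return docs

tweets = [{"symbol": "TSLA", "minute": "09:31", "text": "TSLA strong rally today"},
          {"symbol": "TSLA", "minute": "09:32", "text": "selling on the miss"}]
ticks = {("TSLA", "09:31"): 230.1, ("TSLA", "09:32"): 229.4}
print(ingest(tweets, ticks))
```

The point of scoring at ingest, in Bo’s pipeline as in this sketch, is that every document lands in MongoDB already carrying both the price and the sentiment, so the analysis step is a straight query rather than another join.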

Here’s what the process looks like:

[Image: MongoDB ingestion and analytics process diagram]

If you want to see this slicing and dicing directly on data in MongoDB check out this video.

Bo presented this demo live yesterday to a standing-room-only crowd using Tesla data at MongoDB World. You can access his slides here.

So the question still remains, “Does Twitter sentiment correlate to equity prices?” I’ll let you take a look and decide, but I’ve got some stocks to research….

Chuck Yarbrough
Director, Big Data Product Marketing
Pentaho

 


Introducing Pentaho 5.1 – Powering Big Data Analytics at Scale

June 23, 2014

You can’t predict tomorrow with yesterday’s tools. At Pentaho, this has been a core tenet in staying nimble and innovating in this disruptive market. Today, at MongoDB World in New York, we announced Pentaho Business Analytics 5.1, the culmination of rapid innovation and close community and customer engagement. Pentaho 5.1 supports our ongoing strategy to make big data analytics faster—at scale—and easier and more accessible for more users.

The most powerful insights are revealed when Big Data can be accessed and blended at the source. 5.1 enables users to do this in a seamless way, eliminating the need for a specialized set of skills and bridging the data-to-analytics divide. Our recent Data Science Pack blog post references analyst research estimating that the top two time-consuming big data tasks are preparing data for integration (52%) and solving data quality and consistency issues (46%). We know a huge amount of resources is spent just getting data ‘ready’ to discover the greatest land mine or gold mine of data.

In 5.1 we are streamlining the big data process and making big data a reality for all with three innovations including:

  • Direct analytics on MongoDB – Unlocks the value of data in NoSQL through interactive visual analysis. Native integration leverages the MongoDB Aggregation Framework, Replication and Tag Sets for direct analysis on MongoDB collections with no impact on throughput.
  • Data Science Pack – Operationalizes predictive models, drastically reducing data preparation time and effort. The pack includes integration with both R and Weka, two of the most popular machine learning and predictive analytics toolsets in use today by data scientists.
  • Full YARN Support – Reduces complexity for big data developers while leveraging the full power of Hadoop.
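To make the first bullet concrete, direct analytics on MongoDB works by issuing Aggregation Framework pipelines against collections. Here is a hand-written example of the kind of pipeline involved; the collection and field names are illustrative, not Pentaho’s actual generated output:

```python
# A MongoDB Aggregation Framework pipeline of the kind issued when a user
# drags fields into a report: filter, group into cells, sort. Field names
# ("country", "year", "sales") are invented for this sketch.
pipeline = [
    {"$match": {"country": {"$ne": None}}},           # report-level filter
    {"$group": {                                      # one row per country/year
        "_id": {"country": "$country", "year": "$year"},
        "total_sales": {"$sum": "$sales"},
    }},
    {"$sort": {"total_sales": -1}},
]

# With pymongo and a running mongod, this would be executed as:
#   results = db.orders.aggregate(pipeline)
print(len(pipeline))
```

Because the pipeline runs inside MongoDB itself, the aggregation happens where the data lives, which is what allows analysis with no separate extract and no impact on throughput.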

Just listen to our customer Chris Palm, Lead Software Architecture Engineer at MultiPlan, share how daunting the data-to-analytics process can be: “Traditional RDBMS analytics can get very complicated and, quite frankly, ugly when working with semi-structured or unstructured data. The Pentaho 5.1 platform is meeting market needs, allowing users to directly analyze data in MongoDB. We have seen more accurate results with new analyses and are no longer constrained by having to pull only part of our data. We can now look across a fuller set of data and govern our system of record to gain greater insights.”

I encourage you to explore the impressive new capabilities in Pentaho 5.1. You can access resources such as videos, webinars and downloads at: http://bit.ly/PTHO5-1.

Chris Dziekan
EVP & Chief Product Officer
Pentaho


Recognizing and Rewarding Your Work: The Pentaho Excellence Awards

June 23, 2014

One of my favorite aspects of being CEO of Pentaho is the opportunity to talk to our customers around the world. Innovative and motivated individuals and teams are turning data into value and making a major impact for their organizations, and in some cases for the good of society. We are proud to announce the first annual Pentaho Excellence Awards to recognize and honor our customers and users, rewarding those that have deployed Pentaho technologies in impressive and innovative ways.

The Pentaho Excellence Awards offer an opportunity for you and your team to receive industry recognition for your expertise in analytics and big data deployments and thought leadership. While we know your teams are busy helping to make faster and smarter business decisions, here is the link to more information about the Pentaho Customer Excellence Awards and our short nomination process: http://bit.ly/PWorldPEA. Nominations are open until July 11th.

A panel of expert judges will pick a winner in six different categories. Category winners receive a free pass to PentahoWorld in Orlando October 8-10, 2014, along with several additional unique opportunities at the event such as a VIP dinner, speaking opportunities and recognition at a keynote awards ceremony. As a highlight of the Awards ceremony we will announce the overall User of the Year Award.

We look forward to celebrating the amazing accomplishments achieved through our work together. I hope to see you on stage during the awards ceremony at PentahoWorld.

Quentin

Photo/LEGO credit: @kathrineiben


Award time at Pentaho

June 18, 2014

The past few weeks we’ve been giddy with excitement about several awards we’ve received celebrating our big data technology and how customers are applying it to reap big benefits. The latest awards that Pentaho, along with our customers, has added to our growing trophy case include:

The CRN Big Data 100 list identifies vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data. Pentaho is proud to be named to the Big Data 100 list for the second year in the business analytics category. The award noted Pentaho’s record 83 percent bookings growth in 2013 for big data and embedded analytics products, as well as the addition of Christopher Dziekan, previously head of analytics product strategy at IBM, as Pentaho’s new Chief Product Officer.

 

Each year the SD Times 100 recognizes companies, non-commercial organizations, open source projects and other initiatives for their innovation and leadership. Judged by the editors of SD Times, the SD Times 100 recognizes the top innovators and leaders in multiple software development industry areas. Pentaho was selected as a top 10 leader for Big Data, alongside Apache Hadoop, Splunk, Cloudera, DataStax, Hortonworks and MongoDB!

 

Pentaho customer Bywaters was shortlisted for the ComputerWeekly European User Awards for Enterprise Software in the category of Best Technology Innovation! Bywaters is a waste management and recycling company based in the UK. They aim to make it easy and affordable for customers to improve their environmental performance and meet regulatory compliance through a system they created that embeds Pentaho, called BRAD (Bywaters Reporting and Analytics Dashboards). You can read more about their use case on Pentaho.com or in a feature article on ComputerWeekly.

Pentaho/CTools pro-bono customer Leukaemia & Lymphoma Research was awarded GOLD for ‘Best Use of Performance Reporting and Data Visualization’ by the Institute of Fundraising Insight Awards. The CTools team and Dan Keeley (@Codek1) worked with the UK Beating Blood Cancer group, which is dedicated to improving the lives of patients with all types of blood cancer, including leukaemia, lymphoma and myeloma. They created (now) award-winning dashboards to track the charity events they organize; check out the sample version here.

If you love awards as much as we do, then you must check out the Pentaho Excellence Awards. This is our first annual awards program to recognize and honor our customers, partners and users who have deployed Pentaho in interesting and innovative ways. Nominations are open until July 11th.

Rebecca Shomair
Director of Communications
Pentaho

 


Dinosaurs Have Had Their Day

June 16, 2014

dinosaur

Once upon a time, (not so) long ago in 2004, two young technologies were born from the same open source origins: Hadoop and Pentaho. Both evolved quickly from the market’s demand for better, larger-scale analytics that could be adopted faster to benefit more players.

Most who adopt Hadoop want to be disruptive leaders in their market without breaking the bank. Earlier this month at Hadoop Summit 2014, I talked to many people who told me, “I’d like to get off of <insert old proprietary software here> for my new big data applications and that’s why we’re looking at Pentaho.” It’s simple – no company is going to adopt Hadoop and then turn around and pay the likes of Informatica, Oracle or SAS outrageous amounts for data engineering or analytics.

Big data is the asteroid that has hit the tech market and changed its landscape forever, giving life to new business models and architectures based on open source technologies. First the ancient dinosaurs ignored open source, then they fought it and now they are trying to embrace it. But the mighty force of evolution had other plans. Dinosaurs are giving way to a more nimble generation that doesn’t depend on a mammoth diet of maintenance revenue, exorbitant license fees and long-term deals just to survive.

In this new world companies must continually evolve to survive, and dinosaurs have had their day. It’s incredibly rewarding to be part of a new analytics ecosystem that thrives on open standards, high performance and better value for customers. So many positive evolutionary changes have taken place in the last ten years, I can’t wait to see what the next ten will bring.

Richard Daley
Founder and Chief Strategy Officer
Pentaho

Image: #147732373 / gettyimages.com

