Analyze 10 years of Chicago Crime with Pentaho, Cloudera Search and Impala

December 23, 2013

Hadoop is a complex technology stack, and many people getting started with it spend an inordinate amount of time on operational aspects – getting the cluster up and running, obtaining foundational training, and ingesting data. Consequently, it can be difficult to get a good picture of the true value Hadoop provides: unlocking insight across the multiple data streams that add valuable context to the transactional history at the core of most enterprise data.

At Strata Hadoop World in October, Pentaho’s Lord of 1’s and 0’s (also known as our CTO), James Dixon, unveiled a powerful demonstration of the true value that Hadoop – combined with enabling technology from Pentaho and our partner Cloudera – can provide. He took a publicly available data set from the City of Chicago and built a demo around it that enables nontechnical end-users to understand how crime patterns have changed over time in Chicago, unlocking insight into the types of crimes committed in different areas of the city – not only historically but also broken down by time of day and day of week. As a result, citizens as well as law enforcement gain a much better sense of what to expect on the streets of Chicago.

In the demo, end-users start with a dashboard that provides a high-level understanding of the mix of crimes historically committed on the streets of Chicago over the last ten years. Watch the demo here:

This kind of top-to-bottom understanding of (in this case) crime patterns is uniquely enabled by what Pentaho delivers to the market: dashboarding, analytics and data integration combined in one easily embedded platform that blends multiple data sets.

The deep understanding that Pentaho’s solution delivers to end-users is enabled by two key technologies from Cloudera: Cloudera Search and Impala. The original data set provided by the City of Chicago was loaded into a Cloudera Hadoop cluster using Pentaho’s data integration tool, Pentaho Data Integration (“PDI”). End-user drilldown is powered by Cloudera Search, which executes a faceted search on behalf of Pentaho’s dashboard. Once an area of interest has been located, Cloudera’s Impala executes low-latency SQL queries on the raw data stored in the Hadoop cluster to bring up individual crime records.
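To make the two query paths concrete, here is a minimal Java sketch of what a dashboard backend along these lines might do: a faceted query against Cloudera Search (which exposes the standard Solr API, here via SolrJ) for the high-level breakdown, followed by an Impala drilldown over JDBC. The host names, the "crimes" collection and table, and the field names are hypothetical placeholders for illustration, not details taken from the actual demo.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CrimeDrilldown {
  public static void main(String[] args) throws Exception {
    // 1) High-level view: faceted search via Cloudera Search (Solr API).
    //    The "crimes" collection and "primary_type" field are assumed names.
    HttpSolrServer solr = new HttpSolrServer("http://search-node:8983/solr/crimes");
    SolrQuery facetQuery = new SolrQuery("*:*");
    facetQuery.setRows(0);               // only the facet counts are needed
    facetQuery.setFacet(true);
    facetQuery.addFacetField("primary_type");
    QueryResponse rsp = solr.query(facetQuery);
    for (FacetField.Count c : rsp.getFacetField("primary_type").getValues()) {
      System.out.println(c.getName() + ": " + c.getCount());
    }

    // 2) Drilldown: low-latency SQL on the raw records via Impala.
    //    Impala accepts connections from the Hive JDBC driver on port 21050.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://impala-node:21050/;auth=noSasl");
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(
        "SELECT case_number, crime_date, block, description "
      + "FROM crimes WHERE primary_type = 'THEFT' LIMIT 10");
    while (rs.next()) {
      System.out.println(rs.getString("case_number") + " | "
          + rs.getString("block") + " | " + rs.getString("description"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}
```

The point of the split is that the search index answers the aggregate "what kinds of crime, where?" questions interactively, while Impala is only asked for individual records once the user has narrowed the scope.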

Although Hadoop is often perceived as a geek’s playground, the power of Pentaho’s business-friendly interface is readily apparent when exploring this demo. Unlocking the power of Hadoop can be as simple as pairing Pentaho’s integrated approach to analytics with Cloudera’s foundational platform to deliver a solution whose value is apparent to nontechnical executives wondering whether Hadoop is the right choice for a key initiative.

Rob Rosen
Field Big Data Lead
Pentaho


Big Data, Big Revenue for Marketers

December 12, 2013

Why might Big Data mean millions for marketing?  Because it has the potential to create a more complete picture of the buyer, thereby empowering marketers to more effectively deliver the right message to the right individual at the right time – and ultimately increase sales.  In the following brief video from DMA 2013, Marketo VP/Co-founder Jon Miller and Pentaho CMO Rosanne Saccone provide a crash course on what Big Data means for marketers.  It covers:

  • The defining characteristics of Big Data – Velocity, Variety, & Volume
  • How marketers can leverage Big Data to blend operational information (CRM, ERP) and online data (web activity, social networking interactions) for new insights
  • Sample Big Data use cases that organizations are green-lighting today to optimize customer interactions and drive marketing’s contribution to revenue

Note that this is an excerpt from a larger presentation – for the full video please click here.

We’d also recommend this blog post by Jon Miller for more context on Big Data in marketing.

For additional compelling use cases that leverage Big Data for marketing and other functions, see here.

Ben Hopkins
Product Marketing
Pentaho


Big Data 2014: Powering Up the Curve

December 5, 2013

Last year, I predicted that 2013 would be the year big data analytics started to go into mainstream deployment, and the research we recently commissioned with Enterprise Management Consultants indicates that’s happened. What really surprised me, though, is the extent to which demand for data blending has powered up the curve, and I believe this trend will accelerate big data growth in 2014.

Prediction one: The big data ‘power curve’ in 2014 will be shaped by business users’ demand for data blending
Customers like Andrew Robbins of Paytronix and Andrea Dommers-Nilgen of TravelTainment, who recently spoke about their Pentaho projects at events in NY and London, both come from the business side and are achieving specific goals for their companies by blending big and relational data. Business users like these are inspired by the potential to tap into blended data for new insights from a 360-degree customer view, including the ability to analyze customer behavior patterns and predict the likelihood that customers will take advantage of targeted offers.

Prediction two: big data needs to play well with others!
Historically, big data projects have largely sat in IT departments because of the technical skills needed and the growing, bewildering array of technologies that can be combined to build reference architectures. Customers must choose from various commercial and open source technologies, including Hadoop distributions, NoSQL databases, high-speed databases, analytics platforms and many other tools and plug-ins. But they also need to consider how existing infrastructure, including relational data and data warehouses, will fit into the picture.

The plus side of all this choice and diversity is that after decades of tyranny and ‘lock-in’ imposed by enterprise software vendors, in 2014 even greater buying power will shift to customers. But there are also challenges. Managing the heterogeneous data environment involved in big data analytics can be cumbersome, so IT will be looking for big data tools that help deploy, manage and simplify these complex emerging reference architectures. It will be incumbent on big data technology vendors to play well with each other and work towards compatibility. After all, it’s the ability to access and manage information from multiple sources that will add value to big data analytics.

Prediction three: you will see even more rapid innovation from the big data open source community
New open source projects like Hadoop 2.0 and YARN, its next-generation resource manager, will make the Hadoop infrastructure more interactive. Projects like Storm, a distributed real-time stream processing system, will enable more real-time, on-demand blending of information in the big data ecosystem.

Since we announced the industry’s first native Hadoop connectors in 2010, we’ve been on a mission to make the transition to big data architectures easier and less risky in the context of this expanding ecosystem. In 2013 we made some massive breakthroughs towards this, starting with our most fundamental resource, the adaptive big data layer. This enables IT departments to feel smarter, safer and more confident about their reference architectures, and opens up big data solutions to people in the business, whether they be data scientists, data analysts, marketing operations analysts or line-of-business managers.

Prediction four: you can’t prepare for tomorrow with yesterday’s tools
We’re continuing to refine our platform to support the future of analytics. In 2014, we’ll release new functionality, upgrades and plug-ins to make it even easier and faster to move, blend and analyze relational and big data sources. We’re planning to improve the capabilities of the adaptive data layer and make it more secure and easy for customers to manage data flow. On the analytics side, we’re working to simplify data discovery on the fly for all business users and make it easier to find patterns and catch anomalies. In Pentaho Labs, we’ll continue to work with early adopters to cook up new technologies to bring things like predictive, machine data and real-time analytics into mainstream production.

As people in the business continue to see what’s possible with blended big data, I believe we’re going to witness some really exciting breakthroughs and results. I hope you’re as excited as I am about 2014!

Quentin Gallivan, CEO, Pentaho



9 Years Later….

October 8, 2013

Photo taken the day Pentaho was founded – October 8, 2004

On October 8, 2004, five guys got a crazy idea: create a commercial open source BI offering to provide customers of all sizes with a better and more affordable solution than the proprietary vendors offered. Nine years later – “BI” became “BA”, the core platform just underwent its biggest overhaul since its inception, our UI/UX is the best ever, the open source community is still key, big data has become our core growth strategy, predictive is awakening, and we went through the biggest financial crisis since the Great Depression while still achieving great Y/Y bookings growth. Last, but not least, we have been very fortunate to attract and retain a fantastic, talented, passionate team that has made us the leader in big data analytics. Big data is one of the biggest business impacts our industry has seen in decades, and we’re making it happen.

Congrats to the entire company for making this real. Happy Birthday Pentaho.

Richard

Richard Daley
Co-Founder and Chief Strategy Officer, Pentaho


Pentaho 5.0 blends right in!

September 12, 2013

Dear Pentaho friends,

Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago), we have been looking for ways to create more synergy across this complete software stack. That is why today I’m exceptionally happy to announce not just version 5.0 of Pentaho Data Integration, but a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 Commercial Edition.

Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools). Traditionally, data is delivered to these tools through a relational database. However, there are cases where that can be inconvenient – for example, when the volume of data is just too high or when you can’t wait until the database tables are updated. This leads to a new kind of big data architecture with many moving parts:

Evolving Big Data Architectures

From what we can see in use at major deployments with our customers, mixing big data, NoSQL and classical RDBMS technologies is more the rule than the exception.

So, how did we solve this puzzle?

The main problem we faced early on was that the default language used under the covers, in just about any user-facing business intelligence tool, is SQL. At first glance, the worlds of data integration and SQL seem incompatible. In DI we read from a multitude of data sources – databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more. However, SQL is itself a mini-ETL environment, since it selects, filters, counts and aggregates data. So we figured it might be easiest to translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration does what it does best, directed not by manually designed transformations but by SQL. This is at the heart of the Pentaho Data Blending solution.

The internals of Data Blending

In other words: we made it possible for you to create a virtual “database” with “tables” where the data actually comes from a transformation step.
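To give a feel for how a client application would consume such a virtual table, here is a minimal Java sketch using the thin JDBC driver exposed by the Data Integration server. The driver class name, URL format, credentials and the "customer_blend" data service name are illustrative assumptions – check your installation for the exact values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataBlendingClient {
  public static void main(String[] args) throws Exception {
    // Register the PDI thin JDBC driver (class name assumed; verify in your install).
    Class.forName("org.pentaho.di.core.jdbc.ThinDriver");

    // Connect to the Data Integration (or Carte) server instead of a database.
    Connection conn = DriverManager.getConnection(
        "jdbc:pdi://di-server:9080/kettle", "user", "password");
    Statement stmt = conn.createStatement();

    // "customer_blend" is a hypothetical virtual table whose rows are produced
    // by a transformation step; the SQL below is translated into a
    // transformation on the server rather than run against physical tables.
    ResultSet rs = stmt.executeQuery(
        "SELECT country, SUM(sales) AS total_sales "
      + "FROM customer_blend GROUP BY country");
    while (rs.next()) {
      System.out.println(rs.getString("country") + ": " + rs.getLong("total_sales"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}
```

From the BI tool’s point of view this looks exactly like any other JDBC data source, which is what lets reporting, analysis and dashboards sit directly on top of a transformation.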

To ensure that the “automatic” part of the data chain doesn’t become a “black box” that’s impossible to figure out, we once more made good use of existing PDI technologies. We log all executed queries on the Data Integration server (or Carte server) so you have a full view of all the work being done:

Data Blending Transparency

In addition to this, the statistics from the queries can be logged and viewed in the operations data mart, giving you insight into which data is queried and how often.

We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!

Enjoy!

Matt

If you want to learn more about the new features in this 5.0 release, Pentaho is hosting a webinar and demonstration on September 24th. Two options to register: EMEA & North America time zones.

Matt Casters
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)


Customers Speak out – Wisdom of the Crowds Business Intelligence Study, 2013

May 21, 2013

“Responsiveness”, “professionalism”, “knowledge” and “experience” – these are just a few of the words our customers used in giving Pentaho the honor of being recently named a top business intelligence technology vendor in the third annual independent Wisdom of Crowds® Business Intelligence Market Study conducted by Dresner Advisory Services, LLC. The report recognizes Pentaho as a “High Growth BI Software” company with a critical mass of customers growing well above the average.

We have made, and continue to make, significant investments in simplifying big data integration and analytics, delivering real value, and ensuring our customers’ satisfaction. Being named a ‘high growth vendor’ validates that we are experiencing high growth in concert with the big data market, but not at the expense of our customers.

Pentaho earned high marks from its customers on multiple metrics, standing out specifically in product, support, consulting and integrity. This independent research comes straight from the voice of our customers, which is the best possible acknowledgement that we are indeed delivering the future of analytics.

I encourage you to download the full Wisdom of Crowds report to learn how the top vendors stack up and what the top BI trends are.

Donna Prlich
Senior Director, Product Marketing


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data… but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and the important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series will bring a new, content-rich webinar to help organizations find the right track to understand, recognize the value of, and cost-effectively deploy big data analytics.

All webinars will be held at 8 am PT / 11 am ET / 16:00 GMT. To register, follow the links below; for more information, contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, the increasing costs of the Enterprise Data Warehouse (EDW) and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop’s economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can be a challenge. To help our customers, Dell has developed a Hadoop Reference Architecture, best-practice documentation and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It’s easy to put a working script together as part of an R&D project, but it’s not cost-effective to maintain it through an ever-growing stream of user change requests, system updates and product updates. Watch the third webinar in the series to learn how choosing the right technologies and tools can give you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive Analytics from Hadoop

While unlocking data trapped in large volumes of semi-structured data is the first step of a project, the next step is to begin to analyze it and proactively identify new opportunities that will grow your bottom line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.
