Blueprints for Big Data Success

March 21, 2014

Seems like these days, everyone falls into one of three categories. You are either:

  1. Executing a big data strategy
  2. Implementing a big data strategy
  3. Taking a wait and see attitude

If you’re in one of the first two categories, good for you! If you’re in the third, you might want to re-think that approach. Companies that get left behind will have a huge hill to climb just to catch up to the competition.

If one of your concerns about moving forward with big data has been a lack of solid guidance to help pave the path to success, then you are in luck! Pentaho has recently released four Big Data Blueprints to help guide you through the process of executing a strategy. These are four use cases that Pentaho has seen customers execute successfully, so this isn’t just marketing fluff: these are real architectures that support real big data business value. The four blueprints now available on Pentaho’s website include:

  • Optimize the data warehouse
  • Streamlined data refinery
  • Customer 360-degree view
  • Monetize my data

These blueprints will help you understand the basic architectures that will support your efforts and achieve your desired results. If you are like many companies just getting started with big data, they are great tools to guide you through the murky waters that lie ahead. Here is my quick guide to the four blueprints: where you may want to get started, and why.

The Big Data Blueprints

1.    Optimize the Data Warehouse
Data warehouse optimization (sometimes referred to as data warehouse offloading, or DWO) is a great starter use case for gaining experience and expertise with big data while reducing costs and improving the analytic opportunities for end users. The idea is to increase the amount of data being stored, not by shoving it all into the warehouse, but by adding Hadoop to house the additional data. Once Hadoop is in the mix, Pentaho makes it easy to move data into Hadoop from external sources, move data bi-directionally between the warehouse and Hadoop, and process data within Hadoop. Again, this is a great place to start. It’s not as transformative to your business as the other use cases can be, but it will build expertise and save you money.
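
Pentaho’s tooling handles this offload step visually, without hand coding, but for a sense of what it involves under the hood, here is a minimal sketch that lands a warehouse extract in HDFS so Hadoop can house the historical data. The cluster address, paths, and file names are hypothetical examples, not part of the blueprint itself.

```java
// Minimal sketch: land a nightly warehouse extract in HDFS so Hadoop can
// house the "cold" history. All paths and addresses below are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WarehouseOffload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address for the example cluster; adjust as needed.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        // A hypothetical extract pulled from the warehouse.
        Path localExtract = new Path("/tmp/orders_history_2009.csv");
        // Target directory in HDFS for offloaded history.
        Path hdfsTarget = new Path("/data/warehouse_offload/orders/2009/");

        fs.mkdirs(hdfsTarget);
        fs.copyFromLocalFile(localExtract, hdfsTarget);
        System.out.println("Offloaded " + localExtract + " to " + hdfsTarget);
    }
}
```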

2.    Streamlined Data Refinery
The idea behind the refinery is to stream transaction, customer, machine, and other data from their sources through a scalable big data processing hub. Hadoop is used to transform the data, store it, and run analytics, and the results can then be delivered to an analytic model for reporting and analysis. Working with several customers, we have seen this as a great next step after the DWO.
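
To make the refinery idea a bit more concrete, below is an illustrative sketch of one processing step written as a plain Hadoop MapReduce job: it parses raw comma-separated transaction lines and totals spend per customer. The field layout and input/output paths are assumptions for the example; in practice a step like this would typically be designed visually in Pentaho rather than hand coded.

```java
// Illustrative "refinery" step: parse raw transaction lines
// (customerId,amount,...) and total spend per customer.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TransactionRefinery {

    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 2) {
                return; // skip malformed lines instead of failing the job
            }
            try {
                String customerId = fields[0].trim();
                double amount = Double.parseDouble(fields[1].trim());
                context.write(new Text(customerId), new DoubleWritable(amount));
            } catch (NumberFormatException e) {
                // skip lines with unparseable amounts
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            context.write(key, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "transaction-refinery");
        job.setJarByClass(TransactionRefinery.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw transactions in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // refined output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```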

3.    Customer 360-Degree View
This blueprint is perhaps the most transformative of all the potential big data use cases. The idea here is to gain greater insight into what your customer is doing, seeing, feeling and purchasing, with the goal of serving and retaining that customer better and attracting more customers into your fold. This blueprint lays out the architecture needed to start understanding your customers better. It will require significant effort to access all the appropriate customer touch points, but the payoff can be huge. Don’t worry too much about getting the full 360-degree view at first; starting with even one small slice can drive huge gains in revenue and retention.

4.    Monetize My Data
What do you have locked up in your corporate servers, or in the machines you own? This blueprint can be as transformative as the Customer 360, in that it can create new revenue streams you may never have considered before. In some cases, it could create a whole new business opportunity. Whatever your strategy, take time to investigate where and how you can drive new business by leveraging your data.

Pentaho has defined and developed other blueprints as well, but these four typically make the most sense for organizations to leverage first. Feel free to reach out to us for more information about any of these blueprints, or to learn more about how Pentaho helps organizations be successful with big data.

Find out more about the big data blueprints at http://www.pentaho.com/big-data-blueprints.

Please let me know what you think @cyarbrough.

Thanks!
Chuck Yarbrough
Product Marketing, Big Data


Weka goes BIG

December 4, 2013

The beakers are bubbling more violently than usual at Pentaho Labs, and this time predictive analytics is the focus. The lab-coat, pocket-protector, and taped-glasses-clad scientists have turned their attention to the Weka machine learning software.

Weka, a collection of machine learning algorithms for predictive analytics and data mining, has a number of useful applications. Examples include scoring credit risk, predicting downtime of machines and analyzing sentiment in social feeds. The technology can be used to facilitate automatic knowledge discovery by uncovering hidden patterns in complex datasets, or to develop accurate predictive models for forecasting.
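
For a flavor of how Weka is used on a dataset that fits in memory, here is a minimal example that trains a J48 decision tree (say, a credit-risk classifier) and estimates its accuracy with 10-fold cross-validation. The dataset file name and the assumption that the class attribute is last are placeholders for the example, not anything specific to Pentaho.

```java
// Minimal Weka example: train a J48 decision tree and evaluate it with
// 10-fold cross-validation. "credit_history.arff" is a hypothetical dataset
// whose last attribute is the (nominal) class to predict.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CreditRiskModel {
    public static void main(String[] args) throws Exception {
        DataSource source = new DataSource("credit_history.arff");
        Instances data = source.getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute

        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree); // print the learned decision tree

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString("\n10-fold CV results\n", false));
    }
}
```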

Organizations have been building predictive models to aid decision making for a number of years, but the recent explosion in the volume of data being recorded (aka “Big Data”) provides unique challenges for data mining practitioners. Weka is efficient and fast when running against datasets that fit in main memory, but larger datasets often require sampling before processing. Sampling can be an effective mechanism when samples are representative of the underlying problem, but in some cases the loss of information can negatively impact predictive performance.

To combat information loss, and scale Weka’s wide selection of predictive algorithms to large data sets, the folks at Pentaho Labs developed a framework to run Weka in Hadoop. Now the sort of tasks commonly performed during the development of a predictive solution – such as model construction, tuning, evaluation and scoring – can be carried out on large datasets without resorting to down-sampling the data. Hadoop was targeted as the initial distributed platform for the system, but the Weka framework contains generic map-reduce building blocks that can be used to develop similar functionality in other distributed environments.
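
The sketch below is not the Pentaho framework’s actual API; it is just a plain-Java illustration of the general pattern described above: partial models are built on chunks of the data (the “map” side) and then aggregated (the “reduce” side), here by combining them into a simple voted ensemble using Weka’s Vote meta-classifier.

```java
// Illustrative sketch (not the Pentaho Weka-Hadoop API): build a classifier
// per data chunk ("map"), then aggregate the partial models into a voted
// ensemble ("reduce"). Assumes the Instances object already has its class
// attribute set.
import java.util.ArrayList;
import java.util.List;

import weka.classifiers.Classifier;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class ChunkedModelBuilder {

    // "Map" step: train a model on one chunk of the data.
    static Classifier buildPartialModel(Instances chunk) throws Exception {
        J48 tree = new J48();
        tree.buildClassifier(chunk);
        return tree;
    }

    // "Reduce" step: combine the partial models into one voted classifier.
    static Classifier aggregate(List<Classifier> partialModels) {
        Vote ensemble = new Vote();
        ensemble.setClassifiers(partialModels.toArray(new Classifier[0]));
        return ensemble;
    }

    // Driver: split the data into chunks, build a model per chunk, combine.
    public static Classifier buildChunkedModel(Instances data, int numChunks)
            throws Exception {
        List<Classifier> partials = new ArrayList<>();
        int chunkSize = data.numInstances() / numChunks;
        for (int i = 0; i < numChunks; i++) {
            Instances chunk = new Instances(data, i * chunkSize, chunkSize);
            partials.add(buildPartialModel(chunk));
        }
        return aggregate(partials);
    }
}
```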

If you’re a predictive solution developer or a data scientist, the new Weka framework is a much faster path to solution development and deployment.  Just think of the questions you can ask at scale!

To learn more technical details about the Weka Hadoop framework, I suggest reading the blog post Weka and Hadoop Part 1 by Mark Hall, Weka core developer at Pentaho.

Also, check out Pentaho Labs to learn more about Predictive Analytics from Pentaho, and to see some of the other cool things the team has brewing.

Chuck Yarbrough
Technical Solutions Marketing


Looking for the perfect match

February 28, 2013

I’m at the O’Reilly Strata Big Data Conference in Santa Clara, CA this week where there’s lots of buzz about the value and reality of big data. It’s a fun time to be part of a hot new market in technology. But, of course, a hot new market brings a new set of challenges.

After talking to several attendees, I would not be surprised if someone took out an advertisement in the San Francisco Guardian that reads:

SEEKING BDT (Big Data Talent)

“Middle-aged attractive company seeks hot-to-trot data geek for mutually enjoyable discrete relationship, mostly involving analytics. Must enjoy long discussions about wild statistical models, short walks to the break room and large quantities of caffeine.”

The feedback from the presentations and attendees at Strata mirrors the results of a Big Data survey that Pentaho released last week, which shows a lack of skills, among existing staff and more generally in the market, to address new big data technologies such as Hadoop. This is good news for folks looking for jobs in Big Data and a good indication for others who want to learn new skills.

The market has created the perfect storm: hot new technology mixed with a myriad of very complex systems and highly complicated statistical models and calculations. This storm is preventing the typical IT generalist or BI expert from applying. More experienced data scientists who can spin models on their head with a twist of a mouse are in high demand, and the need to garner value quickly from Big Data means there is little time to look for the “perfect match.”

It seems like new companies and technologies pop up almost every week, each with the promise of business benefits, but with the added cost of high complexity.  Shouldn’t things get easier with new technologies?

Pentaho’s Visual MapReduce is a prime example of things getting easier. Getting data out of Hadoop quickly can be a challenge, but with Visual MapReduce any IT professional can pull the right information from a Hadoop cluster, improve the performance of a MapReduce job, and make the results available in the optimal format for business users.

New technologies might need new talent, but in the case of Pentaho Visual MapReduce, new technologies might only need new tools to help address them.

Looks like Pentaho is the perfect match.

Chuck Yarbrough
Technical Solutions Marketing

