Weka goes BIG

December 4, 2013

The beakers are bubbling more violently than usual at Pentaho Labs, and this time predictive analytics is the focus. The scientists, clad in lab coats, pocket protectors and taped glasses, have turned their attention to the Weka machine learning software.

Weka, a collection of machine learning algorithms for predictive analytics and data mining, has a number of useful applications. Examples include scoring credit risk, predicting machine downtime and analyzing sentiment in social feeds. The technology can be used to facilitate automatic knowledge discovery by uncovering hidden patterns in complex datasets, or to develop accurate predictive models for forecasting.

Organizations have been building predictive models to aid decision making for a number of years, but the recent explosion in the volume of data being recorded (aka “Big Data”) provides unique challenges for data mining practitioners. Weka is efficient and fast when running against datasets that fit in main memory, but larger datasets often require sampling before processing. Sampling can be an effective mechanism when samples are representative of the underlying problem, but in some cases the loss of information can negatively impact predictive performance.
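To make the sampling step concrete, here is a minimal Python sketch of reservoir sampling, one common way to draw a uniform random sample from a dataset too large to hold in memory. This is an illustration only, not how Weka itself samples; the function name and parameters are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random in a single pass.

    Hypothetical helper for illustration; not part of Weka.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Draw 10 rows from a stream of one million without storing the stream.
    print(reservoir_sample(range(1_000_000), 10))
```

Even a uniform sample like this keeps only a fraction of the rows, which is where the loss of information mentioned above comes from.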

To combat information loss, and to scale Weka’s wide selection of predictive algorithms to large datasets, the folks at Pentaho Labs developed a framework to run Weka in Hadoop. Now the sort of tasks commonly performed during the development of a predictive solution – such as model construction, tuning, evaluation and scoring – can be carried out on large datasets without resorting to down-sampling the data. Hadoop was targeted as the initial distributed platform for the system, but the Weka framework contains generic map-reduce building blocks that can be used to develop similar functionality in other distributed environments.
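For readers new to the map-reduce pattern, the idea can be sketched in a few lines of Python. Each "map" task summarizes one partition of the data, and a "reduce" step merges those partial summaries into a result for the whole dataset, so no single machine needs to see every row. This is a generic illustration under simple assumptions, not the Weka framework's actual API; all names below are hypothetical.

```python
from functools import reduce
from typing import Iterable, Sequence, Tuple

# Partial summary of one partition: (count, sum, sum of squares).
Stats = Tuple[int, float, float]


def map_partition(values: Iterable[float]) -> Stats:
    """Map step: summarize one partition independently of the others."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    return n, total, total_sq


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: merge two partial summaries by adding their fields."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def summarize(partitions: Sequence[Iterable[float]]) -> Tuple[float, float]:
    """Return (mean, population variance) across all partitions."""
    n, total, total_sq = reduce(
        combine, map(map_partition, partitions), (0, 0.0, 0.0)
    )
    if n == 0:
        raise ValueError("no data")
    mean = total / n
    return mean, total_sq / n - mean * mean


if __name__ == "__main__":
    # Three partitions standing in for blocks of a distributed file.
    print(summarize([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

Because the merge is a plain addition, the partial summaries can be combined in any order, which is what lets the work be spread across many machines. The sum-of-squares formula is kept for brevity; it can lose precision on very large or widely ranging values.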

If you’re a predictive solution developer or a data scientist, the new Weka framework is a much faster path to solution development and deployment.  Just think of the questions you can ask at scale!

To learn more about the technical details of the Weka Hadoop framework, I suggest reading the blog post Weka and Hadoop Part 1 by Mark Hall, Weka core developer at Pentaho.

Also, check out Pentaho Labs to learn more about Predictive Analytics from Pentaho, and to see some of the other cool things the team has brewing.

Chuck Yarbrough
Technical Solutions Marketing


“There is nothing more constant than change”—Heraclitus 535BC

June 26, 2013


Change and more change. It’s been incredible watching the evolution of and innovation in the big data market. A few years ago we were helping customers understand Hadoop and the value it could bring in analyzing large volumes of unstructured data. Flash forward to today, as we attend our third Hadoop Summit in San Jose, and we see the advances customers have made in adopting these technologies in their production big data environments.

It’s the value of a continuum of innovation. As the market matures we are only limited by what we don’t leave ourselves open to. Think for a minute about the next “big data,” because there will be one. We can’t anticipate what it will look like, where it will come from or how much of it will be of value, in the same way we couldn’t predict the advent of Facebook or Twitter.

We do know that innovation is a constant. Today’s big data will be tomorrow’s “traditional” data.

Pentaho’s announcements today of an adaptive big data layer and Pentaho Labs anticipate just this type of change. We’ve simplified, for Pentaho and our customers, the ability to leverage current and new big data technologies like Hadoop, NoSQL and specialized big data stores.

In the spirit of innovation (which stems from our open source history) we’ve established Pentaho Labs – our place for free-thinking innovation that leads to new capabilities in our platform in areas like real-time and predictive analytics.

Being a leader at the forefront of a disruptive and ever-changing market means embracing change and innovation. That’s the future of analytics.

Donna Prlich
Senior Director, Product Marketing, Pentaho

