Marist College Positively Impacts Higher Education with Pentaho

September 24, 2014

Marist College is a medium-sized school with a Big Data promise: to help all students graduate. The liberal arts college in Poughkeepsie, New York, is taking a data-driven approach to education by using big data and predictive analytics to tackle the degree completion crisis that’s trending in higher education across America. As of January 2014, the percentage of full-time students at four-year institutions who complete a bachelor’s degree in four years is only 37.9%, and the completion rate after six years is only 58.3%. By deploying Pentaho Business Analytics, Marist College has developed a completely open-source “academic early alert” system that identifies, at the start of a course, students who are at risk of not completing it, and then deploys interventions to help those students succeed.

Based on a predictive model, the goal of the Open Academic Analytics Initiative (OAAI) was to create an open model for academic success that all institutions can use, customize and build upon through open-source community collaboration.

Marist hopes this initiative will help advance our general understanding of technology-mediated intervention strategies. Under the leadership of Bill Thirsk, Vice President of IT/CIO at Marist College, the program is already proving beneficial: Marist College’s OAAI can predict within two weeks, with 75-79% accuracy, how well a student is going to perform in a course!

Learn more about how Pentaho’s solution at Marist College is positively impacting the world of higher education.


The Pentaho Test Drive

September 2, 2014


Before investing in a new car, you probably first narrow down the features you want, such as fast, easy to use and cost-effective. Then, once you have a strategy, you head to the dealer to test drive a few of the targeted options before making a big investment in a car that you know will improve your commute.

With the rapidly changing data landscape, there are newer, faster and easier-to-use options now on the market to access, analyze and make predictions from your data. As with flashy car advertisements, it’s hard to separate hype from reality and know which tools will give your organization the performance and flexibility needed to achieve a return on your investment.

Similar to the process you would take to buy a new car, at PentahoWorld, we are giving you a unique opportunity to test drive three different innovative use cases that we see making a huge impact on our customers. Test drive options include: Pentaho Data Integration (PDI) with Hadoop, PDI with MongoDB, and PDI with Weka (Predictive) – see below for descriptions.

At the PentahoWorld Hands-On Product Training, we hand over the keys. Or, in reality, we provide the experts, computers and lunch. This really is a great opportunity to take your knowledge to the next level with in-depth instruction on these innovative technologies. I recommend signing up for the test drive sessions ASAP as space is limited, and the early bird pricing ends September 5th.

John Durkin
Senior Training Manager
Pentaho

Test Drive Pentaho with Hadoop

This Test Drive covers the steps to use Hadoop in the process of optimizing your data warehouse. You’ll gain hands-on experience with creating folders in HDFS, loading data, working with PDI transformations, configuring Pentaho Map Reduce and reviewing your results in HDFS. You’ll orchestrate it all by using a Hadoop job in PDI.

Test Drive Pentaho with MongoDB

This Test Drive outlines a real world scenario based on creating a 360° view of your customers. You’ll gain hands-on experience with connecting MongoDB to PDI, loading data into a MongoDB document, creating arrays, creating an aggregate pipeline query, and visualizing the results. You’ll orchestrate all of these tasks using a MongoDB job in PDI.

Test Drive Pentaho with Weka

This Test Drive introduces basic data mining concepts and terminology, along with the parts of the Pentaho suite that facilitate the development and application of predictive modelling. If you are new to data mining or you are considering a predictive solution for your business challenge, this is the test drive for you. You’ll gain hands-on experience with the Pentaho tools using a real-world direct-marketing use case. In particular, you will be introduced to some common types of data mining models and guided through the process of creating, evaluating, exporting and deploying a predictive model.

Sign up for the Pentaho test drive today!

Photo credit: Slate.com


Good news, your data scientist just got a personal assistant

June 3, 2014

If you are a data scientist, or have one in house, you’re in for good news.

Today at Hadoop Summit in San Jose, Pentaho unveiled a toolkit built specifically for data scientists to simplify the messy, time-consuming data preparation, cleansing and orchestration of analytic data sets. Don’t just take it from us…

The Ventana Research Big Data Analytics Benchmark Research estimates the top two time-consuming big data tasks are solving data quality and consistency issues (46%) and preparing data for integration (52%). That’s a whopping amount of time just spent getting data prepped and cleansed, not to mention the time spent in post processing results.  Imagine if time spent preparing, managing and orchestrating these processes could be handed off to a personal assistant leaving more time to focus on analyzing and applying advanced and predictive algorithms to data (i.e. doing what a data scientist is paid to do).

Enter the Pentaho Data Science Pack, the personal assistant to the data scientist. Built to help operationalize advanced analytic models as part of a big data flow, the Data Science Pack leverages familiar tools like R, the most-used tool for data scientists, and Weka, a widely used open source collection of machine learning algorithms. No new tools to learn. In the words of our own customer Ken Krooner, President at ESRG: “There was a gap in the market until now and people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics.”

Pentaho is at the forefront of solving big data integration challenges, and we know advanced and predictive analytics are core ingredients for success. Find out how close at hand your data science personal assistant is and take a closer look at the Data Science Pack.

Chuck Yarbrough
Director, Big Data Product Marketing


Weka goes BIG

December 4, 2013

The beakers are bubbling more violently than usual at Pentaho Labs, and this time predictive analytics is the focus. The scientists, clad in lab coats, pocket protectors and taped glasses, have turned their attention to the Weka machine learning software.

Weka, a collection of machine learning algorithms for predictive analytics and data mining, has a number of useful applications. Examples include scoring credit risk, predicting downtime of machines and analyzing sentiment in social feeds. The technology can be used to facilitate automatic knowledge discovery by uncovering hidden patterns in complex datasets, or to develop accurate predictive models for forecasting.

Organizations have been building predictive models to aid decision making for a number of years, but the recent explosion in the volume of data being recorded (aka “Big Data”) provides unique challenges for data mining practitioners. Weka is efficient and fast when running against datasets that fit in main memory, but larger datasets often require sampling before processing. Sampling can be an effective mechanism when samples are representative of the underlying problem, but in some cases the loss of information can negatively impact predictive performance.

To combat information loss, and scale Weka’s wide selection of predictive algorithms to large data sets, the folks at Pentaho Labs developed a framework to run Weka in Hadoop. Now the sort of tasks commonly performed during the development of a predictive solution – such as model construction, tuning, evaluation and scoring – can be carried out on large datasets without resorting to down-sampling the data. Hadoop was targeted as the initial distributed platform for the system, but the Weka framework contains generic map-reduce building blocks that can be used to develop similar functionality in other distributed environments.
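To make the map-reduce idea concrete, here is a minimal sketch in plain Python. It is not code from the Weka framework, and every name in it (`map_chunk`, `combine`, `finalize`, `distributed_mean_variance`) is hypothetical. It only illustrates the general pattern: each partition of the data is summarized independently, the small summaries are merged, and the final statistics match what a single pass over the full dataset would give, so no down-sampling is needed.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Partial summary of one data partition: (count, sum, sum of squares).
Summary = Tuple[int, float, float]


def map_chunk(values: Iterable[float]) -> Summary:
    """Map step: summarize a single partition without seeing the others."""
    n, s, ss = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    return n, s, ss


def combine(a: Summary, b: Summary) -> Summary:
    """Reduce step: merge two partial summaries by adding their fields."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def finalize(summary: Summary) -> Tuple[float, float]:
    """Turn the merged summary into (mean, population variance)."""
    n, s, ss = summary
    mean = s / n
    return mean, ss / n - mean * mean


def split(data: List[float], size: int) -> List[List[float]]:
    """Stand-in for a distributed file system splitting data into blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_mean_variance(data: List[float], chunk_size: int) -> Tuple[float, float]:
    """Run the map and reduce steps over all partitions (assumes non-empty data)."""
    partials = map(map_chunk, split(data, chunk_size))
    return finalize(reduce(combine, partials))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(distributed_mean_variance(data, chunk_size=2))
```

Because `combine` simply adds the fields of two summaries, the partitions can be processed on different machines and merged in any order. Simple statistics like these decompose cleanly; many learning algorithms need more elaborate aggregation schemes than this sketch shows.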

If you’re a predictive solution developer or a data scientist, the new Weka framework is a much faster path to solution development and deployment.  Just think of the questions you can ask at scale!

To learn more technical details about the Weka Hadoop framework, I suggest reading the blog post Weka and Hadoop Part 1 by Mark Hall, Weka core developer at Pentaho.

Also, check out Pentaho Labs to learn more about Predictive Analytics from Pentaho, and to see some of the other cool things the team has brewing.

Chuck Yarbrough
Technical Solutions Marketing

