Big Data 2014: Powering Up the Curve

December 5, 2013

Last year, I predicted that 2013 would be the year big data analytics started to go into mainstream deployment, and the research we recently commissioned with Enterprise Management Consultants indicates that’s happened. What really surprised me, though, is the extent to which the demand for data blending has powered up the curve. I believe this trend will accelerate big data growth in 2014.

Prediction one: The big data ‘power curve’ in 2014 will be shaped by business users’ demand for data blending
Customers like Andrew Robbins of Paytronix and Andrea Dommers-Nilgen of TravelTainment, who recently spoke about their Pentaho projects at events in NY and London, both come from the business side and are achieving specific goals for their companies by blending big and relational data. Business users like these are getting inspired by the potential to tap into blended data to gain new insights from a 360 degree customer view, including the ability to analyze customer behavior patterns and predict the likelihood that customers will take advantage of targeted offers.

Prediction two: big data needs to play well with others!
Historically, big data projects have largely sat in the IT departments because of the technical skills needed and the growing and bewildering array of technologies that can be combined to build reference architectures. Customers must choose from the various commercial and open source technologies including Hadoop distributions, NoSQL databases, high-speed databases, analytics platforms and many other tools and plug-ins. But they also need to consider existing infrastructure including relational data and data warehouses and how they’ll fit into the picture.

The plus side of all this choice and diversity is that after decades of tyranny and ‘lock-in’ imposed by enterprise software vendors, in 2014, even greater buying power will shift to customers. But there are also challenges. Managing the heterogeneous data environment that big data analytics involves can be cumbersome. It also means that IT will be looking for Big Data tools to help deploy, manage and simplify these complex emerging reference architectures. It will be incumbent on the Big Data technology vendors to play well with each other and work towards compatibility. After all, it’s the ability to access and manage information from multiple sources that will add value to big data analytics.

Prediction three: you will see even more rapid innovation from the big data open source community
New open source projects like Hadoop 2.0 with YARN, the next-generation Hadoop resource manager, will make the Hadoop infrastructure more interactive. Projects like Storm, a distributed real-time stream processing system, will enable more real-time, on-demand blending of information in the big data ecosystem.

Since we announced the industry’s first native Hadoop connectors in 2010, we’ve been on a mission to make the transition to big data architectures easier and less risky in the context of this expanding ecosystem. In 2013 we made some massive breakthroughs towards this, starting with our most fundamental resource, the adaptive big data layer. This enables IT departments to feel smarter, safer and more confident about their reference architectures and open up big data solutions to people in the business, whether they be data scientists, data analysts, marketing operations analysts or line of business managers.

Prediction four: you can’t prepare for tomorrow with yesterday’s tools
We’re continuing to refine our platform to support the future of analytics. In 2014, we’ll release new functionality, upgrades and plug-ins to make it even easier and faster to move, blend and analyze relational and big data sources. We’re planning to improve the capabilities of the adaptive data layer and make it more secure and easy for customers to manage data flow. On the analytics side, we’re working to simplify data discovery on the fly for all business users and make it easier to find patterns and catch anomalies. In Pentaho Labs, we’ll continue to work with early adopters to cook up new technologies to bring things like predictive, machine data and real-time analytics into mainstream production.

As people in the business continue to see what’s possible with blended big data, I believe we’re going to witness some really exciting breakthroughs and results. I hope you’re as excited as I am about 2014!

Quentin Gallivan, CEO, Pentaho



Pentaho 5.0 blends right in!

September 12, 2013

Dear Pentaho friends,

Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago), we have been looking for ways to create more synergy across this complete software stack.  That is why today I’m exceptionally happy to announce not just version 5.0 of Pentaho Data Integration but also a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 Commercial Edition.

Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools).  Traditionally, data is delivered to these tools through a relational database. However, there are cases where that can be inconvenient, for example when the volume of data is just too high or when you can’t wait until the database tables are updated.  This leads, for example, to a new kind of big data architecture with many moving parts:

Evolving Big Data Architectures


From what we can see in use at major deployments with our customers, mixing Big Data, NoSQL and classical RDBMS technologies is more the rule than the exception.

So, how did we solve this puzzle?

The main problem we faced early on was that the default language used under the covers, in just about any user-facing business intelligence tool, is SQL.  At first glance it seems that the worlds of data integration and SQL are not compatible.  In DI we read from a multitude of data sources, such as databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more.  However, SQL is itself a mini-ETL environment, as it selects, filters, counts and aggregates data.  So we figured that the easiest approach would be to translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, directed not by manually designed transformations but by SQL.  This is at the heart of the Pentaho Data Blending solution.


The internals of Data Blending

In other words: we made it possible for you to create a virtual “database” with “tables” where the data actually comes from a transformation step.
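To make the idea concrete, here is a minimal Python sketch of a query being answered by a chain of transformation steps rather than by a database table. It is only an illustration, not Pentaho's implementation: the function names (`blended_orders`, `filter_step`, `sum_by_step`) and the sample data are hypothetical, and the SQL parsing itself is skipped. The query is shown only as a comment next to the steps it would map to.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def blended_orders() -> Iterator[Row]:
    """Hypothetical transformation step acting as a virtual 'table'.

    It blends made-up relational rows with a NoSQL-style lookup.
    """
    relational = [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 80.0},
        {"customer_id": 1, "amount": 40.0},
    ]
    segments = {1: "loyal", 2: "new"}  # stand-in for a NoSQL source
    for row in relational:
        yield {**row, "segment": segments[row["customer_id"]]}


def filter_step(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Plays the role of a WHERE clause."""
    return (r for r in rows if predicate(r))


def sum_by_step(rows: Iterable[Row], key: str, value: str) -> List[Row]:
    """Plays the role of GROUP BY <key> with SUM(<value>)."""
    totals: Dict[object, float] = defaultdict(float)
    for r in rows:
        totals[r[key]] += float(r[value])
    return [{key: k, f"sum_{value}": v} for k, v in sorted(totals.items())]


# Roughly equivalent to:
#   SELECT segment, SUM(amount) FROM blended_orders
#   WHERE amount > 50 GROUP BY segment
result = sum_by_step(
    filter_step(blended_orders(), lambda r: r["amount"] > 50),
    "segment",
    "amount",
)
print(result)
```

The point of the sketch is that the "table" never exists on disk: the rows are produced on demand by the blending step, and the query clauses become further steps applied to that stream.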

To ensure that the “automatic” part of the data chain doesn’t become an inscrutable “black box”, we once again made good use of existing PDI technologies.  We’re logging all executed queries on the Data Integration server (or Carte server) so you have a full view of all the work being done:

Data Blending Transparency


In addition to this, the statistics from the queries can be logged and viewed in the operations data mart, giving you insight into which data is queried and how often.

We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!

Enjoy!

Matt

–If you want to learn more about the new features in this 5.0 release, Pentaho is hosting a webinar and demonstration on September 24th – Two options to register:  EMEA & North America time zones.

Matt Casters
Chief Data Integration, Kettle founder, Author of Pentaho Kettle Solutions (Wiley)


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data... but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series will bring a new, content-rich webinar helping organizations find the right track to understand, recognize value and cost-effectively deploy big data analytics.

All webinars will be held 8 am PT / 11 am ET / 16:00 GMT. To register follow the links below and for more information contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, increasing costs of the Enterprise Data Warehouse (EDW) and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop’s economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can sometimes be a challenge. To help its customers, Dell has developed a Hadoop Reference Architecture, best-practice documentation and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It’s easy to put a working script together as part of an R&D project, but it’s not cost-effective to maintain it through an ever-growing stream of user change requests and system and product updates.  Watch the third webinar in the series to learn how choosing the right technologies and tools can give you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive from Hadoop

While unlocking information trapped in large and semi-structured data sets is the first step of a project, the next step is to begin to analyze and proactively identify new opportunities that will grow your bottom line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

 


Ensure Your Big Data Integration and Analytics Tools are Optimized for Hadoop

March 27, 2013

Existing data integration and business analytics tools are generally built for relational and structured file data sources, and aren’t architected to take advantage of Hadoop’s massively scalable, but high-latency, distributed data management architecture. Here’s a list of requirements for tools that are truly built for Hadoop.

A data integration and data management tool built for Hadoop must:

  1. Run In-Hadoop: fully leverage the power of Hadoop’s distributed data storage and processing. It should do this via native integration with the Hadoop Distributed Cache, to automate distribution across the cluster. Generating inefficient Pig scripts doesn’t count.
  2. Maximize resource usage on each Hadoop node: each node is a computer, with memory and multiple CPU cores. Tools must fully leverage the power of each node, through multi-threaded parallelized execution of data management tasks and high-performance in-memory caching of intermediate results, customized to the hardware characteristics of nodes.
  3. Leverage Hadoop ecosystem tools: tools must natively leverage the rapidly growing ecosystem of Hadoop add-on projects. For example, using Sqoop for bulk loading of huge datasets or Oozie for sophisticated coordination of Hadoop job workflows.

The widely distributed nature of Hadoop means accessing data can take minutes, or even hours. Data visualization and analytics tools built for Hadoop must mitigate this high data access latency:

  1. Provide end users direct access to data in Hadoop: after initial access, response times must be instant, at the speed of thought.  This must be done in a way that is simple and intuitive for end users, while providing IT with the controls they need to streamline and manage data access for end users.
  2. Create dynamic data marts: make it easy and quick to spin off Hadoop data into marts and warehouses for longer-lived high-performance analysis of data from Hadoop.
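
The data mart idea in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product does it: `slow_hadoop_scan` is a hypothetical stand-in for a high-latency Hadoop read, the rows are made up, and an in-memory SQLite database plays the role of the mart.

```python
import sqlite3
import time


def slow_hadoop_scan():
    """Hypothetical stand-in for a high-latency Hadoop read (made-up rows)."""
    time.sleep(0.05)  # simulated access latency
    return [
        ("2013-03-01", "web", 3),
        ("2013-03-01", "mobile", 5),
        ("2013-03-02", "web", 7),
    ]


def build_mart(conn: sqlite3.Connection) -> None:
    """Materialize the slow source into a local table, paying the latency once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events_mart (day TEXT, channel TEXT, hits INTEGER)"
    )
    conn.execute("DELETE FROM events_mart")
    conn.executemany("INSERT INTO events_mart VALUES (?, ?, ?)", slow_hadoop_scan())
    conn.commit()


conn = sqlite3.connect(":memory:")
build_mart(conn)

# Every later query runs against the local mart, not the slow source.
hits_by_channel = conn.execute(
    "SELECT channel, SUM(hits) FROM events_mart GROUP BY channel ORDER BY channel"
).fetchall()
print(hits_by_channel)
```

The design choice is simply to separate the one expensive extraction from the many cheap interactive queries that follow it.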

Learn how big data analytics provider Pentaho is optimized for Hadoop at www.pentahobigdata.com.

- Ian Fyfe, Pentaho

This blog originally appeared on GigaOM at http://gigaom.com/2012/12/11/ensure-your-big-data-integration-and-analytics-tools-are-optimized-for-hadoop/


Make Your Voice Heard! – 2013 Wisdom of Crowds Business Intelligence Market Study

March 12, 2013

Make your voice heard!

Participate in the 2013 Wisdom of Crowds ® Business Intelligence Market Study and get a complimentary copy of the study findings. 

Dresner Advisory Services is inviting all Business Intelligence (BI) users to participate in its annual examination of the state of the BI marketplace focusing on BI usage, deployment trends, and products.

The 2013 report will build on previous years’ research and will expand to include questions on the latest and emerging trends such as Collaborative BI, BI in the Cloud, and Embedded BI. It will also rank vendors and products, providing an important tool for organizations seeking to invest in BI solutions.

BI users in all roles and throughout all industries are invited to contribute their insight, which should take approximately 15 minutes.  The final report is scheduled to be out in late Spring, and qualified survey participants will receive a complimentary copy.

Click here to start the survey today!


Xyratex and Pentaho – Making Big Data, Fast Data.

February 11, 2013

Pentaho and Xyratex today announced our strategic partnership to deliver the world’s first integrated Big Data analytics and scalable storage solution.  We have been working on this joint initiative for some time with the ClusterStor team at Xyratex. ClusterStor is the world’s fastest and most performant storage sub-system.  This will be significantly enhanced by the addition of Hortonworks Hadoop and Pentaho Business Analytics.

Xyratex and Pentaho will make Big Data, Fast Data.  This solves a key pain point for Xyratex’s customers. With all of the compute, storage, database and analytics in one truly integrated platform, this appliance will eliminate the large data silos as well as put all of that Big Data into the hands of the business users.  And it will do that fast!  The ClusterStor, Hadoop and Pentaho Big Data Appliance will deliver business analytics on huge data sets at the lowest TCO and allow ClusterStor customers to realize rapid business value from their data with a very short time to value.

Xyratex has taken the complexity of deploying Hadoop away from the customer with this integrated appliance. Critically, ClusterStor also meets all the key criteria in the deployment of an enterprise-class Big Data solution: scalability, best-in-class performance, reliability and rapid time to value.


Looking to the Future of Business Analytics with Pentaho 4.8

November 12, 2012

Last week Pentaho announced Pentaho 4.8, another milestone in delivering the future of analytics. It has been an exciting ride. Feedback from our partners and customers has kept us ecstatic and ready to excel further into the future.

Pentaho 4.8 is a true testament to what the future of analytics needs. The future of analytics is driven by the data problems that businesses face every day, and it depends on information users and their expectations for solving those problems.

Let me give you a good example. I recently had the pleasure of meeting one of our customers, BeachMint. BeachMint is a fashion and style ecommerce company that uses celebrities and celebrity stylists to promote its retail business.

This rapidly growing online retailer needed to keep tabs on its large Twitter and Facebook communities to track customer sentiment and social influence. It then uses the social data to define customer cohorts and design marketing campaigns that best target each cohort.

For BeachMint, insight into data is extremely important. But on one hand, the volume and variety of data (in this case unstructured social data and click-through ad feeds) have increased its complexity. On the other hand, the speed at which data is created has accelerated rapidly. For example, in addition to analyzing the impact of customer sentiment on purchasing behavior, BeachMint also needed up-to-the-minute information on the activity of key promotional codes, so it could immediately identify those that leak out.

Pentaho understands these data challenges and user expectations. In this release Pentaho takes full advantage of its tightly coupled Data Integration and Business Analytics platform – to simplify data exploration, discovery and visualization for all users and all data types – and to deliver this information to users immediately – sometimes even at a micro-second level. In this release Pentaho delivers:

- Pentaho Mobile – the only Mobile BI application with the power to instantly create new analysis on the go.

- Pentaho Instaview – the industry’s first instant and interactive big data visualization application.

Want to find out more? Register for the Pentaho 4.8 webinar and see for yourself.

- Farnaz Erfan, Product Marketing, Pentaho

