The Road to Success with Big Data – A Closer Look at Expectations vs. the Reality

June 5, 2013

Stay on course
Big Data is complex. The technologies in Big Data are rapidly maturing, but are still in many ways in an adolescent phase. While Hadoop is dominating the charts for Big Data technologies, in recent years we have seen a variety of technologies born out of the early starters in this space, such as Google, Yahoo, Facebook and Cloudera. To name a few:

  • MapReduce: Programming model in Java for parallel processing of large data sets in Hadoop clusters
  • Pig: A high-level scripting language to create data flows from and to Hadoop
  • Hive: SQL-like access for data in Hadoop
  • Impala: SQL query engine that runs inside Hadoop for faster query response times

It’s clear that the spectrum of interaction and interfacing with Hadoop has matured beyond pure programming in Java into abstraction layers that look and feel like SQL. Much of this is due to the lack of resources and talent in big data, hence the mantra of “the more we make Big Data feel like structured data, the better adoption it will gain.”

But wait, not so fast: you can make Hadoop act like a SQL data store, but there are consequences, as Chris Deptula from OpenBI explains in his blog post, A Cautionary Tale for Becoming too Reliant on Hive. You forgo flexibility and speed if you choose Hive for a more complex query as opposed to pure programming or using a visual interface to MapReduce.
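To make the trade-off concrete, here is a minimal sketch of the MapReduce programming model. It is plain Python running in a single process, not Hadoop's Java API, and the sample data and function names are hypothetical. It computes what a SQL-like layer would express in one line, and shows the map, shuffle and reduce steps a programmer controls directly when writing the job by hand.

```python
from collections import defaultdict

# Hypothetical sample records: (region, sale_amount).
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]

# In a SQL-like layer the same question is a single statement:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def total_sales_by_region(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(total_sales_by_region(SALES))
```

The hand-written version is longer, but each phase is an explicit place to add custom logic, which is the flexibility that is given up when a query layer generates the job for you.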

This goes to show that there are numerous areas of advancements in Hadoop that have yet to be achieved – in this case better performance optimization in Hive. I come from a relational world – namely DB2 – where we spent a tremendous amount of time making this high-performance transactional database – that was developed in the 70’s – even more powerful in the 2000s, and that journey continues today.

Granted, the rate of innovation is much faster today than it was 10, 20, 30 years ago, but we are not yet at the finish line with Hadoop. We need to understand the realities of what Hadoop can and cannot do today, while we forge ahead with big data innovation.

Here are a few areas of opportunity for innovation in Hadoop and strategies to fill the gap:

  • High-Performance Analytics: Hadoop was never built to be a high-performance data interaction platform. Although there are newer technologies that are cracking the nut on real-time access and interactivity with Hadoop, fast analytics still need multi-dimensional cubes, in-memory and caching technology, analytic databases or a combination of them.
  • Security: There are security risks within Hadoop. It would not be in your best interest to open the gates for all users to access information within Hadoop. Until this gap is closed further, a data access layer can help you extract just the right data out of Hadoop for interaction.
  • APIs: Business applications have lived a long time on relational data sources. However with web, mobile and social applications, there is a need to read, write and update data in NoSQL data stores such as Hadoop. Instead of direct programming, APIs can simplify this effort for millions of developers who are building the next generation of applications.
  • Data Integration, Enrichment, Quality Control and Movement: While Hadoop stands strong in storing massive amounts of unstructured / semi-structured data, it is not the only infrastructure in place in today’s data management environments. Therefore, easy integration with other data sources is critical for long-term success.

The road to success with Hadoop is full of opportunities and obstacles, and it is important to understand what is possible today and what to expect next. With all the hype around big data, it is easy to expect Hadoop to do anything and everything. However, successful companies are those that choose the combination of technologies that works best for them.

What are your Hadoop expectations?

- Farnaz Erfan, Product Marketing, Pentaho


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data... but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series will bring a new, content-rich webinar helping organizations find the right track to understand, recognize value and cost-effectively deploy big data analytics.

All webinars will be held 8 am PT / 11 am ET / 16:00 GMT. To register follow the links below and for more information contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, increasing costs of the Enterprise Data Warehouse (EDW) and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop’s economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can sometimes be a challenge. To help its customers, Dell has developed a Hadoop Reference Architecture, best-practice documentation and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It’s easy to put a working script together as part of an R&D project, but it’s not cost-effective to maintain it through an ever-growing stream of user change requests and system and product updates. Watch the third webinar in the series to learn how choosing the right technologies and tools can give you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive from Hadoop

While unlocking data trapped in large, semi-structured data sets is the first step of a project, the next step is to analyze it and proactively identify new opportunities that will grow your bottom line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

 


Pentaho and Cloudera Impala in 5 words

April 29, 2013

Today our big data partner, Cloudera, joined us in continuing to deliver innovative, open technologies that bring real business value to customers. Pentaho and Cloudera share a common history and approach to simplifying complex but powerful technologies to integrate and analyze big data. Our common open source heritage means that we can innovate at the speed of our customers’ businesses.

What is Cloudera’s latest innovation? Cloudera Impala powers Cloudera Enterprise RTQ (Real-time Query), the first data management solution that takes Hadoop beyond batch to enable real-time data processing and analysis on any type of data (unstructured and structured) within a centralized, massively scalable system. Impala dramatically improves the economics and performance of large-scale enterprise data management.

Pentaho and Cloudera Impala in 5 words = Affordable scalability meets fast analytics. Cloudera Impala enables any product that is JDBC-enabled to get fast results from Hadoop, making Hadoop an ideal component for a data warehouse strategy. Customers no longer have to pay for expensive proprietary DBMS or analytical DBs to house their entire data warehouse.

Cloudera’s innovation makes it even easier for customers to use common analytic tools that can access and analyze data in all of these formats. What does this really mean? It means you don’t have to buy expensive, proprietary products that can’t work across all of your data platforms.

With Pentaho and Cloudera you can analyze large volumes of disparate data significantly faster with Impala than with Hive. Take a look at how Cloudera Impala is driving a major evolutionary step in the growth of the company’s Platform for Big Data, Cloudera Enterprise, and the Apache Hadoop ecosystem as a whole.

Richard Daley


How to Get to Big Data Value Faster

March 18, 2013

Summary: Everyone talks about how big data is the key to business success, but the process of getting value from big data is time intensive and complex.  Examining the big data analytics workflow provides clues to getting to big data results faster.


Most organizations recognize that big data analytics is key to their future business success, but efforts to implement are often slowed due to operational procedures and workflow issues.

At the heart of the issue is the big data analytics workflow including loading, ingesting, manipulating, transforming, accessing, modeling and, finally, visualizing and analyzing data. Each step requires manual intervention by IT with a great amount of hand coding and tools that invite mistakes and delays. New technologies such as Hadoop and NoSQL databases also require specialized skills. Once the data is prepared, business users often have new requests to IT for additional data sources and the linear process begins again.

Given the potential problems that can crop up in managing and incorporating big data into decision-making processes, organizations need easy-to-use solutions that can address today’s challenges, with the flexibility to adapt to meet future challenges. These solutions require data integration with support for structured and unstructured data and tools for visualization and data exploration that support existing and new big data sources.

A single, unified business analytics platform with tightly coupled data integration and business analytics, such as Pentaho Business Analytics, is ideal. Pentaho supports the entire big data analytics flow, with visual tools that simplify development and remove complexity for developers, and powerful analytics that allow a broad set of users to easily access, visualize and explore big data. By dramatically improving developer productivity and offering significant performance advantages, Pentaho significantly reduces time to big data value.

- Donna Prlich
Senior Director, Product and Solution Marketing, Pentaho

This blog post originally appeared on GigaOM at http://gigaom.com/2012/12/06/how-to-reduce-complexity-and-get-to-big-data-value-faster/


Looking for the perfect match

February 28, 2013


I’m at the O’Reilly Strata Big Data Conference in Santa Clara, CA this week where there’s lots of buzz about the value and reality of big data. It’s a fun time to be part of a hot new market in technology. But, of course, a hot new market brings a new set of challenges.

After talking to several attendees, I would not be surprised if someone took out an advertisement in the San Francisco Guardian that reads:

SEEKING BDT (Big Data Talent)

“Middle-aged attractive company seeks hot-to-trot data geek for mutually enjoyable discreet relationship, mostly involving analytics. Must enjoy long discussions about wild statistical models, short walks to the break room and large quantities of caffeine.”

The feedback from the presentations and attendees at Strata mimics the results from a Big Data survey that Pentaho released last week showing there is a lack of current skills to address new big data technologies such as Hadoop among existing staff and more generally on the market. This is good news for folks looking for jobs in Big Data and a good indication for others who want to learn new skills.

The market has created the perfect storm – the combination of hot new technology mixed with a myriad of very complex systems plus highly complicated statistical models and calculations. This storm is preventing the typical IT generalist or BI expert from applying.  More experienced data scientists who can spin models on their head with a twist of a mouse are in high demand. The need to garner value quickly from Big Data means there is little time to look for the “perfect match.”

It seems like new companies and technologies pop up almost every week, each with the promise of business benefits, but with the added cost of high complexity.  Shouldn’t things get easier with new technologies?

Pentaho’s Visual MapReduce is a prime example of things getting easier.  Getting data out of Hadoop quickly can be a challenge.  However, with Visual MapReduce any IT professional could pull the right information from a Hadoop cluster, improve the performance of a MapReduce job and make results available in the optimal format for business users.

New technologies might need new talent, but in the case of Pentaho Visual MapReduce, new technologies might only need new tools to help address them.

Looks like Pentaho is the perfect match.

Chuck Yarbrough
Technical Solutions Marketing


Going mobile this year? What’s your biggest big data challenge?

November 16, 2012

We received insightful responses to the polls from our “Mobile and Big Data go Instant and Interactive” webinars about the challenges users of all types face with business analytics. The complexity of data integration, lack of skills and resources, and the need to analyze unstructured data are the most significant big data challenges, identified by over 80% of attendees. 50% of our attendees either have a current mobile BI solution in place or plan to in the future.

What does this mean for the future of analytics? Whether mobilizing your sales force or empowering data analysts to discover meaning from data in Hadoop, a complete business analytics solution must address the business pressures of a continual inundation of data and the need to access and interact with that data instantly in simple, familiar ways.

It is not surprising that the response to Pentaho’s Business Analytics 4.8 has been overwhelmingly positive: it offers the best of analytics in a mobile-optimized experience for business users, and Instaview broadens big data access for data analysts doing data discovery.

If you missed out on our webinar, access the on demand recording at:

Watch the Pentaho 4.8 On-Demand Webinar

Data Integration and business analytics in a single, unified, modern platform — Pentaho is the future of analytics

Let me know what you think about Pentaho 4.8.

Donna Prlich

Director, Product Marketing

Pentaho


A Day of Choices that Impact the Future

November 6, 2012

The timing is auspicious for the launch of Pentaho’s latest business analytics platform release, which coincides with Election Day in the U.S.!  Both events offer the freedom to choose a platform that is right for you today and into the future. The election platforms offer social, economic and political philosophies to help you meet your personal values and goals. Your choice of business analytics platform should improve your organization’s performance by liberating and integrating all your data and serving it up to your corporate citizens to analyze.

We trust that you’ve made the choice of political candidates and cast your vote today. In case you haven’t chosen your business analytics platform yet, we hope you’ll allow us a little more campaigning! Our business analytics ‘candidate,’ which tightly couples data integration with advanced analytics, has proven its value across private, public and nonprofit sector organizations around the globe. Today, with the launch of our latest version Pentaho Business Analytics 4.8, we have made great strides in democratizing big data and business analytics by adding some exciting new capabilities:

  • Pentaho’s Instaview, the industry’s first instant and interactive big data analytics application, dramatically reduces the time and complexity required for data analysts to discover, visualize and explore big and diverse data
  • Pentaho Mobile BI brings the full power of the Pentaho Business Analytics Platform to the iPad, including instant and interactive visualization and the power to create new analysis on the go

With Pentaho 4.8, we bring real freedom to deliver power to all business users and a clear choice for a better future in the world of business analytics. To learn more about the future of business analytics, check out Pentaho.com/48.

Rosanne Saccone
Chief Marketing Officer
Pentaho


Because You Don’t Have Time to F* Around.

November 5, 2012

At Pentaho we are confident that we are providing the most complete solution for big data analytics. But that doesn’t mean that there isn’t always room for improvement — that is where you come in. The big data market is rapidly growing and evolving and we want to ensure we are at the forefront.

Pentaho invites you to participate in our first Big Data Product Strategy Survey. The survey only takes 3 – 5 minutes, can be taken anonymously and you will automatically be entered to win a $100 American Express gift card!*

Click here to take the survey now and help Pentaho provide the big data product that meets your needs – because you are busy and don’t have time to f* around with your big data!

*You must enter your email address at the end of the survey to be contacted to receive your gift card or copy of the final report.


Pentaho’s November Euro-Trip!

November 2, 2012

With October coming to an end, Pentaho is getting ready for a Euro-trip!  Our October events concluded in New York City this past week at O’Reilly’s Strata Conference.

November’s forecast is Big Data and long flights.  First stop is “Big Data Days 2012” in Munich, Germany on November 6-7.  This is a great conference for networking and is focused on linking together business and IT concepts.  Do we have a booth?

We have one stop on our tour in the U.S. of A. – in the hometown of Pentaho’s headquarters, Orlando, FL, for TDWI. TDWI’s World Conference Series will be held at The Renaissance Orlando Hotel at SeaWorld November 11-16. TDWI promises an in-depth look at the emerging trends for the upcoming year in Big Data. If you are at TDWI, don’t forget to stop by our booth (rumor has it there are some t-shirts left over from Strata).

From there we take a short trip to Zurich for DW2012 on November 12-13. In its twelfth year, DW2012 is a great place to meet leaders in Big Data and BI – but be warned, the conference is hosted in German, so if you don’t “sprechen Sie Deutsch” you might want to hire a translator.  Do we have a booth?

Next stop – Milan, Italy. The Big Data Congress on November 22nd is a free conference built around “themed conversations” between big data solutions suppliers and users. Not only is it going to be greatly informative, it is free, and in Milan – how many more reasons do you need to attend?

Finally, we wrap up the month in London for Enterprise Business Intelligence (EBI) 2012 on November 28th at the Russell Hotel. Pentaho is proud to be a silver sponsor of the event, put together by Whitehall Media and featuring leaders from across the BI world.

To register for any of the events, or for more information visit Pentaho’s event page.

If you are attending any of these events and would like to set up an in-person meeting with Pentaho, contact us!


Leading venture funds back Pentaho’s big data strategy

October 30, 2012

It’s an immensely proud day for Pentaho. Today we announced that a syndicate of venture capital firms led by NEA that includes Benchmark Capital, Index Ventures and DAG Ventures has agreed to invest $23 million in the company. You can read the press release here.

These firms share a common belief that the big data revolution offers significant upside and is part of a larger transformation taking place in the enterprise software industry that includes cloud computing. NEA’s recent blog post, SaaS 3.0: Big Data that Delivers, describes this very eloquently.

However, no matter how cool a company’s technology is, the ability to attract funding depends on how well the company is delivering on its strategy. I’m pleased to report that our first-mover advantage in big data really started to pay off in 2012, with our big data sales growing by more than 300%!  And I believe the kind of growth we’ve seen this year is only the tip of the iceberg.

Still, revenue growth is only part of the picture. As Warren Buffett famously said, “value is what you get,” and sustainable growth depends on the ability to create value for others. On this score, our big data solutions are creating value for a new generation of disruptive companies like Beachmint, Shareable Ink and Travian Games, which recognize the potential of advanced analytics to drive innovation, customer loyalty and new sources of revenue. Established companies like Soliditet and several large firms in financial services are using Pentaho to prepare for a future that will include big data analytics.

We plan to use the new funds to accelerate further technology innovation in big data and other areas, recruit the best talent in the industry and further our expansion into global markets so that we can create even more market value through our solutions.

The big data journey is in full flight – hang on for a fast, fun-filled ride!

Quentin Gallivan

