Democratizing Analytics

March 27, 2015

Politics is all that stands in the way of democratizing analytics

Following the whole ‘BI for the masses’ movement, today’s buzz is all about democratizing analytics – giving everyone from Alice in the mailroom to Joe CEO the tools to make data-informed decisions. It’s a lively debate. Entrepreneurial types insist that it’s a ‘do or die’ imperative while the more cautious amongst us liken it to running with scissors.

Last Wednesday, I joined the panel of Computing’s “Practical steps towards democratising analytics” web conference chaired by Stuart Sumner to explore the topic in more depth. You can read a recap of the event here but if you can spare half an hour, do watch the replay. The quality of the debate was excellent, reflecting IT’s growing involvement and maturity in the enterprise analytics domain.

Given that 47% of those Computing surveyed said that access to analytics at their organisations was restricted to specialists, my co-panelist Trevor Attridge and I agreed that democratizing analytics should be high on companies’ agendas. And if not a do or die imperative today, it almost certainly will be in the not-too-distant future. Even the most traditional companies, from energy suppliers to shipping firms to hospitals, are starting to apply analytics and the Internet of Things to improve productivity, efficiency and growth. Our customer St Antonius Hospital is a great case in point.

The impending election serves as a reminder that healthy democracies depend on strong leadership, cultural acceptance, good governance and transparency. In fact, these were the very things that delegates raised as concerns when it came to rolling analytics out more broadly in their companies. One theme that kept resurfacing in our debate was the importance of a strong “coalition” between IT and the business to address these.

It occurred to me that most of these concerns boil down to company politics – leadership, culture and changing the status quo – and not technology. The technologies, from big data blending to real-time processing to predictive data mining algorithms, are all out there and in production, as our “Mavericks of Big Data” customers demonstrate.

Even cost and ROI, which delegates raised as their greatest concern, are no longer the barriers they once were. The old-school per-user licence model – wholly unsuited to analytics democracies – is fast being overtaken by more attractive usage-based and subscription models.

Is politics standing in the way of your company democratizing analytics?

Davy Nys
VP EMEA & APAC
Pentaho
@davynys


Pentaho 5.3 – Taming messy and extreme data

February 18, 2015

The way the industry talks about data has definitely evolved. About five years ago the term ‘Big Data’ emerged, initially describing sheer volume. Soon after, the definition expanded into a better one that explains what this data really is: not just big, but data that moves extremely fast, often lacks structure, varies greatly from existing data, doesn’t fit well with more traditional database technologies and, frankly, is best described as “messy”.

Fast-forward to 2015 and this week’s announcement of Pentaho 5.3, which delivers on-demand big data analytics at scale on Amazon Web Services and Cloudera Impala. The release is driven by what we see in more and more of our customers – a new data term for you – EXTREME data problems! Our customer NASDAQ is a very interesting example of where traditional relational data systems have maxed out and been replaced by a cloud architecture that includes Hadoop, Pentaho and Amazon Redshift. You can read their story here. What NASDAQ found was that pushing vast amounts of data at extreme levels (10 billion rows every day) was more easily accomplished by combining cloud and big data technologies, creating a more scalable, highly elastic solution.
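
To give a feel for the kind of bulk loading an architecture like NASDAQ’s depends on, here is a minimal sketch of copying files staged in S3 into Amazon Redshift using the psycopg2 client. The cluster, bucket, table and IAM role names are hypothetical placeholders, and in a real pipeline like NASDAQ’s this step would be orchestrated through Pentaho Data Integration rather than hand-written scripts.

    # Minimal sketch: bulk-loading files staged in S3 into Amazon Redshift.
    # All names (cluster, bucket, table, IAM role) are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="markets",
        user="etl_user",
        password="********",
    )

    copy_sql = """
        COPY daily_trades
        FROM 's3://example-bucket/trades/2015-02-18/'
        CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
        CSV GZIP;
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)  # Redshift ingests the staged files in parallel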

We’ve seen many of our customers processing vast volumes of data in Hadoop with the help of Pentaho to enable analytics at scale like never before.  The biggest challenge these customers face is getting the results out of Hadoop and into the hands of the users who can make the most of fresh insights.  That’s where Pentaho 5.3 comes into play. This release opens the data refinery to Amazon Redshift AND Cloudera Impala to push the limits of analytics through blended and governed data delivery on demand. In addition to adding Redshift and Impala support to the data refinery, 5.3 includes several other key features:

  1. Advanced Auto-Modeling – Advances in auto-modeling accelerate the creation and increase the sophistication of generated data models, offering better analytics and greater ease of use.
  2. Additional Hadoop Support – Support for the latest Hadoop distributions from Cloudera and MapR, Hadoop cluster naming for simplified connectivity and management, and enhanced performance for scale-out integration jobs.
  3. Analyzer API Enhancements – Complete control over the end-user experience for highly tailored, easy-to-deliver embedded analytics.
  4. Simplified Customer Experience – An easier, more streamlined mechanism for embedding analytics, plus documentation improvements that simplify learning.

If your data is big, messy, extreme or just plain annoying and needs to be tamed, I encourage you to learn more about Pentaho 5.3. Check out the great resources like the video and white paper to get started taming your data today.

Chuck Yarbrough
Product Marketing, Big Data
Pentaho

 


The Pentaho Journey – A Big Data Booster

February 13, 2015


Ten years ago we set out to commoditize BI, disrupt the existing old school proprietary vendors and give customers a better choice. It’s been an exciting journey building up a scalable analytic platform, building an open source community, surviving a deep recession, beating up the competition, building a great team, providing great products and services to our customers, and being a major player in the big data market.

Some of the key points along the way:

2008 – the recession hits, and frankly, as painful as that was, it actually helped Pentaho: we were the best value proposition in BI and people were looking to reduce expenditures. It also drastically reduced the cost of commercial office space, and we opened our San Francisco office in Q2 2009.

2009 – at a happy hour in Orlando in November we stumbled upon our ability to natively integrate Pentaho Data Integration with Hadoop (which I had never even heard of prior to that evening). Eight months later we launched our big data initiative.

2011 – based on traction and momentum with our big data capabilities, we decided to go all in around this space. This led to a dedicated big data team, a change in packaging and pricing, and a beefed-up management team.

2013 – we acquired longtime partner Webdetails and immediately gained a world-class UI/UX team that our customers love working with to build custom dashboards.

2014 – we held our first PentahoWorld user conference with a theme that strongly resonated with the market: Bring your Big Data to Life. We were proud to host 400 attendees from 27 countries around the globe.

2015 – Hitachi Data Systems acquires Pentaho! This is an exciting time for both companies, our customers, and partners. We both share a vision around analytics and in particular around big data opportunities.

So the next part of our exciting journey begins as a wholly owned subsidiary of Hitachi Data Systems and I couldn’t be happier. We’ll be going after the big data market and specifically the Internet of Things and we’ll have a blast doing so. Buckle up, because it’s going to be a fast and thrilling ride!

Richard Daley
Founder and Chief Strategy Officer
Pentaho


A bolder, brighter future for big data analytics and the Internet of Things that matter: Pentaho + HDS

February 10, 2015

Big Data and the Internet of Things are disrupting entire markets, with machine data blurring the virtual world with the physical world. This market matters – a recent Goldman Sachs report cites an astounding $2 trillion opportunity by 2020 for IoT, with the potential to impact everything from new product opportunities, to shop floor optimization, to factory worker efficiency improvements that will power top-line and bottom-line gains. The company that delivers high-quality big data solutions fastest and enables customers to connect people, data and things to transform their industries and organizations will win.

That is why I am very excited to share with you that today Hitachi Data Systems (HDS) announced its intention to acquire Pentaho. Pentaho and HDS share a common vision of the impact of big data on our industry and society as a whole. This acquisition builds on the existing OEM relationship between Pentaho and HDS, forged to accelerate the HDS IoT initiative known as Social Innovation. Social Innovation enables Hitachi to deliver solutions for the Internet of Things and big data – solutions that enable healthier, safer and smarter societies for generations to come. The Pentaho vision of the interconnectedness of data, people and things, supported by a big data orchestration platform to power embedded analytics, aligns perfectly. Indeed, Social Innovation is a big, bold strategy and Pentaho is a critical part of it.

HDS plans to both retain the existing business model responsible for the success of Pentaho, and use Pentaho software to develop new big data services that will go to market in FY15, accelerating delivery of HDS Social Innovation solutions. Once closed, this acquisition brings together two companies that deliver innovative and proven solutions to enterprises around the globe. Hitachi owns the infrastructure and Pentaho owns the data integration and analytics platform and know-how to harness the value in big data. Together Pentaho + HDS form a powerhouse to deliver on the promise of big data with easier, faster deployments and quicker time to value.

For customers to succeed in this new world of big data and internet of things that matter, both hardware and software must scale flexibly to keep pace with the speed, diversity and velocity of data, regardless of where it is created. No two companies know these challenges better than Pentaho and HDS. Together we are delivering a transformative future for our industry.

Game on,

Quentin

 

 


Building Your Big Data Team in 2015 – Top 5 Pieces of Real-World Advice

January 27, 2015

There’s lots of advice out there on building a big data team, from industry analysts and experts to leading publications. But we wanted to see how this is being implemented in real life, so we talked to the real-world big data mavericks – those who’ve faced the challenge of gaining true business value from big data and succeeded. They shared real-world insights into how they made it happen and the advice they’d give to those ready to take the plunge. (Scroll to the bottom to meet our mavericks.)

1. Clearly define your business goal, and don’t be afraid to start small.
“When you work with big data, you have to know first what you’re going to do with that data” – Marc Hayem, VP of Platform Transformation, RichRelevance

It may seem obvious but is often overlooked. Whether you’re a data-driven company whose entire business model revolves around crunching big data, or a manufacturer looking to optimize your operational efficiency using machine data, you need to be clear about the challenge you’re trying to tackle with big data. If you omit this step, you risk ending up with inappropriate technologies, a lack of executive support, and an ill-prepared team. Saad Khalid, a product manager at Paytronix, echoes the advice about starting small:

“Starting small to get into big data can be useful, because you can get lost in a lot of technical jargon: Hadoop, Hive, MapReduce. My advice to people considering big data as a project would be to take it slow, have a smaller project in mind where you can actually think about the questions that you want to answer and achieve results…. Have a team that is dedicated towards that goal, and those results. Start slow and then grow big and then scale your project.” Saad Khalid

Andrew Robbins is CEO of Paytronix, a company that helps restaurants build brand loyalty and gain rich, big data-driven insights into their customers’ behavior for better sales and marketing. The questions that big data could answer for them were endless – but in the end, zeroing in on one small, simple question – “Who had breakfast for dinner?” – helped them define the scope of their entire project:

“For us, we sat around and thought of so many ideas and it became so big and we boiled it down to a single question and it was who had breakfast for dinner? In that question, it seems kind of simple. The ‘who’ is pretty complicated. Who are the people? Can you give me the collection and what are they like? What are their demographics? The ‘had breakfast,’ what does that mean? You got to get into the details of a check. Is it scrambled eggs? …All of those pieces led to a simple thing that we could all shoot for and that was our minimal viable product and you can get to it quicker and then the team goes, ‘Aha. That’s success.’” Andrew Robbins

Finally, as you define your scope, make sure the projects have a measurable return to achieve your business goals.  Because big data projects can be complex, people need to be motivated to work through the challenges and that happens when your project impacts the business in a demonstrable way. Marc Hayem is VP of Platform Transformation at RichRelevance, a company that helps retailers provide personalized recommendations to shoppers.

“I think the important thing when you get into big data is to be able to prove the value rapidly, which is to really pick the right problem and demonstrate very rapidly that you can find solutions to that problem. That you can create value around that problem… If you have identified that something that will give you a competitive advantage and the technology is applied right, then the payoff can be monumental.” Marc Hayem

2. Choose your technologies carefully, based on the challenge you’re trying to address and your organizational culture.
“Pick the tools that work and ignore all the religion that’s out there.” – Andrew Robbins, CEO, Paytronix

You should only start to investigate technologies once you define your problem. Many of the big data leaders we spoke to acknowledged that the big data technology ecosystem can be complex, and cautioned against being driven by the current frenzy to adopt a particularly hot technology. Their advice is unanimous: start with one problem, start small, and work backwards from there in picking your technologies. Always pick the tools that solve the problem at hand and find tools that increase your team’s productivity rather than create obstacles. Andrew Robbins discussed how heated the debate can be:

“I think one of the things that surprised me the most was just how fragmented the tool sets are and it really seems like the wild west of different components and how religious people are that you’re using a component…. ‘If you’re using Hive, you must be crazy. You must use Impala. Anybody who is not using Impala is just… that doesn’t make sense.’ Pick the tools that work and ignore all the religion that’s out there. Be practical. Pick the tools that work. You can always switch them out in the future if you need to.” Andrew Robbins

Marc Hayem shares his perspective on what makes a good fit:

“Evaluating the tools can be overwhelming. There are new tools that come out constantly. There is a tendency to always look at the next shiny thing that comes out and think this will solve even more magical problems. At some point, you have to settle. You have to choose your tools. The tools that you’re comfortable with. That you have the tools for. That you have the staff for, more importantly. This is basically your tool set. That’s what you’re going to use. There is definitely, with this ecosystem of open source tools, a tendency to go after the next big thing, constantly. It’s something that you have to fight a little bit. We have used a lot of open source software… Essentially, we believe that when you use open source solutions there is a community behind those tools. The tools get better over time, very, very rapidly.” Marc Hayem

Marc’s comments illustrate that in evaluating technologies, vendors and platforms, it’s important to consider what’s a good fit for your organization based on shared values like transparency and innovation. Paytronix’s head of technology, Stefan Kochi, also believes this is an important factor:

“Once we decided to implement a big data solution, we started looking at different providers, different vendors. The initial guiding principles were the ones that we use for other decisions we have made, such as they have to feel like an extended part of our company. … Some of the things we look for are – what was the technology based on? Open source versus private? How easy is it for them to innovate? Innovation is critical. Do they serve things that we need? We have some guiding principles that we apply in general: the transparency of the company, how easy it is to communicate with, and how solid and mature the product is. Pentaho was an attractive option early on. They use open source technologies, and that was very attractive to us. Paytronix uses a lot of open source technologies, so right there you have a connection with the approach that Pentaho has taken.” Stefan Kochi

3. Identify key players on a cross-functional team

While in some cases a big data implementation can be done by one person or a very small team, the general consensus is that having a dedicated, cross-functional team will ensure success. This is critical to ensuring that business needs are understood and that data is successfully prepared and made accessible to meet those needs. So what roles are needed? We asked our big data leaders and our internal big data services team to comment on what is working and compiled the results. While structures vary from organization to organization, here are some key roles to consider.

  • Executive Sponsor – This senior-level person understands the business needs, rallies support, and funds the solution. Andrew Robbins is an example:

“Paytronix is full of bright, curious, empathetic people. I wasn’t the star of this …we have a really bright engineer who is at the forefront of thinking about [big data] and I probably just provided some air cover so that we’re safe to go after it and be successful.”  Andrew Robbins

  • Business User – This individual defines and prioritizes the business requirements and then translates them into high level technical requirements.

“My favorite part about what I do currently is gathering requirements and actually really thinking about what our next product’s going to be.  What our next feature’s going to be.  Talking to our clients, and talking to my internal clients, which is the rest of the team here.  Really start to think about a new feature, a new product, and gathering those requirements, and thinking about design.  I love working with the engineering team, and really trying to think about how to approach problems in several different manners, and really try to come up with a creative solution so our clients can benefit from it.” Saad Khalid

  • Subject matter expert – Especially important in non-technical industries, where the gap between a data developer and the business user can be very large, this person knows the business intimately.
  • Data scientist – This individual understands the data and can extract information from it to meet the business requirements. The data scientist ideally has domain knowledge, a statistical analysis background, and a basic understanding of computer science.

“As I mentioned earlier, we have hundreds of algorithms that basically constantly try to decide what is best for our customer. You have to be able to build those algorithms. You have to understand the mathematics behind it. You have to understand the technologies. You also need very good data scientists. You need people who understand very well the mathematics behind the predictive modeling that takes place in personalization.” Marc Hayem

  • Data Engineer/Software Engineer – This individual has a software engineering background and experience in developing software for distributed or multi-threaded applications. This person is typically a server-side Java developer who can implement ETL at scale using various big data technologies. Experience in statistics and machine learning is a plus.

“Paytronix has a small engineering group. We’re not a large firm, but we’re fortunate to have a very talented engineering team. Those engineers who have done a lot of existing development of the product are also able to explore and go from an idea and a concept to a real product…. There is a lot to manage when it comes to big data. We have a dedicated team that looks after our structure and architecture. There is an architect who oversees big data and we also have two software developers. You need to have a dedicated team to take care of this structure. It is extremely important.” Saad Khalid

  • Data journalist – We’re hearing more and more about the data journalist – someone who looks at the data from a storytelling perspective. Forbes even predicts that storytelling will be the hot new job in big data analytics in 2015. This person serves as the link to the larger audience for the data, making it understandable to the people consuming it.
  • Platform/Systems Architect – This is a senior technical architect responsible for designing the entire end-to-end solution that meets the business requirements for both short-term deliverables and long-term needs. Typically this person has a software engineering background in large scale clustering/distributed processing systems and is responsible for technology selections and implementation process.  The architect defines the big data blueprints, or architectural model, that an organization will implement.

“Another lesson that Paytronix has learned is the importance of building a working model first. You can get caught up in the big picture, being very strategic, but you have to build the working model first. If you have a billion transactions that you want to ETL, you should probably ETL a thousand. You get an idea how the systems are working with a thousand transactions. Another important thing that we learned is that you have to be very focused on system integration, and an architect should always be present as you connect. Systems talking to each other is like building many bridges. You have people focus on each bridge, but someone needs to oversee all the bridges together.” Stefan Kochi

  • IT/Operations manager – This person operationalizes, deploys, manages, and monitors the systems. They should understand Hadoop and big data to successfully deploy across systems and scale to hundreds or thousands of servers, instead of just a few.  Yug Muppala, a software engineer at RichRelevance, points out the critical nature of this role:

“We at RichRelevance have a really good operations team that keeps our servers up and running all the time. That is really important – they make the cluster available to us and keep the health of the cluster up and running.” Yug Muppala

4. Be creative to make the most of your human and technology resources
“Instead of search for the mythical people, we would take people we know and create a team that could be successful” – Andrew Robbins, Paytronix

While the above list provides a general guideline for a big data team, it’s only a starting point. There’s a well-known meme about how looking for the perfect data scientist – who combines analytics with business savvy, development skills and mathematics – is like looking for a unicorn: it doesn’t exist. Companies that have successfully launched big data initiatives haven’t used unicorns – they’ve been innovative and clever in how they resource their projects and leverage their teams. Andrew Robbins acknowledges this:

“When you make the move to Big Data, what are you concerned about? What we’re concerned about at Paytronix, and probably the biggest one, is can you be successful, and then you go back from that and you say, “Where are the people? What people are going to implement this solution?” Is it internal people or are we going to go hire people? Then people talk about data scientists. Have you seen a data scientist? Do you live next to one? Can you find them on the street? I think one of the things that made us successful at Paytronix was to say we would, instead of search for the mythical. To us, a data scientist is a function, not a person. Data science might include a strategist, an analyst and an engineer. Between them, they can satisfy the need of data science.” Andrew Robbins

Creative thinking and innovative technologies offer other ways to remove the need for unicorns. There are many emerging technologies that help minimize the dependence on coding and other hard-to-find skillsets – for smaller companies that can’t afford data scientists, these technologies are attractive options. Yug Muppala, a software engineer at RichRelevance, talks about why they use Hive:

“Hive is very easy for anyone with SQL knowledge to start writing, querying the Hadoop cluster. That’s a big advantage. Not many people have knowledge around Pig scripts and stuff like that, and most of our data science team is very comfortable with writing SQL queries. Hive gives them that advantage so that they could just go write queries themselves instead of having to wait for someone else to write the extraction for them.” Yug Muppala
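
To illustrate Yug’s point, here is a minimal sketch of the kind of familiar SQL a data science team could run against a Hadoop cluster through Hive. The host, table and column names are hypothetical, and the PyHive client used to submit the query is just one option among several HiveServer2 clients.

    # Minimal sketch: an ordinary SQL aggregation submitted to Hadoop via Hive.
    # Host, table and column names are hypothetical; assumes a HiveServer2 endpoint.
    from pyhive import hive

    conn = hive.Connection(host="hive.example.internal", port=10000, database="default")
    cursor = conn.cursor()

    cursor.execute("""
        SELECT product_category,
               COUNT(DISTINCT visitor_id) AS unique_visitors
        FROM clickstream_events
        WHERE event_date = '2015-01-26'
        GROUP BY product_category
        ORDER BY unique_visitors DESC
        LIMIT 10
    """)

    for category, visitors in cursor.fetchall():
        print(category, visitors)  # familiar SQL results, no MapReduce code required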

Pentaho’s own visual interface helps here, by reducing the amount of code needed to join data, and reducing the time Paytronix spent on this task from two weeks to a mere hour and a half:

“We have some data in our transactional database and we have some data in Hadoop. Joining these two together was a hassle before and Pentaho helped us solve this problem… It’s a simple step within Pentaho… We don’t have to write a lot of code, which we were doing before, and it’s a simple process of dragging and dropping steps to connect these different data sources.” Yug Muppala

5. Look to the future
Lastly, as you look ahead to building a team in 2015, there are a few things to keep in mind:

  1. Consider the cloud. More and more companies are running all or part of their big data environment in the cloud, and the cloud is only becoming more widely adopted, mature and secure. Look for team members with experience in the cloud, in addition to those who have dealt with data governance and compliance issues.
  2. Consider self-service analytics. Whether the end user is a customer or an internal user, you’ll need to consider how to make the insights created from your big data environment available for consumption both inside and outside your firewalls.  How will you deliver high-quality governed data to end users for analysis? Will you embed analytics in customer-facing software, or perhaps within an enterprise application?
  3. Consider the profile of people willing to tackle these big data challenges. In addition to experience with the relevant technologies, look for people who embrace and learn from the challenge that big data provides. Marc Hayem says, “The people I’ve worked with are very much start-up people. They are adventurous a little bit more than your average IT person.”

Meet the Mavericks:

Andrew Robbins, CEO, Paytronix
Learn more about Andrew’s journey with big data here.

Marc Hayem, VP of Platform Transformation, RichRelevance
Learn more about Marc’s journey with big data here.

Saad Khalid, Product Manager, Paytronix

Stefan Kochi, Head of Technology, Paytronix

Yug Muppala, Software Engineer, RichRelevance


Union of the State – A Data Lake Use Case

January 22, 2015


We have Pentaho co-founder and CTO James Dixon to thank for the term ‘Data Lake.’ He first wrote about the Data Lake concept on his blog in 2010, in Pentaho, Hadoop and Data Lakes. After the numerous interpretations and feedback, he revisited the concept and definition here: Data Lakes Revisited.

Now, in his latest blog, Dixon explores a use case based on the Data Lake concept, calling it the Union of State. Read the blog below to learn how the Union of State can provide the equivalent of a rewind, pause, and fast-forward remote control on the state of your business. Let us know what you think and whether you have deployed one of the four use cases.

Originally posted on James Dixon's Blog:

Many business applications are essentially workflow applications or state machines. This includes CRM systems, ERP systems, asset tracking, case tracking, call center, and some financial systems. The real-world entities (employees, customers, devices, accounts, orders etc.) represented in these systems are stored as a collection of attributes that define their current state. Examples of these attributes include someone’s current address or number of dependents, an account’s current balance, who is in possession of laptop X, which documents for a loan approval have been provided, and the date of Fluffy’s last Feline Distemper vaccination.
State machines are very good at answering questions about the state of things. They are, after all, machines that handle state. But what about reporting on trends and changes over the short and long term? How do we do this? The answer for this is to track changes to the attributes in change logs. These change logs…
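
To make the change-log idea concrete, here is a minimal sketch (not from Dixon’s post) of replaying attribute-level change records to reconstruct an entity’s state at any point in time – the ‘rewind’ button the Union of State provides. The entity and attribute names are hypothetical.

    # Minimal sketch: replaying attribute-level change logs to "rewind" an
    # entity's state to any point in time. Entity and attribute names are hypothetical.
    from datetime import datetime

    change_log = [
        {"ts": datetime(2014, 3, 1),  "entity": "account-42", "attr": "status",  "value": "opened"},
        {"ts": datetime(2014, 6, 15), "entity": "account-42", "attr": "balance", "value": 1200},
        {"ts": datetime(2014, 9, 2),  "entity": "account-42", "attr": "status",  "value": "suspended"},
        {"ts": datetime(2015, 1, 10), "entity": "account-42", "attr": "status",  "value": "active"},
    ]

    def state_as_of(entity, when, log):
        """Fold every change made up to 'when' into the entity's reconstructed state."""
        state = {}
        for change in sorted(log, key=lambda c: c["ts"]):
            if change["entity"] == entity and change["ts"] <= when:
                state[change["attr"]] = change["value"]
        return state

    # "Rewind" to mid-2014: the account was still 'opened' with a balance of 1200.
    print(state_as_of("account-42", datetime(2014, 7, 1), change_log))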



Big Data in 2015—Power to the People!

December 16, 2014

Last year I speculated that the big data ‘power curve’ in 2014 would be shaped by business demands for data blending. Customers presenting at our debut PentahoWorld conference last October – from Paytronix, to RichRelevance, to NASDAQ – certainly proved my speculations to be true. Businesses like these are examples of how increasingly large and varied data sets can be used to deliver high and sustainable ROI. In fact, Ventana Research recently confirmed that 22 percent of organizations now use upwards of 20 data sources, and 19 percent use between 11 and 20 data sources.[1]

Moving into 2015, and fired up by their initial big data bounties, businesses will seek even more power to explore data freely, structure their own data blends, and gain profitable insights faster. They know “there’s gold in them hills” and they want to mine for even more!

With that said, here are my big data predictions for 2015:

Big Data Meets the Big Blender!

The digital universe is exploding at a rate that even Carl Sagan might struggle to articulate. Analysts believe it’s doubling every year, with the unstructured component doubling every three months. And while unstructured data gets all the headlines, IDC estimates that by 2025, 40 percent of the digital universe will be generated by machine data and devices.[2] The ROI business use cases we’ve seen require the blending of unstructured data with more traditional, relational data. For example, one of the most common use cases we are helping companies create is a 360-degree view of their customers. The de facto reference architecture involves blending relational/transactional data detailing what the customer has bought with unstructured weblog and clickstream data highlighting customer behavior patterns around what they might buy in the future. This blended data set is further mashed up with social media data describing sentiment around the company’s products and customer demographics. This “Big Blend” is fed into recommendation platforms to drive higher conversion rates, increase sales, and improve customer engagement. This blended-data approach is fundamental to other popular big data use cases like the Internet of Things, security and intelligence applications, supply chain management, and regulatory and compliance demands in the financial services, healthcare and telco industries.
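
As a rough illustration of that reference architecture, and not of any particular customer’s pipeline, the sketch below blends hypothetical transaction, clickstream and sentiment tables into a single customer view with pandas; in production this blending would run in Pentaho Data Integration at far greater scale.

    # Minimal sketch: blending transactional, clickstream and sentiment data into
    # a 360-degree customer view. All tables and columns are hypothetical.
    import pandas as pd

    transactions = pd.DataFrame({
        "customer_id": [1, 2],
        "lifetime_spend": [1250.0, 430.0],
    })
    clickstream = pd.DataFrame({
        "customer_id": [1, 2],
        "pages_viewed_30d": [58, 12],
        "last_category_browsed": ["cameras", "headphones"],
    })
    sentiment = pd.DataFrame({
        "customer_id": [1, 2],
        "avg_social_sentiment": [0.8, -0.2],
    })

    # Join the three sources on the shared customer key to form the "Big Blend".
    customer_360 = (
        transactions
        .merge(clickstream, on="customer_id", how="left")
        .merge(sentiment, on="customer_id", how="left")
    )

    print(customer_360)  # one row per customer, ready to feed a recommendation engine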

Internet of Things Will Fuel the New ‘Industrial Internet’

Early big data adoption drove the birth of new business models at companies like our customers Beachmint and Paytronix. In 2015, I’m convinced that we’ll see big data starting to transform traditional industrial businesses by delivering operational, strategic and competitive advantage. Germany is running an ambitious Industry 4.0 project to create “Smart Factories” that are flexible, resource-efficient, ergonomic and integrated with customers and business partners. The machine data generated by sensors and devices is fueling key opportunities like Smart Homes, Smart Cities, and Smart Medicine, all of which require big data analytics. Much like the ‘Industrial Internet’ movement in the U.S., Industry 4.0 is being defined by the Internet of Things. According to Wikibon, the value of efficiency from machine data could reach close to $1.3 trillion and will drive $514B in IT spend by 2020.[3] The bottlenecks are challenges related to data security and governance, data silos, and systems integration.

Big Data Gets Cloudy!

As companies with huge data volumes seek to operate in more elastic environments, we’re starting to see some running all, or part, of their big data infrastructures in the cloud. This says to me that the cloud is now “IT approved” as a safe, secure, and flexible data host. At PentahoWorld, I told a story about a “big data throw-down” that occurred during our Strategic Advisory Board meeting. At one point in the meeting, two enterprise customers in highly regulated industries started one-upping each other about how much data they stored in the Amazon Redshift cloud. One shared that they processed and analyzed 5-7 billion records daily. The next shared that they stored half a petabyte of new data every day and, on top of that, had to hold the data for seven years while still making it available for quick analysis. Both of these customers are held to the highest standards for data governance and compliance – regardless of who won, the forecast for their big data environments is the cloud!

Embedded Analytics is the New BI

Although “classic BI,” which involves a business analyst looking at data with a separate tool outside the flow of the business application, will be around for a while, a new wave is rising in which business users increasingly consume analytics embedded within applications to drive faster, smarter decisions. Gartner’s latest research estimates that more than half the enterprises that use BI now use embedded analytics.[4] Whether it’s a RichRelevance data scientist building a predictive algorithm for a recommendation engine, or a marketing director accessing Marketo to consume analytics related to lead scoring or campaign effectiveness, the way our customers are deploying Pentaho leaves me with no doubt that this prediction will bear out.

As classic BI matured, we witnessed a final “tsunami” in which data visualization and self-service inspired business people to imagine the potential for advanced analytics. Users could finally see all their data – warts and all – and also start to experiment with rudimentary blending techniques. Self-service and data visualization prepared the market for what I firmly expect to be the most significant analytics trend in 2015….

Data Refineries Give Real Power to the People!

The big data stakes are higher than ever before. No longer just about quantifying ‘virtual’ assets like sentiment and preference, analytics are starting to inform how we manage physical assets like inventory, machines and energy. This means companies must turn their focus to the traditional ETL processes that result in safe, clean and trustworthy data. However, for the types of ROI use cases we’re talking about today, this traditional IT process needs to be made fast, easy, highly scalable, cloud-friendly and accessible to business. And this has been a stumbling block – until now. Enter Pentaho’s Streamlined Data Refinery, a market-disrupting innovation that effectively brings the power of governed data delivery to “the people,” unlocking big data’s full operational potential. I’m tremendously excited about 2015 and the journey we’re on with both customers and partners. You’re going to hear a lot more about the Streamlined Data Refinery in 2015 – and that’s a prediction I can guarantee will come true!

Finally, as I promised at PentahoWorld in October, we’re only going to succeed when you tell us you’ve delivered an ROI for your business. Let’s go forth and prosper together in 2015!

Quentin Gallivan, CEO, Pentaho


[1] Ventana Research, Big Data Integration Report, Tony Cosentino and Mark Smith, May 2014.

[2] IDC, The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, Gantz, Minton, Reinsel, Turner, April 2014.

[3] Wikibon, Defining and Sizing the Industrial Internet, David Floyer, June 2013.

[4] Gartner, Use Embedded Analytics to Extend the Reach and Benefits of Business Analytics, Daniel Yuen, October 3, 2014.

