Building Your Big Data Team in 2015 – Top 5 Pieces of Real-World Advice

January 27, 2015

There’s lots of advice out there on building a big data team, from industry analysts, experts, and leading publications. But we wanted to see how that advice plays out in real life, so we talked to real-world big data mavericks – those who’ve faced the challenge of gaining true business value from big data and succeeded. They shared real-world insights into how they made it happen and the advice they’d give to those ready to take the plunge. (Scroll to the bottom to meet our mavericks.)

1. Clearly define your business goal, and don’t be afraid to start small.
“When you work with big data, you have to know first what you’re going to do with that data” – Marc Hayem, VP of Platform Transformation, RichRelevance

It may seem obvious, but this step is often overlooked. Whether you’re a data-driven company whose entire business model revolves around crunching big data, or a manufacturer looking to optimize operational efficiency using machine data, you need to be clear about the challenge you’re trying to tackle with big data. If you omit this step, you risk ending up with inappropriate technologies, a lack of executive support, and an ill-prepared team. Saad Khalid, a product manager at Paytronix, echoes the advice about starting small:

“Starting small to get into big data can be useful, because you can get lost in a lot of technical jargon: Hadoop, Hive, MapReduce. My advice to people considering big data as a project would be to take it slow, have a smaller project in mind where you can actually think about the questions that you want to answer and achieve results… Have a team that is dedicated to that goal and those results. Start slow, then grow big, then scale your project.” Saad Khalid

Andrew Robbins is CEO of Paytronix, a company that helps restaurants build brand loyalty and gain rich, big data-driven insights into their customers’ behavior for better sales and marketing. The questions that big data could answer for them were endless – but in the end, zeroing in on one small, simple question – “Who had breakfast for dinner?” – helped them define the scope of their entire project:

“For us, we sat around and thought of so many ideas, and it became so big that we boiled it down to a single question, and it was: who had breakfast for dinner? That question seems kind of simple. The “who” is pretty complicated. Who are the people? Can you give me the collection, and what are they like? What are their demographics? The “had breakfast” – what does that mean? You’ve got to get into the details of a check. Is it scrambled eggs? …All of those pieces led to a simple thing that we could all shoot for, and that was our minimal viable product. You can get to it quicker, and then the team goes, “Aha. That’s success.” Andrew Robbins

Finally, as you define your scope, make sure the project has a measurable return against your business goals. Because big data projects can be complex, people need to be motivated to work through the challenges, and that happens when your project impacts the business in a demonstrable way. Marc Hayem is VP of Platform Transformation at RichRelevance, a company that helps retailers provide personalized recommendations to shoppers.

“I think the important thing when you get into big data is to be able to prove the value rapidly – to really pick the right problem and demonstrate very rapidly that you can find solutions to that problem, that you can create value around that problem… If you have identified something that will give you a competitive advantage and the technology is applied right, then the payoff can be monumental.” Marc Hayem

2. Choose your technologies carefully, based on the challenge you’re trying to address and your organizational culture.
“Pick the tools that work and ignore all the religion that’s out there.” – Andrew Robbins, CEO, Paytronix

You should only start to investigate technologies once you’ve defined your problem. Many of the big data leaders we spoke to acknowledged that the big data technology ecosystem can be complex, and cautioned against being driven by the current frenzy to adopt a particularly hot technology. Their advice is unanimous: start with one problem, start small, and work backwards from there in picking your technologies. Always pick the tools that solve the problem at hand, and find tools that increase your team’s productivity rather than create obstacles. Andrew Robbins discussed how heated the debate can be:

“I think one of the things that surprised me the most was just how fragmented the tool sets are. It really seems like the wild west of different components, and people get religious about which component you’re using… ‘If you’re using Hive, you must be crazy. You must use Impala. Anybody who is not using Impala is just… that doesn’t make sense.’ Pick the tools that work and ignore all the religion that’s out there. Be practical. Pick the tools that work. You can always switch them out in the future if you need to.” Andrew Robbins

Marc Hayem shares his perspective on what makes a good fit:

“Evaluating the tools can be overwhelming. There are new tools that come out constantly. There is a tendency to always look at the next shiny thing that comes out and think it will solve even more magical problems. At some point, you have to settle. You have to choose your tools – the tools that you’re comfortable with and, more importantly, that you have the staff for. This is basically your tool set. That’s what you’re going to use. With this ecosystem of open source tools, there is definitely a tendency to go after the next big thing, constantly. It’s something that you have to fight a little bit. We have used a lot of open source software… Essentially, we believe that when you use open source solutions, there is a community behind those tools. The tools get better over time, very, very rapidly.” Marc Hayem

Marc’s comments illustrate that when evaluating technologies, vendors, and platforms, it’s important to consider what’s a good fit for your organization based on shared values like transparency and innovation. Stefan Kochi, Paytronix’s head of technology, believes this is an important factor:

“Once we decided to implement a big data solution, we started looking at different providers, different vendors. The initial guiding principles were the ones that we use for other decisions we have made, such as: they have to feel like an extended part of our company. … Some of the things we look for are – what is the technology based on? Open source versus proprietary? How easy is it for them to innovate? Innovation is critical. Do they serve things that we need? We have some guiding principles that we apply in general: the transparency of the company, how easy it is to communicate with, and how solid and mature the product is. Pentaho was an attractive option early on. They use open source technologies, and that was very attractive to us. Paytronix uses a lot of open source technologies, so right there you have a connection with the approach that Pentaho has taken.” Stefan Kochi

3. Identify key players on a cross-functional team

While in some cases a big data implementation can be done by one person or a very small team, the general consensus is that a dedicated, cross-functional team will ensure success. Such a team is critical to making sure that business needs are understood and that data is successfully prepared and made accessible to meet those needs. So what roles are needed? We asked our big data leaders and our internal big data services team to comment on what is working, and compiled the results. While structures vary from organization to organization, here are some key roles to consider.

  • Executive Sponsor – This senior-level person understands the business needs, rallies support, and funds the solution. Andrew Robbins is an example:

“Paytronix is full of bright, curious, empathetic people. I wasn’t the star of this… we have a really bright engineer who is at the forefront of thinking about [big data], and I probably just provided some air cover so that we’re safe to go after it and be successful.” Andrew Robbins

  • Business User – This individual defines and prioritizes the business requirements and then translates them into high-level technical requirements.

“My favorite part about what I do currently is gathering requirements and actually really thinking about what our next product’s going to be. What our next feature’s going to be. Talking to our clients, and talking to my internal clients, which is the rest of the team here. Really starting to think about a new feature, a new product, gathering those requirements, and thinking about design. I love working with the engineering team, trying to think about how to approach problems in several different ways, and trying to come up with a creative solution our clients can benefit from.” Saad Khalid

  • Subject matter expert – Especially important in non-technical industries, where the gap between a data developer and the business user can be very large, this person knows the business intimately.
  • Data scientist – This individual understands the data and can extract information from it to meet the business requirements. The data scientist ideally has domain knowledge, a background in statistical analysis, and a basic understanding of computer science.

“As I mentioned earlier, we have hundreds of algorithms that constantly try to decide what is best for our customer. You have to be able to build those algorithms. You have to understand the mathematics behind them. You have to understand the technologies. You also need very good data scientists. You need people who understand very well the mathematics behind the predictive modeling that takes place in personalization.” Marc Hayem

  • Data Engineer/Software Engineer – This individual has a software engineering background and experience in developing software for distributed or multi-threaded applications. This person is typically a server-side Java developer who can implement ETL at scale using various big data technologies. Someone with experience in statistics and machine learning is a plus.

“Paytronix has a small engineering group. We’re not a large firm, but we’re fortunate to have a very talented engineering team. Those engineers who have done a lot of the existing development of the product are also able to explore and go from an idea and a concept to a real product… There is a lot to manage when it comes to big data. We have a dedicated team that looks after our structure and architecture. There is an architect who oversees big data, and we also have two software developers. You need to have a dedicated team to take care of this structure. It is extremely important.” Saad Khalid

  • Data journalist – We’re hearing more and more about the data journalist – someone who looks at the data from a storytelling perspective. Forbes even predicts that storytelling will be the hot new job in big data analytics in 2015. This person serves as the link between the data and its larger audience, making the data understandable to those consuming it.
  • Platform/Systems Architect – This is a senior technical architect responsible for designing the end-to-end solution that meets the business requirements for both short-term deliverables and long-term needs. Typically this person has a software engineering background in large-scale clustering/distributed processing systems and is responsible for technology selection and the implementation process. The architect defines the big data blueprints, or architectural model, that an organization will implement.

“Another lesson that Paytronix has learned is the importance of building a working model first. You can get caught up in the big picture, being very strategic, but you have to build the working model first. If you have a billion transactions that you want to ETL, you should probably ETL a thousand. You get an idea of how the systems are working with a thousand transactions. Another important thing that we learned is that you have to be very focused on system integration, and an architect should always be present as you connect systems. Systems talking to each other is like building many bridges. You have people focus on each bridge, but someone needs to oversee all the bridges together.” Stefan Kochi (see the sketch after this list)

  • IT/Operations manager – This person operationalizes, deploys, manages, and monitors the systems. They should understand Hadoop and big data to successfully deploy across systems and scale to hundreds or thousands of servers, instead of just a few.  Yug Muppala, a software engineer at RichRelevance, points out the critical nature of this role:

“We at RichRelevance have a really good operations team that keeps our servers up and running all the time. That is really important – they make the cluster available to us and keep it healthy.” Yug Muppala
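
Stefan’s “ETL a thousand before a billion” advice is easy to make concrete. Here is a minimal sketch, assuming a Hive-style SQL environment and a hypothetical transactions_raw table (Paytronix’s actual schema isn’t public): stage a small sample, then push it through every downstream step before committing to the full run.

    -- Hypothetical sanity check: stage a tiny sample of the source data
    -- so the end-to-end ETL can be verified before the billion-row run.
    CREATE TABLE transactions_sample AS
    SELECT check_id, store_id, item_name, sale_time
    FROM transactions_raw
    LIMIT 1000;

    -- Spot-check the sample before scaling the job up.
    SELECT COUNT(*), MIN(sale_time), MAX(sale_time) FROM transactions_sample;

If the sample flows cleanly through every downstream system, the same statements can then be pointed at the full table.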

4. Be creative to make the most of your human and technology resources
“Instead of searching for the mythical people, we would take people we know and create a team that could be successful” – Andrew Robbins, Paytronix

While the list above provides a general guideline for a big data team, it’s only a starting point. There’s a well-known meme that looking for the perfect data scientist – one who combines analytics with business savvy, development skills, and mathematics – is like looking for a unicorn: it doesn’t exist. Companies that have successfully launched big data initiatives haven’t used unicorns – they’ve been innovative and clever in how they resource their projects and leverage their teams. Andrew Robbins acknowledges this:

“When you make the move to big data, what are you concerned about? What we’re concerned about at Paytronix – and probably the biggest one – is: can you be successful? Then you go back from that and you say, “Where are the people? What people are going to implement this solution?” Is it internal people, or are we going to go hire people? Then people talk about data scientists. Have you seen a data scientist? Do you live next to one? Can you find them on the street? I think one of the things that made us successful at Paytronix was to say that, instead of searching for the mythical, we would take people we know and create a team that could be successful. To us, a data scientist is a function, not a person. Data science might include a strategist, an analyst and an engineer. Between them, they can satisfy the need for data science.” Andrew Robbins

Creative thinking and innovative technologies offer other ways to remove the need for unicorns. There are many emerging technologies that help minimize the dependence on coding and other hard-to-find skillsets – for smaller companies that can’t afford data scientists, these technologies are attractive options. Yug Muppala, a software engineer at RichRelevance, talks about why they use Hive:

“Hive is very easy for anyone with SQL knowledge to start writing queries against the Hadoop cluster. That’s a big advantage. Not many people have knowledge of Pig scripts and the like, and most of our data science team is very comfortable writing SQL queries. Hive gives them that advantage, so they can just go write queries themselves instead of having to wait for someone else to write the extraction for them.” Yug Muppala
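
To make Yug’s point concrete, here is the flavor of query a SQL-literate analyst could run directly in Hive, in the spirit of Paytronix’s “who had breakfast for dinner?” question. This is a minimal sketch with hypothetical table and column names, not RichRelevance’s or Paytronix’s actual schema:

    -- Plain SQL that Hive compiles into jobs on the Hadoop cluster –
    -- the analyst writes no MapReduce or Pig.
    SELECT customer_id,
           COUNT(*) AS breakfast_items_at_dinner
    FROM order_items
    WHERE item_category = 'breakfast'
      AND hour(order_time) >= 17     -- rough cut: ordered at dinner time
    GROUP BY customer_id
    ORDER BY breakfast_items_at_dinner DESC
    LIMIT 20;

The same question posed in Pig or raw MapReduce would take a specialist and far more code, which is exactly the dependence Hive removes.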

Pentaho’s own visual interface helps here by reducing the amount of code needed to join data, cutting the time Paytronix spent on this task from two weeks to a mere hour and a half:

“We have some data in our transactional database and we have some data in Hadoop. Joining these two together was a hassle before, and Pentaho helped us solve this problem… It’s a simple step within Pentaho… We don’t have to write a lot of code, which we were doing before; it’s a simple process of dragging and dropping steps to connect these different data sources.” Yug Muppala

5. Look to the future
Last – as you look ahead to building a team in 2015, there are a few things to keep in mind:

  1. Consider the cloud. More and more companies are running all or part of their big data environment in the cloud, which is becoming more mature and secure as adoption widens. Look for team members with experience in the cloud, in addition to those who have dealt with data governance and compliance issues.
  2. Consider self-service analytics. Whether the end user is a customer or an internal user, you’ll need to consider how to make the insights created from your big data environment available for consumption both inside and outside your firewalls.  How will you deliver high-quality governed data to end users for analysis? Will you embed analytics in customer-facing software, or perhaps within an enterprise application?
  3. Consider the profile of people willing to tackle these big data challenges. In addition to experience with the relevant technologies, look for people who embrace and learn from the challenge that big data provides. Marc Hayem says, “The people I’ve worked with are very much start-up people. They are a little more adventurous than your average IT person.”

Meet the Mavericks:

Andrew Robbins, CEO, Paytronix
Learn more about Andrew’s journey with big data here.

Marc Hayem, VP of Platform Transformation, RichRelevance
Learn more about Marc’s journey with big data here.

Saad Khalid, Product Manager, Paytronix

Stefan Kochi, Head of Technology, Paytronix

Yug Muppala, Software Engineer, RichRelevance


Horses for courses

April 29, 2014

The social media channels are buzzing today with chatter about Tibco’s acquisition of Jaspersoft. Since we are fellow players in the spheres of commercial open source and business analytics, industry watchers are naturally speculating about Pentaho as well, so I thought I’d take this opportunity to share how our business strategy differs from Jaspersoft’s.

When explaining our commitment to open standards, we tell our customers that one size IT strategy does not fit all – rather, it’s ‘horses for courses’. Business strategies follow this principle as well and ours is very different from Jaspersoft’s.

Our strategic paths began to diverge four years ago, when we started building on our open source heritage to focus on big data and advanced analytics capabilities for technologies like Hadoop and NoSQL, and more recently YARN and Storm.

Hell-bent on solving big and diverse data challenges with a full data integration and analytics platform, last year we introduced ground-breaking technology like our Adaptive Big Data Layer, which protects enterprises against the risks inherent in a disruptive big data market. Today we have the most comprehensive big data integration capabilities on the market.

As a result, Pentaho has seen enormous growth in big data and embedded analytics, with more than 83% growth in these segments.  And, significantly, our big data analytics platform has driven a huge increase in large enterprise clients, growing at over 200% from the prior year and now representing 26% of our business.

Jaspersoft took a different route, opting to provide low-cost open source alternatives to traditional BI and reporting, particularly for the SMB market. Larry Dignan of ZDNet’s Between the Lines comments, “With Jaspersoft, Tibco can offer low-priced subscriptions and later upsell to its other applications.”

Although our total cost of ownership is highly attractive to mid-market and enterprise buyers alike, we’re not aiming to be the cheapest. We continue to invest in R&D so we can keep breaking new ground in big data and embedded analytics.

“The TIBCO acquisition of Jaspersoft is additional validation that the business analytics market is continuing to bifurcate — with more traditional departmental BI on the one hand, and big-data analytics on the other. TIBCO has added a low-cost open source reporting capability to its portfolio to address the SMB BI market, a market we view as continuing to commoditize,” said Quentin Gallivan, Chairman and CEO of Pentaho.

“Four years ago, we saw the big data market opportunity. While Jaspersoft turned right, we turned left towards big data analytics and the associated new projects funded in large enterprises,” Gallivan continued. “The Pentaho platform is in the pole position to integrate and blend diverse big data sets together to deliver end-to-end big data analytics solutions to large enterprises – a platform that experienced 83% year-over-year growth in big data and embedded analytics in 2013 alone.”

Pentaho stays focused on its course: a big data integration and analytics platform built for the future of analytics. I encourage you to see how we are doing this by exploring our Big Data Blueprints or Visualization gallery.

Rosanne Saccone
CMO
Pentaho


Big Data Integration Webinar Series

May 6, 2013

Do you have a big data integration plan? Are you implementing big data? Big data, big data, big data. Did we say big data? EVERYONE is talking about big data… but what are they really talking about? When you pull back the marketing curtains and look at the technology, what are the main elements and important tried-and-true trends that you should know?

Pentaho is hosting a four-part technical series on the key elements and trends surrounding big data. Each week of the series brings a new, content-rich webinar to help organizations find the right track to understand, recognize the value of, and cost-effectively deploy big data analytics.

All webinars will be held at 8 am PT / 11 am ET / 16:00 GMT. To register, follow the links below; for more information, contact Rob Morrison at rmorrison at pentaho dot com.

1) Enterprise Data Warehouse Optimization with Hadoop Big Data

With exploding data volumes, increasing Enterprise Data Warehouse (EDW) costs, and a rising demand for high-performance analytics, companies have no choice but to reduce the strain on their data warehouse and leverage Hadoop’s economies of scale for data processing. In the first webinar of the series, learn how using Hadoop to optimize the EDW gives IT professionals processing power, advanced archiving, and the ability to easily add new data sources.

Date/Time:
Wednesday, May 8, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

2) Getting Started and Successful with Big Data

Sizing, designing and building your Hadoop cluster can sometimes be a challenge. To help our customers, Dell has developed the Hadoop Reference Architecture, best-practice documentation, and an open source tool called Crowbar. Paul Brook, from Dell, will describe how customers can go from raw servers to a Hadoop cluster in under two hours.

Date/Time:
Wednesday, May 15, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

3) Reducing the Implementation Efforts of Hadoop, NoSQL and Analytical Databases

It’s easy to put a working script together as part of an R&D project, but it’s not cost-effective to maintain it through an ever-growing stream of user change requests, system and product updates. Watch the third webinar in the series to learn how choosing the right technologies and tools can give you the agility and flexibility to transform big data without coding.

Date/Time:
Wednesday, May 22, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.

4) Reporting, Visualization and Predictive Analytics from Hadoop

While unlocking data trapped in large stores of semi-structured data is the first step of a project, the next step is to analyze that data and proactively identify new opportunities that will grow your bottom line. Watch the fourth webinar in the series to learn how to innovate with state-of-the-art technology and predictive algorithms.

Date/Time:
Wednesday, May 29, 2013
8 am PT / 11 am ET / 16:00 GMT

Registration:
To register for the live webinar click here.
To receive the on-demand webinar click here.



How to Get to Big Data Value Faster

March 18, 2013

Summary: Everyone talks about how big data is the key to business success, but the process of getting value from big data is time intensive and complex.  Examining the big data analytics workflow provides clues to getting to big data results faster.


Most organizations recognize that big data analytics is key to their future business success, but efforts to implement are often slowed due to operational procedures and workflow issues.

At the heart of the issue is the big data analytics workflow: loading, ingesting, manipulating, transforming, accessing, modeling and, finally, visualizing and analyzing data. Each step requires manual intervention by IT, with a great amount of hand coding and tools that invite mistakes and delays. New technologies such as Hadoop and NoSQL databases also require specialized skills. Once the data is prepared, business users often come back to IT with new requests for additional data sources, and the linear process begins again.

Given the potential problems that can crop up in managing and incorporating big data into decision-making processes, organizations need easy-to-use solutions that can address today’s challenges, with the flexibility to adapt to meet future challenges. These solutions require data integration with support for structured and unstructured data and tools for visualization and data exploration that support existing and new big data sources.

A single, unified business analytics platform with tightly coupled data integration and business analytics, such as Pentaho Business Analytics, is ideal. Pentaho supports the entire big data analytics flow, with visual tools that simplify development and remove complexity for developers, and powerful analytics that allow a broad set of users to easily access, visualize and explore big data. By dramatically improving developer productivity and offering significant performance advantages, Pentaho significantly reduces time to big data value.

Donna Prlich
Senior Director, Product and Solution Marketing, Pentaho

This blog originally appeared on GigaOM at http://gigaom.com/2012/12/06/how-to-reduce-complexity-and-get-to-big-data-value-faster/


Looking to the Future of Business Analytics with Pentaho 4.8

November 12, 2012

Last week Pentaho announced Pentaho 4.8, another milestone in delivering the future of analytics. It has been an exciting ride – our partners’ and our customers’ feedback has kept us ecstatic and ready to push further into the future.

Pentaho 4.8 is a true testament to what the future of analytics needs. The future of analytics is driven by the data problems that businesses face every day – and by information users and their expectations for solving those problems.

Let me give you a good example. I recently had the pleasure of meeting with one of our customers, BeachMint. BeachMint is a fashion and style ecommerce company that uses celebrities and celebrity stylists to promote its retail business.

This rapidly growing online retailer needed to keep tabs on its large Twitter and Facebook communities to track customer sentiment and social influence. It then uses the social data to define customer cohorts and design marketing campaigns that best target each cohort.

For BeachMint, insight into data is extremely important. But on one hand, the volume and variety of data – in this case unstructured social data and click-through ad feeds – have increased its complexity. And on the other hand, the speed at which data gets created has accelerated rapidly. For example, in addition to analyzing the impact of customer sentiment on purchasing behavior, BeachMint also needed up-to-the-minute information on the activity of key promotional codes – to immediately identify those that leak out.

Pentaho understands these data challenges and user expectations. This release takes full advantage of Pentaho’s tightly coupled Data Integration and Business Analytics platform – to simplify data exploration, discovery and visualization for all users and all data types, and to deliver this information to users immediately, sometimes even at a micro-second level. In this release Pentaho delivers:

– Pentaho Mobile – the only Mobile BI application with the power to instantly create new analysis on the go.

– Pentaho Instaview – the industry’s first instant and interactive big data visualization application.

Want to find out more? Register for the Pentaho 4.8 webinar and see for yourself.

– Farnaz Erfan, Product Marketing, Pentaho


A Day of Choices that Impact the Future

November 6, 2012

The timing is auspicious for the launch of Pentaho’s latest business analytics platform release, which coincides with Election Day in the U.S.!  Both events offer the freedom to choose a platform that is right for you today and into the future. The election platforms offer social, economic and political philosophies to help you meet your personal values and goals. Your choice of business analytics platform should improve your organization’s performance by liberating and integrating all your data and serving it up to your corporate citizens to analyze.

We trust that you’ve made the choice of political candidates and cast your vote today. In case you haven’t chosen your business analytics platform yet, we hope you’ll allow us a little more campaigning! Our business analytics ‘candidate,’ which tightly couples data integration with advanced analytics, has proven its value across private, public and nonprofit sector organizations around the globe. Today, with the launch of our latest version Pentaho Business Analytics 4.8, we have made great strides in democratizing big data and business analytics by adding some exciting new capabilities:

  • Pentaho’s Instaview, the industry’s first instant and interactive big data analytics application, dramatically reduces the time and complexity required for data analysts to discover, visualize and explore big and diverse data
  • Pentaho Mobile BI brings the full power of the Pentaho Business Analytics Platform to the iPad, including instant and interactive visualization and the power to create new analysis on the go

With Pentaho 4.8, we bring real freedom to deliver power to all business users and a clear choice for a better future in the world of business analytics. To learn more about the future of business analytics, check out Pentaho.com/48.

Rosanne Saccone
Chief Marketing Officer
Pentaho


Because You Don’t Have Time to F* Around.

November 5, 2012

At Pentaho we are confident that we are providing the most complete solution for big data analytics. But that doesn’t mean that there isn’t always room for improvement — that is where you come in. The big data market is rapidly growing and evolving and we want to ensure we are at the forefront.

Pentaho invites you to participate in our first Big Data Product Strategy Survey. The survey only takes 3 – 5 minutes, can be taken anonymously and you will automatically be entered to win a $100 American Express gift card!*

Click here to take the survey now and help Pentaho provide the big data product that meets your needs – because you are busy and don’t have time to f* around with your big data!

*You must enter your email address at the end of the survey to be contacted to receive your gift card or a copy of the final report.

