Hadoop Summit 2015 has ended, and my team and I had a very successful trip. I attended Hadoop Summit for the first time last year. Attending the conference, talking with others who were using or considering big data technologies, and seeing what others were already doing sparked my excitement about big data and these technologies.
This year was quite a different experience. I noticed several significant differences in my own behavior and goals between my first and second visits, as well as some differences in the themes and overall feel of the conference. Here are some of my observations:
Ideas Instead of Excitement
Last year, I was still a bit new to Hadoop and to the terms and use cases of big data. As a first-time attendee, I was swept up in the excitement of the sessions and keynotes. There was so much that was new, but also so much that aligned with the traditional data warehouse / business intelligence work we have done for years. I drank from the firehose and tried to absorb as much technical information and terminology as I could.
This year, I came to the Summit with a far better understanding of the Hadoop ecosystem and the use cases where it can be applied, and with many customer conversations under my belt. I found I listened to the keynotes and speakers in a different way. Rather than just trying to understand the technologies and architectures, I could identify where the pieces fit for my customers and envision how various solutions would or would not work in their situations. This year's Summit really drove my idea machine into high gear.
New Themes this Year
Last year, the main theme was Data Lake: get your data into the lake, get it all there and get it there now! While the data lake is still being discussed and many organizations have adopted that strategy, it was a less central theme of this year’s Summit.
Here are five themes I heard resurface over and over.
1. Hadoop is for Data Science
This was by far the most prevalent theme. There were many presentations on data science using various Hadoop technologies. What I appreciated about a number of the presentations I attended was that they focused less on the specific technology and more on the business value. I saw many architecture diagrams and a little bit of code, but very few deep dives into the technology. The presenters were always willing to drill in as far as you wanted, but what I enjoyed was the business value conversation: understanding how the customer would benefit from the solution being shown. One session I particularly enjoyed was "IoT: How Data Science Driven Software is Eating the Connected World," presented by Sarah Aerni of Pivotal. She showed some outstanding use cases in which the drilling industry uses sensor data to predict key metrics, as well as equipment failures, from a small number of variables.
2. Hadoop is for IoT
The Internet of Things was a very big topic, as the session above suggests. Hadoop is all about collecting larger volumes of data from more sources than we have ever had access to before, which makes it an ideal platform for landing, shaping and analyzing device data. I saw several presentations on ingesting and analyzing data from many kinds of devices, including drilling equipment, medical equipment and smartphone apps. Tim Tully of Yahoo presented the company's Flurry technology during the keynote on the final day of the conference. The numbers were staggering: data collected from hundreds of thousands of apps, billions of transactions a day and almost 5,000 Hadoop nodes.
3. Hadoop Works in the Cloud
Last year, most people I talked to said Hadoop was an on-premises technology, too expensive and too big to run in the cloud. This year, I spent a lot of time talking with both Hortonworks and Microsoft, and both companies are fully embracing cloud implementations of Hadoop. Microsoft has HDInsight, a fully managed implementation of Hadoop as a service. Hortonworks has acquired SequenceIQ, the company behind Cloudbreak. This exciting technology lets you provision full Hadoop clusters on your cloud of choice (Azure, AWS, Google or OpenStack) with a few clicks. One of the biggest barriers to entry for companies that want to try Hadoop is provisioning hardware and configuring the cluster. Cloud services like these make those tasks child's play and let you get straight to work gaining insights and building business value.
4. Hadoop Needs Governance
Last year, one question I asked many people was, "How do you keep track of what's in the cluster?" I was somewhat dismayed at the answers I got. Much of the time, people looked down at the ground and said, "Well…"
If you’ve done any amount of data management at any organization, the prospect of landing all data of any type from any source might immediately conjure visions of your data lake taking on the aspect of a data swamp (landfill, quicksand, black hole…insert cataclysmic vision of an end state here).
This year, I was very happy to see governance discussed as a top priority. There were several sessions on governing data in Hadoop, and Hortonworks has announced the Atlas project, which aims to address data governance and lineage. It is still under development, and I can't wait to see more of it.
5. Business Value is Key
I touched on this above, but a key theme was business value. Last year, most of the sessions I attended were very technology oriented. Architects did deep dives on their solutions and talked at length about how they implemented them with various technologies. There was still plenty of that this year, but the themes in the keynotes and many of the sessions revolved around the data worker and their needs. The message was all about how we can drive business insight and change using data solutions. Geoffrey Moore delivered a keynote on the evolution of business systems from "Systems of Record" to "Systems of Engagement" to "Systems of Intelligence," and on how this evolution has changed the way businesses relate to their customers. I loved one analogy he made: "Do customers want to buy a drill, or do they really want to buy holes in metal?" The evolution of systems has allowed us to move away from selling the drills and toward selling the holes.
Hadoop is Here to Stay
Data is going to keep getting bigger. Pretty soon our houses, cars, refrigerators and beds (everything, really) are going to be connected. Data analytics is going to drive business to places we never thought possible. The Uber driver who took me to the airport after the conference asked me what Hadoop was, and I started talking with him about big data using Uber as an example. He got it immediately, told me about his idea for a startup and asked how to find the technical resources to help him realize his vision. Technologies like Hadoop and cloud infrastructure allow people like him to get started with less money and fewer resources than ever before, and to reach audiences that were out of reach not so long ago.
If you are interested in Hadoop, Modern Data Architecture and Big Data solutions, contact 3Cloud for a discovery call.