Every new technology moves through a cycle of introduction, awareness, evaluation, and (if the technology proves relevant) eventual mainstream acceptance and adoption. Gartner has even coined a term for it: the hype cycle.
Hadoop’s Meteoric Rise
Hadoop is working through its acceptance cycle too. What has been striking, though, is how quickly it has moved through that cycle. Its progress in finding effective use cases and proving its worth has been nothing short of stunning. Its momentum is, as Forrester calls it in its Q1 2014 Forrester Wave analysis (downloadable from HortonWorks), "Unstoppable".
As part of our Strategy & Architecture Services, BlueGranite frequently consults with key clients to understand which technologies are on their radar and which are most relevant to them. Less than two years ago, when we spoke with clients about Hadoop, most viewed it as a curiosity, or as an over-hyped technology that would soon be exposed as irrelevant to "real" enterprise customers.
How things have changed! Today about half of our enterprise customers have deployed Hadoop, some with the help of our design and implementation teams, others on their own. For the majority, deployments are still within the scope of evaluation and organizational learning. But we're starting to see production deployments for new data processing and analytical use cases, and the results are compelling.
Why Hadoop will evolve much faster than Linux
Just a few years ago, the best thing Hadoop had going for it was an open source distribution model that brought large-scale data processing capabilities without licensing costs or exotic hardware. At that time, though, the trade-off for this low-cost platform was giving up the comfort of enterprise-grade vendor support and product stability.
I clearly recall, not long ago, listening to Hadoop thought leaders suggest that only large-scale tech companies could support Hadoop, because only they could hire the Stanford PhDs required to patch and recompile the Apache code. With the rock-solid backstop of BlueGranite's Hadoop distribution partners, HortonWorks and Microsoft, that line of thinking would be laughable today. The vast majority of enterprise customers will only consider Hadoop as vendor-provided software that is installed, not open source software that is compiled on-site.
The situation was not unlike the early days of Linux, when using the free operating system meant adding staff who could self-support low-level operating system malfunctions. But Linux matured, and distribution vendors like Red Hat and SuSE made it a supportable platform that enterprise customers could trust. A healthy ecosystem of systems integrators and consultants developed to fill knowledge gaps, further reducing the risk and cost of Linux implementations for enterprises.
The same is happening with Hadoop, except that this time a successful distribution and support model, pulled directly from the playbook of the Linux vendors and free distributions, was adopted quickly. Add to that an unprecedented level of venture capital and corporate sponsorship funding the platform's development, plus a sometimes mystifying level of collaboration among competitors on the technology itself, and the result is progress at a breakneck pace. Today Hadoop has already evolved into a data platform whose capabilities, in many ways, rival the world's best proprietary data platforms.
Is Hadoop for Everyone?
Whether Hadoop is for everyone is certainly a loaded question. Of course it isn't! No single technology is a silver bullet for everyone. There are organizations that will never need Hadoop, just as there are those that will never need an intranet portal or a cloud strategy.
But for most enterprises (and midsize companies that have all the trappings of enterprises but at a smaller scale), the answer is “Yes”.
Hadoop will soon be about much more than Gartner's three V's model (Volume, Velocity, and Variety of data). It's evolving into a data processing platform that will be better and less expensive than what most enterprises use today. YARN gives it resource management (control over the shared computing power allocated to various workloads). Hive, Impala, and Spark are beginning to turbocharge SQL performance against data stored in Hadoop. And the bottom line: TCO. Hadoop provides Saks Fifth Avenue-level MPP (Massively Parallel Processing) technology at WalMart prices anyone can afford.
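To make the SQL-on-Hadoop point concrete, here is a minimal PySpark sketch of the kind of query those engines run against data sitting in HDFS. The table and column names (web_clicks, event_time) are hypothetical, and it assumes a cluster or sandbox with Spark and a Hive metastore already configured:

```python
from pyspark.sql import SparkSession

# Start a Spark session with Hive support so SQL can run against tables
# whose files live in HDFS (assumes a configured Hive metastore).
spark = (SparkSession.builder
         .appName("sql-on-hadoop-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive-managed table "web_clicks" with an "event_time" column.
daily_clicks = spark.sql("""
    SELECT to_date(event_time) AS event_date, COUNT(*) AS clicks
    FROM web_clicks
    GROUP BY to_date(event_time)
    ORDER BY event_date
""")

daily_clicks.show()
```

The query itself is ordinary SQL; what changes is that the work is spread across however many nodes the cluster has, which is where the MPP economics come in.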
Even deploying Hadoop at small scale to address specific workloads is compelling. For example, I recently posted an article on my personal blog showing how to use Hadoop to collect Twitter data containing a set of keywords. Sure, you can probably figure out how to do this in Excel. But how scalable is that? It’s not. Try it yourself: it takes maybe a half hour to get up and running on a Hadoop cluster. Don’t have one? Use a free, pre-configured, single-server Sandbox VM. Value? High. Cost? Free. Easy? Yes. Stanford PhD required? Definitely not.
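As a rough illustration (not the pipeline from that post), here is what filtering already-collected tweets for keywords might look like in PySpark. The HDFS path, the "text" and "id" field names, and the keyword list are all hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower

spark = SparkSession.builder.appName("tweet-keyword-sketch").getOrCreate()

# Hypothetical location where raw JSON tweets have already landed in HDFS.
tweets = spark.read.json("hdfs:///data/tweets/")

# Keep tweets whose text mentions any of the keywords of interest.
keywords = ["hadoop", "hive", "spark"]
matches = tweets.filter(lower(col("text")).rlike("|".join(keywords)))

matches.select("id", "text").show(truncate=False)
```

The same few lines run unchanged on a single-node sandbox or a multi-node cluster, which is exactly the scalability gap the Excel comparison is getting at.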
We live in interesting times
In technology, every few years we seem to be on the cusp of a pivotal moment. Moving from inflexible, not-very-scalable ISAM databases to RDBMS technologies was pivotal. Moving from Client/Server to multi-tiered software architectures was pivotal. Moving to the cloud is definitely pivotal.
Putting massively parallel processing (MPP) technology in the hands of every organization will prove equally pivotal. It is no less significant than the way the Internet leveled the playing field, giving small indie developers the same marketing opportunities as Fortune 1000 companies.
Yes, we live in very interesting times.