Big Data is a term that’s been buzzing around a lot lately, and its use has grown steadily over the past few years. It’s quite likely you’ve also encountered the term Hadoop when hearing about Big Data. Apache Hadoop is an open-source framework for storing and processing extremely large amounts of data across clusters of machines. In the last 2–3 years, many big players in the industry, including Intel, Microsoft, IBM, and EMC, have developed their own Hadoop distributions. Startups that focus only on Hadoop, such as Cloudera and Hortonworks, have grown into big players, too.
The beauty of Hadoop distributions is that they can be customized with a wide range of feature sets that address the specific needs of different sets of users. When choosing where to spend its money, a company must find a distribution that’s flexible enough to meet both its current and future needs.
Various user groups requiring Hadoop, each with its own diverse needs, include:
- Upper management in large companies wanting to adopt Big Data solutions.
- Developers wanting to build tools for the Hadoop Ecosystem.
- Newbies learning Hadoop for the first time and wanting a temporary or non-serious Hadoop deployment.
For these and other types of users, we’ve studied, analyzed, and thoroughly compared the following distributions:
- Intel Distribution for Apache Hadoop (IDH)
- Cloudera Distribution Including Apache Hadoop (CDH)
- Hortonworks Data Platform (HDP), and
In this paper, we’ll share our experiences with each distribution and provide both objective and subjective assessments of its features and performance. Our aim is to help you select the distribution that best serves your specific requirements.