Online banking, shopping, advertising, stock exchanges: most services and businesses today need to store their data in computer systems. However, as the data collected from customers and business interactions grows in both volume and complexity, it becomes challenging to manage the large influx of data, store it, and analyze it to predict market behavior. Large data sets are not restricted to online shopping and banking; they are also an essential part of genome sequencing and clinical research. An effective platform for managing these massive amounts of data is necessary.
Hadoop, an open-source framework developed by computer scientists Mike Cafarella and Doug Cutting, came to the rescue. The basic idea of Hadoop is that instead of dealing with all the data at once in one place, it distributes the data and the calculations across different nodes in a cluster. By breaking an application into blocks, it can accomplish many tasks simultaneously and “handle thousands of terabytes of data”(1). Hadoop consists of two main parts: the Hadoop Distributed File System (HDFS) and a data processing framework known as MapReduce. Each has its own specific function: HDFS stores data that you can retrieve and analyze at any time, while MapReduce is responsible for processing that data.
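To make the split between blocks and processing more concrete, here is a minimal sketch of the MapReduce idea in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (split_into_blocks, map_block, shuffle, reduce_counts, word_count) and the block size are made up for illustration, and everything runs in one process rather than on a real cluster.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks, one per (imaginary) node."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    pairs = []
    for line in block:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_blocks):
    """Group the values emitted by all map tasks by their key."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    """Run the whole pipeline; on a cluster each block would be mapped in parallel."""
    blocks = split_into_blocks(lines, block_size)
    mapped = [map_block(block) for block in blocks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big", "data node", "big"]))
```

Because each block is mapped independently of the others, the map calls could in principle run on different machines at the same time, which is roughly what a real cluster does.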
What’s great about Hadoop is not only that it’s fast and efficient, but also that it is robust enough to keep running even when some nodes fail. This is essential for companies that want to avoid disastrous system failures and loss of data. If some nodes cease to function, their processing tasks are rapidly redirected to other working nodes, keeping the system operating as normal. Hadoop also automatically stores multiple copies of the data in case of hardware failure. Plus, this open-source framework is free and easily scalable: you can just add more nodes so your system can handle more data. Hadoop is thus a great solution for the Big Data problem.
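The replication and failover behavior described above can be illustrated with a small toy model. This is only a sketch under simple assumptions, not how Hadoop actually places replicas or schedules tasks: the round-robin placement, the node and block names, and the functions place_replicas and assign_tasks are all hypothetical. The default of three copies mirrors HDFS's default replication factor.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Store each block on `replication` distinct nodes (simple round-robin)."""
    copies = min(replication, len(nodes))  # cannot have more copies than nodes
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(copies)]
    return placement


def assign_tasks(placement, failed_nodes):
    """Send each block's task to the first node that still holds a live copy."""
    assignment = {}
    for block, replicas in placement.items():
        alive = [node for node in replicas if node not in failed_nodes]
        if not alive:
            raise RuntimeError(f"all replicas of {block} are lost")
        assignment[block] = alive[0]
    return assignment


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(["b0", "b1"], nodes)
    print(assign_tasks(placement, failed_nodes=set()))
    print(assign_tasks(placement, failed_nodes={"n1"}))
```

In this model, losing one node changes which machine does the work, but every block can still be read and processed as long as at least one of its copies survives.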
Works Cited
(1) http://searchcloudcomputing.techtarget.com/definition/Hadoop
(2) http://sites.gsu.edu/skondeti1/2015/10/17/hadoop-and-the-hype/ (Image)
