Wednesday, July 14, 2010

Book Review: Hadoop: The Definitive Guide

I took a vacation to Michigan two weeks ago, and a few days before leaving, I purchased a Barnes & Noble Nook.  This was the first full-length book I put on the Nook.

First of all, the Nook is a decent PDF reader if you are reading mostly narrative. However, you can't use the device to "zoom" into a page that has a lot of code, because its only "zoom" feature is a font-size selector, which causes code to reflow and become unreadable. I solved this little problem by cropping the PDF on my computer before loading it onto the Nook.

Now, the review: this book was great. The narrative was descriptive and not overly complex. Reading the book cover-to-cover (as it were, given it was a PDF on a Nook) without walking through any of the examples left me feeling like I could take on a small Hadoop project and know where to go to do it right the first time. Additionally, I gained a much richer understanding of distributed programming using MapReduce, as well as of some of the tools built on Hadoop. There was a chapter for each of the following tools: Pig, HBase, and ZooKeeper, plus a chapter on use cases that introduced Hive, Nutch, and Cascading using real-world examples from developers at well-known companies actually using Hadoop, such as Yahoo, Facebook, and Last.fm (CBS).

I recommend this book to anybody, from someone who needs an introduction to MapReduce to someone who wants to actually build a Hadoop cluster. Comprehending some of the material required my computer science background in distributed systems, networking, etc. (specifically the algebra of network topologies), as well as a good understanding of Java to read the MapReduce job illustrations.
