Tuesday, May 24, 2011
A few points to consider when choosing Hadoop to solve our problems
Update your Open Source Tools frequently
Open source technologies are contributed by developers across the globe, and they often undergo less rigorous testing before release than proprietary products such as Oracle or Informatica. Every tool in the Hadoop ecosystem improves day by day, with new patches applied and major releases rolled out frequently. It is highly recommended that we keep our work on the latest releases of these tools. With Hadoop we use a variety of tools such as Pig, Hive, HBase and Sqoop, so cluster management should include an effective mechanism that tracks the latest updates available for each piece of software and keeps the cluster upgraded to the new releases. This practice considerably reduces the development time wasted debugging issues that have already been fixed in later releases. Such an incremental approach to cluster management is essential for a technology like Hadoop, which is still not 100 percent stable.
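As a rough illustration of the tracking mechanism described above, here is a minimal Python sketch that compares the versions installed on a cluster with the latest known releases. Everything in it is an assumption made for illustration: the `INSTALLED` and `LATEST` dictionaries, the helper names and all the version numbers are made-up placeholders, and a real setup would read them from the cluster inventory and from each project's release announcements.

```python
# Hypothetical inventory: tool name -> version installed on the cluster.
# All version numbers here are made-up placeholders, not real release data.
INSTALLED = {"pig": "1.0.0", "hive": "2.1.0", "hbase": "0.9.3", "sqoop": "1.4.0"}

# Hypothetical list of the newest releases we know about.
LATEST = {"pig": "1.2.0", "hive": "2.1.0", "hbase": "0.10.0", "sqoop": "1.4.0"}


def parse_version(text):
    """Turn '0.10.0' into (0, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_outdated(installed, latest):
    """Return {tool: (installed_version, latest_version)} for stale tools."""
    outdated = {}
    for tool, current in installed.items():
        newest = latest.get(tool)
        if newest is not None and parse_version(newest) > parse_version(current):
            outdated[tool] = (current, newest)
    return outdated


if __name__ == "__main__":
    for tool, (current, newest) in sorted(find_outdated(INSTALLED, LATEST).items()):
        print(f"{tool}: installed {current}, latest {newest}")
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "0.9.3" would sort after "0.10.0" and the stale HBase entry would be missed.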
Choose your Open Source Tool wisely
This is a crucial choice for MapReduce developers. When we move to implementing a requirement, the choice of tools plays a vital role in the long-term existence and smooth running of the project. We should choose only open source software that has a big name associated with or supporting it, that is, one with a pool of developers constantly working on its improvement. This is a matter of high priority because in most scenarios an open source project is short-lived if it is not adopted by a major IT company. Hence, choose your tool wisely. If you need an interactive distributed database, what would you go for? HBase? Why HBase? Has it been adopted by any major IT company? These questions should be answered before you finalize your tool.
Minimize Custom MapReduce Code
A small question to you all: in the web era, if you want to develop a web-scale application, what would be your preferred language of development, assembly language or a high-level language? It would definitely be a high-level language. The same applies to Hadoop: for your application, always depend on high-level tools built over MapReduce, such as Pig and Hive. These tools are already highly optimized for performance and effective resource utilization. You should write custom MapReduce code only when it is unavoidable, because a lot of effort then has to go into fine-tuning it for performance and effective cluster utilization. It is hard for any MapReduce developer to take into account the configurable parameters in Hadoop, which number more than 300.
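To make the contrast concrete, here is a small sketch in plain Python that counts words twice: once with hand-written map, shuffle and reduce steps, and once with a single high-level call. It is only an analogy. It does not use the Hadoop API, Pig or Hive, and all function names are hypothetical. On a real cluster the hand-written version would also need the job configuration and tuning mentioned above.

```python
from collections import Counter, defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count_custom(lines):
    """Hand-written pipeline: every phase is our own code to maintain."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


def word_count_high_level(lines):
    """High-level version: the grouping and summing are done for us."""
    return dict(Counter(word for line in lines for word in line.lower().split()))
```

Both functions return the same counts for the same input. The difference is how much code we own: three phases to write, test and tune in the first case, and one declarative expression in the second, which is the same trade-off that Pig and Hive offer over custom MapReduce jobs.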