This blog is intended to give budding MapReduce developers a head start in developing Hadoop-based applications. It covers development tips and tricks for Hadoop MapReduce programming, tools that use MapReduce under the hood, and some practical applications of Hadoop built with these tools. Most of the code samples provided here have been tested on a Hadoop environment, but do let me know if you find any that don't work.
Thursday, April 28, 2011
How to run Hadoop MapReduce jobs without a cluster? With the Cloudera VM.
This document is intended to help Java developers kick-start practical investigation of Hadoop MapReduce jobs without setting up any cluster on their end. To follow it you need basic theoretical knowledge of Hadoop, HDFS, and MapReduce jobs. Some prior knowledge of basic Linux commands is also advisable.
It is possible to try sample MapReduce jobs on your Windows PC without any cumbersome Hadoop setup if you have the Cloudera training VM. This environment is ideal for functionality testing on very small sample data volumes; larger input data won't work due to the memory constraints posed by the VM.
To test your sample MapReduce job on a local Hadoop environment (the Cloudera training VM), follow the steps below in order.
1. Download the following onto your Windows PC: VMware Player and the Cloudera Training VM (an Ubuntu image).
2. Install VMware Player. Extract the Cloudera Training VM archive; inside it you will find a *.vmx file. Open it (you will notice that it opens in VMware Player).
User credentials for the Cloudera VM:
Username: training
3. Copy the jar and the required input files into the Cloudera Linux box.
Here I have copied the jar to home -> training -> use-case -> source-code and
the input files to home -> training -> use-case -> Input.
(To browse to a folder in the Linux box, click the Places link on the top menu bar and select Home Folder. You will now be in the /home/training folder of your local Linux file system, from where you can browse and create directories just as you do in Windows.)
You can copy and paste files from your Windows file system to the Cloudera Linux box just as you do between folders in Windows.
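If you don't yet have a job jar of your own to test with, the classic word-count job sketched below is a good starting point; the class name WordCount is just illustrative, and it uses the org.apache.hadoop.mapreduce (new) API, which the Hadoop 0.20 release on the Cloudera training VM should support. Compile it against the Hadoop jars and package it into a jar with your IDE or javac/jar.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in each input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums all the 1s emitted for a given word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count"); // Job.getInstance(conf) on Hadoop 2+
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner reuses the reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir (must not already exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}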
4. Open a Linux terminal (a terminal icon is available on the Cloudera Linux box desktop).
5. Create an input directory in your HDFS:
hadoop fs -mkdir <full path of directory in hdfs>
hadoop fs -mkdir /userdata/bejoy/input
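To confirm that the directory was created, you can list its parent (using the example path above):
hadoop fs -ls /userdata/bejoy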
6. Copy the contents of the input folder on the Linux box to the HDFS input folder:
hadoop fs -copyFromLocal <source directory from local linux box> <destination directory in hdfs>
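With the example folders used above, the command would look like:
hadoop fs -copyFromLocal /home/training/use-case/Input/* /userdata/bejoy/input
You can then re-run the hadoop fs -ls command from step 5 to confirm the files landed in HDFS.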