For detailed understanding on the working and control flow of this example refer
Mapper Class - WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    // Hadoop Writable types, created once and reused for every token
    private static final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // take one line at a time and split it on whitespace
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // iterate over every word in the line and form a (word, 1) key-value pair
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            // write to the output context, which in turn passes the pair on to the reducer
            context.write(word, one);
        }
    }
}
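To see what the mapper emits without a cluster, the tokenizing loop can be tried in plain Java. This is only a sketch: `MapperSketch` and `mapLine` are hypothetical names, not part of the post or of Hadoop, and a tab-separated string stands in for the `(Text, IntWritable)` pair written to the context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MapperSketch {
    // Mirrors the loop in WordCountMapper.map(): one "word<TAB>1" pair per token.
    // Assumption: a plain String stands in for the (Text, IntWritable) pair.
    static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(tokenizer.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one pair per line, six in total for this input.
        for (String pair : mapLine("to be or not to be")) {
            System.out.println(pair);
        }
    }
}
```

Because `StringTokenizer` splits only on whitespace by default, punctuation stays attached to a token, so `Hadoop,` and `hadoop` would be counted as different words.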
Reducer Class - WordCountReducer.java
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    // reduce() receives one word together with all the counts the mappers emitted for it
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum = 0;
        // iterate over all the values for this key, add them together, and
        // emit the key with the sum of its values as the final result
        for (IntWritable value : values)
        {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
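The grouping that happens between the mapper and the reducer is done by the framework, so it does not appear in either class. The following plain-Java sketch imitates it to show what `reduce()` receives. `ReduceSketch`, `shuffle` and `wordCount` are hypothetical names, and a `TreeMap` is only an assumed stand-in for Hadoop's shuffle and sort, not how Hadoop implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class ReduceSketch {
    // Stand-in for the shuffle: collect each word's 1s under its key, sorted by key.
    static Map<String, List<Integer>> shuffle(String[] lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                grouped.computeIfAbsent(tokenizer.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Mirrors WordCountReducer.reduce(): add up every value seen for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the stand-in shuffle, then reduces each key's values to a single count.
    static Map<String, Integer> wordCount(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(lines).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(new String[] {"to be or not to be", "to do"}));
    }
}
```

Each call to `reduce` sees all the values for exactly one key, which is why the reducer in the post needs nothing more than a running sum.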
Driver Class - WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordCount extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        // fail early with a usage message instead of an ArrayIndexOutOfBoundsException
        if (args.length != 2)
        {
            System.err.println("Usage: WordCount <input path> <output path>");
            return -1;
        }
        // get the configuration object and set the job name
        Configuration conf = getConf();
        // on Hadoop 2.x and later this constructor is deprecated in favour of Job.getInstance(conf, name)
        Job job = new Job(conf, "Word Count hadoop-0.20");
        // set the class names
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // set the output data type classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // accept the HDFS input and output directories at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
Thanks for the article.
Thanks for the example. I would like to know the role of each class.
Thanks and regards
Hi Mehdi
As the names suggest:
- Mapper
- Reducer
- Driver
The driver is the trigger point. To learn more about the mapper and reducer, please have a look at http://kickstarthadoop.blogspot.com/2011/04/word-count-hadoop-map-reduce-example.html
For a detailed understanding, I'd recommend "Hadoop: The Definitive Guide" by Tom White.
How do I configure Hadoop using Java so that I can run this word count and other tasks?
As the name suggests
WordCountMapper.java - Mapper
WordCountReducer.java - Reducer
WordCount.java - Driver
How do you extend this functionality to an XML input file instead of a text file?
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at org.apache.hadoop.conf.Configuration.(Configuration.java:139)
at WordCount.main(WordCount.java:39)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 2 more
hadsko mindri
Hi Bijoy,
Excellent post.
Your blog (both the older and newer API versions) helped me understand the mapper and reducer jobs, especially the shuffling and sorting part, which I had not been able to grasp. Thanks for the post. Keep it up.