Custom Writables in Hadoop (the RIGHT way)

Ok, there are a few reasons why I am writing this post.
1/ Online resources on Hadoop are just freaking TERRIBLE. I’m sorry, you guys are pretty good coders, but why is your documentation absolute crap? Not everybody knows the “basics” of Hadoop, and not everybody is a superstar like you.

2/ See #1

SO, say you are writing a fairly complex Hadoop program where you need to pass a lot of information between the Mapper and the Reducer. For example, let’s say you had <key> <value> pairs like this:

     Key: Doc1  Value: Word1, 3
     Key: Doc2  Value: Word2, 10
     Key: Doc3  Value: Word3, 9

So I’m saying here that you have a document, and you are outputting a word that appears in it plus that word’s count. That second part is the value: yes, two things in one value. How would you normally do this? Glue them together into a single Text object, right?
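Just so we’re on the same page, that approach looks something like this (a quick sketch; docId, word, and count are names I made up):

    // In the mapper: glue both values into one Text with a tab
    context.write(docId, new Text(word + "\t" + count));

    // In the reducer: split the string apart and re-parse the int
    String[] parts = value.toString().split("\t");
    String word = parts[0];
    int count = Integer.parseInt(parts[1]);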

Ok, this can be really inefficient when your values involve insane amounts of data, because everything gets serialized to strings and then parsed back apart on the other side. So dude, what are you doing?!? Implement Hadoop’s Writable interface to increase efficiency instead of passing Text objects everywhere!

Here, I have a custom Writable that stores a Text (the word) and an IntWritable (the count) for me:

package lincs.drexel.edu;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class WordCountWritable implements Writable {

    private Text word;         // stores a word
    private IntWritable count; // stores the count

    // No-arg constructor. YOU NEED THIS so Hadoop can create the writable via reflection.
    public WordCountWritable() {
        setWord("");
        setCount(0);
    }

    public WordCountWritable(String word, String count) {
        setWord(word);
        setCount(Integer.parseInt(count));
    }

    public WordCountWritable(String word, int count) {
        setWord(word);
        setCount(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize the fields in the SAME order they were written
        word.readFields(in);
        count.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order
        word.write(out);
        count.write(out);
    }

    public int getCount() {
        return count.get();
    }

    public void setCount(int count) {
        this.count = new IntWritable(count);
    }

    public String getWord() {
        return word.toString();
    }

    public void setWord(String word) {
        this.word = new Text(word);
    }
}

Ok, now, the methods you see in the code are universal to all custom writables. First you need your private fields (the info you want to store); make sure they are themselves Hadoop writables. Next, MAKE SURE TO INCLUDE AN EMPTY CONSTRUCTOR!!
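Why the empty constructor? Because Hadoop never calls your nice constructors. When it deserializes your data it creates a blank instance through reflection and then fills it in, roughly like this (my rough sketch of what happens under the hood, not the literal framework code):

    // Roughly what the framework does when reading your value back in:
    WordCountWritable value = ReflectionUtils.newInstance(WordCountWritable.class, conf);
    value.readFields(in); // no empty constructor = no blank instance = crash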

Next, make sure to define the readFields() and write() methods (note: the interface method is write(), not writeFields()). They are standard: write() serializes your fields in a fixed order, and readFields() reads them back in the same order. Cool. This is your custom writable class, so when you are outputting stuff from the Mapper, you can simply do

context.write(whatever_your_key_is, your_custom_writable_object); // I use context because I’m using the Hadoop 0.20 API
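To make that concrete, here is a minimal Mapper sketch. The input parsing is made up (I’m pretending each input line looks like docId<TAB>word<TAB>count), so swap in whatever your data actually looks like:

    public static class WordMapper
            extends Mapper<LongWritable, Text, IntWritable, WordCountWritable> {

        @Override
        public void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Pretend each line looks like: docId<TAB>word<TAB>count
            String[] fields = line.toString().split("\t");
            IntWritable docId = new IntWritable(Integer.parseInt(fields[0]));
            // The value is our custom writable: a word AND a count, no string glue
            context.write(docId, new WordCountWritable(fields[1], fields[2]));
        }
    }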

Cool.
Now, this is the reason why the other resources online are crap. THEY NEVER TELL YOU WHAT TO DO WITH YOUR CUSTOM WRITABLE!
HOW DO YOU USE IT???

No worries, I will show you. The writable we created was a WordCountWritable, right? So we will probably use it in our mapper to output the value. (When using a writable as a key, there’s a bit more work: keys must implement WritableComparable, and you should also override hashCode() and equals(), because the shuffle sorts and partitions by key.)
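(For completeness, here is a sketch of what changes for key duty. Implement WritableComparable instead of plain Writable, and add compareTo() plus hashCode()/equals(). The sort order below, word first and then count, is just my own choice:)

    // Sketch: same class as above, but key-capable.
    // Fields, constructors, write() and readFields() stay exactly the same.
    public class WordCountWritable implements WritableComparable<WordCountWritable> {

        // ... same fields and methods as before ...

        @Override
        public int compareTo(WordCountWritable other) {
            int cmp = getWord().compareTo(other.getWord()); // sort by word first
            return cmp != 0 ? cmp : Integer.compare(getCount(), other.getCount());
        }

        @Override
        public int hashCode() {
            // the default HashPartitioner uses this to route keys to reducers
            return 31 * getWord().hashCode() + getCount();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WordCountWritable)) return false;
            WordCountWritable other = (WordCountWritable) o;
            return getWord().equals(other.getWord()) && getCount() == other.getCount();
        }
    }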

Ok, so our Mapper should output a WordCountWritable value. How do we do this?

Here is how:

    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(WordCountWritable.class);

Easy right? Make sure to put THAT in your driver, the main program which configures and runs your mapper and reducer.
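For context, here’s roughly what the whole driver could look like. Treat it as a sketch: WordCount (the driver class), WordReducer (sketched below), the job name, and the final IntWritable/Text output types are all placeholder choices of mine:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count writable"); // 0.20-style constructor

        job.setJarByClass(WordCount.class);     // placeholder driver class
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class); // placeholder reducer, sketched below

        // The two lines this post is about: the mapper's output types
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(WordCountWritable.class);

        // Final (reducer) output types -- my own choice here
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }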

You also have to specify the I/O types of the Mapper and Reducer just like you always do in a standard Hadoop program:

    public static class WordMapper extends Mapper<LongWritable, Text, IntWritable, WordCountWritable>
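And on the reducer side, you declare WordCountWritable as the input value type and pull your data back out with the typed getters. Another sketch (what you actually emit is your call):

    public static class WordReducer
            extends Reducer<IntWritable, WordCountWritable, IntWritable, Text> {

        @Override
        public void reduce(IntWritable docId, Iterable<WordCountWritable> values, Context context)
                throws IOException, InterruptedException {
            for (WordCountWritable wc : values) {
                // Real getters -- no string splitting required
                context.write(docId, new Text(wc.getWord() + "=" + wc.getCount()));
            }
        }
    }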

Now that wasn’t so bad, was it?
Was that SO HARD TO SAY in some of the guides? I only found this out through a gracious Stack Overflow question, where the answer was BURIED deep in the thread.

I hope you now understand how to make your own custom Writable, and hopefully I saved you a little trouble.

I am a beginner at Hadoop myself, but feel free to ask me questions in the comments section and I will try my best to answer them.
