We can load data into HBase with TableMapReduceUtil.initTableReducerJob or with the Put method in the HBase client API, but we can also use doBulkLoad. In a previous post, I introduced loading data into HBase with the importtsv and completebulkload HBase shell commands. In this post, I will show how to implement bulk loading in Java.
Advantages of each approach:
Put method in the HBase API: easy to understand, and good enough for small amounts of data (see the sketch after this list).
TableMapReduceUtil.initTableReducerJob: writes to HBase directly from the reduce phase of a MapReduce job; better when the data needs complex processing logic first.
doBulkLoad: bypasses the normal write path, which relieves pressure on the region servers; the best way to load huge amounts of data.
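For reference, here is a minimal sketch of the Put approach, assuming the 'scores' table and 'course' family used later in this post; the row key and values are made up:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "scores");
        // One cell: row "Tom", family "course", qualifier "math", value "89".
        Put put = new Put(Bytes.toBytes("Tom"));
        put.add(Bytes.toBytes("course"), Bytes.toBytes("math"), Bytes.toBytes("89"));
        table.put(put); // goes through the normal write path (WAL + memstore)
        table.close();
    }
}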
Below are my test environment and steps:
Environment:
Pseudo-Distributed Local Install
Hadoop 2.6.0
HBase 0.98.13
- Create the table in the HBase shell: create 'scores', 'grade', 'course' (a 'scores' table with the column families 'grade' and 'course').
- Use the Java code below to turn the input file on HDFS into HFiles.
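The mapper below expects each input line to be space-separated as rowkey qualifier value; a hypothetical scores.txt might look like this:

Tom math 89
Tom english 95
Jerry math 75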
HFileGenerator.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
import org.apache.hadoop.hbase.mapreduce.SimpleTotalOrderPartitioner;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileGenerator {

    // Turns each input line ("rowkey qualifier value") into a KeyValue
    // in the 'course' column family.
    public static class HFileMapper extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String family = "course";
            String[] items = line.split(" ");
            ImmutableBytesWritable rowkey = new ImmutableBytesWritable(
                    Bytes.toBytes(items[0]));
            KeyValue kv = new KeyValue(Bytes.toBytes(items[0]),
                    Bytes.toBytes(family), Bytes.toBytes(items[1]),
                    Bytes.toBytes(items[2]));
            context.write(rowkey, kv);
        }
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");

        Job job = new Job(conf, "HFile bulk load test");
        job.setJarByClass(HFileGenerator.class);
        job.setMapperClass(HFileMapper.class);
        job.setReducerClass(KeyValueSortReducer.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setOutputFormatClass(HFileOutputFormat.class);
        job.setPartitionerClass(SimpleTotalOrderPartitioner.class);

        String tableName = "scores";
        HTable table = new HTable(conf, tableName);
        // configureIncrementalLoad inspects the table's region boundaries and
        // (re)configures the reducer, partitioner and output format so the
        // generated HFiles line up with the existing regions.
        HFileOutputFormat.configureIncrementalLoad(job, table);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
HFileLoader.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFileLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");

        String tableName = "scores";
        HTable table = new HTable(conf, tableName);
        // doBulkLoad moves (rather than copies) the generated HFiles into the
        // table's region directories, so the source directory is consumed.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("hdfs://10.2.7.52:9000/join_out1"), table);
    }
}
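To confirm that the rows actually landed, a quick scan through the same 0.98-era client API works; this is a minimal sketch, reusing the connection settings above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanVerifier {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");
        HTable table = new HTable(conf, "scores");
        ResultScanner scanner = table.getScanner(new Scan());
        for (Result result : scanner) {
            System.out.println(result); // row key plus all cells
        }
        scanner.close();
        table.close();
    }
}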
Below is the test result:
First, load scores.txt.
Then load scores1.txt.