
How to Load Data to HBase – doBulkLoad

We can use TableMapReduceUtil.initTableReducerJob or the Put method in the HBase API, but we can also use doBulkLoad to load data into HBase. In a previous post, I introduced using the importtsv and completebulkload HBase shell commands to load data into HBase. In this post, I will show how to implement bulk loading in Java.

Advantages of each approach:

Put method in the HBase API: easy to understand, and well suited to loading small amounts of data (a minimal sketch follows this list).

TableMapReduceUtil.initTableReducerJob: writes to HBase from the reducer of a MapReduce job, so it is the better choice when the data needs complex processing first.

doBulkLoad: writes HFiles directly instead of going through the normal write path, which relieves pressure on HBase and is the better way to load huge amounts of data.
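
For comparison, here is a minimal sketch of the Put approach (the row key and values are made up for illustration; the HBase 0.98 client API is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "scores");

        // One Put per row; fine for small volumes, but every cell goes
        // through the full write path (WAL + memstore), unlike doBulkLoad.
        Put put = new Put(Bytes.toBytes("Tom")); // made-up row key
        put.add(Bytes.toBytes("course"), Bytes.toBytes("math"),
                Bytes.toBytes("90"));
        table.put(put);
        table.close();
    }
}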

Below is my test:

Environment:

Pseudo-Distributed Local Install

Hadoop 2.6.0

HBase 0.98.13

  1. Create the table in the HBase shell: create 'scores','grade','course' (a table named scores with the column families grade and course).
  2. Use the Java code below to convert the input file on HDFS into HFiles (a sample input line is shown after this list).
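
The mapper in HFileGenerator.java splits each line on a single space, so the input file must have the form "rowkey qualifier value". A made-up sample line of scores.txt:

Tom math 90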

HFileGenerator.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
import org.apache.hadoop.hbase.mapreduce.SimpleTotalOrderPartitioner;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileGenerator {

    public static class HFileMapper extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is "rowkey qualifier value", space separated.
            String line = value.toString();
            String family = "course";
            String[] items = line.split(" ");

            ImmutableBytesWritable rowkey = new ImmutableBytesWritable(
                    Bytes.toBytes(items[0]));
            KeyValue kv = new KeyValue(Bytes.toBytes(items[0]),
                    Bytes.toBytes(family), Bytes.toBytes(items[1]),
                    Bytes.toBytes(items[2]));
            context.write(rowkey, kv);
        }
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");

        Job job = new Job(conf, "HFile bulk load test");
        job.setJarByClass(HFileGenerator.class);

        job.setMapperClass(HFileMapper.class);
        job.setReducerClass(KeyValueSortReducer.class);

        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setOutputFormatClass(HFileOutputFormat.class);
        job.setPartitionerClass(SimpleTotalOrderPartitioner.class);

        // configureIncrementalLoad inspects the table's region boundaries
        // so the generated HFiles line up with the existing regions; it
        // also (re)sets the reducer, partitioner, and output format.
        String tableName = "scores";
        HTable table = new HTable(conf, tableName);
        HFileOutputFormat.configureIncrementalLoad(job, table);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
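
To run the generator (the jar name here is made up), pass the HDFS input file and an output directory that does not yet exist; the output path below matches the one HFileLoader reads from:

hadoop jar hfile-generator.jar HFileGenerator /scores.txt /join_out1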

 

HFileLoader.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFileLoader {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");

        String tableName = "scores";
        HTable table = new HTable(conf, tableName);

        // doBulkLoad hands the generated HFiles directly to the region
        // servers; the data becomes visible without going through the
        // normal write path (WAL + memstore).
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("hdfs://10.2.7.52:9000/join_out1"), table);
    }
}
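
To confirm the load worked, a quick scan of the table prints every visible row (a minimal sketch using the standard client API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanVerify {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "10.2.7.52");
        HTable table = new HTable(conf, "scores");

        // Print every row the bulk load made visible.
        ResultScanner scanner = table.getScanner(new Scan());
        for (Result result : scanner) {
            System.out.println(result);
        }
        scanner.close();
        table.close();
    }
}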

 

Below is the test result:

First, load scores.txt. [screenshots 01, 02]

Then, load scores1.txt. [screenshots 03, 04]

 

