3 Ways to Load Data From HDFS to HBase | Optimized Global Delivery
Optimized Global Delivery Blog


Apache HBase™ is the Hadoop database: a distributed, scalable, big data store. If you are importing into a new table, you can bypass the HBase API and write your content directly to the filesystem, formatted into HBase data files (HFiles). Your import will run much faster.

There are several ways to load data from HDFS into HBase. I tried them out and have listed my process step by step below.

Environment:

Pseudo-Distributed Local Install

Hadoop 2.6.0

HBase 0.98.13

1. Using ImportTsv to load a text file into HBase

a) Create a table in HBase

command: create 'tab3','cf'


b) Upload simple1.txt to HDFS

command: bin/hadoop fs -copyFromLocal simple1.txt /user/hadoop/simple1.txt

The content of the file is:
1,tom
2,sam
3,jerry
4,marry
5,john

c) Use ImportTsv to load the file into HBase

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf tab3 /user/hadoop/simple1.txt
[Screenshot: ImportTsv execution result]

[Screenshot: data loaded into HBase]
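To make the column mapping in the command above concrete, here is a simplified Python model of what ImportTsv does with each input line. It is an illustrative sketch, not the ImportTsv source: the helper name `parse_line` is hypothetical, and it assumes (as I understand ImportTsv's behavior) that `HBASE_ROW_KEY` marks the field used as the row key and that a column spec without a colon, like `cf` here, means that column family with an empty qualifier.

```python
# Simplified, hypothetical model of how ImportTsv maps one input line to cells.
ROW_KEY = "HBASE_ROW_KEY"


def parse_line(line, separator, columns):
    """Map one delimited line to (row_key, [(family, qualifier, value), ...]).

    `columns` mirrors -Dimporttsv.columns, e.g. "HBASE_ROW_KEY,cf".
    A spec without ':' (such as "cf") is treated as family "cf" with an
    empty qualifier; "cf:name" means family "cf", qualifier "name".
    """
    specs = columns.split(",")
    fields = line.rstrip("\n").split(separator)
    # Simplification: require exactly one field per column spec.
    if len(fields) != len(specs):
        raise ValueError(f"expected {len(specs)} fields, got {len(fields)}: {line!r}")
    row_key = None
    cells = []
    for spec, value in zip(specs, fields):
        if spec == ROW_KEY:
            row_key = value
        else:
            # partition() splits on the first ':' only.
            family, _, qualifier = spec.partition(":")
            cells.append((family, qualifier, value))
    if row_key is None:
        raise ValueError("columns must include HBASE_ROW_KEY")
    return row_key, cells


if __name__ == "__main__":
    # The five lines of simple1.txt from the example above.
    for line in ["1,tom", "2,sam", "3,jerry", "4,marry", "5,john"]:
        print(parse_line(line, ",", "HBASE_ROW_KEY,cf"))
```

With the sample file, the line `1,tom` becomes row key `1` with a single cell in family `cf`, an empty qualifier and the value `tom`.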

2. Using ImportTsv to generate HFiles from a text file

a) Create a table in HBase

command: create 'hbase-tbl-003','cf'


b) Use ImportTsv to generate HFiles for the text file in HDFS

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp5 -Dimporttsv.columns=HBASE_ROW_KEY,cf hbase-tbl-003 /user/hadoop/simple1.txt

This command runs as a MapReduce job.


As a result, HFiles are generated in the output directory hfile_tmp5 (a relative path, so it resolves under the user's HDFS home directory).


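However, the data has not yet been loaded into the HBase table hbase-tbl-003: with importtsv.bulk.output set, ImportTsv only writes HFiles and does not write to the table.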

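HBase keeps the cells inside an HFile in sorted key order, so the bulk-output job has to sort the cells before writing them, and it writes one subdirectory per column family under the output directory (for this example, hfile_tmp5/cf/). The Python sketch below models only that grouping and sorting step. `group_sorted_by_family` is a hypothetical helper; it ignores timestamps and the per-region partitioning the real job also performs, and the extra row key `10` in the demo is hypothetical, added only to show byte ordering.

```python
from collections import defaultdict


def group_sorted_by_family(rows):
    """Group cells by column family and sort each group by (row key, qualifier).

    `rows` is an iterable of (row_key, [(family, qualifier, value), ...]).
    Keys are compared as UTF-8 bytes, i.e. HBase-style lexicographic byte
    order, so b"10" sorts before b"2". Timestamps are ignored here.
    """
    by_family = defaultdict(list)
    for row_key, cells in rows:
        for family, qualifier, value in cells:
            by_family[family].append(
                (row_key.encode("utf-8"), qualifier.encode("utf-8"), value)
            )
    # One sorted list per family, mirroring one HFile directory per family.
    return {
        family: sorted(family_cells, key=lambda cell: (cell[0], cell[1]))
        for family, family_cells in by_family.items()
    }


if __name__ == "__main__":
    rows = [
        ("2", [("cf", "", "sam")]),
        ("10", [("cf", "", "zed")]),  # hypothetical extra row
        ("1", [("cf", "", "tom")]),
    ]
    print(group_sorted_by_family(rows))
```

Because keys are compared byte by byte, a row key `10` sorts between `1` and `2`, which is worth keeping in mind when choosing row keys.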

3. Using completebulkload to load HFiles into HBase

command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003


[Screenshot: completebulkload result]

Note: When you run this command, Hadoop may fail to find the HBase dependency JAR files and throw a ClassNotFoundException. An easy way to resolve this is to copy the HBase dependency JAR files into ${HADOOP_HOME}/share/hadoop/common/lib. Do not forget htrace-core-*.jar.

This command is essentially an HDFS move (hdfs mv) and does not launch a MapReduce job. The log shows that the work is done by LoadIncrementalHFiles.
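Roughly, LoadIncrementalHFiles looks at the first and last key of each HFile, finds the region that contains the first key, and hands the file to that region. If a file's key range crosses into the next region, the file is split at the region boundary first. The Python sketch below models only that grouping decision under these assumptions. `assign_to_regions` and its inputs are hypothetical, and the real tool also handles the actual file moves and retries.

```python
import bisect


def assign_to_regions(hfiles, region_start_keys):
    """Decide which region each HFile belongs to.

    `hfiles` maps an HFile name to its (first_key, last_key) as bytes.
    `region_start_keys` is the sorted list of region start keys; the first
    region starts at b"" (the empty key).
    Returns (assignments, needs_split): assignments maps a region index to
    a list of HFile names; needs_split lists HFiles whose key range
    crosses into the next region and would have to be split first.
    """
    assignments = {}
    needs_split = []
    for name, (first_key, last_key) in sorted(hfiles.items()):
        # Region i covers [start_keys[i], start_keys[i + 1]).
        idx = bisect.bisect_right(region_start_keys, first_key) - 1
        has_next = idx + 1 < len(region_start_keys)
        if has_next and last_key >= region_start_keys[idx + 1]:
            needs_split.append(name)
        else:
            assignments.setdefault(idx, []).append(name)
    return assignments, needs_split


if __name__ == "__main__":
    # Hypothetical table with two regions: [b"", b"3") and [b"3", end).
    files = {"a": (b"1", b"2"), "b": (b"3", b"5"), "c": (b"2", b"4")}
    print(assign_to_regions(files, [b"", b"3"]))
```

A freshly created table such as hbase-tbl-003 has a single region starting at the empty key, so in this model every HFile maps to that region and nothing needs splitting, which fits the load behaving like a plain move.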

Reference: http://hbase.apache.org/book.html#arch.bulk.load
