
3 Ways to Load Data From HDFS to HBase

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. If you are importing into a new table, you can bypass the HBase API and write your content directly to the filesystem, formatted into HBase data files (HFiles). Your import will run much faster.

There are several ways to load data from HDFS into HBase. I practiced loading data from HDFS into HBase and have listed my process step by step below.

Environment:

Pseudo-Distributed Local Install

Hadoop 2.6.0

HBase 0.98.13

1. Using ImportTsv to load a txt file into HBase

a) Creating a table in HBase

command: create 'tab3','cf'

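To confirm the table was created, you can describe it in the HBase shell:

command: describe 'tab3'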

b) Uploading simple1.txt to HDFS

command: bin/hadoop fs -copyFromLocal simple1.txt /user/hadoop/simple1.txt

The content of the txt file is:
1,tom
2,sam
3,jerry
4,marry
5,john
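To verify the upload, you can cat the file back from HDFS:

command: bin/hadoop fs -cat /user/hadoop/simple1.txt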

c) Using ImportTsv to load the txt file into HBase

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf tab3 /user/hadoop/simple1.txt
ImportTsv runs as a MapReduce job.


The data is now loaded into the HBase table tab3.

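To verify, scan the table in the HBase shell; with the input above you should get five rows keyed 1 through 5, each with the name stored under column family cf:

command: scan 'tab3'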

2. Using ImportTsv to generate HFiles from the txt file

a) Creating a table in HBase

command: create 'hbase-tbl-003','cf'


b) Using ImportTsv to generate HFiles for the txt file in HDFS

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp5 -Dimporttsv.columns=HBASE_ROW_KEY,cf hbase-tbl-003 /user/hadoop/simple1.txt

This command also runs as a MapReduce job.


As a result, the HFiles are generated under the output directory hfile_tmp5.

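You can inspect the output with a recursive HDFS listing; hfile_tmp5 should contain one subdirectory per column family (here cf) holding the generated HFiles:

command: bin/hadoop fs -ls -R hfile_tmp5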

However, the data has not yet been loaded into the HBase table hbase-tbl-003.

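A scan at this point comes back empty, because -Dimporttsv.bulk.output only writes HFiles and never touches the table itself:

command: scan 'hbase-tbl-003'

The shell should report 0 row(s).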

3. Using completebulkload to load the HFiles into HBase

command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003


Result: the HFiles in hfile_tmp5 are loaded into hbase-tbl-003.

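Scanning hbase-tbl-003 again now returns the five rows from simple1.txt:

command: scan 'hbase-tbl-003'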

Note: When we execute this command, Hadoop may not be able to find the HBase dependency jars and will throw a ClassNotFoundException. An easy way to resolve this is to copy the HBase dependency jars into ${HADOOP_HOME}/share/hadoop/common/lib; in particular, do not forget htrace-core-*.jar.
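Alternatively, instead of copying jars, you can put HBase's full classpath onto Hadoop's classpath for this one command (a common approach; it assumes you run from the HBase install directory):

command: export HADOOP_CLASSPATH=`bin/hbase classpath`
command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003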

Under the hood, this command simply moves the HFiles within HDFS rather than running a MapReduce job; you can see from the log that it invokes LoadIncrementalHFiles.
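Since LoadIncrementalHFiles does the actual work, you can also invoke that tool directly through the hbase launcher, which sets up the classpath for you:

command: bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hfile_tmp5 hbase-tbl-003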

Reference: http://hbase.apache.org/book.html#arch.bulk.load
