
3 Ways to Load Data From HDFS to HBase

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. If you are importing into a new table, you can bypass the HBase API and write your content directly to the filesystem, formatted into HBase data files (HFiles). Your import will run much faster.

There are several ways to load data from HDFS into HBase. I practiced loading data from HDFS into HBase and have listed my process step by step below.

Environment:

Pseudo-Distributed Local Install

Hadoop 2.6.0

HBase 0.98.13

1. Using ImportTsv to load a txt file into HBase

a) Creating a table in HBase

command: create 'tab3','cf'

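To confirm the table was created, you can describe it in the HBase shell:

command: describe 'tab3'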

b) Uploading simple1.txt to HDFS

command: bin/hadoop fs -copyFromLocal simple1.txt /user/hadoop/simple1.txt

The content of the txt file is:
1,tom
2,sam
3,jerry
4,marry
5,john
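To verify the upload, you can cat the file back from HDFS:

command: bin/hadoop fs -cat /user/hadoop/simple1.txt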

c) Using ImportTsv to load the txt file into HBase

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf tab3 /user/hadoop/simple1.txt
ImportTsv runs as a MapReduce job.


The data is now loaded into the HBase table tab3.

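To verify, scan the table in the HBase shell; with the input above you should get five rows keyed 1 through 5, each with the name stored under column family cf:

command: scan 'tab3'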

2. Using ImportTsv to generate HFiles from the txt file

a) Creating a table in HBase

command: create 'hbase-tbl-003','cf'


b) Using ImportTsv to generate HFiles for the txt file in HDFS

command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp5 -Dimporttsv.columns=HBASE_ROW_KEY,cf hbase-tbl-003 /user/hadoop/simple1.txt

This command also runs as a MapReduce job.


As a result, the HFiles are generated under the output directory hfile_tmp5.

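You can inspect the output with a recursive HDFS listing; hfile_tmp5 should contain one subdirectory per column family (here cf) holding the generated HFiles:

command: bin/hadoop fs -ls -R hfile_tmp5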

However, the data has not yet been loaded into the HBase table hbase-tbl-003.

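A scan at this point comes back empty, because -Dimporttsv.bulk.output only writes HFiles and never touches the table itself:

command: scan 'hbase-tbl-003'

The shell should report 0 row(s).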

3. Using completebulkload to load the HFiles into HBase

command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003


Result: the HFiles in hfile_tmp5 are loaded into hbase-tbl-003.

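Scanning hbase-tbl-003 again now returns the five rows from simple1.txt:

command: scan 'hbase-tbl-003'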

Note: When we execute this command, Hadoop may not be able to find the HBase dependency jars and will throw a ClassNotFoundException. An easy way to resolve this is to copy the HBase dependency jars into ${HADOOP_HOME}/share/hadoop/common/lib; in particular, do not forget htrace-core-*.jar.
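Alternatively, instead of copying jars, you can put HBase's full classpath onto Hadoop's classpath for this one command (a common approach; it assumes you run from the HBase install directory):

command: export HADOOP_CLASSPATH=`bin/hbase classpath`
command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003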

Under the hood, this command simply moves the HFiles within HDFS rather than running a MapReduce job; you can see from the log that it invokes LoadIncrementalHFiles.
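Since LoadIncrementalHFiles does the actual work, you can also invoke that tool directly through the hbase launcher, which sets up the classpath for you:

command: bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hfile_tmp5 hbase-tbl-003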

Reference: http://hbase.apache.org/book.html#arch.bulk.load
