Apache HBase™ is the Hadoop database: a distributed, scalable, big data store. If you are importing into a new table, you can bypass the HBase API and write your content directly to the filesystem, formatted into HBase data files (HFiles). Your import will run much faster.
There are several ways to load data from HDFS into HBase. I tried loading data from HDFS into HBase and recorded my process step by step below.
Environment:
Pseudo-Distributed Local Install
Hadoop 2.6.0
HBase 0.98.13
1. Using ImportTsv to load a txt file into HBase
a) Create a table in HBase
command:create 'tab3','cf'
b) Uploading simple1.txt to HDFS
command:bin/hadoop fs -copyFromLocal simple1.txt /user/hadoop/simple1.txt
The content of the txt file is:
1,tom
2,sam
3,jerry
4,marry
5,john
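The sample file can be prepared and checked locally before copying it into HDFS; a minimal sketch (the filename and HDFS path follow the commands above; the upload itself requires a running Hadoop install, so it is shown commented out):

```shell
# Create the sample input locally; each line is rowkey,value
cat > simple1.txt <<'EOF'
1,tom
2,sam
3,jerry
4,marry
5,john
EOF

# Quick sanity check before uploading
wc -l simple1.txt        # should report 5 lines
head -n 1 simple1.txt    # should print "1,tom"

# Upload to HDFS (run from the Hadoop install directory)
# bin/hadoop fs -copyFromLocal simple1.txt /user/hadoop/simple1.txt
```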
c) Using ImportTsv to load txt to HBase
command:bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf tab3 /user/hadoop/simple1.txt
ImportTsv execution result:
Data loaded into HBase:
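To confirm the load from the HBase shell, scanning the table should show one row per input line (a sketch; with the column spec above, the values land in family cf):

```
bin/hbase shell
hbase> scan 'tab3'
hbase> count 'tab3'
```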
2. Using ImportTsv to generate HFiles (for completebulkload)
a) Create a table in HBase
command:create 'hbase-tbl-003','cf'
b) Using ImportTsv to generate HFile for txt in HDFS
command:bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=hfile_tmp5 -Dimporttsv.columns=HBASE_ROW_KEY,cf hbase-tbl-003 /user/hadoop/simple1.txt
This command runs as a MapReduce job:
As a result, the HFile output directory hfile_tmp5 is generated.
However, the data is not yet loaded into the HBase table hbase-tbl-003.
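To see what ImportTsv actually produced, one can list the bulk output directory in HDFS; ImportTsv writes one subdirectory per column family (a sketch, assuming the paths used above):

```
bin/hadoop fs -ls /user/hadoop/hfile_tmp5
# expect a _SUCCESS marker plus a subdirectory named after the
# column family (cf) containing the generated HFiles
```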
3. Using completebulkload to load HFiles into HBase
command: hadoop jar lib/hbase-server-0.98.13-hadoop2.jar completebulkload hfile_tmp5 hbase-tbl-003
Result:
Note: When executing this command, Hadoop may fail to find the HBase dependency jars and throw a ClassNotFoundException. An easy way to resolve this is to copy the HBase dependency jars into ${HADOOP_HOME}/share/hadoop/common/lib. Do not forget htrace-core-*.jar.
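A sketch of that workaround, assuming HBASE_HOME and HADOOP_HOME point at the two installs (the exact jar list depends on the HBase version):

```
cp ${HBASE_HOME}/lib/hbase-*.jar ${HADOOP_HOME}/share/hadoop/common/lib/
cp ${HBASE_HOME}/lib/htrace-core-*.jar ${HADOOP_HOME}/share/hadoop/common/lib/
```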
This command is essentially an HDFS move of the generated HFiles into the table's region directories, so it does not launch a MapReduce job. You can see in the log that the work is done by LoadIncrementalHFiles.
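The same step can also be invoked directly through the hbase launcher instead of hadoop jar (class name as shipped in HBase 0.98):

```
bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hfile_tmp5 hbase-tbl-003
```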