Wednesday, 22 June 2016

HBase Summary



Data Modeling Overview



HBase differs from an RDBMS in that data is not stored in rows with a fixed set of columns; instead, values live in cells that are grouped into column families.




Unlike an RDBMS, HBase addresses every value by row key, column family, column qualifier, and timestamp.


One dimension you don't see in the picture below is the timestamp associated with the value in each cell; HBase can keep several timestamped versions of the same cell.
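
A minimal Java sketch of those four coordinates (the table and column names are assumptions, not from the post). Writing the same cell twice leaves two timestamped versions, provided the column family's VERSIONS setting allows it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("customer"))) {

      byte[] row = Bytes.toBytes("cust-0001");       // row key
      byte[] family = Bytes.toBytes("profile");      // column family
      byte[] qualifier = Bytes.toBytes("city");      // column qualifier

      // Two writes to the same row/family/qualifier differ only in timestamp.
      table.put(new Put(row).addColumn(family, qualifier, Bytes.toBytes("London")));
      table.put(new Put(row).addColumn(family, qualifier, Bytes.toBytes("Paris")));

      // Ask for multiple versions to see the timestamp dimension.
      Get get = new Get(row);
      get.setMaxVersions(3);
      for (Cell cell : table.get(get).getColumnCells(family, qualifier)) {
        System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}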








A table such as a Customer table can have multiple column families, and on the region servers each column family's data is stored in its own HFiles.
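
As a hedged illustration (the family names are assumptions, not from the post), column families are declared when the table is created, for example through the Java Admin API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateCustomerTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor customer = new HTableDescriptor(TableName.valueOf("customer"));
      // Each column family below is flushed to its own set of HFiles.
      customer.addFamily(new HColumnDescriptor("profile"));
      customer.addFamily(new HColumnDescriptor("orders").setMaxVersions(3));
      admin.createTable(customer);
    }
  }
}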






Best Practices


Choose row keys that avoid hotspotting: if writes are not concentrated on one region, data and load are distributed nicely across the cluster.
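
One common cause of hotspotting is a monotonically increasing row key (a timestamp or sequence number), because all new writes land on the same region. A minimal sketch of salting such a key, with the bucket count and key format as assumptions:

import java.nio.charset.StandardCharsets;

public class SaltedRowKey {
  // Assumed number of salt buckets; usually matched to the number of pre-split regions.
  private static final int SALT_BUCKETS = 16;

  static byte[] saltedRowKey(String naturalKey) {
    int bucket = Math.floorMod(naturalKey.hashCode(), SALT_BUCKETS);
    // e.g. "07-20160622120000-user42": consecutive writes now spread across
    // up to SALT_BUCKETS regions instead of hammering one hot region.
    return String.format("%02d-%s", bucket, naturalKey).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(new String(saltedRowKey("20160622120000-user41"), StandardCharsets.UTF_8));
    System.out.println(new String(saltedRowKey("20160622120001-user42"), StandardCharsets.UTF_8));
  }
}

The trade-off is that point reads must recompute the salt from the natural key, and range scans in natural-key order have to fan out across all the buckets.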








Note that the row key is physically stored with every column and cell (each cell carries the row key, column family, qualifier, and timestamp), so it occupies a significant amount of space. Keep row keys, column family names, and qualifiers short.
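
A small sketch that makes the cost visible by measuring a single serialized cell with the client-side KeyValue class (the verbose names are deliberately exaggerated examples):

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class CellSizeDemo {
  private static int cellSize(String row, String family, String qualifier, String value) {
    KeyValue kv = new KeyValue(Bytes.toBytes(row), Bytes.toBytes(family),
        Bytes.toBytes(qualifier), System.currentTimeMillis(), Bytes.toBytes(value));
    return kv.getLength(); // serialized size of this one cell, key part included
  }

  public static void main(String[] args) {
    // Same value, but long row key / family / qualifier names...
    int verbose = cellSize("customer-number-0000000001", "customer_profile", "customer_city", "Paris");
    // ...versus short ones: the difference is paid again for every cell in the row.
    int compact = cellSize("c0000000001", "p", "city", "Paris");
    System.out.println("verbose cell: " + verbose + " bytes, compact cell: " + compact + " bytes");
  }
}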



Securing HBase


Server side configuration:




Client side configuration:
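
As a rough client-side sketch for a Kerberos-secured cluster (the principal, keytab path, and whether these properties are already supplied by your site configuration files are assumptions), the client authenticates before opening a connection:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHBaseClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Normally picked up from core-site.xml / hbase-site.xml on a secured cluster.
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("hbase.security.authentication", "kerberos");

    // Log in from a keytab before creating the connection (placeholder principal/keytab).
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(
        "appuser@EXAMPLE.COM", "/etc/security/keytabs/appuser.keytab");

    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      System.out.println("Secure connection established: " + !connection.isClosed());
    }
  }
}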





MapReduce Integration with HBase

The bin/hbase mapredcp command prints the classpath needed by MapReduce jobs that use HBase (the HBase jars and their dependencies); it is typically added to HADOOP_CLASSPATH when submitting such jobs.








HBase provides atomic, strongly consistent operations at the row level; it is not an eventually consistent store.
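
For example (table and column names assumed), single-row calls such as checkAndPut and incrementColumnValue are atomic on the server, with no client-side locking:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicRowOps {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("customer"))) {

      byte[] row = Bytes.toBytes("cust-0001");
      byte[] cf = Bytes.toBytes("profile");

      // Atomic compare-and-set: the Put is applied only if "status" currently equals "pending".
      Put activate = new Put(row).addColumn(cf, Bytes.toBytes("status"), Bytes.toBytes("active"));
      boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("status"),
          Bytes.toBytes("pending"), activate);
      System.out.println("checkAndPut applied: " + applied);

      // Atomic counter: concurrent clients never lose an increment.
      long visits = table.incrementColumnValue(row, cf, Bytes.toBytes("visits"), 1L);
      System.out.println("visits = " + visits);
    }
  }
}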




Standalone (local) mode runs the Master, a RegionServer, and ZooKeeper in a single JVM, which makes it ideal for local testing.



Installing HBase in Local Mode








Set hbase.rootdir and hbase.zookeeper.property.dataDir in conf/hbase-site.xml so that HBase writes its data somewhere other than /tmp.





The bin/start-hbase.sh script starts HBase.



The bin/stop-hbase.sh script stops HBase.







An HBase cluster can have up to 9 backup masters.






HBase Web-Based Management Console



Using the HBase shell

Make sure HBase is running before starting the shell. Launch it with the bin/hbase shell command.


Using HBase as a Data Sink for MapReduce Jobs


TableMapReduceUtil is an HBase-specific utility class that sets up the job configuration needed to read from or write to HBase.
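
A hedged, self-contained sketch of the sink side (the table name "wordcount", column family "stats", and the text input are assumptions): a word count whose reducer writes its totals into HBase through TableMapReduceUtil.initTableReducerJob.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HBaseSinkJob {
  // Tokenizes text lines into (word, 1) pairs.
  static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      for (String token : line.toString().split("\\s+")) {
        if (!token.isEmpty()) { word.set(token); context.write(word, ONE); }
      }
    }
  }

  // Sums the counts and writes one Put per word into the target table.
  static class HBaseSinkReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(sum));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "wordcount-into-hbase");
    job.setJarByClass(HBaseSinkJob.class);
    job.setMapperClass(TokenMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Wires in TableOutputFormat, the target table, and the reducer class.
    TableMapReduceUtil.initTableReducerJob("wordcount", HBaseSinkReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}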


Using HBase as a Data Source for MapReduce Jobs

TableMapReduceUtil.initTableMapperJob takes the name of the HBase table to read from, a Scan (which may contain filters), the mapper class, the mapper's output key and value classes (for example ImmutableBytesWritable.class and IntWritable.class), and the Job being configured.
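
A sketch of the source side (the table name and the row-counting mapper are illustrative; the output is discarded and the job only bumps a counter):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseSourceJob {
  // The mapper receives one (row key, Result) pair per row of the scanned table.
  static class RowCountMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      context.getCounter("demo", "rows").increment(1);
      context.write(rowKey, ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-hbase-table");
    job.setJarByClass(HBaseSourceJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching suits full-table MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache with a one-off scan
    // scan.setFilter(...) could restrict which rows reach the mapper.

    TableMapReduceUtil.initTableMapperJob(
        "customer",                    // source table (illustrative name)
        scan,
        RowCountMapper.class,
        ImmutableBytesWritable.class,  // mapper output key class
        IntWritable.class,             // mapper output value class
        job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}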



Bulk Loading Data
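
As a rough sketch of the usual bulk-load flow (input format, table name, and column family are assumptions): a MapReduce job writes HFiles with HFileOutputFormat2, and the completebulkload tool (LoadIncrementalHFiles) then moves them into the table's regions, bypassing the normal write path.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepareJob {
  // Emits (row key, Put); configureIncrementalLoad wires in the sorter and
  // partitioner so the HFiles written below line up with region boundaries.
  static class CsvToPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2);   // expected: rowkey,value
      if (parts.length < 2) return;
      Put put = new Put(Bytes.toBytes(parts[0]));
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("raw"), Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadPrepareJob.class);
    job.setMapperClass(CsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HFile staging directory

    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      TableName table = TableName.valueOf("customer");
      HFileOutputFormat2.configureIncrementalLoad(
          job, conn.getTable(table), conn.getRegionLocator(table));
    }
    // After the job succeeds, hand the HFiles to the regions, for example with:
    //   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <staging-dir> customer
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}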





Splitting Map Tasks when Sourcing an HBase Table



Accessing Other HBase Tables within a MapReduce Job



Taking a Snapshot
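
A brief sketch using the Java Admin API (the snapshot and table names are assumptions): a snapshot is a point-in-time capture of a table that can later be cloned or restored without the data having been copied when the snapshot was taken.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Take the snapshot.
      admin.snapshot("customer-snap-20160622", TableName.valueOf("customer"));
      // Materialize it as a new table (the target table must not already exist).
      admin.cloneSnapshot("customer-snap-20160622", TableName.valueOf("customer_copy"));
    }
  }
}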

