
Debugging HBase cluster setup

Gluster
2013-05-22

Setting up HBase can be tricky because of the intermediate states of processes that may be running.  Here are some important configs I found, plus an idempotent install script for reproducible HBase deployment: it cleans your system and restarts HBase from scratch, so you can easily update configs until everything works correctly.  I also used some log-grepping scripts alongside this to quickly and automatically report errors in the setup after running the script.

0) Of all the things you have to worry about: this isn’t one of them 🙂

This exception always scares me… but it’s usually nothing.

Got user-level KeeperException when processing sessionid:0x14184e8362a0000 type:create cxid:0x18 zxid:0x8c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-…….

It’s basically an error indicating that a znode already exists, so there is no need to create one.
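
If you want to confirm that the znode really is there, you can list it in ZooKeeper directly. A quick sketch, assuming the hbase install path used later in this post (any zkCli pointed at your quorum works the same way):

#If /hbase/online-snapshot/acquired shows up here, the NodeExists message above is expected and harmless.
hbaseinstall/hbase-0.94.7/bin/hbase zkcli ls /hbase/online-snapshot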

 
1) The most important thing to get right is /etc/hosts.  “Can’t connect to master” exceptions might ensue if it’s not right… /etc/hosts should look something like this:

#Note that the loopback NEEDS TO BE 127.0.0.1 (ubuntu deviates from this, so you have to fix it).
127.0.0.1   localhost  localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.122.200 hbase-master
192.168.122.201 hbase-regionserver1
192.168.122.202 hbase-regionserver2
192.168.122.203 hbase-regionserver3

(Note that hbase-master isn’t identified as localhost; this is important.)  Also note that the related “PleaseHoldException” indicates a failed master, but it can be caused by more than just bad hosts entries.  It can happen, for example, if the HMaster fails to start due to internal or file system errors.

It can also be caused by ZooKeeper being in a bad state (see http://architects.dzone.com/articles/hbase-error-region-not-online), which can be the underlying exception behind the “master is initializing” message.
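
A quick way to sanity-check the hosts setup on every node (a rough sketch; the hostnames are the ones from the example above) is to confirm that hbase-master never resolves to a loopback address:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    echo -n "$i: "
    if ssh root@$i getent hosts hbase-master | grep -q '^127\.'
    then
        echo "BAD - hbase-master resolves to loopback"
    else
        echo "OK"
    fi
done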

2) Make sure ZooKeeper is running properly; otherwise nothing will work.  Each region server AND the master should be listed in a comma-delimited string in the <value> tag in the hbase-site.xml file:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3</value>
</property>
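
Once that’s in place, you can poke each quorum member with ZooKeeper’s four-letter “ruok” command; a healthy server answers “imok”.  A small sketch, assuming the default client port 2181 and that nc is installed:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    echo -n "$i: "
    echo ruok | nc $i 2181   #a live ZooKeeper prints "imok"
    echo
done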

3) If queries fail because no active master is found, you need to (duh) make sure your ZooKeeper “parent” node is running (in my case, it’s the same machine as the HBase master), for example with this configuration.  Note that you don’t need this on the master.

<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-master</value>
</property>

You should see the following on the “hbase-master” machine.

[root@hbase-master]> jps

12388 HQuorumPeer
14136 HMaster
2952 JobTracker
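
You can also double-check that the configured parent znode actually exists.  HBase ships a zkcli wrapper that picks up the quorum settings from hbase-site.xml; a sketch, assuming the install path used in the script below:

#An error here means the master never registered under /hbase-master.
hbaseinstall/hbase-0.94.7/bin/hbase zkcli ls /hbase-master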


4) Avoid time-sync exceptions, which will prevent cluster startup: install NTP and start it on all nodes, since the clocks on the nodes need to be synchronized.

http://www.cyberciti.biz/faq/howto-install-ntp-to-synchronize-server-clock/
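
Something like the following loop (a sketch, assuming yum-based nodes and the hostnames from section 1) installs and starts NTP everywhere in one shot:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    ssh root@$i "yum -y -q install ntp && service ntpd start && date"
done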

5) This script will totally clean out your file system and HBase daemons… Don’t use it in production!  Just use it for installation…

#!/bin/bash
#Setting up and debugging hbase can be a little tricky - best to automate it
#with log cleaning in a script like this (run it from your head node on your
#newly installed test cluster).  At the end of this script, a test table is created
#in debug mode - so you can see any ensuing errors...  This script should be
#idempotent (modify highlighted parts for your cluster).

nodes=(hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3)

echo "${nodes[@]}"
hbaseinstall/hbase-0.94.7/bin/hbase-daemon.sh stop master

echo "WARNING !!!! CLEARING OUT ALL OF YOUR HBASE DATA, HIT A KEY TO CONTINUE !!!"
read

echo "CLEARING hbase/ IN 5 SECONDS!"
sleep 5

hadoop fs -rmr hbase/*  #if using other file systems (S3, gluster, etc..), you might modify this line.

for i in "${nodes[@]}"
do
    echo "Cleaning $i"
    #Get rid of logs, so that after restart/reconfiguring you can easily debug the changes.
    ssh root@$i "rm -rf /tmp/hbase-root/*"
    ssh root@$i "rm -rf hbaseinstall/hbase-0.94.7/logs/*"
    #Reliably kill ZooKeeper/RegionServers.
    ssh root@$i killall -9 java
    echo "Done..."
done

#############################
sleep 2
#############################

echo "restarting hbase"
hbaseinstall/hbase-0.94.7/bin/start-hbase.sh

##############################
sleep 2
#######################################################

#Now, invoke a shell in debug mode and create a table.
hbaseinstall/hbase-0.94.7/bin/hbase shell -d <<EOF
create 't1','f1'
put 't1', 'row1', 'f1:a', 'val1'
scan 't1'
EOF
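
As mentioned in the intro, I pair the script with some log grepping to report errors automatically.  Something along these lines (a sketch, reusing the nodes array and install path from the script above) pulls any fresh errors out of the logs after a run:

#Report recent ERROR/FATAL lines from every node's freshly written logs.
for i in "${nodes[@]}"
do
    echo "===== $i ====="
    ssh root@$i "grep -iE 'ERROR|FATAL' hbaseinstall/hbase-0.94.7/logs/* | tail -20"
done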
 

