
Debugging HBase cluster setup

Gluster
2013-05-22

Setting up HBase can be tricky because of the intermediate states of processes that may be running.  Here are some important configs I found, plus an idempotent install script for reproducible HBase deployment: it cleans your system and restarts HBase from scratch, so you can easily update configs until everything works correctly.  I also used some log-grepping scripts alongside this to quickly and automatically report errors in the setup after running the script.

0) Of all the things you have to worry about: this isn’t one of them 🙂

This exception always scares me… but it’s usually nothing.

Got user-level KeeperException when processing sessionid:0x14184e8362a0000 type:create cxid:0x18 zxid:0x8c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-…….

It’s basically an error indicating that a znode already exists, so there is no need to create one.
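
If you want to confirm that the znode really is there, you can list it in ZooKeeper directly. A quick sketch, assuming the hbase install path used later in this post (any zkCli pointed at your quorum works the same way):

#If /hbase/online-snapshot/acquired shows up here, the NodeExists message above is expected and harmless.
hbaseinstall/hbase-0.94.7/bin/hbase zkcli ls /hbase/online-snapshot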

 
1) The most important thing to get right is /etc/hosts.  “Can’t connect to master” exceptions might ensue if it’s not right… /etc/hosts should look something like this:

#Note that the loopback NEEDS TO BE 127.0.0.1 (ubuntu deviates from this, so you have to fix it).
127.0.0.1   localhost  localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.122.200 hbase-master
192.168.122.201 hbase-regionserver1
192.168.122.202 hbase-regionserver2
192.168.122.203 hbase-regionserver3

(Note that hbase-master isn’t identified as localhost; this is important.)  Also note that the related “PleaseHoldException” indicates a failed master, but it can be caused by more than just bad hosts entries.  It can happen, for example, if the HMaster fails to start due to internal or file system errors.

It can also be caused by ZooKeeper being in a bad state (see http://architects.dzone.com/articles/hbase-error-region-not-online), which can be the underlying exception behind the “master is initializing” message.
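
A quick way to sanity-check the hosts setup on every node (a rough sketch; the hostnames are the ones from the example above) is to confirm that hbase-master never resolves to a loopback address:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    echo -n "$i: "
    if ssh root@$i getent hosts hbase-master | grep -q '^127\.'
    then
        echo "BAD - hbase-master resolves to loopback"
    else
        echo "OK"
    fi
done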

2) Make sure ZooKeeper is running properly; otherwise nothing will work.  Each region server AND the master should be listed in a comma-delimited string in the <value> tag in the hbase-site.xml file:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3</value>
</property>
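
Once that’s in place, you can poke each quorum member with ZooKeeper’s four-letter “ruok” command; a healthy server answers “imok”.  A small sketch, assuming the default client port 2181 and that nc is installed:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    echo -n "$i: "
    echo ruok | nc $i 2181   #a live ZooKeeper prints "imok"
    echo
done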

3) If queries fail because no active master is found, you need to (duh) make sure your ZooKeeper “parent” node is running (in my case, it’s the same machine as the HBase master), for example with this configuration.  Note that you don’t need this on the master.

<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-master</value>
</property>

You should see the following on the “hbase-master” machine.

[root@hbase-master]> jps

12388 HQuorumPeer
14136 HMaster
2952 JobTracker
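
You can also double-check that the configured parent znode actually exists.  HBase ships a zkcli wrapper that picks up the quorum settings from hbase-site.xml; a sketch, assuming the install path used in the script below:

#An error here means the master never registered under /hbase-master.
hbaseinstall/hbase-0.94.7/bin/hbase zkcli ls /hbase-master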


4) Avoid time-sync exceptions, which will prevent cluster startup: install NTP and start it on all nodes, since the clocks on the nodes need to be synchronized.

http://www.cyberciti.biz/faq/howto-install-ntp-to-synchronize-server-clock/
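
Something like the following loop (a sketch, assuming yum-based nodes and the hostnames from section 1) installs and starts NTP everywhere in one shot:

for i in hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3
do
    ssh root@$i "yum -y -q install ntp && service ntpd start && date"
done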

5) This script will totally clean out your file system and HBase daemons… Don’t use it in production!  Just use it for installation…

#!/bin/bash
#Setting up and debugging hbase can be a little tricky - best to automate it
#with log cleaning in a script like this (run it from your head node on your
#newly installed test cluster).  At the end of this script, a test table is created
#in debug mode - so you can see any ensuing errors...  This script should be
#idempotent (modify highlighted parts for your cluster).

nodes=(hbase-master hbase-regionserver1 hbase-regionserver2 hbase-regionserver3)

echo "${nodes[@]}"
hbaseinstall/hbase-0.94.7/bin/hbase-daemon.sh stop master

echo "WARNING !!!! CLEARING OUT ALL OF YOUR HBASE DATA, HIT A KEY TO CONTINUE !!!"
read

echo "CLEARING hbase/ IN 5 SECONDS!"
sleep 5

hadoop fs -rmr hbase/*  #if using other file systems (S3, gluster, etc..), you might modify this line.

for i in "${nodes[@]}"
do
    echo "Cleaning $i"
    #Get rid of logs, so that after restart/reconfiguring you can easily debug the changes.
    ssh root@$i "rm -rf /tmp/hbase-root/*"
    ssh root@$i "rm -rf hbaseinstall/hbase-0.94.7/logs/*"
    #Reliably kill ZooKeeper/RegionServers.
    ssh root@$i killall -9 java
    echo "Done..."
done

#############################
sleep 2
#############################

echo "restarting hbase"
hbaseinstall/hbase-0.94.7/bin/start-hbase.sh

##############################
sleep 2
#######################################################

#Now, invoke a shell in debug mode and create a table.
hbaseinstall/hbase-0.94.7/bin/hbase shell -d <<EOF
create 't1','f1'
put 't1', 'row1', 'f1:a', 'val1'
scan 't1'
EOF
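
As mentioned in the intro, I pair the script with some log grepping to report errors automatically.  Something along these lines (a sketch, reusing the nodes array and install path from the script above) pulls any fresh errors out of the logs after a run:

#Report recent ERROR/FATAL lines from every node's freshly written logs.
for i in "${nodes[@]}"
do
    echo "===== $i ====="
    ssh root@$i "grep -iE 'ERROR|FATAL' hbaseinstall/hbase-0.94.7/logs/* | tail -20"
done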
 

