<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

</head>

<body dir="ltr">

<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">

<p>Greetings Gluster Users,</p>

<p>I started a rebalance operation on my distributed volume today (CentOS 6.6/GlusterFS 3.6.3), and one of the three hosts in the cluster is stuck at 0.00 for 'run time in secs' and shows 0 files scanned, failed, or skipped.</p>

<p><br>

</p>

<p>I've reviewed the rebalance log for the affected server, and I'm seeing these messages:</p>

<p><br>

</p>

<p>[2015-06-03 15:34:32.703692] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.3 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/bigdata2 --xlator-option *dht.use-readdirp=yes

 --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off

 --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=1 --xlator-option *dht.node-uuid=3b5025d4-3230-4914-ad0d-32f78587c4db --socket-file /var/run/gluster/gluster-rebalance-2cd214fa-6fa4-49d0-93f6-de2c510d4dd4.sock --pid-file /var/lib/glusterd/vols/bigdata2/rebalance/3b5025d4-3230-4914-ad0d-32f78587c4db.pid

 -l /var/log/glusterfs/bigdata2-rebalance.log)<br>

[2015-06-03 15:34:32.704217] E [MSGID: 100018] [glusterfsd.c:1677:glusterfs_pidfile_update] 0-glusterfsd: pidfile /var/lib/glusterd/vols/bigdata2/rebalance/3b5025d4-3230-4914-ad0d-32f78587c4db.pid lock failed [Resource temporarily unavailable]</p>

<p><br>

</p>

<p>I started with the first warning in the log, "readv on 127.0.0.1:24007 failed". netstat shows that IP/port belongs to a glusterd process, but beyond that I wasn't able to tell why the read would fail.</p>

<p><br>

</p>

<p>Next, I looked at the pid file whose lock failed with "Resource temporarily unavailable". The file is present and contains the pid of a running glusterfs rebalance process:</p>

<p><br>

</p>

<p>root&nbsp;&nbsp;&nbsp;&nbsp; 12776&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp; 0 10:18 ?&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/bigdata2 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option

 *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=1

 --xlator-option *dht.node-uuid=3b5025d4-3230-4914-ad0d-32f78587c4db --socket-file /var/run/gluster/gluster-rebalance-2cd214fa-6fa4-49d0-93f6-de2c510d4dd4.sock --pid-file /var/lib/glusterd/vols/bigdata2/rebalance/3b5025d4-3230-4914-ad0d-32f78587c4db.pid -l

 /var/log/glusterfs/bigdata2-rebalance.log<br>

</p>
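<p>For anyone who wants to repeat the pid file check, here is a minimal Python sketch of what I did by hand. It assumes the daemon takes a non-blocking lockf-style lock on the pid file, which is what the "Resource temporarily unavailable" (EAGAIN) message suggests to me but which I have not confirmed in the source. The function name inspect_pidfile is my own, not part of Gluster.</p>

<pre>
```python
import errno
import fcntl
import os


def inspect_pidfile(path):
    """Return (pid, process_alive, lock_held_by_other) for a pid file.

    Hypothetical helper for illustration only; not a Gluster API.
    """
    with open(path, "r+") as fh:
        text = fh.read().strip()
        pid = int(text) if text.isdigit() else None

        # Try a non-blocking exclusive lock on the whole file.
        # EAGAIN or EACCES means another process already holds a lock.
        try:
            fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            lock_held = False
            fcntl.lockf(fh, fcntl.LOCK_UN)
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise
            lock_held = True

    alive = False
    if pid is not None:
        try:
            os.kill(pid, 0)  # signal 0 only tests whether the pid exists
            alive = True
        except ProcessLookupError:
            alive = False
        except PermissionError:
            alive = True  # the process exists but belongs to another user

    return pid, alive, lock_held
```
</pre>

<p>Run as root against the pid file path from the log. A result where the process is alive and the lock is held by another process would match what I am seeing: an earlier rebalance process still owns the pid file, so the new one cannot lock it.</p>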

<p><br>

</p>

<p>Finally, running 'gluster volume status &lt;volname&gt; clients' shows that the affected server is the only one of the three that lists a 127.0.0.1:&lt;port&gt; client for each of its bricks. I don't know why there would be a client coming from

 loopback on the server, but it seems strange. It also makes me wonder whether having auth.allow set to a single subnet (one that doesn't include 127.0.0.1) is causing this problem, or whether loopback is implicitly allowed to connect.</p>
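<p>To be explicit about what I mean by the subnet not including loopback, here is a small Python sketch of the address check. The subnet 10.0.0.0/24 is a placeholder rather than my real auth.allow value, and the CIDR notation is my assumption for illustration. This only does the address arithmetic; it says nothing about how Gluster itself evaluates auth.allow or whether it treats loopback specially.</p>

<pre>
```python
import ipaddress


def allowed_by_subnet(client_ip, allowed_cidr):
    """Return True if client_ip falls inside allowed_cidr.

    Hypothetical helper; plain address math, not Gluster's auth logic.
    """
    network = ipaddress.ip_network(allowed_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


# Placeholder subnet standing in for the real auth.allow setting.
loopback_allowed = allowed_by_subnet("127.0.0.1", "10.0.0.0/24")
```
</pre>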

<p><br>

</p>

<p>Any tips or suggestions would be much appreciated. Thanks!</p>

<p><br>

</p>

</div>

</body>

</html>