<p dir="ltr"></p>
<p dir="ltr">-Atin<br>
Sent from one plus one<br>
On Aug 10, 2015 7:19 PM, "Kingsley" <<a href="mailto:gluster@gluster.dogwind.com">gluster@gluster.dogwind.com</a>> wrote:<br>
><br>
> Further to this, the volume doesn't seem overly healthy. Any idea how I<br>
> can get it back into a working state?<br>
><br>
> Trying to access one particular directory on the clients just hangs. If<br>
> I query heal info, that directory appears in the output as possibly<br>
> undergoing heal (actual directory name changed as it's private info):<br>
Can you run strace and see which call is stuck? That would help us pinpoint the exact component we need to look at.<br>
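For example, something along these lines on a client (a rough, untested sketch; /mnt/callrec is just an assumed mount point, substitute your actual one):<br>
<br>
# strace -f -tt -o /tmp/hang.strace ls /mnt/callrec/recordings/834723/14391<br>
<br>
or, to attach to a process that is already stuck:<br>
<br>
# strace -f -tt -p PID-of-hung-process<br>
<br>
The last lines of the trace should show which system call it is blocked in.<br>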
><br>
> [root@gluster1b-1 ~]# gluster volume heal callrec info<br>
> Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/<br>
> <gfid:164f888f-2049-49e6-ad26-c758ee091863><br>
> /recordings/834723/14391 - Possibly undergoing heal<br>
><br>
> <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f><br>
> <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e><br>
> <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c><br>
> <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb><br>
> <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd><br>
> Number of entries: 7<br>
><br>
> Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/<br>
> Number of entries: 0<br>
><br>
> Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/<br>
> <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f><br>
> <gfid:164f888f-2049-49e6-ad26-c758ee091863><br>
> <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd><br>
> <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e><br>
> /recordings/834723/14391 - Possibly undergoing heal<br>
><br>
> <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c><br>
> <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb><br>
> Number of entries: 7<br>
><br>
> Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/<br>
> Number of entries: 0<br>
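In case it helps with the raw gfid entries above: on a brick, each gfid maps to .glusterfs/aa/bb/full-gfid, where aa and bb are the first two and next two hex characters of the gfid. For regular files that entry is a hard link to the real file, so you can map it back to a path by inode. A rough, untested sketch using the first gfid from your output:<br>
<br>
# GFID=164f888f-2049-49e6-ad26-c758ee091863<br>
# BRICK=/data/brick/callrec<br>
# ls -i $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID<br>
# find $BRICK -inum INODE-FROM-PREVIOUS-COMMAND -not -path '*/.glusterfs/*'<br>
<br>
(Directory gfids are symlinks rather than hard links, so for those ls -l on the entry shows the parent gfid and directory name instead.)<br>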
><br>
><br>
> If I query each brick directly for the number of files/directories<br>
> within that directory, I get 1731 on gluster1a-1 and gluster2a-1, but<br>
> 1737 on the other two, using this command:<br>
><br>
> # find /data/brick/callrec/recordings/834723/14391 -print | wc -l<br>
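To see exactly which entries differ, you could dump a sorted listing on each brick and diff the lists (rough sketch; run the find on each server, then copy the resulting files to one host to compare):<br>
<br>
# find /data/brick/callrec/recordings/834723/14391 | sort > /tmp/14391.$(hostname -s)<br>
# diff /tmp/14391.gluster1a-1 /tmp/14391.gluster2b-1<br>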
><br>
> Cheers,<br>
> Kingsley.<br>
><br>
> On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:<br>
> > Sorry for the blind panic - restarting the volume seems to have fixed<br>
> > it.<br>
> ><br>
> > But then my next question: why is this necessary? Surely it undermines<br>
> > the whole point of a high-availability system?<br>
> ><br>
> > Cheers,<br>
> > Kingsley.<br>
> ><br>
> > On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:<br>
> > > Hi,<br>
> > ><br>
> > > We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.<br>
> > ><br>
> > > Over the weekend I did a yum update on each of the bricks in turn, but<br>
> > > now when clients (using fuse mounts) try to access the volume, it hangs.<br>
> > > Gluster itself wasn't updated (we've disabled that repo so that we keep<br>
> > > to 3.6.3 for now).<br>
> > ><br>
> > > This was what I did:<br>
> > ><br>
> > > * on first brick, "yum update"<br>
> > > * reboot brick<br>
> > > * watch "gluster volume status" on another brick and wait for it<br>
> > > to say all 4 bricks are online before proceeding to update the<br>
> > > next brick<br>
> > ><br>
> > > I was expecting the clients might pause for 30 seconds while they<br>
> > > noticed a brick was offline, but then recover.<br>
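For future rolling updates it's generally safer to wait not just for the brick to show online again, but also for self-heal to catch up before rebooting the next node. A minimal, untested check along these lines (adjust the volume name as needed):<br>
<br>
# gluster volume status callrec<br>
# gluster volume heal callrec info | grep 'Number of entries'<br>
<br>
i.e. confirm every brick shows Online = Y and every brick reports 0 entries pending heal before moving on.<br>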
> > ><br>
> > > I've tried re-mounting clients, but that hasn't helped.<br>
> > ><br>
> > > I can't see much data in any of the log files.<br>
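In case it helps, the usual places to look (default log locations; the client log name depends on your mount point):<br>
<br>
# less /var/log/glusterfs/glustershd.log        (self-heal daemon log, on each server)<br>
# less /var/log/glusterfs/bricks/data-brick-callrec.log        (brick log)<br>
# less /var/log/glusterfs/mnt-callrec.log       (fuse client log, named after the mount point)<br>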
> > ><br>
> > > I've tried "gluster volume heal callrec" but it doesn't seem to have<br>
> > > helped.<br>
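If a plain "gluster volume heal callrec" doesn't clear it, it might be worth triggering a full heal and checking explicitly for split-brain entries, e.g.:<br>
<br>
# gluster volume heal callrec full<br>
# gluster volume heal callrec info split-brain<br>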
> > ><br>
> > > What shall I do next?<br>
> > ><br>
> > > I've pasted some stuff below in case any of it helps.<br>
> > ><br>
> > > Cheers,<br>
> > > Kingsley.<br>
> > ><br>
> > > [root@gluster1b-1 ~]# gluster volume info callrec<br>
> > ><br>
> > > Volume Name: callrec<br>
> > > Type: Replicate<br>
> > > Volume ID: a39830b7-eddb-4061-b381-39411274131a<br>
> > > Status: Started<br>
> > > Number of Bricks: 1 x 4 = 4<br>
> > > Transport-type: tcp<br>
> > > Bricks:<br>
> > > Brick1: gluster1a-1:/data/brick/callrec<br>
> > > Brick2: gluster1b-1:/data/brick/callrec<br>
> > > Brick3: gluster2a-1:/data/brick/callrec<br>
> > > Brick4: gluster2b-1:/data/brick/callrec<br>
> > > Options Reconfigured:<br>
> > > performance.flush-behind: off<br>
> > > [root@gluster1b-1 ~]#<br>
> > ><br>
> > ><br>
> > > [root@gluster1b-1 ~]# gluster volume status callrec<br>
> > > Status of volume: callrec<br>
> > > Gluster process Port Online Pid<br>
> > > ------------------------------------------------------------------------------<br>
> > > Brick gluster1a-1:/data/brick/callrec 49153 Y 6803<br>
> > > Brick gluster1b-1:/data/brick/callrec 49153 Y 2614<br>
> > > Brick gluster2a-1:/data/brick/callrec 49153 Y 2645<br>
> > > Brick gluster2b-1:/data/brick/callrec 49153 Y 4325<br>
> > > NFS Server on localhost 2049 Y 2769<br>
> > > Self-heal Daemon on localhost N/A Y 2789<br>
> > > NFS Server on gluster2a-1 2049 Y 2857<br>
> > > Self-heal Daemon on gluster2a-1 N/A Y 2814<br>
> > > NFS Server on 88.151.41.100 2049 Y 6833<br>
> > > Self-heal Daemon on 88.151.41.100 N/A Y 6824<br>
> > > NFS Server on gluster2b-1 2049 Y 4428<br>
> > > Self-heal Daemon on gluster2b-1 N/A Y 4387<br>
> > ><br>
> > > Task Status of Volume callrec<br>
> > > ------------------------------------------------------------------------------<br>
> > > There are no active volume tasks<br>
> > ><br>
> > > [root@gluster1b-1 ~]#<br>
> > ><br>
> > ><br>
> > > [root@gluster1b-1 ~]# gluster volume heal callrec info<br>
> > > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/<br>
> > > /to_process - Possibly undergoing heal<br>
> > ><br>
> > > Number of entries: 1<br>
> > ><br>
> > > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/<br>
> > > Number of entries: 0<br>
> > ><br>
> > > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/<br>
> > > /to_process - Possibly undergoing heal<br>
> > ><br>
> > > Number of entries: 1<br>
> > ><br>
> > > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/<br>
> > > Number of entries: 0<br>
> > ><br>
> > > [root@gluster1b-1 ~]#<br>
> > ><br>
> > ><br>
> ><br>
><br>
> _______________________________________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>
</p>