<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee <span dir="ltr"><<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class=""><br>
<br>
On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:<br>
><br>
><br>
> On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee &lt;<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a><br>
</span><div><div class="h5">> &lt;mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>&gt;&gt; wrote:<br>
><br>
><br>
><br>
> On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:<br>
> > Hi Team,<br>
> ><br>
> > We are using Gluster 3.7.6 and are facing a problem in which a brick does not<br>
> > come online after the board is restarted.<br>
> ><br>
> > To understand our setup, please look at the following steps:<br>
> > 1. We have two boards, A and B, on which a Gluster volume is running in<br>
> > replicated mode with one brick on each board.<br>
> > 2. The Gluster mount point is on Board A and is shared<br>
> > among a number of processes.<br>
> > 3. Up to this point the volume is in sync and everything is working fine.<br>
> > 4. Now we have a test case in which we stop glusterd, reboot<br>
> > Board B and, when the board comes up, start glusterd on it again.<br>
> > 5. We repeat step 4 multiple times to check the reliability of the system.<br>
> > 6. After step 4, the system sometimes returns to a working state (i.e. in<br>
> > sync), but sometimes the brick of Board B is listed in the<br>
> > “gluster volume status” output but is not online, even after waiting for<br>
> > more than a minute.<br>
> As I mentioned in another email thread, unless the logs show<br>
> evidence that there was a reboot, nothing can be concluded. The last<br>
> logs you shared with us a few days back gave no indication<br>
> that the brick process wasn't running.<br>
><br>
> How can we identify from the brick logs that the brick process is running?<br>
><br>
> > 7. While step 4 is executing, some processes on Board A start<br>
> > accessing files through the Gluster mount point.<br>
> ><br>
> > As a way to bring this brick online, we found some existing threads<br>
> > on the gluster mailing list suggesting “gluster volume start<br>
> > &lt;vol_name&gt; force” to move the brick from 'offline' to 'online'.<br>
> ><br>
> > If we use the “gluster volume start &lt;vol_name&gt; force” command, it will kill<br>
> > the existing volume process and start a new one. What will<br>
> > happen if other processes are accessing the same volume at the moment<br>
> > this command internally kills the volume process? Will it cause any<br>
> > failures in those processes?<br>
> This is not true: volume start force starts the brick processes only<br>
> if they are not running. Running brick processes are not<br>
> interrupted.<br>
><br>
> We tried it and checked the pid of the process before and after<br>
> the force start.<br>
> The pid changed after the force start.<br>
><br>
> Please find attached, once again, the logs from the time of the failure with<br>
> log-level=debug.<br>
><br>
> If you can find the exact line in the brick log file that shows the brick process<br>
> is running, please give me the line number in that file.<br>
<br>
</div></div>Here is the sequence in which glusterd and the respective brick process were<br>
restarted.<br>
<br>
1. glusterd restart trigger - line number 1014 in glusterd.log file:<br>
<br>
[2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd<br>
version 3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid<br>
--log-level DEBUG)<br>
<br>
2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log:<br>
<br>
[2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
-S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
--brick-name /opt/lvmdir/c2/brick -l<br>
/var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
*-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
--brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br>
<br>
3. The following log entry indicates that the brick is up and has started.<br>
Refer to line 16123 in glusterd.log:<br>
<br>
[2016-04-03 10:14:25.336855] D [MSGID: 0]<br>
[glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:<br>
Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br>
<br>
This clearly indicates that the brick is up and running, as after that I<br>
do not see any disconnect event being processed by glusterd for the brick<br>
process.<br></blockquote><div><br></div><div>Thanks for the detailed reply, but please also clear up a few more doubts:<br><br></div><div>1. At 10:14:25 the brick is available because we had removed the brick and added it again to bring it online.<br></div><div>The following entries are from the cmd-history.log file of 000300:<br><br>[2016-04-03 10:14:21.446570] : volume status : SUCCESS<br>[2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>[2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br><br></div><div>Also, 10:12:29 was the last reboot before this failure, so I fully agree with what you said earlier.<br><br></div><div>2. As you said, glusterd restarted at 10:12:29. Then why do we not get any 'brick start trigger' log<br></div><div> like the one below between 10:12:29 and 10:14:25, an interval of about two minutes?<br><br>[2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
-S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
--brick-name /opt/lvmdir/c2/brick -l<br>
/var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
*-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
--brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br></div><div><br></div><div>3. We were continuously checking the brick status during the above interval using "gluster volume status"; refer to the cmd-history.log file from 000300.<br><br></div><div>In the glusterd.log file we also get the following logs:<br><br>[2016-04-03 10:12:31.771051] D [MSGID: 0] [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br><br>[2016-04-03 10:12:32.981152] D [MSGID: 0] [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br></div><div><br></div><div>This message appears twice between 10:12:29 and 10:14:25, and as you said, these logs "clearly indicate that the brick is up and running". Then why is the brick not shown as online by the "gluster volume status" command?<br><br>[2016-04-03 10:12:33.990487] : volume status : SUCCESS<br>[2016-04-03 10:12:34.007469] : volume status : SUCCESS<br>[2016-04-03 10:12:35.095918] : volume status : SUCCESS<br>[2016-04-03 10:12:35.126369] : volume status : SUCCESS<br>[2016-04-03 10:12:36.224018] : volume status : SUCCESS<br>[2016-04-03 10:12:36.251032] : volume status : SUCCESS<br>[2016-04-03 10:12:37.352377] : volume status : SUCCESS<br>[2016-04-03 10:12:37.374028] : volume status : SUCCESS<br>[2016-04-03 10:12:38.446148] : volume status : SUCCESS<br>[2016-04-03 10:12:38.468860] : volume status : SUCCESS<br>[2016-04-03 10:12:39.534017] : volume status : SUCCESS<br>[2016-04-03 10:12:39.553711] : volume status : SUCCESS<br>[2016-04-03 10:12:40.616610] : volume status : SUCCESS<br>[2016-04-03 10:12:40.636354] : volume status : SUCCESS<br>......<br>......<br>......<br>[2016-04-03 10:14:21.446570] : volume status : SUCCESS<br>[2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>[2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br><br></div><div>As these logs show, we kept checking the brick status, and when the brick was still not 'online' after about 2 minutes, we removed it and added it again to bring it online.<br><br>[2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>[2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS <br>[2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br><br></div><div>That is why the logs show the 'brick start trigger' entry at timestamp 10:14:25:<br><br>[2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
-S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
--brick-name /opt/lvmdir/c2/brick -l<br>
/var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
*-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
--brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br><br><br></div><div>Regards,<br></div><div>Abhishek<br></div><div>
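For points 2 and 3, the same check can be done mechanically over the log files. The following is a minimal Python sketch, not a GlusterFS tool: the function names parse_ts and brick_events are hypothetical, it assumes each entry occupies a single line in the real glusterd.log and opt-lvmdir-c2-brick.log (the entries quoted above are wrapped by the mail client), and it assumes that any __glusterd_brick_rpc_notify line naming the brick and containing "disconnect" marks a disconnect, which should be checked against the actual glusterd messages. It lists brick-process starts, connects and disconnects for one brick inside a time window.

```python
import re
from datetime import datetime

# GlusterFS log lines start with a timestamp such as "[2016-04-03 10:12:31.771051]".
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"
# Matches the glusterfsd (brick) start line but not the glusterd start line.
BRICK_START_RE = re.compile(r"Started running \S*glusterfsd\b")


def parse_ts(line):
    """Return the leading timestamp of a log line as a datetime, or None."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), TS_FMT) if m else None


def brick_events(lines, host, path, start, end):
    """Return (timestamp, kind) events for brick host:path within [start, end].

    Assumed markers (hypothetical helper; verify against your own logs):
      - brick log: "Started running /usr/sbin/glusterfsd ... --brick-name <path>"
      - glusterd.log: "__glusterd_brick_rpc_notify ... Connected to <host>:<path>"
      - any other __glusterd_brick_rpc_notify line naming the brick and
        containing "disconnect" is counted as a disconnect (assumption).
    """
    brick = f"{host}:{path}"
    events = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None or not (start <= ts <= end):
            continue  # skip untimestamped lines and lines outside the window
        if BRICK_START_RE.search(line) and f"--brick-name {path}" in line:
            events.append((ts, "brick-start"))
        elif "__glusterd_brick_rpc_notify" in line and brick in line:
            if "disconnect" in line.lower():
                events.append((ts, "disconnect"))
            elif f"Connected to {brick}" in line:
                events.append((ts, "connect"))
    return events


if __name__ == "__main__":
    # Entries from this thread, each joined back onto one line (args shortened).
    sample = [
        "[2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main] "
        "0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6",
        "[2016-04-03 10:12:31.771051] D [MSGID: 0] "
        "[glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: "
        "Connected to 10.32.1.144:/opt/lvmdir/c2/brick",
        "[2016-04-03 10:12:32.981152] D [MSGID: 0] "
        "[glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management: "
        "Connected to 10.32.1.144:/opt/lvmdir/c2/brick",
        "[2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main] "
        "0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6 "
        "(args: /usr/sbin/glusterfsd -s 10.32.1.144 --brick-name /opt/lvmdir/c2/brick)",
    ]
    window = (datetime(2016, 4, 3, 10, 12, 29), datetime(2016, 4, 3, 10, 14, 25))
    for ts, kind in brick_events(sample, "10.32.1.144", "/opt/lvmdir/c2/brick", *window):
        print(ts, kind)
```

Run on the entries quoted in this mail with the window 10:12:29 to 10:14:25, it reports only the two 'connect' events at 10:12:31 and 10:12:32 and no 'brick-start', because the glusterfsd start at 10:14:25.268833 falls just after the window. The same function can be pointed at the concatenated lines of both real log files.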
<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Please note that all the logs referred to and pasted above are from 002500.<br>
<br>
~Atin<br>
<span class="">><br>
> 002500 - Board B logs (the board whose brick is offline)<br>
> 000300 - Board A logs<br>
><br>
> ><br>
> > *Question: What could be causing the brick to go offline?*<br>
> ><br>
> ><br>
> > --<br>
> ><br>
> > Regards<br>
> > Abhishek Paliwal<br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > Gluster-devel mailing list<br>
</span>> > <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a> &lt;mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>&gt;<br>
> > <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
> ><br>
><br>
><br>
><br>
><br>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><br><br><br><br>Regards<br>
Abhishek Paliwal<br>
</div></div>
</div></div>