<div dir="ltr"><div>I am talking about the time taken by the GlusterD to mark the process offline because <br></div><div>here GlusterD is responsible to making brick online/offline.<br></div><div><br></div>is it configurable?<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 4, 2016 at 5:53 PM, Atin Mukherjee <span dir="ltr"><<a href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Abhishek,<br>
<br>
See the response inline.<br>
<span class=""><br>
<br>
On 05/04/2016 05:43 PM, ABHISHEK PALIWAL wrote:<br>
> Hi Atin,<br>
><br>
> please reply: is there any configurable timeout parameter for the brick<br>
> process to go offline which we can increase?<br>
><br>
> Regards,<br>
> Abhishek<br>
><br>
> On Thu, Apr 21, 2016 at 12:34 PM, ABHISHEK PALIWAL<br>
</span><span class="">> <<a href="mailto:abhishpaliwal@gmail.com">abhishpaliwal@gmail.com</a> <mailto:<a href="mailto:abhishpaliwal@gmail.com">abhishpaliwal@gmail.com</a>>> wrote:<br>
><br>
> Hi Atin,<br>
><br>
> Please answer the following doubts as well:<br>
><br>
> 1. If there is a temporary glitch in the network, will that affect<br>
> the gluster brick process in any way? Is there any timeout for the<br>
> brick process to go offline in case of a glitch in the network.<br>
</span> If there is a disconnection, GlusterD will receive it and mark the<br>
brick as disconnected even if the brick process is online. So the answer to<br>
this question is both yes and no. From the process perspective the bricks are<br>
still up, but not to the other components/layers, and that may impact the<br>
operations (both mgmt & I/O, given there is a disconnect between the client<br>
and brick processes too).<br>
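You can see this mismatch directly on the node; a rough sketch (the port<br>
49329 and volume name c_glusterfs are taken from your logs, adjust as needed):<br>
<br>
# the brick process can still be alive from the OS point of view<br>
ps -ef | grep glusterfsd<br>
# even when the TCP connection glusterd had to it is gone<br>
ss -tnp | grep 49329<br>
# compare with the state glusterd reports<br>
gluster volume status c_glusterfs<br>
<br>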
<span class="">><br>
> 2. Is there any configurable timeout parameter which we can<br>
> increase?<br>
</span>I don't get this question. What timeout are you talking about?<br>
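If you mean the client-side ping timeout, that one is tunable per volume,<br>
but note it governs the client-to-brick connection, not how GlusterD detects<br>
a brick disconnect; a sketch, assuming your volume name:<br>
<br>
# network.ping-timeout defaults to 42 seconds and applies to clients<br>
gluster volume set c_glusterfs network.ping-timeout 42<br>
<br>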
<span class="">><br>
> 3. Brick and glusterd are connected by a unix domain socket. It is just a<br>
> local socket, so why does it disconnect in the logs below:<br>
</span> This is not true, it's over a TCP socket.<br>
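You can check the transport yourself; a hedged sketch (24007 is glusterd's<br>
standard management port, over which the brick does its portmap sign-in):<br>
<br>
# TCP sockets held by the gluster daemons on this node<br>
ss -tnp | grep -E 'glusterd|glusterfsd'<br>
# connections to/from glusterd's management port<br>
ss -tn | grep 24007<br>
<br>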
<span class="im HOEnZb">><br>
> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]<br>
> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:<br>
> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from<br>
> glusterd.<br>
> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]<br>
> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting<br>
> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped<br>
><br>
> Regards,<br>
> Abhishek<br>
><br>
><br>
> On Tue, Apr 19, 2016 at 1:12 PM, ABHISHEK PALIWAL<br>
</span><span class="im HOEnZb">> <<a href="mailto:abhishpaliwal@gmail.com">abhishpaliwal@gmail.com</a> <mailto:<a href="mailto:abhishpaliwal@gmail.com">abhishpaliwal@gmail.com</a>>> wrote:<br>
><br>
> Hi Atin,<br>
><br>
> Thanks.<br>
><br>
> Have more doubts here.<br>
><br>
> Brick and glusterd are connected by a unix domain socket. It is just a<br>
> local socket, so why does it disconnect in the logs below:<br>
><br>
> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]<br>
> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:<br>
> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from<br>
> glusterd.<br>
> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]<br>
> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting<br>
> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped<br>
><br>
><br>
> Regards,<br>
> Abhishek<br>
><br>
><br>
> On Fri, Apr 15, 2016 at 9:14 AM, Atin Mukherjee<br>
</span><span class="im HOEnZb">> <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a> <mailto:<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>>> wrote:<br>
><br>
><br>
><br>
> On 04/14/2016 04:07 PM, ABHISHEK PALIWAL wrote:<br>
> ><br>
> ><br>
> > On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>><br>
<div class="HOEnZb"><div class="h5">> > wrote:<br>
> ><br>
> ><br>
> ><br>
> > On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:<br>
> > ><br>
> > ><br>
> > > On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee <<a href="mailto:amukherj@redhat.com">amukherj@redhat.com</a>> wrote:<br>
> > ><br>
> > ><br>
> > ><br>
> > > On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:<br>
> > > > Hi Team,<br>
> > > ><br>
> > > > We are using Gluster 3.7.6 and facing one problem in which a brick is<br>
> > > > not coming online after restarting the board.<br>
> > > ><br>
> > > > To understand our setup, please look at the following steps:<br>
> > > > 1. We have two boards, A and B, on which a Gluster volume is running<br>
> > > > in replicated mode, having one brick on each board.<br>
> > > > 2. The Gluster mount point is present on Board A and is shared<br>
> > > > between a number of processes.<br>
> > > > 3. Till now our volume is in sync and everything is working fine.<br>
> > > > 4. Now we have a test case in which we stop glusterd, reboot Board B,<br>
> > > > and when this board comes up, start glusterd again on it.<br>
> > > > 5. We repeated step 4 multiple times to check the reliability of the<br>
> > > > system.<br>
> > > > 6. After step 4, sometimes the system comes into a working state (i.e.<br>
> > > > in sync) but sometimes we face that the brick of Board B is present in<br>
> > > > the “gluster volume status” output but does not come online even after<br>
> > > > waiting for more than a minute.<br>
> > > As I mentioned in another email thread, until and unless the log shows<br>
> > > evidence that there was a reboot, nothing can be concluded. The last<br>
> > > log you shared with us a few days back didn't give any indication<br>
> > > that the brick process wasn't running.<br>
> > ><br>
> > > How can we identify from the brick logs that the brick process is<br>
> > > running?<br>
> > ><br>
> > > > 7. While step 4 is executing, at the same time some processes on<br>
> > > > Board A start accessing files from the Gluster mount point.<br>
> > > ><br>
> > > > As a solution to make this brick online, we found some existing<br>
> > > > issues on the gluster mailing list suggesting the use of “gluster<br>
> > > > volume start <vol_name> force” to move the brick from 'offline'<br>
> > > > to 'online'.<br>
> > > ><br>
> > > > If we use the “gluster volume start <vol_name> force” command, it<br>
> > > > will kill the existing volume process and start a new process. Then<br>
> > > > what will happen if other processes are accessing the same volume at<br>
> > > > the time when the volume process is killed internally by this<br>
> > > > command? Will it cause any failure in these processes?<br>
> > > This is not true; volume start force will start the brick processes only<br>
> > > if they are not running. Running brick processes will not be<br>
> > > interrupted.<br>
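You can verify this on your setup; a rough sketch, using the volume name<br>
c_glusterfs from your logs:<br>
<br>
# note the brick PIDs reported before the force start<br>
gluster volume status c_glusterfs<br>
# force start only spawns bricks that are not already running<br>
gluster volume start c_glusterfs force<br>
# the PID of a brick that was already running should be unchanged<br>
gluster volume status c_glusterfs<br>
<br>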
> > ><br>
> > > We have tried this and checked the pid of the process before and after<br>
> > > the force start; the pid had changed after the force start.<br>
> > ><br>
> > > Please find the logs from the time of failure attached once again with<br>
> > > log-level=debug.<br>
> > ><br>
> > > If you can find the exact line in the brick log file showing that the<br>
> > > brick process is running, please give me the line number in<br>
> > > that file.<br>
> ><br>
> > Here is the sequence in which glusterd and the respective brick process<br>
> > were restarted.<br>
> ><br>
> > 1. glusterd restart trigger - line number 1014 in glusterd.log file:<br>
> ><br>
> > [2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
> > 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd<br>
> > version 3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid<br>
> > --log-level DEBUG)<br>
> ><br>
> > 2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log<br>
> ><br>
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
> > version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
> > --brick-name /opt/lvmdir/c2/brick -l<br>
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br>
> ><br>
> > 3. The following log indicates that the brick is up and is now started.<br>
> > Refer to line 16123 in glusterd.log:<br>
> ><br>
> > [2016-04-03 10:14:25.336855] D [MSGID: 0]<br>
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:<br>
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br>
> ><br>
> > This clearly indicates that the brick is up and running, as after that I<br>
> > do not see any disconnect event processed by glusterd for the brick<br>
> > process.<br>
> ><br>
> ><br>
> > Thanks for replying descriptively, but please also clear up some more<br>
> > doubts:<br>
> ><br>
> > 1. At this moment in time, 10:14:25, the brick is available because we<br>
> > have removed the brick and added it again to make it online;<br>
> > the following are the logs from the cmd-history.log file of 000300:<br>
> ><br>
> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS<br>
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica<br>
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2<br>
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> ><br>
> > Also, 10:12:29 was the last reboot time before this failure, so I<br>
> > totally agree with what you said earlier.<br>
> ><br>
> > 2. As you said, glusterd restarted at 10:12:29; then why are we not<br>
> > getting 'brick start trigger' related logs<br>
> > like the ones below between the 10:12:29 and 10:14:25 timestamps, which<br>
> > is about a two minute interval?<br>
> So here is the culprit:<br>
><br>
> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]<br>
> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:<br>
> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from<br>
> glusterd.<br>
> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]<br>
> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting<br>
> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped<br>
><br>
><br>
> GlusterD received a disconnect event for this brick process and marked it<br>
> as stopped. This could happen due to two reasons: 1. the brick process<br>
> goes down, or 2. a network issue. In this case it's the latter, I believe,<br>
> since the brick process was running at that time. I'd request you to check<br>
> this from the N/W side.<br>
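To narrow that down, something along these lines on both boards around the<br>
disconnect timestamp may help; a rough sketch, assuming the address from<br>
your logs:<br>
<br>
# look for NIC link flaps around 10:12:32 in the kernel log<br>
dmesg | grep -i link<br>
# basic reachability between the boards<br>
ping -c 5 10.32.1.144<br>
# check whether the gluster TCP connections survived<br>
ss -tnp | grep gluster<br>
<br>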
><br>
><br>
> ><br>
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
> > version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
> > --brick-name /opt/lvmdir/c2/brick -l<br>
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br>
> ><br>
> > 3. We are continuously checking the brick status in the above time<br>
> > window using "gluster volume status"; refer to the cmd-history.log file<br>
> > from 000300.<br>
> ><br>
> > In the glusterd.log file we are also getting the logs below:<br>
> ><br>
> > [2016-04-03 10:12:31.771051] D [MSGID: 0]<br>
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:<br>
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br>
> ><br>
> > [2016-04-03 10:12:32.981152] D [MSGID: 0]<br>
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:<br>
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick<br>
> ><br>
> > two times between 10:12:29 and 10:14:25, and as you said these logs<br>
> > "clearly indicate that the brick is up and running"; then why is the<br>
> > brick not online in the "gluster volume status" command?<br>
> ><br>
> > [2016-04-03 10:12:33.990487] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:34.007469] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:35.095918] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:35.126369] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:36.224018] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:36.251032] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:37.352377] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:37.374028] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:38.446148] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:38.468860] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:39.534017] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:39.553711] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:40.616610] : volume status : SUCCESS<br>
> > [2016-04-03 10:12:40.636354] : volume status : SUCCESS<br>
> > ......<br>
> > ......<br>
> > ......<br>
> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS<br>
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica<br>
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2<br>
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> ><br>
> > In the above logs we are continuously checking the brick status, but<br>
> > when we don't find the brick status 'online' even after ~2 minutes, we<br>
> > remove the brick and add it again to make it online.<br>
> ><br>
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica<br>
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS<br>
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2<br>
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS<br>
> ><br>
> > That is why in the logs we are getting the "brick start trigger" logs<br>
> > at timestamp 10:14:25:<br>
> ><br>
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]<br>
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd<br>
> > version 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id<br>
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p<br>
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid<br>
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket<br>
> > --brick-name /opt/lvmdir/c2/brick -l<br>
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option<br>
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256<br>
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)<br>
> ><br>
> ><br>
> > Regards,<br>
> > Abhishek<br>
> ><br>
> ><br>
> > Please note that all the logs referred to and pasted are<br>
> > from 002500.<br>
> ><br>
> > ~Atin<br>
> > ><br>
> > > 002500 - Board B, whose brick is offline<br>
> > > 000300 - Board A logs<br>
> > ><br>
> > > ><br>
> > > > *Question: What could be contributing to the brick going offline?*<br>
> > > ><br>
> > > ><br>
> > > > --<br>
> > > ><br>
> > > > Regards<br>
> > > > Abhishek Paliwal<br>
> > > ><br>
> > > ><br>
> > > > _______________________________________________<br>
> > > > Gluster-devel mailing list<br>
> > > > <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>>><br>
> > <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> <mailto:<a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>>>><br>
> > > ><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-devel" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
> > > ><br>
> > ><br>
> > ><br>
> > ><br>
> > ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> ><br>
> ><br>
> ><br>
> ><br>
><br>
><br>
><br>
><br>
><br>
><br>
> --<br>
><br>
><br>
><br>
><br>
> Regards<br>
> Abhishek Paliwal<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><br><br><br><br>Regards<br>
Abhishek Paliwal<br>
</div></div>
</div>