<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><div><br></div><div>Hi Chen,<br></div><div><br></div><div>I thought I replied to your previous mail.<br></div><div>Other users have also faced this issue; Serkan's thread on gluster-users describes the same problem.<br></div><div><br></div><div>I still have to dig further into it. We will try to reproduce and debug it soon.<br></div><div>My observation is that we hit this issue while IO is going on and one of the servers disconnects and then reconnects.<br></div><div>The disconnect might happen because of an update or a network issue.<br></div><div>But in any case, we should not end up in this situation.<br></div><div><br></div><div>I am adding Pranith and Xavi, who can address any unanswered queries.<br></div><div><br></div><div>-----<br></div><div>Ashish<br></div><div><br></div><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Chen Chen" <chenchen@smartquerier.com><br><b>To: </b>"Joe Julian" <joe@julianfamily.org>, "Ashish Pandey" <aspandey@redhat.com><br><b>Cc: </b>"Gluster Users" <gluster-users@gluster.org><br><b>Sent: </b>Friday, April 22, 2016 8:28:48 AM<br><b>Subject: </b>Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd<br><div><br></div>Hi Ashish,<br><div><br></div>Are you still watching this thread? I got no response after I sent the <br>info you requested. Also, could anybody explain what lock-heal is doing?<br><div><br></div>I got another inode lock yesterday. Only one lock occurred in the whole <br>12 bricks, yet it stopped the cluster from working again. 
None of my <br>peers' OSes were frozen, and this time "start force" worked.<br><div><br></div>------<br>[xlator.features.locks.mainvol-locks.inode]<br>path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf<br>mandatory=0<br>inodelk-count=2<br>lock-dump.domain.domain=mainvol-disperse-0<br>inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = <br>1, owner=dc3dbfac887f0000, client=0x7f649835adb0, <br>connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, <br>granted at 2016-04-21 11:45:30<br>inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = <br>1, owner=d433bfac887f0000, client=0x7f649835adb0, <br>connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, <br>blocked at 2016-04-21 11:45:33<br>------<br><div><br></div>I've also filed a bug report on Bugzilla.<br>https://bugzilla.redhat.com/show_bug.cgi?id=1329466<br><div><br></div>Best regards,<br>Chen<br><div><br></div>On 4/13/2016 10:31 PM, Joe Julian wrote:<br>><br>><br>> On 04/13/2016 03:29 AM, Ashish Pandey wrote:<br>>> Hi Chen,<br>>><br>>> What do you mean by "instantly get inode locked and teared down<br>>> the whole cluster" ? Do you mean that the whole disperse volume became<br>>> unresponsive?<br>>><br>>> I don't have much idea about features.lock-heal so I can't comment on how<br>>> it can help you.<br>><br>> So who should get added to this email that would have an idea? Let's get<br>> that person looped in.<br>><br>>><br>>> Could you please explain the second part of your mail? 
What exactly are<br>>> you trying to do and what is the setup?<br>>> Also, volume info, logs and statedumps might help.<br>>><br>>> -----<br>>> Ashish<br>>><br>>><br>>> ------------------------------------------------------------------------<br>>> *From: *"Chen Chen" <chenchen@smartquerier.com><br>>> *To: *"Ashish Pandey" <aspandey@redhat.com><br>>> *Cc: *gluster-users@gluster.org<br>>> *Sent: *Wednesday, April 13, 2016 3:26:53 PM<br>>> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata /<br>>> Failed combine iatt / Too many fd<br>>><br>>> Hi Ashish and other Gluster Users,<br>>><br>>> When I put some heavy IO load onto my cluster (an rsync operation,<br>>> ~600MB/s), one of the nodes instantly get inode locked and teared down<br>>> the whole cluster. I've already turned on "features.lock-heal" but it<br>>> didn't help.<br>>><br>>> My clients are using a round-robin tactic to mount servers, hoping to<br>>> spread the load. Could it be caused by a race between NFS servers<br>>> on different nodes? Should I instead create a dedicated NFS server with<br>>> huge memory, no brick, and multiple Ethernet cables?<br>>><br>>> I really appreciate any help from you guys.<br>>><br>>> Best wishes,<br>>> Chen<br>>><br>>> PS. 
Don't know why the native FUSE client is 5 times slower than<br>>> good old NFSv3.<br>>><br>>> On 4/4/2016 6:11 PM, Ashish Pandey wrote:<br>>> > Hi Chen,<br>>> ><br>>> > As I suspected, there are many blocked calls for inodelk in<br>>> sm11/mnt-disk1-mainvol.31115.dump.1459760675.<br>>> ><br>>> > =============================================<br>>> > [xlator.features.locks.mainvol-locks.inode]<br>>> > path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar<br>>> > mandatory=0<br>>> > inodelk-count=4<br>>> > lock-dump.domain.domain=mainvol-disperse-0:self-heal<br>>> > lock-dump.domain.domain=mainvol-disperse-0<br>>> > inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid<br>>> = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0,<br>>> connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0,<br>>> blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58<br>>> > inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,<br>>> pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490,<br>>> connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0,<br>>> blocked at 2016-04-01 16:58:51<br>>> > inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0,<br>>> pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0,<br>>> connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked<br>>> at 2016-04-01 17:03:41<br>>> > inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0,<br>>> pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670,<br>>> connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0,<br>>> blocked at 2016-04-01 17:05:09<br>>> > =============================================<br>>> ><br>>> > This could be the cause of the hang.<br>>> > Possible workaround -<br>>> > If there is no IO going on for this volume, we can restart the<br>>> volume using - gluster v start <volume-name> force. 
This will restart<br>>> the NFS process too, which will release the locks, and<br>>> > we could come out of this issue.<br>>> ><br>>> > Ashish<br>>><br>>> --<br>>> Chen Chen<br>>> Shanghai SmartQuerier Biotechnology Co., Ltd.<br>>> Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China<br>>> Mob: +86 15221885893<br>>> Email: chenchen@smartquerier.com<br>>> Web: www.smartquerier.com<br>>><br>>><br>>> _______________________________________________<br>>> Gluster-users mailing list<br>>> Gluster-users@gluster.org<br>>> http://www.gluster.org/mailman/listinfo/gluster-users<br>><br><div><br></div>-- <br>Chen Chen<br>Shanghai SmartQuerier Biotechnology Co., Ltd.<br>Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China<br>Mob: +86 15221885893<br>Email: chenchen@smartquerier.com<br>Web: www.smartquerier.com<br><div><br></div><br>_______________________________________________<br>Gluster-users mailing list<br>Gluster-users@gluster.org<br>http://www.gluster.org/mailman/listinfo/gluster-users</div><div><br></div></div></body></html>