Hello,

I'm trying to track down a problem with my setup (version 3.7.3 on Debian stable).

I have a couple of volumes set up in a 3-node configuration, with 1 brick acting as an arbiter for each.

There are 4 volumes set up in a cross-over arrangement across 3 physical servers, like this:

             ------------------------------------->[ GigabitEthernet switch ]<--------------------------
             |                                                ^                                        |
             |                                                |                                        |
             V                                                V                                        V
/---------------------------\                   /---------------------------\             /---------------------------\
| web-rep                   |                   | cluster-rep               |             | mail-rep                  |
|                           |                   |                           |             |                           |
| vols:                     |                   | vols:                     |             | vols:                     |
| system_www1               |                   | system_www1               |             | system_www1 (arbiter)     |
| data_www1                 |                   | data_www1                 |             | data_www1 (arbiter)       |
| system_mail1 (arbiter)    |                   | system_mail1              |             | system_mail1              |
| data_mail1 (arbiter)      |                   | data_mail1                |             | data_mail1                |
\---------------------------/                   \---------------------------/             \---------------------------/

After a fresh boot-up, everything seems to be running fine.
Then I start copying big files (KVM disk images) from the local disk to the gluster mounts.
In the beginning the transfer runs fine (although iowait seems to go so high that it clogs up IO operations
at some moments, but that's an issue for later). After some time the transfer freezes; then,
after some (long) time, it advances in a short burst, only to freeze again. Another interesting thing is that
I see a constant flow of network traffic on the interfaces dedicated to gluster, even during a "freeze".

I ran "gluster volume statedump" during such a transfer (the file is copied from the local disk on cluster-rep
onto a local mount of the "system_www1" volume).
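For reference, the dump was triggered roughly like this (the dump files end up in the default statedump directory, /var/run/gluster here; the location may differ if server.statedump-path is set):

# gluster volume statedump system_www1
# ls /var/run/gluster/*.dump.*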
I've observed the following section in the dump for the cluster-rep node:

[xlator.features.locks.system_www1-locks.inode]
path=/images/101/vm-101-disk-1.qcow2
mandatory=0
inodelk-count=12
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=c4fd2d78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45

There are multiple locks in the BLOCKED state, which doesn't look normal to me.
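To compare lock states across the nodes, I count them per dump file (same assumption about the dump directory as above):

# grep -c '(ACTIVE)' /var/run/gluster/*.dump.*
# grep -c '(BLOCKED)' /var/run/gluster/*.dump.*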
The other 2 nodes show only 2 ACTIVE locks at the same time.

Below is the "gluster volume info" output.

# gluster volume info

Volume Name: data_mail1
Type: Replicate
Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Brick3: web-rep:/GFS/data/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: data_www1
Type: Replicate
Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Brick3: mail-rep:/GFS/data/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_mail1
Type: Replicate
Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Brick3: web-rep:/GFS/system/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_www1
Type: Replicate
Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Brick3: mail-rep:/GFS/system/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

The issue does not occur when I get rid of the 3rd (arbiter) brick.

If there's any additional information I could provide, please let me know.

Greetings,
Adrian
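P.S. For completeness, removing the arbiter brick for that test was done roughly like this (sketch from memory; mail-rep holds the arbiter brick of system_www1 in this setup, and the volume drops to replica 2):

# gluster volume remove-brick system_www1 replica 2 mail-rep:/GFS/system/www1 force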