<div dir="ltr">Hello Pranith,<div><br></div><div>Thank you for the prompt response. I didn&#39;t get back to this until now because I had other problems to deal with.</div><div><br></div><div>Is there a chance it will be released this month or next? If not, I will probably have to resort to compiling it myself.</div><div><br></div><div>Regards,</div><div>Adrian</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-10-26 12:37 GMT+01:00 Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div><div class="h5">
    <br>
    <br>
    <div>On 10/23/2015 10:10 AM, Ravishankar N
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <br>
      <br>
      <div>On 10/21/2015 05:55 PM, Adrian
        Gruntkowski wrote:<br>
      </div>
      <blockquote type="cite">
        <div dir="ltr">Hello,<br>
          <br>
          I&#39;m trying to track down a problem with my setup (version
          3.7.3 on Debian stable).<br>
          <br>
          I have a couple of volumes set up in a 3-node configuration, with
          one brick acting as the arbiter for each. <br>
          <br>
          There are 4 volumes arranged in a cross-over layout across 3 physical
          servers, like this:<br>
          <br>
          <br>
          <br>
<pre>
              -----------------&gt;[ GigabitEthernet switch ]&lt;-----------------
              |                              ^                              |
              |                              |                              |
              V                              V                              V
/---------------------------\  /---------------------------\  /---------------------------\
| web-rep                   |  | cluster-rep               |  | mail-rep                  |
|                           |  |                           |  |                           |
| vols:                     |  | vols:                     |  | vols:                     |
| system_www1               |  | system_www1               |  | system_www1(arbiter)      |
| data_www1                 |  | data_www1                 |  | data_www1(arbiter)        |
| system_mail1(arbiter)     |  | system_mail1              |  | system_mail1              |
| data_mail1(arbiter)       |  | data_mail1                |  | data_mail1                |
\---------------------------/  \---------------------------/  \---------------------------/
</pre>
          <br>
          <br>
          Now, after a fresh boot-up, everything seems to run fine.<br>
          Then I start copying big files (KVM disk images) from local
          disk to the Gluster mounts.<br>
          In the beginning the transfer runs fine (although iowait goes
          so high that it clogs up IO operations at some moments, but
          that&#39;s an issue for later). After some time the transfer
          freezes, then after some (long) time it advances in a short
          burst, only to freeze again. Another interesting thing is that
          I see a constant flow of network traffic on the interfaces
          dedicated to Gluster, even during a &quot;freeze&quot;.<br>
          <br>
          I ran &quot;gluster volume statedump&quot; during the transfer
          (the file is copied from local disk on cluster-rep onto a local
          mount of the &quot;system_www1&quot; volume). I observed the
          following section in the dump for the cluster-rep node:<br>
          <br>
<pre>[xlator.features.locks.system_www1-locks.inode]
path=/images/101/vm-101-disk-1.qcow2
mandatory=0
inodelk-count=12
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22</pre>
        </div>
      </blockquote>
      <br>
      From the statedump, it looks like the self-heal daemon has taken
      locks to heal the file, due to which the locks attempted by the
      client (mount) are in blocked state.<br>
      In arbiter volumes the client (mount) takes full locks (start=0,
      len=0) for every write(), as opposed to normal replica volumes,
      which take range locks (i.e. appropriate start,len values) for
      that write(). This is done to avoid network split-brains.<br>
      So in normal replica volumes, clients can still write to a file
      while a heal is going on, as long as the offsets don&#39;t overlap.
      This is not the case with arbiter volumes.<br>
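      The full-lock vs. range-lock distinction above can be modeled with a
      small overlap check. This is a minimal sketch of the conflict rule
      only, not Gluster&#39;s actual inodelk code; <code>conflicts</code> is a
      hypothetical helper, and the convention that len=0 means &quot;to end
      of file&quot; follows POSIX record locks:<br>

```python
import math

def conflicts(a, b):
    """Do two write-lock ranges (start, length) overlap?

    length == 0 means 'from start to end of file', as in POSIX record
    locks. A self-heal and an arbiter-volume client write both take
    (0, 0), i.e. the whole file.
    """
    a_end = math.inf if a[1] == 0 else a[0] + a[1]
    b_end = math.inf if b[1] == 0 else b[0] + b[1]
    return a[0] < b_end and b[0] < a_end

heal_lock  = (0, 0)                  # self-heal locks the whole file
range_lock = (2195849216, 131072)    # range write, as in the statedump
full_lock  = (0, 0)                  # arbiter client write: full lock

print(conflicts(heal_lock, range_lock))    # True: heal blocks even range writes
print(conflicts((0, 131072), range_lock))  # False: disjoint ranges coexist
print(conflicts(heal_lock, full_lock))     # True: full locks always collide
```

      Because every arbiter-volume write takes a (0, 0) lock, any
      concurrent self-heal on the file serializes all client writes
      behind it, which is consistent with the long freezes described
      above.<br>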
      You can look at the client or glustershd logs to see if there are
      messages that indicate healing of a file, something along the
      lines of &quot;Completed data selfheal on xxx&quot;.<br>
    </blockquote></div></div>
    Hi Adrian,<br>
          Thanks for taking the time to send this mail. I have raised this as
    a bug at <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1275247" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1275247</a>, and a fix is
    posted for review at <a href="http://review.gluster.com/#/c/12426/" target="_blank">http://review.gluster.com/#/c/12426/</a>.<span class="HOEnZb"><font color="#888888"><br>
    <br>
    Pranith</font></span><div><div class="h5"><br>
    <blockquote type="cite"> <br>
      <blockquote type="cite">
        <div dir="ltr"><pre>inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=c4fd2d78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45</pre>
          <br>
          There seem to be multiple locks in BLOCKED state, which
          doesn&#39;t look normal to me. The other 2 nodes have only
          2 ACTIVE locks at the same time.<br>
          <br>
          Below is &quot;gluster volume info&quot; output.<br>
          <br>
<pre># gluster volume info

Volume Name: data_mail1
Type: Replicate
Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Brick3: web-rep:/GFS/data/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: data_www1
Type: Replicate
Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Brick3: mail-rep:/GFS/data/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_mail1
Type: Replicate
Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Brick3: web-rep:/GFS/system/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_www1
Type: Replicate
Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Brick3: mail-rep:/GFS/system/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%</pre>
          <br>
          The issue does not occur when I get rid of the 3rd (arbiter) brick.<br>
        </div>
      </blockquote>
      <br>
      What do you mean by &#39;getting rid of&#39;? Killing the 3rd brick
      process of the volume?<br>
      <br>
      Regards,<br>
      Ravi<br>
      <blockquote type="cite">
        <div dir="ltr"><br>
          If there is any additional information I could provide, please
          let me know.<br>
          <br>
          Regards,<br>
          Adrian</div>
        <br>
        <fieldset></fieldset>
        <br>
        <pre>_______________________________________________
Gluster-users mailing list
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
      </blockquote>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>