<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    <div class="moz-cite-prefix">On 11/04/2015 09:10 PM, Adrian
      Gruntkowski wrote:<br>
    </div>
    <blockquote
cite="mid:CAE_wqnO4s_xgWyq_XFvA0pBwYbf-O_-7V-7b9OgRQ4BAa1gW5w@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hello,
        <div><br>
        </div>
        <div>I have applied Pranith's patch myself on the current 3.7.5
          release and rebuilt the packages. Unfortunately, the issue is
          still there :( It behaves exactly the same.</div>
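        <div><br>
        </div>
        <div>For reference, this is roughly how I rebuilt the Debian
          packages with the patch applied (a sketch of the standard
          source-package workflow; the patch filename below is a
          placeholder):</div>
        <pre>
# fetch the distribution source package and apply the patch on top
apt-get source glusterfs
cd glusterfs-3.7.5
patch -p1 &lt; afr-arbiter-locks.patch   # patch from review 12426; filename is a placeholder
dpkg-buildpackage -us -uc -b           # build binary packages without signing
</pre>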
      </div>
    </blockquote>
    Could you get the statedumps of the bricks again? I will take a
    look. Maybe the hang I observed is different from the one you are
    observing, and I only fixed the one I saw.<br>
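    Same procedure as before, for reference (a sketch, assuming the
    default statedump directory):<br>
    <pre>
# on any node: trigger a statedump of every brick of the volume
gluster volume statedump system_www1

# the dump files appear on each brick node under the default dump directory
ls /var/run/gluster/*.dump.*
</pre>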
    <br>
    Pranith<br>
    <blockquote
cite="mid:CAE_wqnO4s_xgWyq_XFvA0pBwYbf-O_-7V-7b9OgRQ4BAa1gW5w@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>Regards,</div>
        <div>Adrian</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2015-10-28 12:02 GMT+01:00 Pranith
          Kumar Karampuri <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
                <br>
                <div>On 10/28/2015 04:27 PM, Adrian Gruntkowski wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">Hello Pranith,
                    <div><br>
                    </div>
                    <div>Thank you for the prompt reaction. I didn't
                      get back to this until now because I had other
                      problems to deal with.</div>
                    <div><br>
                    </div>
                    <div>Is there a chance that it will get released
                      this month or next? If not, I will probably have
                      to resort to compiling it on my own.</div>
                  </div>
                </blockquote>
              </span> I am planning to get this into 3.7.6, which is to
              be released by the end of this month, in 4-5 days I guess
              :-). I will keep you updated.<span class="HOEnZb"><font
                  color="#888888"><br>
                  <br>
                  Pranith</font></span>
              <div>
                <div class="h5"><br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div><br>
                      </div>
                      <div>Regards,</div>
                      <div>Adrian</div>
                      <div><br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">2015-10-26 12:37
                        GMT+01:00 Pranith Kumar Karampuri <span
                          dir="ltr">&lt;<a moz-do-not-send="true"
                            href="mailto:pkarampu@redhat.com"
                            target="_blank">pkarampu@redhat.com</a>&gt;</span>:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000">
                            <div>
                              <div> <br>
                                <br>
                                <div>On 10/23/2015 10:10 AM, Ravishankar
                                  N wrote:<br>
                                </div>
                                <blockquote type="cite"> <br>
                                  <br>
                                  <div>On 10/21/2015 05:55 PM, Adrian
                                    Gruntkowski wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr">Hello,<br>
                                      <br>
                                      I'm trying to track down a problem
                                      with my setup (version 3.7.3 on
                                      Debian stable).<br>
                                      <br>
                                      I have a couple of volumes set up
                                      in a 3-node configuration, with 1
                                      brick as an arbiter for each.<br>
                                      <br>
                                      There are 4 volumes set up in a
                                      cross-over across 3 physical
                                      servers, like this:<br>
                                      <br>
                                      <br>
                                      <pre>
              ----------------------&gt;[ GigabitEthernet switch ]&lt;------------------------
              |                                    ^                                    |
              |                                    |                                    |
              V                                    V                                    V
/---------------------------\       /---------------------------\       /---------------------------\
| web-rep                   |       | cluster-rep               |       | mail-rep                  |
|                           |       |                           |       |                           |
| vols:                     |       | vols:                     |       | vols:                     |
| system_www1               |       | system_www1               |       | system_www1 (arbiter)     |
| data_www1                 |       | data_www1                 |       | data_www1 (arbiter)       |
| system_mail1 (arbiter)    |       | system_mail1              |       | system_mail1              |
| data_mail1 (arbiter)      |       | data_mail1                |       | data_mail1                |
\---------------------------/       \---------------------------/       \---------------------------/
</pre>
                                      <br>
                                      <br>
                                      Now, after a fresh boot-up,
                                      everything seems to be running
                                      fine. Then I start copying big
                                      files (KVM disk images) from the
                                      local disk to the gluster mounts.
                                      In the beginning the copy runs
                                      fine (although iowait seems to go
                                      so high that it clogs up IO
                                      operations at some moments, but
                                      that's an issue for later). After
                                      some time the transfer freezes;
                                      then, after some (long) time, it
                                      advances in a short burst, only to
                                      freeze again. Another interesting
                                      thing is that I see a constant
                                      flow of network traffic on the
                                      interfaces dedicated to gluster,
                                      even during a "freeze".<br>
                                      <br>
                                      I ran "gluster volume statedump"
                                      at that point of the transfer (the
                                      file was being copied from the
                                      local disk on cluster-rep onto a
                                      local mount of the "system_www1"
                                      volume). I observed the following
                                      section in the dump for the
                                      cluster-rep node:<br>
                                      <pre>
[xlator.features.locks.system_www1-locks.inode]
path=/images/101/vm-101-disk-1.qcow2
mandatory=0
inodelk-count=12
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
</pre>
                                    </div>
                                  </blockquote>
                                  <br>
                                  From the statedump, it looks like the
                                  self-heal daemon had taken locks to
                                  heal the file, due to which the locks
                                  attempted by the client (mount) are in
                                  the blocked state.<br>
                                  In arbiter volumes the client (mount)
                                  takes full locks (start=0, len=0) for
                                  every write(), as opposed to normal
                                  replica volumes, which take range
                                  locks (i.e. appropriate start, len
                                  values) for that write(). This is done
                                  to avoid network split-brains.<br>
                                  So in normal replica volumes clients
                                  can still write to a file while a heal
                                  is going on, as long as the offsets
                                  don't overlap. This is not the case
                                  with arbiter volumes; both lock
                                  patterns are visible in the excerpt
                                  below.<br>
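                                  (Both lines below are taken from the
                                  statedumps in this thread.)<br>
                                  <pre>
# arbiter volume: the client locks the whole file for every write()
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, ...

# plain replica volume: only the byte range being written is locked
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, ...
</pre>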
                                  You can look at the client or
                                  glustershd logs to see if there are
                                  messages that indicate healing of the
                                  file, something along the lines of
                                  "Completed data selfheal on xxx".<br>
                                </blockquote>
                              </div>
                            </div>
                            Hi Adrian,<br>
                                  Thanks for taking the time to send
                            this mail. I raised this as a bug at <a
                              moz-do-not-send="true"
                              href="https://bugzilla.redhat.com/show_bug.cgi?id=1275247"
                              target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1275247</a>,
                            and the fix is posted for review at <a
                              moz-do-not-send="true"
                              href="http://review.gluster.com/#/c/12426/"
                              target="_blank">http://review.gluster.com/#/c/12426/</a>.<span><font
                                color="#888888"><br>
                                <br>
                                Pranith</font></span>
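                            <br>
                            If you want to try the fix before the
                            release, the change can be pulled straight
                            from Gerrit; a rough sketch (the fetch URL
                            and patchset number are assumptions, check
                            the review page for the exact ref):<br>
                            <pre>
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
# refs/changes/26/12426/1 assumes patchset 1; pick the latest patchset on the review page
git fetch http://review.gluster.com/glusterfs refs/changes/26/12426/1
git cherry-pick FETCH_HEAD
</pre>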
                            <div>
                              <div><br>
                                <blockquote type="cite"> <br>
                                  <blockquote type="cite">
                                    <div dir="ltr">inodelk.inodelk[2](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=c4fd2d78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[3](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=dc752e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[4](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=34832e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[5](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=d44d2e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[6](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=306f2e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[7](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=8c902e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[8](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=782c2e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[9](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=1c0b2e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      inodelk.inodelk[10](BLOCKED)=type=WRITE,

                                      whence=0, start=0, len=0, pid = 0,
                                      owner=24332e78487f0000,
                                      client=0x7fbe100e1380,
                                      connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,



                                      blocked at 2015-10-21 11:37:45<br>
                                      <br>
                                      There seem to be multiple locks in
                                      the BLOCKED state, which doesn't
                                      look normal to me. The other two
                                      nodes have only 2 ACTIVE locks at
                                      the same time.<br>
                                      <br>
                                      Below is "gluster volume info"
                                      output.<br>
                                      <br>
                                      <pre>
# gluster volume info

Volume Name: data_mail1
Type: Replicate
Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Brick3: web-rep:/GFS/data/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: data_www1
Type: Replicate
Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Brick3: mail-rep:/GFS/data/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_mail1
Type: Replicate
Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Brick3: web-rep:/GFS/system/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_www1
Type: Replicate
Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Brick3: mail-rep:/GFS/system/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%
</pre>
                                      <br>
                                      The issue does not occur when I
                                      get rid of the 3rd arbiter brick.<br>
                                    </div>
                                  </blockquote>
                                  <br>
                                  What do you mean by 'getting rid of'?
                                  Killing the 3rd brick process of the
                                  volume?<br>
                                  <br>
                                  Regards,<br>
                                  Ravi<br>
                                  <blockquote type="cite">
                                    <div dir="ltr"><br>
                                      If there's any additional
                                      information I could provide,
                                      please let me know.<br>
                                      <br>
                                      Greetings,<br>
                                      Adrian</div>
                                    <br>
                                    <fieldset></fieldset>
                                    <br>
                                    <pre>_______________________________________________
Gluster-users mailing list
<a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
                                  </blockquote>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>