<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 27/03/2016 12:33 AM, Lindsay

      Mathieson wrote:<br>

    </div>

    <blockquote cite="mid:56F69DA3.5080409@gmail.com" type="cite">On

      26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:

      <br>

      <blockquote type="cite" style="color: #000000;">

        <blockquote type="cite" style="color: #000000;">Is that the same

          issue I posted earlier re "gluster volume heal info" appearing

          to block I/O?

          <br>

          <br>

        </blockquote>

        I don't think it is heal info that is blocking I/O. I think it

        is the client triggering a heal and blocking the fop until the heal

        completes that results in this pattern. Disabling data-heal should

        get you out of this problem. </blockquote>

      <br>

      <br>

      I tried it earlier and it didn't seem to help.

      <br>

      <br>

      Does anything need to be restarted after cluster.data-self-heal is

      set to off?

      <br>

    </blockquote>

    <br>

    <br>

    Tried again this morning. I can reproduce, 100% of the time, the behaviour I noted earlier:<br>

    <br>

    <blockquote type="cite">After testing the heal process by killing

      glusterfsd on a node I noticed the following.

      <br>

      <br>

      - I/O continued at normal speed while glusterfsd was down.

      <br>

      <br>

      - After restarting glusterfsd, I/O still continued as normal

      <br>

      <br>

      - performing a "gluster volume heal datastore2 info" would show

      some info then hang.

      <br>

      <br>

      - I/O on the cluster would cease, e.g. in a VM where I was running

      a command line build of a large project, the build just stopped.

      The VM itself was mostly responsive but anything that involved

      accessing the disk hung.

      <br>

      <br>

      - if I killed the "gluster volume heal datastore2 info" command

      then I/O in the VMs resumed at a normal pace.

      <br>

      <br>

      - if I then reissued the "gluster volume heal datastore2 info"

      command I/O would continue for a short while (seconds - minutes)

      before hanging again.

      <br>

      <br>

      - killing the heal info command would resume I/O again.

      <br>

    </blockquote>

    <br>

    <br>

    iowait and cpu are under 4% on all three nodes.<br>

    <br>

    Even after I shut down all VMs on datastore2, "gluster volume heal

    datastore2 info" hung indefinitely with no output. <br>

    <br>
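    For anyone repeating the test, here is a minimal sketch of running heal info under a time limit, so a hung command is cut off instead of having to be killed by hand. It assumes the GNU coreutils "timeout" command is available; the function name heal_info_with_timeout, the GLUSTER_BIN override and the 60 second default are illustrative only and not part of gluster.<br>

```shell
#!/bin/sh
# Sketch: run "gluster volume heal <volume> info" with a time limit so a hung
# command is terminated rather than left running.
# heal_info_with_timeout and GLUSTER_BIN are hypothetical names for this example.
heal_info_with_timeout() {
    volume="$1"
    limit="${2:-60}"    # seconds to wait; 60 is an arbitrary default
    timeout "$limit" "${GLUSTER_BIN:-gluster}" volume heal "$volume" info
    rc=$?
    # coreutils timeout exits with status 124 when the time limit is reached
    if [ "$rc" -eq 124 ]; then
        echo "heal info on $volume gave up after ${limit}s"
    fi
    return "$rc"
}
```

    For example, "heal_info_with_timeout datastore2 30" would give up after 30 seconds. This only automates killing the command; it does not address why it hangs.<br>

    <br>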

    I had to stop and start datastore2 before heal info would work; it then

    returned very quickly with:<br>

    <tt><br>

    </tt>

    <blockquote><tt>Brick vnb.proxmox.softlog:/tank/vmdata/datastore2</tt><br>

      <tt>Number of entries: 0</tt><br>

      <br>

      <tt>Brick vng.proxmox.softlog:/tank/vmdata/datastore2</tt><br>

      <tt>/.shard - Possibly undergoing heal</tt><br>

      <br>

      <tt>Number of entries: 1</tt><br>

      <br>

      <tt>Brick vna.proxmox.softlog:/tank/vmdata/datastore2</tt><br>

      <tt>/.shard - Possibly undergoing heal</tt><br>

      <br>

      <tt>Number of entries: 1</tt></blockquote>

    <br>

    Unfortunately it's stayed that way for 10 minutes now.<br>

    <br>

    <br>

    I'd like to recheck this behaviour under 3.7.7 - can I just revert

    to that (Debian packages) without recreating the datastore?<br>

    <br>

    thanks,<br>

    <br>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Lindsay Mathieson</pre>

  </body>

</html>