<div dir="ltr"><div><div><div><div>Hi,<br><br></div>A bug was uncovered recently in which the same file can get healed twice before it is marked as no longer needing heal.<br></div>Pranith sent a patch at <a href="http://review.gluster.org/#/c/13766/">http://review.gluster.org/#/c/13766/</a> to fix this, although IIUC this bug existed in versions &lt; 3.7.9 as well.<br></div>Also, because of this bug, files that need heal may appear in heal-info output longer than they ought to.<br></div><div>Did you see this issue in versions &lt; 3.7.9 as well?<br></div><div><br></div>-Krutika<br><div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 25, 2016 at 1:04 PM, Lindsay Mathieson <span dir="ltr">&lt;<a href="mailto:lindsay.mathieson@gmail.com" target="_blank">lindsay.mathieson@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Have resumed testing with 3.7.9 - this time I have proper hardware

    behind it: <br>

    <br>

    - 3 nodes<br>

    - each node with 4 WD Reds in ZFS raid 10 <br>

    - SSD for slog and cache.<br>

    <br>

    Using a sharded VM setup (4MB shards) and performance has been

    excellent, better than ceph on the same hardware. I have some

    interesting notes on that I will detail later.<br>

    <br>

    However, unlike with 3.7.7, heal performance has been abysmal,

    deal-breaking in fact. Maybe it&#39;s my setup?<br>

    <br>

    Have been testing healing by killing the glusterfsd and glusterd

    processes on another node and letting a VM run. Everything is fine at

    this point: despite a node being down, reads and writes continue

    normally.<br>

    <br>

    However, heal info shows what appears to be an excessive number of

    shards marked as needing heal. A simple reboot of a Windows

    VM results in 360 4MB shards - 1.5GB of data. A compile resulted in

    7GB of shards being touched. Could there be some write amplification

    at work?<br>
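    <br>
    A rough sanity check of the figures above, as a minimal sketch rather than
    anything gluster itself provides. The helper names and the 144MB "actually
    written" figure are hypothetical; only the 4MB shard size and the count of
    360 shards come from this report. The point is that a shard appears to be
    listed as a whole even if only a few bytes in it changed, so the shard
    count times the shard size is an upper bound on touched data, not the
    amount written.<br>
<pre>
```python
SHARD_SIZE_MB = 4  # shard size stated in the report


def shard_data_mb(shard_count, shard_size_mb=SHARD_SIZE_MB):
    """Upper bound, in MB, on the data covered by the listed shards."""
    return shard_count * shard_size_mb


def heal_amplification(shard_count, written_mb, shard_size_mb=SHARD_SIZE_MB):
    """Ratio of data covered by listed shards to data actually written.

    written_mb is a hypothetical measurement (for example taken from
    guest-side I/O statistics); it is not something reported in this thread.
    """
    if written_mb == 0:
        raise ValueError("written_mb must be non-zero")
    return shard_data_mb(shard_count, shard_size_mb) / written_mb


print(shard_data_mb(360))  # 1440
```
</pre>
    360 shards of 4MB cover 1440MB, about 1.4GB, which is in line with the
    roughly 1.5GB quoted above. If the guest had written, say, 144MB during the
    reboot, the ratio would be 10, which would point at small scattered writes
    dirtying many shards rather than at a counting error.<br>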

    <br>

    However, once I restart the glusterd process (which starts glusterfsd),

    performance becomes atrocious. Disk I/O nearly stops and any running

    VMs hang or slow down a *lot* until the heal is complete. The

    &quot;heal info&quot; command appears to hang as well, not completing at all.

    A build process that was taking 4 minutes took over an hour.<br>

    <br>

    Once the heal finishes, I/O returns to normal.<br>

    <br>

    <br>

    Here&#39;s a fragment of the glfsheal log:<br>

    <br>

    <tt>[2016-03-25 07:12:51.041590] I [MSGID: 114057]

      [client-handshake.c:1437:select_server_supported_programs]

      0-datastore2-client-2: Using Program GlusterFS 3.3, Num (1298437),

      Version (330)</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041637] I

      [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-datastore2-client-1:

      changing port to 49153 (from 0)</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041808] I [MSGID: 114046]

      [client-handshake.c:1213:client_setvolume_cbk]

      0-datastore2-client-2: Connected to datastore2-client-2, attached

      to remote volume &#39;/tank/vmdata/datastore2&#39;.</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041826] I [MSGID: 114047]

      [client-handshake.c:1224:client_setvolume_cbk]

      0-datastore2-client-2: Server and Client lk-version numbers are

      not same, reopening the fds</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041901] I [MSGID: 108005]

      [afr-common.c:4010:afr_notify] 0-datastore2-replicate-0: Subvolume

      &#39;datastore2-client-2&#39; came back up; going online.</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041929] I [MSGID: 114057]

      [client-handshake.c:1437:select_server_supported_programs]

      0-datastore2-client-0: <b>Using Program GlusterFS 3.3, Num

        (1298437), Version (330)</b></tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.041955] I [MSGID: 114035]

      [client-handshake.c:193:client_set_lk_version_cbk]

      0-datastore2-client-2: Server lk version = 1</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042319] I [MSGID: 114046]

      [client-handshake.c:1213:client_setvolume_cbk]

      0-datastore2-client-0: Connected to datastore2-client-0, attached

      to remote volume &#39;/tank/vmdata/datastore2&#39;.</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042333] I [MSGID: 114047]

      [client-handshake.c:1224:client_setvolume_cbk]

      0-datastore2-client-0: Server and Client lk-version numbers are

      not same, reopening the fds</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042455] I [MSGID: 114057]

      [client-handshake.c:1437:select_server_supported_programs]

      0-datastore2-client-1: Using Program GlusterFS 3.3, Num (1298437),

      Version (330)</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042520] I [MSGID: 114035]

      [client-handshake.c:193:client_set_lk_version_cbk]

      0-datastore2-client-0: Server lk version = 1</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042846] I [MSGID: 114046]

      [client-handshake.c:1213:client_setvolume_cbk]

      0-datastore2-client-1: Connected to datastore2-client-1, attached

      to remote volume &#39;/tank/vmdata/datastore2&#39;.</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.042867] I [MSGID: 114047]

      [client-handshake.c:1224:client_setvolume_cbk]

      0-datastore2-client-1: Server and Client lk-version numbers are

      not same, reopening the fds</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.058131] I [MSGID: 114035]

      [client-handshake.c:193:client_set_lk_version_cbk]

      0-datastore2-client-1: Server lk version = 1</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.059075] I [MSGID: 108031]

      [afr-common.c:1913:afr_local_discovery_cbk]

      0-datastore2-replicate-0: selecting local read_child

      datastore2-client-2</tt><tt><br>

    </tt><tt>[2016-03-25 07:12:51.059619] I [MSGID: 104041]

      [glfs-resolve.c:869:__glfs_active_subvol] 0-datastore2: switched

      to graph 766e612d-3739-3437-352d-323031362d30 (0)</tt><tt><br>

    </tt><br>

    <br>

    I have no idea why client version 3.3 is being used! Everything

    should be 3.7.9.<br>

    <br>

    <br>

    Environment:<br>

    <br>

    - Proxmox (Debian Jessie, 8.2)<br>

    - KVM VMs using gfapi, running on the same nodes as the gluster

    bricks<br>

    - bricks are hosted on 3 ZFS pools (one per node)<br>

        * compression=lz4<br>

        * xattr=sa<br>

        * sync=standard<br>

        * acltype=posixacl <br>

    <br>

    <tt>Volume info:</tt><tt><br>

    </tt><tt>Volume Name: datastore2</tt><tt><br>

    </tt><tt>Type: Replicate</tt><tt><br>

    </tt><tt>Volume ID: 7d93a1c6-ac39-4d94-b136-e8379643bddd</tt><tt><br>

    </tt><tt>Status: Started</tt><tt><br>

    </tt><tt>Number of Bricks: 1 x 3 = 3</tt><tt><br>

    </tt><tt>Transport-type: tcp</tt><tt><br>

    </tt><tt>Bricks:</tt><tt><br>

    </tt><tt>Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore2</tt><tt><br>

    </tt><tt>Brick2: vng.proxmox.softlog:/tank/vmdata/datastore2</tt><tt><br>

    </tt><tt>Brick3: vna.proxmox.softlog:/tank/vmdata/datastore2</tt><tt><br>

    </tt><tt>Options Reconfigured:</tt><tt><br>

    </tt><tt>performance.readdir-ahead: on</tt><tt><br>

    </tt><tt>nfs.addr-namelookup: off</tt><tt><br>

    </tt><tt>nfs.enable-ino32: off</tt><tt><br>

    </tt><tt>features.shard: on</tt><tt><br>

    </tt><tt>cluster.quorum-type: auto</tt><tt><br>

    </tt><tt>cluster.server-quorum-type: server</tt><tt><br>

    </tt><tt>nfs.disable: on</tt><tt><br>

    </tt><tt>performance.write-behind: off</tt><tt><br>

    </tt><tt>performance.strict-write-ordering: on</tt><tt><br>

    </tt><tt>performance.stat-prefetch: off</tt><tt><br>

    </tt><tt>performance.quick-read: off</tt><tt><br>

    </tt><tt>performance.read-ahead: off</tt><tt><br>

    </tt><tt>performance.io-cache: off</tt><tt><br>

    </tt><tt>cluster.eager-lock: enable</tt><tt><br>

    </tt><tt>network.remote-dio: enable</tt><tt><br>

    </tt><br>

    <br>

    <br>

    I can do any testing required, bring back logs etc. Can&#39;t build

    gluster though.<br>

    <br>

    <br>

    thanks,<span class="HOEnZb"><font color="#888888"><br>

    <br>

    <br>

    <pre cols="72">-- 

Lindsay Mathieson

</pre>

  </font></span></div>

<br>_______________________________________________<br>

Gluster-users mailing list<br>

<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

<a href="http://www.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-users</a><br></blockquote></div><br></div>