<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 16.10.2015 at 18:51, Vijay

      Bellur wrote:<br>

    </div>

    <blockquote cite="mid:56212B1C.10909@redhat.com" type="cite"><br>

      self-healing in gluster by default syncs only modified parts of

      the files from a source node. Gluster does a rolling checksum of a

      file needing self-heal to identify regions of the file which need

      to be synced over the network. This rolling checksum computation

      can sometimes be expensive and there are plans to have a lighter

      self-healing in 3.8 with more granular changelogs that can do away

      with the need to do a rolling checksum.

      <br>

    </blockquote>

    <br>

    I did some tests (see below) - could you please check this and tell

    me if this is normal?<br>

    <br>

    <br>

    For example, I have a 200GB VM disk image in my volume (the biggest

    file). About 75% of that disk is currently unused space and writes

    are only about 50 kbytes/sec. <br>

    That 200GB disk image <i>always</i> takes a very long time to "heal"

    (at least 30 minutes) - even when I'm pretty sure only a few blocks

    could have changed.<br>

    <br>

    <br>

    Anyway, I just rebooted a node (about 2-3 minutes downtime) to

    collect some information:<br>

    <ul>

      <li>In total I have about 790GB* of files in that Gluster volume <br>

      </li>

      <li>about 411GB* belong to active VM HDD images, the rest are

        backup/template files</li>

      <li>only VM HDD images are being healed (max 15 files)</li>

      <li>while healing, <tt>glusterfsd </tt>showed CPU usage varying

        between 70% and 650% (it's a 16-core server); 106 minutes of

        CPU time in total once healing completed<br>

      </li>

      <li>by the time healing completed, the machine had received a total

        of 7.0 GB and sent 3.6 GB over the internal network (so, yes,

        you're right that not all contents are transferred)</li>

      <li><b>total heal time: a whopping 58 minutes</b><br>

      </li>

    </ul>

    <i>* these are summed-up file sizes; the "du" and "df" commands show

      smaller usage<br>

      <br>

    </i>Node details (all 3 nodes are identical):<i><br>

    </i>

    <ul>

      <li>DELL PowerEdge R730</li>

      <li>Intel Xeon E5-2600 @ 2.4GHz</li>

      <li>64 GB DDR4 RAM</li>

      <li>the server is able to gzip-compress about 1 GB of data per second

        (all cores together)<br>

      </li>

      <li>3 TB HW-RAID10 HDD  (2.7TB reserved for Gluster); minimum 500

        MB/s write speed, 350 MB/s read speed</li>

      <li>redundant 1 Gbit/s internal network</li>

      <li>Debian 7 Wheezy / Proxmox 3.4, Kernel 2.6.32, Gluster 3.5.2<br>

      </li>

    </ul>
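    <br>

    As a rough sanity check, the heal figures above can be turned into
    rates. This is only a back-of-the-envelope sketch in Python, not a
    measurement: it assumes decimal units (1 GB = 1000 MB) and that the
    heal had to checksum all 411 GB of active VM images on this node. The
    function name <tt>heal_summary</tt> is made up for illustration.<br>

    <pre>
```python
# Back-of-the-envelope rates derived from the heal test figures.
# Assumptions (not confirmed by Gluster itself):
#   - decimal units, 1 GB = 1000 MB
#   - the heal checksummed all 411 GB of active VM images on this node

HEAL_MINUTES = 58    # total heal time (wall clock)
CPU_MINUTES = 106    # glusterfsd CPU time accumulated during the heal
SCANNED_GB = 411     # summed file sizes of the active VM HDD images
RECEIVED_GB = 7.0    # received over the internal network
SENT_GB = 3.6        # sent over the internal network


def heal_summary():
    """Return derived rates for the heal run as a dict."""
    heal_seconds = HEAL_MINUTES * 60
    return {
        # MB of image data covered per second of wall-clock time
        "scan_rate_mb_s": SCANNED_GB * 1000 / heal_seconds,
        # average number of fully busy cores (CPU time / wall time)
        "avg_busy_cores": CPU_MINUTES / HEAL_MINUTES,
        # share of the image data that crossed the network, in percent
        "transferred_pct": (RECEIVED_GB + SENT_GB) / SCANNED_GB * 100,
    }


if __name__ == "__main__":
    for name, value in heal_summary().items():
        print(f"{name}: {value:.2f}")
```
    </pre>

    Under those assumptions this works out to roughly 118 MB/s of
    checksummed data (well below the 350 MB/s read speed listed above),
    an average of about 1.8 busy cores, and only about 2.6% of the image
    data crossing the network.<br>

    <br>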

    Volume setup:<i><br>

    </i>

    <blockquote><tt> # gluster volume info systems</tt><br>

      <br>

      <tt>Volume Name: systems</tt><br>

      <tt>Type: Replicate</tt><br>

      <tt>Volume ID: b2d72784-4b0e-4f7b-b858-4ec59979a064</tt><br>

      <tt>Status: Started</tt><br>

      <tt>Number of Bricks: 1 x 3 = 3</tt><br>

      <tt>Transport-type: tcp</tt><br>

      <tt>Bricks:</tt><br>

      <tt>Brick1: metal1:/data/gluster/systems</tt><br>

      <tt>Brick2: metal2:/data/gluster/systems</tt><br>

      <tt>Brick3: metal3:/data/gluster/systems</tt><br>

      <tt>Options Reconfigured:</tt><br>

      <tt>cluster.server-quorum-ratio: 51%</tt><br>

    </blockquote>

    <i>Note that `</i><i><tt>gluster volume heal "systems" info</tt></i><i>`

      takes 3-10 seconds to complete during heal - I hope that doesn't

      slow down healing since I tend to run that command frequently.</i><br>

    <br>

    <br>

    Would you expect these results or is something wrong?<br>

    <br>

    Would upgrading to Gluster 3.6 or 3.7 improve healing performance?<br>

    <br>

    Thanks,<br>

    Udo<br>

    <br>

  </body>

</html>