<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 24/04/2016 11:12 AM, Lindsay

      Mathieson wrote:<br>

    </div>

    <blockquote cite="mid:571C1D6A.50304@gmail.com" type="cite">Yesterday
      I stopped the volume and ran md5sum on all the shards to compare
      the 3 replicas. All 15 VM images were identical except for one
      (vm-307). It has 2048 shards, of which 8 differed.

      <br>

      <br>

      volume heal info lists <b class="moz-txt-star"><span
          class="moz-txt-tag">*</span>no<span class="moz-txt-tag">*</span></b>
      files needing healing.

      <br>

      <br>

      Two things concern me:

      <br>

      <br>

      1. How did this happen? Trust in Gluster either keeping replicas
      in sync or knowing when they are not is crucial.

      <br>

      <br>

      2. How do I force a heal of an individual file? I can find no
      documentation of this process, or even whether it is possible.

      <br>

      <br>

      I do have one possible solution: delete the VM image and restore
      from backup. Not ideal.

      <br>

      <br>

      <br>

      Notes:

      <br>

      - I did have a hard disk failure on a brick while testing. ZFS

      recovered it with no errors.

      <br>

      <br>

      - My testing was reasonably severe: server reboots and killing of
      the gluster processes, all things that will happen during a
      cluster's lifetime. I was pleased with how well Gluster handled them.

      <br>

    </blockquote>

    <br>

    <br>

    Duplicating from a separate message, here is how I resolved the immediate issue:<br>

    <br>

    I used diff3 to compare the checksums of the shards across the three
    bricks. It revealed that seven of the differing shards matched on two
    bricks (vna &amp; vng) and one matched on a different pair (vna &amp;
    vnb). Fortunately, none differed on all 3 bricks :)<br>
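    <br>
    The comparison above is effectively a 2-of-3 vote per shard. A minimal Python sketch of that vote, assuming you already have one md5 checksum per brick for a given shard (the function name majority_vote and the sample checksum strings are hypothetical, for illustration only):<br>

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(checksums: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Vote on one shard's copies (checksums must be non-empty).

    checksums maps a brick name to the md5 of that brick's copy of the shard.
    Returns (winner, outliers): winner is the checksum held by a strict
    majority of bricks, or None if there is no majority; outliers lists the
    bricks whose copy differs from the winner (every brick when there is no
    majority), sorted by name.
    """
    winner, votes = Counter(checksums.values()).most_common(1)[0]
    if votes * 2 <= len(checksums):
        # e.g. three different checksums: no safe way to pick a good copy
        return None, sorted(checksums)
    outliers = sorted(b for b, digest in checksums.items() if digest != winner)
    return winner, outliers


# One shard from the case above: vna and vng agree, vnb is the odd one out.
print(majority_vote({"vna": "a1", "vnb": "ff", "vng": "a1"}))  # prints ('a1', ['vnb'])
```

    A shard where all three checksums differ comes back with winner None, which is the case that would need restoring from backup rather than a heal.<br>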

    <br>

    Using the checksum as a quorum, I deleted all the singleton shards (7
    on vnb, 1 on vng), touched the owning file and issued a "heal full".
    All 8 shards were restored with checksums matching the other two
    bricks. A recheck of the entire set of shards for the VM showed all
    3 copies as identical, and the VM itself is functioning normally.<br>

    <br>

    It's one way to manually heal shard mismatches that Gluster
    hasn't detected, if somewhat tedious. It's a method that lends
    itself to automation, though.<br>
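    <br>
    As a rough idea of what that automation could look like, here is a dry-run Python sketch that checksums one image's shards on every brick, votes per shard and reports which copies would be deleted. It assumes the shard directory on each brick can be read locally (typically the brick's .shard directory, with shard files named by the image's GFID plus an index) and that the volume is stopped, as it was for the manual check. The brick names, the plan_shard_repairs helper and the prefix argument are hypothetical. It deletes nothing; removing the listed files, touching the owning file and running "gluster volume heal VOLNAME full" are left as manual steps:<br>

```python
import hashlib
import os
from collections import Counter, defaultdict
from typing import Dict, List


def md5_of(path: str, chunk: int = 1048576) -> str:
    """md5 of a file, read in 1 MiB chunks so large shards are not loaded at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def plan_shard_repairs(shard_dirs: Dict[str, str], prefix: str) -> Dict[str, List[str]]:
    """Dry run: return {"delete": [paths], "no_quorum": [shard names]}.

    shard_dirs maps a brick name to that brick's shard directory; prefix
    selects the shards of one image (hypothetically "GFID."). Nothing on
    disk is modified.
    """
    sums: Dict[str, Dict[str, str]] = defaultdict(dict)  # shard -> {brick: md5}
    for brick, directory in shard_dirs.items():
        for name in os.listdir(directory):
            if name.startswith(prefix):
                sums[name][brick] = md5_of(os.path.join(directory, name))

    plan: Dict[str, List[str]] = {"delete": [], "no_quorum": []}
    for name in sorted(sums):
        per_brick = sums[name]
        winner, votes = Counter(per_brick.values()).most_common(1)[0]
        # Require a strict majority of all bricks, not just of the bricks
        # that happen to hold a copy of this shard.
        if votes * 2 <= len(shard_dirs):
            plan["no_quorum"].append(name)
            continue
        # Copies that disagree with the majority are the singletons to remove;
        # a brick missing the shard entirely is left for the heal to refill.
        for brick in sorted(per_brick):
            if per_brick[brick] != winner:
                plan["delete"].append(os.path.join(shard_dirs[brick], name))
    return plan
```

    Calling plan_shard_repairs({"vna": ..., "vnb": ..., "vng": ...}, prefix) with the real shard directories would list the singleton copies (the 7 on vnb and 1 on vng in this case) under "delete" and any shard that differs on all bricks under "no_quorum". As I understand the sharding layout, the first block of an image lives in the base file itself rather than in .shard, so that file would need checking separately.<br>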

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Lindsay Mathieson</pre>

  </body>

</html>