<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Been running through my eternal testing regime ... and experimenting

    with removing/adding bricks - to me, a necessary part of volume

    maintenance for dealing with failed disks. The datastore is a VM

    host and all the following is done live. Sharding is active with a

    512MB shard size.<br>

    <br>

    So I started off with a replica 3 volume<br>

    <br>

    <blockquote><tt>// recreated from memory</tt><br>

      <tt>Volume Name: datastore1</tt><br>

      <tt>Type: Replicate</tt><br>

      <tt>Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b</tt><br>

      <tt>Status: Started</tt><br>

      <tt>Number of Bricks: 1 x 3 = 3</tt><br>

      <tt>Transport-type: tcp</tt><br>

      <tt>Bricks:</tt><br>

      <tt>Brick1: vnb.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>Brick2: vng.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>Brick3: vna.proxmox.softlog:/vmdata/datastore1</tt><br>

    </blockquote>

    <br>

    <br>

    I remove a brick with:<br>

    <br>

    <tt>gluster volume remove-brick datastore1 replica 2 

      vng.proxmox.softlog:/vmdata/datastore1 force</tt><br>

    <br>

    so we end up with:<br>

    <br>

    <blockquote><tt>Volume Name: datastore1</tt><br>

      <tt>Type: Replicate</tt><br>

      <tt>Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b</tt><br>

      <tt>Status: Started</tt><br>

      <tt>Number of Bricks: 1 x 2 = 2</tt><br>

      <tt>Transport-type: tcp</tt><br>

      <tt>Bricks:</tt><br>

      <tt>Brick1: vna.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>Brick2: vnb.proxmox.softlog:/vmdata/datastore1</tt><br>

    </blockquote>

    <br>

    <br>

    All well and good. No heal issues, VMs running OK.<br>

    <br>

    Then I clean the brick off the vng host:<br>

    <br>

    <tt>rm -rf /vmdata/datastore1<br>

      <br>

      <br>

    </tt>I then add the brick back with:<br>

    <br>

    <blockquote><tt>gluster volume add-brick datastore1 replica 3 

        vng.proxmox.softlog:/vmdata/datastore1 <br>

        <br>

        Volume Name: datastore1<br>

        Type: Replicate<br>

        Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b<br>

        Status: Started<br>

        Number of Bricks: 1 x 3 = 3<br>

        Transport-type: tcp<br>

        Bricks:<br>

        Brick1: vna.proxmox.softlog:/vmdata/datastore1<br>

        Brick2: vnb.proxmox.softlog:/vmdata/datastore1<br>

        Brick3: vng.proxmox.softlog:/vmdata/datastore1<br>

      </tt></blockquote>

    <tt><br>

      <br>

    </tt>This recreates the brick directory "datastore1". Unfortunately

    this is where things start to go wrong :( Heal info:<br>

    <br>

    <blockquote><tt>gluster volume heal datastore1 info</tt><br>

      <tt>Brick vna.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5 </tt><br>

      <tt>Number of entries: 2</tt><br>

      <br>

      <tt>Brick vnb.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57 </tt><br>

      <tt>Number of entries: 2</tt><br>

      <br>

      <tt>Brick vng.proxmox.softlog:/vmdata/datastore1</tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18 </tt><br>

      <tt>/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5 </tt><br>

    </blockquote>

    <br>

    It's my understanding that there shouldn't be any heal entries on vng,

    as that is where all the shards should be sent *to*.<br>
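    <br>

    A minimal sketch of how the heal info output above could be tallied per
    brick, to see which entries appear only on the re-added brick. It assumes
    the output has exactly the shape pasted above (a "Brick ..." header, one
    entry per line, then "Number of entries:"). The names
    <tt>parse_heal_info</tt>, <tt>entries_only_on</tt> and <tt>SAMPLE</tt> are
    hypothetical helpers for illustration, not part of gluster.<br>

    <pre>
```python
# Sample text copied from the heal info output in this post.
GFID = "/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4"
SAMPLE = "\n".join([
    "Brick vna.proxmox.softlog:/vmdata/datastore1",
    GFID + ".57 ",
    GFID + ".5 ",
    "Number of entries: 2",
    "",
    "Brick vnb.proxmox.softlog:/vmdata/datastore1",
    GFID + ".5 ",
    GFID + ".57 ",
    "Number of entries: 2",
    "",
    "Brick vng.proxmox.softlog:/vmdata/datastore1",
    GFID + ".1 ",
    GFID + ".6 ",
    GFID + ".15 ",
    GFID + ".18 ",
    GFID + ".5 ",
])


def parse_heal_info(text):
    """Map each brick name to the list of entries listed under it."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Start of a new per-brick section.
            current = line[len("Brick "):]
            bricks[current] = []
        elif not line or line.startswith(("Number of entries:", "Status:")):
            # Blank separators and summary lines are not heal entries.
            continue
        elif current is not None:
            bricks[current].append(line)
    return bricks


def entries_only_on(bricks, brick):
    """Entries listed under `brick` that no other brick reports."""
    others = set()
    for name, entries in bricks.items():
        if name != brick:
            others.update(entries)
    return sorted(e for e in bricks.get(brick, []) if e not in others)


if __name__ == "__main__":
    parsed = parse_heal_info(SAMPLE)
    for name, entries in parsed.items():
        print(name, len(entries))
    print(entries_only_on(parsed, "vng.proxmox.softlog:/vmdata/datastore1"))
```
    </pre>

    With the sample above, the last call lists the four shards (.1, .6, .15,
    .18) that only vng reports; .5 is reported by all three bricks.<br>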

    <br>

    Also, running qemu-img check on the hosted VM images results in an I/O

    error. Eventually the VMs themselves crash - I suspect this is due

    to individual shards being unreadable.<br>
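    <br>

    To relate an I/O error at a given image offset to one of the shards in
    the heal list, here is a rough sketch of the offset arithmetic. It
    assumes the usual sharding layout, in which the first block stays in the
    base file and <tt>/.shard/GFID.N</tt> holds block N, and it takes the
    512MB shard size to mean 512 MiB. The function names are hypothetical.<br>

    <pre>
```python
# Assumption: "512MB" in the volume options means 512 MiB.
SHARD_SIZE = 512 * 1024 * 1024


def shard_index(entry):
    """Return the numeric suffix N of a /.shard/GFID.N heal entry."""
    return int(entry.strip().rsplit(".", 1)[1])


def shard_byte_range(index, shard_size=SHARD_SIZE):
    """Half-open byte range [start, end) of the image held by shard N.

    Assumes block 0 lives in the base file and shard N holds block N.
    """
    start = index * shard_size
    return start, start + shard_size


def shard_for_offset(offset, shard_size=SHARD_SIZE):
    """Block index that contains the given byte offset (0 = base file)."""
    return offset // shard_size


if __name__ == "__main__":
    entry = "/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57 "
    n = shard_index(entry)
    print(n, shard_byte_range(n))
```
    </pre>

    An offset reported by a failing read can be passed to
    <tt>shard_for_offset</tt> and compared against the suffixes in the heal
    info output.<br>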

    <br>

    Another odd behaviour: if I run a full heal on vnb, I get the

    following error:<br>

    <br>

    <blockquote><tt>Launching heal operation to perform full self heal

        on volume datastore1 has been unsuccessful</tt><br>

    </blockquote>

    <br>

    However, if I run it on vna, it succeeds.<br>

    <br>

    <br>

    Lastly - if I remove the brick, everything returns to normal

    immediately. Heal info shows no issues and qemu-img check returns no

    errors.<br>

    <br>

    <br>

    <br>


    <pre class="moz-signature" cols="72">-- 

Lindsay Mathieson</pre>

  </body>

</html>