<div class="moz-cite-prefix">Adding to my previous mail..<br>
I find a couple of strange errors in the rebalance log
(/var/log/glusterfs/sr_vol01-rebalance.log)<br>
e.g.:<br>

[2015-01-21 10:00:32.123999] E [afr-self-heal-entry.c:1135:afr_sh_entry_impunge_newfile_cbk] 0-sr_vol01-replicate-11: creation of /some/file/on/the/volume.data on sr_vol01-client-23 failed (No space left on device)

Why does the rebalance seemingly not take the space left on the
available disks into account?
This is the current situation on this particular node:

[root@gluster03 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  2.4G   45G   5% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             485M   95M  365M  21% /boot
/dev/sdb1             1.9T  577G  1.3T  31% /export/brick1gfs03
/dev/sdc1             1.9T  154G  1.7T   9% /export/brick2gfs03
/dev/sdd1             1.9T  413G  1.5T  23% /export/brick3gfs03
/dev/sde1             1.9T  1.5T  417G  78% /export/brick4gfs03
/dev/sdf1             1.9T  1.6T  286G  85% /export/brick5gfs03
/dev/sdg1             1.9T  1.4T  443G  77% /export/brick6gfs03
/dev/sdh1             1.9T   33M  1.9T   1% /export/brick7gfs03
/dev/sdi1             466G   62G  405G  14% /export/brick8gfs03
/dev/sdj1             466G  166G  301G  36% /export/brick9gfs03
/dev/sdk1             466G  466G   20K 100% /export/brick10gfs03
/dev/sdl1             466G  450G   16G  97% /export/brick11gfs03
/dev/sdm1             1.9T  206G  1.7T  12% /export/brick12gfs03
/dev/sdn1             1.9T  306G  1.6T  17% /export/brick13gfs03
/dev/sdo1             1.9T  107G  1.8T   6% /export/brick14gfs03
/dev/sdp1             1.9T  252G  1.6T  14% /export/brick15gfs03

Why are brick10 and brick11 over-utilised when there is plenty of
space on bricks 6, 14, etc.?
Does anyone have any idea?
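
As far as I understand, DHT only avoids nearly-full bricks for newly
created files when cluster.min-free-disk is set, and it does not touch
data already sitting on a replica pair, so these are roughly the checks
I am running (the 10% threshold below is just an example value I picked,
not an official recommendation):

# Show which options have been explicitly set on the volume:
gluster volume info sr_vol01

# Tell DHT to stop placing new files on bricks with less than 10% free:
gluster volume set sr_vol01 cluster.min-free-disk 10%

# Per-node progress of the ongoing rebalance:
gluster volume rebalance sr_vol01 status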

Cheers,
Olav

On 21/01/15 13:18, Olav Peeters wrote:

Hi,
two days ago I started a gluster volume remove-brick on a
Distributed-Replicate volume (21 x 2 bricks, spread over 3 nodes).

I wanted to remove 4 bricks per node which are smaller than the
others (on each node I have 7 x 2TB disks and 4 x 500GB disks).
I am still on gluster 3.5.2, and I was not aware that using disks
of different sizes is only supported as of 3.6.x (am I correct?).

I started with 2 paired disks, like so:

gluster volume remove-brick VOLNAME node03:/export/brick8node03 node02:/export/brick10node02 start

I followed the progress (which was very slow) with:

gluster volume remove-brick VOLNAME node03:/export/brick8node03 node02:/export/brick10node02 status

After a day, node03:/export/brick8node03 showed "completed"; the other
brick remained "in progress".

This morning several VMs with VDIs on the volume started showing disk
errors, and a couple of glusterfs mounts returned a "disk is full" type
of error on a volume which is currently only ca. 41% filled with data.

Via df -h I saw that most of the 500GB disks were indeed 100% full,
while others were nearly empty.
Gluster seems to have gone a bit nuts while rebalancing the data.

I did a:

gluster volume remove-brick VOLNAME node03:/export/brick8node03 node02:/export/brick10node02 stop

and a:

gluster volume rebalance VOLNAME start

Progress is again very slow, and some of the disks/bricks which were
at ca. 98% are now 100% full.
The situation seems to be getting worse in some cases and slowly
improving in others, e.g. one pair of bricks went from 100% to 97%.
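
For completeness, this is how I am keeping an eye on things; I am not
sure the per-node counters are telling me much:

# Per-node view of scanned/rebalanced/failed files:
gluster volume rebalance VOLNAME status

# Fill level of the individual bricks on this node:
df -h /export/brick*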

There clearly has been some data corruption. Some VMs no longer want
to boot and are throwing disk errors.
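
I am assuming, perhaps wrongly, that at least part of this should show
up as pending self-heals, so I am also checking:

# Files the self-heal daemon still considers out of sync:
gluster volume heal VOLNAME info

# Files that have ended up in split-brain, if any:
gluster volume heal VOLNAME info split-brain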

How do I proceed?
Wait a very long time for the rebalance to complete and hope that the
data corruption is automatically mended?

Or upgrade to 3.6.x, hope that the issues (which might be related to
me using bricks of different sizes) are resolved, and risk another
remove-brick operation?

Or should I rather do a:

gluster volume rebalance VOLNAME migrate-data start

Should I have done a replace-brick instead of a remove-brick operation
originally? I thought that replace-brick was becoming obsolete.
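
For what it is worth, the replace-brick variant I had in mind was along
these lines; the new brick path is just a placeholder in my naming
scheme, and I have not tested this:

# Swap an undersized brick for a new one in a single step and let
# self-heal copy the data over afterwards; "commit force" appears to be
# the only variant still recommended, the data-migration mode being on
# its way out:
gluster volume replace-brick VOLNAME node02:/export/brick10node02 node02:/export/newbrick1node02 commit force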

Thanks,
Olav