<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">It looks even worse than I had feared..

      :-(<br>

      This really is a crazy bug.<br>

      <br>

      If I understand you correctly, the only sane pairing of the xattrs

      is of the two 0-byte files, since this is the full list of bricks:<br>

      <br>

      [root@gluster01 ~]# gluster volume info<br>

      <br>

      Volume Name: sr_vol01<br>

      Type: Distributed-Replicate<br>

      Volume ID: c6d6147e-2d91-4d98-b8d9-ba05ec7e4ad6<br>

      Status: Started<br>

      Number of Bricks: 21 x 2 = 42<br>

      Transport-type: tcp<br>

      Bricks:<br>

      Brick1: gluster01:/export/brick1gfs01<br>

      Brick2: gluster02:/export/brick1gfs02<br>

      Brick3: gluster01:/export/brick4gfs01<br>

      Brick4: gluster03:/export/brick4gfs03<br>

      Brick5: gluster02:/export/brick4gfs02<br>

      Brick6: gluster03:/export/brick1gfs03<br>

      Brick7: gluster01:/export/brick2gfs01<br>

      Brick8: gluster02:/export/brick2gfs02<br>

      Brick9: gluster01:/export/brick5gfs01<br>

      Brick10: gluster03:/export/brick5gfs03<br>

      Brick11: gluster02:/export/brick5gfs02<br>

      Brick12: gluster03:/export/brick2gfs03<br>

      Brick13: gluster01:/export/brick3gfs01<br>

      Brick14: gluster02:/export/brick3gfs02<br>

      Brick15: gluster01:/export/brick6gfs01<br>

      Brick16: gluster03:/export/brick6gfs03<br>

      Brick17: gluster02:/export/brick6gfs02<br>

      Brick18: gluster03:/export/brick3gfs03<br>

      Brick19: gluster01:/export/brick8gfs01<br>

      Brick20: gluster02:/export/brick8gfs02<br>

      Brick21: gluster01:/export/brick9gfs01<br>

      Brick22: gluster02:/export/brick9gfs02<br>

      Brick23: gluster01:/export/brick10gfs01<br>

      Brick24: gluster03:/export/brick10gfs03<br>

      Brick25: gluster01:/export/brick11gfs01<br>

      Brick26: gluster03:/export/brick11gfs03<br>

      Brick27: gluster02:/export/brick10gfs02<br>

      Brick28: gluster03:/export/brick8gfs03<br>

      Brick29: gluster02:/export/brick11gfs02<br>

      Brick30: gluster03:/export/brick9gfs03<br>

      Brick31: gluster01:/export/brick12gfs01<br>

      Brick32: gluster02:/export/brick12gfs02<br>

      Brick33: gluster01:/export/brick13gfs01<br>

      Brick34: gluster02:/export/brick13gfs02<br>

      Brick35: gluster01:/export/brick14gfs01<br>

      Brick36: gluster03:/export/brick14gfs03<br>

      Brick37: gluster01:/export/brick15gfs01<br>

      Brick38: gluster03:/export/brick15gfs03<br>

      Brick39: gluster02:/export/brick14gfs02<br>

      Brick40: gluster03:/export/brick12gfs03<br>

      Brick41: gluster02:/export/brick15gfs02<br>

      Brick42: gluster03:/export/brick13gfs03<br>

      <br>

      <br>

      The two 0-byte files are on bricks 35 and 36, as the getfattr

      output correctly lists.<br>

      <br>
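      For what it is worth, here is a small Python sketch of how I read the
      brick numbering. It assumes that client-N in the xattr names is brick
      N+1 in the "gluster volume info" list, and that each consecutive pair
      of bricks forms one replica set (replica 2). The helper names are
      invented for illustration and are not part of any Gluster tool:<br>
      <pre>
```python
# Illustrative helpers only; the names are not part of any Gluster tool.
REPLICA = 2  # "Number of Bricks: 21 x 2 = 42" means a replica count of 2


def client_to_brick(client_index):
    """client-N in the trusted.afr xattr names is brick N+1 in volume info."""
    return client_index + 1


def replica_pair(brick_number):
    """Return the 1-based brick numbers that share a replica set."""
    first = (brick_number - 1) // REPLICA * REPLICA + 1
    return list(range(first, first + REPLICA))


for client in (32, 33, 34, 35, 40, 41):
    brick = client_to_brick(client)
    print("client-%d is Brick%d, replica set %s" % (client, brick, replica_pair(brick)))
```
</pre>
      With that reading, client-34 and client-35 are Brick35 and Brick36,
      which share one replica set.<br>
      <br>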

      Another sane pairing could be this (if the first file did not also

      refer to client-34 and client-35):<br>

      <br>

      [root@gluster01 ~]# getfattr -m . -d -e hex

/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

      getfattr: Removing leading '/' from absolute path names<br>

      # file:

export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000<br>

      trusted.afr.dirty=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-32=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-33=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-34=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-35=0x000000010000000100000000<br>

      trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

      <br>

      [root@gluster02 ~]# getfattr -m . -d -e hex

/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

      getfattr: Removing leading '/' from absolute path names<br>

      # file:

export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000<br>

      trusted.afr.dirty=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-32=0x000000000000000000000000<br>

      trusted.afr.sr_vol01-client-33=0x000000000000000000000000<br>

      trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

      <br>

      But why is the security.selinux value different?<br>

      <br>
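      Decoding the two security.selinux values shows that they are not
      hashes but hex-encoded SELinux labels with a trailing NUL byte. A
      minimal Python sketch (the function name is made up for illustration;
      the two strings are copied from the getfattr output above):<br>
      <pre>
```python
import binascii


def decode_hex_xattr(value):
    """Turn a getfattr '-e hex' value such as 0x7379... into text."""
    raw = binascii.unhexlify(value[2:])  # drop the leading "0x"
    return raw.rstrip(b"\x00").decode("ascii")


gluster01 = "0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000"
gluster02 = "0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000"
print(decode_hex_xattr(gluster01))
print(decode_hex_xattr(gluster02))
```
</pre>
      This prints unconfined_u:object_r:file_t:s0 for the copy on gluster01
      and system_u:object_r:file_t:s0 for the copy on gluster02, so the two
      labels differ only in the SELinux user field.<br>
      <br>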

      <br>

      You mention hostname changes..<br>

      I noticed that if I list the available shared storage on

      one of the XenServers, I get:<br>

      uuid ( RO)                : 272b2366-dfbf-ad47-2a0f-5d5cc40863e3<br>

                name-label ( RW): gluster_store<br>

          name-description ( RW): NFS SR

      [gluster01.irceline.be:/sr_vol01]<br>

                      host ( RO): &lt;shared&gt;<br>

                      type ( RO): nfs<br>

              content-type ( RO):<br>

      <br>

      <br>

      With the ordinary Linux mount command I get:<br>

      [root@same_story_on_both_xenserver ~]# mount<br>

      gluster02.irceline.be:/sr_vol01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3

      on /var/run/sr-mount/272b2366-dfbf-ad47-2a0f-5d5cc40863e3 type nfs

      (rw,soft,timeo=133,retrans=2147483647,tcp,noac,addr=192.168.0.72)<br>

      <br>

      Originally the mount was done on gluster01 (ip 192.168.0.71) as

      the name-description of the xe sr-list indicates..<br>

      It is as though when gluster01 was not available for a couple of

      minutes, the NFS mount internally was somehow automatically

      reconfigured to gluster02, but NFS cannot do this as far as I know

      (unless there is some fail-over mechanism - I never configured

      this). There also is no load-balancing between client and server.<br>

      If gluster01 is not available, the gluster volume should not have

      been available, end of story.. But from the perspective of a client

      the NFS mount could be to any one of the three gluster nodes. The client

      should see exactly the same data..<br>

      <br>

      So a rebalance in the current state could do more harm than good?<br>

      I launched a second rebalance in the hope that the system would

      mend itself after all...<br>

      <br>

      Thanks a million for your support in this darkest hour of my time

      as a glusterfs user :-)<br>

      <br>

      Cheers,<br>

      Olav<br>

      <pre class="moz-signature" cols="72">

</pre>

      On 20/02/15 23:10, Joe Julian wrote:<br>

    </div>

    <blockquote cite="mid:54E7B0CF.7010702@julianfamily.org" type="cite">

      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

      <br>

      <div class="moz-cite-prefix">On 02/20/2015 01:47 PM, Olav Peeters

        wrote:<br>

      </div>

      <blockquote cite="mid:54E7AB5B.3090207@gmail.com" type="cite">

        <meta content="text/html; charset=utf-8"

          http-equiv="Content-Type">

        <div class="moz-cite-prefix">Thanks Joe,<br>

          for the answers!<br>

          <br>

          I was not clear enough about the set up apparently.<br>

          The Gluster cluster consists of 3 nodes with 14 bricks each.

          The bricks are formatted as xfs, mounted locally as xfs. There

          is one volume, type: Distributed-Replicate (replica 2). The

          configuration is such that bricks are mirrored on two different

          nodes.<br>

          <br>

          The NFS mounts which were alive but not used during the reboot when

          the problem started are from clients (2 XenServer machines

          configured as a pool - a shared storage set-up). The

          comparisons I give below are between (other) clients mounting

          via either glusterfs or NFS. The problem is similar, with the

          exception that the first listing (via ls) after a fresh mount

          via NFS actually does find the files with data. A second

          listing only finds the 0-byte file with the same name.<br>

          <br>

          So all the 0-byte files in mode 0644 can be safely removed?<br>

        </div>

      </blockquote>

      Probably? Is it likely that you have any empty files? I don't

      know.<br>

      <blockquote cite="mid:54E7AB5B.3090207@gmail.com" type="cite">

        <div class="moz-cite-prefix"> <br>

          Why do I see three files with the same name (and modification

          timestamp etc.) via either a glusterfs or NFS mount from a

          client? Deleting one of the three will probably not solve the

          issue either.. this seems to me an indexing issue in the

          gluster cluster.<br>

        </div>

      </blockquote>

      Very good question. I don't know. The xattrs tell a strange story

      that I haven't seen before. One legit file shows

      sr_vol01-client-32 and 33. This would be normal, assuming the

      filename hash would put it on that replica pair (we can't tell

      since the rebalance has changed the hash map). Another file shows

      sr_vol01-client-32, 33, 34, and 35 with pending updates scheduled

      for 35. I have no idea which brick this is (see "gluster volume

      info" and map the digits (35) with the bricks offset by 1

      (client-35 is brick 36)). That last one is on 40,41. <br>

      <br>
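      To read the "pending updates" out of those xattrs: as I understand the
      AFR changelog format, each trusted.afr value is 12 bytes holding three
      32-bit big-endian counters for pending data, metadata and entry
      operations. A rough Python sketch under that assumption (the function
      name is only for illustration):<br>
      <pre>
```python
import struct


def decode_afr(value):
    """Split a 12-byte trusted.afr value into its three pending counters."""
    raw = bytes.fromhex(value[2:])  # drop the leading "0x"
    data, metadata, entry = struct.unpack("!III", raw)  # big-endian 32-bit
    return {"data": data, "metadata": metadata, "entry": entry}


print(decode_afr("0x000000010000000100000000"))
print(decode_afr("0x000000000000000000000000"))
```
</pre>
      For the client-35 value on brick13gfs01 this gives one pending data
      operation and one pending metadata operation, while the all-zero
      values mean nothing is pending.<br>
      <br>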

      I don't know how these files all got on different replica sets. My

      speculations include hostname changes, long-running net-split

      conditions with different dht maps (failed rebalances), moved

      bricks, load balancers between client and server, mercury in

      retrograde (lol)...<br>

      <br>

      <blockquote cite="mid:54E7AB5B.3090207@gmail.com" type="cite">

        <div class="moz-cite-prefix"> How do I get Gluster to replicate

          the files correctly, only 2 versions of the same file, not

          three, and on two bricks on different machines?<br>

          <br>

        </div>

      </blockquote>

      <br>

      Identify which replica is correct by using the little python

      script at <a moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://joejulian.name/blog/dht-misses-are-expensive/">http://joejulian.name/blog/dht-misses-are-expensive/</a>

      to get the hash of the filename. Examine the dht map to see which

      replica pair *should* have that hash and remove the others (and

      their hardlink in .glusterfs). There is no 1-liner that's going to

      do this. I would probably script the logic in python, have it

      print out what it was going to do, check that for sanity and, if

      sane, execute it.<br>

      <br>
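      As a starting point for the "print what it would do" part, here is a
      sketch that only builds the two paths involved for one copy: the file
      on the brick and its gfid hardlink under .glusterfs, which lives in
      two directory levels named after the first four hex digits of the
      gfid. It does not compute the dht hash and it deletes nothing. The
      function names are hypothetical, and the brick used in the example is
      just one taken from the listing, not a statement about which copy is
      the wrong one:<br>
      <pre>
```python
import os
import uuid


def gfid_hardlink(brick_root, gfid_hex):
    """Build the .glusterfs path for a trusted.gfid value like 0xaefd..."""
    gfid = str(uuid.UUID(gfid_hex[2:]))  # canonical dashed form
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def plan_removal(brick_root, relative_path, gfid_hex):
    """Return the two paths that would be removed; nothing is deleted here."""
    return [
        os.path.join(brick_root, relative_path),
        gfid_hardlink(brick_root, gfid_hex),
    ]


for path in plan_removal(
    "/export/brick14gfs01",
    "272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd",
    "0xaefd184508414a8f8408f1ab8aa7a417",
):
    print("would remove", path)
```
</pre>
      Reviewing that output per brick before replacing the print with an
      actual removal keeps the sanity check described above.<br>
      <br>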

      But mostly figure out how Bricks 32 and/or 33 can become 34 and/or

      35 and/or 40 and/or 41. That's the root of the whole problem.<br>

      <br>

      <blockquote cite="mid:54E7AB5B.3090207@gmail.com" type="cite">

        <div class="moz-cite-prefix"> Cheers,<br>

          Olav<br>

          <br>

          <br>

          <br>

          <br>

          On 20/02/15 21:51, Joe Julian wrote:<br>

        </div>

        <blockquote cite="mid:54E79E41.2010704@julianfamily.org"

          type="cite">

          <meta content="text/html; charset=utf-8"

            http-equiv="Content-Type">

          <br>

          <div class="moz-cite-prefix">On 02/20/2015 12:21 PM, Olav

            Peeters wrote:<br>

          </div>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <meta content="text/html; charset=utf-8"

              http-equiv="Content-Type">

            <div class="moz-cite-prefix">Let's take one file

              (3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd) as an

              example...<br>

              On the 3 nodes where all bricks are formatted as XFS and

              mounted in /export and

              272b2366-dfbf-ad47-2a0f-5d5cc40863e3 is the mounting point

              of a NFS shared storage connection from XenServer

              machines:<br>

            </div>

          </blockquote>

          Did I just read this correctly? Your bricks are NFS mounts?

          ie, GlusterFS Client &lt;-&gt; GlusterFS Server &lt;-&gt; NFS

          &lt;-&gt; XFS<br>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <div class="moz-cite-prefix"> <br>

              [root@gluster01 ~]# find

              /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name

              '300*' -exec ls -la {} \;<br>

              -rw-r--r--. 2 root root 44332659200 Feb 17 23:55

/export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

            </div>

          </blockquote>

          Supposedly, this is the actual file.<br>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <div class="moz-cite-prefix"> -rw-r--r--. 2 root root 0 Feb

              18 00:51

/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

            </div>

          </blockquote>

          This is not a linkfile. Note it's mode 0644. How it got there

          with those permissions would be a matter of history and would

          require information that's probably lost.<br>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <div class="moz-cite-prefix"> <br>

              [root@gluster02 ~]# find

              /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name

              '300*' -exec ls -la {} \;<br>

              -rw-r--r--. 2 root root 44332659200 Feb 17 23:55

/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              <br>

              [root@gluster03 ~]# find

              /export/*/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/ -name

              '300*' -exec ls -la {} \;<br>

              -rw-r--r--. 2 root root 44332659200 Feb 17 23:55

/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 2 root root 0 Feb 18 00:51

/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

            </div>

          </blockquote>

          Same analysis as above.<br>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <div class="moz-cite-prefix"> <br>

              3 files with information, 2 x a 0-byte file with the same

              name<br>

              <br>

              Checking the 0-byte files:<br>

              [root@gluster01 ~]# getfattr -m . -d -e hex

/export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              getfattr: Removing leading '/' from absolute path names<br>

              # file:

export/brick14gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000<br>

              trusted.afr.dirty=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-34=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-35=0x000000000000000000000000<br>

              trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

              <br>

              [root@gluster03 ~]# getfattr -m . -d -e hex

/export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              getfattr: Removing leading '/' from absolute path names<br>

              # file:

export/brick14gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000<br>

              trusted.afr.dirty=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-34=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-35=0x000000000000000000000000<br>

              trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

              <br>

              This is not a glusterfs link file since there is no

              "trusted.glusterfs.dht.linkto", am I correct? <br>

            </div>

          </blockquote>

          You are correct.<br>

          <blockquote cite="mid:54E7974B.5080607@gmail.com" type="cite">

            <div class="moz-cite-prefix"> <br>

              And checking the "good" files:<br>

              <br>

              # file:

export/brick13gfs01/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000<br>

              trusted.afr.dirty=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-32=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-33=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-34=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-35=0x000000010000000100000000<br>

              trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

              <br>

              [root@gluster02 ~]# getfattr -m . -d -e hex

/export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              getfattr: Removing leading '/' from absolute path names<br>

              # file:

export/brick13gfs02/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000<br>

              trusted.afr.dirty=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-32=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-33=0x000000000000000000000000<br>

              trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

              <br>

              [root@gluster03 ~]# getfattr -m . -d -e hex

/export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              getfattr: Removing leading '/' from absolute path names<br>

              # file:

export/brick13gfs03/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000<br>

              trusted.afr.dirty=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-40=0x000000000000000000000000<br>

              trusted.afr.sr_vol01-client-41=0x000000000000000000000000<br>

              trusted.gfid=0xaefd184508414a8f8408f1ab8aa7a417<br>

              <br>

              <br>

              <br>

              Seen from a client via a glusterfs mount:<br>

              [root@client ~]# ls -al

              /mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/glusterfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              <br>

              <br>

              <br>

              Via NFS (just after performing a umount and mount the

              volume again):<br>

              [root@client ~]# ls -al

              /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*<br>

              -rw-r--r--. 1 root root 44332659200 Feb 17 23:55

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 44332659200 Feb 17 23:55

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 44332659200 Feb 17 23:55

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              <br>

              Doing the same list a couple of seconds later:<br>

              [root@client ~]# ls -al

              /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              And again, and again, and again:<br>

              [root@client ~]# ls -al

              /mnt/nfs/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/300*<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              -rw-r--r--. 1 root root 0 Feb 18 00:51

/mnt/test/272b2366-dfbf-ad47-2a0f-5d5cc40863e3/3009f448-cf6e-413f-baec-c3b9f0cf9d72.vhd<br>

              <br>

              This really seems odd. Why do we get to see the "real data

              file" only once?<br>

              <br>

              It seems more and more that this crazy file duplication

              (and writing of sticky bit files) was actually triggered

              when rebooting one of the three nodes while there still was

              an active (even when there is no data exchange at all) NFS

              connection, since all 0-byte files (of the non sticky bit

              type) were either created at 00:51 or 00:41, the exact

              moment one of the three nodes in the cluster was

              rebooted. This would mean that replication currently with

              GlusterFS creates hardly any redundancy. Quite the

              opposite, if one of the machines goes down, all of your

              data seriously gets disorganised. I am busy configuring a

              test installation to see how this can be best reproduced

              for a bug report..<br>

              <br>

              Does anyone have a suggestion how to best get rid of the

              duplicates, or rather get this mess organised the way it

              should be?<br>

              This is a cluster with millions of files. A rebalance does

              not fix the issue, neither does a rebalance fix-layout

              help. Since this is a replicated volume all files should

              be there 2x, not 3x. Can I safely just remove all the

              0-byte files outside of the .glusterfs directory including

              the sticky bit files?<br>

              <br>

              The empty 0-byte files outside of .glusterfs on every brick

              can probably be safely removed like this:<br>

              find /export/* -path '*/.glusterfs' -prune -o -type f -size

              0 -perm 1000 -exec rm {} \;<br>

              right?<br>

              <br>
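              Before running anything with rm, a dry run that only lists
              the candidates may be safer. Below is a minimal Python sketch
              of the same selection: regular files of size 0 with mode
              exactly 1000, skipping every .glusterfs directory. The
              function name and the brick path in the example are
              assumptions for illustration. Note that it would not match
              the 0-byte files in mode 0644 shown earlier, only the sticky
              bit ones, and it does not check for the
              "trusted.glusterfs.dht.linkto" xattr:<br>
              <pre>
```python
import os
import stat


def find_sticky_empty_files(brick_root):
    """List zero-length regular files with mode exactly 1000, skipping .glusterfs."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into the .glusterfs tree (same idea as find -prune).
        if ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if not stat.S_ISREG(info.st_mode):
                continue
            if info.st_size == 0 and stat.S_IMODE(info.st_mode) == 0o1000:
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Example brick path taken from the listing; adjust per brick.
    for path in find_sticky_empty_files("/export/brick14gfs01"):
        print("would remove", path)
```
</pre>
              The printed list can then be checked by hand on each brick
              before anything is actually deleted.<br>
              <br>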

              Thanks!<br>

              <br>

              Cheers,<br>

              Olav<br>

              On 18/02/15 22:10, Olav Peeters wrote:<br>

            </div>

            <blockquote cite="mid:54E4FFAB.8040203@gmail.com"

              type="cite">

              <meta content="text/html; charset=utf-8"

                http-equiv="Content-Type">

              <div class="moz-cite-prefix">Thanks Tom and Joe,<br>

                for the fast response!<br>

                <br>

                Before I started my upgrade I stopped all clients using

                the volume and stopped all VMs with VHDs on the volume,

                but I guess, and this may be the missing thing to

                reproduce this in a lab, I did not detach a NFS shared

                storage mount from a XenServer pool to this volume,

                since this is an extremely risky business. I also did

                not stop the volume. This I guess was a bit stupid, but

                since I did upgrades in the past this way without any

                issues I skipped this step (a really bad habit). I'll

                make amends and file a proper bug report :-). I agree

                with you Joe, this should never happen, even when

                someone ignores the advice of stopping the volume. If it

                would also be necessary to detach shared storage NFS

                connections to a volume, then frankly, glusterfs is

                unusable in a private cloud. No one can afford downtime

                of the whole infrastructure just for a glusterfs

                upgrade. Ideally a replicated gluster volume should even

                be able to remain online and used during (at least a

                minor version) upgrade.<br>

                <br>

                I don't know whether a heal was maybe busy when I

                started the upgrade. I forgot to check. I did check the

                CPU activity on the gluster nodes, which was very low

                (in the 0.0X range via top), so I doubt it. I will add

                this to the bug report as a suggestion should they not

                be able to reproduce with an open NFS connection.<br>

                <br>

                By the way, is it sufficient to do:<br>

                service glusterd stop<br>

                service glusterfsd stop<br>

                and do a:<br>

                ps aux | grep gluster<br>

                to see if everything has stopped and kill any leftovers

                should this be necessary?<br>

                <br>

                For the fix, do you agree that if I run e.g.:<br>

                find /export/* -type f -size 0 -perm 1000 -exec /bin/rm

                {} \;<br>

                on every node if /export is the location of all my

                bricks, also in a replicated set-up, this will be safe?<br>

                No necessary 0-byte files will be deleted in e.g. the

                .glusterfs of every brick?<br>

                <br>

                Thanks for your support!<br>

                <br>

                Cheers,<br>

                Olav<br>

                <br>

                <br>

                <br>

                <br>

                <br>

                On 18/02/15 20:51, Joe Julian wrote:<br>

              </div>

              <blockquote cite="mid:54E4ED3D.2070707@julianfamily.org"

                type="cite">

                <meta content="text/html; charset=utf-8"

                  http-equiv="Content-Type">

                <br>

                <div class="moz-cite-prefix">On 02/18/2015 11:43 AM, <a

                    moz-do-not-send="true"

                    class="moz-txt-link-abbreviated"

                    href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>

                  wrote:<br>

                </div>

                <blockquote

cite="mid:20150218124308.b2b02683b6fce9ed61e10e2e9bfae354.a3d1725a9b.mailapi@email04.secureserver.net"

                  type="cite">

                  <div>Hi Olav,</div>

                  <div><br class="Apple-interchange-newline">

                    I have a hunch that our problem was caused by

                    improper unmounting of the gluster volume, and have

                    since found that the proper order should be: kill

                    all jobs using volume -&gt; unmount volume on

                    clients -&gt; gluster volume stop -&gt; stop gluster

                    service (if necessary)</div>

                  <div> </div>

                  <div>In my case, I wrote a Python script to find

                    duplicate files on the mounted volume, then delete

                    the corresponding link files on the bricks (making

                    sure to also delete files in the .glusterfs

                    directory)</div>

                  <div> </div>

                  <div>However, your find command was also suggested to

                    me and I think it's a simpler solution. I believe

                    removing all link files (even ones that are not

                    causing duplicates) is fine since on the next file

                    access gluster will do a lookup on all bricks and

                    recreate any link files if necessary. Hopefully a

                    gluster expert can chime in on this point as I'm not

                    completely sure.</div>

                </blockquote>

                <br>

                You are correct.<br>

                <br>

                <blockquote

cite="mid:20150218124308.b2b02683b6fce9ed61e10e2e9bfae354.a3d1725a9b.mailapi@email04.secureserver.net"

                  type="cite">

                  <div> </div>

                  <div>Keep in mind your setup is somewhat different

                    than mine as I have only 5 bricks with no

                    replication.</div>

                  <div> </div>

                  <div>Regards,</div>

                  <div>Tom</div>

                  <div> </div>

                  <blockquote class="threadBlockQuote"

                    style="border-left: 2px solid #C2C2C2; padding-left:

                    3px; margin-left: 4px;">--------- Original Message

                    ---------

                    <div>Subject: Re: [Gluster-users] Hundreds of

                      duplicate files<br>

                      From: "Olav Peeters" <a moz-do-not-send="true"

                        class="moz-txt-link-rfc2396E"

                        href="mailto:opeeters@gmail.com">&lt;opeeters@gmail.com&gt;</a><br>

                      Date: 2/18/15 10:52 am<br>

                      To: <a moz-do-not-send="true"

                        class="moz-txt-link-abbreviated"

                        href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>,

                      <a moz-do-not-send="true"

                        class="moz-txt-link-abbreviated"

                        href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a><br>

                      <br>

                      <div class="moz-cite-prefix">Hi all,<br>

                        I have this problem after upgrading from 3.5.3

                        to 3.6.2.<br>

                        At the moment I am still waiting for a heal to

                        finish (on a 31TB volume with 42 bricks,

                        replicated over three nodes).<br>

                        <br>

                        Tom,<br>

                        how did you remove the duplicates?<br>

                        with 42 bricks I will not be able to do this

                        manually..<br>

                        Did a:<br>

                        find $brick_root -type f -size 0 -perm 1000

                        -exec /bin/rm {} \;<br>

                        work for you?<br>

                        <br>

                        Should this type of thing ideally not be checked

                        and mended by a heal?<br>

                        <br>

                        Does anyone have an idea yet how this happens in

                        the first place? Can it be connected to

                        upgrading?<br>

                        <br>

                        Cheers,<br>

                        Olav<br>

                        <pre class="moz-signature"> </pre>

                        On 01/01/15 03:07, <a moz-do-not-send="true"

                          class="moz-txt-link-abbreviated"

                          href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>

                        wrote:</div>

                      <blockquote

cite="mid:20141231190720.b2b02683b6fce9ed61e10e2e9bfae354.1100adfcd4.mailapi@email04.secureserver.net">

                        <div>No, the files can be read on a newly

                          mounted client! I went ahead and deleted all

                          of the link files associated with these

                          duplicates, and then remounted the volume. The

                          problem is fixed!</div>

                        <div>Thanks again for the help, Joe and Vijay.</div>

                        <div> </div>

                        <div>Tom</div>

                        <div> </div>

                        <blockquote class="threadBlockQuote"

                          style="border-left: 2px solid #C2C2C2;

                          padding-left: 3px; margin-left: 4px;">---------

                          Original Message ---------

                          <div>Subject: Re: [Gluster-users] Hundreds of

                            duplicate files<br>

                            From: "Vijay Bellur" <a

                              moz-do-not-send="true"

                              class="moz-txt-link-rfc2396E"

                              href="mailto:vbellur@redhat.com">&lt;vbellur@redhat.com&gt;</a><br>

                            Date: 12/28/14 3:23 am<br>

                            To: <a moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>,

                            <a moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>

                            <br>

                            On 12/28/2014 01:20 PM, <a

                              moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>

                            wrote:<br>

                            &gt; Hi Vijay,<br>

                            &gt; Yes the files are still readable from

                            the .glusterfs path.<br>

                            &gt; There is no explicit error. However,

                            trying to read a text file in<br>

                            &gt; python simply gives me null characters:<br>

                            &gt;<br>

                            &gt; &gt;&gt;&gt;

                            open('ott_mf_itab').readlines()<br>

                            &gt;

['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']<br>

                            &gt;<br>

                            &gt; And reading binary files does the same<br>

                            &gt;<br>

                            <br>

                            Is this behavior seen with a freshly mounted

                            client too?<br>

                            <br>

                            -Vijay<br>

                            <br>

                            &gt; --------- Original Message ---------<br>

                            &gt; Subject: Re: [Gluster-users] Hundreds

                            of duplicate files<br>

                            &gt; From: "Vijay Bellur" <a

                              moz-do-not-send="true"

                              class="moz-txt-link-rfc2396E"

                              href="mailto:vbellur@redhat.com">&lt;vbellur@redhat.com&gt;</a><br>

                            &gt; Date: 12/27/14 9:57 pm<br>

                            &gt; To: <a moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>,

                            <a moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>

                            &gt;<br>

                            &gt; On 12/28/2014 10:13 AM, <a

                              moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:tbenzvi@3vgeomatics.com">tbenzvi@3vgeomatics.com</a>

                            wrote:<br>

                            &gt; &gt; Thanks Joe, I've read your blog

                            post as well as your post<br>

                            &gt; regarding the<br>

                            &gt; &gt; .glusterfs directory.<br>

                            &gt; &gt; I found some unneeded duplicate

                            files which were not being read<br>

                            &gt; &gt; properly. I then deleted the link

                            file from the brick. This always<br>

                            &gt; &gt; removes the duplicate file from

                            the listing, but the file does not<br>

                            &gt; &gt; always become readable. If I also

                            delete the associated file in the<br>

                            &gt; &gt; .glusterfs directory on that

                            brick, then some more files become<br>

                            &gt; &gt; readable. However this solution

                            still doesn't work for all files.<br>

                            &gt; &gt; I know the file on the brick is

                            not corrupt as it can be read<br>

                            &gt; directly<br>

                            &gt; &gt; from the brick directory.<br>

                            &gt;<br>

                            &gt; For files that are not readable from

                            the client, can you check if the<br>

                            &gt; file is readable from the .glusterfs/

                            path?<br>

                            &gt;<br>

                            &gt; What is the specific error that is seen

                            while trying to read one such<br>

                            &gt; file from the client?<br>

                            &gt;<br>

                            &gt; Thanks,<br>

                            &gt; Vijay<br>

                            &gt;<br>

                            &gt;<br>

                            &gt;<br>

                            &gt;

                            _______________________________________________<br>

                            &gt; Gluster-users mailing list<br>

                            &gt; <a moz-do-not-send="true"

                              class="moz-txt-link-abbreviated"

                              href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

                            &gt; <a moz-do-not-send="true"

                              class="moz-txt-link-freetext"

                              href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

                            &gt;</div>

                        </blockquote>

                        <br>

                        <fieldset class="mimeAttachmentHeader"></fieldset>

                        <br>

                        <pre>_______________________________________________

Gluster-users mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>

                      </blockquote>

                    </div>

                  </blockquote>

                  <br>


                </blockquote>

                <br>

                <br>


              </blockquote>

              <br>

            </blockquote>

            <br>

          </blockquote>

          <br>

        </blockquote>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>