<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi,<br>
    <br>
    This looks like a GFID conflict on the Slave: a file with the same
    name but a different GFID still exists on the Slave, possibly
    because an unlink failed or because of some other failure.<br>
    We need to identify the cause of the GFID conflict. Please share the
    workload details, or share the changelogs from the brick backend
    (/data/media/.glusterfs/changelogs).<br>
    <br>
    The "ENTRY FAILED" line reports a "file exists" error (errno 17, EEXIST), but with a different GFID:<br>
    <pre wrap="">[2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 33, 'gfid':
'31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 'entry':
'.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg',
'op': 'CREATE'}, <b>17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7'</b>)
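
A minimal sketch (not gsyncd code) for pulling the two GFIDs out of such a
line. It assumes the text after "ENTRY FAILED: " is a Python tuple literal
of (entry details, errno, GFID already present on the Slave), as in the line
above, and that each log entry sits on one line in the log file.
parse_entry_failed is a hypothetical helper name.

```python
import ast
import errno
import os

MARKER = "ENTRY FAILED: "


def parse_entry_failed(line):
    """Return a dict describing one ENTRY FAILED line, or None if the marker is absent."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return None
    # Assumption: the payload is a Python tuple literal of
    # (entry details, errno, GFID already present on the Slave).
    entry, err, slave_gfid = ast.literal_eval(payload.strip())
    return {
        "op": entry["op"],
        "name": os.path.basename(entry["entry"]),
        "errno": errno.errorcode.get(err, str(err)),
        "master_gfid": entry["gfid"],
        "slave_gfid": slave_gfid,
    }


# The log line quoted above, joined back into a single line.
sample = (
    "[2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures] "
    "_GMaster: ENTRY FAILED: ({'uid': 33, 'gfid': "
    "'31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, "
    "'entry': '.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/"
    "official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg', "
    "'op': 'CREATE'}, 17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7')"
)
info = parse_entry_failed(sample)
print(info["name"], info["errno"], info["master_gfid"], info["slave_gfid"])
```

For the line above, errno 17 maps to EEXIST, and the master GFID
(31d66429-...) differs from the one reported for the Slave (df0e67f5-...),
which is the conflict.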

There also appear to be split-brain issues on the Slave. Refer to this document to resolve them:
</pre>
<a class="moz-txt-link-freetext" href="https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md">https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md</a><br>
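    <br>
    On getting a list of the failed items: a rough sketch, assuming each
    "SKIPPED GFID =" entry sits on a single line in the master log (the
    mail client wrapped them below). It collects the GFIDs and builds the
    path of each GFID handle under the brick's .glusterfs directory
    (first two hex characters, next two, then the full GFID). The function
    names are hypothetical, and this only lists candidates to inspect; it
    does not retry anything.<br>
    <pre wrap="">
```python
import os
import re

GFID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
MARKER = "SKIPPED GFID ="


def skipped_gfids(lines):
    """Return the unique GFIDs from 'SKIPPED GFID =' entries, in first-seen order."""
    seen = {}
    for line in lines:
        _, found, payload = line.partition(MARKER)
        if found:
            for gfid in GFID_RE.findall(payload):
                seen.setdefault(gfid, None)
    return list(seen)


def backend_path(brick, gfid):
    """Path of the GFID handle under the brick's .glusterfs directory."""
    return os.path.join(brick, ".glusterfs", gfid[:2], gfid[2:4], gfid)


# One of the shorter entries from the quoted log, joined into a single line.
sample = (
    "[2015-11-20 11:41:35.907850] W [master(/data/media):1014:process] _GMaster: "
    "SKIPPED GFID = 03069c7f-8eaa-45b0-92ed-50cb648cd912,"
    "788f5ed1-923e-4b86-9696-2a6de07ebb2e,43d12b40-b6e2-43c4-8883-85e89dc81321"
)
gfids = skipped_gfids([sample])
for gfid in gfids:
    print(backend_path("/data/media", gfid))
```
</pre>
    Here /data/media is the master brick from the status output; in
    practice the lines would come from the geo-replication log file rather
    than a string.<br>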
    <pre class="moz-signature" cols="72">regards
Aravinda</pre>
    <div class="moz-cite-prefix">On 11/25/2015 03:08 AM, Audrius
      Butkevicius wrote:<br>
    </div>
    <blockquote
cite="mid:CAJ-B-StrqBFHtQA-i1pnmQ9nuC_LT+P2sbhxx_WKb+=fiw-qKg@mail.gmail.com"
      type="cite">
      <pre wrap="">So the version of rsync is 3.1.0, but the bug mentioned only applies to
large files, whereas in my case the files are smaller than 1 MB.

I've started digging through the logs and found a bunch of these on the
slave:

[2015-11-20 11:40:46.730805] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 1882288: /.gfid/31d66429-c700-4a10-bb32-35e1b36a479f =&gt;
-1 (Operation not permitted)
[2015-11-20 12:39:59.269844] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 1918306: /.gfid/6802a0c6-1f62-4213-a70d-7b46d9ff8f3a =&gt;
-1 (Operation not permitted)

So something funky was happening for an hour 4 days ago. Given the volume
is on EBS, maybe there was some glitch there.

I can also find the corresponding failures on the master:

[2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures]
_GMaster: ENTRY FAILED: ({'uid': 33, 'gfid':
'31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 'entry':
'.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg',
'op': 'CREATE'}, 17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7')
[2015-11-20 11:40:14.265054] W [master(/data/media):803:log_failures]
_GMaster: META FAILED: ({'go':
'.gfid/31d66429-c700-4a10-bb32-35e1b36a479f', 'stat': {'atime':
1448019600.232466, 'gid': 33, 'mtime': 1448019600.316466, 'mode': 33279,
'uid': 33}, 'op': 'META'}, 2)

If I grep for SKIPPED GFID I get the following:

[2015-11-20 11:40:40.704817] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
192632af-28c5-4e03-a62d-458fe7f3b5f9,7ea8d7a8-524b-4dd0-b97a-dc7d3481f341,204f6112-0e8d-4f6d-855b-bf10f9c63b62,7e626e8f-edad-4f39-a6c6-547a1da34aa1,1f0d0208-1962-4eb1-91d4-cf7ed297d8e3,95d389c4-3258-4ca0-8fc4-26b8427b1eaf,425cedc6-6343-4326-8540-996d2d56dc9c,5955928b-2b8f-4cc9-a336-3eac4382789b,8932efcd-ba90-46ec-84c8-5e9e51cc84e9,2530275d-5f03-4143-9abf-d07cc79bf80a,73574466-86f3-4ab2-b5da-c31ac28c27c1,776e5e8f-5c6a-46b1-ad54-733e157d2097,008a69f3-217c-4dbc-a469-5a5bc8ecd589,dca8d8d9-03cf-4793-92e4-bfcfddd262f6,c85b7a29-73af-4f44-a07e-a44082d7a93a,6c1f56d6-4ea6-4910-9677-ea33edd35d28,0ea56588-87fa-4355-9403-e311525454fc,c8ce76c9-e21d-46ce-a2b5-14dfd0070f64,db9e6484-0e5e-4f6e-815b-3c2b273deee5,35d10752-43b5-4398-be5f-17cb9de73a6b,396e5faf-74a1-4849-97e3-009dbfb22836,d148e7d5-c2f3-4d06-8cd6-8588e6aac196,404d20c5-1c6c-4aad-98be-2c23930173b3,f1fae11c-db8e-4cd5-8e47-a3870316f89c,d8daa413-e57f-44fb-b907-b1a497f2dcfa,5f6ee8c2-84fb-432e-95cd-e428ab256e83,6bf54dcd-c3b4-4187-a390-eca841e46570,335c07ca-d339-4d3a-aa88-3b5753d24fbf,8fdbac00-6628-4f22-8fb4-b7a6524cae49,31d66429-c700-4a10-bb32-35e1b36a479f
[2015-11-20 11:41:35.907850] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
03069c7f-8eaa-45b0-92ed-50cb648cd912,788f5ed1-923e-4b86-9696-2a6de07ebb2e,43d12b40-b6e2-43c4-8883-85e89dc81321
[2015-11-20 12:11:55.492068] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
eb02369f-7ca8-480a-b00c-768964410ed8,17045ac9-27dd-4bf9-9f90-d7b146070dd5,265e3d9c-1657-45cb-bbf6-db439eb18ccf,553c420f-b3cc-47f2-8d5f-cfc2ffdd1a92
[2015-11-20 12:12:53.372432] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
66c5878e-8c00-4f7d-a3ad-4adec84a5e22,f4dc086d-9c2b-449c-9e31-bbae9ebcdea7,f99317b2-72e8-49e3-b676-647abad508b1
[2015-11-20 12:37:55.773813] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
4af54f1c-e8e1-4915-9328-a458d5d35d5d,acbe1f12-87e8-4192-b864-d90030269bba,7d27a795-da63-4742-9e91-abd8fa543612,8d4e642d-fd40-44d6-8419-8d3459df7ce3
[2015-11-20 12:39:28.852575] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
d90dc121-02e7-4a79-bc03-1bd8fddd9f48,54bb563f-ab44-4e91-a46b-764a122ce7fa,088141de-7545-40f9-b776-751738a89740,2dab3faf-4a6c-407a-88cd-cddef6f55299,d887806f-23b4-4389-a4dc-f9027702a2df,fc5a9bc8-ea62-4677-baed-16510541373a,33136ad2-c5b4-448c-991d-1e72fefef021,cf3e2675-e41b-4782-9478-91773eb0a4aa,6412d878-e0f1-4700-84df-05f4af35962f,ec3cf6e1-7f27-4650-b978-8a5a7f620389,d3651bb9-cd2d-4c5f-93e6-fe4fb1cdf5db,ecb0415e-1524-40f4-870e-1fd0f8371b1d,a118aaae-bd3e-4b19-a0e0-891aa9edb09a,7642d3f3-f1e5-4aca-bcfe-bdb3c44779a9,2e29f3f8-c460-48eb-9db5-b281b67cc2bf,e61db54b-3979-488a-8789-a5d0615c5a97,4212d840-9c22-4d9e-b61b-5e35271dfe80,dad1c60b-9da6-4e57-b014-daa1aca73ce3,93699a3d-40b8-4bbd-b78f-aabf965df57f,4fad7468-91f2-4deb-aaf7-6401068c9e6d,c9738295-46cc-4fe7-b359-dc94f5815ce9,91853c5c-4877-4c9e-9481-c86368942f78,59deed8e-d3d0-4ab7-854e-53a8dd455de0,20b86c13-7df1-4d13-bac1-7d628a00d6ce,b7b86a2d-7963-41a4-a423-14e25d1e78c4,3c17d7fe-bb7f-489c-a525-5c8b7bb93c3e,e230d207-7c68-4983-a958-f2dcfc1ce694,fa8bf3c0-abae-446c-83c5-45ef8bcaa4b8,14089102-8106-45d9-a3f1-d1446b568f4e,6802a0c6-1f62-4213-a70d-7b46d9ff8f3a,0a253bbc-ef98-4da0-951f-e17c5a7f5858,ef054b76-986b-4a89-b8e6-b4988221aaa2,48c0a153-708c-44ee-b186-cf255936a02b,fa2646a6-807c-4e9d-8f2b-a9cdf2674e0c,1ed4a563-4f6a-4b5a-9866-89025fe7afd5,0f293cf7-bc32-4f8a-87d5-388a4bffb4af,f4126726-667b-451d-8214-a18bb3f468cd,e23dc8b3-da1c-4d18-aec9-22e0aa174d81,40b9f10d-7304-4c0b-8498-bef23b305d03,15c25d1e-2a62-495e-887f-14d0cb0527b1,67371804-9084-4801-b664-44e88bea8ac3,4750fa3f-d1a4-4472-b10d-3f75d0b451dc
[2015-11-23 09:18:10.43391] W [master(/data/media):1014:process] _GMaster:
SKIPPED GFID =
228843f3-62f0-4687-b5eb-6d1e21257ad0,b0078359-fbf0-4709-8f40-8383a11d7875,60cff4d5-8b5d-4f7f-8bc1-27081a011458,bedb6ac4-208d-47e1-812c-5547c84ab841,da6810d9-4883-45e1-b73e-55a7ff17b5e7,e03b5c03-b25c-49ba-86f0-8a709a9c2658,053673a0-c1cc-4057-83fa-f97740cb5d4f,dbd6ea84-8f24-4a47-ac41-22c3fd788ecf,43caa3e7-ca04-47ab-b950-105606b313a4,62d8b1d0-fc89-4fb1-a41a-957dcb34d325,4e8fe1fa-60cd-47fa-bad6-f617c312f53b,6c3d6cf3-62ae-4ab8-9dc3-7815552401fe,f79be814-7e78-4985-bcdd-688da23d1808,c4186455-0f06-4b5d-89be-3c5ccbdeb6f0,f9c4ccdb-2337-479d-845d-ee4d85b69ece,bcd14726-1bab-4d97-8915-ec8bbe8faf8c,cca82341-a430-4a59-a900-1af66dcf7bb8,b7043a8e-4286-4831-91ec-c146e40bc6be,995ffeb6-a906-4078-88c6-404a2b38aad4,227f9987-5057-4133-848a-2b22aca5dde1,90b35242-32db-4570-8070-cf9dd49322a5,c6863c8f-1914-4a2d-814b-6e5853134faf,e2d19b1a-fc07-441c-b110-ca816b46fc40,9a3d0c0b-7d84-416f-9f3e-21b32a11ba1d,d8163f6b-8c40-418c-9c06-b3743af24e4e,522d7247-a75b-4af9-acb2-52a99eeced89,4b56ea9d-413a-4e24-b44e-433f7603ad6d

There are also the following lines on the master, which might have some
impact:

E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done]
0-media-replicate-0: Failing READ on gfid
abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. [Input/output
error]

E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done]
0-media-replicate-0: Failing GETXATTR on gfid
abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. [Input/output
error]

E [mem-pool.c:417:mem_get0]
(--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x809a2) [0x7f79e436b9a2]
--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f)
[0x7f79e430cb1f]
--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81)
[0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]

E [mem-pool.c:417:mem_get0]
(--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(recursive_rmdir+0x192)
[0x7f79e4329b32]
--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f)
[0x7f79e430cb1f]
--&gt;/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81)
[0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]

E [resource(/data/media):222:errlog] Popen: command "ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-dpY5cI/8216bb7da58a00926f369bb7ac8c7e03.sock
<a class="moz-txt-link-abbreviated" href="mailto:root@us-west-gluster.server.com">root@us-west-gluster.server.com</a> /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd
--session-owner 6922055e-49a1-4afd-a3a0-a47960d6ba54 -N --listen --timeout
120 gluster://localhost:media" returned with 143, saying:
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.772896] I [cli.c:721:main] 0-cli: Started running
/usr/sbin/gluster with version 3.7.5
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.772955] I [cli.c:608:cli_rpc_init] 0-cli: Connecting to remote
glusterd at localhost
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.871930] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.872018] I [socket.c:2355:socket_event_handler] 0-transport:
disconnecting now
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.872898] I [cli-rpc-ops.c:6348:gf_cli_getwd_cbk] 0-cli: Received
resp to getwd
E [resource(/data/media):226:logerr] Popen: ssh&gt; [2015-11-18
21:57:19.872963] I [input.c:36:cli_batch] 0-: Exiting with: 0

Status detail shows the following:

root@eu-gluster-1:/var/log/glusterfs/geo-replication/media# gluster volume
geo-replication media <a class="moz-txt-link-abbreviated" href="mailto:root@us-west-gluster.websitewebsitewebs.com::media">root@us-west-gluster.websitewebsitewebs.com::media</a>
status detail

MASTER NODE                            MASTER VOL    MASTER BRICK    SLAVE
USER    SLAVE                                            SLAVE NODE
                       STATUS     CRAWL STATUS       LAST_SYNCED
 ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT
COMPLETED    CHECKPOINT COMPLETION TIME
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
eu-gluster-1.websitewebsitewebs.com    media         /data/media     root
       us-west-gluster.websitewebsitewebs.com::media
us-west-gluster.websitewebsitewebs.com    Active     Changelog Crawl
 2015-11-24 20:59:25    0        0       0       633         N/A
     N/A                     N/A
eu-gluster-2.websitewebsitewebs.com    media         /data/media     root
       us-west-gluster.websitewebsitewebs.com::media
us-west-gluster.websitewebsitewebs.com    Passive    N/A                N/A
                   N/A      N/A     N/A     N/A         N/A
 N/A                     N/A




What is the right way to retry failed items?
Can I get a list of them somehow, so that I could touch them in the hope of
fixing this?
Why does it not retry the items automatically?


On Tue, Nov 24, 2015 at 6:11 AM, Venky Shankar <a class="moz-txt-link-rfc2396E" href="mailto:vshankar@redhat.com">&lt;vshankar@redhat.com&gt;</a> wrote:

</pre>
      <blockquote type="cite">
        <pre wrap="">On Tue, Nov 24, 2015 at 1:23 AM, Audrius Butkevicius
<a class="moz-txt-link-rfc2396E" href="mailto:audrius.butkevicius@gmail.com">&lt;audrius.butkevicius@gmail.com&gt;</a> wrote:
</pre>
        <blockquote type="cite">
          <pre wrap="">Hi,

I've got a geo-replicated gluster volume, with a few hundred thousand
images, which get generated on demand.

I started getting replication failures in the status detail view, but it's
not obvious to me where to find the actual errors or how to actually fix
them.
</pre>
        </blockquote>
        <pre wrap="">
Chris mentioned a bug in rsync here [1] (thanks!). Could that be
the issue here?

Could you check which rsync version is in use?

[1]:
<a class="moz-txt-link-freetext" href="http://www.gluster.org/pipermail/gluster-users/2015-November/024423.html">http://www.gluster.org/pipermail/gluster-users/2015-November/024423.html</a>

</pre>
        <blockquote type="cite">
          <pre wrap="">
The docs seem to be secretive about this as well. It seems if I tear the
geo-replication down, and do a force create from scratch, it goes back in
sync again, but as the files get generated, it starts getting failures again
at some point.

Can someone provide me with information on how to check which files are
causing failures, and what are the actual failures? Or point me to the
relevant part in the docs?

Version 3.7.5-ubuntu1~trusty1

Related SO question:

<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/33839056/gluster-geo-replication-debugging-failures">http://stackoverflow.com/questions/33839056/gluster-geo-replication-debugging-failures</a>

Thanks,

Audrius.


_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a>
</pre>
        </blockquote>
        <pre wrap="">
</pre>
      </blockquote>
      <pre wrap="">
</pre>
    </blockquote>
    <br>
  </body>
</html>