<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>It might be caused by file fragmentation on the XFS filesystem: a heavily fragmented file carries a very large extent map, and allocating kernel memory for it can stall in kmem_alloc. See
<a class="moz-txt-link-freetext" href="https://bugzilla.kernel.org/show_bug.cgi?id=73831">https://bugzilla.kernel.org/show_bug.cgi?id=73831</a>.</p>
<p><br>
</p>
<p>Regards</p>
<p>Rafi KC<br>
</p>
<br>
<div class="moz-cite-prefix">On 12/19/2016 01:22 AM, Gustave Dahl
wrote:<br>
</div>
<blockquote
cite="mid:03ab01d25968$340c0700$9c241500$@dahlfamily.net"
type="cite">
<div class="WordSection1">
<p class="MsoNormal">I have been running gluster as a storage
backend to OpenNebula for about a year and it has been running
great. I have had an intermittent problem that has gotten
worse over the last couple of days and I could use some help.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Setup<o:p></o:p></p>
<p class="MsoNormal">=====<o:p></o:p></p>
<p class="MsoNormal">Gluster: 3.7.11<o:p></o:p></p>
<p class="MsoNormal">Hyper Converged Setup - Gluster with KVM’s
on the same machines with Gluster in a Slice on each server.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Four Servers - Each with 4 Bricks<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Type: Distributed-Replicate<o:p></o:p></p>
<p class="MsoNormal">Number of Bricks: 4 x 3 = 12<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Bricks are 1TB SSD's<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Gluster Status:
<a class="moz-txt-link-freetext" href="http://pastebin.com/Nux7VB4b">http://pastebin.com/Nux7VB4b</a><o:p></o:p></p>
<p class="MsoNormal">Gluster Info: <a class="moz-txt-link-freetext" href="http://pastebin.com/G5qR0kZq">http://pastebin.com/G5qR0kZq</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Gluster is supporting qcow2 images that the
KVM’s are using. Image Sizes: 10GB up to 300GB images.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The volume is mounted on each node with
glusterfs as a shared file system. The KVM's using the images
are using libgfapi ( i.e.
file=gluster://shchst01:24007/shchst01/d8fcfdb97bc462aca502d5fe703afc66
)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Issue<o:p></o:p></p>
<p class="MsoNormal">======<o:p></o:p></p>
<p class="MsoNormal">This setup has been running well, with the
exception of this intermittent problem. This only happens on
one node. It has happened on other bricks (all on the same
node) but more freqently on Node 2: Brick 4<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">It starts here:
<a class="moz-txt-link-freetext" href="http://pastebin.com/YgeJ5VA9">http://pastebin.com/YgeJ5VA9</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Dec 18 02:08:54 shchhv02 kernel: XFS:
possible memory allocation deadlock in kmem_alloc (mode:0x250)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This continues until:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Dec 18 02:11:10 shchhv02
storage-shchst01[14728]: [2016-12-18 08:11:10.428138] C
[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
4-shchst01-client-11: server xxx.xx.xx.11:49155 has not
responded in the last 42 seconds, disconnecting.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">storage log: <a class="moz-txt-link-freetext" href="http://pastebin.com/vxCdRnEg">http://pastebin.com/vxCdRnEg</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.435927] E [MSGID:
114031] [client-rpc-fops.c:2886:client3_3_opendir_cbk]
4-shchst01-client-11: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000001) [Transport endpoint is
not connected]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.436240] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f06efbaeae2]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f06ef97990e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f06ef979a1e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f06ef97b40a]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f06ef97bc38]
))))) 4-shchst01-client-11: forced unwinding frame
type(GF-DUMP) op(NULL(2)) called at 2016-12-18 08:10:28.424311
(xid=0x36883d)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.436255] W
[rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 4-shchst01-client-11:
socket disconnected<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.436369] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f06efbaeae2]
(-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f06ef97990e]
(-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f06ef979a1e]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f06ef97b40a]
(-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f06ef97bc38]
))))) 4-shchst01-client-11: forced unwinding frame
type(GlusterFS 3.3) op(LOOKUP(27)) called at 2016-12-18
08:10:38.370507 (xid=0x36883e)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.436388] W [MSGID:
114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk]
4-shchst01-client-11: remote operation failed. Path: /
(00000000-0000-0000-0000-000000000001) [Transport endpoint is
not connected]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:11:10.436488] I [MSGID:
114018] [client.c:2030:client_rpc_notify]
4-shchst01-client-11: disconnected from shchst01-client-11.
Client process will keep trying to connect to glusterd until
brick's port is available<o:p></o:p></p>
<p class="MsoNormal">The message "W [MSGID: 114031]
[client-rpc-fops.c:1572:client3_3_fstat_cbk]
4-shchst01-client-11: remote operation failed [Transport
endpoint is not connected]" repeated 3 times between
[2016-12-18 08:11:10.432640] and [2016-12-18 08:11:10.433530]<o:p></o:p></p>
<p class="MsoNormal">The message "W [MSGID: 114031]
[client-rpc-fops.c:2669:client3_3_readdirp_cbk]
4-shchst01-client-11: remote operation failed [Transport
endpoint is not connected]" repeated 15 times between
[2016-12-18 08:11:10.428844] and [2016-12-18 08:11:10.435727]<o:p></o:p></p>
<p class="MsoNormal">The message "W [MSGID: 114061]
[client-rpc-fops.c:4560:client3_3_fstat]
4-shchst01-client-11: (00000000-0000-0000-0000-000000000001)
remote_fd is -1. EBADFD [File descriptor in bad state]"
repeated 11 times between [2016-12-18 08:11:10.433598] and
[2016-12-18 08:11:10.435742]<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">brick 4 log: <a class="moz-txt-link-freetext" href="http://pastebin.com/kQcNyGk2">http://pastebin.com/kQcNyGk2</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[2016-12-18 08:08:33.000483] I
[dict.c:473:dict_get]
(-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac)
[0x7f8504feccbc]
-->/usr/lib64/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
[0x7f84f5734917] -->/lib64/libglusterfs.so.0(dict_get+0xac)
[0x7f8504fdd0fc] ) 0-dict: !this || key=() [Invalid argument]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:08:33.003178] I
[dict.c:473:dict_get]
(-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac)
[0x7f8504feccbc]
-->/usr/lib64/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
[0x7f84f5734917] -->/lib64/libglusterfs.so.0(dict_get+0xac)
[0x7f8504fdd0fc] ) 0-dict: !this || key=() [Invalid argument]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:08:34.021937] I
[dict.c:473:dict_get]
(-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac)
[0x7f8504feccbc]
-->/usr/lib64/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
[0x7f84f5734917] -->/lib64/libglusterfs.so.0(dict_get+0xac)
[0x7f8504fdd0fc] ) 0-dict: !this || key=() [Invalid argument]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:10:11.671642] E
[server-helpers.c:390:server_alloc_frame]
(-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x2fb)
[0x7f8504dad73b]
-->/usr/lib64/glusterfs/3.7.11/xlator/protocol/server.so(server3_3_fxattrop+0x86)
[0x7f84f48a9a76]
-->/usr/lib64/glusterfs/3.7.11/xlator/protocol/server.so(get_frame_from_request+0x2fb)
[0x7f84f489eedb] ) 0-server: invalid argument: client [Invalid
argument]<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:10:11.671689] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc
actor failed to complete successfully<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:10:11.671808] I
[login.c:81:gf_auth] 0-auth/login: allowed user names:
b7391aaa-d0cb-4db6-9e4c-999310c97eb6<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:10:11.671820] I [MSGID:
115029] [server-handshake.c:690:server_setvolume]
0-shchst01-server: accepted client from
shchhv03-13679-2016/12/17-22:57:24:920194-shchst01-client-11-0-2
(version: 3.7.11)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.526854] W
[socket.c:589:__socket_rwv] 0-tcp.shchst01-server: writev on
xxx.xxx.xx.12:65527 failed (Broken pipe)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.526909] I
[socket.c:2356:socket_event_handler] 0-transport:
disconnecting now<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.526935] I [MSGID:
115036] [server.c:552:server_rpc_notify] 0-shchst01-server:
disconnecting connection from
shchhv03-10686-2016/12/16-06:08:16:797591-shchst01-client-11-0-6<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.526976] I [MSGID:
115013] [server-helpers.c:294:do_fd_cleanup]
0-shchst01-server: fd cleanup on
/b40877dae051c076b95c160f2f639e45<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527008] W
[socket.c:589:__socket_rwv] 0-tcp.shchst01-server: writev on
xxx.xxx.xx.12:65525 failed (Broken pipe)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527009] I
[socket.c:3378:socket_submit_reply] 0-tcp.shchst01-server: not
connected (priv->connected = -1)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527040] E
[rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to
submit message (XID: 0x309470, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 16) to rpc-transport
(tcp.shchst01-server)<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527099] I
[socket.c:2356:socket_event_handler] 0-transport:
disconnecting now<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527114] E
[server.c:205:server_submit_reply]
(-->/usr/lib64/glusterfs/3.7.11/xlator/debug/io-stats.so(io_stats_fsync_cbk+0xc8)
[0x7f84f4ada308]
-->/usr/lib64/glusterfs/3.7.11/xlator/protocol/server.so(server_fsync_cbk+0x384)
[0x7f84f48b0444]
-->/usr/lib64/glusterfs/3.7.11/xlator/protocol/server.so(server_submit_reply+0x2f6)
[0x7f84f489b086] ) 0-: Reply submission failed<o:p></o:p></p>
<p class="MsoNormal">[2016-12-18 08:13:31.527121] I [MSGID:
115036] [server.c:552:server_rpc_notify] 0-shchst01-server:
disconnecting connection from
shchhv02-15410-2016/12/17-06:07:39:376627-shchst01-client-11-0-6<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">statedump (brick 4), taken later in the
day: <a class="moz-txt-link-freetext" href="http://pastebin.com/DEE3RbT8">http://pastebin.com/DEE3RbT8</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Temp Resolution Path<o:p></o:p></p>
<p class="MsoNormal">====================<o:p></o:p></p>
<p class="MsoNormal">There is a rise in load on the node, as
well as on one particular KVM (on another node). If we catch
the load rise and clear pagecache, it seems to clear and
resolve. I have not been able to catch it enough to provide
more details.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">echo 1 > /proc/sys/vm/drop_caches<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">There is something that I am missing. I
would appreciate any help to get me to root cause and
resolution.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Gustave<o:p></o:p></p>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</body>
</html>