[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Chen Chen chenchen at smartquerier.com
Sun Apr 3 08:43:22 UTC 2016


Hi Ashish Pandey,

After some investigation I updated the server from 3.7.6 to 3.7.9. On
April 1st I also switched from the native FUSE mount to an NFS mount,
which boosted performance a lot when I tested it.

Then, after two days of running, the cluster appeared to be locked up:
"ls" hangs, there is no network usage, and the volume profile shows no
r/w activity on the bricks. "dmesg" showed that NFS went dead within
12 hrs (Apr 2 01:13), but "showmount" and "volume status" report that
the NFS server is responding and all bricks are alive.

I'm not sure what happened (glustershd.log and nfs.log didn't show
anything interesting), so I dumped the whole log folder instead. It was
a bit too large (5MB, filled with Errors and Warnings) and my mail was
rejected multiple times by the mailing list, so I can only attach a
snapshot of the logs. You can grab the full version at
https://dl.dropboxusercontent.com/u/56671522/glusterfs.tar.xz instead.

The volume profile info is also attached. Hope it helps.
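
For anyone who wants to compare the bricks quickly, here is a minimal
Python sketch (not part of the attached output) that pulls the
cumulative "Data Read" / "Data Written" totals per brick out of the
profile text. The function name summarize_profile and the SAMPLE
excerpt are only illustrative; the parsing assumes the layout shown in
the attachment.

```python
import re

def summarize_profile(text):
    """Return {brick: {"read": bytes, "written": bytes}} from cumulative stats."""
    totals = {}
    brick = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick:"):
            brick = line.split(":", 1)[1].strip()
            section = None
        elif line.startswith("Cumulative Stats"):
            section = "cumulative"
        elif line.startswith("Interval"):
            section = "interval"  # interval counters are skipped
        elif brick and section == "cumulative":
            m = re.match(r"Data (Read|Written):\s+(\d+) bytes", line)
            if m:
                key = "read" if m.group(1) == "Read" else "written"
                totals.setdefault(brick, {})[key] = int(m.group(2))
    return totals

# Excerpt copied from the attached profile (first brick only).
SAMPLE = """\
Brick: sm16:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
    Duration: 167705 seconds
   Data Read: 869764755456 bytes
Data Written: 1344596574720 bytes

Interval 1 Stats:
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
"""

summary = summarize_profile(SAMPLE)
for name, stats in summary.items():
    print(name, stats["read"], stats["written"])
```

Feeding it the whole attachment instead of SAMPLE should give one entry
per brick, which makes it easier to spot a brick whose totals differ
from the others in its disperse set.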

Best wishes,
Chen

On 3/27/2016 2:38 AM, Ashish Pandey wrote:
> Hi Chen,
>
> Could you please send us the following logs:
> 1 - brick logs - under /var/log/messages/brick/
> 2 - mount logs
>
> Also, some information about what kind of IO was happening (read, write, unlink, rename on different mounts) would help us understand this issue better.
>
> ---
> Ashish
>
> ----- Original Message -----
> From: "陈陈" <chenchen at smartquerier.com>
> To: gluster-users at gluster.org
> Sent: Friday, March 25, 2016 8:59:04 AM
> Subject: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Everyone,
>
> I have a "2 x (4 + 2) = 12 Distributed-Disperse" volume. After upgrading
> to 3.7.8 I noticed the volume is frequently out of service. The
> glustershd.log is flooded by:
>
> [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching
> xdata in answers of 'LOOKUP'
> [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed
> on some subvolumes (up=3F, mask=3F, remaining=0, good=1E, bad=21)
> [ec-common.c:71:ec_heal_report] 0-mainvol-disperse-1: Heal failed
> [Invalid argument]
> [ec-combine.c:206:ec_iatt_combine] 0-mainvol-disperse-0: Failed to
> combine iatt (inode: xxx, links: 1-1, uid: 1000-1000, gid: 1000-1000,
> rdev: 0-0, size: xxx-xxx, mode: 100600-100600)
>
> in normal working state, and sometimes 1000+ lines of:
>
> [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-7: remote
> operation failed. Path: <gfid:xxxx> (xxxx) [Too many open files]
>
> and the brick went offline. "top open" showed "Max open fds: 899195".
>
> Can anyone tell me what happened and what I should do? I was trying
> to deal with the terrible IOPS problem but things got even worse.
>
> Each server has 2 x E5-2630v3 (32 threads/server) and 32GB RAM. Additional
> information is in the attachments. Many thanks.
>
> Sincerely yours,
> Chen
>
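
As a reading aid for the ec_check_status lines quoted above, here is a
small Python sketch that decodes the hex fields. It assumes each value
(up, mask, good, bad) is a bitmap in which bit i stands for the i-th
brick of the 4 + 2 disperse set; which physical brick each index maps
to is not confirmed anywhere in this thread. The name decode_mask is
only illustrative.

```python
def decode_mask(hex_mask, brick_count=6):
    """Return the brick indices whose bit is set in a hex bitmap string."""
    value = int(hex_mask, 16)
    return [i for i in range(brick_count) if value & (1 << i)]

# Values copied from the quoted log line:
# (up=3F, mask=3F, remaining=0, good=1E, bad=21)
status = {"up": "3F", "mask": "3F", "good": "1E", "bad": "21"}
decoded = {name: decode_mask(value) for name, value in status.items()}

for name, bricks in decoded.items():
    print(name, bricks)
```

Under that assumption, good=1E covers indices 1 to 4 and bad=21 covers
indices 0 and 5, so two of the six bricks disagreed with the rest for
that operation.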

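For the "Too many open files" errors quoted above, a rough Python
sketch for comparing a process's open descriptor count with its soft
limit on Linux. It reads /proc/<pid>/fd and /proc/<pid>/limits; the
function name fd_usage and the proc_root parameter are illustrative,
and reading another user's process normally needs root.

```python
import os

def fd_usage(pid, proc_root="/proc"):
    """Return (open_fds, soft_limit); soft_limit is None if unlimited or missing."""
    base = os.path.join(proc_root, str(pid))
    # Each entry under fd/ is one open descriptor of the process.
    open_fds = len(os.listdir(os.path.join(base, "fd")))
    soft_limit = None
    with open(os.path.join(base, "limits")) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # fields: Max open files <soft> <hard> files
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit

# Example call (the PID is hypothetical): fd_usage(12345)
```

Running this against each brick process PID would show how close a
brick is to its limit before it starts rejecting opens.
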
-- 
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
         Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
-------------- next part --------------
Brick: sm16:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                37968              18257117                657317 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19407384               3134980               3417081 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:              1028960               9867913 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81095     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15748  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 869764755456 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm16:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                25731                124811                 62170 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1780063                 41332               1599410 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               597009               7867435 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585213     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15757  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 572226195968 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm11:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                38428              18330601                659152 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19680537               3186133               3557387 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               961274              10006889 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81097     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15742  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 880603889664 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm11:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                26415                118603                 62244 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1851055                 41928               1466117 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               641012               7944255 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585238     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15755  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 576850006016 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm14:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                37320              17789029                655061 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19600027               3110591               3185336 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:              1043031               9626406 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81097     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15744  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 850640217600 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm14:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                25430                118856                 63730 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1896584                 24957               1611272 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               687858               7551537 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585228     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15753  RELEASEDIR
 
    Duration: 167704 seconds
   Data Read: 554862222336 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm13:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                37852              18005136                657018 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19471162               3132895               3154404 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:              1018312               9702965 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81097     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15741  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 854245903872 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm13:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                25869                100519                 63939 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1853310                 41271               1394883 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               576972               7517939 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585248     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15754  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 545438357504 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm15:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                25376                124010                 62769 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1842626                 25247               1747332 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               615409               7695723 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585252     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15752  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 564089530880 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm15:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                37794              17969276                655026 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19297777               3087656               3290762 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:              1025743               9707300 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81097     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15742  RELEASEDIR
 
    Duration: 167705 seconds
   Data Read: 855877165568 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 
    Duration: 255 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm12:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                26499                 99466                 63056 
No. of Writes:                25591                  5235                   539 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:              1870342                 42157               1397655 
No. of Writes:                  668                   901                  1155 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               548533               7738956 
No. of Writes:                 7347              18906027 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            500      FORGET
      0.00       0.00 us       0.00 us       0.00 us        2585231     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          15751  RELEASEDIR
 
    Duration: 167706 seconds
   Data Read: 559290661888 bytes
Data Written: 1239575955968 bytes
 
Interval 1 Stats:
 
    Duration: 256 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: sm12:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
   Block Size:                512b+                1024b+                2048b+ 
 No. of Reads:                38786              18260049                659154 
No. of Writes:                27442                  4436                   607 
 
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:             19442314               3161210               3400222 
No. of Writes:                  641                  1008                  1217 
 
   Block Size:              32768b+               65536b+ 
 No. of Reads:               933426               9923716 
No. of Writes:                 6889              20508938 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            529      FORGET
      0.00       0.00 us       0.00 us       0.00 us          81097     RELEASE
      0.00       0.00 us       0.00 us       0.00 us          21405  RELEASEDIR
      0.75       2.66 us       2.00 us       3.00 us             35     OPENDIR
     14.07      49.86 us      23.00 us      92.00 us             35      LOOKUP
     33.15      58.74 us      20.00 us     116.00 us             70     READDIR
     52.04      92.23 us      20.00 us     921.00 us             70    GETXATTR
 
    Duration: 167706 seconds
   Data Read: 870389523968 bytes
Data Written: 1344596574720 bytes
 
Interval 1 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             32  RELEASEDIR
      0.76       2.69 us       2.00 us       3.00 us             32     OPENDIR
     13.85      49.22 us      23.00 us      92.00 us             32      LOOKUP
     32.81      58.28 us      20.00 us     116.00 us             64     READDIR
     52.59      93.42 us      20.00 us     921.00 us             64    GETXATTR
 
    Duration: 256 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
-------------- next part --------------
[root@sm11 glusterfs]# tail bricks/*.log
==> bricks/mnt-disk1-mainvol.log <==
[2016-04-01 12:25:33.612779] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 10971356: OPENDIR /home/analyzer/personal/tcliu/projects/NTD/case_vcfs/HIGH/GALNT11 (e49e2adf-dc3f-41f5-96d5-b14b40f35d5f) ==> (Permission denied) [Permission denied]
[2016-04-01 12:29:46.857938] E [MSGID: 113018] [posix.c:234:posix_lookup] 0-mainvol-posix: post-operation lstat on parent /mnt/disk1/mainvol/.glusterfs/f3/83/f3833a3a-6c47-415d-ad1b-f3c6a7a57681 failed [No such file or directory]
[2016-04-01 12:29:46.859504] E [MSGID: 113018] [posix.c:234:posix_lookup] 0-mainvol-posix: post-operation lstat on parent /mnt/disk1/mainvol/.glusterfs/f3/83/f3833a3a-6c47-415d-ad1b-f3c6a7a57681 failed [No such file or directory]
[2016-04-01 12:33:45.228956] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/3b/50/3b50d2e8-4956-4f96-ac19-3053a04bb676 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 14:31:38.579476] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/6b/b4/6bb472bd-df7a-47ce-8b9b-54f859e72b15 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 15:20:17.888807] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/ab/9e/ab9e50a7-e233-4398-a459-3146d46554bf while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 15:22:29.297448] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/e7/d5/e7d5cb3c-6a47-45c9-8e90-d6863ba392ba while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 16:30:42.257752] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/4e/a4/4ea4bea2-b2fa-4a8c-9b28-f085edac24bb while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 16:30:42.257885] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/35/6b/356b1bd0-38e2-4a38-98aa-d070573b8ff2 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 16:37:55.570342] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/42/69/426917ac-590b-4046-a6c8-f6b552d56c88 while doing xattrop: Key:trusted.ec.version  [No such file or directory]

==> bricks/mnt-disk2-mainvol.log <==
[2016-04-01 08:42:09.301205] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/8e/dd/8edd9615-5efd-4ff6-a3bd-fd6847588b03 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 08:42:46.330855] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3c/7e/3c7e5913-9f70-468b-ae58-314048bc555a while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 11:48:46.297341] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/03/39/0339da12-5bae-42fb-a8e8-ced5e4526547 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 12:07:58.239150] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/c3/36/c3363729-efc7-4336-8175-ad63aa6797bc while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 12:25:07.579365] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3977749: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/FOLH1B (49a59e47-3f92-46d1-b745-6ae0a0e61db4) ==> (Permission denied) [Permission denied]
[2016-04-01 12:25:33.602114] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3980178: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/FOLH1B (49a59e47-3f92-46d1-b745-6ae0a0e61db4) ==> (Permission denied) [Permission denied]
[2016-04-01 12:25:33.612870] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3980186: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/GALNT11 (e49e2adf-dc3f-41f5-96d5-b14b40f35d5f) ==> (Permission denied) [Permission denied]
[2016-04-01 12:33:45.228275] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3b/50/3b50d2e8-4956-4f96-ac19-3053a04bb676 while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 14:31:38.579305] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3f/3d/3f3d9c37-be02-48e0-973b-ef8c4f3c295c while doing xattrop: Key:trusted.ec.version  [No such file or directory]
[2016-04-01 14:52:40.571078] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/b0/7b/b07b5553-a55d-4059-8777-a0ec40e51132 while doing xattrop: Key:trusted.ec.version  [No such file or directory]

[root@sm11 glusterfs]# tail *.log
==> cli.log <==
[2016-04-03 07:33:00.323469] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-04-03 07:33:00.323577] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2016-04-03 07:33:00.420750] I [cli-rpc-ops.c:2139:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2016-04-03 07:33:00.420987] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-04-03 08:15:28.427738] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.9
[2016-04-03 08:15:28.436907] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
[2016-04-03 08:15:28.437338] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-04-03 08:15:28.437433] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2016-04-03 08:15:28.551696] I [cli-rpc-ops.c:2139:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2016-04-03 08:15:28.551936] I [input.c:36:cli_batch] 0-: Exiting with: 0

==> cmd_history.log <==
[2016-04-01 08:00:00.856908]  : volume set help : SUCCESS
[2016-04-01 08:03:45.605250]  : volume set help : SUCCESS
[2016-04-03 06:26:59.505978]  : volume set help : SUCCESS
[2016-04-03 06:41:39.827425]  : volume set help : SUCCESS
[2016-04-03 06:41:53.469277]  : volume set help : SUCCESS
[2016-04-03 06:42:13.859466]  : volume set help : SUCCESS
[2016-04-03 07:06:58.119033]  : volume set help : SUCCESS
[2016-04-03 07:07:08.245910]  : volume set help : SUCCESS
[2016-04-03 07:33:00.420496]  : volume set help : SUCCESS
[2016-04-03 08:15:28.551440]  : volume set help : SUCCESS

==> data.log <==
[2016-03-30 06:17:21.185246] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:69707d8f-989a-4cba-b724-33db1e8b8bbe> failed. [Stale file handle]
[2016-03-30 06:17:21.185988] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:9f4bafa4-b932-410a-877c-265edb553155> failed. [Stale file handle]
[2016-03-30 06:17:21.186031] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:9f4bafa4-b932-410a-877c-265edb553155> failed. [Stale file handle]
[2016-03-30 06:17:47.088748] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)
The message "W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-mainvol-disperse-1: Executing operation with some subvolumes unavailable (8)" repeated 5 times between [2016-03-30 06:16:14.119064] and [2016-03-30 06:17:14.120278]
[2016-03-30 06:17:21.155916] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:a293e6b6-357f-4cce-934e-f21757615648> failed. [Stale file handle]
[2016-03-30 06:17:21.156052] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:fb838b06-bd89-4cd1-931d-49f16185e742> failed. [Stale file handle]
The message "W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)" repeated 3 times between [2016-03-30 06:17:47.088748] and [2016-03-30 06:18:07.271744]
[2016-03-30 06:18:14.124001] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)
[2016-04-01 03:14:01.680654] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f425f2e8dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f42609538b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f4260953739] ) 0-: received signum (15), shutting down

==> etc-glusterfs-glusterd.vol.log <==
[2016-04-03 06:26:53.908779] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2016-04-03 06:26:59.507550] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 06:26:59.507588] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 06:26:59.507613] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-04-03 06:42:13.859506] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 06:42:13.859520] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 06:42:13.859534] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-04-03 07:07:08.245951] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 07:07:08.245966] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 07:07:08.245981] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed

==> glfsheal-mainvol.log <==

==> glustershd.log <==
[2016-04-02 17:03:08.694924] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-5: remote operation failed. Path: <gfid:c5df439e-1c6e-4105-b6c2-014a7be439cd> (c5df439e-1c6e-4105-b6c2-014a7be439cd) [Transport endpoint is not connected]
[2016-04-02 17:03:08.695053] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-5: remote operation failed [Transport endpoint is not connected]
[2016-04-02 17:03:08.703770] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f2bc87bca52] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2bc85878de] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2bc85879ee] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f2bc858937a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f2bc8589ba8] ))))) 0-mainvol-client-11: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-04-02 17:03:08.697566 (xid=0x4fb1e9)
[2016-04-02 17:03:08.638750] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2016-04-02 17:03:08.700878] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-5: remote operation failed [Transport endpoint is not connected]
The message "E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-11: remote operation failed [Transport endpoint is not connected]" repeated 7 times between [2016-04-02 17:03:08.628255] and [2016-04-02 17:03:08.748082]
[2016-04-02 17:33:09.795869] E [rpc-clnt.c:201:call_bail] 0-mainvol-client-11: bailing out frame type(GlusterFS 3.3) op(OPEN(11)) xid = 0x4fb1f6 sent = 2016-04-02 17:03:08.750519. timeout = 1800 for 172.16.135.16:49153
[2016-04-02 17:33:09.795952] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-11: remote operation failed. Path: <gfid:db0c1d6c-f733-4bc3-8c76-0b1cc8d6cbe7> (db0c1d6c-f733-4bc3-8c76-0b1cc8d6cbe7) [Transport endpoint is not connected]
[2016-04-02 18:01:59.992552] E [rpc-clnt.c:201:call_bail] 0-mainvol-client-11: bailing out frame type(GlusterFS 3.3) op(OPEN(11)) xid = 0x4fb221 sent = 2016-04-02 17:31:56.361972. timeout = 1800 for 172.16.135.16:49153
[2016-04-02 18:01:59.992618] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-11: remote operation failed. Path: <gfid:9afe87ba-855f-492c-901f-b618f5247705> (9afe87ba-855f-492c-901f-b618f5247705) [Transport endpoint is not connected]

==> mainvol-rebalance.log <==
230: volume mainvol
231:     type debug/io-stats
232:     option log-level WARNING
233:     option latency-measurement off
234:     option count-fop-hits off
235:     subvolumes mainvol-dht
236: end-volume
237:
+------------------------------------------------------------------------------+
[2016-04-01 05:03:14.643881] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f9c219c9dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f9c230348b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9c23034739] ) 0-: received signum (15), shutting down

==> nfs.log <==
[2016-04-01 12:54:42.426597] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=10, good=2D, bad=2)
[2016-04-01 12:59:47.138952] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-4: XDR decoding failed [Invalid argument]
[2016-04-01 12:59:47.139022] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-4: remote operation failed [Invalid argument]
[2016-04-01 12:59:47.141444] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=4, good=2B, bad=10)
[2016-04-01 13:38:07.961739] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-7: XDR decoding failed [Invalid argument]
[2016-04-01 13:38:07.962014] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-7: remote operation failed [Invalid argument]
[2016-04-01 13:38:07.964187] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=10, good=2D, bad=2)
[2016-04-01 15:16:17.152097] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-1: XDR decoding failed [Invalid argument]
[2016-04-01 15:16:17.159452] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-1: remote operation failed [Invalid argument]
[2016-04-01 15:16:17.159833] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=1, good=3C, bad=2)
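
Note on reading the ec_check_status lines above: the up/mask/remaining/good/bad fields appear to be hexadecimal bitmasks with one bit per brick of the 4 + 2 disperse set. The sketch below decodes them. It assumes that bit i of mainvol-disperse-N maps to mainvol-client-(6*N + i). That mapping is not stated in the logs, but it agrees with the readv failures logged just before each warning (client-4, client-7, client-1). The helper names decode_mask and client_names are made up for illustration and are not part of GlusterFS.

```python
# Minimal sketch: decode the hex bitmasks printed by ec_check_status.
# Assumption: 6 bricks per disperse set (4 + 2), bit i <-> i-th child.

def decode_mask(hex_mask, width=6):
    """Return the brick indexes whose bit is set in a hex bitmask."""
    value = int(hex_mask, 16)
    return [i for i in range(width) if (value >> i) & 1]


def client_names(disperse_index, hex_mask, width=6, volume="mainvol"):
    """Map set bits to client xlator names (assumed naming scheme)."""
    base = disperse_index * width
    return ["%s-client-%d" % (volume, base + i)
            for i in decode_mask(hex_mask, width)]


if __name__ == "__main__":
    # 0-mainvol-disperse-0: (up=3F, mask=3F, remaining=4, good=2B, bad=10)
    print("up  :", decode_mask("3F"))
    print("good:", decode_mask("2B"))
    print("bad :", client_names(0, "10"))
    # 0-mainvol-disperse-1: (..., good=2D, bad=2)
    print("bad :", client_names(1, "2"))
```

Under that assumption, up=3F means all six bricks of the set were reachable, and each bad= value points at a single brick that returned a failing answer, not at a brick that was down.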