<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hello guys,</p>
<p>I would like to get some advice on some problems we have on
our 3-host Gluster setup.</p>
<p>Here is the setup used:</p>
<ol>
<li>GlusterFS 3.8.0-1 (we did an upgrade from 3.7.11 last week)</li>
<li>Type: Disperse</li>
<li>Number of Bricks: 1 x (2 + 1) = 3</li>
<li>Transport-type: tcp</li>
<li>Options Reconfigured: transport.address-family: inet</li>
</ol>
<p>Please note that we also have the ACL option enabled on the
volume mount.<br>
</p>
<p>Use case:<br>
</p>
<p>A user submits jobs/tasks to a Spark cluster which has the
GlusterFS volume mounted on each host.</p>
13 tasks completed successfully in ~30 min each (they convert
some logs to JSON format and write the output to the Gluster
volume), but one had been blocked for more than 12 hours when we
checked what was going wrong.<br>
<br>
We found some log entries related to inode locking in the brick
log on one host:<br>
<br>
[2016-06-19 03:15:08.563397] E [inodelk.c:304:__inode_unlock_lock]
0-exp-locks: Matching lock not found for unlock
0-9223372036854775807, by 10613ebc6c6a0000 on 0x6cee5c0f4730<br>
[2016-06-19 03:15:08.563684] E [MSGID: 115053]
[server-rpc-fops.c:273:server_inodelk_cbk] 0-exp-server: 5375861:
INODELK /spark/user/20160328/_temporary/0/_temporary
(015bde3a-09d6-41a2-8e9f-7e7c5295d596) ==> (Invalid argument) [Invalid
argument]<br>
<br>
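<p>A side note on the first entry: the range 0-9223372036854775807
runs from offset 0 to 2^63 - 1, which I read as a lock covering the
whole file (that reading is my assumption, the log does not say so).
A minimal Python sketch to check the arithmetic; the helper name
unlock_range is only illustrative:</p>
<pre>
```python
import re

# Brick log line copied from the entry above (joined onto one line).
LINE = ("[2016-06-19 03:15:08.563397] E [inodelk.c:304:__inode_unlock_lock] "
        "0-exp-locks: Matching lock not found for unlock "
        "0-9223372036854775807, by 10613ebc6c6a0000 on 0x6cee5c0f4730")

INT64_MAX = 2**63 - 1  # largest signed 64-bit offset


def unlock_range(line):
    """Return (start, end) of the range named in an unlock error, or None."""
    match = re.search(r"for unlock (\d+)-(\d+), by", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


start, end = unlock_range(LINE)
# Assumption: a range of 0..INT64_MAX means the lock spans the whole file.
covers_whole_file = (start == 0 and end == INT64_MAX)
print(start, end, covers_whole_file)
```
</pre>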
Errors in the data log:<br>
<p>[2016-06-19 03:13:29.198676] I [MSGID: 109036]
[dht-common.c:8824:dht_log_new_layout_for_dir_selfheal] 0-exp-dht:
Setting layout of
/spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000004_26
with [Subvol_name: exp-disperse-0, Err: -1 , Start: 0 , Stop:
4294967295 , Hash: 1 ],
<br>
[2016-06-19 03:14:59.349357] I [MSGID: 109066]
[dht-rename.c:1562:dht_rename] 0-exp-dht: renaming
/spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000001_23
(hash=exp-disperse-0/cache=exp-disperse-0) =>
/spark/user/20160328/_temporary/0/task_201606190511_0004_m_000001
(hash=exp-disperse-0/cache=&lt;nul&gt;)</p>
<p>These entries are also spamming the data log whenever an action
is performed on the filesystem:<br>
</p>
<p>[2016-06-19 13:58:22.817308] I [dict.c:462:dict_get]
(-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628)
[0x6f0655cd1628]
-->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9ccb)
[0x6f0655ab5ccb]
-->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] )
0-dict: !this || key=system.posix_acl_access [Invalid argument]<br>
[2016-06-19 13:58:22.817364] I [dict.c:462:dict_get]
(-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628)
[0x6f0655cd1628]
-->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9d21)
[0x6f0655ab5d21]
-->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] )
0-dict: !this || key=system.posix_acl_default [Invalid argument]</p>
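<p>To put a number on how noisy these entries are, something like
the following could tally log entries by level, source file and
function. This is only a rough sketch: the regular expression is my
guess at the log header layout based on the lines quoted in this
mail, and the name tally is illustrative.</p>
<pre>
```python
import re
from collections import Counter

# The two data log entries quoted above, each joined onto one line.
SAMPLE = [
    "[2016-06-19 13:58:22.817308] I [dict.c:462:dict_get] "
    "(-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) "
    "[0x6f0655cd1628] "
    "-->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9ccb) "
    "[0x6f0655ab5ccb] "
    "-->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) "
    "0-dict: !this || key=system.posix_acl_access [Invalid argument]",
    "[2016-06-19 13:58:22.817364] I [dict.c:462:dict_get] "
    "(-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) "
    "[0x6f0655cd1628] "
    "-->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9d21) "
    "[0x6f0655ab5d21] "
    "-->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) "
    "0-dict: !this || key=system.posix_acl_default [Invalid argument]",
]

# Assumed header layout: [timestamp] LEVEL [MSGID: n]? [file:line:function]
HEADER = re.compile(
    r"^\[\S+ \S+\] ([A-Z]) "        # timestamp, then one-letter log level
    r"(?:\[MSGID: \d+\] )?"         # optional message id
    r"\[([^:\]]+):\d+:([^\]]+)\]"   # source file, line number, function
)


def tally(lines):
    """Count entries per (level, source file, function); skip other lines."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            counts[match.groups()] += 1
    return counts


for key, n in tally(SAMPLE).most_common():
    print(n, *key)
```
</pre>
<p>In practice the lines would come from the mount log file rather
than a hard-coded list.</p>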
<p>We took a statedump and got confirmation that some processes were
in a blocked state.</p>
<p>We then cleared the lock on the blocked inode, and the Spark job
finally finished (with errors).</p>
<p>What could be the root cause of this locking?<br>
</p>
<p>Thanks for your help!</p>
<p>Florian<br>
</p>
</body>
</html>