<body style="padding:12px 8px; font-size:12px;"><p></p><div>I tracked this problem and found that loc.parent and loc.pargfid are both NULL in the call sequence below:<br><br>ec_manager_writev() -> ec_get_size_version() -> ec_lookup(). This can cause server_resolve() to return EINVAL.<br><br>A replace-brick causes all open fds and the inode table to be recreated, but ec_lookup() gets its loc from fd->_ctx, so loc.parent and loc.pargfid are missing once the fd changes. Other xlators always do a lookup from the root directory, so they never hit this problem. It seems that a recursive lookup from the root directory may address this issue.<br><br>----- Original Message -----<br><strong>From:</strong> Raghavendra Gowdappa <rgowdapp@redhat.com><br><strong>Sent:</strong> 14-12-24 21:48:56<br><strong>To:</strong> Xavier Hernandez <xhernandez@datalab.es><br><strong>Cc:</strong> Gluster Devel <gluster-devel@gluster.org><br><strong>Subject:</strong> Re: [Gluster-devel] Problems with graph switch in disperse<br><br>Do you know the origin of the EIO? fuse-bridge fails a lookup fop with EIO only when a NULL gfid is received in a successful lookup reply. So there might be another xlator that is sending EIO.<br>
<br>
----- Original Message -----<br>
> From: &quot;Xavier Hernandez&quot; <xhernandez@datalab.es><br>
> To: &quot;Gluster Devel&quot; <gluster-devel@gluster.org><br>
> Sent: Wednesday, December 24, 2014 6:25:17 PM<br>
> Subject: [Gluster-devel] Problems with graph switch in disperse<br>
> <br>
> Hi,<br>
> <br>
> I'm experiencing a problem when gluster graph is changed as a result of<br>
> a replace-brick operation (probably with any other operation that<br>
> changes the graph) while the client is also doing other tasks, like<br>
> writing a file.<br>
> <br>
> When operation starts, I see that the replaced brick is disconnected,<br>
> but writes continue working normally with one brick less.<br>
> <br>
> At some point, another graph is created and comes online. Remaining<br>
> bricks on the old graph are disconnected and the old graph is destroyed.<br>
> I see how new write requests are sent to the new graph.<br>
> <br>
> This seems correct. However, there's a point where I see this:<br>
> <br>
> [2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume]<br>
> 0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)<br>
> [2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec:<br>
> WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1]<br>
> frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1<br>
> {111:000:000} idx=0<br>
> [2014-12-24 11:29:58.541292] T [rpc-clnt.c:1384:rpc_clnt_record]<br>
> 2-patchy-client-0: Auth Info: pid: 0, uid: 0, gid: 0, owner:<br>
> d025e932897f0000<br>
> [2014-12-24 11:29:58.541296] T [io-cache.c:133:ioc_inode_flush]<br>
> 2-patchy-io-cache: locked inode(0x16d2810)<br>
> [2014-12-24 11:29:58.541354] T<br>
> [rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request<br>
> fraglen 152, payload: 84, rpc hdr: 68<br>
> [2014-12-24 11:29:58.541408] T [io-cache.c:137:ioc_inode_flush]<br>
> 2-patchy-io-cache: unlocked inode(0x16d2810)<br>
> [2014-12-24 11:29:58.541493] T [io-cache.c:133:ioc_inode_flush]<br>
> 2-patchy-io-cache: locked inode(0x16d2810)<br>
> [2014-12-24 11:29:58.541536] T [io-cache.c:137:ioc_inode_flush]<br>
> 2-patchy-io-cache: unlocked inode(0x16d2810)<br>
> [2014-12-24 11:29:58.541537] T [rpc-clnt.c:1577:rpc_clnt_submit]<br>
> 2-rpc-clnt: submitted request (XID: 0x17 Program: GlusterFS 3.3,<br>
> ProgVers: 330, Proc: 29) to rpc-transport (patchy-client-0)<br>
> [2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk]<br>
> 0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)<br>
> <br>
> It seems that fuse still has a write request pending for graph 0. It is<br>
> resumed but it returns EIO without calling the xlator stack (operations<br>
> seen between the two log messages are from other operations and they are<br>
> sent to graph 2). I'm not sure why this happens or how I should avoid it.<br>
> <br>
> I tried the same scenario with replicate and it seems to work, so there<br>
> must be something wrong in disperse, but I don't see where the problem<br>
> could be.<br>
> <br>
> Any ideas ?<br>
> <br>
> Thanks,<br>
> <br>
> Xavi<br>
> _______________________________________________<br>
> Gluster-devel mailing list<br>
> Gluster-devel@gluster.org<br>
> <a target="_blank" href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
> <br>
_______________________________________________<br>
Gluster-devel mailing list<br>
Gluster-devel@gluster.org<br>
<a target="_blank" href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
<p></p></div></body>