<p dir="ltr"><br>
On Mar 24, 2015 10:48 PM, "Emmanuel Dreyfus" <<a href="mailto:manu@netbsd.org">manu@netbsd.org</a>> wrote:<br>
><br>
> Hi<br>
><br>
> The merge of <a href="http://review.gluster.org/9953/">http://review.gluster.org/9953/</a> removed a few crashes from<br>
> NetBSD regression tests, but the thing remains utterly broken since the<br>
> merge of <a href="http://review.gluster.org/9708/">http://review.gluster.org/9708/</a>, though I cannot tell if I have<br>
> bugs left over from this commit or if I face new problems.<br>
><br>
> Here are the known problems so far:<br>
><br>
> 1) These need to be merged:<br>
> <a href="http://review.gluster.org/9831">http://review.gluster.org/9831</a><br>
> <a href="http://review.gluster.org/9944">http://review.gluster.org/9944</a><br>
><br>
> 2) I still experience memory corruption, which usually crashes glusterfsd<br>
> because some pointer was replaced by the value 0x3. This strikes on iobref<br>
> most of the time, but it can happen elsewhere.<br>
><br>
> I would be glad if someone could help here. On nbslave70:/autobuild I<br>
> added code to check for iobref/iobuf sanity at random places (by calling<br>
> iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,<br>
> but I have not been able to spot the source of the problem yet.</p>
<p dir="ltr">I'll take a look at this tomorrow.</p>
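<p dir="ltr">For reference, the kind of check described in the quoted text can be sketched roughly as follows. This is only an illustration, not the iobref_sanity() code on nbslave70: the struct ref_table type, both function names, and the 4096-byte threshold are assumptions made up for the example. It flags any slot whose value cannot plausibly be a pointer, such as the 0x3 reported above.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a table of buffer pointers.
 * This is NOT the real GlusterFS struct iobref. */
struct ref_table {
        void  **slots;  /* array of pointers, NULL means unused */
        size_t  count;  /* number of entries in slots */
};

/* A slot is considered plausible if it is NULL, or if it lies above
 * the first page (assumed to be 4096 bytes here) and is aligned to
 * the size of a pointer. A value such as 0x3 fails both tests. */
static int
slot_is_plausible (const void *p)
{
        uintptr_t v = (uintptr_t) p;

        if (v == 0)
                return 1;
        if (v < 4096)
                return 0;
        if (v % sizeof (void *) != 0)
                return 0;
        return 1;
}

/* Return -1 if every slot looks plausible (or there is nothing to
 * check), otherwise the index of the first suspicious slot. */
static long
ref_table_sanity (const struct ref_table *t)
{
        size_t i;

        if (t == NULL || t->slots == NULL)
                return -1;

        for (i = 0; i < t->count; i++) {
                if (!slot_is_plausible (t->slots[i]))
                        return (long) i;
        }
        return -1;
}
```

<p dir="ltr">Calling such a check at several points and logging the first index that fails narrows down when the table was overwritten, though it cannot say who did the overwriting.</p>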
<p dir="ltr">><br>
> The weird thing is that memory always seems to be overwritten by the<br>
> same values, and the magic number 0xcafebabe before the buffer is preserved.<br>
> Here is an example, where iobref->iobrefs = 0xbb11a458:<br>
> 0xbb11a44c: 0xcafebabe 0x00000000 0x00000000 0x00000003<br>
> 0xbb11a45c: 0x00000003 0x00000008 0x00000003 0x0000000c<br>
> 0xbb11a46c: 0x00000003 0x0000000e 0x00000003 0x00000010<br>
> 0xbb11a47c: 0x00000003 0x00000009 0x00000003 0x0000000d<br>
> 0xbb11a48c: 0x00000003 0x00000015 0x00000003 0x00000016<br>
> 0xbb11a49c: 0x00000003 0x00000032 0x00000034 0xbb1e2018<br>
> 0xbb11a4ac: 0xcafebabe 0x00000000 0x00000000 0xbb11a5d8<br>
><br>
><br>
> Additionally, there are two workarounds I had to make for crashes<br>
> that happen sometimes:<br>
> 3) I had to make this change (not yet posted on gerrit) to avoid crashing<br>
> because op = GD_OP_NONE. Things seem to go fine without the test,<br>
> but I cannot tell whether this is a cause or a symptom:<br>
><br>
> diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c<br>
> index 02d2cfb..c06959c 100644<br>
> --- a/xlators/mgmt/glusterd/src/glusterd-utils.c<br>
> +++ b/xlators/mgmt/glusterd/src/glusterd-utils.c<br>
> @@ -8301,15 +8301,12 @@ out:<br>
> int<br>
> glusterd_volume_heal_use_rsp_dict (dict_t *aggr, dict_t *rsp_dict)<br>
> {<br>
> int ret = 0;<br>
> dict_t *ctx_dict = NULL;<br>
> - glusterd_op_t op = GD_OP_NONE;<br>
> + glusterd_op_t op = GD_OP_HEAL_VOLUME;<br>
><br>
> GF_ASSERT (rsp_dict);<br>
><br>
> - op = glusterd_op_get_op ();<br>
> - GF_ASSERT (GD_OP_HEAL_VOLUME == op);<br>
> -<br>
> if (aggr) {<br>
> ctx_dict = aggr;<br>
><br>
><br>
> 4) Here I crash because this->private = NULL, and here is a<br>
> workaround:<br>
><br>
> diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c<br>
> index ae08adc..3918e07 100644<br>
> --- a/xlators/storage/posix/src/posix.c<br>
> +++ b/xlators/storage/posix/src/posix.c<br>
> @@ -913,6 +913,7 @@ posix_opendir (call_frame_t *frame, xlator_t *this,<br>
><br>
> VALIDATE_OR_GOTO (frame, out);<br>
> VALIDATE_OR_GOTO (this, out);<br>
> + VALIDATE_OR_GOTO (this->private, out);<br>
> VALIDATE_OR_GOTO (loc, out);<br>
> VALIDATE_OR_GOTO (fd, out);<br>
><br>
><br>
><br>
><br>
><br>
> --<br>
> Emmanuel Dreyfus<br>
> <a href="mailto:manu@netbsd.org">manu@netbsd.org</a><br>
> _______________________________________________<br>
> Gluster-devel mailing list<br>
> <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>
> <a href="http://www.gluster.org/mailman/listinfo/gluster-devel">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
</p>
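<p dir="ltr">The workaround in item 4 relies on a validate-or-bail pattern. A simplified sketch of that pattern follows, assuming a cut-down macro and a made-up translator type. CHECK_OR_GOTO, struct demo_xlator, and demo_opendir are hypothetical names for illustration and are not the GlusterFS definitions; the real macro may behave differently, for example by logging.</p>

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down translator object for illustration only. */
struct demo_xlator {
        const char *name;
        void       *private_data;  /* per-translator state, may be NULL */
};

/* Simplified validate-or-bail macro: if the argument is NULL (or
 * zero), set errno to EINVAL and jump to the given cleanup label. */
#define CHECK_OR_GOTO(arg, label)               \
        do {                                    \
                if (!(arg)) {                   \
                        errno = EINVAL;         \
                        goto label;             \
                }                               \
        } while (0)

/* Returns 0 on success, -1 if any required pointer is missing. */
static int
demo_opendir (struct demo_xlator *xl)
{
        int ret = -1;

        CHECK_OR_GOTO (xl, out);
        /* Extra guard in the spirit of the workaround: bail out
         * instead of dereferencing missing private state. */
        CHECK_OR_GOTO (xl->private_data, out);

        /* Real work would use xl->private_data here. */
        ret = 0;
out:
        return ret;
}
```

<p dir="ltr">A guard like this turns the crash into an error return, but it does not explain why the private pointer was NULL in the first place, so it is best seen as a way to keep the tests running while the root cause is investigated.</p>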