<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Raised a bug

      (<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1278418">https://bugzilla.redhat.com/show_bug.cgi?id=1278418</a>) for this,

      and sent a fix (<a class="moz-txt-link-freetext" href="http://review.gluster.org/#/c/12516/">http://review.gluster.org/#/c/12516/</a>) too. It

      would be great if you could also review the patch.<br>

      <br>

      Regards,<br>

      Avra<br>

      <br>

      On 11/05/2015 06:01 PM, Avra Sengupta wrote:<br>

    </div>

    <blockquote cite="mid:563B4C1F.4070901@redhat.com" type="cite">

      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

      <div class="moz-cite-prefix">Hey Michael,<br>

        <br>

        Thanks, but I don't think that would be necessary anymore.<br>

        <br>

        Guys,<br>

        <br>

        I wrote a patch that changes the brick status logs to INFO level (<a
          moz-do-not-send="true" class="moz-txt-link-freetext"
          href="http://review.gluster.org/#/c/12515/">http://review.gluster.org/#/c/12515/</a>).

        Ironically, this patch also passed regression on the first run
        but failed on the next iteration. The logs (given below) show
        that, as I had suspected, the brick's status is set to started a
        little after the clone command is executed. I don't know why this
        delay occurs on the regression setup (and not every time) but
        never locally. I can think of various reasons for it (slower
        regression machines being my prime suspect), but I can't say for
        sure. I will raise a bug for this and try to modify the testcase
        accordingly.<br>
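        <br>
        One way a testcase could tolerate this delay is to poll, with a
        timeout, until the brick reports as started before issuing the
        clone command. Below is a minimal sketch of that pattern in
        Python, assuming a hypothetical <code>brick_is_started()</code>
        check in place of a real brick status query; the actual .t
        testcase is a shell script and would use its own helpers rather
        than this code.<br>
<pre wrap="">
```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float,
               interval: float = 0.1) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for a brick status check: it simply reports
# "started" once 0.5 s have passed, mimicking a brick that comes up late.
start = time.monotonic()


def brick_is_started() -> bool:
    return time.monotonic() - start >= 0.5


if wait_until(brick_is_started, timeout=5):
    print("brick up; safe to issue snapshot clone")
else:
    print("brick did not start within the timeout")
```
</pre>
        Polling with a generous upper bound keeps the test fast when the
        brick is already up, while still absorbing a slow start on a
        loaded regression machine.<br>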

        <br>

        Logs:<br>

        [2015-11-05 11:25:15.103233] E [MSGID: 106122]

        [glusterd-snapshot.c:2376:glusterd_snapshot_clone_prevalidate]

        0-management: Failed to pre validate<br>

        <b>[2015-11-05 11:25:15.103265] E [MSGID: 106443]
          [glusterd-snapshot.c:2398:glusterd_snapshot_clone_prevalidate]
          0-management: One or more bricks are not running. Please run
          snapshot status command to see brick status.</b><br>
        Please start the stopped brick and then issue snapshot clone
        command<br>

        [2015-11-05 11:25:15.103280] W [MSGID: 106443]

        [glusterd-snapshot.c:8398:glusterd_snapshot_prevalidate]

        0-management: Snapshot clone pre-validation failed<br>

        [2015-11-05 11:25:15.103294] W [MSGID: 106122]

        [glusterd-mgmt.c:166:gd_mgmt_v3_pre_validate_fn] 0-management:

        Snapshot Prevalidate Failed<br>

        [2015-11-05 11:25:15.103305] E [MSGID: 106122]

        [glusterd-mgmt.c:820:glusterd_mgmt_v3_pre_validate]

        0-management: Pre Validation failed for operation Snapshot on

        local node<br>

        [2015-11-05 11:25:15.103315] E [MSGID: 106122]

        [glusterd-mgmt.c:2166:glusterd_mgmt_v3_initiate_snap_phases]

        0-management: Pre Validation Failed<br>

        [2015-11-05 11:25:15.103332] E [MSGID: 106027]

        [glusterd-snapshot.c:7946:glusterd_snapshot_clone_postvalidate]

        0-management: unable to find clone clone1 volinfo<br>

        [2015-11-05 11:25:15.103342] W [MSGID: 106444]

        [glusterd-snapshot.c:8837:glusterd_snapshot_postvalidate]

        0-management: Snapshot create post-validation failed<br>

        [2015-11-05 11:25:15.103352] W [MSGID: 106121]

        [glusterd-mgmt.c:323:gd_mgmt_v3_post_validate_fn] 0-management:

        postvalidate operation failed<br>

        [2015-11-05 11:25:15.103362] E [MSGID: 106121]

        [glusterd-mgmt.c:1585:glusterd_mgmt_v3_post_validate]

        0-management: Post Validation failed for operation Snapshot on

        local node<br>

        [2015-11-05 11:25:15.103372] E [MSGID: 106122]

        [glusterd-mgmt.c:2286:glusterd_mgmt_v3_initiate_snap_phases]

        0-management: Post Validation Failed<br>

        [2015-11-05 11:25:15.109994]:++++++++++

        G_LOG:./tests/bugs/snapshot/bug-1275616.t: TEST: 42 42 149

        snap_info_volume CLI Snaps Available patchy ++++++++++<br>

        [2015-11-05 11:25:15.239358]:++++++++++

        G_LOG:./tests/bugs/snapshot/bug-1275616.t: TEST: 43 43 150

        snap_config_volume CLI snap-max-hard-limit patchy ++++++++++<br>

        [2015-11-05 11:25:15.378255]:++++++++++

        G_LOG:./tests/bugs/snapshot/bug-1275616.t: TEST: 45 45 200

        snap_info_volume CLI Snaps Available clone1 ++++++++++<br>

        [2015-11-05 11:25:15.501970] E [MSGID: 106027]

        [glusterd-snapshot.c:3574:glusterd_snapshot_get_info_by_volume]

        0-management: Volume (clone1) does not exist [Invalid argument]<br>

        [2015-11-05 11:25:15.502024] E [MSGID: 106027]

        [glusterd-snapshot.c:3766:glusterd_handle_snapshot_info]

        0-management: Failed to get volume info of volume clone1

        [Invalid argument]<br>

        [2015-11-05 11:25:15.502061] W [MSGID: 106063]

        [glusterd-snapshot.c:9082:glusterd_handle_snapshot_fn]

        0-management: Snapshot info failed<br>

        [2015-11-05 11:25:15.510016]:++++++++++

        G_LOG:./tests/bugs/snapshot/bug-1275616.t: TEST: 46 46 200

        snap_config_volume CLI snap-max-hard-limit clone1 ++++++++++<br>

        [2015-11-05 11:25:15.639515] E [MSGID: 106060]

        [glusterd-snapshot.c:438:snap_max_limits_display_commit]

        0-management: Volume (clone1) does not exist<br>

        [2015-11-05 11:25:15.639543] E [MSGID: 106090]

        [glusterd-snapshot.c:1446:glusterd_handle_snapshot_config]

        0-management: snap-max-limit display commit failed.<br>

        [2015-11-05 11:25:15.639558] W [MSGID: 106045]

        [glusterd-snapshot.c:9101:glusterd_handle_snapshot_fn]

        0-management: snapshot config failed<br>

        <b>[2015-11-05 11:25:15.684746] I

          [glusterd-utils.c:4883:glusterd_set_brick_status] 0-glusterd:

          Setting brick

          slave28.cloud.gluster.org:/var/run/gluster/snaps/7db8306c170541eb98c02633407bf625/brick1

          status to started</b><br>
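        <br>
        The two bolded log lines bracket the race: the clone
        pre-validation failure and the brick being marked as started. A
        small Python sketch of the arithmetic, using only the timestamps
        copied from those lines, puts the gap at roughly 0.58 seconds:<br>
<pre wrap="">
```python
from datetime import datetime

# Timestamp format used in the glusterd log lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# "One or more bricks are not running" (clone pre-validation failed).
clone_failed = datetime.strptime("2015-11-05 11:25:15.103265", FMT)
# "Setting brick ... status to started".
brick_started = datetime.strptime("2015-11-05 11:25:15.684746", FMT)

delay = (brick_started - clone_failed).total_seconds()
print(f"brick marked started {delay:.3f} s after clone pre-validation failed")
```
</pre>
        This prints a delay of 0.581 s.<br>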

        <br>

        Regards,<br>

        Avra<br>

        <br>

        On 11/05/2015 05:07 PM, Michael Scherer wrote:<br>

      </div>

      <blockquote cite="mid:1446723474.31793.131.camel@redhat.com"

        type="cite">

        <pre wrap="">Le jeudi 05 novembre 2015 à 15:59 +0530, Avra Sengupta a écrit :

</pre>

        <blockquote type="cite">

          <pre wrap="">On 11/05/2015 03:57 PM, Avra Sengupta wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">On 11/05/2015 03:56 PM, Vijay Bellur wrote:

</pre>

            <blockquote type="cite">

              <pre wrap="">On Thursday 05 November 2015 12:19 PM, Avra Sengupta wrote:

</pre>

              <blockquote type="cite">

                <pre wrap="">Hi,

We investigated the logs in the regression failures that encountered

this and following are the findings:

1. snapshot clone failure is indeed the reason for the failure.

2. snapshot clone has failed in pre-validation with the error that the

brick of snap3 is not up and running.

3. snap3 was created, and subsequently started (because of

activate-on-create being enabled), long before we tried to create a

clone out of it.

4. The snap3's brick shows no failure logs, and thereby gives us no

reason to believe that it did not start properly in the course of the

testcase.

5. Which leaves us with the assumption (it is an assumption because we

do not have any logs backing it) that, there was some delay in either

the start of the brick process for snap3, or for glusterd to register

that the same has started, and before either of these events could have

happened the clone command got executed and failed. This would make 

it a

race.

Some other things to consider about the particular testcase:

1. It did pass (and still passes consistently), in our local systems

making it not reproducible locally.

2. The patch was merged after both linux and netbsd regressions passed

(at one go).

3. The release 3.7 backported patch for the same, has also passed both

the linux and netbsd regressions as of now.

The rationale behind mentioning the above three points being, this

testcase has passed locally, as well as on the regression setups(not

just at the time of merge, but even now), which brings me back to the

assumption mentioned in point #5 . To get more clarity on the said

assumption we need access to one of the regression setups, so that we

can try reproducing the failure in that environment and get some proof

of what really is happening.

Vijay,

Could you please provide us with a jenkins linux slave to perform the

above mentioned validity

</pre>

              </blockquote>

              <pre wrap="">Please send out a request on gluster-infra if not done so and Michael 

Scherer should be able to help.

Thanks!

Vijay

</pre>

            </blockquote>

            <pre wrap="">+ Adding gluster-infra and Michael

Could you please provide us with a jenkins linux slave to perform the 

above mentioned validity

</pre>

          </blockquote>

        </blockquote>

        <pre wrap="">So you just want 1 single centos 6 gluster slave, who need access to it,

and for how long ?

Can you provides a ssh key so I can create a snapshot and give to you ?

</pre>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>