<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <br>

    <div class="moz-cite-prefix">On 01/25/2016 02:17 AM, Richard Wareing

      wrote:<br>

    </div>

    <blockquote

cite="mid:0ACFC66923ABE3439A641C0843AB52310104F5EDCD@PRN-MBX02-5.TheFacebook.com"

      type="cite">


      <div style="direction: ltr;font-family: Tahoma;color:

        #000000;font-size: 10pt;">Hello all,

        <div><br>

        </div>

        <div>Just gave a talk at SCaLE 14x today and I mentioned our new

          locks revocation feature which has had a significant impact on

          our GFS cluster reliability.  As such I wanted to share the

          patch with the community, so here's the bugzilla report:</div>

        <div><br>

        </div>

        <div><a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1301401">https://bugzilla.redhat.com/show_bug.cgi?id=1301401</a></div>

        <div><br>

        </div>

        <div>=====</div>

        <div>Summary:</div>

        <div>

          <div>Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause

            cluster instability and eventual complete unavailability due

            to failures in releasing entry/inode locks in a timely

            manner.</div>

          <div><br>

          </div>

          <div>Classic symptoms of this are increased brick (and/or

            gNFSd) memory usage due to the high number of (lock request)

            frames piling up in the processes.  The failure-mode results

            in bricks eventually slowing down to a crawl due to

            swapping, or OOMing due to complete memory exhaustion;

            during this period the entire cluster can begin to fail.

             End-users will experience this as hangs on the filesystem,

            first in a specific region of the file-system and ultimately

            the entire filesystem as the offending brick begins to turn

            into a zombie (i.e. not quite dead, but not quite alive

            either).</div>

          <div><br>

          </div>

          <div>Currently, these situations must be handled by an

            administrator detecting &amp; intervening via the

            "clear-locks" CLI command.  Unfortunately this doesn't scale

            for large numbers of clusters, and it depends on the correct

            (external) detection of the locks piling up (for which there

            is little signal other than state dumps).</div>

          <div><br>

          </div>

          <div>This patch introduces two features to remedy this

            situation:</div>

          <div><br>

          </div>

          <div>1. Monkey-unlocking - This is a feature targeted at

            developers (only!) to help track down crashes due to stale

            locks, and prove the utility of the lock revocation feature.

             It does this by silently dropping 1% of unlock requests;

            simulating bugs or mis-behaving clients.</div>

          <div><br>

          </div>

          <div>The feature is activated via:</div>

          <div>features.locks-monkey-unlocking &lt;on/off&gt;</div>

          <div><br>

          </div>

          <div>You'll see the message</div>

          <div>"[&lt;timestamp&gt;] W [inodelk.c:653:pl_inode_setlk]

            0-groot-locks: MONKEY LOCKING (forcing stuck lock)!" ... in

            the logs indicating a request has been dropped.</div>
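          <div><br>
          </div>
          <div>As a rough illustration of the 1% drop behaviour described
            above, here is a minimal sketch in Python rather than the
            xlator's C. The names should_drop_unlock and simulate, the
            DROP_PERCENT constant, and the seeded random generator are
            assumptions made for this example, not code from the patch.</div>
<pre>
```python
import random

# The post says 1% of unlock requests are dropped; the constant name is hypothetical.
DROP_PERCENT = 1


def should_drop_unlock(rng, monkey_unlocking_enabled):
    """Return True when an unlock request should be silently dropped."""
    if not monkey_unlocking_enabled:
        return False
    # randrange(100) yields 0..99, so DROP_PERCENT values out of 100 match.
    return DROP_PERCENT > rng.randrange(100)


def simulate(requests, enabled, seed=0):
    """Count how many of `requests` unlock calls would be dropped."""
    rng = random.Random(seed)
    return sum(should_drop_unlock(rng, enabled) for _ in range(requests))


if __name__ == "__main__":
    total = 100000
    print("dropped with feature off:", simulate(total, enabled=False))
    print("dropped with feature on: ", simulate(total, enabled=True))
```
</pre>
          <div>Each dropped unlock leaves a lock that stays held until
            something clears it, which is the stuck-lock condition the
            revocation feature is meant to handle.</div>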

          <div><br>

          </div>

          <div>2. Lock revocation - Once enabled, this feature will

            revoke a *contended* lock <span style="font-size: 13.3333px;"> </span><span

              style="font-size: 13.3333px;">(i.e. if nobody else asks

              for the lock, we will not revoke it)</span><span

              style="font-size: 13.3333px;"> </span><span

              style="font-size: 10pt;">either by the amount of time the

              lock has been held, how many other lock requests are

              waiting on the lock to be freed, or some combination of

              both.  Clients which are losing their locks will be

              notified by receiving EAGAIN (sent back to their callback

              function).</span></div>

          <div><br>

          </div>

          <div>The feature is activated via these options:</div>

          <div>features.locks-revocation-secs &lt;integer; 0 to

            disable&gt;</div>

          <div>features.locks-revocation-clear-all [on/off]</div>

          <div>features.locks-revocation-max-blocked &lt;integer&gt;</div>

          <div><br>

          </div>

          <div>Recommended settings are: 1800 seconds for a time-based
            timeout (give clients the benefit of the doubt). Choosing a
            max-blocked value requires some experimentation depending on
            your workload, but values in the hundreds to low thousands
            are generally reasonable (it's normal for many tens of locks
            to be taken out when files are being written at high
            throughput).</div>
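          <div><br>
          </div>
          <div>The revocation rule described above can be sketched
            roughly as follows. This is a Python illustration under
            stated assumptions, not the patch's C code: the HeldLock
            type and should_revoke function are hypothetical, the
            thresholds are treated as inclusive, the two criteria are
            combined with a simple "or", and a max-blocked value of 0 is
            assumed to disable that check (the post only states this for
            revocation-secs). The locks-revocation-clear-all option is
            not modelled.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass
class HeldLock:
    """Hypothetical view of one granted lock, for illustration only."""
    held_secs: float       # how long the current holder has had the lock
    blocked_requests: int  # how many other requests are waiting on it


def should_revoke(lock, revocation_secs, max_blocked):
    """Decide whether a held lock is a candidate for revocation."""
    # Only contended locks are revoked: if nobody is waiting, leave it alone.
    if lock.blocked_requests == 0:
        return False
    # 0 disables the time-based check (stated in the post).
    timed_out = revocation_secs > 0 and lock.held_secs >= revocation_secs
    # Assumption: 0 also disables the blocked-count check.
    too_many_waiters = max_blocked > 0 and lock.blocked_requests >= max_blocked
    return timed_out or too_many_waiters


if __name__ == "__main__":
    # Held for an hour with one waiter, using the 1800 s timeout from the post.
    print(should_revoke(HeldLock(3600, 1), revocation_secs=1800, max_blocked=500))
    # Held for an hour but nobody else wants it.
    print(should_revoke(HeldLock(3600, 0), revocation_secs=1800, max_blocked=500))
```
</pre>
          <div>A client whose lock is revoked this way would then see
            EAGAIN in its callback, as noted above.</div>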

        </div>

      </div>

    </blockquote>

    <br>

    I really like this feature. One concern, though: self-heal and
    rebalance domain locks stay active until the self-heal/rebalance
    completes, which can take more than 30 minutes when the files are
    terabytes in size. I will try to see what we can do to handle these
    without increasing revocation-secs too much. Maybe we can come up
    with per-domain revocation timeouts. Comments are welcome.<br>

    <br>

    Pranith<br>

    <blockquote

cite="mid:0ACFC66923ABE3439A641C0843AB52310104F5EDCD@PRN-MBX02-5.TheFacebook.com"

      type="cite">

      <div style="direction: ltr;font-family: Tahoma;color:

        #000000;font-size: 10pt;">

        <div>

        </div>

        <div><br>

        </div>

        <div>=====</div>

        <div><br>

        </div>

        <div>The supplied patch applies cleanly to the v3.7.6 release
          tag, and probably to any 3.7.x release &amp; master (the posix
          locks xlator is rarely touched).</div>

        <div><br>

        </div>

        <div>Richard</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div><br>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>