<br><tt><font size=2>LiPing10168633/user/zte_ltd wrote on 2016/01/28 21:40:30:<br>

<br>

&gt; From: 李平10168633/user/zte_ltd</font></tt>

<br><tt><font size=2>&gt; To: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;,

gluster-devel@gluster.org, </font></tt>

<br><tt><font size=2>&gt; Cc: li.yi79@zte.com.cn, Liu.Jianjun3@zte.com.cn,

<br>

&gt; yang.bin18@zte.com.cn, zhou.shigang37@zte.com.cn</font></tt>

<br><tt><font size=2>&gt; Date: 2016/01/28 21:40</font></tt>

<br><tt><font size=2>&gt; Subject: Re: RE: Re: [Gluster-devel] Gluster

AFR volume write <br>

&gt; performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>

&gt; in afr_writev</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; Sorry for the late reply.</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt; wrote on 2016/01/25 17:48:06:<br>

&gt; <br>

&gt; &gt; From: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;</font></tt>

<br><tt><font size=2>&gt; &gt; To: li.ping288@zte.com.cn, </font></tt>

<br><tt><font size=2>&gt; &gt; Cc: li.yi79@zte.com.cn, zhou.shigang37@zte.com.cn,

<br>

&gt; &gt; Liu.Jianjun3@zte.com.cn, yang.bin18@zte.com.cn</font></tt>

<br><tt><font size=2>&gt; &gt; Date: 2016/01/25 17:48</font></tt>

<br><tt><font size=2>&gt; &gt; Subject: Re: RE: Re: [Gluster-devel]

Gluster AFR volume write <br>

&gt; &gt; performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>

&gt; &gt; in afr_writev</font></tt>

<br><tt><font size=2>&gt; &gt; <br>

&gt; &gt; <br>

</font></tt>

<br><tt><font size=2>&gt; &gt; On 01/25/2016 03:09 PM, li.ping288@zte.com.cn

wrote:</font></tt>

<br><tt><font size=2>&gt; &gt; Hi Pranith, <br>

&gt; &gt; <br>

&gt; &gt; I'd be glad to have the chance to contribute to open source. <br>
&gt; &gt; This is my first time submitting a patch to GlusterFS, so I'm not <br>
&gt; &gt; yet familiar with the code review and submission procedures. <br>
&gt; &gt; <br>
&gt; &gt; I'll try to get it done ASAP. By the way, are there any guidelines for this work?</font></tt>

<br><tt><font size=2>&gt; &gt; </font></tt><a href=http://www.gluster.org/community/documentation/index.php/><tt><font size=2>http://www.gluster.org/community/documentation/index.php/</font></tt></a><tt><font size=2><br>

&gt; &gt; Simplified_dev_workflow may be helpful. Feel free to ask about any <br>
&gt; &gt; doubts you may have.<br>

&gt; &gt; <br>

&gt; &gt; How do you guys use glusterfs?<br>

&gt; &gt; <br>

&gt; &gt; Pranith<br>

</font></tt>

<br><tt><font size=2>&gt; Thanks for the kind tips. &nbsp;We currently use GlusterFS to build the <br>
&gt; shared storage for the nodes of a distributed cluster.</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; Here are the solutions I have been considering over the past few days:</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; 1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND change, because this optimization <br>
&gt; &nbsp; &nbsp; &nbsp;only helps appending writes, and most writes are not appends. Hence I <br>
&gt; &nbsp; &nbsp; &nbsp;don't think it is worth optimizing for this rare case at the cost of a <br>
&gt; &nbsp; &nbsp; &nbsp;large drop in write performance for the vast majority of AFR writes. </font></tt>

<br><tt><font size=2>&gt; 2. Make the currently hard-coded GLUSTERFS_WRITE_IS_APPEND dictionary option in<br>
&gt; &nbsp; &nbsp; &nbsp;afr_writev dynamic, i.e. add a new configurable AFR option &quot;write_is_append&quot;, <br>
&gt; &nbsp; &nbsp; &nbsp;analogous to the existing &quot;ensure-durability&quot;. It could be turned on when <br>
&gt; &nbsp; &nbsp; &nbsp;AFR write performance is not the main concern, and off when performance matters.</font></tt>

<br><tt><font size=2>&gt; &nbsp; &nbsp; &nbsp;</font></tt>

<br><tt><font size=2>&gt; I have also been trying to find a way in posix_writev to predict an <br>
&gt; appending write in advance, and then take the lock only when needed and hold <br>
&gt; it as briefly as possible, but I have not found one.</font></tt>

<br>

<br><tt><font size=2>3. Another compromise that occurred to me today is to make WRITE_IS_APPEND not </font></tt>
<br><tt><font size=2>take effect for O_DIRECT writes. It is already ineffective for SYNC writes, and the </font></tt>
<br><tt><font size=2>performance of page-cache writes is not so bad (though of course not as good as with no locking).</font></tt>

<br>

<br><tt><font size=2>I would prefer the 2nd or 3rd approach.</font></tt>
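<br><tt><font size=2>Proposals 2 and 3, together with the ensure-durability check from review 13285 quoted further down, could be combined into one client-side decision. A minimal Python sketch, for illustration only: the &quot;write_is_append&quot; option is the proposed one and does not exist in GlusterFS, and AfrWriteOptions and send_write_is_append are hypothetical names.</font></tt>

```python
from dataclasses import dataclass


@dataclass
class AfrWriteOptions:
    ensure_durability: bool = True  # existing AFR option
    write_is_append: bool = True    # HYPOTHETICAL option from proposal 2


def send_write_is_append(opts, o_direct):
    """Hypothetical decision: should afr_writev put GLUSTERFS_WRITE_IS_APPEND
    into the write request's dictionary?"""
    if not opts.ensure_durability:
        # Per the thread (review 13285): no point sending the hint
        # when ensure-durability is off.
        return False
    if not opts.write_is_append:
        # Proposal 2: the administrator has switched the hint off.
        return False
    if o_direct:
        # Proposal 3: never request it for O_DIRECT writes.
        return False
    return True


if __name__ == "__main__":
    print(send_write_is_append(AfrWriteOptions(), o_direct=True))
```

Checking ensure-durability first mirrors the point made in the thread that the hint is pointless when durability is off; the order of the other two checks does not change the result.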

<br>

<br><tt><font size=2>Are there any other opinions?</font></tt>

<br>

<br><tt><font size=2>&gt; <br>

&gt; Any other good ideas would be appreciated.</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; Ping.Li</font></tt>

<br><tt><font size=2>&gt; <br>

&gt; &gt; <br>

&gt; &gt; Thanks &amp; Best Regards. <br>

&gt; &gt; <br>

&gt; &gt; Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt; wrote on 2016/01/23 14:01:36:<br>

&gt; &gt; <br>

&gt; &gt; &gt; From: Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt;

<br>

&gt; &gt; &gt; To: li.ping288@zte.com.cn, gluster-devel@gluster.org, <br>

&gt; &gt; &gt; Cc: li.yi79@zte.com.cn, Liu.Jianjun3@zte.com.cn, <br>

&gt; &gt; &gt; zhou.shigang37@zte.com.cn, yang.bin18@zte.com.cn <br>

&gt; &gt; &gt; Date: 2016/01/23 14:02 <br>

&gt; &gt; &gt; Subject: Re: RE: Re: [Gluster-devel] Gluster AFR volume

write <br>

&gt; &gt; &gt; performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>

&gt; &gt; &gt; in afr_writev <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; <br>

&gt; &gt; <br>

&gt; &gt; &gt; On 01/22/2016 07:14 AM, li.ping288@zte.com.cn wrote: <br>

&gt; &gt; &gt; Hi Pranith, thank you for your reply. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt; wrote on 2016/01/20 18:51:19:<br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; From: &nbsp;Pranith Kumar Karampuri &lt;pkarampu@redhat.com&gt; <br>
&gt; &gt; &gt; &gt; To: &nbsp;li.ping288@zte.com.cn, gluster-devel@gluster.org, <br>
&gt; &gt; &gt; &gt; Date: &nbsp;2016/01/20 18:51 <br>
&gt; &gt; &gt; &gt; Subject: Re: [Gluster-devel] Gluster AFR volume write performance has <br>
&gt; &gt; &gt; &gt; been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; Sorry for the delay in response.<br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; On 01/15/2016 02:34 PM, li.ping288@zte.com.cn wrote:

<br>

&gt; &gt; &gt; &gt; Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the <br>
&gt; &gt; &gt; &gt; GlusterFS client side makes posix_writev on the server side process <br>
&gt; &gt; &gt; &gt; IO write fops serially instead of in parallel. <br>
&gt; &gt; &gt; &gt; <br>
&gt; &gt; &gt; &gt; That is, multiple io-worker threads carrying out IO write fops block <br>
&gt; &gt; &gt; &gt; in posix_writev and execute the final pwrite/pwritev in the <br>
&gt; &gt; &gt; &gt; __posix_writev function ONE AFTER ANOTHER. <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; For example: <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; thread1: iot_worker -&gt; ... &nbsp;-&gt; posix_writev() &nbsp; | <br>
&gt; &gt; &gt; &gt; thread2: iot_worker -&gt; ... &nbsp;-&gt; posix_writev() &nbsp; | <br>
&gt; &gt; &gt; &gt; thread3: iot_worker -&gt; ... &nbsp;-&gt; posix_writev() &nbsp; -&gt; __posix_writev() <br>
&gt; &gt; &gt; &gt; thread4: iot_worker -&gt; ... &nbsp;-&gt; posix_writev() &nbsp; | <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; There are 4 iot_worker threads doing 128KB IO write fops as <br>
&gt; &gt; &gt; &gt; above, but only one of them can execute __posix_writev at a time; the <br>
&gt; &gt; &gt; &gt; others have to wait. <br>
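A minimal Python sketch of the serialization described above, assuming one shared per-inode lock is what makes the writers take turns (the lock, the function name simulate_writers and the sleep standing in for a 128KB pwrite are illustrative, not GlusterFS code): four worker threads write to the same file, and the peak number of writes in flight drops to 1 once each write must hold the lock.

```python
import threading
import time


def simulate_writers(num_threads, serialize, io_seconds=0.02):
    """Model io-threads workers issuing writes to one file on a brick.

    When serialize is True, every write holds a shared per-inode lock
    (a stand-in for the lock taken when GLUSTERFS_WRITE_IS_APPEND is
    requested), so writes run one after another.
    Returns the peak number of writes that were in flight at once.
    """
    inode_lock = threading.Lock()    # hypothetical per-inode lock
    counter_lock = threading.Lock()  # protects the bookkeeping below
    in_flight = 0
    peak = 0

    def do_write():
        nonlocal in_flight, peak
        with counter_lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(io_seconds)       # stands in for a 128KB pwrite()
        with counter_lock:
            in_flight -= 1

    def worker():
        if serialize:
            with inode_lock:
                do_write()
        else:
            do_write()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("serialized peak:", simulate_writers(4, serialize=True))
    print("parallel peak:", simulate_writers(4, serialize=False))
```

With serialize=False the peak is usually 4, but that depends on thread scheduling, so only the serialized result is deterministic.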

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; However, if storage.linux-aio (off by default) is enabled on the <br>
&gt; &gt; &gt; &gt; AFR volume, the iot_worker threads use posix_aio_writev instead of <br>
&gt; &gt; &gt; &gt; posix_writev to write data. posix_aio_writev is not affected by <br>
&gt; &gt; &gt; &gt; GLUSTERFS_WRITE_IS_APPEND, and AFR volume write performance goes up. <br>
&gt; &gt; &gt; &gt; I think this is a bug :-(. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; Yeah, I agree with you. I suppose GLUSTERFS_WRITE_IS_APPEND is<br>
&gt; &gt; &gt; misused in afr_writev. <br>
&gt; &gt; &gt; I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND change on the <br>
&gt; &gt; &gt; review website: <br>

&gt; &gt; &gt; </font></tt><a href=http://review.gluster.org/#/c/5501/><tt><font size=2>http://review.gluster.org/#/c/5501/</font></tt></a><tt><font size=2>

<br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; The initial purpose seems to be to avoid an unnecessary fsync() in <br>
&gt; &gt; &gt; the afr_changelog_post_op_safe function when the write position is <br>
&gt; &gt; &gt; at the current end of the file, which posix_writev detects with <br>
&gt; &gt; &gt; (preop.ia_size == offset || (fd-&gt;flags &amp; O_APPEND)). <br>
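A minimal Python sketch of that condition, assuming (as the serialization described earlier suggests) that the brick reads the pre-write size and performs the write under one per-inode lock; the lock and the helper names write_is_append, locked_write and demo are hypothetical, not GlusterFS functions. Holding the lock across the size check and the write presumably keeps the check meaningful, since no other writer can extend the file in between, and it is also what forces concurrent writers to take turns.

```python
import os
import tempfile
import threading

# Hypothetical stand-in for the per-inode lock taken on the brick when the
# client requests GLUSTERFS_WRITE_IS_APPEND.
_inode_lock = threading.Lock()


def write_is_append(preop_size, offset, fd_flags):
    """The quoted check: the write starts at the pre-write end of file,
    or the fd was opened with O_APPEND."""
    return preop_size == offset or bool(fd_flags & os.O_APPEND)


def locked_write(fd, data, offset, fd_flags):
    """Sample the size and write under one lock, so no other writer can
    extend the file between the fstat() and the pwrite()."""
    with _inode_lock:
        preop_size = os.fstat(fd).st_size
        os.pwrite(fd, data, offset)
        return write_is_append(preop_size, offset, fd_flags)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        return [
            locked_write(fd, b"a" * 10, 0, os.O_RDWR),   # empty file: append
            locked_write(fd, b"b" * 10, 10, os.O_RDWR),  # at EOF: append
            locked_write(fd, b"c" * 5, 0, os.O_RDWR),    # overwrite: not append
        ]
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Running it prints [True, True, False]: the first two writes start at the end of the file, the third overwrites its beginning. os.pwrite is only available on Unix-like systems.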

&gt; &gt; &gt; <br>

&gt; &gt; &gt; Compared with the resulting AFR write performance loss, I think <br>
&gt; &gt; &gt; this costs too much. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable, <br>
&gt; &gt; &gt; just like ensure-durability in AFR. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; You are right, it doesn't make sense to put this option in the <br>

&gt; &gt; &gt; dictionary if ensure-durability is off. </font></tt><a href=http://review.gluster.org/13285><tt><font size=2>http://review.gluster.org/13285</font></tt></a><tt><font size=2><br>

&gt; &gt; &gt; addresses this. Do you want to try this out?<br>

&gt; &gt; &gt; Thanks for doing most of the work :-). Do let me know if

you want to<br>

&gt; &gt; &gt; raise a bug for this. Or I can take that up if you don't

have time.<br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; Pranith <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; So, my question is whether an AFR volume works correctly with the <br>
&gt; &gt; &gt; &gt; storage.linux-aio configuration, which bypasses the <br>
&gt; &gt; &gt; &gt; GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, <br>
&gt; &gt; &gt; &gt; and why GlusterFS keeps posix_aio_writev different from posix_writev? <br>
&gt; &gt; &gt; &gt; <br>
&gt; &gt; &gt; &gt; Any replies to clear up my confusion would be appreciated; thanks <br>
&gt; &gt; &gt; &gt; in advance.<br>

&gt; &gt; &gt; &gt; What workload do you have? Multiple writers on the same file? <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; I tested the AFR Gluster volume with fio like this: <br>
&gt; &gt; &gt; fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k --size=20G --numjobs=8 <br>
&gt; &gt; &gt; &nbsp; &nbsp; --runtime=60 --group_reporting --name=afr_test --iodepth=1 --ioengine=libaio<br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; The GlusterFS bricks are two IBM X3550 M3 servers. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; The local-disk direct write throughput with a 128KB request size <br>
&gt; &gt; &gt; is about 18MB/s with a single thread and 80MB/s with 8 threads. <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; If GLUSTERFS_WRITE_IS_APPEND is set, the AFR Gluster volume <br>
&gt; &gt; &gt; write throughput is 18MB/s, the same as a single thread; if not, <br>
&gt; &gt; &gt; it is about 75MB/s <br>
&gt; &gt; &gt; (network bandwidth is sufficient). <br>
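A quick arithmetic check of these numbers (values in MB/s; the helper name summarize is hypothetical): 75 / 18 ≈ 4.17, so dropping the hint is roughly a 4x speedup; 75 / 80 ≈ 94% of the 8-thread local-disk rate; and 18MB/s with the hint equals the single-thread local rate, which is consistent with the writes being fully serialized.

```python
def summarize(local_1t, local_8t, afr_with_hint, afr_without_hint):
    """Compare the quoted fio results (all values in MB/s)."""
    return {
        # How much faster AFR writes are when the hint is not sent.
        "speedup": round(afr_without_hint / afr_with_hint, 2),
        # Fraction of the 8-thread local-disk rate reached without the hint.
        "without_hint_vs_local_8t": round(afr_without_hint / local_8t, 4),
        # With the hint, AFR throughput equals the single-thread local rate.
        "with_hint_equals_local_1t": afr_with_hint == local_1t,
    }


if __name__ == "__main__":
    print(summarize(local_1t=18, local_8t=80, afr_with_hint=18, afr_without_hint=75))
```

Running it prints {'speedup': 4.17, 'without_hint_vs_local_8t': 0.9375, 'with_hint_equals_local_1t': True}.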

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; Pranith <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; --------------------------------------------------------<br>

&gt; &gt; &gt; &gt; ZTE Information Security Notice: The information contained

in this <br>

&gt; &gt; &gt; &gt; mail (and any attachment transmitted herewith) is privileged

and <br>

&gt; &gt; &gt; &gt; confidential and is intended for the exclusive use

of the addressee<br>

&gt; &gt; &gt; &gt; (s). &nbsp;If you are not an intended recipient, any

disclosure, <br>

&gt; &gt; &gt; &gt; reproduction, distribution or other dissemination or

use of the <br>

&gt; &gt; &gt; &gt; information contained is strictly prohibited. &nbsp;If

you have received <br>

&gt; &gt; &gt; &gt; this mail in error, please delete it and notify us

immediately.<br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; <br>

&gt; &gt; &gt; <br>

&gt; &gt; &gt; &gt; _______________________________________________<br>

&gt; &gt; &gt; &gt; Gluster-devel mailing list<br>

&gt; &gt; &gt; &gt; Gluster-devel@gluster.org<br>

&gt; &gt; &gt; &gt; </font></tt><a href="http://www.gluster.org/mailman/listinfo/gluster-devel"><tt><font size=2>http://www.gluster.org/mailman/listinfo/gluster-devel</font></tt></a><tt><font size=2>

<br>

&gt; &gt; &gt; <br>


&gt; &gt; &gt; </font></tt>