<br><tt><font size=2>LiPing10168633/user/zte_ltd wrote on 2016/01/28 21:40:30:<br>
<br>
> From: 李平10168633/user/zte_ltd</font></tt>
<br><tt><font size=2>> To: Pranith Kumar Karampuri <pkarampu@redhat.com>,
gluster-devel@gluster.org, </font></tt>
<br><tt><font size=2>> Cc: li.yi79@zte.com.cn, Liu.Jianjun3@zte.com.cn,
<br>
> yang.bin18@zte.com.cn, zhou.shigang37@zte.com.cn</font></tt>
<br><tt><font size=2>> Date: 2016/01/28 21:40</font></tt>
<br><tt><font size=2>> Subject: Re: Reply: Re: [Gluster-devel] Gluster
AFR volume write <br>
> performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>
> in afr_writev</font></tt>
<br><tt><font size=2>> <br>
> Sorry for the late reply.</font></tt>
<br><tt><font size=2>> <br>
> Pranith Kumar Karampuri <pkarampu@redhat.com> 写于 2016/01/25
17:48:06:<br>
> <br>
> > From: Pranith Kumar Karampuri <pkarampu@redhat.com></font></tt>
<br><tt><font size=2>> > To: li.ping288@zte.com.cn, </font></tt>
<br><tt><font size=2>> > Cc: li.yi79@zte.com.cn, zhou.shigang37@zte.com.cn,
<br>
> > Liu.Jianjun3@zte.com.cn, yang.bin18@zte.com.cn</font></tt>
<br><tt><font size=2>> > Date: 2016/01/25 17:48</font></tt>
<br><tt><font size=2>> > Subject: Re: Reply: Re: [Gluster-devel]
Gluster AFR volume write <br>
> > performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>
> > in afr_writev</font></tt>
<br><tt><font size=2>> > <br>
> > <br>
</font></tt>
<br><tt><font size=2>> > On 01/25/2016 03:09 PM, li.ping288@zte.com.cn
wrote:</font></tt>
<br><tt><font size=2>> > Hi Pranith, <br>
> > <br>
> > I'd be glad to have the chance to contribute to open source.
<br>
> > It's my first time delivering a patch for GlusterFS, so I'm
not <br>
> > quite familiar with the code review and submission procedures.
<br>
> > <br>
> > I'll try to make it ASAP. By the way, are there any guidelines
for this work?</font></tt>
<br><tt><font size=2>> > </font></tt><a href=http://www.gluster.org/community/documentation/index.php/><tt><font size=2>http://www.gluster.org/community/documentation/index.php/</font></tt></a><tt><font size=2><br>
> > Simplified_dev_workflow may be helpful. Feel free to ask any
doubt <br>
> > you may have.<br>
> > <br>
> > How do you guys use glusterfs?<br>
> > <br>
> > Pranith<br>
</font></tt>
<br><tt><font size=2>> Thanks for the helpful tips. We currently
use glusterfs to build <br>
> shared storage for distributed cluster nodes.</font></tt>
<br><tt><font size=2>> <br>
> Here are the solutions I have been pondering these past few days:</font></tt>
<br><tt><font size=2>> <br>
> 1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND modification, <br>
> because this optimization only helps appending write fops,
</font></tt>
<br><tt><font size=2>> and most writes are not appends. Hence I<br>
> think it is not worth optimizing for this low-probability
<br>
> case </font></tt>
<br><tt><font size=2>> at the cost of a performance drop
for the vast majority of AFR writes. </font></tt>
<br><tt><font size=2>> 2. Make the fixed GLUSTERFS_WRITE_IS_APPEND
dictionary option in<br>
> afr_writev dynamically configurable, i.e. add a new configurable</font></tt>
<br><tt><font size=2>> option "write_is_append",
just like the existing "ensure-<br>
> durability" option for AFR. It could be turned on when AFR write
<br>
> performance is not the main </font></tt>
<br><tt><font size=2>> concern, and off when
performance is demanded.</font></tt>
<br><tt><font size=2>> </font></tt>
<br><tt><font size=2>> I have been trying to find a way in posix_writev
to predict an <br>
> appending write in advance and then lock or skip locking
<br>
> accordingly, as early and </font></tt>
<br><tt><font size=2>> as cheaply as possible, but I have had no luck so far.</font></tt>
<br>
<br><tt><font size=2>3. Another compromise that crossed my
mind today is to make WRITE_IS_APPEND take </font></tt>
<br><tt><font size=2>no effect for writes with the O_DIRECT flag. It is already ineffective
for SYNC writes, and the performance of </font></tt>
<br><tt><font size=2>page-cache writes is not so bad (though not as good as no
locking, of course).</font></tt>
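<br>
<br><tt><font size=2>A minimal sketch of what option 3 could look like (a hypothetical
helper name, not the actual patch): the client side would request the append
check only for writes where it can pay off, leaving O_DIRECT and SYNC writes
on the fully parallel path.</font></tt>

```c
/* Sketch of option 3 (hypothetical helper, not the actual patch):
 * request the WRITE_IS_APPEND check only when it can pay off.
 * O_DIRECT and O_SYNC writes keep the fully parallel write path. */
#define _GNU_SOURCE                 /* for O_DIRECT on Linux */
#include <fcntl.h>

static int
should_request_write_is_append(int fd_flags, int ensure_durability)
{
    if (!ensure_durability)
        return 0;                   /* no post-op fsync to skip anyway */
    if (fd_flags & (O_DIRECT | O_SYNC))
        return 0;                   /* option 3: leave direct/sync I/O alone */
    return 1;                       /* buffered write: the check is worthwhile */
}
```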
<br>
<br><tt><font size=2>I would prefer the 2nd and 3rd options.</font></tt>
<br>
<br><tt><font size=2>Are there any other opinions?</font></tt>
<br>
<br><tt><font size=2>> <br>
> Any other good ideas are appreciated.</font></tt>
<br><tt><font size=2>> <br>
> Ping.Li</font></tt>
<br><tt><font size=2>> <br>
> > <br>
> > Thanks & Best Regards. <br>
> > <br>
> > Pranith Kumar Karampuri <pkarampu@redhat.com> wrote on 2016/01/23
14:01:36:<br>
> > <br>
> > > From: Pranith Kumar Karampuri <pkarampu@redhat.com>
<br>
> > > To: li.ping288@zte.com.cn, gluster-devel@gluster.org, <br>
> > > Cc: li.yi79@zte.com.cn, Liu.Jianjun3@zte.com.cn, <br>
> > > zhou.shigang37@zte.com.cn, yang.bin18@zte.com.cn <br>
> > > Date: 2016/01/23 14:02 <br>
> > > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume
write <br>
> > > performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND<br>
> > > in afr_writev <br>
> > > <br>
> > > <br>
> > <br>
> > > On 01/22/2016 07:14 AM, li.ping288@zte.com.cn wrote: <br>
> > > Hi Pranith, your reply is appreciated. <br>
> > > <br>
> > > Pranith Kumar Karampuri <pkarampu@redhat.com> wrote on
2016/01/20 18:51:19:<br>
> > > <br>
> > > > From: Pranith Kumar Karampuri <pkarampu@redhat.com>
<br>
> > > > To: li.ping288@zte.com.cn, gluster-devel@gluster.org,
<br>
> > > > Date: 2016/01/20 18:51 <br>
> > > > Subject: Re: [Gluster-devel] Gluster AFR volume write
performance has <br>
> > > > been seriously affected by GLUSTERFS_WRITE_IS_APPEND
in afr_writev <br>
> > > > <br>
> > > > Sorry for the delay in response.<br>
> > > <br>
> > > > On 01/15/2016 02:34 PM, li.ping288@zte.com.cn wrote:
<br>
> > > > Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the <br>
> > > > glusterfs client side makes posix_writev on the
server side handle <br>
> > > > IO write fops serially instead of in parallel.
<br>
<br>
> > > > <br>
> > > > i.e. multiple io-worker threads carrying out
IO write fops are <br>
> > > > blocked in posix_writev to execute final write fop
pwrite/pwritev in<br>
> > > > __posix_writev function ONE AFTER ANOTHER. <br>
> > > > <br>
> > > > For example: <br>
> > > > <br>
> > > > thread1: iot_worker -> ... -> posix_writev()
| <br>
> > > > thread2: iot_worker -> ... -> posix_writev()
| <br>
> > > > thread3: iot_worker -> ... -> posix_writev()
-> __posix_writev() <br>
> > > > thread4: iot_worker -> ... -> posix_writev()
| <br>
> > > > <br>
> > > > there are 4 iot_worker thread doing the 128KB IO write
fops as <br>
> > > > above, but only one can execute __posix_writev function
and the <br>
> > > > others have to wait. <br>
> > > > <br>
> > > > however, if the afr volume is configured on with storage.linux-aio
<br>
> > > > which is off in default, the iot_worker will
use posix_aio_writev <br>
> > > > instead of posix_writev to write data. <br>
> > > > the posix_aio_writev function won't be affected by
<br>
> > > > GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write
<br>
> performance goes up. <br>
> > > > I think this is a bug :-(. <br>
> > > <br>
> > > Yeah, I agree with you. I suppose the GLUSTERFS_WRITE_IS_APPEND
is a<br>
> > > misuse in afr_writev. <br>
> > > I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND
change at <br>
> > > review website: <br>
> > > </font></tt><a href=http://review.gluster.org/#/c/5501/><tt><font size=2>http://review.gluster.org/#/c/5501/</font></tt></a><tt><font size=2>
<br>
> > > <br>
> > > The initial purpose seems to be to avoid an unnecessary fsync()
in <br>
> > > afr_changelog_post_op_safe function if the writing data
position <br>
> > > was currently at the end of the file, detected by <br>
> > > (preop.ia_size == offset || (fd->flags & O_APPEND))
in posix_writev. <br>
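</font></tt>
<br><tt><font size=2>> > > In plain C, that predicate is just the following (a sketch <br>
> > > with plain POSIX types, following the quoted condition): <br>
> > > </font></tt>

```c
/* Sketch of the quoted predicate with plain POSIX types: a write is
 * an "append" when its offset equals the pre-op file size, or the fd
 * was opened with O_APPEND. Only then can the post-op fsync() in
 * afr_changelog_post_op_safe be skipped safely. */
#include <fcntl.h>
#include <sys/types.h>

static int
write_is_append(off_t preop_ia_size, off_t offset, int fd_flags)
{
    return (preop_ia_size == offset) || (fd_flags & O_APPEND);
}
```
<tt><font size=2>> > > <br>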
> > > <br>
> > > In comparison with the afr write performance loss, I think
<br>
> > > it costs too much. <br>
> > > <br>
> > > I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable
<br>
> > > just as ensure-durability in afr. <br>
> > > <br>
> > > You are right, it doesn't make sense to put this option in the
<br>
> > > dictionary if ensure-durability is off. </font></tt><a href=http://review.gluster.org/13285><tt><font size=2>http://review.gluster.org/13285</font></tt></a><tt><font size=2><br>
> > > addresses this. Do you want to try this out?<br>
> > > Thanks for doing most of the work :-). Do let me know if
you want to<br>
> > > raise a bug for this. Or I can take that up if you don't
have time.<br>
> > > <br>
> > > Pranith <br>
> > > <br>
> > > > <br>
> > > > So, my question is whether AFR volume could work
fine with <br>
> > > > storage.linux-aio configuration, which bypasses the <br>
> > > > GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, <br>
> > > > and why glusterfs keeps posix_aio_writev different
from posix_writev ? <br>
> > > > <br>
> > > > Any replies to clear up my confusion would be appreciated,
and thanks <br>
> > in advance.<br>
> > > > What is the workload you have? multiple writers on
same file workloads? <br>
> > > <br>
> > > I test the afr gluster volume with fio like this: <br>
> > > fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k <br>
> > > --size=20G --numjobs=8 --runtime=60 --group_reporting <br>
> > > --name=afr_test --iodepth=1 --ioengine=libaio<br>
> > > <br>
> > > The GlusterFS bricks are two IBM X3550 M3 servers. <br>
> > > <br>
> > > The local disk direct-write performance with a 128KB IO
request block size <br>
> > > is about 18MB/s <br>
> > > in a single thread and 80MB/s with 8 threads. <br>
> > > <br>
> > > If GLUSTERFS_WRITE_IS_APPEND is configured, the afr gluster
volume <br>
> > > write performance is 18MB/s <br>
> > > the same as single-threaded, and if not, the performance is nearly
75MB/s.<br>
> > > (network bandwidth is sufficient) <br>
> > > <br>
> > > > <br>
> > > > Pranith <br>
> > > > <br>
> > > > <br>
> > > > --------------------------------------------------------<br>
> > > > ZTE Information Security Notice: The information contained
in this <br>
> > > > mail (and any attachment transmitted herewith) is privileged
and <br>
> > > > confidential and is intended for the exclusive use
of the addressee<br>
> > > > (s). If you are not an intended recipient, any
disclosure, <br>
> > > > reproduction, distribution or other dissemination or
use of the <br>
> > > > information contained is strictly prohibited. If
you have received <br>
> > > > this mail in error, please delete it and notify us
immediately.<br>
> > > > <br>
> > > <br>
> > > > <br>
> > > > <br>
> > > <br>
> > > > _______________________________________________<br>
> > > > Gluster-devel mailing list<br>
> > > > Gluster-devel@gluster.org<br>
> > > > </font></tt><a href="http://www.gluster.org/mailman/listinfo/gluster-devel"><tt><font size=2>http://www.gluster.org/mailman/listinfo/gluster-devel</font></tt></a><tt><font size=2>
<br>
> > > <br>
> > > </font></tt>