[Gluster-devel] gfapi zero copy write enhancement

Shyam srangana at redhat.com
Thu Aug 25 11:06:33 UTC 2016


On 08/25/2016 02:46 AM, Saravanakumar Arumugam wrote:
> Hi,
>
> On 08/25/2016 12:58 AM, Shyam wrote:
>> Hi,
>>
>> I was attempting to review this [1] change, and for a long time I
>> have wanted to understand why we need this and the manner in which
>> we should achieve it. As there is a gap in my understanding, I am
>> starting with some questions.
>>
>> 1) In the writev FOP, what is the role/use of the iobref parameter?
>> - I do not see the posix xlator using it
>> - The payload is carried in the vector, rather than in the iobref
>> - Across the code, other than protocol/client, which (sort of)
>> serializes it, I do not see it having any use
>>
>> So what am I missing?
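
For reference as we discuss this, here is the writev fop prototype as
I remember it from xlator.h in the 3.x tree (quoting from memory, so
treat it as a sketch):

    /* Sketch of the writev FOP signature: the payload rides in
     * vector/count, while iobref pins the iobufs that back those
     * iovecs so they stay alive while the FOP is in flight. */
    typedef int32_t (*fop_writev_t)(call_frame_t *frame, xlator_t *this,
                                    fd_t *fd, struct iovec *vector,
                                    int32_t count, off_t offset,
                                    uint32_t flags, struct iobref *iobref,
                                    dict_t *xdata);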
>>
>> 2) Coming to the current change, what prevents us from doing this as [2]?
>> - in short, just pass in the buffer received as part of the iovec
>>
>> [2] is not necessarily clean, just a hack that assumes an iovcount of
>> 1 always, and I only tested it with a write call, not a writev call
>> (just stating this before we start a code review of the same ;) )
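
To make [2] concrete, this is the shape of the hack as I tried it
(hypothetical pseudocode collapsed from the glfs_pwritev path; names
and the syncop_writev signature are from my memory of the tree; it
assumes iovcount == 1 and skips all error handling):

    /* Today (roughly): copy the caller's data into a gluster iobuf. */
    iobuf = iobuf_get2(fs->ctx->iobuf_pool, size);
    memcpy(iobuf_ptr(iobuf), userbuf, size);
    iov.iov_base = iobuf_ptr(iobuf);
    iov.iov_len  = size;
    iobref = iobref_new();
    iobref_add(iobref, iobuf);

    /* The hack: point the iovec straight at the caller's buffer, with
     * no backing iobuf, and let the stack use it as-is. */
    iov.iov_base = userbuf;
    iov.iov_len  = size;
    iobref = iobref_new();      /* empty; nothing gluster-owned to pin */

    ret = syncop_writev(subvol, fd, &iov, 1, offset, iobref, flags,
                        NULL, NULL);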
>>
>> 3) From the discussions in [3] and [4] I understand that this is to
>> eliminate a copy when working with RDMA. Further, Avati's response to
>> the thought discusses why we need to leave the memory management of
>> read/write buffers to the applications rather than use/reuse gluster
>> buffers.
>>
>> So, if in the long term this is for RDMA, what has changed to justify
>> asking applications to use gluster buffers, rather than doing it the
>> other way around?
>>
>> 4) Why should applications not reuse buffers, and instead ask for
>> fresh/new buffers for each write call?
> The reason is: the buffer might be ref'ed in some translator like
> io-cache or write-behind.
>
> Discussion in patch:
> ------------------------------------------------------------------------------
>
> << IMPORTANT: The buffer should not be reused across the zero copy
> write operation. >>
> Is this still valid, given that the application allocates and frees
> the buffer?
> =============================
> Yes, this is still valid; if the application tries to reuse the
> buffer then it might see a hang.
> The reason being, the buffer might be ref'ed in some translator like
> io-cache or write-behind.
> ------------------------------------------------------------------------------

Thank you. I followed this in the code and now understand that we could
be stashing away the iov pointers for later use in write-behind, and
hence the need for the copy in the gfapi layer.
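
For anyone following along, the hazard is roughly this (a simplified
sketch of what a caching xlator does with a write, not the actual
write-behind code):

    /* In a write-behind-like xlator's writev(): instead of winding the
     * write down, stash the vector and take a ref on the iobref so the
     * backing iobufs stay alive, then unwind success to the caller. */
    stub->vector = iov_dup(vector, count); /* copies the iovec structs,
                                              NOT the data they point at */
    stub->count  = count;
    stub->iobref = iobref_ref(iobref);     /* pins the data pages */

    /* ... later, on flush, the stashed iov_base pointers are winded
     * down.  If the application rewrote that memory in the meantime,
     * the flushed write carries the new bytes. */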

>
>
>>
>> 5) What is the performance gain noticed with this approach? The
>> thread that (re)started this is [5]. In Commvault Simpana:
>> - What were the perf gains due to this approach?
>> - How does the application use the write buffers?
>>   - Meaning, is the buffer that is received from Gluster used to
>> hold data arriving from the network? I am curious as to how the
>> application uses these buffers, and where the data gets copied into
>> them from.
>>
> (slightly off-topic to this question)
> Sometimes the performance gain may not be in terms of read/write
> rates, but in terms of freed-up CPU.
> Just to give an example: with the copy, CPU occupancy is 70%;
> without the copy, CPU occupancy is 40%.

Agreed, and valid. I wanted to know what the gain was; the more color
Sachin can add to this, the better.

May I ask how this is being tested? And if there is a program to do
so, could you pass it along?
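
In case it helps frame the question, something along these lines is
what I would use as a baseline (a minimal libgfapi writer exercising
the existing, copying glfs_write(); the volume name, host, and sizes
are placeholders). CPU occupancy can then be compared via time(1) or
pidstat while swapping in the zero-copy calls from [1]:

    /* Build: gcc zcopy-test.c -o zcopy-test -lgfapi */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <glusterfs/api/glfs.h>

    int main(void)
    {
            const size_t blk = 1 << 20;     /* 1 MiB per write */
            const int nblks = 4096;         /* 4 GiB total */
            char *buf = malloc(blk);
            memset(buf, 0xab, blk);

            glfs_t *fs = glfs_new("testvol");
            glfs_set_volfile_server(fs, "tcp", "localhost", 24007);
            if (glfs_init(fs) != 0) {
                    fprintf(stderr, "glfs_init failed\n");
                    return 1;
            }

            glfs_fd_t *fd = glfs_creat(fs, "/zcopy-test", O_WRONLY, 0644);
            if (!fd) {
                    fprintf(stderr, "glfs_creat failed\n");
                    return 1;
            }

            for (int i = 0; i < nblks; i++) {
                    if (glfs_write(fd, buf, blk, 0) < 0) {
                            fprintf(stderr, "write failed at block %d\n", i);
                            break;
                    }
            }

            glfs_close(fd);
            glfs_fini(fs);
            free(buf);
            return 0;
    }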

>
> But, Sachin can share the results.
>
>> Eliminating a copy in glfs_write seems trivial (if my hack and the
>> answers to the first question are as per my assumptions), so I am
>> wondering what we are attempting here, or what I am missing.
> From what I understand, there is a layer of separation between
> libgfapi and gluster.
>
> Gluster plays with the buffer in whatever way it likes (read:
> different translators), and hence allocation and freeing should
> happen from Gluster.
> Otherwise, if the application needs to have control over the buffer,
> there is a copy involved (at the gluster layer).

Ok, let's target this part, which is where an alternative to the
current approach exists.

So, why not use the application buffer up to the (first) point where we
decide to actually store the buffers (or the buffer pointers, as in the
current mechanism) for later use? IOW, if we decide to fake a write, as
write-behind does, let's take a copy of the buffers then, rather than
force applications to use gluster buffers. Wouldn't that be better?
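
Concretely, I am imagining something like the following inside
write-behind at the point where it decides to stash the write (a
hypothetical "copy on stash" sketch, not a patch against the actual
wb code; how a write gets marked as needing the copy is left open):

    /* Hypothetical copy-on-stash: when faking the write, duplicate the
     * payload into a gluster-owned iobuf instead of ref'ing the
     * caller's pages. */
    size_t size = iov_length(vector, count);
    struct iobuf *iobuf = iobuf_get2(this->ctx->iobuf_pool, size);

    iov_unload(iobuf_ptr(iobuf), vector, count); /* flatten the payload
                                                    into the iobuf */
    stub->iobref = iobref_new();
    iobref_add(stub->iobref, iobuf);
    stub->vector[0].iov_base = iobuf_ptr(iobuf);
    stub->vector[0].iov_len  = size;
    stub->count = 1;

    iobuf_unref(iobuf);         /* stub->iobref now holds the only ref */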

Looking at FUSE and gNFS: in both cases we either read from the FUSE
reader end, or receive RPCs from the network, so we already have to
allocate and supply a buffer for reading the requests (in this case the
write request) from these ends. This means we have buffers that we can
stash away by taking a ref, in write-behind or other places.

In the case of gfapi, as the application passes in a buffer, we create
a copy to adhere to the current mechanism. Why not consider these
buffers as non-gluster-owned, take ownership (i.e., copy) only when
needed, and hence address the current problem?
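
On the gfapi side, that would reduce to roughly the following
(hypothetical; GF_WRITE_EXTERNAL_BUF is an invented flag name, purely
for illustration, and the syncop_writev signature is from my memory of
the 3.x tree):

    /* Pass the caller's buffer straight down, tagged so that any
     * xlator wanting to keep it beyond the call must copy it first. */
    struct iovec iov = {
            .iov_base = (void *)appbuf, /* application-owned memory */
            .iov_len  = size,
    };
    struct iobref *iobref = iobref_new(); /* nothing gluster-owned */

    ret = syncop_writev(subvol, glfd->fd, &iov, 1, offset, iobref,
                        flags | GF_WRITE_EXTERNAL_BUF, NULL, NULL);
    iobref_unref(iobref);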

>
> Thanks,
> Saravana
>

