<br><br><div class="gmail_quote">On Tue Jan 13 2015 at 11:57:53 PM Mohammed Rafi K C <<a href="mailto:rkavunga@redhat.com">rkavunga@redhat.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<br>
<div>On 01/14/2015 12:11 AM, Anand Avati
wrote:<br>
</div>
<blockquote type="cite">3) Why not have a separate iobuf pool for RDMA?<br>
</blockquote>
>
> Since every fop uses the default iobuf_pool, if we go with another
> iobuf_pool dedicated to RDMA we would need to copy the buffer from the
> default pool to the RDMA pool, unless we intelligently allocate the
> buffers based on the transport that is going to be used. That would be
> an extra copy in the I/O path.

Not sure what you mean by that. Every fop does not use the default
iobuf_pool; only readv() and writev() do. If you really want to save on
memory-registration cost, your first target should be the header buffers,
which are used in every fop and are currently valloc()ed and
ibv_reg_mr()ed per call. Making headers use an iobuf pool where every
arena is registered at arena creation and deregistered at arena
destruction will get you the highest overhead savings.

Coming to file data iobufs: today iobuf pools are used in a "mixed" way,
i.e. they hold both data actively being transferred (under I/O) and data
held long term (cached by io-cache). io-cache just does an iobuf_ref()
and holds on to the data, which avoids memory copies in the io-cache
layer. However, that may be something we want to reconsider: io-cache
could use its own iobuf pool into which data is copied from the transfer
iobuf (which is pre-registered with RDMA in bulk, etc.).

Thanks
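
PS: to make the two suggestions above a bit more concrete, here are two
rough sketches. They are hand-wavy illustrations and not existing
GlusterFS code: the iobuf_arena fields (mem_base, arena_size), the hook
points, and ioc_cache_pool are assumptions on my part.

Registering a whole arena with the verbs layer once, at arena creation,
instead of calling ibv_reg_mr() per request:

    #include <infiniband/verbs.h>
    #include "iobuf.h"            /* libglusterfs: struct iobuf_arena */

    /* Register the arena's backing memory once; every iobuf carved out
     * of this arena is then covered by the same ibv_mr.
     * (mem_base/arena_size are assumed iobuf_arena fields.) */
    struct ibv_mr *
    rdma_register_arena (struct ibv_pd *pd, struct iobuf_arena *arena)
    {
            return ibv_reg_mr (pd, arena->mem_base, arena->arena_size,
                               IBV_ACCESS_LOCAL_WRITE |
                               IBV_ACCESS_REMOTE_READ |
                               IBV_ACCESS_REMOTE_WRITE);
    }

    /* Matching teardown when the arena is purged/destroyed. */
    void
    rdma_deregister_arena (struct ibv_mr *mr)
    {
            if (mr)
                    ibv_dereg_mr (mr);
    }

And on the io-cache side: instead of iobuf_ref()ing the transfer iobuf
(and pinning an RDMA-registered buffer for as long as the cache holds
it), copy the payload into a buffer from a private, unregistered pool and
release the transfer iobuf right away. ioc_cache_pool below is an assumed
per-xlator pool; the iobuf calls are the usual libglusterfs helpers:

    #include <string.h>
    #include "iobuf.h"            /* iobuf_get2(), iobuf_ptr(), iobuf_unref() */

    static struct iobuf *
    ioc_copy_for_cache (struct iobuf_pool *ioc_cache_pool,
                        struct iobuf *transfer_iob, size_t len)
    {
            struct iobuf *cache_iob = iobuf_get2 (ioc_cache_pool, len);
            if (!cache_iob)
                    return NULL;

            memcpy (iobuf_ptr (cache_iob), iobuf_ptr (transfer_iob), len);

            /* transfer iobuf goes back to the registered pool immediately */
            iobuf_unref (transfer_iob);

            return cache_iob;
    }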

>> On Tue Jan 13 2015 at 6:30:09 AM Mohammed Rafi K C <rkavunga@redhat.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi All,<br>
<br>
When using RDMA protocol, we need to register the buffer which
is going<br>
to send through rdma with rdma device. In fact, it is a costly<br>
operation, and a performance killer if it happened in I/O
path. So our<br>
current plan is to register pre-allocated iobuf_arenas from
iobuf_pool<br>
with rdma when rdma is getting initialized. The problem comes
when all<br>
the iobufs are exhausted, then we need to dynamically allocate
new<br>
arenas from libglusterfs module. Since it is created in
libglusterfs, we<br>
can't make a call to rdma from libglusterfs. So we will force
to<br>
register each of the iobufs from the newly created arenas with
rdma in<br>
I/O path. If io-cache is turned on in client stack, then all
the<br>
pre-registred arenas will use by io-cache as cache buffer. so
we have to<br>
do the registration in rdma for each i/o call for every
iobufs,<br>
eventually we cannot make use of pre registered arenas.<br>
<br>
To address the issue, we have two approaches in mind,<br>
<br>
1) Register each dynamically created buffers in iobuf by
bringing<br>
transport layer together with libglusterfs.<br>
<br>
2) create a separate buffer for caching and offload the data
from the<br>
read response to the cache buffer in background.<br>
<br>
If we could make use of preregister memory for every rdma
call, then we<br>
will have approximately 20% increment for write and 25% of
increment for<br>
read.<br>
<br>
Please give your thoughts to address the issue.<br>
<br>
Thanks & Regards<br>
Rafi KC<br>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a><br>
<a href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a><br>
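
Regarding approach (1) in the quoted mail above: one way to let
dynamically created arenas get registered without libglusterfs calling
into the rdma transport directly could be a registration hook that the
transport installs at init time. A rough sketch only; the struct and
field names below are made up for illustration and are not existing
iobuf members:

    /* The transport supplies these callbacks when it initializes; iobuf
     * calls them whenever it creates or destroys an arena, including the
     * arenas it allocates dynamically after the pool is exhausted. */
    typedef void *(*iobuf_arena_register_t) (void *transport_data,
                                             void *mem_base, size_t size);
    typedef void (*iobuf_arena_unregister_t) (void *transport_data,
                                              void *reg_handle);

    struct iobuf_registration_hooks {
            void                       *transport_data;   /* e.g. rdma PD   */
            iobuf_arena_register_t      arena_register;   /* ibv_reg_mr()   */
            iobuf_arena_unregister_t    arena_unregister; /* ibv_dereg_mr() */
    };

    /* Arena creation would then do something like:
     *     arena->reg_handle = hooks->arena_register (hooks->transport_data,
     *                                                arena->mem_base,
     *                                                arena->arena_size);
     * and the rdma send path could pick up the ibv_mr from reg_handle
     * instead of registering per request. */

That would keep libglusterfs free of any ibverbs dependency while still
letting every arena, pre-allocated or dynamic, be registered exactly once.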