[GEDI] [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

David Hildenbrand david at redhat.com
Fri Sep 2 08:06:45 UTC 2022


On 30.08.22 22:16, Stefan Hajnoczi wrote:
> On Thu, Aug 25, 2022 at 09:43:16AM +0200, David Hildenbrand wrote:
>> On 23.08.22 21:22, Stefan Hajnoczi wrote:
>>> On Tue, Aug 23, 2022 at 10:01:59AM +0200, David Hildenbrand wrote:
>>>> On 23.08.22 00:24, Stefan Hajnoczi wrote:
>>>>> Register guest RAM using BlockRAMRegistrar and set the
>>>>> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
>>>>> accesses in I/O requests.
>>>>>
>>>>> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
>>>>> on DMA mapping/unmapping.
>>>>
>>>> Can you explain why we're monitoring RAMRegistrar to hook into "guest
>>>> RAM" and not go the usual path of the MemoryListener?
>>>
>>> The requirements are similar to VFIO, which uses RAMBlockNotifier. We
>>
>> Only VFIO NVME uses RAMBlockNotifier. Ordinary VFIO uses the MemoryListener.
>>
>> Maybe the difference is that ordinary VFIO has to replicate the actual
>> guest physical memory layout, and VFIO NVME is only interested in
>> possible guest RAM inside guest physical memory.
>>
>>> need to learn about all guest RAM because that's where I/O buffers are
>>> located.
>>>
>>> Do you think RAMBlockNotifier should be avoided?
>>
>> I assume it depends on the use case. For saying "this might be used for
>> I/O" it might be good enough I guess.
>>
>>>
>>>> What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory in
>>>> the worst case such as io_uring fixed buffers would do ( I hope not ).
>>>
>>> BDRV_REQ_REGISTERED_BUF is a hint that no bounce buffer is necessary
>>> because the I/O buffer is located in memory that was previously
>>> registered with bdrv_register_buf().
>>>
>>> The RAMBlockNotifier calls bdrv_register_buf() to let the libblkio
>>> driver know about RAM. Some libblkio drivers ignore this hint, io_uring
>>> may use the fixed buffers feature, vhost-user sends the shared memory
>>> file descriptors to the vhost device server, and VFIO/vhost may pin
>>> pages.
>>>
>>> So the blkio block driver doesn't add anything new, it's the union of
>>> VFIO/vhost/vhost-user/etc memory requirements.
>>
>> The issue is if that backend pins memory inside any of these regions.
>> Then, you're instantly incompatible with anything that relies on sparse
>> RAMBlocks, such as memory ballooning or virtio-mem, and have to properly
>> fence it.
>>
>> In that case, you'd have to successfully trigger
>> ram_block_discard_disable(true) first, before pinning. Who would do that
>> now conditionally, just like e.g., VFIO does?
>>
>> io_uring fixed buffers would be one such example that pins memory and is
>> problematic. vfio (unless on s390x) is another example, as you point out.
> 
> Okay, I think libblkio needs to expose a bool property called
> "mem-regions-pinned" so QEMU knows whether or not the registered buffers will
> be pinned.
> 
> Then the QEMU BlockDriver can do:
> 
>   if (mem_regions_pinned) {
>       if (ram_block_discard_disable(true) < 0) {
>           ...fail to open block device...
>       }
>   }
> 
> Does that sound right?

Yes, I think so.
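
As a rough illustration only (not taken from any posted patch), the check
could be wrapped as below. The helper names blkio_open_check_pinning() and
blkio_close_release_pinning() are hypothetical, and
ram_block_discard_disable() is stubbed so the snippet compiles outside the
QEMU tree; the stub assumes the real function fails with -EBUSY while
something (e.g., virtio-mem) requires discards to work.

```c
#include <errno.h>
#include <stdbool.h>

/* Stub state standing in for QEMU's internal counters (assumption). */
static int discard_required_cnt;  /* > 0 while e.g. virtio-mem needs discards */
static int discard_disabled_cnt;  /* > 0 while some user has pinned memory */

/* Stub for QEMU's ram_block_discard_disable(); simplified, no locking. */
int ram_block_discard_disable(bool state)
{
    if (!state) {
        discard_disabled_cnt--;
        return 0;
    }
    if (discard_required_cnt > 0) {
        return -EBUSY;  /* someone relies on discards; pinning is unsafe */
    }
    discard_disabled_cnt++;
    return 0;
}

/* Hypothetical open-time helper: only fence discards if the backend pins. */
int blkio_open_check_pinning(bool mem_regions_pinned)
{
    if (!mem_regions_pinned) {
        return 0;
    }
    return ram_block_discard_disable(true);
}

/* Hypothetical close-time counterpart: re-allow discards. */
void blkio_close_release_pinning(bool mem_regions_pinned)
{
    if (mem_regions_pinned) {
        ram_block_discard_disable(false);
    }
}
```

The point of pairing the two helpers is that a successful
ram_block_discard_disable(true) has to be undone with
ram_block_discard_disable(false) when the device goes away.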

> 
> Is "pinned" the best word to describe this or is there a more general
> characteristic we are looking for?

Pinning should be the right term. We want to express that all user page
tables will immediately get populated and that a kernel subsystem will
take longterm references on mapped pages that will go out of sync as soon
as we discard memory, e.g., using madvise(MADV_DONTNEED).

We just should not confuse it with memlock / locking into memory, which
has yet different semantics (e.g., don't swap it out).

> 
>>
>> This has to be treated with care. Another thing to consider is that
>> different backends might only support a limited number of such regions.
>> I assume there is a way for QEMU to query this limit upfront? It might
>> be required for memory hot(un)plug to figure out how many memory slots
>> we actually have (for ordinary DIMMs, and if we ever want to make this
>> compatible with virtio-mem, it might be required as well when the backend
>> pins memory).
> 
> Yes, libblkio reports the maximum number of blkio_mem_regions supported
> by the device. The property is called "max-mem-regions".
> 
> The QEMU BlockDriver currently doesn't use this information. Are there
> any QEMU APIs that should be called to propagate this value?

I assume we have to do exactly the same thing as e.g.,
vhost_has_free_slot()/kvm_has_free_slot() do.

Especially, hw/mem/memory-device.c needs care and
slots_limit/used_memslots handling in hw/virtio/vhost.c might be
relevant as well.
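
A minimal sketch of what such a check might look like for the blkio
driver, modelled loosely on the "smallest limit across all devices"
approach. Everything here is an assumption for illustration: struct
blkio_dev_state, blkio_has_free_mem_region() and the uint64_t type of the
limit are hypothetical; only the "max-mem-regions" property name comes
from this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state caching libblkio's "max-mem-regions". */
struct blkio_dev_state {
    uint64_t max_mem_regions;
};

/*
 * Hypothetical helper: returns true if one more memory region can be
 * registered with every device, given how many are already in use.
 * With no devices there is no limit, so the answer is always true.
 */
bool blkio_has_free_mem_region(const struct blkio_dev_state *devs,
                               size_t ndevs, uint64_t used_regions)
{
    uint64_t limit = UINT64_MAX;

    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].max_mem_regions < limit) {
            limit = devs[i].max_mem_regions;
        }
    }
    return limit > used_regions;
}
```

Memory device plugging would then have to consult this in addition to
the existing vhost and KVM checks before accepting a new device.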


Note that I have some patches pending that extend that handling, by also
providing how many used+free slots there are, such as:

https://lore.kernel.org/all/20211027124531.57561-3-david@redhat.com/

-- 
Thanks,

David / dhildenb


