<div dir="ltr">So like many, I thought I had done my research and understood what would happen when rebooting a brick/node, only to find out I was wrong.<div><br></div><div>In my mind, I had a 1x3 replicate volume, so I could do a rolling reboot and the bricks would heal up. However, the oVirt logs show that shortly after the rebooted brick came back up, all the VMs started pausing or going unresponsive. At the time I was puzzled and freaked out. The next morning on my run, I think I found the error in my logic and in my reading comprehension of my research. Once the 3rd brick came up, it had to heal any changes to all the VMs. Healing is file-based, not block-based, so it saw multi-GB files that it had to recopy. It had to halt all writes to those files while that occurred, or it would be a never-ending cycle of re-copying the large images. So the fact that most VMs went haywire isn&#39;t that odd. Based on the timing of the alerts, it does look like the 2 bricks that were up kept serving images until the 3rd brick came back. It did heal all images just fine.</div><div><br></div><div>So, knowing what I believe I now know, you can&#39;t really do what I had hoped, which is to just reboot one brick and have the VMs stay up the whole time. To achieve something like that, I&#39;d need a 2nd set of bricks that I could live storage migrate to.</div><div><br></div><div>Am I understanding correctly how that works?</div><div><br></div><div>I could also look at minimizing downtime by moving to sharding, so that the heal would only need to copy smaller files. However, I&#39;d still potentially end up with paused VMs unless those heals were pretty quick. It is probably safest to plan downtime for the VMs, or to work out a storage migration plan if I had a real need for a high number of 9&#39;s of uptime.</div></div>