vm: call after-shutdown cleanup also from vm.kill and vm.shutdown
Cleaning up after domain shutdown (the domain-stopped and domain-shutdown events) relies on libvirt events, which may be unreliable in some cases (events may be processed with some delay, or, if libvirt was restarted in the meantime, may not happen at all). So, instead of only ensuring proper ordering between shutdown cleanup and the next startup, also trigger the cleanup whenever we know for sure the domain isn't running:

- at vm.kill() - after libvirt confirms the domain was destroyed
- at vm.shutdown(wait=True) - after successful shutdown
- at vm.remove_from_disk() - after ensuring it isn't running, but just before actually removing it

This fixes various race conditions:

- qvm-kill && qvm-remove: the remove could happen before the shutdown cleanup was done, and the storage driver would be confused by that
- qvm-shutdown --wait && qvm-clone: the clone could happen before the new content was committed to the original volume, making a copy of the previous VM state

(and probably more)

Previously this wasn't such a big issue in the default configuration, because the LVM driver was fully synchronous, effectively blocking the whole qubesd for the time the cleanup took.

To avoid code duplication, factor out an _ensure_shutdown_handled function calling the actual cleanup (and possibly cancelling the one scheduled by a libvirt event). Note that now the "Duplicated stopped event from libvirt received!" warning may happen in normal circumstances, not only because of some bug.

It is very important that the post-shutdown cleanup happens while the domain is not running. To ensure that, take startup_lock and, under it, 1) ensure the domain is halted and only then 2) execute the cleanup. This isn't necessary when removing the domain from disk, because at that point it is already removed from the collection, which also prevents further calls to it (see also the "vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in the remove_from_disk function would cause a deadlock in the DispVM auto-cleanup code:

- vm.kill (or another trigger for the cleanup)
- vm.startup_lock acquire  <====
- vm._ensure_shutdown_handled
- domain-shutdown event
- vm._auto_cleanup (in the DispVM class)
- vm.remove_from_disk
- cannot take vm.startup_lock again
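The deadlock above comes from asyncio.Lock not being reentrant: a coroutine chain that already holds startup_lock will wait on itself forever if it tries to acquire the same lock again. A minimal standalone sketch of that behavior (hypothetical demo code, not Qubes code, written with modern async/await rather than the yield from style used in the diff; the timeout exists only so the demo can detect the hang instead of blocking forever):

```python
import asyncio

async def demo():
    lock = asyncio.Lock()
    async with lock:                     # plays the role of vm.startup_lock held by vm.kill
        try:
            # A nested acquire from the same call chain (the
            # remove_from_disk scenario above) would hang; the timeout
            # here only serves to detect that in this demo.
            await asyncio.wait_for(lock.acquire(), timeout=0.1)
            return "acquired"
        except asyncio.TimeoutError:
            return "deadlock detected"

print(asyncio.run(demo()))  # deadlock detected
```

This is why the commit checks `_domain_stopped_event_handled` in remove_from_disk instead of taking the lock again.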
This commit is contained in:
parent 5be003d539
commit 2c1629da04
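The qvm-shutdown --wait && qvm-clone race described in the message can be simulated in a few lines: if shutdown only schedules the cleanup from a (possibly delayed) event callback, a clone started right after shutdown returns copies the pre-commit state. A standalone sketch under that assumption (hypothetical Volume/shutdown names, not Qubes code):

```python
import asyncio

# Hypothetical toy model (not Qubes code) of the qvm-shutdown --wait &&
# qvm-clone race: volume changes stay uncommitted until post-shutdown cleanup.

class Volume:
    def __init__(self):
        self.committed = "old state"   # what a clone would copy
        self.dirty = "new state"       # changes made while the VM ran

    def commit(self):
        self.committed = self.dirty

async def shutdown_racy(vol):
    # Cleanup runs later, from a delayed libvirt-style event callback.
    asyncio.get_running_loop().call_later(0.05, vol.commit)

async def shutdown_fixed(vol):
    # This commit's approach: run the cleanup before shutdown() returns.
    vol.commit()

async def main():
    racy, fixed = Volume(), Volume()
    await shutdown_racy(racy)
    await shutdown_fixed(fixed)
    # "qvm-clone" copies immediately after "qvm-shutdown --wait" returns
    snapshot = (racy.committed, fixed.committed)
    await asyncio.sleep(0.1)  # the delayed event eventually fires anyway
    return snapshot

print(asyncio.run(main()))  # ('old state', 'new state')
```

The racy variant copies stale data even though the cleanup does run eventually; the fix moves the commit before the point where shutdown reports completion.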
@@ -879,21 +879,10 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
     #
 
     @asyncio.coroutine
-    def start(self, start_guid=True, notify_function=None,
-            mem_required=None):
-        '''Start domain
-
-        :param bool start_guid: FIXME
-        :param collections.Callable notify_function: FIXME
-        :param int mem_required: FIXME
+    def _ensure_shutdown_handled(self):
+        '''Make sure previous shutdown is fully handled.
+        MUST NOT be called when domain is running.
         '''
-
-        with (yield from self.startup_lock):
-            # Intentionally not used is_running(): eliminate also "Paused",
-            # "Crashed", "Halting"
-            if self.get_power_state() != 'Halted':
-                return self
-
         with (yield from self._domain_stopped_lock):
             # Don't accept any new stopped event's till a new VM has been
             # created. If we didn't received any stopped event or it wasn't
@@ -918,6 +907,25 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
             yield from self.fire_event_async('domain-stopped')
             yield from self.fire_event_async('domain-shutdown')
 
+
+    @asyncio.coroutine
+    def start(self, start_guid=True, notify_function=None,
+            mem_required=None):
+        '''Start domain
+
+        :param bool start_guid: FIXME
+        :param collections.Callable notify_function: FIXME
+        :param int mem_required: FIXME
+        '''
+
+        with (yield from self.startup_lock):
+            # Intentionally not used is_running(): eliminate also "Paused",
+            # "Crashed", "Halting"
+            if self.get_power_state() != 'Halted':
+                return self
+
+            yield from self._ensure_shutdown_handled()
+
             self.log.info('Starting {}'.format(self.name))
 
             try:
@@ -1030,8 +1038,8 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
             return
 
         if self._domain_stopped_event_received:
-            self.log.warning('Duplicated stopped event from libvirt received!')
-            # ignore this unexpected event
+            # ignore this event - already triggered by shutdown(), kill(),
+            # or subsequent start()
             return
 
         self._domain_stopped_event_received = True
@@ -1088,7 +1096,11 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
             while timeout > 0 and not self.is_halted():
                 yield from asyncio.sleep(0.25)
                 timeout -= 0.25
-            if not self.is_halted():
-                raise qubes.exc.QubesVMShutdownTimeoutError(self)
+            with (yield from self.startup_lock):
+                if self.is_halted():
+                    # make sure all shutdown tasks are completed
+                    yield from self._ensure_shutdown_handled()
+                else:
+                    raise qubes.exc.QubesVMShutdownTimeoutError(self)
 
         return self
@@ -1104,6 +1116,7 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
         if not self.is_running() and not self.is_paused():
             raise qubes.exc.QubesVMNotStartedError(self)
 
+        with (yield from self.startup_lock):
             try:
                 self.libvirt_domain.destroy()
             except libvirt.libvirtError as e:
@@ -1112,6 +1125,9 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
                 else:
                     raise
 
+        # make sure all shutdown tasks are completed
+        yield from self._ensure_shutdown_handled()
+
         return self
 
     def force_shutdown(self, *args, **kwargs):
@@ -1477,6 +1493,12 @@ class QubesVM(qubes.vm.mix.net.NetVMMixin, qubes.vm.BaseVM):
                 "Can't remove VM {!s}, beacuse it's in state {!r}.".format(
                     self, self.get_power_state()))
 
+        # make sure shutdown is handled before removing anything, but only if
+        # handling is pending; if not, we may be called from within
+        # domain-shutdown event (DispVM._auto_cleanup), which would deadlock
+        if not self._domain_stopped_event_handled:
+            yield from self._ensure_shutdown_handled()
+
         yield from self.fire_event_async('domain-remove-from-disk')
         try:
             # TODO: make it async?
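The duplicate-event guard in the diff above reduces to a small pattern: a "handled" flag checked under a lock makes the post-shutdown cleanup idempotent, so kill(), shutdown(), a libvirt event, or a later start() can all trigger it and it still runs exactly once. A toy model of that pattern (hypothetical Domain class, not the real QubesVM; modern async/await):

```python
import asyncio

# Hypothetical toy model (not the real QubesVM class) of the guard pattern
# in this commit: a "handled" flag under a lock makes the cleanup idempotent
# no matter how many callers trigger it.

class Domain:
    def __init__(self):
        self._stopped_lock = asyncio.Lock()
        self._stopped_handled = False
        self.cleanups = 0

    async def _ensure_shutdown_handled(self):
        async with self._stopped_lock:
            if self._stopped_handled:
                return              # duplicate trigger, e.g. a late event
            self._stopped_handled = True
            self.cleanups += 1      # storage commit, events, etc. go here

    async def kill(self):
        await self._ensure_shutdown_handled()

    async def on_libvirt_stopped_event(self):
        await self._ensure_shutdown_handled()

async def main():
    dom = Domain()
    await dom.kill()                      # explicit trigger
    await dom.on_libvirt_stopped_event()  # duplicate event: ignored
    return dom.cleanups

print(asyncio.run(main()))  # 1
```

The flag must be checked and set under the same lock; otherwise two concurrent triggers could both see it unset and run the cleanup twice.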