core-admin/qubes
Marek Marczykowski-Górecki 2c1629da04
vm: call after-shutdown cleanup also from vm.kill and vm.shutdown
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, or if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
 - at vm.kill() - after libvirt confirms domain was destroyed
 - at vm.shutdown(wait=True) - after successful shutdown
 - at vm.remove_from_disk() - after ensuring it isn't running but just
 before actually removing it

This fixes various race conditions:
 - qvm-kill && qvm-remove: remove could happen before shutdown cleanup
 was done and storage driver would be confused about that
 - qvm-shutdown --wait && qvm-clone: clone could happen before new content was
 committed to the original volume, making the copy reflect the previous VM state
(and probably more)

Previously it wasn't such a big issue in the default configuration, because
the LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.

To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.
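
A minimal sketch of the idea, not the actual qubes code: the cleanup lives in
one helper that runs at most once per shutdown, and both the explicit kill
path and a late libvirt event call it under the same lock. The class
SketchVM, the shutdown_handled flag, the cleanup_runs counter and the
on_libvirt_stopped method are hypothetical names used only for illustration.

```python
import asyncio


class SketchVM:
    """Hypothetical stand-in for a qube; not the real qubes.vm class."""

    def __init__(self):
        self.startup_lock = asyncio.Lock()
        self.running = True
        self.shutdown_handled = False  # hypothetical "cleanup already done" flag
        self.cleanup_runs = 0

    async def _ensure_shutdown_handled(self):
        """Run the cleanup at most once per shutdown; caller holds startup_lock."""
        if self.shutdown_handled:
            return False
        self.shutdown_handled = True
        await asyncio.sleep(0)  # placeholder for asynchronous storage cleanup
        self.cleanup_runs += 1
        return True

    async def kill(self):
        async with self.startup_lock:
            self.running = False  # placeholder for destroying the domain
            await self._ensure_shutdown_handled()

    async def on_libvirt_stopped(self):
        """Late libvirt event; returns True when it turned out to be a duplicate."""
        async with self.startup_lock:
            if self.running:
                return False  # domain was started again; nothing to clean up
            return not await self._ensure_shutdown_handled()


async def demo():
    vm = SketchVM()
    await vm.kill()                             # cleanup runs here
    duplicate = await vm.on_libvirt_stopped()   # event arrives afterwards
    return vm.cleanup_runs, duplicate
```

In this sketch the late event is reported as a duplicate rather than treated
as an error, which mirrors why the warning mentioned above can now appear in
normal operation.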

It is very important that post-shutdown cleanup happens when domain is
not running. To ensure that, take startup_lock and under it 1) ensure
it's halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because it's already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
 - vm.kill (or other trigger for the cleanup)
   - vm.startup_lock acquire   <====
     - vm._ensure_shutdown_handled
       - domain-shutdown event
         - vm._auto_cleanup (in DispVM class)
           - vm.remove_from_disk
             - cannot take vm.startup_lock again
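
The call chain above can be reproduced in a few lines, assuming startup_lock
behaves like a plain asyncio.Lock, which is not reentrant. The functions
below are hypothetical stand-ins, not the real methods, and the timeout
exists only so the sketch terminates instead of hanging.

```python
import asyncio


async def remove_from_disk(lock):
    # Hypothetical variant that takes the lock; the commit avoids doing this.
    async with lock:
        return "removed"


async def kill(lock):
    async with lock:  # outer acquire, as in the vm.kill step of the chain
        try:
            # The inner acquire happens in the same task while the outer one
            # is still held, so it can never succeed.
            return await asyncio.wait_for(remove_from_disk(lock), timeout=0.1)
        except asyncio.TimeoutError:
            return "deadlock"


async def demo():
    lock = asyncio.Lock()  # created inside the running event loop
    return await kill(lock)
```

Running asyncio.run(demo()) hits the timeout branch, because the nested
acquire waits on a lock that its own caller holds.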
2018-10-26 23:54:08 +02:00
..
api storage: allow import_data and import_data_end be coroutines 2018-10-23 16:53:35 +02:00
ext ext/services: mechanism for advertising supported services 2018-10-23 16:47:39 +02:00
qmemman qmemman: fix early crash 2018-01-18 17:36:37 +01:00
storage storage: convert lvm driver to async version 2018-10-23 16:53:35 +02:00
tests vm/dispvm: fix DispVM cleanup 2018-10-26 23:54:08 +02:00
tools app: uncouple pool setup from loading initial configuration 2018-09-11 23:50:25 +00:00
vm vm: call after-shutdown cleanup also from vm.kill and vm.shutdown 2018-10-26 23:54:08 +02:00
__init__.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
app.py vm: add shutdown_timeout property, make vm.shutdown(wait=True) use it 2018-10-26 23:54:04 +02:00
backup.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
config.py app: create /var/lib/qubes as file-reflink if supported 2018-09-11 23:50:26 +00:00
core2migration.py Make pylint happy 2017-12-21 18:19:10 +01:00
devices.py Update documentation for device-attach event 2018-09-19 05:44:02 +02:00
dochelpers.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
events.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
exc.py vm: add shutdown_timeout property, make vm.shutdown(wait=True) use it 2018-10-26 23:54:04 +02:00
firewall.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
log.py Change license to LGPL v2.1+ 2017-10-12 00:11:50 +02:00
rngdoc.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00
tarwriter.py Change license to LGPL v2.1+ 2017-10-12 00:11:50 +02:00
utils.py Fix issues found by pylint 2.0 2018-07-15 23:51:15 +02:00