If domain is set to autostart, qubes-vm@ systemd service is used to
start it at boot. Cleanup the service when domain is removed, and
similarly enable the service when domain is created and already have
autostart=True.
FixesQubesOS/qubes-issues#4014
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, of if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
- at vm.kill() - after libvirt confirms domain was destroyed
- at vm.shutdown(wait=True) - after successful shutdown
- at vm.remove_from_disk() - after ensuring it isn't running but just
before actually removing it
This fixes various race conditions:
- qvm-kill && qvm-remove: remove could happen before shutdown cleanup
was done and storage driver would be confused about that
- qvm-shutdown --wait && qvm-clone: clone could happen before new content was
commited to the original volume, making the copy of previous VM state
(and probably more)
Previously it wasn't such a big issue on default configuration, because
LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.
To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.
It is very important that post-shutdown cleanup happen when domain is
not running. To ensure that, take startup_lock and under it 1) ensure
its halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because its already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
- vm.kill (or other trigger for the cleanup)
- vm.startup_lock acquire <====
- vm._ensure_shutdown_handled
- domain-shutdown event
- vm._auto_cleanup (in DispVM class)
- vm.remove_from_disk
- cannot take vm.startup_lock again
vm.shutdown(wait=True) waited indefinitely for the shutdown, which makes
useless without some boilerplate handling the timeout. Since the timeout
may depend on the operating system inside, add a per-VM property for it,
with value inheritance from template and then from global
default_shutdown_timeout property.
When timeout is reached, the method raises exception - whether to kill
it or not is left to the caller.
FixesQubesOS/qubes-issues#1696
Make it clear that volume creation fails because it needs to be in the
same pool as its parent. This message is shown in context of `qvm-create
-p root=MyPool` for example and the previous message didn't make sense
at all.
FixesQubesOS/qubes-issues#3438
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.
This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
* qubesos/pr/228:
storage/lvm: filter out warning about intended over-provisioning
tests: fix getting kernel package version inside VM
tests/extra: add start_guid option to VMWrapper
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
vm/qubesvm: check if all required devices are available before start
storage/lvm: fix reporting lvm command error
storage/lvm: save pool's revision_to_keep property
Needed for event-driven domains-tray UI updating and anti-GUI-DoS
usability improvements.
Catches errors from event handlers to protect libvirt, and logs to
main qubesd logger singleton (by default meaning systemd journal).
Multiple properties are related to system installed inside the VM, so it
makes sense to have them the same for all the VMs based on the same
template. Modify default value getter to first try get the value from a
template (if any) and only if it fails, fallback to original default
value.
This change is made to those properties:
- default_user (it was already this way)
- kernel
- kernelopts
- maxmem
- memory
- qrexec_timeout
- vcpus
- virt_mode
This is especially useful for manually installed templates (like
Windows).
Related to QubesOS/qubes-issues#3585
Resolve:
- no-else-return
- useless-object-inheritance
- useless-return
- consider-using-set-comprehension
- consider-using-in
- logging-not-lazy
Ignore:
- not-an-iterable - false possitives for asyncio coroutines
Ignore all the above in qubespolicy/__init__.py, as the file will be
moved to separate repository (core-qrexec) - it already has a copy
there, don't desynchronize them.
Fire 'domain-start-failed' even even if failure occurred during
'domain-pre-start' event. This will make sure if _anyone_ have seen
'domain-pre-start' event, will also see 'domain-start-failed'. In some
cases it will look like spurious 'domain-start-failed', but it is safer
option than the alternative.
Fail the VM start early if some persistently-assigned device is missing.
This will both save time and provide clearer error message.
FixesQubesOS/qubes-issues#3810
If domain die when trying to kill it, qubesd may loose a race and try to
kill it anyway. Handle libvirt exception in that case and conver it to
QubesVMNotStartedError - as it would be if qubesd would win the race.
FixesQubesOS/qubes-issues#3755
Some handlers may want to call into other VMs (or even the one asking),
but vm.run() functions are coroutines, so needs to be called from
another coroutine. Allow for that.
Also fix typo in documentation.
"Cannot execute qrexec-daemon!" error is very misleading for a startup
timeout error, make it clearer. This rely on qrexec-daemon using
distinct exit code for timeout error, but even without that, include its
stderr in the error message.
Lots of code expects the VM to be Halted after receiving one of these
events, but it could also be Dying or Crashed. Get rid of the Dying case
at least, by waiting until the VM has transitioned out of it.
Fixes e.g. the following DispVM cleanup bug:
$ qvm-create -C DispVM --prop auto_cleanup=True -l red dispvm
$ qvm-start dispvm
$ qvm-shutdown --wait dispvm # this won't remove dispvm
$ qvm-start dispvm
$ qvm-kill dispvm # but this will
1. Make sure VMs are started after dom0 actual memory usage is reported
to qmemman, otherwise dom0 will hold 4GB, even if just a little over 1GB
is needed at that time.
2. Request only vm.memory MB from qmemman, instead of vm.maxmem. While
HVM with PCI devices indeed do not support populate-on-demand, this is
already handled in libvirt XML.
The later may often cause VM startup fail on systems with 8GB of memory,
because maxmem is 4GB there and with dom0 keeping the other 4GB (see
point 1) there is not enough memory to start any sych VM.
FixesQubesOS/qubes-issues#3462
* qubesos/pr/187:
Don't fail create/clone if /var/lib/qubes/TYPE/NAME/ exists
Make 'qvm-volume revert' really use the latest revision
Fix wrong mocks of Volume.revisions
* qubesos/pr/185:
vm: remove doc for non-existing event `monitor-layout-change`
vm: include tag/feature name in event name
events: add support for wildcard event handlers
- use capital letters in acronyms in documentation to match upstream
documentation.
- refuse to start a PVH with without kernel set - provide meaningful
error message
Human readable format `str(datetime.datetime)` is a nightmare for Admin
API level communication. Especially setting the property in a format
that it was read was not supported, and handling such format in
untrusted input handling code is a bad idea. Revert to a simple intiger
format.
Rename events:
- domain-feature-set -> domain-feature-set:feature
- domain-feature-delete -> domain-feature-delete:feature
- domain-tag-add -> domain-tag-add:tag
- domain-tag-delete -> domain-tag-delete:tag
Make it consistent with property-* events. It makes more sense to
include tag/feature name in event name, so handler can watch a single
tag/feature - which is the most common case. Otherwise, most handlers
would begin with `if feature == '...'` anyway, wasting time on most
events.
In cases where multiple features/tags should be handled by a single
handler, it is now possible to register a handler with wildcard, for
example `domain-feature-set:*`.
Add property for IPv6 address ('ip6'). Build default value similarly to
IPv4 - common prefix + QID or Disp ID (for DispVMs).
This all is disabled unless 'ipv6' feature is enabled. It is inherited
from netvm (not template).
Even when enabled, VM may decide to not use it - or simply not support
it.
QubesOS/qubes-issues#718
If HVM have PCI device, it can't use PoD, so need 'maxmem' memory to be
started. Request that much from qmemman.
Note that is is somehow independent of enabling or not dynamic memory
management for the VM (`service.meminfo-writer` feature). Even if VM
initially had assigned maxmem memory, it can be later ballooned down.
QubesOS/qubes-issues#3207
* 20171107-tests-backup-api-misc:
test: make race condition on xterm close less likely
tests/backupcompatibility: fix handling 'internal' property
backup: fix handling target write error (like no disk space)
tests/backupcompatibility: drop R1 format tests
backup: use offline_mode for backup collection
qubespolicy: fix handling '$adminvm' target with ask action
app: drop reference to libvirt object after undefining it
vm: always log startup fail
api: do not log handled errors sent to a client
tests/backups: convert to new restore handling - using qubesadmin module
app: clarify error message on failed domain remove (used somewhere)
Fix qubes-core.service ordering
The previous version did not ensure that the stopped/shutdown event was
handled before a new VM start. This can easily lead to problems like in
QubesOS/qubes-issues#3164.
This improved version now ensures that the stopped/shutdown events are
handled before a new VM start.
Additionally this version should be more robust against unreliable
events from libvirt. It handles missing, duplicated and delayed stopped
events.
Instead of one 'domain-shutdown' event there are now 'domain-stopped'
and 'domain-shutdown'. The later is generated after the former. This way
it's easy to run code after the VM is shutdown including the stop of
it's storage.
vm.create_qdb_entries can be called multiple times - for example when
changing VM IP. Move starting qdb watcher to start(). And just in case,
cleanup old watcher (if still exists) before starting new one.
This fixes one FD leak.
Catch exception there and log it. Otherwise asyncio complains about not
retrieved exception. There is no one else to handle this exception,
because shutdown event is triggered from libvirt, not any Admin API.
If VM startup failed before starting anything (even in paused state),
there will be no further event, not even domain-shutdown. This makes it
hard for event-listening applications (like domains tray) to account
domain state. Fix this by emiting domain-start-failed event in every
case of failed startup after emiting domain-pre-start.
Related QubesOS/qubes-issues#3100
* services:
tests: check clockvm-related handlers
doc: include list of extensions
qubesvm: fix docstring
ext/services: move exporting 'service.*' features to extensions
app: update handling features/service os ClockVM
* tests-storage:
tests: register libvirt events
tests: even more agressive cleanup in tearDown
app: do not wrap libvirt_conn.close() in auto-reconnect wrapper
api: keep track of established connections
tests: drop VM cleanup from tearDownClass, fix asyncio usage in tearDown
storage: fix Storage.clone and Storage.clone_volume
tests: more tests fixes
firewall: raise ValueError on invalid hostname in dsthost=
qmemman: don't load qubes.xml
tests: fix AdminVM test
tests: create temporary files in /tmp
tests: remove renaming test - it isn't supported anymore
tests: various fixes for storage tests
tests: fix removing LVM volumes
tests: fix asyncio usage in some tests
tests: minor fixes to api/admin tests
storage/file: create -cow.img only when needed
storage: move volume_config['source'] filling to one place
app: do not create 'default' storage pool
app: add missing setters for default_pool* global properties
Don't set 'source' volume in various places (each VM class constructor
etc), do it as part of volume initialization. And when it needs to be
re-calculated, call storage.init_volume again.
This code was duplicated, and as usual in such a case, those copies
were different - one have set 'size', the other one not.
QubesOS/qubes-issues#2256
* tests-fixes-1:
api: extract function to make pylint happy
tests/vm: simplify AppVM storage test
storage: do not use deepcopy on volume configs
api: cleanup already started servers when some later failed
tests: fix block devices tests when running on real system
tests: fix some FD leaks
This code is unused now. Theoretically this is_outdated implementation
should be moved to FileVolume, but since we don't have VM reference
there, it isn't possible to read appropriate xenstore entry. As we're
phasing out file pool, simply ignore it for now.
QubesOS/qubes-issues#2256
Those functions are coroutines anyway, so allow event handlers to be
too.
Some of this (`domain-create-on-disk`, `domain-remove-from-disk`) will
be useful for appmenus handling.
Always define those properties, always include them in volume config.
Also simplify overriding pool based on volume type defined by those:
override pool unless snap_on_start=True.
QubesOS/qubes-issues#2256
We've decided to make VM name immutable. This is especially important
for Admin API, where some parts (especially policy) are sticked to the
VM name.
Now, to rename the VM, one need to clone it under new name (thanks to
LVM, this is very quick action), then remove the old one.
FixesQubesOS/qubes-issues#2868
* qubesos/pr/111:
vm: drop 'internal' property
qmemman: make sure to release lock
qmemman: fix meminfo parsing for python 3
devices: drop 'data' and 'frontend_domain' fields, rename 'devclass' to 'bus'
* qubesos/pr/110:
storage: use direct object references, not only identifiers
vm: fix volume_config
storage/lvm: prefix VM LVM volumes with 'vm-'
storage: fix VM rename
Reference objects, not their IDs - this way when object is modified, it
is visible everywhere where it is used. Main changes:
- volume.pool - Pool object
- volume.source - Volume object
Since volume have Pool object reference now, move volume related
functions into Volume class (from Pool class). This avoids horrible
`storage.get_pool(volume).something(volume)` construct.
One issue here is since volume.source reference a Volume object from a
different VM - VM's template, now VM load order is important. Since we
don't have control over it, initialize vm.storage when needed - possibly
while initializing storage of different VM. Since we don't have cycles
in AppVM-TemplateVM dependencies, it is safe.
Also, since this commit, volume.source (if defined) always points at
volume of the same name from VM's template. Using volumes with something
else as a source is no longer supported.
QubesOS/qubes-issues#2256