Similar to domain-start-failed, add an event fired when
domain-pre-shutdown was fired but the actual operation failed.
Note it might not catch all the cases, as shutdown() may be called with
wait=False, which means it won't wait for the actual shutdown. In that
case, a timeout won't result in a domain-shutdown-failed event.
QubesOS/qubes-issues#5380
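A handler for the new event might look like the sketch below; it uses the
qubes.ext extension API, and the 'reason' keyword argument is an assumption
about the event's payload.

    import qubes.ext

    class ShutdownMonitor(qubes.ext.Extension):
        """Illustrative sketch only: react to shutdowns that did not finish."""

        @qubes.ext.handler('domain-shutdown-failed')
        def on_shutdown_failed(self, vm, event, reason=None, **kwargs):
            # 'reason' is assumed to carry the error text; log it for the admin
            vm.log.warning('shutdown of %s failed: %s', vm.name, reason)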
Check early (but after grabbing the startup_lock) whether the VM hasn't
just been removed. This could happen if something else still holds a
reference to it (as the netvm of another VM, perhaps?) or if the start
races with its removal.
This commit makes the simple removal from the collection (done as the
first step of the admin.vm.Remove implementation) an effective way to
block further VM startups, without introducing extra properties.
For this to work, removal from the collection needs to happen with the
startup_lock held. Modify admin.vm.Remove accordingly.
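A minimal sketch of the guard, assuming vm.startup_lock and vm.app.domains
as found in core-admin (not the actual implementation):

    import qubes.exc

    async def start_checked(vm):
        """Sketch: refuse to start a VM that was already removed from the
        collection; the check runs only after the startup_lock is taken."""
        async with vm.startup_lock:
            if vm.name not in vm.app.domains:
                raise qubes.exc.QubesException(
                    'VM {} has already been removed'.format(vm.name))
            # ... the usual startup steps continue under the same lock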
Calling a qrexec service dom0->dom0 can be useful when handling things
that can run either in dom0 or in another domain. This makes the
interface uniform. Example use cases include the GUI VM and the Audio VM.
The initializer of the class DispVM first calls the initializer of the
QubesVM class, which among other things sets properties as specified in
kwargs, and then copies over the properties of the template. This can
lead to properties passed explicitly by the caller through kwargs being
overwritten.
Hence, clone only those properties of the template that are still set to
their defaults in the DispVM.
Fixes QubesOS/qubes-issues#4556
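A sketch of the resulting behaviour; clone_properties, property_list and
property_is_default are the PropertyHolder helpers from core-admin, but the
exact call site is an assumption:

    def clone_default_properties(dispvm, template):
        """Sketch: copy from the template only those properties that the
        caller did not already set explicitly (i.e. still at their defaults)."""
        dispvm.clone_properties(template, proplist=[
            prop.__name__ for prop in template.property_list()
            if dispvm.property_is_default(prop)])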
Setting template_for_dispvms=False will at least prevent starting
(already existing) DispVMs based on it. Those should be removed first.
Also add tests for this case.
If the kernel package ships a default-kernelopts-common.txt file, use it
instead of the hardcoded Linux-specific options.
For a Linux kernel it may include the xen_scrub_pages=0 option, but only
if the initrd shipped with that kernel re-enables the option later.
QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
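A sketch of the lookup under stated assumptions (the file name comes from
the message; the builtin_opts fallback parameter is purely illustrative):

    import os

    KERNELOPTS_COMMON = 'default-kernelopts-common.txt'

    def common_kernelopts(kernels_dir, builtin_opts):
        """Sketch: prefer options shipped by the kernel package, fall back
        to the built-in Linux-specific ones passed in as builtin_opts."""
        path = os.path.join(kernels_dir, KERNELOPTS_COMMON)
        if os.path.exists(path):
            with open(path) as opts_file:
                return opts_file.read().rstrip('\n')
        return builtin_opts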
First of all, do not try to call those services in VMs that do not have
qrexec installed - for example Windows VMs without Qubes tools.
Then, even if the service call fails for any other reason, only log it
but do not prevent the service from being called in other VMs. A single
uncooperative VM should generally only be able to hurt itself, not break
other VMs during suspend.
Fixes QubesOS/qubes-issues#3489
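The per-VM error handling could look roughly like this sketch; the service
name qubes.SuspendPre and the qrexec feature check mirror existing Qubes
conventions, but the loop itself is an assumption:

    async def notify_suspend(app, service='qubes.SuspendPre'):
        """Sketch: call a suspend-related service in every running VM,
        skipping VMs without qrexec and merely logging per-VM failures."""
        for vm in app.domains:
            if vm.qid == 0 or not vm.is_running():
                continue  # skip dom0 and halted VMs
            if not vm.features.check_with_template('qrexec', False):
                continue  # no qrexec agent (e.g. Windows without Qubes tools)
            try:
                await vm.run_service_for_stdio(service)
            except Exception as err:
                vm.log.warning('%s failed in %s: %s', service, vm.name, err)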
Since we now have more reliable domain-shutdown event delivery (it is
guaranteed to be delivered before a subsequent domain start, even if
libvirt fails to report it), it's better to move the detach_network call
to the domain-shutdown handler. This way, the frontend domain will
immediately see that the backend is gone. Technically it already knows
that, but at least Linux does not propagate it anywhere, keeping the
interface up and seemingly operational, which leads to various timeouts.
Additionally, by avoiding an attach_network call _just_ after a
detach_network call, this avoids various race conditions (like calling
cleanup scripts after the new device has already been connected).
While libvirt itself still doesn't clean up devices when the backend
domain is gone, this will emulate it within qubesd.
Fixes QubesOS/qubes-issues#3642
Fixes QubesOS/qubes-issues#1426
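A sketch of the handler, assuming detach_network is a coroutine on the
frontend VM as the message suggests (exact placement and signatures are
assumptions):

    import qubes.ext

    class NetBackendCleanup(qubes.ext.Extension):
        """Sketch: detach network from frontends when their netvm stops."""

        @qubes.ext.handler('domain-shutdown')
        async def on_domain_shutdown(self, vm, event, **kwargs):
            for frontend in getattr(vm, 'connected_vms', ()):
                if not frontend.is_running():
                    continue
                try:
                    await frontend.detach_network()
                except Exception as err:
                    frontend.log.warning(
                        'cannot detach network after %s shutdown: %s',
                        vm.name, err)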
If default-kernelopts-pci.txt is present, it will override the default
built-in kernelopts for VMs with a PCI device assigned.
Similarly, if default-kernelopts-nopci.txt is present, it will override
the default kernelopts for VMs without PCI devices.
For template-based VMs, the kernelopts of the template take precedence
over default-kernelopts-nopci.txt, but not over default-kernelopts-pci.txt.
Fixes QubesOS/qubes-issues#4839
If a specific DVM template is used for a given DispVM, make new DispVMs
started from it use the same DVM template (unless explicitly overridden).
This prevents various isolation-bypass cases, like using a chain of
DispVMs to access the network.
Instead of checking whether the domain is still running/paused, try to
kill it anyway and ignore the appropriate exception. Otherwise the
domain could die between the check and the kill.
- Two new methods: .features.check_with_adminvm() and
.check_with_template_and_adminvm(). Common code refactored.
- Two new AdminAPI calls to take advantage of the methods:
- admin.vm.feature.CheckWithAdminVM
- admin.vm.feature.CheckWithTemplateAndAdminVM
- Features manager moved to separate module in anticipation of features
on app object in R5.0. The attribute Features.vm renamed to
Features.subject.
- Documentation, tests.
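The new lookup might look like the following sketch (Features is treated as
the dict-like container it is in core-admin; the exact semantics are an
assumption):

    def check_with_adminvm(self, feature, default=None):
        """Sketch: check the feature on this subject first, then fall back
        to the AdminVM (dom0)."""
        if feature in self:
            return self[feature]
        adminvm = self.subject.app.domains['dom0']
        if adminvm is not self.subject and feature in adminvm.features:
            return adminvm.features[feature]
        return default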
* devel-20181205:
vm/dispvm: fix /qubes-vm-persistence qubesdb entry
vm/mix/net: prevent setting provides_network=false if qube is still used
tests: updates-available notification
tests/network: reduce code duplication
tests: listen on 'misc' socket too
The new property is meant for the management stack (Salt) to set which
DVM template should be used to maintain a given VM. Since the DispVM
based on it will be given ultimate control over the target VM
(qubes.VMShell service), it should be trusted. The one pointed to by
default_dispvm is not necessarily trusted.
The property defaults to the value from the template (if any), and then
to a global management_dispvm property, which by default is set to None.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
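A sketch of the property and its default chain, assuming the VMProperty
helper from the qubes package (names follow the message, not the actual
code):

    import qubes

    def _default_management_dispvm(self):
        # template first, then the global property (which defaults to None)
        if getattr(self, 'template', None) is not None:
            return self.template.management_dispvm
        return self.app.management_dispvm

    management_dispvm = qubes.VMProperty('management_dispvm',
        load_stage=4, allow_none=True,
        default=_default_management_dispvm,
        doc='DVM template used by the management stack to maintain this VM')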
Use maxmem=0 for disabling dynamic memory balance, instead of the cryptic
service.meminfo-writer feature. Under the hood, the meminfo-writer
service is also set based on the maxmem property (directly in qubesdb,
not in the vm.features dict).
Having this as a property (not a "feature") allows sensible handling of
the default value. Specifically, disable it automatically if it would
otherwise crash the VM. This is the case for:
- domains with PCI devices (PoD is not supported by Xen then)
- domains without a balloon driver and/or the meminfo-writer service
The check for the latter is a heuristic (assume that the presence of
'qrexec' also indicates balloon driver support), but it holds for
currently supported systems.
This also allows more reliable control of the libvirt config: do not set
memory != maxmem unless qmemman is enabled.
memory != maxmem only makes sense if qmemman is enabled for the given
domain. Besides wasting some domain resources on extra page tables etc.,
for HVM domains this is harmful, because the maxmem-memory difference is
made of a Populate-on-Demand pool, which - when depleted - will kill the
domain. This means a domain without a balloon driver will die as soon as
it tries to use more than its initial memory - but without a balloon
driver it sees maxmem memory and doesn't know about the lower limit.
Fixes QubesOS/qubes-issues#4135
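The default logic might be sketched like this (the helper calls mirror
core-admin APIs, but the exact thresholds and checks are assumptions):

    def _default_maxmem(self):
        """Sketch: return 0 (dynamic memory disabled) whenever ballooning
        would crash the VM, per the heuristics described above."""
        # PCI passthrough rules out Populate-on-Demand on Xen
        if list(self.devices['pci'].persistent()):
            return 0
        # heuristic: no qrexec usually means no balloon driver/meminfo-writer
        if not self.features.check_with_template('qrexec', False):
            return 0
        # otherwise allow ballooning up to an illustrative 4000 MB cap
        return min(4000, self.memory * 10)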
It makes a lot of sense to call long-running operations in that event
handler, including calling back into the VM. Allow that by using
fire_event_async, not just fire_event.
Also, document the event.
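With fire_event_async() the handler itself may be a coroutine, roughly as
in this sketch (the event and service names are hypothetical):

    import qubes.ext

    class ExampleExtension(qubes.ext.Extension):
        """Sketch: a coroutine handler is allowed for async-fired events."""

        @qubes.ext.handler('domain-example-event')  # hypothetical event name
        async def on_example_event(self, vm, event, **kwargs):
            # long-running work is fine here, including calling back into
            # the VM itself
            await vm.run_service_for_stdio('example.Service')  # hypothetical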
vm.kill() will try to get vm.startup_lock, so it can't be called while
holding it already.
Fix this by extracting vm._kill_locked(), which expects the lock to be
already held by the caller.
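The locking structure described above, as a sketch (only the lock handling
is shown; the class is illustrative, not the real QubesVM):

    import asyncio

    class QubesVMSketch:
        def __init__(self):
            self.startup_lock = asyncio.Lock()

        async def kill(self):
            async with self.startup_lock:
                await self._kill_locked()

        async def _kill_locked(self):
            # caller must already hold startup_lock; the actual libvirt
            # destroy and shutdown cleanup would happen here
            pass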
If a domain is set to autostart, the qubes-vm@ systemd service is used
to start it at boot. Clean up the service when the domain is removed,
and similarly enable the service when a domain is created with
autostart=True already set.
Fixes QubesOS/qubes-issues#4014
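Managing the unit could be sketched as below; the qubes-vm@ unit name comes
from the message, the systemctl invocation is an assumption about how it is
wired up:

    import asyncio

    async def set_autostart_unit(vm_name, enable):
        """Sketch: enable or disable the per-VM instance of qubes-vm@."""
        action = 'enable' if enable else 'disable'
        proc = await asyncio.create_subprocess_exec(
            'systemctl', action, 'qubes-vm@{}.service'.format(vm_name))
        await proc.wait()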
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events, which may be unreliable in some cases
(events may be processed with some delay, or if libvirt was restarted in
the meantime, may not happen at all). So, instead of only ensuring
proper ordering between shutdown cleanup and the next startup, also
trigger the cleanup when we know for sure the domain isn't running:
- at vm.kill() - after libvirt confirms the domain was destroyed
- at vm.shutdown(wait=True) - after a successful shutdown
- at vm.remove_from_disk() - after ensuring it isn't running, but just
before actually removing it
This fixes various race conditions:
- qvm-kill && qvm-remove: the removal could happen before the shutdown
cleanup was done, and the storage driver would be confused by that
- qvm-shutdown --wait && qvm-clone: the clone could happen before the new
content was committed to the original volume, making the copy reflect
the previous VM state
(and probably more)
Previously this wasn't such a big issue on the default configuration,
because the LVM driver was fully synchronous, effectively blocking the
whole of qubesd for the time the cleanup took.
To avoid code duplication, factor out an _ensure_shutdown_handled
function calling the actual cleanup (and possibly cancelling the one
scheduled by the libvirt event). Note that now the "Duplicated stopped
event from libvirt received!" warning may happen in normal
circumstances, not only because of some bug.
It is very important that the post-shutdown cleanup happens when the
domain is not running. To ensure that, take the startup_lock and, under
it, 1) ensure the domain is halted and only then 2) execute the cleanup.
This isn't necessary when removing it from disk, because at that point
it is already removed from the collection, which also prevents other
calls to it (see also the "vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in the remove_from_disk function would
cause a deadlock in the DispVM auto-cleanup code:
- vm.kill (or other trigger for the cleanup)
- vm.startup_lock acquire <====
- vm._ensure_shutdown_handled
- domain-shutdown event
- vm._auto_cleanup (in DispVM class)
- vm.remove_from_disk
- cannot take vm.startup_lock again
First unregister the domain from the collection, and only then call
remove_from_disk(). Removing it from the collection prevents further
calls from being made to it. And if anything else still keeps a
reference to it (for example as a netvm), abort the operation.
Additionally, this makes it unnecessary to take the startup lock when
cleaning it up in tests.
vm.shutdown(wait=True) waited indefinitely for the shutdown, which made
it useless without some boilerplate handling the timeout. Since the
timeout may depend on the operating system inside, add a per-VM property
for it, with the value inherited from the template and then from a
global default_shutdown_timeout property.
When the timeout is reached, the method raises an exception - whether to
kill the VM or not is left to the caller.
Fixes QubesOS/qubes-issues#1696
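Caller-side handling then becomes a small pattern like this sketch (the
exception class name is an assumption; killing on timeout is explicitly the
caller's choice):

    import qubes.exc

    async def shutdown_or_kill(vm):
        """Sketch: wait up to vm.shutdown_timeout, then decide to kill."""
        try:
            await vm.shutdown(wait=True)
        except qubes.exc.QubesVMShutdownTimeoutError:
            await vm.kill()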
Make it clear that volume creation fails because the volume needs to be
in the same pool as its parent. This message is shown in the context of
`qvm-create -p root=MyPool`, for example, and the previous message
didn't make sense at all.
Fixes QubesOS/qubes-issues#3438
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.
This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
* qubesos/pr/228:
storage/lvm: filter out warning about intended over-provisioning
tests: fix getting kernel package version inside VM
tests/extra: add start_guid option to VMWrapper
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
vm/qubesvm: check if all required devices are available before start
storage/lvm: fix reporting lvm command error
storage/lvm: save pool's revision_to_keep property
Needed for event-driven domains-tray UI updating and anti-GUI-DoS
usability improvements.
Catches errors from event handlers to protect libvirt, and logs them to
the main qubesd logger singleton (by default, the systemd journal).
Multiple properties are related to the system installed inside the VM,
so it makes sense to have them the same for all VMs based on the same
template. Modify the default value getter to first try to get the value
from the template (if any), and only if that fails, fall back to the
original default value (see the sketch after the list below).
This change is made to those properties:
- default_user (it was already this way)
- kernel
- kernelopts
- maxmem
- memory
- qrexec_timeout
- vcpus
- virt_mode
This is especially useful for manually installed templates (like
Windows).
Related to QubesOS/qubes-issues#3585
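A sketch of such a default getter (the helper name and exact shape are
assumptions):

    def _default_with_template(prop, default):
        """Sketch: build a default getter that prefers the template's value
        and falls back to 'default' (a constant or a callable taking the VM)."""
        def _default_getter(self):
            if getattr(self, 'template', None) is not None:
                return getattr(self.template, prop)
            return default(self) if callable(default) else default
        return _default_getter

    # illustrative use for one of the listed properties:
    # qrexec_timeout = qubes.property('qrexec_timeout', type=int,
    #     default=_default_with_template('qrexec_timeout',
    #         lambda self: self.app.default_qrexec_timeout))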
Resolve:
- no-else-return
- useless-object-inheritance
- useless-return
- consider-using-set-comprehension
- consider-using-in
- logging-not-lazy
Ignore:
- not-an-iterable - false positives for asyncio coroutines
Ignore all the above in qubespolicy/__init__.py, as the file will be
moved to a separate repository (core-qrexec) - it already has a copy
there; don't desynchronize them.
Fire the 'domain-start-failed' event even if the failure occurred during
the 'domain-pre-start' event. This makes sure that _anyone_ who has seen
'domain-pre-start' will also see 'domain-start-failed'. In some cases it
will look like a spurious 'domain-start-failed', but that is a safer
option than the alternative.
Fail the VM start early if some persistently-assigned device is missing.
This will both save time and provide a clearer error message.
Fixes QubesOS/qubes-issues#3810
Use the VM's actual IP address as the gateway for other VMs, instead of
a hardcoded link-local address. This is important for sys-net-generated
ICMP diagnostic packets - those must _not_ have a link-local source
address, otherwise they wouldn't be properly forwarded back to the right
VM.
If the domain dies while qubesd is trying to kill it, qubesd may lose
the race and still try to kill it. Handle the libvirt exception in that
case and convert it to QubesVMNotStartedError - as it would be if qubesd
had won the race.
Fixes QubesOS/qubes-issues#3755
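A sketch of the conversion (the exact libvirt error code to check is an
assumption):

    import libvirt
    import qubes.exc

    def destroy_or_not_started(vm):
        """Sketch: destroy the libvirt domain, treating 'already gone' as if
        the VM simply was not started."""
        try:
            vm.libvirt_domain.destroy()
        except libvirt.libvirtError as err:
            if err.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
                raise qubes.exc.QubesVMNotStartedError(vm)
            raise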
Some handlers may want to call into other VMs (or even the one asking),
but the vm.run() functions are coroutines, so they need to be called
from another coroutine. Allow for that.
Also fix a typo in the documentation.
"Cannot execute qrexec-daemon!" error is very misleading for a startup
timeout error, make it clearer. This rely on qrexec-daemon using
distinct exit code for timeout error, but even without that, include its
stderr in the error message.
* qubesos/pr/198:
backup.py: add vmN/empty file if no other files to backup
Allow include=None to be passed to admin.backup.Info
Add include_in_backups property for AdminVM
Use !auto_cleanup as DispVM include_in_backups default
Lots of code expects the VM to be Halted after receiving one of these
events, but it could also be Dying or Crashed. Get rid of the Dying case
at least, by waiting until the VM has transitioned out of it.
Fixes e.g. the following DispVM cleanup bug:
$ qvm-create -C DispVM --prop auto_cleanup=True -l red dispvm
$ qvm-start dispvm
$ qvm-shutdown --wait dispvm # this won't remove dispvm
$ qvm-start dispvm
$ qvm-kill dispvm # but this will
* qubesos/pr/190:
Missed one test, adding default-user in assert for test test_621_qdb_vm_with_network in TC_90
replaced underscore by dash and update test accordingly
Updated assert content for test_620_qdb_standalone in TC_90_QubesVM
Added the default_user property from the Qube to the qubesdb so it is available when starting X. This is the 1st part of a fix for issue https://github.com/QubesOS/qubes-issues/issues/2372
- catch both QubesException and libvirtError - do not kill the starting
VM just because of an error while connecting _other_ VMs to it
- try to detach the network first (and do not abort on error) - if
libvirt/libxl manages to clean up the stale interface this way, the
attach operation below may succeed.
Fixes QubesOS/qubes-issues#3163
1. Make sure VMs are started after dom0's actual memory usage is
reported to qmemman, otherwise dom0 will hold 4GB even if just a little
over 1GB is needed at that time.
2. Request only vm.memory MB from qmemman, instead of vm.maxmem. While
HVMs with PCI devices indeed do not support populate-on-demand, this is
already handled in the libvirt XML.
The latter could often cause VM startup to fail on systems with 8GB of
memory, because maxmem is 4GB there, and with dom0 keeping the other 4GB
(see point 1) there is not enough memory to start any such VM.
Fixes QubesOS/qubes-issues#3462
* qubesos/pr/187:
Don't fail create/clone if /var/lib/qubes/TYPE/NAME/ exists
Make 'qvm-volume revert' really use the latest revision
Fix wrong mocks of Volume.revisions
* qubesos/pr/185:
vm: remove doc for non-existing event `monitor-layout-change`
vm: include tag/feature name in event name
events: add support for wildcard event handlers
- use capital letters in acronyms in documentation to match upstream
documentation.
- refuse to start a PVH without a kernel set - provide a meaningful
error message
The human-readable format `str(datetime.datetime)` is a nightmare for
Admin API level communication. In particular, setting the property back
in the same format it was read in was not supported, and handling such a
format in untrusted input handling code is a bad idea. Revert to a
simple integer format.
Rename events:
- domain-feature-set -> domain-feature-set:feature
- domain-feature-delete -> domain-feature-delete:feature
- domain-tag-add -> domain-tag-add:tag
- domain-tag-delete -> domain-tag-delete:tag
Make it consistent with the property-* events. It makes more sense to
include the tag/feature name in the event name, so a handler can watch a
single tag/feature - which is the most common case. Otherwise, most
handlers would begin with `if feature == '...'` anyway, wasting time on
most events.
In cases where multiple features/tags should be handled by a single
handler, it is now possible to register a handler with a wildcard, for
example `domain-feature-set:*`.
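Registration then looks roughly like this sketch; the handler signatures
(feature, value, oldvalue) are assumptions about the event payload:

    import qubes.ext

    class FeatureWatcher(qubes.ext.Extension):
        """Sketch: watch one specific feature, or all of them via a wildcard."""

        @qubes.ext.handler('domain-feature-set:gui')
        def on_gui_feature_set(self, vm, event, feature, value, oldvalue=None):
            vm.log.info('gui feature set to %r', value)

        @qubes.ext.handler('domain-feature-set:*')
        def on_any_feature_set(self, vm, event, feature, value, oldvalue=None):
            # fires for every feature; 'feature' still carries the exact name
            vm.log.debug('feature %s set to %r', feature, value)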
There may be cases when the VM providing the network to other VMs is
started later - for example on VM restart. While this is a rare case
(and currently broken because of QubesOS/qubes-issues#1426), do not
assume it never happens.
Add a property for the IPv6 address ('ip6'). Build the default value
similarly to IPv4 - a common prefix + QID, or the DispVM ID for DispVMs.
All of this is disabled unless the 'ipv6' feature is enabled. The
feature is inherited from the netvm (not the template).
Even when enabled, a VM may decide not to use it - or simply not support
it.
QubesOS/qubes-issues#718
Allow using the default feature value from the netvm, instead of the
template. This makes sense for network-related features like using tor,
supporting ipv6, etc.
Similarly to check_with_template, expose it on the Admin API as well.
Having both default_netvm and default_fw_netvm causes a lot of
confusion, because it isn't clear to the user which one is used when.
Additionally, changing the provides_network property may also change the
netvm property, which may be an unintended effect. All of this makes it
hard to:
- cover all netvm-changing actions with policy for the Admin API
- cover all netvm-changing events (for example to apply the change to
the running VM, or to check for netvm loops)
As suggested by @qubesuser, kill the default_fw_netvm property and
simplify the logic around it.
Since we're past rc1, also implement the migration logic, and add tests
for said migration.
Fixes QubesOS/qubes-issues#3247