Handle this case specifically, as way too many users ignore the message
during installation and complain it doesn't work later.
Name the problem explicitly, instead of pointing at the libvirt error log.
Fixes QubesOS/qubes-issues#4689
Similar to domain-start-failed, add an event fired when
domain-pre-shutdown was fired but the actual operation failed.
Note it might not catch all the cases, as shutdown() may be called with
wait=False, which means it won't wait for the actual shutdown. In that
case, a timeout won't result in a domain-shutdown-failed event.
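
A minimal sketch of the behaviour described above, assuming a simplified
stand-in VM class. SketchVM, _guest_shutdown, the timeout value and the
'reason' keyword are hypothetical and not taken from the real code; only
the event names and the wait flag come from this message.

```python
import asyncio


class SketchVM:
    """Hypothetical stand-in for a VM object, not the real qubes.vm class."""

    def __init__(self, shutdown_delay):
        self.shutdown_delay = shutdown_delay
        self.events = []

    async def fire_event_async(self, name, **kwargs):
        # Record the event instead of dispatching it to handlers.
        self.events.append((name, kwargs))

    async def _guest_shutdown(self):
        # Pretend the guest needs shutdown_delay seconds to power off.
        await asyncio.sleep(self.shutdown_delay)

    async def shutdown(self, wait=False, timeout=0.05):
        await self.fire_event_async('domain-pre-shutdown')
        if not wait:
            # Nobody waits for the result, so a later timeout is not seen
            # here and no domain-shutdown-failed event is fired.
            return
        try:
            await asyncio.wait_for(self._guest_shutdown(), timeout)
        except asyncio.TimeoutError:
            await self.fire_event_async(
                'domain-shutdown-failed', reason='timeout')
            raise
```

With wait=True and a guest slower than the timeout, the recorded events
are domain-pre-shutdown followed by domain-shutdown-failed; with
wait=False only the first one is recorded.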
QubesOS/qubes-issues#5380
Check early (but after grabbing the startup_lock) whether the VM has
just been removed. This could happen if someone grabbed a reference to
it from another place (netvm of something else?) or just before it was
removed.
This commit makes the simple removal from the collection (done as the
first step in the admin.vm.Remove implementation) an efficient way to
block further VM startups, without introducing extra properties.
For this to be effective, removing from the collection needs to happen
with the startup_lock held. Modify admin.vm.Remove accordingly.
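
The locking pattern can be sketched as follows. This is an illustration
under stated assumptions, not the actual implementation: the Collection
and VM classes, remove_vm() and the RuntimeError are hypothetical; only
startup_lock and the "check membership under the lock" idea come from
this message.

```python
import asyncio


class Collection:
    """Hypothetical, minimal VM collection keyed by name."""

    def __init__(self):
        self._vms = {}

    def add(self, vm):
        self._vms[vm.name] = vm
        vm.collection = self

    def remove(self, vm):
        del self._vms[vm.name]

    def __contains__(self, vm):
        return self._vms.get(vm.name) is vm


class VM:
    def __init__(self, name):
        self.name = name
        self.collection = None
        self.startup_lock = asyncio.Lock()
        self.running = False

    async def start(self):
        async with self.startup_lock:
            # Checked under the lock: a stale reference to a removed VM
            # must not be able to start it.
            if self not in self.collection:
                raise RuntimeError('{} was removed'.format(self.name))
            self.running = True


async def remove_vm(collection, vm):
    # Removal takes the same lock, so it cannot interleave with start().
    async with vm.startup_lock:
        collection.remove(vm)
```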
Calling qrexec service dom0->dom0 can be useful when handling things
that can run in dom0 or other domain. This makes the interface uniform.
Example use cases include GUI VM and Audio VM.
The initializer of the class DispVM first calls the initializer of the
QubesVM class, which among other things sets properties as specified in
kwargs, and then copies over the properties of the template. This can
lead to properties passed explicitly by the caller through kwargs being
overwritten.
Hence only clone properties of the template that are still set to
default in the DispVM.
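
A small sketch of the "clone only what is still default" rule, assuming
a toy property holder. The Props class, its DEFAULTS table and
clone_from_template() are hypothetical and only illustrate the ordering
problem described above.

```python
class Props:
    """Hypothetical property holder that tracks explicitly set values."""

    DEFAULTS = {'netvm': 'sys-firewall', 'memory': 400, 'vcpus': 2}

    def __init__(self, **kwargs):
        # Values passed by the caller count as explicitly set.
        self._values = dict(kwargs)

    def is_default(self, name):
        return name not in self._values

    def get(self, name):
        return self._values.get(name, self.DEFAULTS[name])

    def set(self, name, value):
        self._values[name] = value


def clone_from_template(dispvm, template):
    for name in Props.DEFAULTS:
        if template.is_default(name):
            continue  # nothing to copy
        if dispvm.is_default(name):
            # Copy only when the caller did not set the property.
            dispvm.set(name, template.get(name))
```

In this sketch a DispVM created with netvm='sys-vpn' keeps that value
even if its template sets netvm to something else, while properties the
caller left alone still follow the template.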
Fixes QubesOS/qubes-issues#4556
If kernel package ships default-kernelopts-common.txt file, use that
instead of hardcoded Linux-specific options.
For a Linux kernel it may include the xen_scrub_pages=0 option, but only
if the initrd shipped with this kernel re-enables this option later.
QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
First of all, do not try to call those services in VMs not having qrexec
installed - for example Windows VMs without qubes tools.
Then, even if service call fails for any other reason, only log it but
do not prevent other services from being called. A single uncooperative
VM should generally be able only to hurt itself, not break other VMs
during suspend.
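
The two rules above (skip VMs without qrexec, log but do not propagate
failures) can be sketched like this. The function name, the dict-based
VM records, the run_service callback and the service name used in the
usage note are all hypothetical.

```python
import logging

log = logging.getLogger('suspend-sketch')


def call_service_in_all(vms, service, run_service):
    """Call `service` in every qrexec-capable VM; return names that failed."""
    failed = []
    for vm in vms:
        if not vm.get('qrexec', False):
            # No qrexec agent (e.g. Windows without tools): skip entirely.
            continue
        try:
            run_service(vm['name'], service)
        except Exception as exc:  # broad on purpose: isolate each VM
            log.warning('%s failed in %s: %s', service, vm['name'], exc)
            failed.append(vm['name'])
    return failed
```

For example, call_service_in_all(vms, 'example.SuspendHook', runner)
keeps going after one VM raises, so the remaining VMs still get the
call.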
Fixes QubesOS/qubes-issues#3489
Since we have more reliable domain-shutdown event delivery (it is
guaranteed to be delivered before subsequent domain start, even if
libvirt fails to report it), it's better to move the detach_network call
to the domain-shutdown handler. This way, the frontend domain will see
immediately that the backend is gone. Technically it already knows that,
but at least Linux does not propagate that anywhere, keeping the
interface up, seemingly operational, leading to various timeouts.
Additionally, by avoiding attach_network call _just_ after
detach_network call, it avoids various race conditions (like calling
cleanup scripts after new device got already connected).
While libvirt itself still doesn't clean up devices when the backend
domain is gone, this will emulate it within qubesd.
Fixes QubesOS/qubes-issues#3642
Fixes QubesOS/qubes-issues#1426
If default-kernelopts-pci.txt is present, it will override default
built-in kernelopts for the VMs with PCI device assigned.
Similarly if default-kernelopts-nopci.txt is present, it will override
default kernelopts for VMs without PCI devices.
For template-based VMs, kernelopts of the template take precedence over
default-kernelopts-nopci.txt but not over default-kernelopts-pci.txt.
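
The precedence rules can be written down as a small lookup. This is a
sketch under assumptions: the function name, the dict standing in for
the kernel directory contents and the BUILTIN_* placeholder strings are
hypothetical; only the two file names and the ordering come from this
message. It also assumes that a VM with PCI devices falls back to the
built-in PCI options when the file is absent.

```python
BUILTIN_PCI = 'builtin-pci-opts'      # placeholder, not the real default
BUILTIN_NOPCI = 'builtin-nopci-opts'  # placeholder, not the real default


def default_kernelopts(kernel_files, has_pci, template_kernelopts=None):
    """Pick default kernelopts.

    kernel_files maps file names found next to the kernel to their
    contents; template_kernelopts is None for VMs without a template.
    """
    if has_pci:
        # The PCI file wins even over the template's kernelopts.
        return kernel_files.get('default-kernelopts-pci.txt', BUILTIN_PCI)
    if template_kernelopts is not None:
        # Without PCI devices the template takes precedence over the file.
        return template_kernelopts
    return kernel_files.get('default-kernelopts-nopci.txt', BUILTIN_NOPCI)
```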
Fixes QubesOS/qubes-issues#4839
If a specific DVM template is used for given DispVM, make new DispVMs
called from it use the same DVM template (unless explicitly overridden).
This prevents various isolation bypass cases, like using a chain of
DispVMs to access the network.
Instead of checking if the domain is still running/paused, try to kill
it anyway and ignore the appropriate exception. Otherwise the domain
could die between the check and the kill.
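
The difference between "check, then kill" and "kill, ignore the error"
can be sketched as below. FakeDomain and NotRunning are hypothetical
stand-ins; the real code deals with libvirt exceptions, which are not
reproduced here.

```python
class NotRunning(Exception):
    """Hypothetical error raised when destroying a stopped domain."""


class FakeDomain:
    def __init__(self, running):
        self.running = running

    def destroy(self):
        if not self.running:
            raise NotRunning()
        self.running = False


def kill(domain):
    # No is_running() check first: the domain could stop between the
    # check and the destroy call. Just try, and treat "already gone"
    # as success.
    try:
        domain.destroy()
    except NotRunning:
        pass
```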
- Two new methods: .features.check_with_adminvm() and
.check_with_template_and_adminvm(). Common code refactored.
- Two new AdminAPI calls to take advantage of the methods:
- admin.vm.feature.CheckWithAdminVM
- admin.vm.feature.CheckWithTemplateAndAdminVM
- Features manager moved to separate module in anticipation of features
on app object in R5.0. The attribute Features.vm renamed to
Features.subject.
- Documentation, tests.
* devel-20181205:
vm/dispvm: fix /qubes-vm-presistence qubesdb entry
vm/mix/net: prevent setting provides_network=false if qube is still used
tests: updates-available notification
tests/network: reduce code duplication
tests: listen on 'misc' socket too
The new property is meant for the management stack (Salt) to set which
DVM template should be used to maintain a given VM. Since the DispVM
based on it will be given ultimate control over the target VM
(qubes.VMShell service), it should be trusted. The one pointed to by
default_dispvm is not necessarily trusted.
The property defaults to the value from the template (if any), and then
to a global management_dispvm property. By default it is set to None.
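
The default chain (own value, then template, then global) can be
sketched as a short resolver. The dict-based VM records, the UNSET
sentinel and the function name are hypothetical; the sentinel is there
because None is itself a valid explicit value.

```python
UNSET = object()  # marks "property not set", distinct from None


def resolve_management_dispvm(vm, global_default=None):
    """Resolve management_dispvm for a VM described as a plain dict."""
    value = vm.get('management_dispvm', UNSET)
    if value is not UNSET:
        return value
    template = vm.get('template')
    if template is not None:
        # Default to whatever the template resolves to.
        return resolve_management_dispvm(template, global_default)
    # No template: use the global property (None unless configured).
    return global_default
```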
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Use maxmem=0 for disabling dynamic memory balance, instead of cryptic
service.meminfo-writer feature. Under the hood, meminfo-writer service
is also set based on maxmem property (directly in qubesdb, not
vm.features dict).
Having this as a property (not a "feature") allows sensible handling of
the default value. Specifically, disable it automatically if it would
otherwise crash a VM. This is the case for:
- domain with PCI devices (PoD is not supported by Xen then)
- domain without balloon driver and/or meminfo-writer service
The check for the latter is heuristic (assume presence of 'qrexec' also
can indicate balloon driver support), but it is true for currently
supported systems.
This also allows more reliable control of libvirt config: do not set
memory != maxmem, unless qmemman is enabled.
memory != maxmem only makes sense if qmemman is enabled for the given
domain. Besides wasting some domain resources for extra page tables
etc, for HVM domains this is harmful, because the maxmem-memory
difference is made of the Populate-on-Demand pool, which - when
depleted - will kill the domain. This means a domain without a balloon
driver will die as soon as it tries to use more than its initial memory
- but without a balloon driver it sees maxmem memory and doesn't know
about the lower limit.
Fixes QubesOS/qubes-issues#4135
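
A rough sketch of the default-value heuristic and of the "memory equals
maxmem unless balancing is on" rule. Function names, parameters and the
returned dict keys are hypothetical and do not mirror the real libvirt
template; sizes are assumed to be in MiB.

```python
def default_maxmem(host_default, has_pci, has_qrexec):
    """Return the default maxmem; 0 means dynamic balancing is disabled."""
    if has_pci:
        return 0  # PoD is not supported with PCI devices
    if not has_qrexec:
        return 0  # heuristic: no qrexec, so assume no balloon driver
    return host_default


def memory_config(memory, maxmem):
    """Return initial and maximum memory to put in the domain config."""
    if maxmem == 0:
        # Balancing disabled: never configure memory != maxmem.
        return {'initial': memory, 'max': memory}
    return {'initial': memory, 'max': maxmem}
```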
It makes a lot of sense to call long-running operations in that event
handler, including calling back into the VM. Allow that by using
fire_event_async, not just fire_event.
Also, document the event.
vm.kill() will try to get vm.startup_lock, so it can't be called while
holding it already.
Fix this by extracting vm._kill_locked(), which expects the lock to be
already taken by the caller.
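
The refactoring follows a common pattern: a public method that takes the
lock and a private helper that assumes it is held. A minimal sketch,
assuming asyncio.Lock for startup_lock; the SketchVM class and
abort_start() are hypothetical.

```python
import asyncio


class SketchVM:
    def __init__(self):
        self.startup_lock = asyncio.Lock()
        self.running = True

    async def _kill_locked(self):
        # Caller must already hold startup_lock.
        assert self.startup_lock.locked()
        self.running = False

    async def kill(self):
        # Public entry point: take the lock, then delegate.
        async with self.startup_lock:
            await self._kill_locked()

    async def abort_start(self):
        # Hypothetical caller that already holds the lock and therefore
        # must not call kill(), which would wait on the lock forever.
        async with self.startup_lock:
            await self._kill_locked()
```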
If domain is set to autostart, qubes-vm@ systemd service is used to
start it at boot. Clean up the service when the domain is removed, and
similarly enable the service when a domain is created and already has
autostart=True.
Fixes QubesOS/qubes-issues#4014
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, or if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
- at vm.kill() - after libvirt confirms domain was destroyed
- at vm.shutdown(wait=True) - after successful shutdown
- at vm.remove_from_disk() - after ensuring it isn't running but just
before actually removing it
This fixes various race conditions:
- qvm-kill && qvm-remove: remove could happen before shutdown cleanup
was done and storage driver would be confused about that
- qvm-shutdown --wait && qvm-clone: clone could happen before new
content was committed to the original volume, making the copy of
previous VM state
(and probably more)
Previously it wasn't such a big issue on default configuration, because
LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.
To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.
It is very important that post-shutdown cleanup happens when the domain
is not running. To ensure that, take startup_lock and under it 1) ensure
it's halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because it's already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
- vm.kill (or other trigger for the cleanup)
- vm.startup_lock acquire <====
- vm._ensure_shutdown_handled
- domain-shutdown event
- vm._auto_cleanup (in DispVM class)
- vm.remove_from_disk
- cannot take vm.startup_lock again
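
The call chain above deadlocks because the lock is not reentrant. A
tiny demonstration, assuming startup_lock behaves like asyncio.Lock;
the function name and the timeout are made up for the example, and the
timeout is only there so the demo returns instead of hanging.

```python
import asyncio


async def reacquire_demo(timeout=0.05):
    lock = asyncio.Lock()
    async with lock:
        # Same task tries to take the lock it already holds, as
        # remove_from_disk would if it acquired startup_lock.
        try:
            await asyncio.wait_for(lock.acquire(), timeout)
        except asyncio.TimeoutError:
            return 'would deadlock'
        lock.release()
        return 'reacquired'
```

Running asyncio.run(reacquire_demo()) hits the timeout branch, which is
the situation the list above describes.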