Commit Graph

345 Commits

Author SHA1 Message Date
Marek Marczykowski-Górecki
361550c621
vm: improve error message about missing IOMMU
Handle this case specifically, as way too many users ignore the message
during installation and complain it doesn't work later.

Name the problem explicitly, instead of pointing at libvirt error log.

Fixes QubesOS/qubes-issues#4689
2019-10-30 15:45:52 +01:00
Marek Marczykowski-Górecki
4dac995089
Merge branch 'start-logging'
* start-logging:
  vm: log startup errors in every case
2019-10-23 13:50:25 +02:00
Marek Marczykowski-Górecki
5cc453f097
vm: remove dead (commented out) code
default_user property was re-implemented, remove dead version.
2019-10-22 16:44:02 +02:00
Frédéric Pierret (fepitre)
d2d1ffb806
Make pylint happier 2019-10-20 16:40:40 +02:00
Frédéric Pierret (fepitre)
27aad9bd38
Handle GuiVM properties 2019-10-20 13:22:31 +02:00
Frédéric Pierret (fepitre)
a52cb6bb91
Make PEP8 happier 2019-10-20 13:22:29 +02:00
Marek Marczykowski-Górecki
d5b0a6e5b6
vm: log startup errors in every case 2019-10-10 19:41:02 +02:00
Marek Marczykowski-Górecki
51af2ed27c
vm: add domain-shutdown-failed event
Similar to domain-start-failed, add an event fired when
domain-pre-shutdown was fired but the actual operation failed.
Note it might not catch all the cases, as shutdown() may be called with
wait=False, which means it won't wait fot the actual shutdown. In that
case, timeout won't result in domain-shutdown-failed event.

QubesOS/qubes-issues#5380
2019-10-08 23:35:24 +02:00
Marek Marczykowski-Górecki
0300e895f2
Fix domain-start-failed event documentation. 2019-10-08 23:35:18 +02:00
Marek Marczykowski-Górecki
732231efb0
vm: refuse to start a VM not present in a collection
Check early (but after grabbing a startup_lock) if VM isn't just
removed. This could happen if someone grabs its reference from other
places (netvm of something else?) or just before removing it.
This commit makes the simple removal from the collection (done as the
first step in admin.vm.Remove implementation) efficient way to block
further VM startups, without introducing extra properties.

For this to be effective, removing from the collection, needs to happen
with the startup_lock held. Modify admin.vm.Remove accordingly.
2019-09-29 06:14:21 +02:00
Marek Marczykowski-Górecki
5fa75d73be
Make pylint happy 2019-09-27 16:29:20 +02:00
Rusty Bird
aa931fdbf7
vm/qubesvm: call storage.remove() *before* removing dir_path
Give the varlibqubes storage pool, which places volume images inside the
VM directory, a chance to remove them (and e.g. to log that).
2019-06-28 10:29:34 +00:00
Rusty Bird
cfa1c7171e
storage: fix typo 2019-06-28 10:29:32 +00:00
Frédéric Pierret (fepitre)
06ec862be4
qubesdb: add qubes-mac path entry 2019-05-23 11:30:08 +02:00
Marek Marczykowski-Górecki
60bbbdd702
Merge branch 'kernelopts-files'
* kernelopts-files:
  vm: allow files in kernels_dir override built-in default kernelopts
2019-03-08 18:08:12 +01:00
Marek Marczykowski-Górecki
092fb9659d
Make pylint happy
Fix multiple instances of 'no-else-raise' warning.
2019-02-27 18:40:18 +01:00
Marek Marczykowski-Górecki
2de5a8e894
vm,templates: allow to obtain common kernelopts from a kernel package
If kernel package ships default-kernelopts-common.txt file, use that
instead of hardcoded Linux-specific options.
For Linux kernel it may include xen_scrub_pages=0 option, but only if
initrd shipped with this kernel re-enable this option later.

QubesOS/qubes-issues#4839
QubesOS/qubes-issues#4736
2019-02-27 06:03:57 +01:00
Marek Marczykowski-Górecki
723b33a2f6
vm: move comment about memory overhead constants near their definition 2019-02-27 06:03:57 +01:00
Marek Marczykowski-Górecki
9257a6d14f
Do not abort suspend hooks if any qubes.Suspend* service fails to run
First of all, do not try to call those services in VMs not having qrexec
installed - for example Windows VMs without qubes tools.
Then, even if service call fails for any other reason, only log it but
do not prevent other services from being called. A single uncooperative
VM should generally be able only to hurt itself, not break other VMs
during suspend.

Fixes QubesOS/qubes-issues#3489
2019-02-27 06:03:57 +01:00
Marek Marczykowski-Górecki
f9593ce3e6
vm: allow files in kernels_dir override built-in default kernelopts
If default-kernelopts-pci.txt is present, it will override default
built-in kernelopts for the VMs with PCI device assigned.
Similarly if default-kernelopts-nopci.txt is present, it will override
default kernelopts for VMs without PCI devices.
For template-based VMs, kernelopts of the template takes precedence over
default-kernelopts-nopci.txt but not default-kernelopts-pci.txt.

Fixes QubesOS/qubes-issues#4839
2019-02-23 12:53:49 +01:00
Marek Marczykowski-Górecki
a9ec2bb2c3
vm/qubesvm: fix race condition in failed startup handling
Instead of checking if domain is still running/paused, try to kill it
anyway and ignore appropriate exception. Otherwise domain could die
before the check and killing.
2019-01-19 03:25:20 +01:00
Marek Marczykowski-Górecki
3728230e3c
Merge branch 'maxmem' 2018-12-09 18:38:21 +01:00
AJ Jordan
d4e567cb10
Fix typo 2018-12-06 20:43:39 -05:00
Marek Marczykowski-Górecki
7d1bcaf64c Introduce management_dispvm property
The new property is meant for management stack (Salt) to set which DVM
template should be used to maintain given VM. Since the DispVM based on
it will be given ultimate control over target VM (qubes.VMShell
service), it should be trusted. The one pointed to by default_dispvm
not necessary is one.

The property defaults to the value from the template (if any), and then
to a global management_dispvm property. By default it is set to None.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
2018-12-03 19:18:26 +01:00
Marek Marczykowski-Górecki
4dc8631010
Use maxmem=0 to disable qmemman, add more automation to it
Use maxmem=0 for disabling dynamic memory balance, instead of cryptic
service.meminfo-writer feature. Under the hood, meminfo-writer service
is also set based on maxmem property (directly in qubesdb, not
vm.features dict).
Having this as a property (not "feature"), allow to have sensible
handling of default value. Specifically, disable it automatically if
otherwise it would crash a VM. This is the case for:
 - domain with PCI devices (PoD is not supported by Xen then)
 - domain without balloon driver and/or meminfo-writer service

The check for the latter is heuristic (assume presence of 'qrexec' also
can indicate balloon driver support), but it is true for currently
supported systems.

This also allows more reliable control of libvirt config: do not set
memory != maxmem, unless qmemman is enabled.

memory != maxmem only makes sense if qmemman for given domain is
enabled.  Besides wasting some domain resources for extra page tables
etc, for HVM domains this is harmful, because maxmem-memory difference
is made of Popupate-on-Demand pool, which - when depleted - will kill
the domain. This means domain without balloon driver will die as soon
as will try to use more than initial memory - but without balloon driver
it sees maxmem memory and doesn't know about the lower limit.

Fixes QubesOS/qubes-issues#4135
2018-11-21 02:13:25 +01:00
Marek Marczykowski-Górecki
35a53840f1
vm: send domain-start-failed event also if some device is missing
Checking device presence wasn't covered with try/except that send the
event.
2018-11-15 18:25:29 +01:00
Marek Marczykowski-Górecki
0eab082d85
ext/core-features: make 'template-postinstall' event async
It makes a lot of sense to call long-running operations in that event
handler, including calling back into the VM. Allow that by using
fire_event_async, not just fire_event.

Also, document the event.
2018-11-15 18:25:29 +01:00
Marek Marczykowski-Górecki
328697730b
vm: fix deadlock on qrexec timeout handling
vm.kill() will try to get vm.startup_lock, so it can't be called while
holding it already.
Fix this by extracting vm._kill_locked(), which expect the lock to be
already taken by the caller.
2018-11-04 17:05:55 +01:00
Marek Marczykowski-Górecki
f13029219b
vm: disable/enable qubes-vm@ service when domain is removed/created
If domain is set to autostart, qubes-vm@ systemd service is used to
start it at boot. Cleanup the service when domain is removed, and
similarly enable the service when domain is created and already have
autostart=True.

Fixes QubesOS/qubes-issues#4014
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
2c1629da04
vm: call after-shutdown cleanup also from vm.kill and vm.shutdown
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, of if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
 - at vm.kill() - after libvirt confirms domain was destroyed
 - at vm.shutdown(wait=True) - after successful shutdown
 - at vm.remove_from_disk() - after ensuring it isn't running but just
 before actually removing it

This fixes various race conditions:
 - qvm-kill && qvm-remove: remove could happen before shutdown cleanup
 was done and storage driver would be confused about that
 - qvm-shutdown --wait && qvm-clone: clone could happen before new content was
 commited to the original volume, making the copy of previous VM state
(and probably more)

Previously it wasn't such a big issue on default configuration, because
LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.

To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.

It is very important that post-shutdown cleanup happen when domain is
not running. To ensure that, take startup_lock and under it 1) ensure
its halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because its already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
 - vm.kill (or other trigger for the cleanup)
   - vm.startup_lock acquire   <====
     - vm._ensure_shutdown_handled
       - domain-shutdown event
         - vm._auto_cleanup (in DispVM class)
           - vm.remove_from_disk
             - cannot take vm.startup_lock again
2018-10-26 23:54:08 +02:00
Marek Marczykowski-Górecki
e1f65bdf7b
vm: add shutdown_timeout property, make vm.shutdown(wait=True) use it
vm.shutdown(wait=True) waited indefinitely for the shutdown, which makes
useless without some boilerplate handling the timeout. Since the timeout
may depend on the operating system inside, add a per-VM property for it,
with value inheritance from template and then from global
default_shutdown_timeout property.

When timeout is reached, the method raises exception - whether to kill
it or not is left to the caller.

Fixes QubesOS/qubes-issues#1696
2018-10-26 23:54:04 +02:00
Marek Marczykowski-Górecki
58bcec2a64
qubesvm: improve error message about same-pool requirement
Make it clear that volume creation fails because it needs to be in the
same pool as its parent. This message is shown in context of `qvm-create
-p root=MyPool` for example and the previous message didn't make sense
at all.

Fixes QubesOS/qubes-issues#3438
2018-10-18 00:03:05 +02:00
Marek Marczykowski-Górecki
ba210c41ee
qubesvm: don't crash VM creation if icon symlink already exists
It can be leftover from previous failed attempt. Don't crash on it, and
replace it instead.

QubesOS/qubes-issues#3438
2018-10-18 00:01:45 +02:00
Rusty Bird
bee69a98b9
Add default_qrexec_timeout to qubes-prefs
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.

This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
2018-09-16 18:42:48 +00:00
Rusty Bird
b3983f5ef8
'except FileNotFoundError' instead of ENOENT check 2018-09-13 19:46:45 +00:00
Marek Marczykowski-Górecki
7f1e2741ec
Merge remote-tracking branch 'qubesos/pr/228'
* qubesos/pr/228:
  storage/lvm: filter out warning about intended over-provisioning
  tests: fix getting kernel package version inside VM
  tests/extra: add start_guid option to VMWrapper
  vm/qubesvm: fire 'domain-start-failed' event even if fail was early
  vm/qubesvm: check if all required devices are available before start
  storage/lvm: fix reporting lvm command error
  storage/lvm: save pool's revision_to_keep property
2018-09-07 01:06:59 +02:00
Marek Marczykowski-Górecki
ee25f7c7bb
vm: fix error reporting on PVH without kernel set
Fixes QubesOS/qubes-issues#4254
2018-09-03 00:23:05 +02:00
Jean-Philippe Ouellet
e95ef5f61d
Add domain-paused/-unpaused events
Needed for event-driven domains-tray UI updating and anti-GUI-DoS
usability improvements.

Catches errors from event handlers to protect libvirt, and logs to
main qubesd logger singleton (by default meaning systemd journal).
2018-08-01 05:41:50 -04:00
Marek Marczykowski-Górecki
e51efcf980
vm: document domain-start-failed event 2018-07-16 22:02:59 +02:00
Marek Marczykowski-Górecki
af2435c0d4
Make some properties default to template's value (if any)
Multiple properties are related to system installed inside the VM, so it
makes sense to have them the same for all the VMs based on the same
template. Modify default value getter to first try get the value from a
template (if any) and only if it fails, fallback to original default
value.
This change is made to those properties:
 - default_user (it was already this way)
 - kernel
 - kernelopts
 - maxmem
 - memory
 - qrexec_timeout
 - vcpus
 - virt_mode

This is especially useful for manually installed templates (like
Windows).

Related to QubesOS/qubes-issues#3585
2018-07-16 22:02:58 +02:00
Marek Marczykowski-Górecki
be2465c1f9
Fix issues found by pylint 2.0
Resolve:
 - no-else-return
 - useless-object-inheritance
 - useless-return
 - consider-using-set-comprehension
 - consider-using-in
 - logging-not-lazy

Ignore:
 - not-an-iterable - false possitives for asyncio coroutines

Ignore all the above in qubespolicy/__init__.py, as the file will be
moved to separate repository (core-qrexec) - it already has a copy
there, don't desynchronize them.
2018-07-15 23:51:15 +02:00
Marek Marczykowski-Górecki
6a191febc3
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
Fire 'domain-start-failed' even even if failure occurred during
'domain-pre-start' event. This will make sure if _anyone_ have seen
'domain-pre-start' event, will also see 'domain-start-failed'. In some
cases it will look like spurious 'domain-start-failed', but it is safer
option than the alternative.
2018-04-13 16:07:32 +02:00
Marek Marczykowski-Górecki
ba82d9dc21
vm/qubesvm: check if all required devices are available before start
Fail the VM start early if some persistently-assigned device is missing.
This will both save time and provide clearer error message.

Fixes QubesOS/qubes-issues#3810
2018-04-13 16:03:42 +02:00
Marek Marczykowski-Górecki
93b2424867
vm/qubesvm: fix missing icon handling in clone_disk_files()
Check for icon existence, not a directory for it.
2018-04-06 12:10:50 +02:00
Marek Marczykowski-Górecki
f4be284331
vm/qubesvm: handle libvirt reporting domain already dead when killing
If domain die when trying to kill it, qubesd may loose a race and try to
kill it anyway. Handle libvirt exception in that case and conver it to
QubesVMNotStartedError - as it would be if qubesd would win the race.

Fixes QubesOS/qubes-issues#3755
2018-04-02 23:56:03 +02:00
Marek Marczykowski-Górecki
1e9bf18bcf
Typo fix 2018-04-02 23:24:30 +02:00
Marek Marczykowski-Górecki
7c4566ec14
vm/qubesvm: allow 'features-request' to have async handlers
Some handlers may want to call into other VMs (or even the one asking),
but vm.run() functions are coroutines, so needs to be called from
another coroutine. Allow for that.
Also fix typo in documentation.
2018-03-02 01:16:38 +01:00
Marek Marczykowski-Górecki
ba5d19e1b4
vm: provide better error message for VM startup timeout
"Cannot execute qrexec-daemon!" error is very misleading for a startup
timeout error, make it clearer. This rely on qrexec-daemon using
distinct exit code for timeout error, but even without that, include its
stderr in the error message.
2018-02-27 04:35:05 +01:00
Marek Marczykowski-Górecki
716114f676
Merge remote-tracking branch 'qubesos/pr/197'
* qubesos/pr/197:
  Don't fire domain-stopped/-shutdown while VM is still Dying
2018-02-22 21:14:55 +01:00
Rusty Bird
f96fd70f76
Don't fire domain-stopped/-shutdown while VM is still Dying
Lots of code expects the VM to be Halted after receiving one of these
events, but it could also be Dying or Crashed. Get rid of the Dying case
at least, by waiting until the VM has transitioned out of it.

Fixes e.g. the following DispVM cleanup bug:

    $ qvm-create -C DispVM --prop auto_cleanup=True -l red dispvm
    $ qvm-start dispvm
    $ qvm-shutdown --wait dispvm  # this won't remove dispvm
    $ qvm-start dispvm
    $ qvm-kill dispvm  # but this will
2018-02-22 19:53:29 +00:00