Commit Graph

6005 Commits

Author SHA1 Message Date
Marek Marczykowski-Górecki
087a02c7f4
ext/services: add automatic migration meminfo-writer=False -> maxmem=0
Migrate meminfo-writer=False service setting to maxmem=0 as a method to
disable dynamic memory management. Remove the service from vm.features
dict in the process.

Additionally, translate any attempt to set the service.meminfo-writer
feature to either setting maxmem=0 or resetting it to the default (memory
balancing enabled, if supported by the given domain). This at least
partially avoids breaking existing tools that use service.meminfo-writer as
a way to control dynamic memory management. This code does _not_ support
reading service.meminfo-writer feature state to get the current state of
dynamic memory management, as it would require synchronizing with all
the factors affecting its value. One of the main reasons for migrating to
the maxmem=0 approach is to avoid the need for such synchronization.

QubesOS/qubes-issues#4480
2018-11-21 02:13:25 +01:00
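The migration described in commit 087a02c7f4 can be pictured with a minimal sketch. It works on plain dicts rather than real VM objects; the helper name `migrate_meminfo_writer` and the `properties` dict are hypothetical, and dropping the key even when the service was enabled is an assumption of this sketch.

```python
def migrate_meminfo_writer(features, properties):
    """Hypothetical helper: turn service.meminfo-writer=False into maxmem=0."""
    key = "service.meminfo-writer"
    if key not in features:
        return
    # Remove the service from the features dict in the process.
    # (Assumption: the key is dropped whatever its value was.)
    value = features.pop(key)
    if not value:
        # A falsy value meant "dynamic memory management disabled".
        properties["maxmem"] = 0


features = {"service.meminfo-writer": "", "gui": "1"}
properties = {"maxmem": 4000}
migrate_meminfo_writer(features, properties)
print(features, properties)
```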
Marek Marczykowski-Górecki
62bc462a23
tests: default maxmem 2018-11-21 02:13:25 +01:00
Marek Marczykowski-Górecki
b8052f864a
tests: more cases for libvirt xml generation
Related to automatic mem balance enabling/disabling. Check how it behaves
in the presence of PCI devices, or when it is explicitly disabled.
2018-11-21 02:13:25 +01:00
Marek Marczykowski-Górecki
4dc8631010
Use maxmem=0 to disable qmemman, add more automation to it
Use maxmem=0 for disabling dynamic memory balance, instead of cryptic
service.meminfo-writer feature. Under the hood, meminfo-writer service
is also set based on maxmem property (directly in qubesdb, not
vm.features dict).
Having this as a property (not a "feature") allows sensible handling of
the default value. Specifically, it is disabled automatically if it would
otherwise crash a VM. This is the case for:
 - domain with PCI devices (PoD is not supported by Xen then)
 - domain without balloon driver and/or meminfo-writer service

The check for the latter is heuristic (it assumes that the presence of
'qrexec' also indicates balloon driver support), but it holds for currently
supported systems.

This also allows more reliable control of libvirt config: do not set
memory != maxmem, unless qmemman is enabled.

memory != maxmem only makes sense if qmemman is enabled for the given
domain. Besides wasting some domain resources on extra page tables
etc., for HVM domains this is harmful, because the maxmem-memory difference
is made up of the Populate-on-Demand pool, which - when depleted - will kill
the domain. This means a domain without a balloon driver will die as soon
as it tries to use more than its initial memory - but without a balloon
driver it sees maxmem memory and doesn't know about the lower limit.

Fixes QubesOS/qubes-issues#4135
2018-11-21 02:13:25 +01:00
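The default handling described in commit 4dc8631010 can be sketched roughly as follows. This is not the project's code: the function names, their parameters and the plain `features` dict are hypothetical, and only the decision rules are taken from the commit message.

```python
def effective_maxmem(explicit_maxmem, has_pci_devices, features, default_maxmem):
    """Return the maxmem to use; 0 means memory balancing is disabled."""
    if explicit_maxmem is not None:
        return explicit_maxmem  # an explicit setting wins, including 0
    if has_pci_devices:
        return 0  # PoD is not supported by Xen for domains with PCI devices
    if not features.get("qrexec"):
        return 0  # heuristic: no qrexec, so assume no balloon driver either
    return default_maxmem


def libvirt_memory(memory, maxmem):
    """Return (memory, maxmem) for the domain config.

    The two values differ only when balancing is enabled (maxmem != 0).
    """
    if maxmem == 0:
        return memory, memory
    return memory, maxmem


print(effective_maxmem(None, True, {"qrexec": "1"}, 4000))
print(libvirt_memory(400, 0))
```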
Marek Marczykowski-Górecki
35a53840f1
vm: send domain-start-failed event also if some device is missing
Checking device presence wasn't covered by the try/except that sends the
event.
2018-11-15 18:25:29 +01:00
Marek Marczykowski-Górecki
0eab082d85
ext/core-features: make 'template-postinstall' event async
It makes a lot of sense to call long-running operations in that event
handler, including calling back into the VM. Allow that by using
fire_event_async, not just fire_event.

Also, document the event.
2018-11-15 18:25:29 +01:00
Marek Marczykowski-Górecki
d2585aa871
tests/lvm: fix checking lvm pool existence cont.
Commit 15cf593bc5 "tests/lvm: fix checking
lvm pool existence" attempted to fix handling '-' in pool name by using
/dev/VG/LV symlink. But those are not created for thin pools. Change
back to /dev/mapper, but include '-' mangling.

Related QubesOS/qubes-issues#4332
2018-11-15 18:25:29 +01:00
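The '-' mangling mentioned in commit d2585aa871 follows the device-mapper naming convention, where a hyphen inside a volume group or logical volume name is doubled and a single hyphen separates the two. A minimal sketch, with a hypothetical helper name and example names:

```python
def dm_mapper_path(vg_name, lv_name):
    """Build the /dev/mapper path for a VG/LV pair, doubling any '-'."""
    def mangle(name):
        return name.replace("-", "--")

    return "/dev/mapper/{}-{}".format(mangle(vg_name), mangle(lv_name))


print(dm_mapper_path("qubes-dom0", "test-pool"))
```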
Marek Marczykowski-Górecki
f023b3dd6e
backup: fix naming qubes.xml.000 in the archive
Restore old code for calculating subdir within the archive. The new one
had two problems:
 - set '/' for empty input subdir - which caused qubes.xml.000 to be
 named '/qubes.xml.000' (and then converted to '../../qubes.xml.000');
 among other things, this results in the wrong path used for encryption
 passphrase
 - resolved symlinks, which breaks calculating path for any symlinks
 within VM's directory (symlinks there should be treated as normal files
 to be sure that actual content is included in the backup)

This partially reverts 4e49b951ce.

Fixes QubesOS/qubes-issues#4493
2018-11-15 18:25:29 +01:00
Marek Marczykowski-Górecki
85a20428a6
libvirt: allow skipping hardcoded kernelopts
Add 'no-default-kernelopts' feature to skip default hardcoded
Linux-specific kernelopts.
This is especially useful for non-Linux VMs (including Mirage OS).

Fixes QubesOS/qubes-issues#4468
2018-11-15 17:54:26 +01:00
Marek Marczykowski-Górecki
328697730b
vm: fix deadlock on qrexec timeout handling
vm.kill() will try to get vm.startup_lock, so it can't be called while
holding it already.
Fix this by extracting vm._kill_locked(), which expects the lock to be
already taken by the caller.
2018-11-04 17:05:55 +01:00
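The locking pattern from commit 328697730b can be illustrated with a small asyncio sketch. The `VM` class below is a stand-in, not the real one; only the names `startup_lock`, `kill()` and `_kill_locked()` come from the commit message, and `handle_qrexec_timeout` is hypothetical.

```python
import asyncio


class VM:
    def __init__(self):
        self.startup_lock = asyncio.Lock()
        self.running = True

    async def _kill_locked(self):
        # The caller must already hold startup_lock.
        self.running = False

    async def kill(self):
        async with self.startup_lock:
            await self._kill_locked()

    async def handle_qrexec_timeout(self):
        async with self.startup_lock:
            # Calling self.kill() here would wait forever for the lock
            # this coroutine already holds (asyncio.Lock is not reentrant).
            await self._kill_locked()


async def main():
    vm = VM()
    await vm.handle_qrexec_timeout()
    return vm.running


print(asyncio.run(main()))
```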
Marek Marczykowski-Górecki
68dffb6895
api/admin: fix error message when refusing to create template on template
Fixes QubesOS/qubes-issues#4463
2018-11-04 17:05:55 +01:00
Marek Marczykowski-Górecki
e8201eebba
version 4.0.34 2018-11-01 22:31:18 +01:00
Marek Marczykowski-Górecki
64f290c9ba
ext/pci: fix error message about missing device
Print human readable device name, instead of "<PCIDevice at ...".

QubesOS/qubes-issues#4461
2018-11-01 22:28:50 +01:00
Marek Marczykowski-Górecki
00ca0459d9
ext/pci: use correct backend domain for getting PCIDevice instance
In practice backend_domain is ignored (all PCI devices belong to dom0),
but let's fix this anyway.
2018-11-01 22:21:50 +01:00
Marek Marczykowski-Górecki
15cf593bc5
tests/lvm: fix checking lvm pool existence
If the pool or group name contains '-', it will be mangled as '--' in
/dev/mapper. Use the /dev/VG_NAME/LV_NAME symlink instead.

Related QubesOS/qubes-issues#4332
2018-10-30 01:17:00 +01:00
Marek Marczykowski-Górecki
1ae6abdff5
exc: fix QubesMemoryError constructor
QubesVMError requires a 'vm' argument.
Fixes 2f3a9847 "exc: Make QubesMemoryError inherit from QubesVMError"
2018-10-30 01:14:58 +01:00
Marek Marczykowski-Górecki
b9a18a819c
Merge remote-tracking branch 'origin/pr/239'
* origin/pr/239:
  storage: fix NotImplementedError message for import_data()
  storage/reflink: make resize()/import_volume() more readable
  storage/reflink: unblock import_data() and import_data_end()
2018-10-29 23:00:04 +01:00
Marek Marczykowski-Górecki
fa2429aae4
Merge remote-tracking branch 'origin/pr/237'
* origin/pr/237:
  progress threshold removed as Marek suggested
  Avoid progress events flooding

Fixes QubesOS/qubes-issues#4406
Fixes QubesOS/qubes-issues#3035
2018-10-29 22:54:05 +01:00
Marek Marczykowski-Górecki
f30963fde1
tests/qubespolicy: adjust for removing 'assert' usage 2018-10-29 22:37:15 +01:00
Rusty Bird
7e4812a525
storage: fix NotImplementedError message for import_data() 2018-10-29 20:21:42 +00:00
Rusty Bird
73db2751b8
storage/reflink: make resize()/import_volume() more readable 2018-10-29 20:21:41 +00:00
Rusty Bird
425d993769
storage/reflink: unblock import_data() and import_data_end() 2018-10-29 20:21:39 +00:00
Marek Marczykowski-Górecki
f621e8792c
Merge branch 'master' into devel-no-assert 2018-10-29 20:29:53 +01:00
Marek Marczykowski-Górecki
3740e2d48b
api: make enforce() static
It doesn't use 'self'. And pylint complains.
2018-10-29 20:22:35 +01:00
Marek Marczykowski-Górecki
db6094f397
tests/api: adjust for proper exceptions instead of AssertionError 2018-10-29 20:22:10 +01:00
Marek Marczykowski-Górecki
2b5fc6299e
tests/api: do not test non-existing methods
Remove methods not included in specification (or with different
constraints). Keep commented out methods included in spec but not
implemented.
2018-10-29 20:21:36 +01:00
Marek Marczykowski-Górecki
c1c6dc2acd
Use ValueError in PropertyHolder.property_require()
Specifically do not use AssertionError, but also be consistent with
other value verification.
2018-10-29 20:16:41 +01:00
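One reason to prefer ValueError over assert, as in commit c1c6dc2acd, is that assert statements are removed when Python runs with -O, so the check would silently disappear. A minimal sketch of such a check; the signature and error messages below are hypothetical and not taken from the project:

```python
from types import SimpleNamespace


def property_require(holder, name, allow_none=False):
    """Raise ValueError unless `holder` has attribute `name` set."""
    if not hasattr(holder, name):
        raise ValueError("Property {!r} is not set".format(name))
    if getattr(holder, name) is None and not allow_none:
        raise ValueError("Property {!r} cannot be None".format(name))


vm = SimpleNamespace(kernel=None)
try:
    property_require(vm, "kernel")
except ValueError as err:
    print("rejected:", err)
```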
Marek Marczykowski-Górecki
114a9db09a
version 4.0.33 2018-10-29 05:45:56 +01:00
Marek Marczykowski-Górecki
26a553737f
storage/lvm: minor fix for lvs command building
Do not prepend 'sudo' each time - make a copy of the array if that's
necessary.
2018-10-29 05:16:23 +01:00
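The kind of bug fixed in commit 26a553737f can be shown with a short sketch: prepending to a shared list in place adds another 'sudo' on every call, while building a new list leaves the shared one untouched. The names `LVS_CMD` and `build_lvs_command` are hypothetical.

```python
LVS_CMD = ["lvs", "--noheadings"]  # shared base command (hypothetical)


def build_lvs_command(use_sudo):
    """Return the command to run without mutating LVS_CMD."""
    if use_sudo:
        # List concatenation creates a copy; LVS_CMD.insert(0, "sudo")
        # would instead grow the shared list on each call.
        return ["sudo"] + LVS_CMD
    return LVS_CMD


build_lvs_command(True)
print(build_lvs_command(True))
```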
Marek Marczykowski-Górecki
42061cb194
tests: try to collect qvm-open-in-dvm output if no editor window is shown
Try to collect more details about why the test failed. This will help
only if qvm-open-in-dvm exits early. On the other hand, if it hangs, or
the remote side fails to find the right editor (which results in a GUI
error message), this change will not provide any more details.
2018-10-29 01:20:57 +01:00
Marek Marczykowski-Górecki
84c321b923
tests: increase session startup timeout for whonix-ws based VMs
The first boot of a whonix-ws based VM takes an extended period of time,
because a lot of files need to be copied to the private volume. This takes
even more time when verbose logging through the console is enabled. Extend
the timeout for that.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
84d3547f09
tests: adjust extra tests loader to work with nose2
The nose loader does not provide loader.loadTestsFromTestCase(); use
loader.loadTestsFromNames() instead.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
fb14f589cb
tests: wait for full user session before doing rest of the test
A clean VM shutdown may time out if it's initiated before full startup, so
make sure the full startup is completed first.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
4742a630f2
tests: use iptables --wait
QubesOS/qubes-issues#3665 also affects tests...
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
2f3a984742
exc: Make QubesMemoryError inherit from QubesVMError
Same as other vm-related errors.
This helps QubesTestCase.cleanup_traceback() cleanup VM reference.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
8be70c9e4d
ext/services: allow for os=Linux feature request from VM
It's weird to set it for Windows, but not Linux.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
e244c192ae
tests: use /bin/uname instead of /bin/hostname as dummy output generator
Use something included in coreutils installed everywhere.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
f13029219b
vm: disable/enable qubes-vm@ service when domain is removed/created
If a domain is set to autostart, the qubes-vm@ systemd service is used to
start it at boot. Clean up the service when the domain is removed, and
similarly enable the service when a domain is created and already has
autostart=True.

Fixes QubesOS/qubes-issues#4014
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
0b7aa546c6
tests: remove VM reference from QubesVMError
Yet another place where object references are leaked.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
a972c61914
tests: use socat instead of nc
socat has only one variant, so there is one command line syntax to handle.
It's also installed by default in Qubes VMs.
2018-10-27 16:44:53 +02:00
Marek Marczykowski-Górecki
08ddeee9fb
tests: improve VMs cleanup wrt custom templates
Cleanup VMs in template reverse topological order, not network one.
Network can be set to None to break dependency, but template can't. For
netvm to be changed, kill VMs first (kill doesn't check network
dependency), so netvm change will not trigger side effects (runtime
change, which could fail).

This fixes cleanup for tests creating custom templates - previously the
order was undefined, and if removal of a template was attempted before its
child VMs, it failed. All the relevant files were removed later anyway, but
it led to Python object leaks.
2018-10-27 16:44:53 +02:00
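The ordering used in commit 08ddeee9fb can be approximated by sorting VMs by their depth in the template chain, deepest first, so that every child is removed before its template. In this sketch the `templates` mapping (VM name to template name, or None) and the function name are hypothetical.

```python
def removal_order(templates):
    """Return VM names ordered so that children come before their templates."""
    def depth(name):
        # Number of template links between this VM and a root template.
        steps = 0
        while templates.get(name) is not None:
            name = templates[name]
            steps += 1
        return steps

    return sorted(templates, key=depth, reverse=True)


templates = {
    "custom-template": None,
    "dvm-template": "custom-template",
    "disp1": "dvm-template",
    "work": "custom-template",
}
print(removal_order(templates))
```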
Marek Marczykowski-Górecki
4e762788a9
tests: check if qubes-vm@ service is disabled on domain removal
Test for QubesOS/qubes-issues#4014
2018-10-27 01:43:00 +02:00
Marek Marczykowski-Górecki
cf8b6219a9
tests: make use of vm.shutdown(wait=True) 2018-10-27 01:43:00 +02:00
Marek Marczykowski-Górecki
2c1629da04
vm: call after-shutdown cleanup also from vm.kill and vm.shutdown
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, or if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
 - at vm.kill() - after libvirt confirms domain was destroyed
 - at vm.shutdown(wait=True) - after successful shutdown
 - at vm.remove_from_disk() - after ensuring it isn't running but just
 before actually removing it

This fixes various race conditions:
 - qvm-kill && qvm-remove: remove could happen before shutdown cleanup
 was done and storage driver would be confused about that
 - qvm-shutdown --wait && qvm-clone: clone could happen before new content was
 committed to the original volume, making a copy of the previous VM state
(and probably more)

Previously it wasn't such a big issue on default configuration, because
LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.

To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.

It is very important that post-shutdown cleanup happens when the domain is
not running. To ensure that, take startup_lock and under it 1) ensure
it's halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because it's already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
 - vm.kill (or other trigger for the cleanup)
   - vm.startup_lock acquire   <====
     - vm._ensure_shutdown_handled
       - domain-shutdown event
         - vm._auto_cleanup (in DispVM class)
           - vm.remove_from_disk
             - cannot take vm.startup_lock again
2018-10-26 23:54:08 +02:00
Marek Marczykowski-Górecki
5be003d539
vm/dispvm: fix DispVM cleanup
First unregister the domain from the collection, and only then call
remove_from_disk(). Removing it from the collection prevents further calls
being made to it. Or, if anything else keeps a reference to it (for
example as a netvm), then abort the operation.

Additionally this makes it unnecessary to take startup lock when
cleaning it up in tests.
2018-10-26 23:54:08 +02:00
Marek Marczykowski-Górecki
e1f65bdf7b
vm: add shutdown_timeout property, make vm.shutdown(wait=True) use it
vm.shutdown(wait=True) waited indefinitely for the shutdown, which makes it
useless without some boilerplate handling the timeout. Since the timeout
may depend on the operating system inside, add a per-VM property for it,
with value inheritance from the template and then from the global
default_shutdown_timeout property.

When the timeout is reached, the method raises an exception - whether to
kill the VM or not is left to the caller.

Fixes QubesOS/qubes-issues#1696
2018-10-26 23:54:04 +02:00
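The inheritance chain for shutdown_timeout in commit e1f65bdf7b (VM, then template, then global default) can be sketched as below. The `VM` class, the `wait_for_shutdown` helper and the value of `DEFAULT_SHUTDOWN_TIMEOUT` are assumptions made for illustration; the timeout itself is enforced here with the standard `asyncio.wait_for`.

```python
import asyncio

DEFAULT_SHUTDOWN_TIMEOUT = 60  # hypothetical global default, in seconds


class VM:
    def __init__(self, template=None, shutdown_timeout=None):
        self.template = template
        self._shutdown_timeout = shutdown_timeout

    @property
    def shutdown_timeout(self):
        if self._shutdown_timeout is not None:
            return self._shutdown_timeout  # set on the VM itself
        if self.template is not None:
            return self.template.shutdown_timeout  # inherited from template
        return DEFAULT_SHUTDOWN_TIMEOUT  # global fallback


async def wait_for_shutdown(vm, halted):
    """Wait for `halted` (an awaitable) for at most vm.shutdown_timeout.

    On timeout, asyncio.TimeoutError propagates; killing the VM or not
    is left to the caller.
    """
    await asyncio.wait_for(halted, timeout=vm.shutdown_timeout)


template = VM(shutdown_timeout=120)
print(VM(template=template).shutdown_timeout)
```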
Marek Marczykowski-Górecki
b65fdf9700
storage: convert lvm driver to async version
LVM operations can take a significant amount of time. This is especially
visible when stopping a VM (`vm.storage.stop()`) - during that time the
whole qubesd freezes for about 2 seconds.

Fix this by making all the ThinVolume methods coroutines (where
supported). Each public coroutine is also wrapped with locking on
volume._lock to avoid concurrency-related problems.
All this also requires changing internal helper functions to
coroutines. There are two functions that still need to be called from
non-coroutine call sites:
 - init_cache/reset_cache (initial cache fill, ThinPool.setup())
 - qubes_lvm (ThinVolume.export())

So, those two functions need to live in two variants. Extract their common
code into separate functions to reduce code duplication.

Fixes QubesOS/qubes-issues#4283
2018-10-23 16:53:35 +02:00
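Wrapping each public coroutine with a lock on volume._lock, as commit b65fdf9700 describes, could look roughly like the decorator below. The decorator name `locked` and the toy `Volume` class are hypothetical; `asyncio.sleep(0)` stands in for a slow LVM call.

```python
import asyncio
import functools


def locked(method):
    """Run the wrapped coroutine method while holding self._lock."""
    @functools.wraps(method)
    async def wrapper(self, *args, **kwargs):
        async with self._lock:
            return await method(self, *args, **kwargs)
    return wrapper


class Volume:
    def __init__(self):
        self._lock = asyncio.Lock()
        self.log = []

    @locked
    async def stop(self):
        self.log.append("stop-begin")
        await asyncio.sleep(0)  # yields to the event loop, like a slow call
        self.log.append("stop-end")


async def main():
    vol = Volume()
    # Two concurrent calls are serialized by the lock instead of interleaving.
    await asyncio.gather(vol.stop(), vol.stop())
    return vol.log


print(asyncio.run(main()))
```

The event loop stays free to serve other requests while a call is waiting, but two operations on the same volume never overlap.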
Marek Marczykowski-Górecki
299c514647
tests: fix asyncio usage in storage_lvm.TC_01_ThinPool
Both vm.create_on_disk() and vm.start() are coroutines. Tests in this
class didn't run them, so they basically didn't test anything.

Wrap coroutine calls with self.loop.run_until_complete().

Additionally, don't fail if the LVM pool is named differently.
In that case, the test is rather silly, as it probably uses the same pool
for source and destination (an operation already tested elsewhere). But that
isn't a reason for failing the test.
2018-10-23 16:53:35 +02:00
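The problem fixed in commit 299c514647 is easy to reproduce in miniature: calling a coroutine function only creates a coroutine object, and nothing runs until an event loop drives it. In this sketch `create_on_disk` is a stand-in coroutine and the test class is hypothetical; only the use of self.loop.run_until_complete() mirrors the commit message.

```python
import asyncio
import unittest


async def create_on_disk():
    # Stand-in for a real coroutine such as vm.create_on_disk().
    return "created"


class TC_Example(unittest.TestCase):
    def setUp(self):
        self.loop = asyncio.new_event_loop()

    def tearDown(self):
        self.loop.close()

    def test_create(self):
        coro = create_on_disk()  # nothing has executed yet
        result = self.loop.run_until_complete(coro)  # now it actually runs
        self.assertEqual(result, "created")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TC_Example)
outcome = unittest.TextTestRunner().run(suite)
```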
Marek Marczykowski-Górecki
6170edb291
storage: allow import_data and import_data_end be coroutines
On some storage pools this operation can also be time consuming - for
example, it may require creating a temporary volume, and volume.create()
can already be a coroutine.
This is also a requirement for making the common code used by
start()/create() etc. a coroutine; otherwise none of them can be one and
they will block other operations.

Related to QubesOS/qubes-issues#4283
2018-10-23 16:53:35 +02:00
Marek Marczykowski-Górecki
295705a708
doc: document features, qvm-features-request and services
Fixes QubesOS/qubes-issues#2829
2018-10-23 16:53:35 +02:00