Clean up VMs in reverse topological order of template dependencies, not network ones.
The netvm can be set to None to break a dependency, but the template can't. To
change a VM's netvm, kill the VMs first (kill doesn't check network
dependencies), so the netvm change will not trigger side effects (a runtime
change, which could fail).
This fixes cleanup for tests creating custom templates - previously the
order was undefined and if a template happened to be removed before its child
VMs, the removal failed. All the relevant files were removed later anyway, but it
led to Python object leaks.
First unregister the domain from the collection, and only then call
remove_from_disk(). Removing it from the collection prevents further calls
from being made to it. And if anything else keeps a reference to it (for
example as a netvm), the operation is aborted.
Additionally, this makes it unnecessary to take the startup lock when
cleaning up in tests.
LVM operations can take a significant amount of time. This is especially
visible when stopping a VM (`vm.storage.stop()`) - during that time the
whole qubesd freezes for about 2 seconds.
Fix this by making all the ThinVolume methods coroutines (where
supported). Each public coroutine is also wrapped with a lock on
volume._lock to avoid concurrency-related problems.
This also requires changing internal helper functions into
coroutines. There are two functions that still need to be called from
non-coroutine call sites:
- init_cache/reset_cache (initial cache fill, ThinPool.setup())
- qubes_lvm (ThinVolume.export())
So, those two functions need to exist in two variants. Extract their common
code into separate functions to reduce code duplication.
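A minimal sketch of the locking pattern, with simplified method and helper names (the real ThinVolume code differs):
    import asyncio

    class ThinVolumeSketch:
        def __init__(self):
            self._lock = asyncio.Lock()

        async def stop(self):
            # public coroutine: serialize per-volume operations
            async with self._lock:
                await self._commit()

        async def _commit(self):
            # internal helpers are coroutines too, so slow lvm calls
            # no longer block the qubesd event loop
            proc = await asyncio.create_subprocess_exec('lvs')
            await proc.wait()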
Fixes QubesOS/qubes-issues#4283
Both vm.create_on_disk() and vm.start() are coroutines. Tests in this
class didn't run them, so they basically didn't test anything.
Wrap coroutine calls with self.loop.run_until_complete().
Additionally, don't fail if the LVM pool is named differently.
In that case, the test is rather silly, as it probably uses the same pool
for source and destination (an operation already tested elsewhere). But that
isn't a reason for failing the test.
Support 'supported-service.*' feature requests coming from VMs. Set
such features directly (allow only the value '1') and remove any not
reported in a given call. This way, uninstalling the package providing a given
service will automatically remove the related 'supported-service...'
feature.
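A hedged sketch of the idea (function and argument names are illustrative, not the actual extension code):
    def process_supported_services(vm, untrusted_features):
        '''Set supported-service.* features reported by the VM,
        drop the ones no longer reported.'''
        # only the value '1' is accepted from the VM
        reported = {name for name, value in untrusted_features.items()
                    if name.startswith('supported-service.')
                    and value == '1'}
        for name in reported:
            vm.features[name] = '1'
        # remove services not reported in this call -
        # e.g. after the providing package was uninstalled
        for name in list(vm.features):
            if (name.startswith('supported-service.')
                    and name not in reported):
                del vm.features[name]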
Fixes QubesOS/qubes-issues#4402
Make sure events are sent to the specific window found with xdotool search,
not the one having focus. In the case of Whonix, that can be the first
connection wizard or a whonixcheck report.
Searching based on class is used in many tests; searching by class, not
only by name, in wait_for_window allows reducing code duplication.
While at it, improve it further:
- avoid actively waiting for the window; use `xdotool search --sync` instead
- return the found window id
- add wait_for_window_coro() for use where a coroutine is needed
- when waiting for a window to disappear, look up the window id once and wait
  for that particular window to disappear (avoiding xdotool race
  conditions on window enumeration)
Besides reducing code duplication, this also moves the handling of various
xdotool imperfections into one place.
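A rough sketch of the xdotool-based helper, assuming the coreutils `timeout` wrapper handles the time limit (the real wait_for_window has more options and error handling):
    import subprocess

    def wait_for_window(name, search_by='--name', timeout=30):
        '''Wait for a window matching the query and return its window id.'''
        # --sync makes xdotool block until a match appears (no active polling)
        output = subprocess.check_output(
            ['timeout', str(timeout),
             'xdotool', 'search', '--sync', '--onlyvisible', search_by, name])
        return int(output.splitlines()[0])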
Allow removing VMs based on multiple prefixes at once. Removing them
separately doesn't handle all the dependencies (default_netvm, netvm)
correctly. This is needed for backup compatibility tests, where VMs are
created with the `test-` and `disp-tests-` prefixes. Additionally, the backup
code will create `disp-no-netvm`, which may also need to be removed.
If QUBES_TEST_TEMPLATES or QUBES_TEST_LOAD_ALL is set, create testcases
at module import, instead of waiting until `load_tests` is called.
The `QUBES_TEST_TEMPLATES` variant doesn't require `qubes.xml` access, so it
should be safe regardless of the environment. `QUBES_TEST_LOAD_ALL` forces
loading the tests (and reading `qubes.xml`) regardless.
This is useful for test runners not supporting the load_tests protocol, or
supporting it only partially - for example, both the default `unittest` runner
and `nose2` can either use the load_tests protocol _or_ select individual
tests. Setting either of those variables allows running a single test with
those runners.
With this feature used together with the load_tests protocol, tests could be
registered twice. Avoid this by not listing already-defined test classes
in create_testcases_for_templates (according to the load_tests protocol,
those should already be registered).
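A simplified sketch of the module-level part (the real create_testcases_for_templates derives class names differently and reads templates from qubes.xml for QUBES_TEST_LOAD_ALL):
    import os
    import unittest

    class TC_00_Example(unittest.TestCase):
        template = None   # placeholder base class for this sketch

    def create_testcases_for_templates(templates):
        for template in templates:
            name = 'TC_00_Example_' + template
            if name in globals():
                continue   # already registered via load_tests
            globals()[name] = type(name, (TC_00_Example,),
                                   {'template': template})

    # create the per-template classes already at import time
    if 'QUBES_TEST_TEMPLATES' in os.environ:
        create_testcases_for_templates(
            os.environ['QUBES_TEST_TEMPLATES'].split())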
Allow easily listing the templates to be tested, without enumerating all the
test classes. This is especially useful with the nose2 runner, which can't
use the load_tests protocol _and_ select a subset of tests at the same time.
If any object is leaked, QubesTestCase.cleanup_gc() raises an exception,
which has the leaked objects list referenced in its traceback. This happens
after cleanup_traceback(), so it isn't cleaned, causing cleanup_gc() to fail
for all further tests in the same test run.
Avoid this by dropping the list just before checking whether any object is
leaked.
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.
This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
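A plain-Python illustration of the fallback chain (the real code declares both values as qubes.property, with a callable default on the VM side):
    class App:
        # system-wide knob, settable like other global properties
        default_qrexec_timeout = 60

    class VM:
        def __init__(self, app, qrexec_timeout=None):
            self.app = app
            self._qrexec_timeout = qrexec_timeout

        @property
        def qrexec_timeout(self):
            if self._qrexec_timeout is not None:
                return self._qrexec_timeout
            # fall back to the global default, not a hardcoded 60
            return self.app.default_qrexec_timeout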
qvm-sync-clock no longer fetches time from the network, by design.
So, let's not break the clockvm's time, and check only whether everything else
correctly synchronizes with it.
It isn't enough to wait for the window to disappear; the service may still
be running. And if it is, the test cleanup logic will complain about an FD
leak.
To avoid a deadlock on test failure, do it with a timeout.
volume.path and volume.export() refer to the same thing in lvm_thin and
'file', but not in file-reflink (where volume.path is the -dirty.img,
which doesn't exist if the volume is not started).
Use the file-reflink storage driver if /var/lib/qubes is on a filesystem
that supports reflinks, e.g. when the btrfs layout was selected in
Anaconda. If it doesn't support reflinks (or if detection fails, e.g. in
an unprivileged test environment), use 'file' as before.
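A hedged sketch of one way to probe reflink support (the FICLONE value is from linux/fs.h; the actual detection code may differ):
    import fcntl
    import os
    import tempfile

    FICLONE = 0x40049409  # linux/fs.h: _IOW(0x94, 9, int)

    def supports_reflink(directory):
        '''Probe whether files under `directory` can be reflinked.'''
        try:
            with tempfile.TemporaryDirectory(dir=directory) as tmp:
                src = os.path.join(tmp, 'src')
                with open(src, 'wb') as src_io:
                    src_io.write(b'\0' * 4096)
                with open(src, 'rb') as src_io, \
                        open(os.path.join(tmp, 'dst'), 'wb') as dst_io:
                    fcntl.ioctl(dst_io.fileno(), FICLONE, src_io.fileno())
            return True
        except OSError:
            # no reflink support, or detection itself failed
            # (e.g. unprivileged test environment) - use 'file'
            return False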
System tests are fragile against any object leaks, especially objects holding
open files. Instead of wrapping all tests with try/finally removing
those local variables (as done in qubes.tests.integ.backup, for example),
apply a generic solution: clear all traceback objects of their local
variables. Those aren't used to generate the text report by either test
runner (qubes.tests.run and nose2). If one wants to break into a debugger
and inspect tracebacks interactively, the call to cleanup_traceback needs to
be commented out.
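A condensed sketch of that cleanup (frame.clear() is available since Python 3.4; the real cleanup_traceback first has to locate the tracebacks stored by the test runner):
    def clean_traceback(tb):
        '''Drop local variables referenced from a traceback; the
        traceback itself stays usable for the textual report.'''
        while tb is not None:
            tb.tb_frame.clear()   # releases locals (open files, VMs, ...)
            tb = tb.tb_next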
* qubesos/pr/228:
storage/lvm: filter out warning about intended over-provisioning
tests: fix getting kernel package version inside VM
tests/extra: add start_guid option to VMWrapper
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
vm/qubesvm: check if all required devices are available before start
storage/lvm: fix reporting lvm command error
storage/lvm: save pool's revision_to_keep property
Multiple properties are related to the system installed inside the VM, so it
makes sense to have them the same for all VMs based on the same
template. Modify the default value getter to first try getting the value from
the template (if any), and only if that fails, fall back to the original
default value.
This change is made to these properties:
- default_user (it was already this way)
- kernel
- kernelopts
- maxmem
- memory
- qrexec_timeout
- vcpus
- virt_mode
This is especially useful for manually installed templates (like
Windows).
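A minimal sketch of such a getter (close to, but not exactly, the helper used in qubesvm.py):
    def _default_with_template(attr, default):
        '''Default-value getter: the template's value if there is a
        template, otherwise the original default.'''
        def _getter(vm):
            template = getattr(vm, 'template', None)
            if template is not None:
                return getattr(template, attr)
            return default(vm) if callable(default) else default
        return _getter

    # usage sketch, e.g.:
    #   maxmem = qubes.property('maxmem', type=int,
    #       default=_default_with_template('maxmem', 4000))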
Related to QubesOS/qubes-issues#3585
Handle the 'os' feature - if it's Windows, then set the rpc-clipboard feature.
Handle the 'gui-emulated' feature - a request specifically for the stubdomain GUI.
With the 'gui' feature alone it is only possible to enable the gui-agent based
GUI, or disable GUI completely.
Handle 'default-user' - verify it for weird characters and set the
'default_user' property (if it wasn't already set).
QubesOS/qubes-issues#3585
* lvm-snapshots:
tests: fix handling app.pools iteration
storage/lvm: add repr(ThinPool) for more meaningful test reports
tests: adjust for variable volume path
api/admin: expose volume path in admin.vm.volume.Info
tests: LVM: import, list_volumes, volatile volume, snapshot volume
tests: collect all SIGCHLD before cleaning event loop
storage/lvm: use temporary volume for data import
tests: ThinVolume.revert()
tests: LVM volume naming migration, and new naming in general
storage/lvm: improve handling interrupted commit
Since (for LVM at least) the path is dynamic now, add information about it
to the volume info. This is not very useful outside of dom0, but in dom0 it
can be very useful for various scripts.
This discloses the current volume revision id, but it is already
possible to deduce it from the snapshot list.
Do not write directly to the main volume; instead, create a temporary volume
and commit it to the main one only when the operation is finished. This
solves multiple problems:
- the import operation can be aborted without data loss
- importing new data over an existing volume will not leave traces of the
  previous content - especially when importing a smaller volume over a bigger
  one
- the import operation can be reverted - it creates a separate revision,
  similar to start/stop
- it is easier to prevent a qube from starting during the import operation
- the template can still be used while a new version is being imported
QubesOS/qubes-issues#2256
First rename the volume to a backup revision, regardless of revisions_to_keep,
then rename -snap to the current volume. Only then remove the backup
revision (if it exceeds revisions_to_keep). This way, even if the commit
operation is interrupted, there is still a volume with the data.
This also requires adjusting a few functions to actually fall back to the most
recent backup revision if the current volume isn't found - create the
_vid_current property for this purpose.
Also, use the -snap volume for the clone operation and commit it normally later.
This makes it safer to interrupt or even revert.
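A rough sketch of the ordering with plain lvm commands, simplified to a single backup revision (the real code goes through the qubes_lvm helpers and the revisions_to_keep logic):
    import subprocess

    def commit_snapshot(vg, name):
        '''Promote <name>-snap to <name>, keeping a backup until the end.'''
        # 1. keep the old data around under a backup name first
        subprocess.check_call(['lvrename', vg, name, name + '-back'])
        # 2. then promote the snapshot to be the current volume
        subprocess.check_call(['lvrename', vg, name + '-snap', name])
        # 3. only now drop the backup; an interruption before this
        #    point still leaves a volume with the data
        subprocess.check_call(
            ['lvremove', '-f', '{}/{}-back'.format(vg, name)])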
QubesOS/qubes-issues#2256
is_outdated() may not be supported by a given volume's pool driver. In that
case, skip the is_outdated information instead of crashing the call.
Fixes QubesOS/qubes-issues#3767
Before waiting for the remaining tasks on the event loop (including libvirt
events), make sure all destroyed objects are really destroyed. This is
especially important for libvirt connections, which get cleaned up only
when the appropriate destructor (__del__) registers a cleanup callback and it
gets called by the loop.
Use the VM's actual IP address as a gateway for other VMs, instead of a
hardcoded link-local address. This is important for sys-net generated
ICMP diagnostic packets - those must _not_ have a link-local source
address, otherwise they wouldn't be properly forwarded back to the right VM.
* devel-storage-fixes:
storage/file: use proper exception instead of assert
storage/file: import data into temporary volume
storage/lvm: check for LVM LV existence and type when creating ThinPool
storage/lvm: fix size reporting just after creating LV
Similar to the LVM changes, this fixes/improves multiple things:
- no old data is visible in the volume
- a failed import does not leave a broken volume
- partially imported data is not visible to the running VM
QubesOS/qubes-issues#3169
* storage-properties:
storage: use None for size/usage properties if unknown
tests: call search_pool_containing_dir with various dirs and pools
storage: make DirectoryThinPool helper less verbose, add sudo
api/admin: add 'included_in' to admin.pool.Info call
storage: add Pool.included_in() method for checking nested pools
storage: move and generalize RootThinPool helper class
storage/kernels: refuse changes to 'rw' and 'revisions_to_keep'
api/admin: implement admin.vm.volume.Set.rw method
api/admin: include 'revisions_to_keep' and 'is_outdated' in volume info
Since Volume.is_outdated() is a method, not a property, add a function
for handling serialization. At the same time, fix None serialization
(applicable to the 'source' property).
QubesOS/qubes-issues#3256
Some handlers may want to call into other VMs (or even the one asking),
but the vm.run() functions are coroutines, so they need to be called from
another coroutine. Allow for that.
Also fix a typo in the documentation.
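A sketch of such an asynchronous handler (extension, event and command are illustrative; the point is that the handler itself may now be a coroutine):
    import qubes.ext

    class ExampleExtension(qubes.ext.Extension):
        @qubes.ext.handler('domain-start')
        async def on_domain_start(self, vm, event, **kwargs):
            # coroutine handlers can await vm.run() and friends
            proc = await vm.run('true')
            await proc.wait()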
Some kernels (like pvgrub2) may not provide modules.img, and that isn't an
error. Don't break VM startup in that case; skip that device instead.
Fixes QubesOS/qubes-issues#3563
* qubesos/pr/190:
Missed one test, adding default-user in assert for test test_621_qdb_vm_with_network in TC_90
replaced underscore by dash and update test accordingly
Updated assert content for test_620_qdb_standalone in TC_90_QubesVM
Added the default_user property from the Qube to the qubesdb so it is available when starting X. This is the 1st part of a fix for issue https://github.com/QubesOS/qubes-issues/issues/2372
This adds the file-reflink storage driver. It is never selected
automatically for pool creation, especially not the creation of
'varlibqubes' (though it can be used if set up manually).
The code is quite small:
              reflink.py    lvm.py        file.py + block-snapshot
sloccount     334 lines     447 (134%)    570 (171%)
Background: btrfs and XFS (but not yet ZFS) support instant copies of
individual files through the 'FICLONE' ioctl behind 'cp --reflink'.
Which file-reflink uses to snapshot VM image files without an extra
device-mapper layer. All the snapshots are essentially freestanding;
there's no functional origin vs. snapshot distinction.
In contrast to 'file'-on-btrfs, file-reflink inherently avoids
CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs'
private volumes are CoW. (And turning off the lower, filesystem-level
CoW for 'file'-on-btrfs images would turn off data checksums too, i.e.
protection against bit rot.)
Also in contrast to 'file', all storage features are supported,
including
- any number of revisions_to_keep
- volume.revert()
- volume.is_outdated
- online fstrim/discard
Example tree of a file-reflink pool - *-dirty.img are connected to Xen:
- /var/lib/testpool/appvms/foo/volatile-dirty.img
- /var/lib/testpool/appvms/foo/root-dirty.img
- /var/lib/testpool/appvms/foo/root.img
- /var/lib/testpool/appvms/foo/private-dirty.img
- /var/lib/testpool/appvms/foo/private.img
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
- /var/lib/testpool/appvms/bar/...
- /var/lib/testpool/appvms/...
- /var/lib/testpool/template-vms/fedora-26/...
- /var/lib/testpool/template-vms/...
It looks similar to a 'file' pool tree, and in fact file-reflink is
drop-in compatible:
$ qvm-shutdown --all --wait
$ systemctl stop qubesd
$ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
$ systemctl start qubesd
$ sudo rm -f /path/to/pool/*/*/*-cow.img*
If the user tries to create a fresh file-reflink pool on a filesystem
that doesn't support reflinks, qvm-pool will abort and mention the
'setup_check=no' option. Which can be passed to force a fallback on
regular sparse copies, with of course lots of time/space overhead. The
same fallback code is also used when initially cloning a VM from a
foreign pool, or from another file-reflink pool on a different
mountpoint.
'journalctl -fu qubesd' will show all file-reflink copy/rename/remove
operations on VM creation/startup/shutdown/etc.
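A condensed sketch of the copy primitive behind this (FICLONE value from linux/fs.h; the helper name and exact fallback are illustrative - the real driver handles sparseness itself):
    import errno
    import fcntl
    import subprocess

    FICLONE = 0x40049409   # linux/fs.h: _IOW(0x94, 9, int)

    def copy_file(src, dst):
        '''Reflink if possible, else fall back to a regular sparse copy.'''
        with open(src, 'rb') as src_io, open(dst, 'wb') as dst_io:
            try:
                fcntl.ioctl(dst_io.fileno(), FICLONE, src_io.fileno())
                return
            except OSError as ex:
                if ex.errno not in (errno.EOPNOTSUPP, errno.EXDEV,
                                    errno.ENOTTY):
                    raise
        # no reflink support or different filesystem: slow sparse copy
        subprocess.check_call(['cp', '--sparse=always', src, dst])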
* qubesos/pr/187:
Don't fail create/clone if /var/lib/qubes/TYPE/NAME/ exists
Make 'qvm-volume revert' really use the latest revision
Fix wrong mocks of Volume.revisions
* qubesos/pr/185:
vm: remove doc for non-existing event `monitor-layout-change`
vm: include tag/feature name in event name
events: add support for wildcard event handlers
* qubesos/pr/180:
vm/qubesvm: default to PVH unless PCI devices are assigned
vm/qubesvm: expose 'start_time' property over Admin API
vm/qubesvm: revert backup_timestamp to '%s' format
doc: link qvm-device man page for qvm-block, qvm-pci, qvm-usb
The human-readable format `str(datetime.datetime)` is a nightmare for Admin
API level communication. In particular, setting the property in the same
format it was read in was not supported, and handling such a format in
untrusted input handling code is a bad idea. Revert to a simple integer
format.
Rename events:
- domain-feature-set -> domain-feature-set:feature
- domain-feature-delete -> domain-feature-delete:feature
- domain-tag-add -> domain-tag-add:tag
- domain-tag-delete -> domain-tag-delete:tag
Make it consistent with the property-* events. It makes more sense to
include the tag/feature name in the event name, so a handler can watch a single
tag/feature - which is the most common case. Otherwise, most handlers
would begin with `if feature == '...'` anyway, wasting time on most
events.
In cases where multiple features/tags should be handled by a single
handler, it is now possible to register a handler with a wildcard, for
example `domain-feature-set:*`.
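A sketch of how handlers look after the rename (the extension name is illustrative; the decorator is the existing qubes.ext.handler):
    import qubes.ext

    class ExampleExtension(qubes.ext.Extension):
        # watch exactly one feature - no `if feature == '...'` needed
        @qubes.ext.handler('domain-feature-set:gui')
        def on_gui_set(self, vm, event, feature, value, oldvalue=None):
            vm.log.info('gui feature set to %r', value)

        # or watch all of them with a wildcard
        @qubes.ext.handler('domain-feature-set:*')
        def on_feature_set(self, vm, event, feature, value, oldvalue=None):
            vm.log.info('feature %s set to %r', feature, value)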
Support registering handlers for more flexible wildcard events: not only
'*', but also 'something*'. This allows registering handlers for
'property-set:*' and the like.
Load integration tests from outside of the core-admin repository, through
entry points.
Create a wrapper for the VM object to keep very basic compatibility with tests
written for core2. This means that if a test uses only basic functionality
(vm.start(), vm.run()), the same test will work for both core2 and
core3. This is especially important for app-* repositories, where the
same version serves multiple Qubes branches.
This also hides asyncio usage from the test writer.
See QubesOS/qubes-issues#1800 for details on the original feature.
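A hedged sketch of how an app-* repository could hook in (the entry point group name below is an assumption - check qubes.tests.extra for the exact group and expected callable signature):
    # setup.py of a hypothetical app-* repository
    from setuptools import setup

    setup(
        name='qubes-app-example-tests',
        version='1.0',
        py_modules=['example_tests'],
        entry_points={
            'qubes.tests.extra.for_template': [   # group name: assumption
                'example = example_tests:list_tests',
            ],
        },
    )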
Test the basic functions of the dom0 module (creating a VM, setting a property)
and configuring the system inside a VM (through a DispVM). The latter is done
for each available template (the process uses salt installed in that
template, not copied from dom0).
QubesOS/qubes-issues#3316
When dom0 does not provide the kernel, it should also not set the kernel
command line in the libvirt config. Otherwise, qemu in the stubdomain fails to
start, because it gets the -append option without -kernel, which is an illegal
configuration.
Fixes QubesOS/qubes-issues#3339
Add a property for the IPv6 address ('ip6'). Build the default value similarly
to IPv4 - a common prefix + QID, or Disp ID for DispVMs.
This is all disabled unless the 'ipv6' feature is enabled. It is inherited
from the netvm (not the template).
Even when enabled, a VM may decide not to use it - or simply not support
it.
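A sketch of composing such a default (the prefix is a placeholder and the feature lookup via a check_with_netvm-style helper is an assumption; the real property is built with qubes.property):
    import ipaddress

    def default_ip6(vm, prefix='fd09:24ef:4179::'):   # placeholder prefix
        '''Common prefix + QID (or Disp ID for DispVMs), if ipv6 is enabled.'''
        if not vm.features.check_with_netvm('ipv6', False):
            return None
        host_id = vm.dispid if getattr(vm, 'dispid', None) else vm.qid
        return str(ipaddress.ip_address(
            int(ipaddress.ip_address(prefix)) + host_id))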
QubesOS/qubes-issues#718
Allow using the default feature value from the netvm, not the template. This
makes sense for network-related features like using Tor, supporting IPv6, etc.
Similarly to check_with_template, expose it also over the Admin API.
Having both default_netvm and default_fw_netvm causes a lot of confusion,
because it isn't clear to the user which one is used when. Additionally,
changing the provides_network property may also change the netvm property,
which may be an unintended effect. All this makes it hard to:
- cover all netvm-changing actions with policy for the Admin API
- cover all netvm-changing events (for example to apply the change to
  the running VM, or to check for netvm loops)
As suggested by @qubesuser, kill the default_fw_netvm property and
simplify the logic around it.
Since we're past rc1, also implement migration logic. And add tests for
said migration.
Fixes QubesOS/qubes-issues#3247
* qubesos/pr/166:
create "lvm" pool using rootfs thin pool instead of hardcoding qubes_dom0-pool00
change default pool code to be fast
cache PropertyHolder.property_list and use O(1) property name lookups
remove unused netid code
cache isinstance(default, collections.Callable)
don't access netvm if it's None in visible_gateway/netmask
There were many cases where the check was missing:
- changing default_netvm
- resetting netvm to the default value
- loading an already broken qubes.xml
Since it was possible to create a broken qubes.xml using legal calls, do
not reject loading such a file; instead, break the loop(s) by setting netvm
to None when a loop is detected. This will also be useful if not all
places are covered yet...
Place the check in the default_netvm setter. Skip it during qubes.xml loading
(when events_enabled=False), but still keep it in the setter, to _validate_ the
value before any property-* event gets fired.
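A minimal sketch of the check itself (exception type simplified; the real code raises a qubes.exc error and also handles the default-netvm case):
    def check_netvm_loop(vm, new_netvm):
        '''Refuse a netvm value that would loop the network chain back.'''
        seen = set()
        current = new_netvm
        while current is not None:
            if current is vm:
                raise ValueError('loop in netvm chain')
            if current.qid in seen:
                break   # pre-existing loop further up, not involving vm
            seen.add(current.qid)
            current = current.netvm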
* 20171107-storage:
api/admin: add API for changing revisions_to_keep dynamically
storage/file: move revisions_to_keep restrictions to property setter
api/admin: hide dd statistics in admin.vm.volume.Import call
storage/lvm: fix importing different-sized volume from another pool
storage/file: fix preserving spareness on volume clone
api/admin: add pool size and usage to admin.pool.Info response
storage: add size and usage properties to pool object
* 20171107-tests-backup-api-misc:
test: make race condition on xterm close less likely
tests/backupcompatibility: fix handling 'internal' property
backup: fix handling target write error (like no disk space)
tests/backupcompatibility: drop R1 format tests
backup: use offline_mode for backup collection
qubespolicy: fix handling '$adminvm' target with ask action
app: drop reference to libvirt object after undefining it
vm: always log startup fail
api: do not log handled errors sent to a client
tests/backups: convert to new restore handling - using qubesadmin module
app: clarify error message on failed domain remove (used somewhere)
Fix qubes-core.service ordering
xterm closes very quickly when the application inside terminates. It is
so fast at closing on the keydown event that xdotool does not manage to send
the keyup event, resulting in an xdotool crash. Allow a little more time for
that.
Besides the conversion itself, change how the test verifies restore
correctness: first collect the VM metadata (and hashes of data) into a plain
dict, then compare against it. This allows destroying the old VM objects
before restoring the backup, avoiding duplicate objects for the
same VM - which results in weird effects like trying to undefine a libvirt
object twice.