This should allow importing a volume and changing the size at the
same time, without performing the resize operation on original
volume first.
The internal API has been renamed to internal.vm.volume.ImportBegin
to avoid confusion, and for symmetry with ImportEnd.
See QubesOS/qubes-issues#5239.
* origin/pr/310:
storage/reflink: fix comment
storage/reflink: bail out early on most FICLONE errnos
storage/reflink: pool.setup_check -> pool._setup_check
Don't fall back on 'cp' if the FICLONE ioctl gives an errno that's not
plausibly reflink specific, because in such a case any fallback could
theoretically mask real but intermittent system/storage errors.
Looking through ioctl_ficlone(2) and the kernel source, it should be
sufficient to do the fallback only on EBADF/EINVAL/EOPNOTSUPP/EXDEV.
(EISDIR/ETXTBSY don't apply to this storage driver, which will never
legitimately attempt to reflink a directory or an active - in the
storage domain - swap file.)
* origin/pr/303:
Update tests after adding /connected-ips
Also reload /connected-ips on firewall change / domain spawn
Also store /connected-ips6 for machines that have IPv6 addresses
Don't try to write to qubesdb of an offline VM
Maintain a list of connected machine IPs in qubesdb
Extension objects are singletons and normally do not require any special
cleanup. But in case of tests, we try to remove all the qubes objects
between tests and the cache in usb extension makes it hard.
Add a 'qubes-close' event that extensions can handle to remove extra
references stored in extension objects themselves.
Due to strangeness of KeyError (it overrrides __str__ method) in some
cases exceptions received superflous quotation marks when inheriting
from it.
fixesQubesOS/qubes-issues#5106
* origin/pr/298:
tests/network: let xl devd bring the interfaces up
tests/network: improve error reporting
api/admin: implement *.property.GetAll methods
Avoid resetting clocksync service of just enabled clockvm
doc/tests: extend qubes-specific quirks in tests
tests: add include and exclude lists for extra tests loader
xl devd doesn't use startup notify, so when the service is started
(according to systemd) it may still be initializing interfaces. Add a
little sleep for that.
Allow getting all the VM properties with one call. This greatly improve
performance of an applications retrieving many/all of them (qvm-ls,
qubes manager etc)
QubesOS/qubes-issues#5415FixesQubesOS/qubes-issues#3293
One alternative would look like
import ctypes
sizeof_int = ctypes.sizeof(ctypes.c_int)
FICLONE = (1073741824 % 256**sizeof_int) | 37897 | (sizeof_int << 16)
but, even if the above really(?) is a 100% correct Python port of
$ echo FICLONE | cpp -include linux/fs.h | tail -n 1
it still seems more likely that the ctypes package is somehow buggy
somewhere than for Qubes storage to run on an exotic architecture with
non 32 bit ints (in the foreseeable future).
So just document the baked in assumption.
The default (= text) mode for a loop device which contains a VM image
looked weird, even though it didn't make a difference here because the
dev_io object was never actually read from.
When setting global clockvm property, 'clocksync' service is
automatically added to the new value and then removed from the old one.
But if the new value is the same as the old one, the service gets
removed from the just set new value.
Check for this case explicitly.
FixesQubesOS/qubes-issues#4939
'extra' tests run is getting ridiculously long. Allow splitting it into
several jobs. Since this appears as just one class from the test loader
perspective, implement it as environment variables:
- QUBES_TEST_EXTRA_INCLUDE - load just selected tests
- QUBES_TEST_EXTRA_EXCLUDE - skip selected tests (to select "the rest"
tests)
* origin/pr/297:
doc: remove useless _static generated by sphinx
cleanup-dispvms: fix python shebang
spec: fix missing dependency
Fix Sphinx 2 new API for Fedora 31+
doc: Make PEP8 happier
qmemmand: separate SystemState init xc and xs to a 'init' method
doc: drop moved elsewhere components
Only first 4 disks can be emulated as IDE disks by QEMU. Specifically,
CDROM must be one of those first 4 disks, otherwise it will be
ignored. This is especially important if one wants to boot the VM from
that CDROM.
Since xvdd normally is a kernel-related volume (boot image, modules) it
makes perfect sense to re-use it for CDROM. It is either set for kernel
volume (in which case, VM should boot from it and not the CDROM), or
(possibly bootable) CDROM.
This needs to be done in two places:
- BlockExtension for dynamic attach
- libvirt xen.xml - for before-boot attach
In theory the latter would be enough, but it would be quite confusing
that device will get different options depending on when it's attached
(in addition to whether the kernel is set - introduced here).
This all also means, xvdd not always is a "system disk". Adjust listing
connected disks accordingly.
- allow TestQubesDB to be populated with initial data
- support list() method
- allow to register pre-created VM instance (useful for AdminVM, which
don't accept setting qid)
Pool.volumes property is implemented in a base class, individual drivers
should provide list_volumes() method as a backend for that property.
Fix this in a LinuxKernel pool.
Similar to admin.vm.volume.List. There are still other
admin.pool.volume.* methods missing, but lets start with just this one.
Those with both pool name and volume id arguments may need some more
thoughts.
Don't realy on a volume configuration only, it's easy to miss updating
it. Specifically, import_volume() function didn't updated the size based
on the source volume.
The size that the actual VM sees is based on the
file size, and so is the filesystem inside. Outdated size property can
lead to a data loss if the user perform an action based on a incorrect
assumption - like extending size, which actually will shrink the volume.
FixesQubesOS/qubes-issues#4821
Handle this case specifically, as way too many users ignore the message
during installation and complain it doesn't work later.
Name the problem explicitly, instead of pointing at libvirt error log.
FixesQubesOS/qubes-issues#4689
admin.pool.UsageDetails reports the usage data, unlike
admin.pool.Info, which should report the config/unchangeable data.
At the moment admin.Pool.Info still reports usage, to maintain
compatibility, but once all relevant tools are updated,
it should just return configuration data.
Added usage_details method to Pool class
(returns a dictionary with detailed information
on pool usage) and LVM implementation that returns
metadata info.
Needed for QubesOS/qubes-issues#5053
* origin/pr/287:
app: fix docstrings PEP8 refactor
tests: remove iptables_header content in test_622_qdb_keyboard_layout
tests: add test for guivm and keyboard_layout
gui: simplify setting guivm xid and keyboard layout
Make pylint happier
gui: set keyboard layout from feature
Handle GuiVM properties
Make PEP8 happier
Similar to domain-start-failed, add an event fired when
domain-pre-shutdown was fired but the actual operation failed.
Note it might not catch all the cases, as shutdown() may be called with
wait=False, which means it won't wait fot the actual shutdown. In that
case, timeout won't result in domain-shutdown-failed event.
QubesOS/qubes-issues#5380
Move this functionality from our custom runner (qubes.tests.run),
into base test class. This is very useful for correlating logs, so lets
have it with nose2 runner too.
Prevent starting a VM while it's being removed. Something could try to
start a VM just after it's being killer but before removing it (Whonix
example from previous commit is real-life case). The window specifically
is between kill() call and removing it from collection
(`del app.domains[vm.qid]`). Grab a startup_lock for the whole operation
to prevent it.
Check early (but after grabbing a startup_lock) if VM isn't just
removed. This could happen if someone grabs its reference from other
places (netvm of something else?) or just before removing it.
This commit makes the simple removal from the collection (done as the
first step in admin.vm.Remove implementation) efficient way to block
further VM startups, without introducing extra properties.
For this to be effective, removing from the collection, needs to happen
with the startup_lock held. Modify admin.vm.Remove accordingly.
There are cases when destination domain doesn't exist when the call gets
to qubesd. Namely:
1. The call comes from dom0, which bypasses qrexec policy
2. Domain was removed between checking the policy and here
Handle the the same way as if the domain wouldn't exist at policy
evaluation stage either - i.e. refuse the call.
On the client side it doesn't change much, but on the server call it
avoids ugly, useless tracebacks in system journal.
FixesQubesOS/qubes-issues#5105
Allow to manual inspect test environment after test fails. This is
similar to --do-not-clean option we had in R3.2.
The decorator should be used only while debugging and should never be
applied to the code committed into repository.
Domain shutdown handling may take extended amount of time, especially on
slow machine (all the LVM teardown etc). Take care of it by
synchronizing using vm.startup_lock, instead of increasing constant
delay. This way, the shutdown event handler needs to be started within
3s, not finish in this time.
When libvirt daemon is restarted, qubesd attempt to re-connect to the
new instance transparently (through virConnect object wrapper). But the
code lacked re-registering event handlers.
Fix this by adding reconnect callback argument to virConnectWrappper, to
be called after new connection is established. This callback will
additionally get old connection as an argument, if any cleanup is
needed. The old connection is closed just after callback returns.
Use this to re-register event handler, but also unregister old handler
first. While full unregister wont work on since old libvirt daemon
instance is dead already, it will still cleanup client structures.
Since the old libvirt connection is closed now, adjust also domain
reconnection logic, to handle stale connection object. In that case
isAlive() call throws an exception.
FixesQubesOS/qubes-issues#5303
Qubesd wrongly required default_template global property to be not None.
Furthermore, even without hard failure set, require_property method
raised an exception in case of a property having incorrect None value.
It now logs an error message instead, as designed.
fixesQubesOS/qubes-issues#5326
This fixes an invalid response generated by get_timezone when the time zones are composed by 3 parts, for example:
America/Argentina/Buenos_Aires
America/Indiana/Indianapolis
Update utils.py
Give raw cpu_time value, instead of normalized one (to number of vcpus),
as documented.
Move the normalization to cpu_usage calculation. At the same time, add
cpu_usage_raw without it, if anyone needs it.
QubesOS/qubes-issues#4531
* origin/pr/273:
tests: check importing empty data into ReflinkVolume
tests: check importing empty data into ThinVolume
tests: check importing empty data into FileVolume
tests: improve cleanup after LVM tests
During regular VM shutdown, the VM should sync() anyway. (And
admin.vm.volume.Import does fdatasync(), which is also fine.) But let's
be extra careful.
This is needed as a consequence of d8b6d3ef ("Make add_pool/remove_pool
coroutines, allow Pool.{setup,destroy} as coroutines"), but there hasn't
been any problem so far because no storage driver implemented pool
setup() as a coroutine.
Revert to use umount -l in storage tests cleanup. With fixed permissions
in 4234fe51 "tests: fix cleanup after reflink tests", it shouldn't cause
issues anymore, but apparently on some systems test cleanup fails
otherwise.
Reported by @rustybird
This reverts commit b6f77ebfa1.
There were (at least) five ways for the volume's nominal size and the
volume image file's actual size to desynchronize:
- loading a stale qubes.xml if a crash happened right after resizing the
image but before saving the updated qubes.xml (-> previously fixed)
- restarting a snap_on_start volume after resizing the volume or its
source volume (-> previously fixed)
- reverting to a differently sized revision
- importing a volume
- user tinkering with image files
Rather than trying to fix these one by one and hoping that there aren't
any others, override the volume size getter itself to always update from
the image file size. (If the getter is called though the storage API, it
takes the volume lock to avoid clobbering the nominal size when resize()
is running concurrently.)
And change the volume lock from an asyncio.Lock to a threading.Lock -
locking is now handled before coroutinization.
This will allow the coroutinized resize() and a new *not* coroutinized
size() getter from one of the next commits ("storage/reflink: preferably
get volume size from image size") to both run under the volume lock.
Successfully resize volumes without any currently existing image file,
e.g. cleanly stopped volatile volumes: Just update the nominal size in
this case.
Calling qrexec service dom0->dom0 can be useful when handling things
that can run in dom0 or other domain. This makes the interface uniform.
Example use cases include GUI VM and Audio VM.
Ask qubesd for admin.vm.Console call. This allows to intercept it with
admin-permission event. While at it, extract tty path extraction to
python, where libvirt domain object is already available.
FixesQubesOS/qubes-issues#5030
The initializer of the class DispVM first calls the initializer of the
QubesVM class, which among other things sets properties as specified in
kwargs, and then copies over the properties of the template. This can
lead to properties passed explicitly by the caller through kwargs being
overwritten.
Hence only clone properties of the template that are still set to
default in the DispVM.
FixesQubesOS/qubes-issues#4556
If Firefox is started for the first time, it will open both requested
page and its welcome page. This means closing the window will trigger a
confirmation about closing multiple tabs. Handle this.
Disk usage may change dynamically not only at VM start/stop. Refresh the
size cache before checking usage property, but no more than once every
30sec (refresh interval of disk space widget)
FixesQubesOS/qubes-issues#4888
If unmount is going to fail, let it do so explicitly, instead of hiding
the failure now, and observing it later at rmdir.
And if it fails, lets report what process is using that mount point.
Xenial environment has much newer GTK/Glib. For those test to run, few
more changes are needed:
- relevant GTK packages installed
- X server running (otherwise GTK terminate the process on module
import...)
- enable system side packages in virtualenv set by travis
Global properties should be loaded in stage 3, mark them as such.
Otherwise they are not loaded at all.
This applies to stats_interval and check_updates_vm. Others were
correct.
FixesQubesOS/qubes-issues#4856
Setting template_for_dispvms=False will at least prevent starting
(already existing) DispVMs based on it. Those should be first removed.
Add also tests for this case.
If kernel package ships default-kernelopts-common.txt file, use that
instead of hardcoded Linux-specific options.
For Linux kernel it may include xen_scrub_pages=0 option, but only if
initrd shipped with this kernel re-enable this option later.
QubesOS/qubes-issues#4839QubesOS/qubes-issues#4736
Return meaningful value for kernels_dir if VM has no 'kernel' volume.
Right now it's mostly useful for tests, but could be also used for new
VM classes which doesn't have modules.img, but still use dom0-provided
kernel.
First of all, do not try to call those services in VMs not having qrexec
installed - for example Windows VMs without qubes tools.
Then, even if service call fails for any other reason, only log it but
do not prevent other services from being called. A single uncooperative
VM should generally be able only to hurt itself, not break other VMs
during suspend.
FixesQubesOS/qubes-issues#3489
Since we have more reliable domain-shutdown event delivery (it si
guaranteed to be delivered before subsequent domain start, even if
libvirt fails to report it), it's better to move detach_network call to
domain-shutdown handler. This way, frontend domain will see immediately
that the backend is gone. Technically it already know that, but at least
Linux do not propagate that anywhere, keeping the interface up,
seemingly operational, leading to various timeouts.
Additionally, by avoiding attach_network call _just_ after
detach_network call, it avoids various race conditions (like calling
cleanup scripts after new device got already connected).
While libvirt itself still doesn't cleanup devices when the backend
domain is gone, this will emulate it within qubesd.
FixesQubesOS/qubes-issues#3642FixesQubesOS/qubes-issues#1426
Pool setup/destroy may be a time consuming operation, allow them to be
asynchronous. Fortunately add_pool and remove_pool are used only through
Admin API, so the change does not require modification of other
components.
Boolean properties require specific setter to properly handle literal
"True" and "False" values. Previously it required all bool properties to
include 'setter=qubes.property.bool' in addition to 'type=bool'.
This fixes loading some boolean properties from qubes.xml. Specifically
at least include_in_backups on DispVM class lacked setter, which
resulted in property being reset to True automatically on qubesd
restart.
FixesQubesOS/qubes-issues#4831
If default-kernelopts-pci.txt is present, it will override default
built-in kernelopts for the VMs with PCI device assigned.
Similarly if default-kernelopts-nopci.txt is present, it will override
default kernelopts for VMs without PCI devices.
For template-based VMs, kernelopts of the template takes precedence over
default-kernelopts-nopci.txt but not default-kernelopts-pci.txt.
FixesQubesOS/qubes-issues#4839
If a specific DVM template is used for given DispVM, make new DispVMs
called from it use the same DVM template (unless explicitly overridden).
This prevent various isolation bypass cases, like using a chain of
DispVMs to access network.
Look for the first updateable template up in the template chain, instead
of going just one level up. Especially this applies to
DispVM -> AppVM -> TemplateVM case.
If DispVM reports available updates, 'updates-available'
flag should be set on relevant TemplateVM, not AppVM (*-dvm).
Include test for the new case.
FixesQubesOS/qubes-issues#3736
Instead of checking if domain is still running/paused, try to kill it
anyway and ignore appropriate exception. Otherwise domain could die
before the check and killing.
some-vm-root is a valid VM name, and in that case it's volume can be
named some-vm-root-private. Do not let it confuse revision listing,
check for unexpected '-' in volume revision number.
The proper solution would be to use different separator, that is not
allowed in VM names. But that would require migration code that is
undesired in the middle of stable release life cycle.
FixesQubesOS/qubes-issues#4680
* tests-20181223:
tests: drop expectedFailure from qubes_desktop_run test
tests: grub in HVM qubes
tests: update dom0_update for new updates available flag
tests: regression test LVM listing code
tests/extra: wrap ProcessWrapper.wait() to be asyncio-aware
tests: adjust backupcompat for new maxmem handling
This commit resolves a bug which causes strings such as "350MiB" to be
rejected by parse_size, due to the fact that parse_size changes the case
of letters in the input string ("350MiB") to uppercase ("350MIB"), but
fails to do the same for the elements of the units conversion table.
The correction is simple: Apply the same case change to the units
table elements before comparison.
The user of ExtraTestCase don't need to know anything about asyncio.
vm.run().wait() normally is a coroutine, but provide a wrapper that
handle asyncio.
This fixes FD leak in input proxy tests.
Since 4dc86310 "Use maxmem=0 to disable qmemman, add more automation to
it" meminfo-writer service is not accessible directly. maxmem property
is used to encode memory management instead.
- Two new methods: .features.check_with_adminvm() and
.check_with_template_and_adminvm(). Common code refactored.
- Two new AdminAPI calls to take advantage of the methods:
- admin.vm.feature.CheckWithAdminVM
- admin.vm.feature.CheckWithTemplateAndAdminVM
- Features manager moved to separate module in anticipation of features
on app object in R5.0. The attribute Features.vm renamed to
Features.subject.
- Documentation, tests.
* devel-20181205:
vm/dispvm: fix /qubes-vm-presistence qubesdb entry
vm/mix/net: prevent setting provides_network=false if qube is still used
tests: updates-available notification
tests/network: reduce code duplication
tests: listen on 'misc' socket too
First install test-pkg-1.0, then add test-pkg-1.1 to repo and check if
updates-available flag is set. Then install updates and check if the
flag is cleared.
QubesOS/qubes-issues#2009
The new property is meant for management stack (Salt) to set which DVM
template should be used to maintain given VM. Since the DispVM based on
it will be given ultimate control over target VM (qubes.VMShell
service), it should be trusted. The one pointed to by default_dispvm
not necessary is one.
The property defaults to the value from the template (if any), and then
to a global management_dispvm property. By default it is set to None.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Migrate meminfo-writer=False service setting to maxmem=0 as a method to
disable dynamic memory management. Remove the service from vm.features
dict in the process.
Additionally, translate any attempt to set the service.meminfo-writer
feature to either setting maxmem=0 or resetting it to the default (which
is memory balancing enabled if supported by given domain). This is to at
least partially not break existing tools using service.meminfo-writer as
a way to control dynamic memory management. This code does _not_ support
reading service.meminfo-writer feature state to get the current state of
dynamic memory management, as it would require synchronizing with all
the factors affecting its value. One of main reasons for migrating to
maxmem=0 approach is to avoid the need of such synchronization.
QubesOS/qubes-issues#4480
Use maxmem=0 for disabling dynamic memory balance, instead of cryptic
service.meminfo-writer feature. Under the hood, meminfo-writer service
is also set based on maxmem property (directly in qubesdb, not
vm.features dict).
Having this as a property (not "feature"), allow to have sensible
handling of default value. Specifically, disable it automatically if
otherwise it would crash a VM. This is the case for:
- domain with PCI devices (PoD is not supported by Xen then)
- domain without balloon driver and/or meminfo-writer service
The check for the latter is heuristic (assume presence of 'qrexec' also
can indicate balloon driver support), but it is true for currently
supported systems.
This also allows more reliable control of libvirt config: do not set
memory != maxmem, unless qmemman is enabled.
memory != maxmem only makes sense if qmemman for given domain is
enabled. Besides wasting some domain resources for extra page tables
etc, for HVM domains this is harmful, because maxmem-memory difference
is made of Popupate-on-Demand pool, which - when depleted - will kill
the domain. This means domain without balloon driver will die as soon
as will try to use more than initial memory - but without balloon driver
it sees maxmem memory and doesn't know about the lower limit.
FixesQubesOS/qubes-issues#4135
It makes a lot of sense to call long-running operations in that event
handler, including calling back into the VM. Allow that by using
fire_event_async, not just fire_event.
Also, document the event.
Commit 15cf593bc5 "tests/lvm: fix checking
lvm pool existence" attempted to fix handling '-' in pool name by using
/dev/VG/LV symlink. But those are not created for thin pools. Change
back to /dev/mapper, but include '-' mangling.
Related QubesOS/qubes-issues#4332
Restore old code for calculating subdir within the archive. The new one
had two problems:
- set '/' for empty input subdir - which caused qubes.xml.000 to be
named '/qubes.xml.000' (and then converted to '../../qubes.xml.000');
among other things, this results in the wrong path used for encryption
passphrase
- resolved symlinks, which breaks calculating path for any symlinks
within VM's directory (symlinks there should be treated as normal files
to be sure that actual content is included in the backup)
This partially reverts 4e49b951ce.
FixesQubesOS/qubes-issues#4493
vm.kill() will try to get vm.startup_lock, so it can't be called while
holding it already.
Fix this by extracting vm._kill_locked(), which expect the lock to be
already taken by the caller.
* origin/pr/239:
storage: fix NotImplementedError message for import_data()
storage/reflink: make resize()/import_volume() more readable
storage/reflink: unblock import_data() and import_data_end()
Try to collect more details about why the test failed. This will help
only if qvm-open-in-dvm exist early. On the other hand, if it hang, or
remote side fails to find the right editor (which results in GUI error
message), this change will not provide any more details.
First boot of whonix-ws based VM take extended period of time, because
a lot of files needs to be copied to private volume. This takes even
more time, when verbose logging through console is enabled. Extend the
timeout for that.
If domain is set to autostart, qubes-vm@ systemd service is used to
start it at boot. Cleanup the service when domain is removed, and
similarly enable the service when domain is created and already have
autostart=True.
FixesQubesOS/qubes-issues#4014
Cleanup VMs in template reverse topological order, not network one.
Network can be set to None to break dependency, but template can't. For
netvm to be changed, kill VMs first (kill doesn't check network
dependency), so netvm change will not trigger side effects (runtime
change, which could fail).
This fixes cleanup for tests creating custom templates - previously
order was undefined and if template was tried removed before its child
VMs, it fails. All the relevant files were removed later anyway, but it
lead to python objects leaks.
Cleaning up after domain shutdown (domain-stopped and domain-shutdown
events) relies on libvirt events which may be unreliable in some cases
(events may be processed with some delay, of if libvirt was restarted in
the meantime, may not happen at all). So, instead of ensuring only
proper ordering between shutdown cleanup and next startup, also trigger
the cleanup when we know for sure domain isn't running:
- at vm.kill() - after libvirt confirms domain was destroyed
- at vm.shutdown(wait=True) - after successful shutdown
- at vm.remove_from_disk() - after ensuring it isn't running but just
before actually removing it
This fixes various race conditions:
- qvm-kill && qvm-remove: remove could happen before shutdown cleanup
was done and storage driver would be confused about that
- qvm-shutdown --wait && qvm-clone: clone could happen before new content was
commited to the original volume, making the copy of previous VM state
(and probably more)
Previously it wasn't such a big issue on default configuration, because
LVM driver was fully synchronous, effectively blocking the whole qubesd
for the time the cleanup happened.
To avoid code duplication, factor out _ensure_shutdown_handled function
calling actual cleanup (and possibly canceling one called with libvirt
event). Note that now, "Duplicated stopped event from libvirt received!"
warning may happen in normal circumstances, not only because of some
bug.
It is very important that post-shutdown cleanup happen when domain is
not running. To ensure that, take startup_lock and under it 1) ensure
its halted and only then 2) execute the cleanup. This isn't necessary
when removing it from disk, because its already removed from the
collection at that time, which also avoids other calls to it (see also
"vm/dispvm: fix DispVM cleanup" commit).
Actually, taking the startup_lock in remove_from_disk function would
cause a deadlock in DispVM auto cleanup code:
- vm.kill (or other trigger for the cleanup)
- vm.startup_lock acquire <====
- vm._ensure_shutdown_handled
- domain-shutdown event
- vm._auto_cleanup (in DispVM class)
- vm.remove_from_disk
- cannot take vm.startup_lock again
First unregister the domain from collection, and only then call
remove_from_disk(). Removing it from collection prevent further calls
being made to it. Or if anything else keep a reference to it (for
example as a netvm), then abort the operation.
Additionally this makes it unnecessary to take startup lock when
cleaning it up in tests.
vm.shutdown(wait=True) waited indefinitely for the shutdown, which makes
useless without some boilerplate handling the timeout. Since the timeout
may depend on the operating system inside, add a per-VM property for it,
with value inheritance from template and then from global
default_shutdown_timeout property.
When timeout is reached, the method raises exception - whether to kill
it or not is left to the caller.
FixesQubesOS/qubes-issues#1696
LVM operations can take significant amount of time. This is especially
visible when stopping a VM (`vm.storage.stop()`) - in that time the
whole qubesd freeze for about 2 seconds.
Fix this by making all the ThinVolume methods a coroutines (where
supported). Each public coroutine is also wrapped with locking on
volume._lock to avoid concurrency-related problems.
This all also require changing internal helper functions to
coroutines. There are two functions that still needs to be called from
non-coroutine call sites:
- init_cache/reset_cache (initial cache fill, ThinPool.setup())
- qubes_lvm (ThinVolume.export()
So, those two functions need to live in two variants. Extract its common
code to separate functions to reduce code duplications.
FixesQubesOS/qubes-issues#4283
Both vm.create_on_disk() and vm.start() are coroutines. Tests in this
class didn't run them, so basically didn't test anything.
Wrap couroutine calls with self.loop.run_until_complete().
Additionally, don't fail if LVM pool is named differently.
In that case, the test is rather sily, as it probably use the same pool
for source and destination (operation already tested elsewhere). But it
isn't a reason for failing the test.
On some storage pools this operation can also be time consuming - for
example require creating temporary volume, and volume.create() already
can be a coroutine.
This is also requirement for making common code used by start()/create()
etc be a coroutine, otherwise neither of them can be and will block
other operations.
Related to QubesOS/qubes-issues#4283
Support 'supported-service.*' features requests coming from VMs. Set
such features directly (allow only value '1') and remove any not
reported in given call. This way uninstalling package providing given
service will automatically remove related 'supported-service...'
feature.
FixesQubesOS/qubes-issues#4402
Make it clear that volume creation fails because it needs to be in the
same pool as its parent. This message is shown in context of `qvm-create
-p root=MyPool` for example and the previous message didn't make sense
at all.
FixesQubesOS/qubes-issues#3438
R3 format had limitation of ~40 rules per VM. Do not generate compat
rules (possibly hitting that limitation) if new format, free of that
limitation is supported.
FixesQubesOS/qubes-issues#1570FixesQubesOS/qubes-issues#4228
Make sure events are sent to specific window found with xdotool search,
not the one having the focus. In case of Whonix, it can be first
connection wizard or whonixcheck report.
Searching based on class is used in many tests, searching by class, not
only by name in wait_for_window will allow to reduce code duplication.
While at it, improve it additionally:
- avoid active waiting for window and use `xdotool search --sync` instead
- return found window id
- add wait_for_window_coro() for use where coroutine is needed
- when waiting for window to disappear, check window id once and wait
for that particular window to disappear (avoid xdotool race
conditions on window enumeration)
Besides reducing code duplication, this also move various xdotool
imperfections handling into one place.
Allow removing VMs based on multiple prefixes at once. Removing them
separately doesn't handle all the dependencies (default_netvm, netvm)
correctly. This is needed for backup compatibility tests, where VMs are
created with `test-` prefix and `disp-tests-`. Additionally backup code
will create `disp-no-netvm`, which also may need to be removed.
If QUBES_TEST_TEMPLATES or QUBES_TEST_LOAD_ALL is set, create testcases
on modules import, instead of waiting until `load_tests` is called.
The `QUBES_TEST_TEMPLATES` doesn't require `qubes.xml` access, so it
should be safe to do regardless of the environment. The
`QUBES_TEST_LOAD_ALL` force loading tests (and reading `qubes.xml`)
regardless.
This is useful for test runners not supporting load_tests protocol. Or
with limited support - for example both default `unittest` runner and
`nose2` can either use load_tests protocol _or_ select individual tests.
Setting any of those variable allow to run a single test with those
runners.
With this feature used together load_tests protocol, tests could be
registered twice. Avoid this by not listing already defined test classes
in create_testcases_for_templates (according to load_tests protocol,
those should already be registered).
Allow easily list templates to be tested, without enumerating all the
test classes. This is especially useful with nose2 runner which can't
use load tests protocol _and_ select subset of tests.
If any object is leaked, QubesTestCase.cleanup_gc() raises an exception,
which have leaked objects list referenced in its traceback. This happens
after cleanup_traceback(), so isn't cleaned, causing cleanup_gc() fail
for all the further tests in the same test run.
Avoid this, by dropping list just before checking if any object is
leaked.
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.
This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
qvm-sync-clock no longer fetches time from the network, by design.
So, lets not break clockvm's time and check only if everything else
correctly synchronize with it.
It isn't enough to wait for window to disappear, the service may still
be running. And if it is, test cleanup logic will complain about FD
leak.
To avoid deadlock on some test failure, do it with a timeout.
volume.path and volume.export() refer to the same thing in lvm_thin and
'file', but not in file-reflink (where volume.path is the -dirty.img,
which doesn't exist if the volume is not started).
Use the file-reflink storage driver if /var/lib/qubes is on a filesystem
that supports reflinks, e.g. when the btrfs layout was selected in
Anaconda. If it doesn't support reflinks (or if detection fails, e.g. in
an unprivileged test environment), use 'file' as before.
_wait_and_reraise() is similar to asyncio.gather(), but it preserves the
current behavior of waiting for all futures and only _then_ reraising
the first exception (if there is any) in line.
Also switch Storage.create() and Storage.clone() to _wait_and_reraise().
Previously, they called asyncio.wait() and implicitly swallowed all
exceptions.
With that syntax, the default timestamp would have been from the time of
the function's definition (not invocation). But all callers are passing
an explicit timestamp anyway.
Convert create(), verify(), remove(), start(), stop(), revert(),
resize(), and import_volume() into coroutine methods, via a decorator
that runs them in the event loop's thread-based default executor.
This reduces UI hangs by unblocking the event loop, and can e.g. speed
up VM starts by starting multiple volumes in parallel.
Instead of raising a NotImplementedError, just return self like 'file'
and lvm_thin. This is needed when Storage.clone() is modified in another
commit* to no longer swallow exceptions.
* "storage: factor out _wait_and_reraise(); fix clone/create"
Import volume data to a new _path_import (instead of _path_dirty) before
committing to _path_clean. In case the computer crashes while an import
operation is running, the partially written file should not be attached
to Xen on the next volume startup.
Use <name>-import.img as the filename like 'file' does, to be compatible
with qubes.tests.api_admin/TC_00_VMs/test_510_vm_volume_import.
When the AT_REPLACE flag for linkat() finally lands in the Linux kernel,
_replace_file() can be modified to use unnamed (O_TMPFILE) tempfiles.
Until then, make sure stale tempfiles from previous crashes can't hang
around for too long.
It's sort of useful to be able to revert a volume that has only ever
been started once to its empty state. And the lvm_thin driver allows it
too, so why not.
System tests are fragile for any object leaks, especially those holding
open files. Instead of wrapping all tests with try/finally removing
those local variables (as done in qubes.tests.integ.backup for example),
apply generic solution: clean all traceback objects from local
variables. Those aren't used to generate text report by either test
runner (qubes.tests.run and nose2). If one wants to break into debugger
and inspect tracebacks interactively, needs to comment out call to
cleanup_traceback.
* qubesos/pr/228:
storage/lvm: filter out warning about intended over-provisioning
tests: fix getting kernel package version inside VM
tests/extra: add start_guid option to VMWrapper
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
vm/qubesvm: check if all required devices are available before start
storage/lvm: fix reporting lvm command error
storage/lvm: save pool's revision_to_keep property