Convert create(), verify(), remove(), start(), stop(), revert(),
resize(), and import_volume() into coroutine methods, via a decorator
that runs them in the event loop's thread-based default executor.
This reduces UI hangs by unblocking the event loop, and can e.g. speed
up VM starts by starting multiple volumes in parallel.
Instead of raising a NotImplementedError, just return self like 'file'
and lvm_thin. This is needed when Storage.clone() is modified in another
commit* to no longer swallow exceptions.
* "storage: factor out _wait_and_reraise(); fix clone/create"
Import volume data to a new _path_import (instead of _path_dirty) before
committing to _path_clean. In case the computer crashes while an import
operation is running, the partially written file should not be attached
to Xen on the next volume startup.
Use <name>-import.img as the filename like 'file' does, to be compatible
with qubes.tests.api_admin/TC_00_VMs/test_510_vm_volume_import.
When the AT_REPLACE flag for linkat() finally lands in the Linux kernel,
_replace_file() can be modified to use unnamed (O_TMPFILE) tempfiles.
Until then, make sure stale tempfiles from previous crashes can't hang
around for too long.
It's sort of useful to be able to revert a volume that has only ever
been started once to its empty state. And the lvm_thin driver allows it
too, so why not.
System tests are fragile for any object leaks, especially those holding
open files. Instead of wrapping all tests with try/finally removing
those local variables (as done in qubes.tests.integ.backup for example),
apply generic solution: clean all traceback objects from local
variables. Those aren't used to generate text report by either test
runner (qubes.tests.run and nose2). If one wants to break into debugger
and inspect tracebacks interactively, needs to comment out call to
cleanup_traceback.
* qubesos/pr/228:
storage/lvm: filter out warning about intended over-provisioning
tests: fix getting kernel package version inside VM
tests/extra: add start_guid option to VMWrapper
vm/qubesvm: fire 'domain-start-failed' event even if fail was early
vm/qubesvm: check if all required devices are available before start
storage/lvm: fix reporting lvm command error
storage/lvm: save pool's revision_to_keep property
Needed for event-driven domains-tray UI updating and anti-GUI-DoS
usability improvements.
Catches errors from event handlers to protect libvirt, and logs to
main qubesd logger singleton (by default meaning systemd journal).
Multiple properties are related to system installed inside the VM, so it
makes sense to have them the same for all the VMs based on the same
template. Modify default value getter to first try get the value from a
template (if any) and only if it fails, fallback to original default
value.
This change is made to those properties:
- default_user (it was already this way)
- kernel
- kernelopts
- maxmem
- memory
- qrexec_timeout
- vcpus
- virt_mode
This is especially useful for manually installed templates (like
Windows).
Related to QubesOS/qubes-issues#3585
Handle 'os' feature - if it's Windows, then set rpc-clipboard feature.
Handle 'gui-emulated' feature - request for specifically stubdomain GUI.
With 'gui' feature it is only possible to enable gui-agent based on, or
disable GUI completely.
Handle 'default-user' - verify it for weird characters and set
'default_user' property (if wasn't already set).
QubesOS/qubes-issues#3585
* lvm-snapshots:
tests: fix handling app.pools iteration
storage/lvm: add repr(ThinPool) for more meaningful test reports
tests: adjust for variable volume path
api/admin: expose volume path in admin.vm.volume.Info
tests: LVM: import, list_volumes, volatile volume, snapshot volume
tests: collect all SIGCHLD before cleaning event loop
storage/lvm: use temporary volume for data import
tests: ThinVolume.revert()
tests: LVM volume naming migration, and new naming in general
storage/lvm: improve handling interrupted commit
Resolve:
- no-else-return
- useless-object-inheritance
- useless-return
- consider-using-set-comprehension
- consider-using-in
- logging-not-lazy
Ignore:
- not-an-iterable - false possitives for asyncio coroutines
Ignore all the above in qubespolicy/__init__.py, as the file will be
moved to separate repository (core-qrexec) - it already has a copy
there, don't desynchronize them.
Since (for LVM at least) path is dynamic now, add information about it
to volume info. This is not very useful outside of dom0, but in dom0 it
can be very useful for various scripts.
This will disclose current volume revision id, but it is already
possible to deduce it from snapshots list.
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
- import operation can be aborted, without data loss
- importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
- import operation can be reverted - it create separate revision,
similar to start/stop
- easier to prevent qube from starting during import operation
- template still can be used when importing new version
QubesOS/qubes-issues#2256
First rename volume to backup revision, regardless of revisions_to_keep,
then rename -snap to current volume. And only then remove backup
revision (if exceed revisions_to_keep). This way even if commit
operation is interrupted, there is still a volume with the data.
This requires also adjusting few functions to actually fallback to most
recent backup revision if the current volume isn't found - create
_vid_current property for this purpose.
Also, use -snap volume for clone operation and commit it normally later.
This makes it safer to interrupt or even revert.
QubesOS/qubes-issues#2256
Fire 'domain-start-failed' even even if failure occurred during
'domain-pre-start' event. This will make sure if _anyone_ have seen
'domain-pre-start' event, will also see 'domain-start-failed'. In some
cases it will look like spurious 'domain-start-failed', but it is safer
option than the alternative.
Fail the VM start early if some persistently-assigned device is missing.
This will both save time and provide clearer error message.
FixesQubesOS/qubes-issues#3810
is_outdated() may be not supported by given volume pool driver. In that
case skip is_outdated information, instead of crashing the call.
FixesQubesOS/qubes-issues#3767
Before waiting for remaining tasks on event loop (including libvirt
events), make sure all destroyed objects are really destroyed. This is
especially important for libvirt connections, which gets cleaned up only
when appropriate destructor (__del__) register a cleanup callback and it
gets called by the loop.
Use VM's actual IP address as a gateway for other VMs, instead of
hardcoded link-local address. This is important for sys-net generated
ICMP diagnostics packets - those must _not_ have link-local source
address, otherwise wouldn't be properly forwarded back to the right VM.
If domain die when trying to kill it, qubesd may loose a race and try to
kill it anyway. Handle libvirt exception in that case and conver it to
QubesVMNotStartedError - as it would be if qubesd would win the race.
FixesQubesOS/qubes-issues#3755
Scripts do parse its output sometimes (especially `lvs`), so make sure
we always gets the same format, regardless of the environment. Including
decimal separator.
FixesQubesOS/qubes-issues#3753
Don't rely on timestamps to sort revisions - the clock can go backwards
due to time sync. Instead, use a monotonically increasing natural number
as the revision ID.
Old revision example: private.img@2018-01-02-03T04:05:06Z (ignored now)
New revision example: private.img.123@2018-01-02-03T04:05:06Z
* devel-storage-fixes:
storage/file: use proper exception instead of assert
storage/file: import data into temporary volume
storage/lvm: check for LVM LV existence and type when creating ThinPool
storage/lvm: fix size reporting just after creating LV
Similar to LVM changes, this fixes/improves multiple things:
- no old data visible in the volume
- failed import do not leave broken volume
- parially imported data not visible to running VM
QubesOS/qubes-issues#3169
* storage-properties:
storage: use None for size/usage properties if unknown
tests: call search_pool_containing_dir with various dirs and pools
storage: make DirectoryThinPool helper less verbose, add sudo
api/admin: add 'included_in' to admin.pool.Info call
storage: add Pool.included_in() method for checking nested pools
storage: move and generalize RootThinPool helper class
storage/kernels: refuse changes to 'rw' and 'revisions_to_keep'
api/admin: implement admin.vm.volume.Set.rw method
api/admin: include 'revisions_to_keep' and 'is_outdated' in volume info
It may happen that one pool is inside a volume of other pool. This is
the case for example for varlibqubes pool (file driver,
dir_path=/var/lib/qubes) and default lvm pool (lvm_thin driver). The
latter include whole root filesystem, so /var/lib/qubes too.
This is relevant for proper disk space calculation - to not count some
space twice.
QubesOS/qubes-issues#3240QubesOS/qubes-issues#3241
This pool driver support only rw=False and revisions_to_keep=0 volumes.
Since there is API for changing those properties dynamically, block it
at pool driver level, instead of silently ignoring them.
* qubesos/pr/200:
Removed self.rules != old_rules
Avoid UTC datetime
Wrong init var to bool and missing call to total_seconds()
FixesQubesOS/qubes-issues#3661
* qubesos/pr/201:
storage/reflink: reorder start() to be more readable
storage/reflink: simplify
storage/reflink: let _remove_empty_dir() ignore ENOTEMPTY
storage/reflink: show size in refused volume shrink message
storage/reflink: fsync() after resizing existing file
Since Volume.is_outdated() is a method, not a property, add a function
for handling serialization. And at the same time, fix None serialization
(applicable to 'source' property).
QubesOS/qubes-issues#3256
After lot of testing it does not work properly. Could do something more
sophisticated but since calling save() is safe and probably lightweigth it is
not worth probably.
Some handlers may want to call into other VMs (or even the one asking),
but vm.run() functions are coroutines, so needs to be called from
another coroutine. Allow for that.
Also fix typo in documentation.
Some kernels (like pvgrub2) may not provide modules.img and it isn't an
error. Don't break VM startup in that case, skip that device instead.
FixesQubesOS/qubes-issues#3563
"Cannot execute qrexec-daemon!" error is very misleading for a startup
timeout error, make it clearer. This rely on qrexec-daemon using
distinct exit code for timeout error, but even without that, include its
stderr in the error message.
* qubesos/pr/198:
backup.py: add vmN/empty file if no other files to backup
Allow include=None to be passed to admin.backup.Info
Add include_in_backups property for AdminVM
Use !auto_cleanup as DispVM include_in_backups default
Lots of code expects the VM to be Halted after receiving one of these
events, but it could also be Dying or Crashed. Get rid of the Dying case
at least, by waiting until the VM has transitioned out of it.
Fixes e.g. the following DispVM cleanup bug:
$ qvm-create -C DispVM --prop auto_cleanup=True -l red dispvm
$ qvm-start dispvm
$ qvm-shutdown --wait dispvm # this won't remove dispvm
$ qvm-start dispvm
$ qvm-kill dispvm # but this will
* qubesos/pr/194:
reflink: style fix
storage: typo fix
lvm_thin: _remove_revisions() on revisions_to_keep==0
lvm_thin: don't purge one revision too few
lvm_thin: really remove revision
lvm_thin: fill in volume's revisions_to_keep from pool
Using '$' is easy to misuse in shell scripts, shell commands etc. After
all this years, lets abandon this dangerous character and move to
something safer: '@'. The choice was made after reviewing specifications
of various shells on different operating systems and this is the
character that have no special meaning in none of them.
To preserve compatibility, automatically translate '$' to '@' when
loading policy files.
If revisions_to_keep is 0, it may nevertheless have been > 0 before, so
it makes sense to call _remove_revisions() and hold back none (not all)
of the revisions in this case.
* qubesos/pr/190:
Missed one test, adding default-user in assert for test test_621_qdb_vm_with_network in TC_90
replaced underscore by dash and update test accordingly
Updated assert content for test_620_qdb_standalone in TC_90_QubesVM
Added the default_user property from the Qube to the qubesdb so it is available when starting X. This is the 1st part of a fix for issue https://github.com/QubesOS/qubes-issues/issues/2372
This adds the file-reflink storage driver. It is never selected
automatically for pool creation, especially not the creation of
'varlibqubes' (though it can be used if set up manually).
The code is quite small:
reflink.py lvm.py file.py + block-snapshot
sloccount 334 lines 447 (134%) 570 (171%)
Background: btrfs and XFS (but not yet ZFS) support instant copies of
individual files through the 'FICLONE' ioctl behind 'cp --reflink'.
Which file-reflink uses to snapshot VM image files without an extra
device-mapper layer. All the snapshots are essentially freestanding;
there's no functional origin vs. snapshot distinction.
In contrast to 'file'-on-btrfs, file-reflink inherently avoids
CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs'
private volumes are CoW. (And turning off the lower, filesystem-level
CoW for 'file'-on-btrfs images would turn off data checksums too, i.e.
protection against bit rot.)
Also in contrast to 'file', all storage features are supported,
including
- any number of revisions_to_keep
- volume.revert()
- volume.is_outdated
- online fstrim/discard
Example tree of a file-reflink pool - *-dirty.img are connected to Xen:
- /var/lib/testpool/appvms/foo/volatile-dirty.img
- /var/lib/testpool/appvms/foo/root-dirty.img
- /var/lib/testpool/appvms/foo/root.img
- /var/lib/testpool/appvms/foo/private-dirty.img
- /var/lib/testpool/appvms/foo/private.img
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
- /var/lib/testpool/appvms/bar/...
- /var/lib/testpool/appvms/...
- /var/lib/testpool/template-vms/fedora-26/...
- /var/lib/testpool/template-vms/...
It looks similar to a 'file' pool tree, and in fact file-reflink is
drop-in compatible:
$ qvm-shutdown --all --wait
$ systemctl stop qubesd
$ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
$ systemctl start qubesd
$ sudo rm -f /path/to/pool/*/*/*-cow.img*
If the user tries to create a fresh file-reflink pool on a filesystem
that doesn't support reflinks, qvm-pool will abort and mention the
'setup_check=no' option. Which can be passed to force a fallback on
regular sparse copies, with of course lots of time/space overhead. The
same fallback code is also used when initially cloning a VM from a
foreign pool, or from another file-reflink pool on a different
mountpoint.
'journalctl -fu qubesd' will show all file-reflink copy/rename/remove
operations on VM creation/startup/shutdown/etc.
When some expiring rules are present, it is necessary to reload firewall
when those rules expire. Previously systemd timer was used to trigger
this action, but since we have own daemon now, it isn't necessary
anymore - use this daemon for that.
Additionally automatically removing expired rules was completely broken
in R4.0.
FixesQubesOS/qubes-issues#1173
Even when volume is not persistent (like TemplateBasedVM:root), it
should be resizeable. Just the new size, similarly to the volume
content, will be reverted after qube shutdown.
Additionally, when VM is running, volume resize should affect _only_ its
temporary snapshot. This way resize can be properly reverted together
with actual volume changes (which include resize2fs call).
FixesQubesOS/qubes-issues#3519
- catch both QubesException and libvirtError - do not kill starting VM
just because an error while connecting _other_ VMs to it
- try to detach network first (and do not abort on error) - if
libvirt/libxl will manage to cleanup stale interface this way, the
attach operation below may succeed.
FixesQubesOS/qubes-issues#3163
1. Make sure VMs are started after dom0 actual memory usage is reported
to qmemman, otherwise dom0 will hold 4GB, even if just a little over 1GB
is needed at that time.
2. Request only vm.memory MB from qmemman, instead of vm.maxmem. While
HVM with PCI devices indeed do not support populate-on-demand, this is
already handled in libvirt XML.
The later may often cause VM startup fail on systems with 8GB of memory,
because maxmem is 4GB there and with dom0 keeping the other 4GB (see
point 1) there is not enough memory to start any sych VM.
FixesQubesOS/qubes-issues#3462
* qubesos/pr/187:
Don't fail create/clone if /var/lib/qubes/TYPE/NAME/ exists
Make 'qvm-volume revert' really use the latest revision
Fix wrong mocks of Volume.revisions
* qubesos/pr/185:
vm: remove doc for non-existing event `monitor-layout-change`
vm: include tag/feature name in event name
events: add support for wildcard event handlers
admin.vm.volume.ListSnapshots returned volume revisions in undefined
order, but 'qvm-volume revert' assumes the list to be in chronological
order. Make that assumption true.
clear_outdated_error_markers crashes if memory stats are not retrieved
yet. In practice it crashes at the very first call during daemon
startup, making the whole qmemman unusable.
This fixes bf4306b815
qmemman: clear "not responding" flags when VM require more memory
QubesOS/qubes-issues#3265
- use capital letters in acronyms in documentation to match upstream
documentation.
- refuse to start a PVH with without kernel set - provide meaningful
error message
* qubesos/pr/180:
vm/qubesvm: default to PVH unless PCI devices are assigned
vm/qubesvm: expose 'start_time' property over Admin API
vm/qubesvm: revert backup_timestamp to '%s' format
doc: link qvm-device man page for qvm-block, qvm-pci, qvm-usb
* qubesos/pr/179:
qmemman: request VMs balloon down with 16MB safety margin
qmemman: clear "not responding" flags when VM require more memory
qmemman: slightly improve logging
qmemman: reformat code, especially comments
Human readable format `str(datetime.datetime)` is a nightmare for Admin
API level communication. Especially setting the property in a format
that it was read was not supported, and handling such format in
untrusted input handling code is a bad idea. Revert to a simple intiger
format.
It looks like Linux balloon driver do not always precisely respect
requested target memory, but perform some rounding. Also, in some cases
(HVM domains), VM do not see all the memory that Xen have assigned to it
- there are some additional Xen pools for internal usage.
Include 16MB safety margin in memory requests to account for those two
things. This will avoid setting "no_response" flag for most of VMs.
QubesOS/qubes-issues#3265
Clear slow_memset_react/no_progress flags when VM request more memory
than it have assigned. If there is some available, it may be given to
such VM, solving the original problem (not reacting to balloon down
request). In any case, qmemman algorithm should not try to take away
memory from under-provisioned VM.
FixesQubesOS/qubes-issues#3265
Add logging more info about each domain state:
- last requested target
- no_progress and slow_memset_react flags
This makes it unnecessary to log separately when those flags are cleared.
Rename events:
- domain-feature-set -> domain-feature-set:feature
- domain-feature-delete -> domain-feature-delete:feature
- domain-tag-add -> domain-tag-add:tag
- domain-tag-delete -> domain-tag-delete:tag
Make it consistent with property-* events. It makes more sense to
include tag/feature name in event name, so handler can watch a single
tag/feature - which is the most common case. Otherwise, most handlers
would begin with `if feature == '...'` anyway, wasting time on most
events.
In cases where multiple features/tags should be handled by a single
handler, it is now possible to register a handler with wildcard, for
example `domain-feature-set:*`.
Support registering handlers for more flexible wildcard events: not only
'*', but also 'something*'. This allows to register handlers for
'property-set:*' and such.
When creating a new VM of type DispVM without specifying any template
(e.g. "qvm-create --class DispVM --label red foo"), use default_dispvm.
Otherwise it would fail saying "Got empty response from qubesd."
Load integration tests from outside of core-admin repository, through
entry points.
Create wrapper for VM object to keep very basic compatibility with tests
written for core2. This means if test use only basic functionality
(vm.start(), vm.run()), the same test will work for both core2 and
core3. This is especially important for app-* repositories, where the
same version serves multiple Qubes branches.
This also hides asyncio usage from tests writer.
See QubesOS/qubes-issues#1800 for details on original feature.
Test base functions of dom0 module (creating VM, setting property) and
configuring system inside of VM (through DispVM). The later is done for
each available template (the process use salt installed in that
template, not copied from dom0).
QubesOS/qubes-issues#3316
When dom0 do not provide the kernel, it should also not set kernel
command line in libvirt config. Otherwise qemu in stubdom fails to start
because it get -append option without -kernel, which is illegal
configuration.
FixesQubesOS/qubes-issues#3339
There may be cases when VM providing the network to other VMs is started
later - for example VM restart. While this is rare case (and currently
broken because of QubesOS/qubes-issues#1426), do not assume it will
always be the case.
Add property for IPv6 address ('ip6'). Build default value similarly to
IPv4 - common prefix + QID or Disp ID (for DispVMs).
This all is disabled unless 'ipv6' feature is enabled. It is inherited
from netvm (not template).
Even when enabled, VM may decide to not use it - or simply not support
it.
QubesOS/qubes-issues#718
Allow using default feature value from netvm, not template. This makes
sense for network-related features like using tor, supporting ipv6 etc.
Similarly to check_with_template, expose it also on Admin API.
Having both default_netvm and default_fw_netvm cause a lot of confusion,
because it isn't clear for the user which one is used when. Additionally
changing provides_network property may also change netvm property, which
may be unintended effect. This as a whole make it hard to:
- cover all netvm-changing actions with policy for Admin API
- cover all netvm-changing events (for example to apply the change to
the running VM, or to check for netvm loops)
As suggested by @qubesuser, kill the default_fw_netvm property and
simplify the logic around it.
Since we're past rc1, implement also migration logic. And add tests for
said migration.
FixesQubesOS/qubes-issues#3247
* qubesos/pr/166:
create "lvm" pool using rootfs thin pool instead of hardcoding qubes_dom0-pool00
change default pool code to be fast
cache PropertyHolder.property_list and use O(1) property name lookups
remove unused netid code
cache isinstance(default, collections.Callable)
don't access netvm if it's None in visible_gateway/netmask
There were many cases were the check was missing:
- changing default_netvm
- resetting netvm to default value
- loading already broken qubes.xml
Since it was possible to create broken qubes.xml using legal calls, do
not reject loading such file, instead break the loop(s) by setting netvm
to None when loop is detected. This will be also useful if still not all
places are covered...
Place the check in default_netvm setter. Skip it during qubes.xml loading
(when events_enabled=False), but still keep it in setter, to _validate_ the
value before any property-* event got fired.
If HVM have PCI device, it can't use PoD, so need 'maxmem' memory to be
started. Request that much from qmemman.
Note that is is somehow independent of enabling or not dynamic memory
management for the VM (`service.meminfo-writer` feature). Even if VM
initially had assigned maxmem memory, it can be later ballooned down.
QubesOS/qubes-issues#3207
* 20171107-storage:
api/admin: add API for changing revisions_to_keep dynamically
storage/file: move revisions_to_keep restrictions to property setter
api/admin: hide dd statistics in admin.vm.volume.Import call
storage/lvm: fix importing different-sized volume from another pool
storage/file: fix preserving spareness on volume clone
api/admin: add pool size and usage to admin.pool.Info response
storage: add size and usage properties to pool object
* 20171107-tests-backup-api-misc:
test: make race condition on xterm close less likely
tests/backupcompatibility: fix handling 'internal' property
backup: fix handling target write error (like no disk space)
tests/backupcompatibility: drop R1 format tests
backup: use offline_mode for backup collection
qubespolicy: fix handling '$adminvm' target with ask action
app: drop reference to libvirt object after undefining it
vm: always log startup fail
api: do not log handled errors sent to a client
tests/backups: convert to new restore handling - using qubesadmin module
app: clarify error message on failed domain remove (used somewhere)
Fix qubes-core.service ordering
currently it takes 100ms+ to determine the default pool every time,
including subprocess spawning (!), which is unacceptable since
finding out the default value of properties should be instantaneous
instead of checking every thin pool to see if the root device is in
it, find and cache the name of the root device thin pool and see if
there is a configured pool with that name
xterm is very fast on closing when application inside terminates. It is
so fast with closing on keydown event that xdotool do not manage to send
keyup event, resulting in xdotool crash. Add a little more time for
that.
When writing process returns an error, prefer reporting that one,
instead of other process (which in most cases will be canceled, so
the error will be meaningless CancelledError).
Then, include sanitized stderr from VM process in the error message.
This object is used solely for manipulating qubes.xml for the backup. Do
not open any connection to hypervisor/QubesDB/libvirt etc. This will
help avoiding various leaks (memory, FD).
Do not try to access that particular object (wrapper) when it got
undefined. If anyone want to access it, appropriate code should do a new
lookup, and probably re-define the object.
Those "errors" are already properly handled, and if necessary logged
independently by appropriate function. In some cases, such logs are
misleading (for example QubesNoSuchPropertyError is a normal thing
happening during qvm-ls).
FixesQubesOS/qubes-issues#3238
Besides converting itself, change how the test verify restore
correctness: first collect VM metadata (and hashes of data) into plain
dict, then compare against it. This allow to destroy old VMs objects
before restoring the backup, so avoid having duplicate objects of the
same VM - which results in weird effects like trying to undefine libvirt
object twice.
Point to system logs for more details. Do not include them directly in
the message for privacy reasons (Admin API client may not be given
permission to it).
QubesOS/qubes-issues#3273QubesOS/qubes-issues#3193
This one pool/volume property makes sense to change dynamically. There
may be more such properties, but lets be on the safe side and take
whitelist approach - allow only selected (just one for now), instead of
blacklisting any harmful ones.
QubesOS/qubes-issues#3256
Do not check for accepted value only in constructor, do that in property
setter. This will allow enforcing the limit regardless of how the value
was set.
This is preparation for dynamic revisions_to_keep change.
QubesOS/qubes-issues#3256
The previous version did not ensure that the stopped/shutdown event was
handled before a new VM start. This can easily lead to problems like in
QubesOS/qubes-issues#3164.
This improved version now ensures that the stopped/shutdown events are
handled before a new VM start.
Additionally this version should be more robust against unreliable
events from libvirt. It handles missing, duplicated and delayed stopped
events.
Instead of one 'domain-shutdown' event there are now 'domain-stopped'
and 'domain-shutdown'. The later is generated after the former. This way
it's easy to run code after the VM is shutdown including the stop of
it's storage.
If domain got removed during the tests (for example DispVM), vm.close()
wouldn't be called in cleanup and some file descriptors will be
leaked. Add event handler for cleaning this up. Do not use close()
method here, because it is destructive, but the object may still be used
by the test.
1. Fire both property-pre-del:netvm and property-del:netvm - those
events should be fired in pairs - especially one may assume the other
will be called too. This is the case here - one disconnect old netvm,
the other connect the new one.
2. Remove spurious 'newvalue' argument for property-del:netvm event.
3. Fix logic for default_fw_netvm/default_netvm usage. The former is
used if vm.provides_network=True.
If there was some netvm set, unset it first (same as with ordinary set).
Otherwise it will try to attach new netvm without detaching the old one
first.
oldvalue should contain the old value, even if it was default one. Same
as in property-(pre-)set events. If event want to check if that is
default value, it is always possible (in pre- event), but in practice
actual value is most useful.
This bug prevented netvm changing events from working when reseting
netvm.
Allow to get default value even it isn't set currently. This will allow
(G)UI to present better view, without duplicating logic for default
value.
FixesQubesOS/qubes-issues#3197
Check resize on each template separately, because it involves VM's
scripts (either qubes.ResizeDisk service, or some startup script).
QubesOS/qubes-issues#3173
- clone all features, not just qrexec (especially include 'gui')
- do not leak VM reference on failed test
- add test for online root volume resize
QubesOS/qubes-issues#3173
- make it clear that failed qubes.ResizeDisk service call means the need
to resize filesystem manually (but not necessarily the volume itself)
- propagate exceptions raised by async storage pool implementations
Related QubesOS/qubes-issues#3173
Expired rules are skipped while loading the firewall. Do that also when
such rules expired after loading the firewall. This applies to both
Admin API and actually applying the rules (sending them to appropriate
VM).
Related QubesOS/qubes-issues#3020
* bug3164:
tests: add regression test for #3164
storage/lvm: make sure volume cache is refreshed after changes
storage/lvm: fix Volume.verify()
storage/lvm: remove old volume only after successfully cloning new one
This is a race condition, so to make it more likely to fail (if it's
broken), make some things manually. In normal circumstances this order
of actions is also possible, just less likely to happen. But as seen in
the bug report, happens from time to time.
QubesOS/qubes-issues#3164
In some cases, it may happen that new volume (`self._vid_snap`) does not
exists. This is normally an error, but even in such a case, do not
remove the only remaining instance of volume (`self.vid`). Instead,
rename it temporarily and remove only after new volume is successfully
cloned.
FixesQubesOS/qubes-issues#3164
This applies to qvm-template-postprocess, which at the beginning try to
resize root volume to appropriate size. It makes more sense to silently
succeed here, instead of forcing every client-side utility to check if
the volume have already desired size.
- Use proper features/services names (updates proxy test).
- Fix logic error in wait_for_window.
- Fix test for qvm-sync-clock (first sync clockvm, then dom0), also fix
cleanup (unset clockvm before removing it)
- More fixes for asyncio usage
* fixes-20170929:
vm: do not start QubesDB watch instance multiple times
vm: report storage.stop() errors to log
vm: move comment
storage: fix method name in LinuxModules volume
Prevent removing domain that is referenced from anywhere
vm: add vm.klass property
Move QubesVM.{name,qid,uuid,label} to BaseVM
vm: do not allow deleting template property from AppVM and DispVM
vm/qubesvm: emit event on failed startup
vm/qubesvm: remove duplicated qmemman_client.close()
vm/dispvm: cleanup DispVM also on failed startup
vm/dispvm: fix error message
ext/block: properly list devtype=cdrom option
block: fix handling non-existing devices
block: improve handling device name and description
vm.create_qdb_entries can be called multiple times - for example when
changing VM IP. Move starting qdb watcher to start(). And just in case,
cleanup old watcher (if still exists) before starting new one.
This fixes one FD leak.
Do not use self.fail when handling exception - this will keep exception
object referenced, which in turn have reference to domain object (via
traceback).
- Prefer instance attributes over local variables - the former ones do
not leak into traceback object and are cleaned up by tests framework.
- Use 'with' syntax for handling files.
- Use subprocess.DEVNULL instead of open('/dev/null') where applicable
- Delete local variables when not needed anymore.
- Fix str/bytes
- Call skipTest as early as possible - before doing any setup
- Fix networking tests - configuration commands needs to be called as
root (missing user= argument).
- Fix setting firewall - policy is no longer changeable
- Add missing loop.run_until_complete() calls.
- Convert subprocess.Popen to asyncio.create_subprocess_exec where
needed (when called process needs to communicate with qubesd).
- Cleanup processes (call .wait()).
There is no more significant difference between PV and HVM. VMs are HVM
by default anyway. More important for this test is difference between
Linux (with Qubes packages installed) and other OS-es. Rename tests
accordingly. The later one is still incomplete.
The most important change is doing vm.close() when removing domain -
this means it wouldn't be cleaned later by iterating over app.domains.
Other changes include removing VMs in the right order, regarding netvm
dependency (otherwise killing or removing may fail). And one more
missing coroutine handling (in shutdown_and_wait).
Catch exception there and log it. Otherwise asyncio complains about not
retrieved exception. There is no one else to handle this exception,
because shutdown event is triggered from libvirt, not any Admin API.
Allow to get domain class as a property, not using admin.vm.List call.
This makes it unnecessary to call admin.vm.List on the client side to
construct wrapper object.
There is intentionally no default template in terms of qubes.property
definition, to not cause problems when switching global default_template
property - like breaking some VMs, or forcing the user to shutdown all
of them for this. But this also means it shouldn't be allowed to reset
template to "default" value, because it will result in a VM without
template at all.
FixesQubesOS/qubes-issues#3115
If VM startup failed before starting anything (even in paused state),
there will be no further event, not even domain-shutdown. This makes it
hard for event-listening applications (like domains tray) to account
domain state. Fix this by emiting domain-start-failed event in every
case of failed startup after emiting domain-pre-start.
Related QubesOS/qubes-issues#3100
* qubesos/pr/150:
qubes/tests: moar fixes
test-packages: add missing libvirt classes
qubes/tests: do not deadlock on .drain()
qubes/vm: put name= first in __repr__
tests: fix some memory leaks
tests: complain about memory leaks
tests: use one event loop and one libvirtaio impl
Recently libvirt removed support for changing event implementation.
Therefore we have to use a single, global one and we check if it is
empty between tests.
On certain locales (e.g. danish) `usage_percent` will output a comma-separated number, which will make `attr` point the last two decimal points, s.t. `return vol_info['attr'][4] == 'a'` (in the `verify` func) will fail and `qubesd` wont run.
* cdrom-boot:
devices: fix error reporting
api/admin: implement admin.vm.device....Set.persistent
devices: implement DeviceCollection.update_persistent()
devices: move DeviceInfo definition earlier
api: do not fail events when listener is cancelled in the meantime
'dispvm_allowed' name was confusing, because it suggested being able to
spawn new DispVMs, not being a template for DispVM.
FixesQubesOS/qubes-issues#3047
Clone properties from DispVM template after setting base properties
(qid, name, uuid). This means we can use standard clone_properties()
function. Otherwise various setters may fail - for example
netvm setter require uuid property initialized (for VM lookup in VM
collection).
Also, make dispvm_allowed check more robust - include direct creation of
DispVM, and also check just before VM startup (if property was changed
in the meantime).
FixesQubesOS/qubes-issues#3057
qvm-sync-clock in dom0 now synchronize only dom0 time. For VM time,
qvm-sync-clock needs to be called in VM. Also, both will communicate
with qubesd, so must be called asynchronously from tests.
Allow attached device to be converted from persistent to non-persistent
and the other way around.
This is to allow starting a VM with some device attached temporarily.
When VM is not running, it is possible to attach device only
persistently, so this change will allow to do that, then, after starting
the VM, change it to non-persistent - so it will not be attached again
at further startups.
QubesOS/qubes-issues#3055
This is because .tearDown() is not executed if the exception occurs in
setUp() [for example self.skipTest() raises an exception]. The lower
levels of .tearDown() being executed are critical to not leaking file
descriptors.
First, cache objects created with init_volume - this is the only place
where we have full volume configuration (including snap_on_start and
save_on_stop properties).
But also implement get_volume method, to get a volume instance for given
volume id. Such volume instance may be incomplete (other attributes are
available only in owning domain configuration), but it will be enough
for basic operations - like cheching and changing its size, cloning
etc.
Listing volumes still use list of physically present volumes.
This makes it possible to start qubesd service, without physical
presence of some storage devices. Starting VMs using such storage would
still fail, of course.
FixesQubesOS/qubes-issues#2960
This is useful to select default DispVM template for VMs started
directly by the user. This makes sense as long as AdminVM == GUIVM.
QubesOS/qubes-issues#2974
Add public Admin API call to create Disposable VM that would be
automatically destroyed after shutdown. Do not keep this functionality
for qrexec-policy tool only.
Also, use admin.vm.Start there, instead of internal.vm.Start and
admin.vm.Kill instead of internal.vm.CleanupDispVM (this is enough,
because DispVM now have auto_cleanup property).
QubesOS/qubes-issues#2974
Add auto_cleanup property, which remove DispVM after its shutdown
- this is to unify DispVM handling - less places needing special
handling after DispVM shutdown.
New DispVM inherit all settings from respective AppVM. Move this from
classmethod `DispVM.from_appvm()`, to DispVM constructor. This unify
creating new DispVM with any other VM class.
Notable exception are attached devices - because only one running VM can
have a device attached, this would prevent second DispVM started from
the same AppVM. If one need DispVM with some device attached, one can
create DispVM with auto_cleanup=False. Such DispVM will still not have
persistent storage (as any other DispVM).
Tests included.
QubesOS/qubes-issues#2974
* services:
tests: check clockvm-related handlers
doc: include list of extensions
qubesvm: fix docstring
ext/services: move exporting 'service.*' features to extensions
app: update handling features/service os ClockVM
* tests-storage:
tests: register libvirt events
tests: even more agressive cleanup in tearDown
app: do not wrap libvirt_conn.close() in auto-reconnect wrapper
api: keep track of established connections
tests: drop VM cleanup from tearDownClass, fix asyncio usage in tearDown
storage: fix Storage.clone and Storage.clone_volume
tests: more tests fixes
firewall: raise ValueError on invalid hostname in dsthost=
qmemman: don't load qubes.xml
tests: fix AdminVM test
tests: create temporary files in /tmp
tests: remove renaming test - it isn't supported anymore
tests: various fixes for storage tests
tests: fix removing LVM volumes
tests: fix asyncio usage in some tests
tests: minor fixes to api/admin tests
storage/file: create -cow.img only when needed
storage: move volume_config['source'] filling to one place
app: do not create 'default' storage pool
app: add missing setters for default_pool* global properties
* qdb-watch:
tests: add qdb_watch test
ext/block: make use of QubesDB watch
vm: add API for watching changes in QubesDB
vm: optimize imports
api/admin: don't send internal events in admin.Events
Add explanation why admin.vm.volume.Import is a custom script
Follow change of qubesdb path return type
Rename vm.qdb to vm.untrusted_qdb
Threis no more ntpd service used - new approach do not conflict with
ntpd. Because of this, new feature is named 'service.clocksync', and
should be _enabled_ in ClockVM ('ntpd' was disabled there).
QubesOS/qubes-issues#1230
When an API call is interrupted, the relevant coroutine is cancelled -
which means it may throw CancelledError. At the same time, cancelled
call have related socket already closed (and transport set to None). But
QubesDaemonProtocol.respond try to close the transport again, which
fails. Fix handling this case.
Get a VM statistics once. If previous measurements are provided,
calculate difference too. This is backend part of upcoming
admin.vm.Stats service.
QubesOS/qubes-issues#853
Remove some more references to objects holding (possibly indirectly)
reference to libvirt connection:
- local variables in tearDown function
- running Admin API calls (especially admin.Events)
- vmm._libvirt_conn directly, in case some reference to Qubes()
is still there
- any instance attribute that is an object from 'qubes' python package
(instead of just those descending from BaseVM)
- do not create new Qubes() instance for removing VMs - if we already
have one in self.app
Then trigger garbage collector to really cleanup those objects (and
close relevant file descriptors). It's important do do this before
closing event loop, because some of descructors may try to use it (for
example remove registered handlers).
When tearDownClass is executed, event loop is already closed. Since no
test really need it right now, drop support for test class-wide VMs and
convert those methods back to instance methods.
Also put coroutines (vm.remove_from_disk, vm.kill) onto event loop.
Only qubesd should load qubes.xml directly. Put a TODO comments for now
in place of slow VM reporting, invent some better mechanism later.
This loading of qubes.xml caused deadlocks, because qmemnan kept open
file descriptor (in locked state).
Since it is no longer child of QubesVM, constructor do not take 'qid'
and 'name' arguments.
Also:
- remove other dropped properties tests (netvm, storage related)
- make the test working in non-dom0
- improve TestPool mock - init_volume now return appropriate mock type,
instead of TestPool
- improve patching base directory (/var/lib/qubes) - it is stored in
more than one place...
- fix inheritance in TC_01_ThinPool class
- fix expected LVM volume names ('vm-' prefix)
- fix cleanup after FilePool tests - remove temporary qubes.xml
- asyncio usage
- better reporting in integ.storage - include error message in the
report, not only as a comment in code
Don't set 'source' volume in various places (each VM class constructor
etc), do it as part of volume initialization. And when it needs to be
re-calculated, call storage.init_volume again.
This code was duplicated, and as usual in such a case, those copies
were different - one have set 'size', the other one not.
QubesOS/qubes-issues#2256
Since we have app.default_pool* properties, create appropriately named
pool and let those properties choose the right pool. This also means we
don't need to specify pool name in default volume config anymore
QubesOS/qubes-issues#2256
Some events are internal for a sole purpose of getting some data from
extension. Since listeners of admin.Events cannot return anything, there
is no sense in sending those events there.
The old format have many issues and is discouraged by tar developers. In
this case the most important one is header with possible non-ASCII
characters, which will result in UnicodeDecodeError (tarfile module
require header parts in utf-8).
PAX format is much cleaner, as it use standard mechanism for extended
headers.
Since we have LVM by default, it is possible to backup VMs while they
are running. For now it will include its state from before startup, but
later we may implement some other logic (a snapshot of running VM).
Do not assume static list of volume (although it is true for now), and
also use proper API for getting volume size, instead of assuming it's a
normal file.
Changed the inheritance hierarchy:
1. Renamed `SystemTestsMixin` to `SystemTestCase`
2. `SystemTestCase` is a child of `QubesTestCase`
3. All classes extending the prior `SystemTestsMixin` now just extend `object`
* tests-fixes-1:
api: extract function to make pylint happy
tests/vm: simplify AppVM storage test
storage: do not use deepcopy on volume configs
api: cleanup already started servers when some later failed
tests: fix block devices tests when running on real system
tests: fix some FD leaks
Use minimal TestPool(), instead of Mock().
This allow effectively compare Volume instances (init_volume with the
same parameters return equal Volume instance).
Specify empty 'source' field, so it gets filled with appropriate
template's images. Then also fix recursive 'source' handling - DispVM
root volume should point at TemplateVM's root volume as a source, not a
AppVM's one - which is also only a snapshot.
FixesQubesOS/qubes-issues#2896
This code is unused now. Theoretically this is_outdated implementation
should be moved to FileVolume, but since we don't have VM reference
there, it isn't possible to read appropriate xenstore entry. As we're
phasing out file pool, simply ignore it for now.
QubesOS/qubes-issues#2256
LinuxKernel pool support only read-only volumes, so save_on_stop=True
doesn't make sense. Make it more explicit - raise NotImplementedError
otherwise.
Also, migrate old configs where snap_on_start=True, but no source was
given.
QubesOS/qubes-issues#2256
This driver isn't used in default Qubes 4.0 installation, but if we do
have it, let it follow defined API and its own documentation. And also
explicitly reject not supported operations:
- support only revisions_to_keep<=1, but do not support revert() anyway
(implemented version were wrong on so many levels...)
- use 'save_on_stop'/'snap_on_start' properties directly instead of
obsolete volume types
- don't call sudo - qubesd is running as root
- consistently use path, path_cow, path_source, path_source_cow
Also, add tests for BlockDevice instance returned by
FileVolume.block_device().
QubesOS/qubes-issues#2256
Do not always use pool named 'default'. Instead, have global
`default_pool` property to specify default storage pools.
Additionally add `default_pool_*` properties for each VM property, so
those can be set separately.
QubesOS/qubes-issues#2256
Those functions are coroutines anyway, so allow event handlers to be
too.
Some of this (`domain-create-on-disk`, `domain-remove-from-disk`) will
be useful for appmenus handling.
This will allow starting processes and calling RPC services in those
events. This if required for usb devices, which are attached using RPC
services.
Intentionally keep device listing events synchronous only - to
discourage putting long-running actions there.
This change also require some not-async attach method version for
loading devices from qubes.xml - have `load_persistent` for this.
commit/recover/reset should really be handled in start/stop. Nothing
stops specific pool implementation to define such functions privately.
QubesOS/qubes-issues#2256
Always define those properties, always include them in volume config.
Also simplify overriding pool based on volume type defined by those:
override pool unless snap_on_start=True.
QubesOS/qubes-issues#2256
Since libvirt do provide object for dom0 too, return it here.
It's much easier than special-casing AdminVM everywhere. And in fact
sometimes it is actually useful (for example attaching devices from/to
dom0, adjusting memory).
The first operation returns a token, which can be passed to the second
one to actually perform clone operation. This way the caller needs have
power over both source and destination VMs (or at least appropriate
volumes), so it's easier to enforce appropriate qrexec policy.
The pending tokens are stored on Qubes() instance (as QubesAdminAPI is
not persistent). It is design choice to keep them in RAM only - those
are one time use and this way restarting qubesd is a simple way to
invalidate all of them. Otherwise we'd need some additional calls like
CloneCancel or such.
QubesOS/qubes-issues#2622
Add convenient collection wrapper for easier getting selected volume.
Storage pool implementation may still provide only volume listing
function (pool.list_volumes), or, additionally, optimized
pool.get_volume.
This means it is both possible to iterate over volumes:
```python
for volume in pool.volumes:
...
```
And get a single volume:
```python
volume = pool.volumes[vid]
```
QubesOS/qubes-issues#2256
In the end firewall is implemented as .Get and .Set rules, with policy
statically set to 'drop'. This way allow atomic firewall updates.
Since we already have appropriate firewall format handling in
qubes.firewall module - reuse it from there, but adjust the code to be
prepared for potentially malicious input. And also mark such variables
with untrusted_ prefix.
There is also third method: .Reload - which cause firewall reload
without making any change.
QubesOS/qubes-issues#2622FixesQubesOS/qubes-issues#2869
There is a problem with having separate default action ("policy") and
rules because it isn't possible to set both of them atomically at the
same time.
To solve this problem, always have policy 'drop' (as a safe default),
but by default have a single rule with action 'accept'
FixesQubesOS/qubes-issues#2869
Do this for all standard property types - even if other types do
additional validation, do not expose them to non-ASCII characters.
QubesOS/qubes-issues#2622
We've decided to make VM name immutable. This is especially important
for Admin API, where some parts (especially policy) are sticked to the
VM name.
Now, to rename the VM, one need to clone it under new name (thanks to
LVM, this is very quick action), then remove the old one.
FixesQubesOS/qubes-issues#2868