Extension objects are singletons and normally do not require any special
cleanup. But in case of tests, we try to remove all the qubes objects
between tests and the cache in usb extension makes it hard.
Add a 'qubes-close' event that extensions can handle to remove extra
references stored in extension objects themselves.
When setting global clockvm property, 'clocksync' service is
automatically added to the new value and then removed from the old one.
But if the new value is the same as the old one, the service gets
removed from the just set new value.
Check for this case explicitly.
FixesQubesOS/qubes-issues#4939
Handle this case specifically, as way too many users ignore the message
during installation and complain it doesn't work later.
Name the problem explicitly, instead of pointing at libvirt error log.
FixesQubesOS/qubes-issues#4689
When libvirt daemon is restarted, qubesd attempt to re-connect to the
new instance transparently (through virConnect object wrapper). But the
code lacked re-registering event handlers.
Fix this by adding reconnect callback argument to virConnectWrappper, to
be called after new connection is established. This callback will
additionally get old connection as an argument, if any cleanup is
needed. The old connection is closed just after callback returns.
Use this to re-register event handler, but also unregister old handler
first. While full unregister wont work on since old libvirt daemon
instance is dead already, it will still cleanup client structures.
Since the old libvirt connection is closed now, adjust also domain
reconnection logic, to handle stale connection object. In that case
isAlive() call throws an exception.
FixesQubesOS/qubes-issues#5303
Qubesd wrongly required default_template global property to be not None.
Furthermore, even without hard failure set, require_property method
raised an exception in case of a property having incorrect None value.
It now logs an error message instead, as designed.
fixesQubesOS/qubes-issues#5326
Give raw cpu_time value, instead of normalized one (to number of vcpus),
as documented.
Move the normalization to cpu_usage calculation. At the same time, add
cpu_usage_raw without it, if anyone needs it.
QubesOS/qubes-issues#4531
This is needed as a consequence of d8b6d3ef ("Make add_pool/remove_pool
coroutines, allow Pool.{setup,destroy} as coroutines"), but there hasn't
been any problem so far because no storage driver implemented pool
setup() as a coroutine.
Global properties should be loaded in stage 3, mark them as such.
Otherwise they are not loaded at all.
This applies to stats_interval and check_updates_vm. Others were
correct.
FixesQubesOS/qubes-issues#4856
Pool setup/destroy may be a time consuming operation, allow them to be
asynchronous. Fortunately add_pool and remove_pool are used only through
Admin API, so the change does not require modification of other
components.
The new property is meant for management stack (Salt) to set which DVM
template should be used to maintain given VM. Since the DispVM based on
it will be given ultimate control over target VM (qubes.VMShell
service), it should be trusted. The one pointed to by default_dispvm
not necessary is one.
The property defaults to the value from the template (if any), and then
to a global management_dispvm property. By default it is set to None.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
vm.shutdown(wait=True) waited indefinitely for the shutdown, which makes
useless without some boilerplate handling the timeout. Since the timeout
may depend on the operating system inside, add a per-VM property for it,
with value inheritance from template and then from global
default_shutdown_timeout property.
When timeout is reached, the method raises exception - whether to kill
it or not is left to the caller.
FixesQubesOS/qubes-issues#1696
When a VM (or its template) does not explicitly set a qrexec_timeout,
fall back to a global default_qrexec_timeout (with default value 60),
instead of hardcoding the fallback value to 60.
This makes it easy to set a higher timeout for the whole system, which
helps users who habitually launch applications from several (not yet
started) VMs at the same time. 60 seconds can be too short for that.
Use the file-reflink storage driver if /var/lib/qubes is on a filesystem
that supports reflinks, e.g. when the btrfs layout was selected in
Anaconda. If it doesn't support reflinks (or if detection fails, e.g. in
an unprivileged test environment), use 'file' as before.
Needed for event-driven domains-tray UI updating and anti-GUI-DoS
usability improvements.
Catches errors from event handlers to protect libvirt, and logs to
main qubesd logger singleton (by default meaning systemd journal).
Resolve:
- no-else-return
- useless-object-inheritance
- useless-return
- consider-using-set-comprehension
- consider-using-in
- logging-not-lazy
Ignore:
- not-an-iterable - false possitives for asyncio coroutines
Ignore all the above in qubespolicy/__init__.py, as the file will be
moved to separate repository (core-qrexec) - it already has a copy
there, don't desynchronize them.
This adds the file-reflink storage driver. It is never selected
automatically for pool creation, especially not the creation of
'varlibqubes' (though it can be used if set up manually).
The code is quite small:
reflink.py lvm.py file.py + block-snapshot
sloccount 334 lines 447 (134%) 570 (171%)
Background: btrfs and XFS (but not yet ZFS) support instant copies of
individual files through the 'FICLONE' ioctl behind 'cp --reflink'.
Which file-reflink uses to snapshot VM image files without an extra
device-mapper layer. All the snapshots are essentially freestanding;
there's no functional origin vs. snapshot distinction.
In contrast to 'file'-on-btrfs, file-reflink inherently avoids
CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs'
private volumes are CoW. (And turning off the lower, filesystem-level
CoW for 'file'-on-btrfs images would turn off data checksums too, i.e.
protection against bit rot.)
Also in contrast to 'file', all storage features are supported,
including
- any number of revisions_to_keep
- volume.revert()
- volume.is_outdated
- online fstrim/discard
Example tree of a file-reflink pool - *-dirty.img are connected to Xen:
- /var/lib/testpool/appvms/foo/volatile-dirty.img
- /var/lib/testpool/appvms/foo/root-dirty.img
- /var/lib/testpool/appvms/foo/root.img
- /var/lib/testpool/appvms/foo/private-dirty.img
- /var/lib/testpool/appvms/foo/private.img
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
- /var/lib/testpool/appvms/bar/...
- /var/lib/testpool/appvms/...
- /var/lib/testpool/template-vms/fedora-26/...
- /var/lib/testpool/template-vms/...
It looks similar to a 'file' pool tree, and in fact file-reflink is
drop-in compatible:
$ qvm-shutdown --all --wait
$ systemctl stop qubesd
$ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
$ systemctl start qubesd
$ sudo rm -f /path/to/pool/*/*/*-cow.img*
If the user tries to create a fresh file-reflink pool on a filesystem
that doesn't support reflinks, qvm-pool will abort and mention the
'setup_check=no' option. Which can be passed to force a fallback on
regular sparse copies, with of course lots of time/space overhead. The
same fallback code is also used when initially cloning a VM from a
foreign pool, or from another file-reflink pool on a different
mountpoint.
'journalctl -fu qubesd' will show all file-reflink copy/rename/remove
operations on VM creation/startup/shutdown/etc.
When creating a new VM of type DispVM without specifying any template
(e.g. "qvm-create --class DispVM --label red foo"), use default_dispvm.
Otherwise it would fail saying "Got empty response from qubesd."
Having both default_netvm and default_fw_netvm cause a lot of confusion,
because it isn't clear for the user which one is used when. Additionally
changing provides_network property may also change netvm property, which
may be unintended effect. This as a whole make it hard to:
- cover all netvm-changing actions with policy for Admin API
- cover all netvm-changing events (for example to apply the change to
the running VM, or to check for netvm loops)
As suggested by @qubesuser, kill the default_fw_netvm property and
simplify the logic around it.
Since we're past rc1, implement also migration logic. And add tests for
said migration.
FixesQubesOS/qubes-issues#3247
* qubesos/pr/166:
create "lvm" pool using rootfs thin pool instead of hardcoding qubes_dom0-pool00
change default pool code to be fast
cache PropertyHolder.property_list and use O(1) property name lookups
remove unused netid code
cache isinstance(default, collections.Callable)
don't access netvm if it's None in visible_gateway/netmask
There were many cases were the check was missing:
- changing default_netvm
- resetting netvm to default value
- loading already broken qubes.xml
Since it was possible to create broken qubes.xml using legal calls, do
not reject loading such file, instead break the loop(s) by setting netvm
to None when loop is detected. This will be also useful if still not all
places are covered...
Place the check in default_netvm setter. Skip it during qubes.xml loading
(when events_enabled=False), but still keep it in setter, to _validate_ the
value before any property-* event got fired.
* 20171107-tests-backup-api-misc:
test: make race condition on xterm close less likely
tests/backupcompatibility: fix handling 'internal' property
backup: fix handling target write error (like no disk space)
tests/backupcompatibility: drop R1 format tests
backup: use offline_mode for backup collection
qubespolicy: fix handling '$adminvm' target with ask action
app: drop reference to libvirt object after undefining it
vm: always log startup fail
api: do not log handled errors sent to a client
tests/backups: convert to new restore handling - using qubesadmin module
app: clarify error message on failed domain remove (used somewhere)
Fix qubes-core.service ordering
currently it takes 100ms+ to determine the default pool every time,
including subprocess spawning (!), which is unacceptable since
finding out the default value of properties should be instantaneous
instead of checking every thin pool to see if the root device is in
it, find and cache the name of the root device thin pool and see if
there is a configured pool with that name
Do not try to access that particular object (wrapper) when it got
undefined. If anyone want to access it, appropriate code should do a new
lookup, and probably re-define the object.
Point to system logs for more details. Do not include them directly in
the message for privacy reasons (Admin API client may not be given
permission to it).
QubesOS/qubes-issues#3273QubesOS/qubes-issues#3193