Lots of code expects the VM to be Halted after receiving one of these
events, but it could also be Dying or Crashed. Get rid of the Dying case
at least, by waiting until the VM has transitioned out of it.
Fixes e.g. the following DispVM cleanup bug:
$ qvm-create -C DispVM --prop auto_cleanup=True -l red dispvm
$ qvm-start dispvm
$ qvm-shutdown --wait dispvm # this won't remove dispvm
$ qvm-start dispvm
$ qvm-kill dispvm # but this will
* qubesos/pr/194:
reflink: style fix
storage: typo fix
lvm_thin: _remove_revisions() on revisions_to_keep==0
lvm_thin: don't purge one revision too few
lvm_thin: really remove revision
lvm_thin: fill in volume's revisions_to_keep from pool
Using '$' is easy to misuse in shell scripts, shell commands etc. After
all this years, lets abandon this dangerous character and move to
something safer: '@'. The choice was made after reviewing specifications
of various shells on different operating systems and this is the
character that have no special meaning in none of them.
To preserve compatibility, automatically translate '$' to '@' when
loading policy files.
If revisions_to_keep is 0, it may nevertheless have been > 0 before, so
it makes sense to call _remove_revisions() and hold back none (not all)
of the revisions in this case.
* qubesos/pr/190:
Missed one test, adding default-user in assert for test test_621_qdb_vm_with_network in TC_90
replaced underscore by dash and update test accordingly
Updated assert content for test_620_qdb_standalone in TC_90_QubesVM
Added the default_user property from the Qube to the qubesdb so it is available when starting X. This is the 1st part of a fix for issue https://github.com/QubesOS/qubes-issues/issues/2372
This adds the file-reflink storage driver. It is never selected
automatically for pool creation, especially not the creation of
'varlibqubes' (though it can be used if set up manually).
The code is quite small:
reflink.py lvm.py file.py + block-snapshot
sloccount 334 lines 447 (134%) 570 (171%)
Background: btrfs and XFS (but not yet ZFS) support instant copies of
individual files through the 'FICLONE' ioctl behind 'cp --reflink'.
Which file-reflink uses to snapshot VM image files without an extra
device-mapper layer. All the snapshots are essentially freestanding;
there's no functional origin vs. snapshot distinction.
In contrast to 'file'-on-btrfs, file-reflink inherently avoids
CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs'
private volumes are CoW. (And turning off the lower, filesystem-level
CoW for 'file'-on-btrfs images would turn off data checksums too, i.e.
protection against bit rot.)
Also in contrast to 'file', all storage features are supported,
including
- any number of revisions_to_keep
- volume.revert()
- volume.is_outdated
- online fstrim/discard
Example tree of a file-reflink pool - *-dirty.img are connected to Xen:
- /var/lib/testpool/appvms/foo/volatile-dirty.img
- /var/lib/testpool/appvms/foo/root-dirty.img
- /var/lib/testpool/appvms/foo/root.img
- /var/lib/testpool/appvms/foo/private-dirty.img
- /var/lib/testpool/appvms/foo/private.img
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
- /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
- /var/lib/testpool/appvms/bar/...
- /var/lib/testpool/appvms/...
- /var/lib/testpool/template-vms/fedora-26/...
- /var/lib/testpool/template-vms/...
It looks similar to a 'file' pool tree, and in fact file-reflink is
drop-in compatible:
$ qvm-shutdown --all --wait
$ systemctl stop qubesd
$ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
$ systemctl start qubesd
$ sudo rm -f /path/to/pool/*/*/*-cow.img*
If the user tries to create a fresh file-reflink pool on a filesystem
that doesn't support reflinks, qvm-pool will abort and mention the
'setup_check=no' option. Which can be passed to force a fallback on
regular sparse copies, with of course lots of time/space overhead. The
same fallback code is also used when initially cloning a VM from a
foreign pool, or from another file-reflink pool on a different
mountpoint.
'journalctl -fu qubesd' will show all file-reflink copy/rename/remove
operations on VM creation/startup/shutdown/etc.
When some expiring rules are present, it is necessary to reload firewall
when those rules expire. Previously systemd timer was used to trigger
this action, but since we have own daemon now, it isn't necessary
anymore - use this daemon for that.
Additionally automatically removing expired rules was completely broken
in R4.0.
FixesQubesOS/qubes-issues#1173
Even when volume is not persistent (like TemplateBasedVM:root), it
should be resizeable. Just the new size, similarly to the volume
content, will be reverted after qube shutdown.
Additionally, when VM is running, volume resize should affect _only_ its
temporary snapshot. This way resize can be properly reverted together
with actual volume changes (which include resize2fs call).
FixesQubesOS/qubes-issues#3519
- catch both QubesException and libvirtError - do not kill starting VM
just because an error while connecting _other_ VMs to it
- try to detach network first (and do not abort on error) - if
libvirt/libxl will manage to cleanup stale interface this way, the
attach operation below may succeed.
FixesQubesOS/qubes-issues#3163
1. Make sure VMs are started after dom0 actual memory usage is reported
to qmemman, otherwise dom0 will hold 4GB, even if just a little over 1GB
is needed at that time.
2. Request only vm.memory MB from qmemman, instead of vm.maxmem. While
HVM with PCI devices indeed do not support populate-on-demand, this is
already handled in libvirt XML.
The later may often cause VM startup fail on systems with 8GB of memory,
because maxmem is 4GB there and with dom0 keeping the other 4GB (see
point 1) there is not enough memory to start any sych VM.
FixesQubesOS/qubes-issues#3462
* qubesos/pr/187:
Don't fail create/clone if /var/lib/qubes/TYPE/NAME/ exists
Make 'qvm-volume revert' really use the latest revision
Fix wrong mocks of Volume.revisions
* qubesos/pr/185:
vm: remove doc for non-existing event `monitor-layout-change`
vm: include tag/feature name in event name
events: add support for wildcard event handlers
admin.vm.volume.ListSnapshots returned volume revisions in undefined
order, but 'qvm-volume revert' assumes the list to be in chronological
order. Make that assumption true.
clear_outdated_error_markers crashes if memory stats are not retrieved
yet. In practice it crashes at the very first call during daemon
startup, making the whole qmemman unusable.
This fixes bf4306b815
qmemman: clear "not responding" flags when VM require more memory
QubesOS/qubes-issues#3265
- use capital letters in acronyms in documentation to match upstream
documentation.
- refuse to start a PVH with without kernel set - provide meaningful
error message
* qubesos/pr/180:
vm/qubesvm: default to PVH unless PCI devices are assigned
vm/qubesvm: expose 'start_time' property over Admin API
vm/qubesvm: revert backup_timestamp to '%s' format
doc: link qvm-device man page for qvm-block, qvm-pci, qvm-usb
* qubesos/pr/179:
qmemman: request VMs balloon down with 16MB safety margin
qmemman: clear "not responding" flags when VM require more memory
qmemman: slightly improve logging
qmemman: reformat code, especially comments
Human readable format `str(datetime.datetime)` is a nightmare for Admin
API level communication. Especially setting the property in a format
that it was read was not supported, and handling such format in
untrusted input handling code is a bad idea. Revert to a simple intiger
format.
It looks like Linux balloon driver do not always precisely respect
requested target memory, but perform some rounding. Also, in some cases
(HVM domains), VM do not see all the memory that Xen have assigned to it
- there are some additional Xen pools for internal usage.
Include 16MB safety margin in memory requests to account for those two
things. This will avoid setting "no_response" flag for most of VMs.
QubesOS/qubes-issues#3265
Clear slow_memset_react/no_progress flags when VM request more memory
than it have assigned. If there is some available, it may be given to
such VM, solving the original problem (not reacting to balloon down
request). In any case, qmemman algorithm should not try to take away
memory from under-provisioned VM.
FixesQubesOS/qubes-issues#3265
Add logging more info about each domain state:
- last requested target
- no_progress and slow_memset_react flags
This makes it unnecessary to log separately when those flags are cleared.
Rename events:
- domain-feature-set -> domain-feature-set:feature
- domain-feature-delete -> domain-feature-delete:feature
- domain-tag-add -> domain-tag-add:tag
- domain-tag-delete -> domain-tag-delete:tag
Make it consistent with property-* events. It makes more sense to
include tag/feature name in event name, so handler can watch a single
tag/feature - which is the most common case. Otherwise, most handlers
would begin with `if feature == '...'` anyway, wasting time on most
events.
In cases where multiple features/tags should be handled by a single
handler, it is now possible to register a handler with wildcard, for
example `domain-feature-set:*`.
Support registering handlers for more flexible wildcard events: not only
'*', but also 'something*'. This allows to register handlers for
'property-set:*' and such.
When creating a new VM of type DispVM without specifying any template
(e.g. "qvm-create --class DispVM --label red foo"), use default_dispvm.
Otherwise it would fail saying "Got empty response from qubesd."
Load integration tests from outside of core-admin repository, through
entry points.
Create wrapper for VM object to keep very basic compatibility with tests
written for core2. This means if test use only basic functionality
(vm.start(), vm.run()), the same test will work for both core2 and
core3. This is especially important for app-* repositories, where the
same version serves multiple Qubes branches.
This also hides asyncio usage from tests writer.
See QubesOS/qubes-issues#1800 for details on original feature.
Test base functions of dom0 module (creating VM, setting property) and
configuring system inside of VM (through DispVM). The later is done for
each available template (the process use salt installed in that
template, not copied from dom0).
QubesOS/qubes-issues#3316
When dom0 do not provide the kernel, it should also not set kernel
command line in libvirt config. Otherwise qemu in stubdom fails to start
because it get -append option without -kernel, which is illegal
configuration.
FixesQubesOS/qubes-issues#3339
There may be cases when VM providing the network to other VMs is started
later - for example VM restart. While this is rare case (and currently
broken because of QubesOS/qubes-issues#1426), do not assume it will
always be the case.
Add property for IPv6 address ('ip6'). Build default value similarly to
IPv4 - common prefix + QID or Disp ID (for DispVMs).
This all is disabled unless 'ipv6' feature is enabled. It is inherited
from netvm (not template).
Even when enabled, VM may decide to not use it - or simply not support
it.
QubesOS/qubes-issues#718
Allow using default feature value from netvm, not template. This makes
sense for network-related features like using tor, supporting ipv6 etc.
Similarly to check_with_template, expose it also on Admin API.
Having both default_netvm and default_fw_netvm cause a lot of confusion,
because it isn't clear for the user which one is used when. Additionally
changing provides_network property may also change netvm property, which
may be unintended effect. This as a whole make it hard to:
- cover all netvm-changing actions with policy for Admin API
- cover all netvm-changing events (for example to apply the change to
the running VM, or to check for netvm loops)
As suggested by @qubesuser, kill the default_fw_netvm property and
simplify the logic around it.
Since we're past rc1, implement also migration logic. And add tests for
said migration.
FixesQubesOS/qubes-issues#3247
* qubesos/pr/166:
create "lvm" pool using rootfs thin pool instead of hardcoding qubes_dom0-pool00
change default pool code to be fast
cache PropertyHolder.property_list and use O(1) property name lookups
remove unused netid code
cache isinstance(default, collections.Callable)
don't access netvm if it's None in visible_gateway/netmask
There were many cases were the check was missing:
- changing default_netvm
- resetting netvm to default value
- loading already broken qubes.xml
Since it was possible to create broken qubes.xml using legal calls, do
not reject loading such file, instead break the loop(s) by setting netvm
to None when loop is detected. This will be also useful if still not all
places are covered...
Place the check in default_netvm setter. Skip it during qubes.xml loading
(when events_enabled=False), but still keep it in setter, to _validate_ the
value before any property-* event got fired.
If HVM have PCI device, it can't use PoD, so need 'maxmem' memory to be
started. Request that much from qmemman.
Note that is is somehow independent of enabling or not dynamic memory
management for the VM (`service.meminfo-writer` feature). Even if VM
initially had assigned maxmem memory, it can be later ballooned down.
QubesOS/qubes-issues#3207
* 20171107-storage:
api/admin: add API for changing revisions_to_keep dynamically
storage/file: move revisions_to_keep restrictions to property setter
api/admin: hide dd statistics in admin.vm.volume.Import call
storage/lvm: fix importing different-sized volume from another pool
storage/file: fix preserving spareness on volume clone
api/admin: add pool size and usage to admin.pool.Info response
storage: add size and usage properties to pool object
* 20171107-tests-backup-api-misc:
test: make race condition on xterm close less likely
tests/backupcompatibility: fix handling 'internal' property
backup: fix handling target write error (like no disk space)
tests/backupcompatibility: drop R1 format tests
backup: use offline_mode for backup collection
qubespolicy: fix handling '$adminvm' target with ask action
app: drop reference to libvirt object after undefining it
vm: always log startup fail
api: do not log handled errors sent to a client
tests/backups: convert to new restore handling - using qubesadmin module
app: clarify error message on failed domain remove (used somewhere)
Fix qubes-core.service ordering
currently it takes 100ms+ to determine the default pool every time,
including subprocess spawning (!), which is unacceptable since
finding out the default value of properties should be instantaneous
instead of checking every thin pool to see if the root device is in
it, find and cache the name of the root device thin pool and see if
there is a configured pool with that name
xterm is very fast on closing when application inside terminates. It is
so fast with closing on keydown event that xdotool do not manage to send
keyup event, resulting in xdotool crash. Add a little more time for
that.
When writing process returns an error, prefer reporting that one,
instead of other process (which in most cases will be canceled, so
the error will be meaningless CancelledError).
Then, include sanitized stderr from VM process in the error message.
This object is used solely for manipulating qubes.xml for the backup. Do
not open any connection to hypervisor/QubesDB/libvirt etc. This will
help avoiding various leaks (memory, FD).
Do not try to access that particular object (wrapper) when it got
undefined. If anyone want to access it, appropriate code should do a new
lookup, and probably re-define the object.
Those "errors" are already properly handled, and if necessary logged
independently by appropriate function. In some cases, such logs are
misleading (for example QubesNoSuchPropertyError is a normal thing
happening during qvm-ls).
FixesQubesOS/qubes-issues#3238
Besides converting itself, change how the test verify restore
correctness: first collect VM metadata (and hashes of data) into plain
dict, then compare against it. This allow to destroy old VMs objects
before restoring the backup, so avoid having duplicate objects of the
same VM - which results in weird effects like trying to undefine libvirt
object twice.
Point to system logs for more details. Do not include them directly in
the message for privacy reasons (Admin API client may not be given
permission to it).
QubesOS/qubes-issues#3273QubesOS/qubes-issues#3193
This one pool/volume property makes sense to change dynamically. There
may be more such properties, but lets be on the safe side and take
whitelist approach - allow only selected (just one for now), instead of
blacklisting any harmful ones.
QubesOS/qubes-issues#3256
Do not check for accepted value only in constructor, do that in property
setter. This will allow enforcing the limit regardless of how the value
was set.
This is preparation for dynamic revisions_to_keep change.
QubesOS/qubes-issues#3256
The previous version did not ensure that the stopped/shutdown event was
handled before a new VM start. This can easily lead to problems like in
QubesOS/qubes-issues#3164.
This improved version now ensures that the stopped/shutdown events are
handled before a new VM start.
Additionally this version should be more robust against unreliable
events from libvirt. It handles missing, duplicated and delayed stopped
events.
Instead of one 'domain-shutdown' event there are now 'domain-stopped'
and 'domain-shutdown'. The later is generated after the former. This way
it's easy to run code after the VM is shutdown including the stop of
it's storage.
If domain got removed during the tests (for example DispVM), vm.close()
wouldn't be called in cleanup and some file descriptors will be
leaked. Add event handler for cleaning this up. Do not use close()
method here, because it is destructive, but the object may still be used
by the test.
1. Fire both property-pre-del:netvm and property-del:netvm - those
events should be fired in pairs - especially one may assume the other
will be called too. This is the case here - one disconnect old netvm,
the other connect the new one.
2. Remove spurious 'newvalue' argument for property-del:netvm event.
3. Fix logic for default_fw_netvm/default_netvm usage. The former is
used if vm.provides_network=True.