Libvirt do not show actual block device (loop*) choosen for the device -
only original (file) path. But file path is available in device
description. Please note that VM can provide any description (withing
allowed limits), effectively breaking this check again (hidding the
attachment status). But even without this bug it could do that - by
hidding the whole device from QubesDB.
FixesQubesOS/qubes-issues#2453
When calling _register_watches() multiple times for dom0 (by passing
None or since 7e9c816b by passing the corresponding libvirt domain) the
check was missing if there is already a QubesDB in _qdb. Therefore a new
QubesDB was created and the old one is destroyed by the GC. As a
consequence the watch_fd is closed but the libvirt event handle for this
fd is still registered. So when libvirt calls poll() it returns
immediately POLLNVAL with the closed fd. This is not caught in libvirt
and the callback is called as if an event happened. _qdb_handler() now
calls read_watch() on the new fd for dom0 and thereby hangs the thread.
This leads (at leads) to qubes-manager to miss VM status updates and
block device events.
FixesQubesOS/qubes-issues#2178
Libvirt reports dom0 as "Domain-0". Which is incompatible with how Qubes
and libxl toolstack names it ("dom0"). So handle this as a special case.
Otherwise reconnection retries leaks event object every iteration.
FixesQubesOS/qubes-issues#860
Thanks @alex-mazzariol for help with debugging!
Make sure that even compromised frontend will be cut of (possibly
sensitive - like a webcam) device. On the other hand, if backend domain
is already compromised, it may already compromise frontend domain too,
so none of them would be better to call detach to.
QubesOS/qubes-issues#531
Registering event implementation in libvirt and then not calling it is
harmful, because libvirt expects it working. Known drawbacks:
- keep-alives are advertised as supported but not really sent (cause
dropping connections)
- connections are not closed (sockets remains open, effectively leaking
file descriptors)
So call libvirt.virEventRegisterDefaultImpl only when it will be really
used (libvirt.virEventRunDefaultImpl called), which means calling it in
QubesWatch. Registering events implementation have effect only on new
libvirt connections, so start a new one for QubesWatch.
FixesQubesOS/qubes-issues#1380
QubesVM.start() first creates domain as paused, completes its setup
(including starting qubesdb-daemon and creating appropriate entries),
then resumes the domain. So wait for that resume to be sure that
`qubesdb-daemon` is already running and populated.
QubesOS/qubes-issues#1110
QubesWatch._register_watches is called from libvirt event callback,
asynchronously to qvm-start. This means that `qubesdb-daemon` may
not be running or populated yet.
If first QubesDB connection (or watch registration) fails, schedule next
try using timers in libvirt event API (as it is base of QubesWatch
mainloop), instead of some sleep loop. This way other events will be
processed in the meantime.
QubesOS/qubes-issues#1110
Since libvirt do not support such events (at least for libxl driver), we
need some way to notify qubes-manager when device is attached/detached.
Use the same protocol as for connect/disconnect but on the target
domain.
Define it only when really needed:
- during VM creation - to generate UUID
- just before VM startup
As a consequence we must handle possible exception when accessing
vm.libvirt_domain. It would be a good idea to make this field private in
the future. It isn't possible for now because block_* are external for
QubesVm class.
This hopefully fixes race condition when Qubes Manager tries to access
libvirt_domain (using some QubesVm.*) at the same time as other tool is
removing the domain. Additionally if Qubes Manage would loose that race, it could
define the domain again leaving some unused libvirt domain (blocking
that domain name for future use).
Provide vm.refresh(), which will force to reconnect do QubesDB daemon,
and also get new libvirt object (including new ID, if any). Use this
method whenever QubesDB call returns DisconnectedError exception. Also
raise that exception when someone is trying to talk to not running
QubesDB - instead of returning None.
This makes easier to import right objects in submodules (only one
object). This also implement lazy connection - at first access, not at
module import, which speeds up tools, which doesn't need runtime
information (like qvm-prefs or qvm-service). In the future this will
ease migration from xenstore to QubesDB.
Also implement "offline mode" - operate on qubes.xml without connecting
to VMM - raise exception at such try.
This is needed to run tools during installation, where only minimal
set of services are started, especially no libvirt.
loop device parsing should have "dXpY_style = True" in order to
correctly parse partitions on loop devices.
Reasoning:
==========
Using losetup to create a virtual SD card disk into a loop device and
creating partitions for it results in new devices within an AppVM that
look like: /dev/loop0p1 /dev/loop0p2 and so on.
However as soon as they are created, Qubes Manager rises an exception
and becomes blocked with the following message (redacted):
"QubesException: Invalid device name: loop0p1
at line 639 of file /usr/lib64/python2.7/site-
packages/qubesmanager/main.py
Details:
line: raise QubesException....
func: block_name_to_majorminor
line no.: 181
file: ....../qubes/qubesutils.py
Simply get device major-minor from /dev/ device file.
This is only partial solution, because this will work only for dom0
devices, but the same problem can apply to VM.
Also some major cleanups: Reduce some more code duplication
(verify_hmac, simplify backup_restore_prepare). Rename
backup_dir/backup_tmpdir variables to better match its purpose. Rename
backup_do_copy back to backup_do. Require QubesVm object (instead of VM
name) as appvm param.
This way backup process won't need more than 1GB for temporary files and
also will give more precise progress information. For now it looks like
the slowest element is qrexec, so without such limit, all the data would
be prepared (basically making second copy of it in dom0) while only
first few files would be transfered to the VM.
Also backup progress is calculated based on preparation thread, so when
it finishes there is some other time needed to flush all the data to the
VM. Limiting this amount makes progress somehow more accurate (but still
off by 1GB...).