QubesWatch._register_watches is called from libvirt event callback,
asynchronously to qvm-start. This means that `qubesdb-daemon` may
not be running or populated yet.
If first QubesDB connection (or watch registration) fails, schedule next
try using timers in libvirt event API (as it is base of QubesWatch
mainloop), instead of some sleep loop. This way other events will be
processed in the meantime.
QubesOS/qubes-issues#1110
This makes easier to handle some corner cases. One of them is having
entry without `dir_path` defined. This may happen when migrating from R2
(using backup+restore or in-place) while some DisposableVM was running
(even if not included in the backup itself).
Fixesqubesos/qubes-issues#1124
Reported by @doncohen, thanks @wyory for providing more details.
We use only one device-mapper layer for HVMs, and this isn't the same as
for PV - it is that one, which PV does in initramfs.
Device-mapper layers summary for template-based VMs:
PV: root.img+root-cow.img (dom0) -> xvda, xvda+volatile.img (VM)
HVM: root.img+volatile.img (dom0)
Since libvirt do not support such events (at least for libxl driver), we
need some way to notify qubes-manager when device is attached/detached.
Use the same protocol as for connect/disconnect but on the target
domain.
Define it only when really needed:
- during VM creation - to generate UUID
- just before VM startup
As a consequence we must handle possible exception when accessing
vm.libvirt_domain. It would be a good idea to make this field private in
the future. It isn't possible for now because block_* are external for
QubesVm class.
This hopefully fixes race condition when Qubes Manager tries to access
libvirt_domain (using some QubesVm.*) at the same time as other tool is
removing the domain. Additionally if Qubes Manage would loose that race, it could
define the domain again leaving some unused libvirt domain (blocking
that domain name for future use).
Provide vm.refresh(), which will force to reconnect do QubesDB daemon,
and also get new libvirt object (including new ID, if any). Use this
method whenever QubesDB call returns DisconnectedError exception. Also
raise that exception when someone is trying to talk to not running
QubesDB - instead of returning None.
The statement that unlock_db() is always called directly after save() is
no longer true - tests holds the lock all the time, doing multiple saves
in the middle.
When qfile-dom0-unpacker detects an error, it sends error report to
stdout and terminate (so stdout is closed). That close should be
transferred to the VM process (as EOF on its stdin), which will signal
it to stop sending the data and handle error report.
Also qrexec-client holds the connection until both stdin and
stdout are closed.
So when that EOF is missing, tar2qfile will not detect error report and
still tries to send the data and qrexec-client will hold the
connection while receiving process is long dead.
To prevent that deadlock from happening, close FD in python code, so
qfile-dom0-unpacker will be the last owner of write end of the pipe.
When it closes its stdout, qrexec-client will receive EOF at its stdin.
Otherwise deadlock could happen - the script will try to get read lock
on qubes.xml, while the calling tool can already hold the lock. If that
was write lock (which is in case of qfile-daemon-dvm), the deadlock
occurs.