Define it only when really needed:
- during VM creation - to generate UUID
- just before VM startup
As a consequence we must handle possible exception when accessing
vm.libvirt_domain. It would be a good idea to make this field private in
the future. It isn't possible for now because block_* are external for
QubesVm class.
This hopefully fixes race condition when Qubes Manager tries to access
libvirt_domain (using some QubesVm.*) at the same time as other tool is
removing the domain. Additionally if Qubes Manage would loose that race, it could
define the domain again leaving some unused libvirt domain (blocking
that domain name for future use).
Provide vm.refresh(), which will force to reconnect do QubesDB daemon,
and also get new libvirt object (including new ID, if any). Use this
method whenever QubesDB call returns DisconnectedError exception. Also
raise that exception when someone is trying to talk to not running
QubesDB - instead of returning None.
Libvirt will replace domain XML when trying to define the new one with
the same name and UUID - this is exactly what we need. This fixes race
condition with other processes (especially Qubes Manager), which can try
to access that libvirt domain object at the same time.
It have nothing to do with xenstore, so change the name to not mislead.
Also get rid of unused "xid" parameter - we should use XID as little as
possible, because it is not a simple task to keep it current.
It is used by just started DispVM to notice when restore process
completed. Alternatively it could watch its own domid, but lets do it in
Xen-independent way.
When VM is started by root, config file is created with root owner and
user has no write access to it. As the directory is user-writable,
delete the file first.
Conflicts:
core-modules/000QubesVm.py
Do not load qubes.xml again, it can cause race conditions between two
instances of the same VM objects.
Especially when VM is starting ProxyVM to which it is connected,
firewall rules could not be loaded.
Long time ago passio=True was used to replace current process with
qrexec-client directly (qvm-run --pass-io was the called), but this
behaviour is not used anymore (qvm-run was the only user). And this
option was left untouched, with misleading name - one would assume that
using passio=False should disallow any I/O, but this isn't the case.
Especially qvm-sync-clock is calling clockvm.run('...', wait=True),
default value for passio=False. This causes to output data from
untrusted VM, without sanitising terminal sequences, which can be fatal.
This patch changes passio semantic to actually do what it means - when
set to True - VM process will be able to interact with
stdin/stdout/stderr. But when set to False, all those FDs will be
connected to /dev/null.
Conflicts:
core-modules/000QubesVm.py
Otherwise deadlock could happen - the script will try to get read lock
on qubes.xml, while the calling tool can already hold the lock. If that
was write lock (which is in case of qfile-daemon-dvm), the deadlock
occurs.
This is the only place where ID was used - all other places uses name.
Linux qrexec-client accepts both ID and name, but sticking to one option
will simplify things (especially Windows qrexec-client/daemon).
Currently getting Stubdom XID is (the last one?) read directly from
Xenstore as there is no libvirt function for it.
This means that even if HVM is running it can have not connection to
Xenstore. For now give -1 in such situation.
None of found existing portable locking module does support RW locks.
Use lowlevel system locking support - both Windows and Linux support
such feature.
Drop locking code in write_firewall_conf() b/c is is called with
QubesVmCollection lock held anyway.
Currently <vm-dir>/<vm-name>.conf file is used only for debugging
purposes - the real one is passed directly to libvirt, without storing
on disk for it.
In some cases (e.g. qvm-clone) QubesVM.create_config_file() can be
called before VM directory exists and in this case it would fail.
Because it isn't critical fail in any means (the config file will be
recreated on next occasion) just ignore this error.
Final version most likely will have this part of code removed
completely.
Mostly done. Things still using xenstore/not working at all:
- DispVM
- qubesutils.py (especially qvm-block and qvm-usb code)
- external IP change notification for ProxyVM (should be done via RPC
service)
libvirt_domain object needs to be recreated, so force it. Also fix
config path setting (missing extension) - create_config_file
uses it as custom config indicator (if such detected, VM settings -
especially name, would not be updated).
1. Fake dom0 object doesn't need proper maxmem nor vcpus - set
statically to 0 instead of getting from physical host.
2. QubesHVM doesn't preserve maxmem setting, so set it to self.memory
earlier (to suppress default total_memory/2 calculation).
This makes easier to import right objects in submodules (only one
object). This also implement lazy connection - at first access, not at
module import, which speeds up tools, which doesn't need runtime
information (like qvm-prefs or qvm-service). In the future this will
ease migration from xenstore to QubesDB.
Also implement "offline mode" - operate on qubes.xml without connecting
to VMM - raise exception at such try.
This is needed to run tools during installation, where only minimal
set of services are started, especially no libvirt.
Do not recreate them at each startup. This will save some time and also
solve some problems from invalidated libvirt handles after domain
shutdown (e.g. causes qubes-manager crashes).
This requires storing uuid in qubes.xml.
Move DispVM creation to qfile-daemon-dvm/QubesDisposableVm from
qubes-restore. As actual restore is handled by libvirt, we don't get
much from separate qubes-restore process.
This code still needs some improvements, especially on performance.
Check maxmem taking into account the minimum init memory that allows
that requested maximum memory.
Explanation:
Linux kernel needs space for memory-related structures created at boot.
If init_mem is just 400MB, then max_mem can't balloon above 4.3GB (at
which poing it yields "add_memory() failed: -17" messages and apps
crash), regardless of the max_mem_size value.
Based on Marek's findings and my tests on a 16GB PC, using several
processes like:
stress -m 1 --vm-bytes 1g --vm-hang 100
result in the following points:
init_mem ==> actual max memory
400 4300
700 7554
800 8635
1024 11051
1200 12954
1300 14038
1500 14045 <== probably capped on my 16GB system
The actual ratio of max_mem_size/init_mem is surprisingly constant at
10.79
If less init memory is set than that ratio allows, then the set
maxmem is unreachable and the VM becomes unstable (app crashes)
Based on qubes-devel discussion titled "Qubes Dom0 init memory against
Xen best practices?" at:
https://groups.google.com/d/msg/qubes-devel/VRqkFj1IOtA/UgMgnwfxVSIJ
Do not pollute environment of calling process, otherwise all VMs started
from Qubes Manager afterwards will get QREXEC_STARTUP_NOWAIT, which
will cause wait_for_session not working.
When netvm and firewallvm is shut down, netvm handling code will
try to revoke firewallvm access to external IP. But if netvm shutdown
happens in the meantime, xenstore will throw ENOENT error.
In such case show an error to the user (via tray notification, not
dialog box!) and leave the VM in "transient" state. The user can wait
some more time for VM startup, check what VM is doing, or kill it
manually.
Now gui-agent supports reconnect to guid, so start it early to have Xorg
running in the VM.
This is still not done - for example it tries to run some commands via
(not running yet) qrexec.
Some programs (like KDE system settings) makes /etc/localtime hardlink
instead of symlink. Handle this case. Hopefully there will be less and
less such applications...
Alway start stubdom guid, then if guiagent_installed set - start the
target one and when connects, kill stubdom one. This allow the user to
see startup messages so prevent the impression of hang VM.
Note 1: this doesn't work when VM disables SVGA output (just after
windows boot splash screen).
Note 2: gui-daemon sometimes hangs after receiving SIGTERM (libvchan_wait
during libvchan_close). This looks to be stubdom gui agent problem.
This is somehow related to #757, but only first (easier) step. Actual
change of QubesAdminVm base class requires somehow more changes, for
example qvm-ls needs to know how to display this type of VM (none of
template, appvm, netvm).
Make this first step change now, because starting with R2Beta3 dom0 will
be stored in qubes.xml (for new backups purposes) so this rename would
be complicated later.
Using class method allow the users (Qubes Manager at least) to check
for compatibility without having any particular VM instance - useful
while creating the VM.