First the main bug: when meminfo xenstore watch fires, in some cases
(just after starting some domain) XS_Watcher refreshes internal list of
domains before processing the event. This is done specifically to
include new domain in there. But the opposite could happen too - the
domain could be destroyed. In this case refres_meminfo() function raises
an exception, which isn't handled and interrupts the whole xenstore
watch loop. This issue is likely to be triggered by killing the domain,
as this way it could disappear shortly after writing updated meminfo
entry. In case of proper shutdown, meminfo-writer is stopped earlier and
do not write updates just before domain destroy.
Fix this by checking if the requested domain is still there just after
refreshing the list.
Then, catch exceptions in xenstore watch handling functions, to not
interrupt xenstore watch loop. If it gets interrupted, qmemman basically
stops memory balancing.
And finally, clear force_refresh_domain_list flag after refreshing the
domain list. That missing line caused domain refresh at every meminfo
change, making it use some more CPU time.
While at it, change "EOF" log message to something a bit more
meaningful.
Thanks @conorsch for capturing valuable logs.
FixesQubesOS/qubes-issues#4890
meminfo (written by VM) is expected report KiB, but qmemman internally
use bytes. Convert units.
And also move obscure unit conversion in is_meminfo_suspicious to more
logical place in sanitize_and_parse_meminfo.
Filter in python3 returns a generator, can be iterated only once.
This is about list of existing domains - store it as a list, otherwise
domains will "disappear" after being discovered.
The following list is bollocks. There were many, many more.
Conflicts:
core-modules/003QubesTemplateVm.py
core-modules/005QubesNetVm.py
core/qubes.py
core/storage/__init__.py
core/storage/xen.py
doc/qvm-tools/qvm-pci.rst
doc/qvm-tools/qvm-prefs.rst
qubes/tools/qmemmand.py
qvm-tools/qvm-create
qvm-tools/qvm-prefs
qvm-tools/qvm-start
tests/__init__.py
vm-config/xen-vm-template-hvm.xml
This commit took 2 days (26-27.01.2016) and put our friendship to test.
--Wojtek and Marek
This is part of fixing qvm-start.
qmemman was moved with minimal touching, mainly module names.
Moved function parsing human-readable sizes from core2. This function is
wrong, because it treats k/M/G as 1024-based, but leave it for now.