When a VM is shutting down, its xenstore entries (especially 'name') can be
deleted before qmemman removes the VM from its list. So check whether the name
is still defined before reporting to qubes-manager.
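A minimal sketch of that check, assuming the xen.lowlevel.xs bindings; the
helper name and surrounding structure are illustrative, not the actual qmemman
code:

    import xen.lowlevel.xs

    def domains_with_names(domain_ids):
        # Return only domains whose xenstore 'name' entry still exists, so a
        # VM in the middle of shutdown is not reported to qubes-manager.
        xs = xen.lowlevel.xs.xs()
        result = []
        for domid in domain_ids:
            name = xs.read('', '/local/domain/%s/name' % domid)
            if name is None:
                continue  # entries already deleted; skip this VM
            result.append((domid, name))
        return result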
When some VM didn't return memory to Xen, mark it as suspicious and abort the
balance, so that some Xen free_memory margin always remains. VMs marked as
suspicious will be re-evaluated before the next balance; those that still
haven't returned memory will be skipped in the balance process.
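A hedged sketch of the idea; the attribute names and data shapes here are
assumptions for illustration, not the real qmemman structures:

    def prune_suspicious_donors(donors, memory_actual, last_target):
        # Keep only donors that actually handed memory back since the last
        # request; the rest remain suspicious and are skipped in this balance.
        usable = []
        for domid in donors:
            if memory_actual[domid] > last_target[domid]:
                continue  # still has not returned memory: skip it
            usable.append(domid)
        return usable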
For an unknown reason the '@releaseDomain' watch fires twice: first when the
domain disappears from xenstore, and second when its resources (including
memory) are freed. So call do_balance after each of these events to
redistribute the freed memory.
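A rough sketch of the watch loop, using the xen.lowlevel.xs API; the loop
structure and callback names are illustrative assumptions:

    import xen.lowlevel.xs

    def watch_release_domain(refresh_domain_list, do_balance):
        # '@releaseDomain' fires twice per dying domain: once when it vanishes
        # from xenstore and once when its memory is actually freed. Running
        # do_balance after every event redistributes memory as soon as it is
        # really available.
        xs = xen.lowlevel.xs.xs()
        xs.watch('@releaseDomain', 'release')
        while True:
            xs.read_watch()        # blocks until the watch fires
            refresh_domain_list()  # drop domains no longer in xenstore
            do_balance()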
In fact, set it to ALL_PHYS_MEM (and do the same for other domains that do not
have a static-max key, although there should not be any). The previous method
of using maxmem_kb was broken, as qmemman sets maxmem_kb to the memory target
(which I do not like, by the way).
Libxl stores maxmem in xenstore (/local/domain/X/memory/static-max) and sets
maxmem and target_mem to the actual memory. So qmemman should use the xenstore
entry as memory_maximum (when it exists) and also adjust maxmem when changing
a domain's memory.
This prevents a potential infinite loop in qmemman when free memory cannot be
assigned to any VM (because of static-max). In practice this will never happen,
because dom0 can always accept memory.
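A sketch of that approach, assuming the xen.lowlevel.xs and xen.lowlevel.xc
Python bindings; the handle setup, constant value, and function names below
are illustrative, not the actual qmemman code:

    import xen.lowlevel.xs
    import xen.lowlevel.xc

    xs = xen.lowlevel.xs.xs()
    xc = xen.lowlevel.xc.xc()

    ALL_PHYS_MEM_KB = 16 * 1024 * 1024  # illustrative fallback, in kB

    def get_memory_maximum_kb(domid):
        # Prefer libxl's static-max entry; fall back to "all physical memory"
        # for domains that lack the key (there should not be any).
        static_max = xs.read('', '/local/domain/%d/memory/static-max' % domid)
        if static_max is not None:
            return int(static_max)
        return ALL_PHYS_MEM_KB

    def set_domain_memory_kb(domid, target_kb):
        # Keep maxmem in sync with the new target, then ask the balloon
        # driver to move to it.
        xc.domain_setmaxmem(domid, target_kb)
        xs.write('', '/local/domain/%d/memory/target' % domid, str(target_kb))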
There seems to be a problem with xm mem-set when it is executed for a value
very close to the current value: the request is ignored; apparently, the
domU kernel imposes some granularity on the request size.
So, if qmemman is asked for, say, 470MB and there is 469MB free, it will try
to milk 1MB from all domains - and this will fail. REQ_SAFETY_NET_FACTOR
does not help in this scenario.
The logs show
req= 1110016 avail= 2503727104.0 donors [('11', 194375270.40000001),...
borrow 90484.1597129 from 11 - so, beg for 90K from a domain
borrow 132239.288652 from 10
borrow 537099.316089 from 0
borrow 148004.024941 from 7
borrow 139834.21573 from 9
borrow 117855.794876 from 8
and then we fail when a domain does not provide this lousy 90KB.
The solution is to ask for actual_need+XEN_FREE_MEM_LEFT, but return if we already
have actual_need+XEN_FREE_MEM_MIN (the latter is 25MB smaller).
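A hedged sketch of that logic; the constant values, helper names, and retry
count are illustrative assumptions, not the actual do_balloon code:

    XEN_FREE_MEM_LEFT = 50 * 1024 * 1024  # illustrative, bytes
    XEN_FREE_MEM_MIN = 25 * 1024 * 1024   # 25MB smaller than the above

    def do_balloon(actual_need, xen_free_memory, request_memory_from_donors):
        # Ask donors for actual_need + XEN_FREE_MEM_LEFT, but declare success
        # as soon as actual_need + XEN_FREE_MEM_MIN is free, so a donor that
        # ignores a tiny (sub-granularity) mem-set cannot make the whole
        # balloon attempt fail.
        for _ in range(5):
            free = xen_free_memory()
            if free >= actual_need + XEN_FREE_MEM_MIN:
                return True
            request_memory_from_donors(actual_need + XEN_FREE_MEM_LEFT - free)
        return False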
A small AppVM (say, with 100MB total) can go below prefmem, and
still not be assigned memory, because of the MIN_TOTAL_MEMORY_TRANSFER
threshold.
So, if an AppVM is below prefmem, allow smaller mem-sets.
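A minimal sketch of that exception, with illustrative names and an
illustrative threshold value:

    MIN_TOTAL_MEMORY_TRANSFER = 150 * 1024 * 1024  # illustrative, bytes

    def worth_memset(memory_current, new_target, prefmem):
        # Normally skip transfers smaller than the threshold, but a VM that
        # is already below its prefmem gets even a tiny top-up.
        if memory_current < prefmem:
            return True
        return abs(new_target - memory_current) >= MIN_TOTAL_MEMORY_TRANSFER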
Previously, memory_actual (retrieved from xen) was used; it can be inconsistent.
'MemTotal' can be spoofed, but we rely on other fields from /proc/meminfo anyway.
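A sketch of reading the VM-reported value, assuming the standard /proc/meminfo
format; the function name is illustrative:

    def parse_memtotal_kb(meminfo_text):
        # meminfo_text is the VM-provided copy of /proc/meminfo; MemTotal is
        # reported in kB. Returns None if the field is missing or malformed.
        for line in meminfo_text.splitlines():
            if line.startswith('MemTotal:'):
                try:
                    return int(line.split()[1])
                except (IndexError, ValueError):
                    return None
        return None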
Apparently, even if there is not enough xen memory to balloon up, the
balloon driver will try to fulfill the request later, when
some memory is freed. Thus, in do_balloon, do not limit mem_set
to the available memory.
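In code terms the change amounts to dropping the clamp, roughly as below
(names are illustrative):

    def mem_set_for_balloon(set_memory_target, domid, target, xen_free):
        # Old behaviour: set_memory_target(domid, min(target, xen_free)).
        # New behaviour: request the full target; the balloon driver keeps
        # trying as memory becomes available.
        set_memory_target(domid, target)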
Apparently, it interferes:
INFO (XendCheckpoint:417) ERROR Internal error: Could not get vcpu context
INFO (XendCheckpoint:417) ERROR Internal error: Failed to map/save the p2m frame list
Now balance() has two different cases: enough memory and low_on_memory.
In the former, distribute memory proportionally; in the latter, don't do this,
as it would make a VM go below prefmem.
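A hedged sketch of the two cases; the domain attributes, the margin parameter,
and the returned (domain, new_target) shape are illustrative assumptions, not
the real qmemman code:

    def balance(xen_free, domains, margin):
        total_pref = sum(d.prefmem for d in domains)
        total_avail = xen_free - margin + sum(d.memory_current for d in domains)
        if total_avail >= total_pref:
            # Enough memory: proportional split, nobody ends up below prefmem.
            return [(d, int(total_avail * d.prefmem / total_pref))
                    for d in domains]
        # Low on memory: shrink VMs above prefmem back to prefmem and give
        # whatever that frees (plus xen free memory) to the needy ones.
        targets = [(d, d.prefmem) for d in domains
                   if d.memory_current > d.prefmem]
        spare = xen_free - margin + sum(d.memory_current - d.prefmem
                                        for d, _ in targets)
        for d in domains:
            if d.memory_current <= d.prefmem:
                give = min(d.prefmem - d.memory_current, max(spare, 0))
                spare -= give
                targets.append((d, d.memory_current + give))
        return targets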
File "/usr/lib64/python2.6/xmlrpclib.py", line 710, in dump_int
raise OverflowError, "int exceeds XML-RPC limits"
OverflowError: int exceeds XML-RPC limits
How crappy.
It can fail, e.g. when a domain is being shut down, with a pretty
message like
File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 1322, in setMemoryTarget
(target * 1024))
Error: (1, 'Operation not permitted')