Commit Graph

50 Commits

Author SHA1 Message Date
Rusty Bird
d7478d128b
storage/reflink: document hardcoded sizeof(int) for FICLONE
One alternative would look like

    import ctypes
    sizeof_int = ctypes.sizeof(ctypes.c_int)
    FICLONE = (1073741824 % 256**sizeof_int) | 37897 | (sizeof_int << 16)

but, even if the above really(?) is a 100% correct Python port of

    $ echo FICLONE | cpp -include linux/fs.h | tail -n 1

it still seems more likely that the ctypes package is somehow buggy
somewhere than for Qubes storage to run on an exotic architecture with
non 32 bit ints (in the foreseeable future).

So just document the baked in assumption.
2019-12-03 18:21:54 +00:00
Rusty Bird
3f0286220c
storage/reflink: simplify _replace_file() comment 2019-12-03 18:21:52 +00:00
Rusty Bird
9d5deffb13
storage/reflink: open in binary mode for loopdev resize ioctl
The default (= text) mode for a loop device which contains a VM image
looked weird, even though it didn't make a difference here because the
dev_io object was never actually read from.
2019-12-03 18:21:51 +00:00
Rusty Bird
4cd9e42416
storage/reflink: use a conditional expression 2019-12-03 18:21:50 +00:00
Marek Marczykowski-Górecki
5fa75d73be
Make pylint happy 2019-09-27 16:29:20 +02:00
Rusty Bird
6e592a56d7
storage/reflink: update comment and whitespace 2019-06-28 10:29:31 +00:00
Rusty Bird
1fe2aff643
storage/reflink: _fsync_path(path_from) in _commit()
During regular VM shutdown, the VM should sync() anyway. (And
admin.vm.volume.Import does fdatasync(), which is also fine.) But let's
be extra careful.
2019-06-28 10:29:30 +00:00
Rusty Bird
df23720e4e
storage/reflink: _fsync_dir() -> _fsync_path()
Make it work when passed a file, too.
2019-06-28 10:29:29 +00:00
Rusty Bird
35b4b915e7
storage/reflink: coroutinize pool setup() 2019-06-28 10:29:27 +00:00
Rusty Bird
2b4b45ead8
storage/reflink: preferably get volume size from image size
There were (at least) five ways for the volume's nominal size and the
volume image file's actual size to desynchronize:

- loading a stale qubes.xml if a crash happened right after resizing the
  image but before saving the updated qubes.xml (-> previously fixed)
- restarting a snap_on_start volume after resizing the volume or its
  source volume (-> previously fixed)
- reverting to a differently sized revision
- importing a volume
- user tinkering with image files

Rather than trying to fix these one by one and hoping that there aren't
any others, override the volume size getter itself to always update from
the image file size. (If the getter is called though the storage API, it
takes the volume lock to avoid clobbering the nominal size when resize()
is running concurrently.)
2019-06-23 12:48:00 +00:00
Rusty Bird
de159a1e1e
storage/reflink: don't run verify() under volume lock 2019-06-23 12:47:59 +00:00
Rusty Bird
4c9c0a88d5
storage/reflink: split @_unblock into @_coroutinized @_locked
And change the volume lock from an asyncio.Lock to a threading.Lock -
locking is now handled before coroutinization.

This will allow the coroutinized resize() and a new *not* coroutinized
size() getter from one of the next commits ("storage/reflink: preferably
get volume size from image size") to both run under the volume lock.
2019-06-23 12:47:58 +00:00
Rusty Bird
ab929baa38
Revert "storage/reflink: update volume size in __init__()"
Fixed more generally in one of the next commits ("storage/reflink:
preferably get volume size from image size").

This reverts commit 9f5d05bfde.
2019-06-23 12:47:57 +00:00
Rusty Bird
58a7e0f158
Revert "storage/reflink: update snap_on_start volume size in start()"
Fixed more generally in one of the next commits ("storage/reflink:
preferably get volume size from image size").

This reverts commit c43df968d5.
2019-06-23 12:47:56 +00:00
Rusty Bird
ef128156a3
storage/reflink: volume.resize(): succeed without image
Successfully resize volumes without any currently existing image file,
e.g. cleanly stopped volatile volumes: Just update the nominal size in
this case.
2019-06-15 16:03:46 +00:00
Rusty Bird
25e92bd19b
storage/reflink: volume.resize(): remove safety check
It is handled by 'qvm-volume resize', which has a '--force' option that
can't work if the check is duplicated in the storage driver.
2019-06-15 16:03:45 +00:00
Rusty Bird
30b92f8845
storage/reflink: simplify volume.usage() 2019-06-15 16:03:43 +00:00
Rusty Bird
9f5d05bfde
storage/reflink: update volume size in __init__() 2019-06-15 16:03:42 +00:00
Rusty Bird
c43df968d5
storage/reflink: update snap_on_start volume size in start() 2019-06-15 16:03:41 +00:00
Rusty Bird
73db2751b8
storage/reflink: make resize()/import_volume() more readable 2018-10-29 20:21:41 +00:00
Rusty Bird
425d993769
storage/reflink: unblock import_data() and import_data_end() 2018-10-29 20:21:39 +00:00
Rusty Bird
5756e870bd
storage/reflink: use context managers in is_supported()
Don't rely on garbage collection to close and remove the tempfiles.
2018-09-13 19:46:48 +00:00
Rusty Bird
c75fe09814
storage/reflink: is_reflink_supported() -> is_supported() 2018-09-11 23:50:16 +00:00
Rusty Bird
1889c9b75f
storage/reflink: run synchronous volume methods in executor
Convert create(), verify(), remove(), start(), stop(), revert(),
resize(), and import_volume() into coroutine methods, via a decorator
that runs them in the event loop's thread-based default executor.

This reduces UI hangs by unblocking the event loop, and can e.g. speed
up VM starts by starting multiple volumes in parallel.
2018-09-11 23:50:15 +00:00
Rusty Bird
3d986be02a
storage/reflink: native FICLONE in _copy_file() happy path
Avoid a subprocess launch, and distinguish reflink vs. fallback copy in
the log.
2018-09-11 23:50:14 +00:00
Rusty Bird
edda3a1734
storage/reflink: factor out _ficlone() 2018-09-11 23:50:13 +00:00
Rusty Bird
69af0a48ec
storage/reflink: inline and simplify _cmd() 2018-09-11 23:50:12 +00:00
Rusty Bird
fb06a8089a
storage/reflink: _update_loopdev_sizes() without losetup
Factor out a function, and use the LOOP_SET_CAPACITY ioctl instead of
going through losetup.
2018-09-11 23:50:10 +00:00
Rusty Bird
385ba91772
storage/reflink: resize(): don't look for loopdevs if clean 2018-09-09 20:01:22 +00:00
Rusty Bird
e7b7c253ac
storage/reflink: inline _require_self_on_stop() 2018-09-09 20:01:20 +00:00
Rusty Bird
6e8d7d4201
storage/reflink: no-op import_volume() if not save_on_stop
Instead of raising a NotImplementedError, just return self like 'file'
and lvm_thin. This is needed when Storage.clone() is modified in another
commit* to no longer swallow exceptions.

* "storage: factor out _wait_and_reraise(); fix clone/create"
2018-09-09 20:01:19 +00:00
Rusty Bird
60bf68a748
storage/reflink: add _path_import (don't reuse _path_dirty)
Import volume data to a new _path_import (instead of _path_dirty) before
committing to _path_clean. In case the computer crashes while an import
operation is running, the partially written file should not be attached
to Xen on the next volume startup.

Use <name>-import.img as the filename like 'file' does, to be compatible
with qubes.tests.api_admin/TC_00_VMs/test_510_vm_volume_import.
2018-09-09 20:01:18 +00:00
Rusty Bird
d301aa2e50
storage/reflink: delete stale tempfiles on start and remove
When the AT_REPLACE flag for linkat() finally lands in the Linux kernel,
_replace_file() can be modified to use unnamed (O_TMPFILE) tempfiles.
Until then, make sure stale tempfiles from previous crashes can't hang
around for too long.
2018-09-09 20:01:17 +00:00
Rusty Bird
75a4a1340e
storage/reflink: don't recompute static properties per call 2018-09-09 20:01:15 +00:00
Rusty Bird
ef2698adb4
storage/reflink: make revisions() more readable, use iglob 2018-09-09 20:01:14 +00:00
Rusty Bird
18f9356c2c
storage/reflink: refuse to revert() dirty volume 2018-09-09 20:01:13 +00:00
Rusty Bird
677183d8a6
storage/reflink: add revision even if empty
It's sort of useful to be able to revert a volume that has only ever
been started once to its empty state. And the lvm_thin driver allows it
too, so why not.
2018-09-09 20:01:12 +00:00
Rusty Bird
850778b52a
storage/reflink: remove redundant format specifiers 2018-09-09 20:01:11 +00:00
Marek Marczykowski-Górecki
d6b422cc36
Merge remote-tracking branch 'qubesos/pr/207'
* qubesos/pr/207:
  storage/reflink: strictly increasing revision ID
2018-03-22 01:54:38 +01:00
Rusty Bird
6a303760e9
storage/reflink: strictly increasing revision ID
Don't rely on timestamps to sort revisions - the clock can go backwards
due to time sync. Instead, use a monotonically increasing natural number
as the revision ID.

Old revision example: private.img@2018-01-02-03T04:05:06Z (ignored now)
New revision example: private.img.123@2018-01-02-03T04:05:06Z
2018-03-21 16:00:13 +00:00
Marek Marczykowski-Górecki
e5413a3036
Merge branch 'storage-properties'
* storage-properties:
  storage: use None for size/usage properties if unknown
  tests: call search_pool_containing_dir with various dirs and pools
  storage: make DirectoryThinPool helper less verbose, add sudo
  api/admin: add 'included_in' to admin.pool.Info call
  storage: add Pool.included_in() method for checking nested pools
  storage: move and generalize RootThinPool helper class
  storage/kernels: refuse changes to 'rw' and 'revisions_to_keep'
  api/admin: implement admin.vm.volume.Set.rw method
  api/admin: include 'revisions_to_keep' and 'is_outdated' in volume info
2018-03-21 01:43:53 +01:00
Marek Marczykowski-Górecki
d40fae9756
storage: add Pool.included_in() method for checking nested pools
It may happen that one pool is inside a volume of other pool. This is
the case for example for varlibqubes pool (file driver,
dir_path=/var/lib/qubes) and default lvm pool (lvm_thin driver). The
latter include whole root filesystem, so /var/lib/qubes too.
This is relevant for proper disk space calculation - to not count some
space twice.

QubesOS/qubes-issues#3240
QubesOS/qubes-issues#3241
2018-03-20 16:53:39 +01:00
Rusty Bird
1743c76ca9
storage/reflink: reorder start() to be more readable
This also makes slightly more sense in the exotic (and currently unused)
case of restarting a crashed snap_on_start *and* save_on_stop volume.
2018-03-12 16:38:56 +00:00
Rusty Bird
31810db977
storage/reflink: simplify 2018-03-11 17:39:51 +00:00
Rusty Bird
c382eb3752
storage/reflink: let _remove_empty_dir() ignore ENOTEMPTY 2018-03-11 17:39:51 +00:00
Rusty Bird
023cb49293
storage/reflink: show size in refused volume shrink message
Like e6bb282 did for lvm.
2018-03-11 15:34:56 +00:00
Rusty Bird
c31d317c63
storage/reflink: fsync() after resizing existing file
Ensure that the updated metadata is written to disk.
2018-03-11 15:34:55 +00:00
Rusty Bird
37e1aedfa3
reflink: style fix 2018-02-16 21:47:39 +00:00
Rusty Bird
c871424fb0
storage: typo fix 2018-02-16 21:47:37 +00:00
Rusty Bird
1695a732b8
file-reflink, a storage driver optimized for CoW filesystems
This adds the file-reflink storage driver. It is never selected
automatically for pool creation, especially not the creation of
'varlibqubes' (though it can be used if set up manually).

The code is quite small:

               reflink.py  lvm.py      file.py + block-snapshot
    sloccount  334 lines   447 (134%)  570 (171%)

Background: btrfs and XFS (but not yet ZFS) support instant copies of
individual files through the 'FICLONE' ioctl behind 'cp --reflink'.
Which file-reflink uses to snapshot VM image files without an extra
device-mapper layer. All the snapshots are essentially freestanding;
there's no functional origin vs. snapshot distinction.

In contrast to 'file'-on-btrfs, file-reflink inherently avoids
CoW-on-CoW. Which is a bigger issue now on R4.0, where even AppVMs'
private volumes are CoW. (And turning off the lower, filesystem-level
CoW for 'file'-on-btrfs images would turn off data checksums too, i.e.
protection against bit rot.)

Also in contrast to 'file', all storage features are supported,
including

    - any number of revisions_to_keep
    - volume.revert()
    - volume.is_outdated
    - online fstrim/discard

Example tree of a file-reflink pool - *-dirty.img are connected to Xen:

    - /var/lib/testpool/appvms/foo/volatile-dirty.img
    - /var/lib/testpool/appvms/foo/root-dirty.img
    - /var/lib/testpool/appvms/foo/root.img
    - /var/lib/testpool/appvms/foo/private-dirty.img
    - /var/lib/testpool/appvms/foo/private.img
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T03:04:05Z
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T04:05:06Z
    - /var/lib/testpool/appvms/foo/private.img@2018-01-02T05:06:07Z
    - /var/lib/testpool/appvms/bar/...
    - /var/lib/testpool/appvms/...
    - /var/lib/testpool/template-vms/fedora-26/...
    - /var/lib/testpool/template-vms/...

It looks similar to a 'file' pool tree, and in fact file-reflink is
drop-in compatible:

    $ qvm-shutdown --all --wait
    $ systemctl stop qubesd
    $ sed 's/ driver="file"/ driver="file-reflink"/g' -i.bak /var/lib/qubes/qubes.xml
    $ systemctl start qubesd
    $ sudo rm -f /path/to/pool/*/*/*-cow.img*

If the user tries to create a fresh file-reflink pool on a filesystem
that doesn't support reflinks, qvm-pool will abort and mention the
'setup_check=no' option. Which can be passed to force a fallback on
regular sparse copies, with of course lots of time/space overhead. The
same fallback code is also used when initially cloning a VM from a
foreign pool, or from another file-reflink pool on a different
mountpoint.

'journalctl -fu qubesd' will show all file-reflink copy/rename/remove
operations on VM creation/startup/shutdown/etc.
2018-02-12 21:20:05 +00:00