One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at-rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.
The OpenZFS encryption algorithm defaults to either aes-256-ccm (prior to 0.8.4) or aes-256-gcm (>= 0.8.4) when encryption=on is set, but it may also be specified directly. Currently supported algorithms are:
aes-128-ccm
aes-192-ccm
aes-256-ccm (default in OpenZFS < 0.8.4)
aes-128-gcm
aes-192-gcm
aes-256-gcm (default in OpenZFS >= 0.8.4)
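As a minimal sketch of choosing a suite explicitly instead of relying on encryption=on, the helper below checks a name against the list above and prints the matching option for zfs create. The function name encryption_option is hypothetical and exists only for illustration; the algorithm names are the ones listed above.

```shell
# Hypothetical helper: validate an algorithm name against the supported list
# and print the option that would be passed to zfs create.
encryption_option() {
    case "$1" in
        on|aes-128-ccm|aes-192-ccm|aes-256-ccm|aes-128-gcm|aes-192-gcm|aes-256-gcm)
            printf '%s\n' "-o encryption=$1"
            ;;
        *)
            printf '%s\n' "unsupported encryption algorithm: $1" >&2
            return 1
            ;;
    esac
}

# Example use (not run here):
#   zfs create $(encryption_option aes-256-gcm) \
#       -o keylocation=prompt -o keyformat=passphrase poolname/datasetname
```

Naming the algorithm explicitly keeps a dataset's suite independent of whichever default the installed OpenZFS version happens to use.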
There's more to OpenZFS native encryption than the algorithms used, though—so we'll try to give you a brief but solid grounding in the sysadmin's-eye perspective on the "why" and "what" as well as the simple "how."
Why (or why not) OpenZFS native encryption?
A clever sysadmin who wants to provide at-rest encryption doesn't actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.
First, the “why not”
Putting something like Linux's LUKS underneath OpenZFS has an advantage: with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can't necessarily see that ZFS is in use at all!
But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk which will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks (in some cases, many tens of disks). Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong, and it's in a position to undo all of ZFS' normal integrity guarantees.
Putting LUKS or similar atop OpenZFS gets rid of the aforementioned problems: a LUKS encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS' integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem: it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires the use of one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) to format the LUKS volume itself with.
Now, the “why”
OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn't nerf ZFS' own integrity guarantees. But it also doesn't interfere with ZFS compression, since data is compressed prior to being saved to an encrypted dataset or zvol.
There's an even more compelling reason to choose OpenZFS native encryption, though: raw send. ZFS replication is typically far faster than filesystem-neutral tools like rsync, and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.
This means that you can use ZFS replication to back up your data to an untrusted location without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted, and without the backup target ever being able to decrypt it at all. You can replicate your offsite backups to a friend's house or to a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.
In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—then, and only then, loading the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and only moving the blocks which have changed since that snapshot).
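The offsite workflow described above can be sketched as a pair of shell functions that pipe a raw send over SSH. The host backupuser@backup.example.com and the function names are hypothetical placeholders, not part of any tool. The -w and -I flags are the same ones used in the replication examples later in this article.

```shell
# Hypothetical remote login; replace with your own untrusted backup host.
REMOTE="backupuser@backup.example.com"

# Full raw replication of one snapshot to the remote host.
# -w sends blocks exactly as stored on disk, so the remote never needs the key.
raw_backup_full() {
    # $1 = source snapshot (pool/dataset@snap), $2 = target dataset on the remote
    zfs send -w "$1" | ssh "$REMOTE" zfs receive "$2"
}

# Incremental raw replication, starting from a snapshot both sides already hold.
raw_backup_incremental() {
    # $1 = common snapshot, $2 = newer snapshot, $3 = target dataset on the remote
    zfs send -w -I "$1" "$2" | ssh "$REMOTE" zfs receive "$3"
}
```

Restoring is the same pipeline run in the opposite direction, with the key loaded only on the machine you trust. Note that a plain pipeline reports only the exit status of its last command, so a failed send may need separate checking.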
What’s encrypted—and what isn’t?
OpenZFS native encryption isn't a full-disk encryption scheme—it's enabled or disabled on a per-dataset / per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.
Let's say we create an encrypted dataset named banshee/encrypted, and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:
root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase:
Re-enter passphrase:
root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3
root@banshee:~# zfs list -r banshee/encrypted
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted 1.58M 848G 432K /banshee/encrypted
banshee/encrypted/child1 320K 848G 320K /banshee/encrypted/child1
banshee/encrypted/child2 320K 848G 320K /banshee/encrypted/child2
banshee/encrypted/child3 320K 848G 320K /banshee/encrypted/child3
root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child1 encryption aes-256-gcm -
At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:
root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn
root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded
root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs list -r banshee/encrypted
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted 2.19M 848G 432K /banshee/encrypted
banshee/encrypted/child1 320K 848G 320K /banshee/encrypted/child1
banshee/encrypted/child2 944K 848G 720K /banshee/encrypted/child2
banshee/encrypted/child3 320K 848G 320K /banshee/encrypted/child3
As we can see above, after unloading the encryption key, we can no longer see our freshly-downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence (and structure) of our entire ZFS-encrypted tree. We can also see each encrypted dataset's properties, including but not limited to the USED, AVAIL, and REFER of each dataset.
It's worth noting that trying to ls an encrypted dataset which doesn't have its key loaded won't necessarily produce an error:
root@banshee:~# zfs get keystatus banshee/encrypted
NAME PROPERTY VALUE SOURCE
banshee/encrypted keystatus unavailable -
root@banshee:~# ls /banshee/encrypted
root@banshee:~#
This is because a naked directory exists on the host, even when the actual dataset is not mounted. Reloading the key doesn't automatically remount the dataset, either:
root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted':
1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
In order to access our fresh copy of Huckleberry Finn, we'll also need to actually mount the freshly key-reloaded datasets:
root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child2 keystatus available -
root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13 2002 HuckFinn.txt
Now that we've both loaded the necessary key and mounted the datasets, we can see our encrypted data again.
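Since a loaded key and a mounted dataset are two separate states, a quick check for the "unlocked but not mounted" case can save some head-scratching. The sketch below assumes the keystatus and mounted properties report the values available and no in that situation. The function name is hypothetical.

```shell
# List every dataset under a parent whose key is loaded but which is not mounted.
unmounted_but_unlocked() {
    # $1 = parent dataset, e.g. banshee/encrypted
    # -H drops the header and separates columns with tabs; -o selects columns.
    zfs list -r -H -o name,keystatus,mounted "$1" |
        awk -F '\t' '$2 == "available" && $3 == "no" { print $1 }'
}
```

Running unmounted_but_unlocked banshee/encrypted right after zfs load-key would be expected to name the datasets that still need a zfs mount.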
How do I shot encryption?
Now that we've covered the why and the what, let's talk more about the how. These are the important commands and arguments to know:
zfs create
The first step of ZFS encryption is creating the encrypted dataset or zvol itself. You can't encrypt a pre-existing dataset or zvol—it needs to be created that way from the start. (You can, if necessary, full-replicate an existing dataset onto a new child dataset of an encrypted parent, thereby preserving snapshot history and other niceties. Just delete the source after replication has completed and you've verified that everything is good.)
zfs create -o encryption=[algorithm] -o keylocation=[location] -o keyformat=[format] poolname/datasetname
Now, let's tackle each of these arguments. Encryption is a no-brainer, if you've been reading along: if you set this to on, you'll get the default encryption algorithm, a variant of aes-256 dependent on your particular OpenZFS version. Otherwise, you can specify the algorithm directly; aes is currently supported in widths of 128, 192, or 256 bits in either ccm or gcm variants. We currently recommend aes-256-gcm strictly on the basis of it being the default algo used in the newest currently available versions of OpenZFS!
Keylocation can be either prompt (which prompts you to type the key in when necessary) or the path to a keyfile (in the form file:///path/to/keyfile). Switching from an interactive prompt to a keyfile after the dataset or zvol has been created means running zfs change-key (covered below), so some planning ahead is called for here. Using a keyfile allows encrypted ZFS volumes to be automatically mounted at boot, while using prompt requires a human to manually unlock the encrypted volumes every time.
Keyformat can be either passphrase, hex, or raw. Passphrases must be between 8 and 512 bytes long, while both hex and raw keys must be precisely 32 bytes long. You can generate a raw key with dd if=/dev/urandom bs=32 count=1 of=/path/to/keyfile.
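A minimal sketch of generating key material for the raw and hex formats follows. It writes into a throwaway temporary directory purely for illustration; a real keyfile would live somewhere permanent and root-only. The hex variant assumes that a 32-byte hex key is stored as 64 hexadecimal characters.

```shell
# Restrict permissions on everything created below to the current user.
umask 077
keydir=$(mktemp -d)

# Raw key: exactly 32 random bytes.
dd if=/dev/urandom of="$keydir/raw.key" bs=32 count=1 2>/dev/null

# Hex key: another 32 random bytes, written as 64 hexadecimal characters.
od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$keydir/hex.key"

# Sanity-check the sizes before pointing keylocation at either file.
wc -c < "$keydir/raw.key"
wc -c < "$keydir/hex.key"
```

Either file can then be referenced as keylocation=file:///path/to/keyfile with the matching keyformat.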
The keylocation is stored as a ZFS property of any dataset or zvol which uses it. This allows automatic mounting of encrypted datasets and zvols when that location is a file rather than an interactive prompt.
A final note: ZFS doesn't actually encrypt your data directly with a supplied passphrase; it encrypts your data with a pseudo-randomly generated master key. Your passphrase unlocks that master key, which then becomes available for use working with the volume itself!
zfs load-key
When you create an encrypted dataset, it's mounted automatically, but it won't be after the system reboots. To make your encrypted volumes accessible again, you'll use the zfs load-key command.
zfs load-key [-nr] [-L location] [-a] poolname/zvol-or-dataset
As usual, the optional -r is for recursive, and it will load both the volume specified and any child volumes. -a for all goes a step further and loads keys for all encrypted volumes found on all currently imported pools.
Also as usual for Linux utilities, -n is short for no-op, aka dry run. Performing, for example, zfs load-key -n -a checks that all keys specified in the keylocation properties of encrypted datasets and zvols are available and correct, but it does this without actually changing the key or mount status.
Finally, -L location overrides the keylocation property stored in the ZFS dataset or zvol itself. For example, let's say you created an encrypted dataset using the interactively prompted passphrase correct horse battery staple. You can type that phrase into /home/me/myzfskey.txt, then do zfs load-key -L file:///home/me/myzfskey.txt poolname/datasetname, and the key will load just fine.
A warning: simply loading the key doesn't actually mount your dataset or zvol. That's a separate operation which needs to be performed after the key is loaded, e.g., zfs mount -a or zfs mount poolname/datasetname.
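Because loading and mounting are separate steps, it can help to wrap them together. The function below is a sketch under the assumption that the dataset named is an encryption root whose keystatus reads available once its key is loaded. The name unlock_and_mount is hypothetical.

```shell
# Load the key for a dataset tree if needed, then mount whatever is mountable.
unlock_and_mount() {
    # $1 = encrypted parent dataset, e.g. poolname/datasetname
    if [ "$(zfs get -H -o value keystatus "$1")" != "available" ]; then
        # Prompts for the passphrase when keylocation=prompt.
        zfs load-key -r "$1" || return 1
    fi
    zfs mount -a
}
```

OpenZFS also documents a -l option to zfs mount which loads keys as part of mounting, and which may be simpler when a wrapper function is not wanted.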
If you'd like keys to be automatically loaded at boot time, the Arch Linux wiki has sample systemd service definitions available.
zfs unload-key
As long as it's not busy (generally meaning mounted), you can unload the key from an encrypted volume. This command shares the -r and -a flags with zfs load-key. Once unloaded, zfs get keystatus poolname/datasetname will return unavailable, and the volume in question cannot be mounted until the key is reloaded.
Remember: volumes without their keys loaded cannot be read from or written to, but they can be replicated using the -w flag on zfs send! Maintenance-level ZFS operations such as scrubbing, resilvering, and even (dataset or zvol level) renaming also work fine on locked volumes.
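Locking a tree again is the reverse of unlocking it: unmount first, then unload the key. The sketch below unmounts child datasets before their parents, which is an assumption about the safest ordering rather than a documented requirement. The function name lock_dataset is hypothetical.

```shell
# Unmount an encrypted dataset tree (children first), then unload its key.
lock_dataset() {
    # $1 = encryption root, e.g. poolname/datasetname
    # Reverse-sorting the names puts child datasets ahead of their parents.
    zfs list -r -H -o name "$1" | LC_ALL=C sort -r |
        while read -r name; do
            if [ "$(zfs get -H -o value mounted "$name")" = "yes" ]; then
                zfs unmount "$name"
            fi
        done
    # Fails with an error if anything in the tree is still busy.
    zfs unload-key -r "$1"
}
```

Afterward, zfs get keystatus on the dataset should report unavailable, as described above.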
zfs change-key
This command can be used to alter the keylocation, keyformat, and/or pbkdf2iters properties of an encrypted volume, along with the actual key itself. Using zfs change-key does not require re-encrypting all data in affected volumes, since the master key remains unchanged; what zfs change-key does is alter encryption-related ZFS properties, and/or change the passphrase used to unlock the master key.
zfs change-key [-l] [-o keylocation=location] [-o keyformat=format] [-o pbkdf2iters=value] poolname/dataset-or-zvol
-l (lowercase L) loads the key prior to changing it. This is functionally equivalent to zfs load-key poolname/dataset ; zfs change-key poolname/dataset.
keyformat and keylocation are just what they sound like: see zfs create above for explanation. pbkdf2iters is the number of PBKDF2 iterations applied to a passphrase in order to derive the wrapping key that unlocks the master key. It defaults to 350000 and should not be messed with by mere mortals who don't know exactly what that means (and what its implications are) without further reading.
Remember, we're not changing the master key, only the user's "wrapped" key—so changing this key does not categorically guarantee the security of data whose original wrapper key was compromised. A sufficiently advanced attacker could have figured out the master key after compromising a user-wrapped key—so in the event of a serious compromise, you'll want to rewrite the dataset entirely (e.g., by replicating to a new dataset, child of a parent encrypted with a different key).
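As an illustration of the syntax above, the sketch below rewraps a dataset's master key with a raw keyfile in place of whatever user key it had before. The function name and the keyfile path in the usage note are hypothetical, and the keyfile is assumed to already contain a valid 32-byte raw key.

```shell
# Switch an encryption root to a raw keyfile without re-encrypting its data.
switch_to_keyfile() {
    # $1 = dataset or zvol, $2 = absolute path to an existing 32-byte keyfile
    # -l makes sure the current key is loaded before it is changed.
    zfs change-key -l \
        -o keylocation="file://$2" \
        -o keyformat=raw \
        "$1"
}
```

For example, switch_to_keyfile poolname/datasetname /path/to/keyfile would leave the dataset unlockable from that file from then on. As noted above, this changes only the wrapping, not the master key itself.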
zfs send
zfs send and zfs receive are the same commands used for normal ZFS replication, and they work just the same as they always did... with a few very important caveats. In order to use encrypted replication, you must specify the -w argument to zfs send. Using -w sends data exactly as it is already stored on-disk, which leaves compression intact in addition to encryption.
If you forget to use -w when zfs sending a dataset with its key loaded, the replication will work, but the target will be unencrypted!
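One way to catch a forgotten -w is to check the encryption property on the target after the receive completes. This sketch assumes an unencrypted dataset reports the value off for that property. The function name is_encrypted is hypothetical.

```shell
# Succeed only if the named dataset reports an encryption algorithm other than "off".
is_encrypted() {
    # $1 = dataset to inspect, e.g. the replication target
    enc=$(zfs get -H -o value encryption "$1") || return 1
    [ -n "$enc" ] && [ "$enc" != "off" ]
}

# Example use (not run here):
#   is_encrypted banshee/encrypted-target || echo "WARNING: target is not encrypted" >&2
```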
Remember, ZFS replication is based on snapshots—so you'll need to create, manage, and work with snapshots in order to take advantage of replication. Without additional tooling, that process looks something like this:
root@banshee:~# zfs snapshot banshee/encrypted-source@snapshot1
root@banshee:~# zfs send -w banshee/encrypted-source@snapshot1 | zfs receive banshee/encrypted-target
root@banshee:~# zfs snapshot banshee/encrypted-source@snapshot2
root@banshee:~# zfs send -w -I banshee/encrypted-source@snapshot1 banshee/encrypted-source@snapshot2 | zfs receive banshee/encrypted-target
root@banshee:~# zfs list -rt snap banshee/encrypted-target
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted-target@snapshot1 16K - 672K -
banshee/encrypted-target@snapshot2 0B - 672K -
root@banshee:~# zfs get keystatus banshee/encrypted-target
NAME PROPERTY VALUE SOURCE
banshee/encrypted-target keystatus unavailable -
In the above example, we perform first a full replication based on @snapshot1, then an incremental replication from @snapshot1 to @snapshot2. We can verify that the snapshots are present and correct on the target with zfs list -rt snapshot, even without the key loaded. But attempting to access the actual data fails, since the key's not loaded for the target dataset, and it's also not mounted.
To access the data, we can either zfs load-key and zfs mount the target dataset, or we can zfs send -w it right back to the source again.
If all this seems too complex and difficult to automate, my own tool syncoid can help: just specify --sendoptions=w and you're off to the races; syncoid automatically handles matching up common snapshots, and it even creates snapshots if necessary.
(In addition to master versions at the provided GitHub link, syncoid's parent package sanoid is currently available from most major distributions' default repositories, e.g., apt install sanoid on Ubuntu 20.04.)
root@banshee:~# zfs destroy -r banshee/encrypted-target
root@banshee:~# syncoid --compress=none --sendoptions=w banshee/encrypted-source banshee/encrypted-target
INFO: Sending oldest full snapshot banshee/encrypted-source@1 (~ 491 KB) to new target filesystem:
418KiB 0:00:00 [53.2MiB/s] [===========================> ] 85%
INFO: Updating new target filesystem with incremental banshee/encrypted-source@1 ... syncoid_banshee_2021-06-23:17:31:24 (~ 19 KB):
52.5KiB 0:00:00 [ 391KiB/s] [================================] 269%
root@banshee:~# zfs list -rt snap banshee/encrypted-target
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted-target@snapshot1 16K - 672K -
banshee/encrypted-target@snapshot2 16K - 672K -
banshee/encrypted-target@syncoid_banshee_2021-06-23:17:31:24 0B - 672K -
Note that we also used --compress=none, since the encrypted source data won't be compressible. Any ZFS compression applied on-disk will still be intact with or without --compress=none; this syncoid option only controls compression of data in-flight and has no bearing on on-disk usage. If you forget to specify --compress=none, it won't hurt anything; you'll just waste a few CPU cycles on the very fast and low-impact lzo default in-flight compression.
Conclusion
Hopefully, you've got a much better handle on ZFS encryption now than you did going into this article. As is the case with many ZFS features, on-disk encryption is something that was already available and, in some ways, was simpler in the original layers. But wrapping it into the OpenZFS system directly allows for many interesting new possibilities—in particular, the "raw send" concept which allows OpenZFS' excellent asynchronous incremental replication to function without loaded keys can be a game-changer.
Please remember that zfs load-key and zfs mount really, really matter here, and they matter on both source and target. If you encounter a situation in which all your files appear to be gone but no obvious errors are being thrown, you almost certainly just forgot to mount the dataset or, in some cases, mounted it in a different place than you expected to.
If you need help with ZFS encryption, ZFS replication, or even the sanoid / syncoid orchestration tools briefly mentioned in this article, you may want to take a look at r/zfs, as well as the comment section of this article itself.
June 24, 2021 at 05:52AM
A quick-start guide to OpenZFS native encryption - Ars Technica