A quick-start guide to OpenZFS native encryption - Ars Technica

One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at-rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.

OpenZFS' encryption algorithm defaults to either aes-256-ccm (prior to 0.8.4) or aes-256-gcm (>= 0.8.4) when encryption=on is set, but it may also be specified directly. Currently supported algorithms are:

  • aes-128-ccm
  • aes-192-ccm
  • aes-256-ccm (default in OpenZFS < 0.8.4)
  • aes-128-gcm
  • aes-192-gcm
  • aes-256-gcm (default in OpenZFS >= 0.8.4)

There's more to OpenZFS native encryption than the algorithms used, though—so we'll try to give you a brief but solid grounding in the sysadmin's-eye perspective on the "why" and "what" as well as the simple "how."

Why (or why not) OpenZFS native encryption?

A clever sysadmin who wants to provide at-rest encryption doesn't actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.

First, the “why not”

Putting something like Linux's LUKS underneath OpenZFS has an advantage—with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can't necessarily see that ZFS is in use at all!

But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk which will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks—in some cases, many tens of disks. Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong—and it's in a position to undo all of ZFS' normal integrity guarantees.

Putting LUKS or similar atop OpenZFS gets rid of the aforementioned problems—a LUKS encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS' integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem—it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires the use of one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) to format the LUKS volume itself with.

Now, the “why”

OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn't nerf ZFS' own integrity guarantees. But it also doesn't interfere with ZFS compression—data is compressed prior to being saved to an encrypted dataset or zvol.

There's an even more compelling reason to choose OpenZFS native encryption, though—something called "raw send." ZFS replication is ridiculously fast and efficient—frequently several orders of magnitude faster than filesystem-neutral tools like rsync—and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.

This means that you can use ZFS replication to back up your data to an untrusted location, without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted, and without the backup target ever being able to decrypt it at all. You can replicate your offsite backups to a friend's house or to a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.

In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—then, and only then, loading the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and only moving the blocks which have changed since that snapshot).

What’s encrypted—and what isn’t?

OpenZFS native encryption isn't a full-disk encryption scheme—it's enabled or disabled on a per-dataset / per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.

Let's say we create an encrypted dataset named banshee/encrypted, and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:

root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase: 
Re-enter passphrase: 

root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3

root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted         1.58M   848G      432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G      320K  /banshee/encrypted/child1
banshee/encrypted/child2   320K   848G      320K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G      320K  /banshee/encrypted/child3

root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME                      PROPERTY    VALUE        SOURCE
banshee/encrypted/child1  encryption  aes-256-gcm  -

At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:

root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn

root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded

root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded

root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted         2.19M   848G      432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G      320K  /banshee/encrypted/child1
banshee/encrypted/child2   944K   848G      720K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G      320K  /banshee/encrypted/child3

As we can see above, after unloading the encryption key, we can no longer see our freshly-downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence—and structure—of our entire ZFS-encrypted tree. We can also see each encrypted dataset's properties, including but not limited to the USED, AVAIL, and REFER of each dataset.

It's worth noting that trying to ls an encrypted dataset which doesn't have its key loaded won't necessarily produce an error:

root@banshee:~# zfs get keystatus banshee/encrypted
NAME               PROPERTY   VALUE        SOURCE
banshee/encrypted  keystatus  unavailable  -
root@banshee:~# ls /banshee/encrypted
root@banshee:~# 

This is because a naked directory exists on the host, even when the actual dataset is not mounted. Reloading the key doesn't automatically remount the dataset, either:

root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted': 
1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

In order to access our fresh copy of Huckleberry Finn, we'll also need to actually mount the freshly key-reloaded datasets:

root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME                      PROPERTY   VALUE        SOURCE
banshee/encrypted/child2  keystatus  available    -

root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13  2002 HuckFinn.txt

Now that we've both loaded the necessary key and mounted the datasets, we can see our encrypted data again.
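
Since "key loaded" and "mounted" are separate states, it can help to check both before concluding that files are missing. The following is a minimal sketch, not part of the original walkthrough: dataset_state is a hypothetical helper name, and it assumes that keystatus reads "-" on an unencrypted dataset.

```shell
# Report whether a dataset is locked, unlocked but unmounted, or mounted.
# dataset_state is a hypothetical helper name; keystatus and mounted are
# ordinary ZFS properties queried with zfs get.
dataset_state() {
    ds_name=$1
    ds_key=$(zfs get -H -o value keystatus "$ds_name") || return 1
    ds_mnt=$(zfs get -H -o value mounted "$ds_name") || return 1

    if [ "$ds_key" = "-" ]; then
        echo "unencrypted"          # assumption: "-" means no encryption
    elif [ "$ds_key" != "available" ]; then
        echo "locked"               # key not loaded, data unreadable
    elif [ "$ds_mnt" != "yes" ]; then
        echo "unlocked-unmounted"   # key loaded, zfs mount still needed
    else
        echo "mounted"
    fi
}

# Usage (needs a real pool):
#   dataset_state banshee/encrypted/child2
```

A result of unlocked-unmounted corresponds to the situation in the transcript above, where the key was available but ls still found nothing.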

How do I shot encryption?

Now that we've covered the why and the what, let's talk more about the how. These are the important commands and arguments to know:

zfs create

The first step of ZFS encryption is creating the encrypted dataset or zvol itself. You can't encrypt a pre-existing dataset or zvol—it needs to be created that way from the start. (You can, if necessary, full-replicate an existing dataset onto a new child dataset of an encrypted parent, thereby preserving snapshot history and other niceties. Just delete the source after replication has completed and you've verified that everything is good.)

zfs create -o encryption=[algorithm] -o keylocation=[location] -o keyformat=[format] poolname/datasetname

Now, let's tackle each of the bracketed arguments. Encryption is a no-brainer, if you've been reading along: if you set this to on, you'll get the default encryption algorithm, a variant of aes-256 that depends on your particular OpenZFS version. Otherwise, you can specify the algorithm directly; currently supported are aes in widths of 128, 192, or 256 bits, in either ccm or gcm variants. We currently recommend aes-256-gcm strictly on the basis of it being the default algo used in the newest currently available versions of OpenZFS!

Keylocation can be either prompt (which prompts you to type the key in when necessary) or the path to a keyfile (in the form file:///path/to/keyfile). Switching from an interactive prompt to a keyfile after the dataset or zvol is created means a trip through zfs change-key (covered below), so some planning ahead is called for here. Using a keyfile allows encrypted ZFS volumes to be automatically mounted at boot, while using prompt requires a human to manually unlock the encrypted volumes every time.

Keyformat can be either passphrase, hex, or raw. Passphrases must be between 8 and 512 bytes long, while both hex and raw keys must be precisely 32 bytes long (a hex key is written out as 64 hexadecimal characters). You can generate a raw key with dd if=/dev/urandom bs=32 count=1 of=/path/to/keyfile.
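
As a rough sketch of the key-generation step, the helpers below write a raw key and a hex key and let you confirm their sizes. The function names are hypothetical, and the hex key is written without a trailing newline as a cautious assumption.

```shell
# Generate key material for keyformat=raw and keyformat=hex.
# make_raw_key, make_hex_key and key_size are hypothetical helper names.

make_raw_key() {
    # 32 random bytes, written with owner-only permissions
    ( umask 077 && dd if=/dev/urandom of="$1" bs=32 count=1 2>/dev/null )
}

make_hex_key() {
    # The same 32 bytes of entropy, stored as 64 hexadecimal characters
    ( umask 077 && od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$1" )
}

key_size() {
    # Size of the keyfile in bytes
    echo $(( $(wc -c < "$1") ))
}

# Usage:
#   make_raw_key /path/to/keyfile && key_size /path/to/keyfile
```

Checking the size before handing the file to ZFS catches the common mistake of a truncated or newline-padded key.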

The keylocation is stored as a ZFS property of any dataset or zvol which uses it. This allows automatic mounting of encrypted datasets and zvols, when that location is a file rather than an interactive prompt.

A final note: ZFS doesn't actually encrypt your data directly with a supplied passphrase; it encrypts your data with a pseudo-randomly generated master key. Your passphrase unlocks that master-key, which then becomes available for use working with the volume itself!
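
Putting the three arguments together, here is a small sketch of a wrapper that pins the algorithm explicitly instead of relying on encryption=on. The name create_encrypted_dataset is hypothetical, and the choice of passphrase plus prompt simply mirrors the earlier banshee/encrypted example.

```shell
# Create a passphrase-protected dataset with an explicit algorithm.
# create_encrypted_dataset is a hypothetical helper name.
create_encrypted_dataset() {
    ce_ds=$1
    ce_alg=${2:-aes-256-gcm}   # default matches OpenZFS >= 0.8.4

    # Reject anything outside the list of supported algorithms
    case "$ce_alg" in
        aes-128-ccm|aes-192-ccm|aes-256-ccm) ;;
        aes-128-gcm|aes-192-gcm|aes-256-gcm) ;;
        *)
            echo "unsupported algorithm: $ce_alg" >&2
            return 2
            ;;
    esac

    zfs create -o encryption="$ce_alg" \
               -o keyformat=passphrase \
               -o keylocation=prompt \
               "$ce_ds"
}

# Usage (prompts for a passphrase on a real system):
#   create_encrypted_dataset poolname/datasetname aes-256-gcm
```

Naming the algorithm means the dataset comes out the same way regardless of which OpenZFS version happens to create it.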

zfs load-key

After you create an encrypted dataset, it's mounted automatically, but it won't be after the system reboots. In order to make your encrypted volumes accessible again, you'll use the zfs load-key command.

zfs load-key [-nr] [-L location] [-a] poolname/zvol-or-dataset

As usual, the optional -r is for recursive, and it will load both the volume specified and any child volumes. -a for all goes a step further and loads keys for all encrypted volumes found on all currently imported pools.

Also as usual for Linux utilities, -n is short for no-op, aka dry run. Performing, for example, zfs load-key -n -a checks that all keys specified in the keylocation properties of encrypted datasets and zvols are available and correct—but it does this without actually changing the key or mount status.

Finally, -L location overrides the keylocation property stored in the ZFS dataset or zvol itself. For example, let's say you created an encrypted dataset using the interactively prompted passphrase correct horse battery staple. You can type that phrase into /home/me/myzfskey.txt, then do zfs load-key -L file:///home/me/myzfskey.txt poolname/datasetname, and the key will load just fine.

A warning: simply loading the key doesn't actually mount your dataset or zvol—that's a separate operation which needs to be performed after the key is loaded, e.g., zfs mount -a or zfs mount poolname/datasetname.
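
The flags above combine naturally into a check-then-unlock routine. This is only a sketch: unlock_with_keyfile is a hypothetical helper, the keyfile path must be absolute so that the resulting URI starts with file:///, and the real load-key call will complain if the key is already loaded.

```shell
# Dry-run check of a keyfile, then load the key and mount the dataset.
# unlock_with_keyfile is a hypothetical helper name.
unlock_with_keyfile() {
    uk_ds=$1
    uk_file=$2     # absolute path, e.g. /home/me/myzfskey.txt

    # -n verifies the key without changing key or mount status
    if ! zfs load-key -n -L "file://$uk_file" "$uk_ds"; then
        echo "key in $uk_file does not unlock $uk_ds" >&2
        return 1
    fi

    # Loading the key and mounting are two separate operations
    zfs load-key -L "file://$uk_file" "$uk_ds" || return 1
    zfs mount "$uk_ds"
}

# Usage:
#   unlock_with_keyfile poolname/datasetname /home/me/myzfskey.txt
```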

If you'd like keys to be automatically loaded at boot time, the Arch Linux wiki has sample systemd service definitions available.

zfs unload-key

As long as it's not busy (generally meaning mounted), you can unload the key from an encrypted volume. This command shares -r and -a flags with zfs load-key. Once unloaded, zfs get keystatus poolname/datasetname will return unavailable, and the volume in question cannot be mounted until the key is reloaded.

Remember: volumes without their keys loaded cannot be read from or written to, but they can be replicated using the -w flag on zfs send! Maintenance-level ZFS operations such as scrubbing, resilvering, and even (dataset or zvol level) renaming also work fine on locked volumes.
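
For completeness, locking a dataset tree can be sketched as the reverse of unlocking. The helper name lock_dataset is hypothetical, and the sequence simply mirrors the unmount and unload-key commands from the earlier transcript; on a real system the unmount step fails if the dataset is not currently mounted.

```shell
# Unmount a dataset, unload its keys recursively, and confirm the result.
# lock_dataset is a hypothetical helper name.
lock_dataset() {
    lk_ds=$1

    zfs unmount "$lk_ds" || return 1        # keys cannot be unloaded while busy
    zfs unload-key -r "$lk_ds" || return 1

    # Should print "unavailable" once the key is gone
    zfs get -H -o value keystatus "$lk_ds"
}

# Usage:
#   lock_dataset banshee/encrypted
```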

zfs change-key

This command can be used to alter the keylocation, keyformat, and/or pbkdf2iters properties of an encrypted volume, along with the actual key itself. Using zfs change-key does not require re-encrypting all data in affected volumes—the master key remains unchanged; what zfs change-key does is alter encryption-related ZFS properties, and/or change the passphrase used to unlock the master-key.

zfs change-key [-l] [-o keylocation=location] [-o keyformat=format] [-o pbkdf2iters=value] poolname/dataset-or-zvol

-l (lowercase L) loads the key prior to changing it—this is functionally equivalent to zfs load-key poolname/dataset ; zfs change-key poolname/dataset.

keyformat and keylocation are just what they sound like: see zfs create above for explanation. pbkdf2iters is the number of PBKDF2 iterations applied to a passphrase in order to derive the wrapping key that unlocks the master key. It defaults to 350,000 and should not be messed with by mere mortals who don't know exactly what that means (and what its implications are) without further reading.

Remember, we're not changing the master key, only the user's wrapping key, so changing this key does not categorically guarantee the security of data whose original wrapping key was compromised. A sufficiently advanced attacker could have recovered the master key after compromising the wrapping key. In the event of a serious compromise, you'll want to rewrite the dataset entirely (e.g., by replicating to a new dataset, child of a parent encrypted with a different key).
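
One common use of zfs change-key is moving a dataset from an interactive passphrase to a raw keyfile. The sketch below assumes the keyfile already exists at an absolute path; switch_to_keyfile is a hypothetical helper name, and only the flags shown in the synopsis above are used.

```shell
# Re-wrap the master key with a 32-byte raw keyfile.
# switch_to_keyfile is a hypothetical helper name.
switch_to_keyfile() {
    sk_ds=$1
    sk_file=$2     # absolute path to an existing raw key

    # Raw keys must be exactly 32 bytes, so check before touching the dataset
    sk_size=$(( $(wc -c < "$sk_file") ))
    if [ "$sk_size" -ne 32 ]; then
        echo "raw keys must be exactly 32 bytes, got $sk_size" >&2
        return 1
    fi

    # -l loads the current key first (prompting if keylocation=prompt)
    zfs change-key -l -o keyformat=raw -o keylocation="file://$sk_file" "$sk_ds"
}

# Usage:
#   switch_to_keyfile poolname/dataset-or-zvol /path/to/keyfile
```

As noted above, this changes only how the master key is unlocked; the data on disk is not re-encrypted.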

zfs send

zfs send and zfs receive are the same commands used for normal ZFS replication, and they work just the same as they always did... with a few very important caveats. In order to use encrypted replication, you must specify the -w argument to zfs send. Using -w sends data exactly as it is already stored on-disk, which leaves compression intact, in addition to encryption.

If you forget to use -w when zfs sending a dataset with its key loaded, the replication will work—but the target will be unencrypted!

Remember, ZFS replication is based on snapshots—so you'll need to create, manage, and work with snapshots in order to take advantage of replication. Without additional tooling, that process looks something like this:

root@banshee:~# zfs snapshot banshee/encrypted-source@snapshot1
root@banshee:~# zfs send -w banshee/encrypted-source@snapshot1 | zfs receive banshee/encrypted-target

root@banshee:~# zfs snapshot banshee/encrypted-source@snapshot2
root@banshee:~# zfs send -w -I banshee/encrypted-source@snapshot1 banshee/encrypted-source@snapshot2 | zfs receive banshee/encrypted-target

root@banshee:~# zfs list -rt snap banshee/encrypted-target
NAME                                                            USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted-target@snapshot1                               16K      -      672K  -
banshee/encrypted-target@snapshot2                                0B      -      672K  -

root@banshee:~# zfs get keystatus banshee/encrypted-target
NAME                      PROPERTY   VALUE        SOURCE
banshee/encrypted-target  keystatus  unavailable  -

In the above example, we perform first a full replication based on @snapshot1, then an incremental replication from @snapshot1 to @snapshot2. We can verify that the snapshots are present and correct on the target with zfs list -rt snapshot, even without the key loaded—but attempting to access the actual data fails, since the key's not loaded for the target dataset, and it's also not mounted.

To access the data, we can either zfs load-key and zfs mount the target dataset, or we can zfs send -w it right back to the source again.
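
The full and incremental cases from the transcript can be folded into one small function. This is a sketch for a single local machine; replicate_raw is a hypothetical helper name, and it always passes -w so the target never receives decrypted data.

```shell
# Raw-send a snapshot, either in full or incrementally from a common base.
# replicate_raw is a hypothetical helper name.
replicate_raw() {
    rr_src=$1          # source dataset, e.g. banshee/encrypted-source
    rr_dst=$2          # target dataset, e.g. banshee/encrypted-target
    rr_to=$3           # snapshot to send, e.g. snapshot2
    rr_from=${4:-}     # optional: snapshot both sides already hold

    if [ -n "$rr_from" ]; then
        # Incremental: only blocks changed since the common snapshot
        zfs send -w -I "$rr_src@$rr_from" "$rr_src@$rr_to" | zfs receive "$rr_dst"
    else
        # Full: every block in the snapshot
        zfs send -w "$rr_src@$rr_to" | zfs receive "$rr_dst"
    fi
    # Note: the exit status reported here is that of zfs receive only
}

# Usage:
#   replicate_raw banshee/encrypted-source banshee/encrypted-target snapshot1
#   replicate_raw banshee/encrypted-source banshee/encrypted-target snapshot2 snapshot1
```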

If all this seems too complex and difficult to automate, my own tool syncoid can help—just specify --sendoptions=w and you're off to the races; syncoid automatically handles matching up common snapshots, and it even creates snapshots if necessary.

(In addition to master versions on GitHub, syncoid's parent package sanoid is currently available from most major distributions' default repositories, e.g., apt install sanoid on Ubuntu 20.04.)

root@banshee:~# zfs destroy -r banshee/encrypted-target

root@banshee:~# syncoid --compress=none --sendoptions=w banshee/encrypted-source banshee/encrypted-target
INFO: Sending oldest full snapshot banshee/encrypted-source@1 (~ 491 KB) to new target filesystem:
 418KiB 0:00:00 [53.2MiB/s] [===========================>     ] 85%            
INFO: Updating new target filesystem with incremental banshee/encrypted-source@1 ... syncoid_banshee_2021-06-23:17:31:24 (~ 19 KB):
52.5KiB 0:00:00 [ 391KiB/s] [================================] 269%            

root@banshee:~# zfs list -rt snap banshee/encrypted-target
NAME                                                            USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted-target@snapshot1                               16K      -      672K  -
banshee/encrypted-target@snapshot2                               16K      -      672K  -
banshee/encrypted-target@syncoid_banshee_2021-06-23:17:31:24      0B      -      672K  -

Note that we also used --compress=none, since the encrypted source data won't be compressible. Any ZFS compression applied on-disk will still be intact with or without --compress=none—this syncoid option only compresses data in-flight and has no bearing on on-disk usage. If you forget to specify --compress=none, it won't hurt anything; you'll just waste a few CPU cycles on the very fast and low-impact lzo default in-flight compression.

Conclusion

Hopefully, you've got a much better handle on ZFS encryption now than you did going into this article. As is the case with many ZFS features, on-disk encryption is something that was already available and, in some ways, was simpler in the original layers. But wrapping it into the OpenZFS system directly allows for many interesting new possibilities—in particular, the "raw send" concept which allows OpenZFS' excellent asynchronous incremental replication to function without loaded keys can be a game-changer.

Please remember that zfs load-key and zfs mount really, really matter here—and they matter on both source and target. If you encounter a situation in which all your files appear to be gone but no obvious errors are being thrown, you almost certainly just forgot to mount the dataset—or, in some cases, mounted it in a different place than you expected to.

If you need help with ZFS encryption, ZFS replication, or even the sanoid / syncoid orchestration tools briefly mentioned in this article, you may want to take a look at r/zfs, as well as the comment section of this article itself.

June 24, 2021 at 05:52AM