This is a design proposal for a tool that allows to store deduplicated backups using public-key encryption.
The idea emerged from a project, whose backup restoration key was physically stored in a vault. Public-key encryption is particularly good for this use case.
However, very few backup tools support public-key encryption and those that support it, either rely on GnuPG and require access to the secret / private key, or don’t support deduplication and incremental archives.
Scope of the Project
- Public-key encryption of the repository.
- The secret key is not required, except when you want to restore a backup.
- Backup deduplication through chunking.
- Compressing the file chunks.
- Full backups. No diffing. Incrementally adds new / changed content.
- No need for an agent / server counterpart on the storage side.
- Suitable for immutable storage - e.g. S3 WORM.
- Limited number of configuration settings.
- Repository can be transferred on a medium with FAT32 filesystem.
Nice to have
- Support for multiple providers: local FS, S3, SSH, FTP, SMB, WebDAV.
- Configurable rolling hash settings
- Adding new public keys
- Conseal all metadata
The filesystem structure of a backup repository is inspired by git.
repository/ ├── archives/ └── objects/ ├── 00/ ├── ... └── ff/ └── config
There is an immutable
config file that is created with the registry and it stores the format version, the public key
for encryption and a shared salt for the object hashes.
archives/ is an optional, but convenient directory that exposes unencrypted metadata about the archives to the
user. If you remove the directory, you can still recover your data, but you need to traverse through all the objects to
find those that are archives.
objects/ directory stores everything, just like git1. The objects are placed in subdirectories matching their
first byte. All objects are encrypted and their content is not available and generally not required. Only if you want to
initiate a backup restoration, then you need provide a secret / private key to access archive’s content. In all other
cases, the tool can make decisions based on the object names that are hash of their content. There are four types of
objects that can be stored: archive, tree node / directory, file, chunk.
- archive - Archive metadata and a hash reference to a starting tree node.
- tree node - A directory. A list of files and tree nodes and their hash references.
- file - A list of hash references to chunks.
- chunk - A chunk of content from a file.
The algorithm use depth-first traversal the directories, since tree nodes depend on the hash reference of its subtree nodes. We begin by creating a tree node for the root directory of the archive.
Tree node contains a list of tuples
(object hash, file type, file name, metadata). The metadata may include filesystem
dates, attributes, permissions and ownership. To populate the list we iterate over the entries in the directory.
If the current entry is a file, we start by calculating its hash. We can now add the file to the tree node list. If an object with the same hash exists, we skip to the next entry. If it doesn’t then we proceed to splitting the file into chunks using a rolling hash algorithm, that would yield the same chunks, based on the file content. All the chunks are hashed and stored as objects, except for those that are already present in the repository. After that, if the number of chunks is more than one, a file object is created that lists all chunk hashes needed to produce the file.
If the current entry is a directory, then we traverse recursively, since we need to calculate the hash of the tree. Once the list of the tree node is complete, we hash the serialised content of the tree node, check if an object already exists and write it to the repository, if it doesn’t.
When the root tree node is finished, we proceed to create an archive object. This object type contains a reference to
the root tree node and metadata, e.g. name, time of creation, per-archive settings. In the
archives/ directory in the
repository we store the same archive object, but unencrypted. The name of the unencrypted archive object is the Unix
timestamp in milliseconds. It prevents collisions and skips the need to read the whole
Details and Rationale
Object Directory Structure
We store objects in
prefix is the first byte of the object hash and
rest-of-the-hash are the following bytes. Both are encoded in hexadecimal format. This is done so we can support
uploading the repository to a FAT32 thumb drive.
The filesystem FAT32 has a limit of 65536 files per directory. The assumption is that hashes have a uniform distribution and we would hit the limit roughly at 16M objects - unique files, chunks and directories. If we go for a default of 1MB chunks, 16M x 1MB is about 16TB of unique data. FAT32 has a limit of 2TB.
Object File and Records
Object file contains a record, e.g. archive record, file record. An object that contains an archive record is simply an archive object. We introduce two container record types - compression and encryption. Those records can nest other records. This way we store a file chunk as a chunk record nested in a compression record, nested in a encryption record.
We can also discriminate between file record and chunk record, thus we don’t need a separate file record for single-chunk files.
Hashing and Salt
BLAKE32 is a very-fast secure hash algorithm and since it also has a builtin MAC, we can use a repository’s shared salt as a key, instead of adding the salt to the data. This way we would generate unique hashes for that specific repository and it would be harder for an attacker to guess the repository content just by looking at the hashes.
We favour NaCl as an encryption library. A sealed box3 with an ephemeral key is a good choice. However, since chunks can be configured per archive, having large chunks encrypted and decrypted in memory could put a lot of strain on small machines. To mitigate that we split the encapsulated data in the encryption record in to smaller messages, e.g. 256KiB in size.
The encryption scheme is similar to a sealed box, but we add multiple messages. We start by generating an ephemeral
epk) and we append the
epk to the record. Then we calculate the shared key
repository’s public key
rpk. After that we split the data into messages with predefined size and start making NaCl
secretbox for every message using
sk as a key and message’s number as a nonce.
Rolling Hash and Chunking Algorithm
There are multiple algorithms that can be used to chunk the content. However, there is already a proof-of-concept that works with the Fast Content-Defined Chunking (FastCDC).
I would like to hear your thoughts.
Text me on social media or send me an email.
Thank you (you know who) for listening through all my blabbering on the topic, while we were hiking.