Best Practices: Off-Site Backups

May 31, 2011

Having a backup strategy is key. What works for me won't necessarily work for you. I am going to outline my method for my own reference, to get feedback, and in the hope that it might help someone else out there.

I am using Amazon S3 as my back end, taking advantage of 10 GB for a year for free. I am abusing Fabric as my glue: Fabric is meant for remote execution, but I am using its local() function as a kind of Pythonic shell script.
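If you don't have Fabric 1.x handy (Fabric 2 dropped the fabric.api module that provides local), a rough stand-in built on the standard library's subprocess module is enough to run the scripts in this post. This is a minimal sketch, not Fabric's implementation: like Fabric's local, it runs the command through the shell (so the "< in > out" redirections in the openssl calls work) and stops when the command fails, but it raises subprocess.CalledProcessError rather than using Fabric's abort.

```python
import subprocess


def local(command):
    """Hypothetical stand-in for Fabric 1.x's local(): run a shell command.

    shell=True hands the string to /bin/sh, so redirections keep working.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    print("[local] " + command)
    subprocess.check_call(command, shell=True)
```

For example, local("tar -zcf backup.tar.gz docs") behaves like the Fabric call as long as the command succeeds.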

The first major issue is security. S3 buckets are private by default, but you are still implicitly trusting Amazon. The only solution is to encrypt your data, which is easier said than done. The core of this post documents my method for encryption.

Encrypting Your Bits

I use OpenSSL for the encryption instead of GPG or PGP because it comes with my Mac and is widely available on Linux without apt-getting anything.

Conceptually, a backup is like a message sent from present-day me to future me using asymmetric (public/private key) encryption. I want to encrypt the backup with my public key and decrypt it later with my private key.

The trouble is that OpenSSL's asymmetric (RSA) encryption can only handle small inputs: the data has to be smaller than the key itself. The solution is to generate a unique symmetric key for every backup, encrypt the backup with it, and then encrypt that small key file with my public key. I save the symmetrically encrypted backup and the asymmetrically encrypted symmetric key together and send them both to S3. Without the private key the two files get you nothing, since they are both encrypted.
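To put a number on "small": with the PKCS#1 v1.5 padding that openssl rsautl applies by default, an RSA key of k bytes can encrypt at most k - 11 bytes, because the padding needs at least 11 bytes of the modulus. The helper below is only that arithmetic (its name is made up for illustration). It shows why a 64-byte symmetric key, like the one the backup script generates, fits easily under a 1024-bit key while any real tarball never will.

```python
def max_rsa_plaintext_bytes(key_bits, padding_overhead=11):
    """Largest message RSA can encrypt directly with PKCS#1 v1.5 padding.

    The padding uses at least 11 bytes of the modulus, so a k-byte key
    leaves k - 11 bytes for the message itself.
    """
    return key_bits // 8 - padding_overhead


for bits in (1024, 2048, 4096):
    print(bits, max_rsa_plaintext_bytes(bits))
# 1024 117
# 2048 245
# 4096 501
```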

The first thing you have to do, once and only once, is generate the public/private key pair used to encrypt the symmetric encryption keys (passwords):

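import os
from fabric.api import local  # Fabric 1.x

def create_keys():
    """
    Only do this once. If you overwrite your keys you won't be able to decrypt your backups!
    """
    key_file_name = os.path.join(root, "keys/backup.pem")
    if os.path.exists(key_file_name):
        raise RuntimeError("%s already exists; refusing to overwrite it" % key_file_name)
    # Generate a DES3-protected 1024-bit RSA private key (2048+ is the usual advice today)
    local("openssl genrsa -des3 -out %s 1024" % key_file_name)
    # Output the public part, which is all the backup step needs
    local("openssl rsa -in %s -pubout > %s" % (key_file_name, key_file_name + ".pub"))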

Here is the meat of a backup:

import binascii
import datetime
import os

from fabric.api import local  # Fabric 1.x

def backup_to_s3():
    # Assumes globals defined elsewhere: root, local_store, relative_store (a relative
    # path to the same directory as local_store), backup_dirs, ACCESS_KEY, SECRET_KEY.
    today = datetime.date.today()
    backup_id = today.strftime("%d-%m-%Y")

    # Tar everything you want to back up.  Assume a list of directories in backup_dirs.
    backup_file_name = os.path.join(relative_store, "backup-%s.tar.gz" % backup_id)
    local("tar -zcf %s %s" % (backup_file_name, " ".join(backup_dirs)))

    # Generate a 64 character symmetric key.  It is hex-encoded because "-pass file:"
    # only reads the first line of the file, so raw random bytes containing a newline
    # (or a NUL) would silently shorten the password.
    key = binascii.hexlify(os.urandom(32))
    key_file_name = os.path.join(relative_store, "backup-%s.key" % backup_id)
    with open(key_file_name, "wb") as f:
        f.write(key)

    # Encrypt the backup with the symmetric key
    local("openssl enc -e -aes128 -pass file:%s < %s > %s" % (key_file_name,
                                                              backup_file_name,
                                                              backup_file_name + ".enc"))

    # Encrypt the symmetric key with the public key generated before
    public_key_name = os.path.join(root, "keys/backup.pem.pub")
    local("openssl rsautl -encrypt -inkey %s -pubin -in %s -out %s" % (public_key_name,
                                                                       key_file_name,
                                                                       key_file_name + ".pubenc"))

    # Securely remove the unencrypted symmetric key file (on Linux, shred -u is similar)
    local("srm %s" % key_file_name)

    # Tar the encrypted backup and encrypted key together
    unified_backup_file_name = os.path.join(relative_store, "unified_backup-%s.tar.gz" % backup_id)
    local("tar -zcf %s %s %s" % (unified_backup_file_name,
                                 backup_file_name + ".enc",
                                 key_file_name + ".pubenc"))

    # Push the result to S3; -p strips the local_store prefix from the key name
    local("s3put -a %s -s %s -b amjoconn-backups -p %s %s" % (ACCESS_KEY,
                                                             SECRET_KEY,
                                                             local_store,
                                                             os.path.abspath(unified_backup_file_name)))

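This script relies on several global configuration variables (root, local_store, relative_store, backup_dirs, ACCESS_KEY and SECRET_KEY) that I haven't shown here. It also doesn't clean up your local store: the only file it deletes is the unencrypted key, so the plain backup tarball and the encrypted files stay on disk.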
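Because the backup leaves the plain tarball, the .enc file, the .pubenc file and the unified tarball behind in the local store, something has to prune old copies. Here is a minimal sketch of a hypothetical prune_local_store() helper. It assumes all of those files sit directly in one directory and follow the file names and %d-%m-%Y ids used by backup_to_s3; it keeps the newest keep backups, judged by the date in the name, and deletes the rest.

```python
import datetime
import os
import re

# File names backup_to_s3 leaves behind for a given id (assumed layout).
ARTIFACT_PATTERNS = (
    "backup-%s.tar.gz",
    "backup-%s.tar.gz.enc",
    "backup-%s.key.pubenc",
    "unified_backup-%s.tar.gz",
)
UNIFIED_RE = re.compile(r"^unified_backup-(\d{2}-\d{2}-\d{4})\.tar\.gz$")


def prune_local_store(store_dir, keep=3):
    """Delete local files for all but the `keep` newest backups.

    Returns the pruned ids, newest first.
    """
    ids = []
    for name in os.listdir(store_dir):
        match = UNIFIED_RE.match(name)
        if match:
            ids.append(match.group(1))
    # Sort by the real date: "%d-%m-%Y" strings do not sort chronologically as text.
    ids.sort(key=lambda i: datetime.datetime.strptime(i, "%d-%m-%Y"), reverse=True)
    pruned = ids[keep:]
    for backup_id in pruned:
        for pattern in ARTIFACT_PATTERNS:
            path = os.path.join(store_dir, pattern % backup_id)
            if os.path.exists(path):
                os.remove(path)
    return pruned
```

Calling prune_local_store(local_store, keep=3) at the end of a backup would keep the three most recent local copies; S3 still holds every upload.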

Recovery

Like claiming on an insurance policy, recovery is the most important part of any backup system. And as with insurance, people tend not to talk much about how it will work, nor do they test it often. Given the complexity of the backup process, I figured it would be handy to write out the recovery process for reference, even if the automation ultimately fails me.
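Recovery starts with knowing which id to ask for. The backup stamps each upload with a day-month-year date, and since that format does not sort chronologically as text, a small helper that parses the dates makes it easy to pick the newest backup. This is a sketch with a hypothetical function name; the boto listing in the comment is one way to get the key names, assuming boto 2's Bucket.list().

```python
import datetime
import re

# Matches the object names the backup uploads, e.g. "unified_backup-31-05-2011.tar.gz".
UNIFIED_RE = re.compile(r"^unified_backup-(\d{2}-\d{2}-\d{4})\.tar\.gz$")


def backup_ids(key_names):
    """Return the backup ids found in key_names, newest first."""
    ids = [m.group(1) for m in (UNIFIED_RE.match(n) for n in key_names) if m]
    return sorted(ids, key=lambda i: datetime.datetime.strptime(i, "%d-%m-%Y"), reverse=True)


# With boto, the names could come from the bucket, for example:
#   names = [k.name for k in conn.get_bucket("amjoconn-backups").list()]
print(backup_ids(["unified_backup-01-05-2011.tar.gz",
                  "unified_backup-31-05-2011.tar.gz",
                  "notes.txt"]))
# ['31-05-2011', '01-05-2011']
```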

import os

import boto  # boto 2.x
from fabric.api import local  # Fabric 1.x

def recover_backup(id):
    # Try to get the file
    conn = boto.connect_s3(ACCESS_KEY, SECRET_KEY)
    bucket = conn.get_bucket("amjoconn-backups")
    key = bucket.get_key("unified_backup-%s.tar.gz" % id)
    if key is None:
        print("Unable to find backup with id %s" % id)
        return
    recover_name = os.path.join(recovery_location, "recover-%s.tar.gz" % id)
    print("Writing unified_backup-%s.tar.gz to %s" % (id, recover_name))
    print("This might take a bit of time...")
    key.get_contents_to_filename(recover_name)
    # Unpack the encrypted backup and encrypted key under relative_store,
    # relative to the current working directory.
    local("tar -xzf %s" % recover_name)

    # Decrypt the symmetric key file with the private key.
    key_file_name = os.path.join(relative_store, "backup-%s.key" % id)
    private_key_name = os.path.join(root, "keys/backup.pem")
    local("openssl rsautl -decrypt -inkey %s -in %s -out %s" % (private_key_name,
                                                                key_file_name + ".pubenc",
                                                                key_file_name))

    # Decrypt the backup with the recently decrypted symmetric key.
    # OpenSSL before 1.1.0 derived enc keys with MD5; newer versions default to SHA-256,
    # so old backups may need "-md md5" added here.
    backup_file_name = os.path.join(relative_store, "backup-%s.tar.gz" % id)
    local("openssl enc -d -aes128 -pass file:%s < %s > %s" % (key_file_name,
                                                              backup_file_name + ".enc",
                                                              backup_file_name))

    # Untar the data here.  Depending on how you used tar, some interesting directories will be created.
    local("tar -zxf %s" % backup_file_name)

Note that this doesn't clean up the decrypted symmetric key, which is left on disk next to the decrypted backup.
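One way to tidy that up is to overwrite the key file before unlinking it, roughly what srm does in the backup script. Below is a minimal sketch with a hypothetical helper name. Treat it as best effort: on SSDs and on journaling or copy-on-write filesystems, an in-place overwrite is not guaranteed to destroy the old data. Where available, running srm (or shred -u on Linux) on the key file through local is the simpler option.

```python
import os


def overwrite_and_remove(path, passes=1):
    """Best-effort secure delete: overwrite a file with random bytes, then unlink it.

    Not a guarantee on SSDs or journaling/copy-on-write filesystems,
    which may keep the old contents in other blocks.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # push the overwrite to disk before unlinking
    os.remove(path)
```

At the end of recover_backup, overwrite_and_remove(key_file_name) would remove the decrypted key.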

Wrap Up

There it is: how I am keeping my bits safe. Such a problem is never really solved, though. I hope to update this post over time as I discover better ways to back up, because this solution has plenty of room for improvement.


Tweet comments, corrections, or high fives to @amjoconn