# Considering my backup systems

With the recent news that Crashplan were doing away with their “Home” offering, I had reason to reconsider my choice of online backup provider. Since I haven’t written anything here lately, and the results of my exploration (plus a description of everything else I do to ensure data longevity) might be of interest to others setting up backup systems for their own data, a version of my notes from that process follows.

## The status quo

I run a Linux-based home server for all of my long-term storage, currently 15 terabytes of raw storage with btrfs RAID on top. The choice of btrfs and RAID allows me some degree of robustness against local disk failures and accidental damage to data.

If a disk fails I can replace it without losing data, and btrfs’ RAID support allows mixing heterogeneous disks: when I need more capacity I can remove one disk (putting the volume into a degraded state), add a new (larger) one, and rebalance onto the new disk.
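The swap itself is only a few btrfs commands. As a sketch (the device names and mount point here are assumptions, and each command is echoed rather than actually run):

```shell
#!/bin/sh
# Sketch of swapping a larger disk into a btrfs RAID volume.
# /dev/sdb, /dev/sdd and /mnt/storage are hypothetical; `run` only
# prints each command so nothing is executed against a real volume.
run() { echo "would run: $*"; }

POOL=/mnt/storage

run btrfs device add /dev/sdd "$POOL"     # attach the new, larger disk
run btrfs device delete /dev/sdb "$POOL"  # migrate data off the old disk, then detach it
run btrfs balance start "$POOL"           # even out allocation across the remaining disks
```

(`btrfs device delete` relocates the old disk’s data as part of removal; the final balance just evens out allocation afterwards.)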

btrfs’ ability to take copy-on-write snapshots of subvolumes at any time makes it reasonable to take regular snapshots of everything, providing a first line of defense against accidental damage to data. I use Snapper to automatically create rolling snapshots of each of the major subvolumes:

- Synchronized files (mounted to other machines over the network) have 8 hourly, 7 daily, 4 weekly and 3 monthly snapshots available at any time.
- Staging items (for sorting into other locations) have a snapshot for each of the last two hours only, because those items change frequently and are of low value until considered further.
- Everything else keeps one snapshot from the last hour and one from each of the last 3 days.
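For reference, Snapper expresses retention rules like these as timeline limits in a per-subvolume config file. A sketch of what the config for the synchronized-files subvolume might contain (the config name is an assumption; the limits mirror the list above):

```
# /etc/snapper/configs/sync (hypothetical config name)
TIMELINE_CREATE="yes"
TIMELINE_LIMIT_HOURLY="8"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="3"
TIMELINE_LIMIT_YEARLY="0"
```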

This configuration balances my need for accident recovery against storage and performance costs. The frequently-changed items (synchronized with other machines and containing active projects) get many snapshots because most individual files are small but change often, so a large number of snapshots has modest storage cost; the chance of accidental data destruction is also highest there. The other subvolumes are either more static or lower-value, so I feel little need to keep many snapshots of them.

I use Crashplan to back up the entire system to their “cloud” service for \$5 per month. The rate at which I add data to the system is usually lower than the rate at which it can be uploaded back to Crashplan as a backup, so in most cases new data is backed up remotely within hours of being created.

Finally, I have a large USB-connected external hard drive as a local offline backup. It is formatted with btrfs like the server (but with the entire disk encrypted), so I can use btrfs send to write incremental backups to it, even without the ability to send information from the external disk back. In practice, this means I can store the external disk somewhere else entirely (possibly without an Internet connection) and occasionally shuttle diffs to it to bring it up to a more recent version. I always unplug this disk from power and its host computer when it is not being updated, so it should only be vulnerable to physical damage, not accidental modification of its contents.
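The update itself is an incremental send between two read-only snapshots. A sketch, with all paths hypothetical and the commands echoed rather than executed:

```shell
#!/bin/sh
# Incremental btrfs backup to the external disk (sketch; paths are assumptions).
PARENT=/srv/snapshots/sync-0001   # last snapshot the external disk already has
CHILD=/srv/snapshots/sync-0002    # newer read-only snapshot to ship
DEST=/mnt/external/backups

# With the disk attached locally, pipe the diff straight into it:
echo "would run: btrfs send -p $PARENT $CHILD | btrfs receive $DEST"

# Or serialize the diff to a file to carry to wherever the disk lives:
echo "would run: btrfs send -p $PARENT -f sync-diff.btrfs $CHILD"
```

Because send only needs to know which parent snapshot the destination already has, the external disk never has to talk back; `btrfs receive -f sync-diff.btrfs` applies a shuttled diff file at the other end.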

# Better SSL

I updated the site’s SSL certificate so it is no longer self-signed. This means that if you use the site over HTTPS, you won’t need to manually accept the certificate, since it is now signed by StartSSL. If you’re interested in doing something similar, Ars Technica have a decent walkthrough of the process (though they target nginx for configuration, which may be less useful to those running other web servers).
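For the curious, the nginx side of such a setup is small. A minimal sketch (the server name, certificate paths, and plain-HTTP redirect are all assumptions, not this site’s actual configuration):

```nginx
# Redirect plain HTTP to HTTPS (hypothetical host name)
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;
}

# Serve the site over TLS with the CA-signed certificate
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/ssl/certs/example.com.chained.pem;
    ssl_certificate_key /etc/ssl/private/example.com.key;
}
```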

For convenience, you can follow this link to switch to the HTTPS site.

I got a… “fun” e-mail from mediafire a few weeks ago, saying that one of my files had been suspended due to suspected copyright infringement.

fb-hitler? Oh, that’s some Free Software I wrote. I disputed the claim, simply stating that fb-hitler.tar.bz2 is a piece of software that I created (and thus own the copyright to). As of tonight, I’ve heard nothing back about it, and the file is still inaccessible. Here’s the link to it, for future reference:

http://www.mediafire.com/?mhnmnjztyn3 (.tar.bz2, 477 KB)

And here’s the complete message I got. Notice it somehow got pulled in by somebody looking to get links to Dragonball Z downloads removed, and that the link to fb-hitler itself isn’t even in the (absurdly long) list of URLs given.

Dear MediaFire User:
MediaFire has received notification under the provisions of the Digital Millennium Copyright Act (“DMCA”) that your usage of a file is allegedly infringing on the file creator’s copyright protection.

The file named fb-hitler.tar.bz2 is identified by the key (mhnmnjztyn3).

As a result of this notice, pursuant to Section 512(c)(1)(C) of the DMCA, we have suspended access to the file.

The reason for suspension was:

Information about the party that filed the report:

Company Name: LeakID
Contact Address: 15 bis rue de chateaudun, 02250 La garenne colombes, France
Contact Phone:
Contact Email: herve.lemaire@leakid.com

If you feel this suspension was in error, please submit a counterclaim by following the process below.

Step 1. Click on the following link to open the counterclaim webpage.

Step 2. Use this PIN on the counterclaim webpage to begin the process:

[removed by Tari]

Step 3. Fill in the fields on the counterclaim form with as much detail as possible.

This is a post-only mailing. Replies to this message are not monitored or answered.

So, what of it? In theory, the DMCA is pretty reasonable (discounting the criminalization of DRM circumvention). The safe harbor provisions for hosts are worthwhile, and the takedown process (that is, sending a request to the host) is reasonable. The problem is that it’s been twisted: there is supposed to be a penalty for requesting the takedown of items that the requester does not own the copyright to, in order to deter trolls. In practice there is no penalty, and rightsholders go around freely demanding the removal of just about anything, with no repercussions. The automated systems at most service providers only worsen the problem (understandably, since the hosts have little ability to fight the underlying policy), because rightsholders can spew all kinds of takedown demands with minimal effort.

For those subject to these takedown demands, it’s unfair because many hosts will deactivate users’ accounts when they receive too many demands for removal of items uploaded by a given user, even if the user proves they have the right to upload the content. For example, YouTube suspends accounts after three copyright-related incidents, no matter the outcome.

To my mind, this situation is unacceptable, and it reeks of the “old media” clutching at straws to prop up an outdated business model, to the detriment of everyone else. As recent history has shown, legal force has little effect on the economic problem of media piracy, and utterly fails to address the economics that lead to this phenomenon (sounds a bit like the US government’s war on drugs, actually…).

Going forward, I would support greatly reducing the length of copyright terms (to somewhere around 20 years, perhaps). While I can’t comment much on what exactly that would mean for rightsholders and their profits (though I have little sympathy for them, whatever the situation, due to things like the incident in this post), it would be hugely useful to anybody concerned with preserving history (among whom I count myself), because the time before something can legally be reproduced without the creator’s consent would be greatly reduced. With shorter terms, it is much less likely that any given piece of content will be lost forever, which is the outcome that most needs to be avoided.

Enough of a rant regarding my position on copyright, though. The real point here is that I was annoyed by a spurious copyright claim on something I created, and I will be avoiding mediafire for my future file storage needs (not that I ever used them for much).

# SSL enabled

I just enabled SSL on this site in a fit of paranoia. It shouldn’t cause any problems, but please let me know if you notice something that’s broken. Normal browsing shouldn’t be affected, but site login is forced to SSL. My (self-signed) certificate has SHA1 fingerprint 6c:e4:77:91:e8:59:f8:d1:fd:ea:cf:87:6b:af:ce:3b:19:be:fa:b5.
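If you want to check a fingerprint like that yourself, openssl can print it; against a live site you would pipe `openssl s_client -connect example.com:443` into the same `x509` invocation. A sketch that generates a throwaway self-signed certificate (so it is self-contained) and prints its SHA-1 fingerprint:

```shell
#!/bin/sh
# Print a certificate's SHA-1 fingerprint with openssl.
# A throwaway self-signed cert stands in for the site's real one here;
# the subject name and /tmp paths are arbitrary.
set -e
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=example.test" \
    -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem 2>/dev/null
openssl x509 -in /tmp/demo-cert.pem -noout -fingerprint -sha1
```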