Backups I can forget about (or am I totally paranoid?)

Alright, here it is. I’m completely paranoid about backups. Most people would be comfortable reading an SLA from a company and then sitting back and relaxing. I don’t buy it. I can’t relax about backups because I lost a startup to a lightning bolt in Charlottesville, VA in 1997, and I’ve been paranoid ever since. I’ll set the suckers up, I’ll relax for a week or so, but then I’m constantly checking the damn things.

I’m OCD about backups.

Anyway, here’s the situation. I’ve got a bunch of super important crap on S3. Yes, I turned on versioning for all my buckets. I’ve got databases on Heroku and on RDS. Super important stuff, i.e. if I lost it, it would be a very bad day for all involved.

Here’s the question: Should I back it up off of AWS to another provider? Or should I just rest easy knowing that S3 promises multi-data center redundancy? Is there a company out there that I can just sign up with and check a box that says “Back up all my AWS stuff to an underground salt mine”? That’s what I’m looking for: backups I can forget about.

Most people are relaxing on a Friday night, but no, not me.  I’m stewing over backups.  That’s the way I roll.

Why not just use Puppet or Chef?

I get asked this question quite a bit, especially when I have to hand something off to an operations team. Usually we’re talking about an application architecture that involves ten or more machines: several application servers, an (old, clunky) relational database, and some web servers. These networks are not big by any means, but I agree they are large enough to demand automation.

Operations: “Well, I see you’ve delivered a bunch of RPMs from the build, but why didn’t you just use Puppet or Chef?”

Developers: “I don’t have access at the appropriate level to start thinking about OS-level automation. I wish I did.”

The crux of the problem is that today’s operations departments are really concerned about “control” issues. They might be using VMware, but they’d never think of letting a development team automate calls to create VM instances as needed. Nope, everywhere I’ve gone in the last few years, the operations team is still handing out “new machines” as if they were physical artifacts.

Operations: “You need a new machine?”

Developers: “No, I need a new VM.”

Operations: “It’s going to take us some time to provision that for you. How soon do you need it?”

Developers: “Is five minutes enough time?”

Operations: “No, we have to set it up for you, and it takes time to provision this stuff.”

Developers: “I thought we used VMware, can’t you just…”

Operations: “Stop, it isn’t as easy as that.”

Developers: “At my last job, we just had a direct API call to Spring’s cloud…”

Operations: “Stop being ridiculous, we’re done.” (Walks away)

That dialog is fictional, but representative. All over the place, operations is holding on to this idea of “ceremony” when it comes to provisioning infrastructure. Puppet isn’t straightforward unless you have access to VMs at a certain level. At least in my experience, unless you can flick a switch and have a provider like EC2, Rackspace, or one of ten thousand other cloud providers rebuild a pristine VM, you aren’t really testing your approach to automation.

…and if you are trying to use Puppet or Chef without a good place to test the system (i.e. a fully virtualized environment under your control), there’s no use in trying. I love the idea of using Puppet or Chef, don’t get me wrong, but don’t use them in the middle of an active firefight between two departments. That’s a recipe for failure.

vs. CVSDude?

My opinion after having used both? by a mile,’s customer service responds in minutes, and the big difference is that isn’t going to try to gouge you with extra fees. I signed up with CVSDude because I wanted to get out of the business of hosting SVN, but after all the various fees, I ended up paying something like $47/mo. just to host a few small repositories. CVSDude wanted to charge me an extra $10/mo. for the privilege of having access to backup files. Also, CVSDude has a difficult-to-use administration system.

Anyway, if anyone is considering the two – go with