Ex-Googler Rant: EC2 Beats Google’s Cluster?

Over at Slacy’s blog is a post entitled “What Larry Page really needs to do to return Google to its startup roots”. I read it, and I sort of cringed. While Google has problems, so does every company of that size and scale. What you are reading on Slacy’s blog is one former employee’s laundry list of issues and problems with Google. Still, there was an interesting nugget buried in all the complaining:

“Amazon EC2 is a better ecosystem for fast iteration and innovation than Google’s internal cluster management system. EC2 gives me reliability, and an easy way to start and stop entire services, not just individual jobs. Long-running processes and open source code are embraced by EC2, running a replicated sharded MongoDB instance on EC2 is almost a breeze. Google should focus on making a system that works within the entire Open Source ecosystem.”

My take is that Amazon and Google have similar requirements; while Google has to serve substantially larger traffic numbers, they both solve problems of the same magnitude. Google solved hosting and scalability by assembling custom hardware into uniform, general-purpose nodes and building a grid/cloud/whatever computing system on top of them (Bigtable, Map/Reduce on a massive scale, etc.). Werner Vogels at Amazon developed a utility cloud platform that allowed the company to scale using general OS images. Both needed to be able to deal out thousands of images quickly; the key difference is that Amazon made AWS a product early on. Amazon has been selling AWS as a utility for a while now. Google got into the game with AppEngine, but I’m not convinced that AppEngine ever really took off.

As a result, Amazon has a real economic incentive to improve EC2 and S3 and to expand the product offering: it makes money off of the products that form the foundation of its own infrastructure. Google lacks this incentive. It doesn’t sell hosted Bigtable, and it doesn’t sell the utility components that drive the Google infrastructure the way Amazon does. While Google AppEngine is due for an update later this year (AppEngine for Business), I wonder if it is too little, too late. Almost everyone I work with is familiar with EC2 as a deployment platform; no one in that same group would consider Google AppEngine an appropriate one.

From this post, it seems there is a contingent of Google employees who would rather use Amazon EC2. This interests me because every organization seems to have an insurgent group of employees, fed up with the idea of an operations group consuming budget dollars and maintaining physical machines, who would happily move to Rackspace Cloud, Amazon EC2, or GoGrid in a moment (given the appropriate SLA). Even if the majority of this post is ex-employee ranting, it is significant that even Googlers look to EC2 with a sense of awe and want.

But, don’t get me wrong… a post like this can be misleading. On one hand, it can provide an interesting insight into the way a company either fails or succeeds. On the other hand, such posts lack class. Listen, if you didn’t like it at Google, move on; don’t stop and write some sort of sordid tell-all about the flaws of your former employer. Does the author really think Google would be a much better place to innovate the day after it switched to git or decided to use MongoDB and Cassandra? And who would want to employ someone like this going forward?

But set that aside and consider the fact that Page doesn’t need (and maybe doesn’t even want) to move the company back to “Startup Mode”. Maybe Brin and Page are focused on a much more audacious goal than creating a new social network.

Maven Repositories: How Not to Treat End-users

So some Nexus user reports a problem with a Maven repository for iText. My first suspicion is that the repository is just being hosted by some web server; maybe it has some issue with a metadata file. Who knows. But I suppose I should expect this sort of thing these days: people still hold on to the idea that a “Maven repository” is just a bunch of files barfed onto the filesystem of a web server (it isn’t; there’s a bit more to it).
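For context, here is a rough sketch of the minimum a Maven client actually expects to find in a repository beyond the JARs themselves (the coordinates shown are illustrative placeholders, not iText’s actual layout):

    com/example/example-lib/
        maven-metadata.xml           <- index of available versions, read by clients
        maven-metadata.xml.sha1      <- checksum used to verify the metadata
        1.0/
            example-lib-1.0.pom      <- the artifact's POM (dependencies, license)
            example-lib-1.0.jar
            example-lib-1.0.jar.sha1 <- checksum used to verify the JAR

When the metadata or checksums are missing or stale, builds fail in confusing ways, which is exactly the class of problem a repository manager exists to prevent.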

So I check out the root of the repository and see this welcoming bit of text:

“We now have our own Maven repository (available as long as we can pay for the extra traffic; note that this service will be taken offline if there are not enough users that are willing to become customers).”

That’s a threat. That’s a passive-aggressive threat. I call this the “Open Source Now, but You Just Wait Until We Fail” approach to business. This reads like some difficult Ant guy was forced to publish a Maven repository against his recommendation. “This service will be taken offline if there are not enough users that are willing to become customers.” Please, how annoying is this? How much is that bandwidth bill? I’ll bet they pay all of $2/month for the bandwidth to serve iText JAR files. I’d understand this if there were no free option available for publishing these artifacts, but, clearly, these people are operating in an isolated bubble.

So, instead of threatening users, this is what this project should do:

  • Use an Open Source Repository Manager – Either Archiva, Artifactory, or Nexus. (Clearly, I’m recommending Nexus, but know that there are alternatives.) I recommend this with some hesitation, because it sounds like the person who set this up is already somewhat exhausted from the effort. It’s easy to set up one of these servers, but if you don’t like managing infrastructure…
  • Take advantage of a free Sonatype OSS repository (if you are an open source project) – No charge for this; all you have to do is ask, and since you seem averse to paying that massive bandwidth bill, you don’t even have to pay for the bandwidth. Once you’ve moved to the Sonatype OSS repository, you can publish releases to Maven Central. Did I mention that this option is 100% free? (If you don’t like free, are not running an open source project, or want your own instance, pay for a hosted Artifactory instance.)
  • Publish Your Artifacts to Central – This one isn’t optional, and it isn’t about you working to help your end-users. This is about you not being a difficult project to consume. Think of this less as a favor for your ungrateful, blood-sucking users and more as something that is just expected. Honestly, at this point, if you don’t publish your artifacts to Central, people are cursing you every single day. (And it is very little work; see the sketch after this list.)
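To be concrete about that last item, here is a minimal sketch of pushing a single artifact to a remote repository with the Maven deploy plugin. Every coordinate, the repository ID, and the URL below are placeholders; a real push to Central would go through the Sonatype OSS process mentioned above:

    # Sketch: deploy one artifact (JAR plus its POM) to a remote repository.
    # All coordinates and the URL are illustrative placeholders.
    mvn deploy:deploy-file \
        -DgroupId=com.example \
        -DartifactId=example-lib \
        -Dversion=1.0 \
        -Dpackaging=jar \
        -Dfile=example-lib-1.0.jar \
        -DpomFile=pom.xml \
        -DrepositoryId=example-releases \
        -Durl=https://repo.example.com/releases/

Once a project’s release process runs something like this automatically, the “can we afford the bandwidth” question disappears entirely.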

What users really don’t want is a company that threatens to make it more difficult to use a library for want of customers. Admittedly, this is a message of weakness that could be translated to: “We’re lucky to be able to pay for the electric bill these days, we’ve spent our last nickel on this Maven repository, please become a customer and save us.”

Celebrate your charity: “Look, we care about our users and customers, here’s a standard repository manager.” Better yet, announce that you have integrated your release process to publish artifacts to Maven Central.

Batch Processing Images: Adding a Drop Shadow with ImageMagick

Here’s the problem I was trying to solve. The books I write are all rendered to both plain HTML and PDF, with Apache FOP producing the PDF. Unlike some of the commercial (read “way too expensive”) FO processors out there, Apache FOP doesn’t automatically scale/resample images for print resolution. The solution in DocBook is to include references to two images: a PNG at 72 DPI and a PDF at 150 DPI. For the past few years, I’ve managed this process with Photoshop macros, applying them in batch when needed. I had a series of macros that would add a drop shadow using PSD layers and then automate the export to the appropriate formats.

Now, while this sounds easy enough, think about a book that has 100 screenshots of a product that is constantly being updated. I replaced this manual process with the following bash script, which uses ImageMagick to generate both web and print images with a drop shadow. This solves a persistent problem that was causing me to waste too much time on production issues instead of focusing on content creation. I still have to deal with the fact that my DocBook contains two references to every image (two mediaobject elements for each figure). For my next step, I’m going to have to figure out how to dramatically simplify the DocBook markup for authors: my goal is to have authors only worry about a single figure reference and to swap in the print-specific figure references right before I pass the XML to the XSLT that generates FO. (If you don’t write books in DocBook, this means nothing to you.)

I’m publishing it here just in case someone else finds it useful.

Here’s the Gist
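In outline, the script does something like the following; the shadow geometry, directory layout, and density values here are illustrative, not necessarily identical to what’s in the gist:

    #!/usr/bin/env bash
    # For each source screenshot: add a drop shadow, then emit a
    # 72 DPI PNG for the HTML output and a 150 DPI PDF for FOP.
    for src in originals/*.png; do
        base=$(basename "$src" .png)

        # Standard ImageMagick drop-shadow recipe: clone the image,
        # render its shadow, swap the layers, and flatten the result.
        convert "$src" \( +clone -background black -shadow 60x4+4+4 \) \
            +swap -background white -layers merge +repage "/tmp/$base-shadow.png"

        # Web version, tagged at 72 DPI.
        convert "/tmp/$base-shadow.png" -units PixelsPerInch -density 72 \
            "web/$base.png"

        # Print version, tagged at 150 DPI and wrapped in a PDF.
        convert "/tmp/$base-shadow.png" -units PixelsPerInch -density 150 \
            "print/$base.pdf"
    done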

Good, Now I Can Stop Worrying About My Amazon EC2 Instances

I’ve been using EC2 since it started because it provides good value (specifically, a reserved m1.small instance is a good value). Every once in a while I’ll talk to some operations person who has serious issues with the “loss of control” that comes with using a cloud platform like Rackspace or EC2, but to me the advantages outweigh the disadvantages. When I’m running infrastructure on EC2 there’s just so much bull I can offload to the platform.

It isn’t that I don’t have to worry about failure; failure happens everywhere. A machine might become unresponsive, or a cosmic ray might flip just the right bit in RAM to cause the system to blow up. The real advantage to me is that, if there is a power supply failure, it certainly isn’t my problem. I don’t have to call up some operations drone and hear about how they are back-ordering some part from Dell. Hell, I don’t even care about the underlying hardware. I have an SLA, I have frequent backups, and if your hardware decides to blow up tomorrow night, I’m going to simply fire up another instance in a different datacenter.

When I talk to someone who is ordering physical hardware to run in a datacenter, I question their sanity. In 2011, why? Unless you are required to maintain physical hardware by some government regulation, or you are dealing with national intelligence data, why would anyone take on the risk of maintaining physical hardware?

Back to Amazon AWS: as easy as it is to spin up new instances, it is even easier to terminate them, and this is one of the things that has bothered me about Amazon EC2 for years. I can spin up a bunch of critical infrastructure, and if I’m not absolutely careful, I can click the wrong button on the AWS Console and terminate an instance.

…and I’ve done this. I’ve fumble-fingered the terminate option at 3 AM while working on a deadline, stared blankly at the screen as the AWS dashboard told me an instance was terminating, and unleashed a tidal wave of expletives. EC2 is great, but Amazon should make you do something special to destroy these instances.

Well, they fixed it… finally.

[Screenshot: AWS Termination Protection]

As of who knows when, I can now flip a bit on my instances that will prevent me from ever suffering through another one-click screw-up. If you activate this bit and then try to terminate the instance, the UI will tell you that the “Termination Protection” status is Enabled.
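For what it’s worth, the same bit is reachable from the command line. Here’s a sketch using the EC2 API tools; the instance ID is a placeholder, and the attribute behind the console checkbox is disableApiTermination:

    # Sketch: turn on termination protection for a single instance
    # (i-12345678 is a placeholder ID).
    ec2-modify-instance-attribute i-12345678 --disable-api-termination true

    # Verify the setting before doing anything destructive.
    ec2-describe-instance-attribute i-12345678 --disable-api-termination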

As with almost everything that happens in EC2, I just discovered this feature by using the EC2 console. Amazon does this to me all the time: a new feature… the exact feature I wanted… just shows up one day.

Now, this is a step in the right direction, but I’d like Amazon to take this feature one step further. I’d like them to make it impossible to terminate an instance unless you and a colleague, sitting at two terminals separated by at least 20 feet in an underground bunker, exchange a series of alphanumeric identifiers that authorize destruction. In other words, I’d like Amazon to make it as difficult to terminate an instance as it is to launch a missile.