ex-googler rant: ec2 beats google’s cluster?


Over at Slacy’s blog is a post entitled “What Larry Page really needs to do to return Google to its startup roots”. I read it, and I sort of cringed. While Google has problems, so does every company of that size and scale. What you are reading on Slacy’s blog is one former employee’s laundry list of issues and problems with Google. Still, there was an interesting nugget buried in all the complaining:

“Amazon EC2 is a better ecosystem for fast iteration and innovation than Google’s internal cluster management system. EC2 gives me reliability, and an easy way to start and stop entire services, not just individual jobs. Long-running processes and open source code are embraced by EC2, running a replicated sharded MongoDB instance on EC2 is almost a breeze. Google should focus on making a system that works within the entire Open Source ecosystem.”

My take is that Amazon and Google have similar requirements. While Google has to serve substantially larger traffic numbers, they both solve problems of the same magnitude. Google solved hosting and scalability by creating a grid/cloud/whatever computing system that relied on custom hardware assembled into uniform general-purpose nodes (Bigtable, MapReduce on a massive scale, etc.). Werner Vogels at Amazon developed a utility cloud platform that allowed the company to scale using general OS images. They both needed to be able to deal out thousands of images quickly; the key difference is that Amazon made AWS a product early on. Amazon has been selling AWS as a utility for a while. Google got into the game with AppEngine, but I’m not convinced that AppEngine really took off.

As a result, Amazon has a real economic incentive to improve EC2 and S3 and to expand the product offering. They make money off of the product that forms the foundation of Amazon’s infrastructure. Google lacks this same incentive. They don’t sell hosted Bigtable, and they don’t sell the utility components that drive the Google infrastructure in the same way that Amazon does. While Google AppEngine is due for an update later this year (AppEngine for Business), I wonder if it is too little, too late. Almost everyone I work with is familiar with EC2 as a deployment platform, yet no one in that same group would ever consider Google AppEngine to be an appropriate deployment platform.

From this post, it seems there is a contingent of Google employees who would rather use Amazon EC2. This interests me because in all organizations there seems to be an insurgent group of employees, fed up with the idea of an operations group consuming budget dollars and maintaining physical machines, that would happily move to Rackspace Cloud, Amazon EC2, or GoGrid in a moment (given the appropriate SLA). Even if the majority of this post is ex-employee ranting, it is significant that even Googlers look to EC2 with a sense of awe and want.

But don’t get me wrong… A post like this can be misleading. On one hand, it can provide an interesting insight into the way a company either fails or succeeds. On the other hand, such posts lack class. Listen, if you didn’t like it at Google, move on. Don’t stop and write some sort of sordid tell-all about the flaws of your former employer. Does the author really think Google would be a much better place to innovate the day after they switched to git or decided to use MongoDB and Cassandra? Also, who would want to employ someone like this going forward?

But set that aside and consider the fact that Page doesn’t need (and maybe doesn’t even want) to move the company back to “Startup Mode”. Maybe Brin and Page are focused on a much more audacious goal than creating a new social network.