7 Simple Rules for Software Deployments

Here’s a list of lowest-common-denominator rules for deployments that should apply across languages and build tools.

Don’t deploy development “snapshots” to production.  Deploy a real release.  Actually “cut” a release.  I know there are many people out there who will whine that this process takes too long for whatever reason – maybe your build tool has to run the build three times to cut a release?  Maybe you just don’t want to be bothered stopping everything to make a release.

So many people in the industry just skip releases and deploy straight from master or HEAD.  Maybe this works if you are a ten-person startup, but if you run a real business this is an awful way to operate.  All it takes is one critical bug to realize two things: A.) there is no release to roll back to, so a rollback isn’t possible, and B.) because of A, you don’t know how long it will take to get the system back up and running.  Have fun if you don’t cut releases.  (Have fun getting your resume ready for your next job, that is.)

Make it impossible for development or snapshot builds to end up in production. Enforce this rule; otherwise, you’ll give people the option to ignore it. When it comes to production deployments, understand that people will always find a way around the rules if doing so is possible and expedient, and short-term solutions often win out when a team is under pressure to deliver.

If you are using Nexus or Artifactory (or some other repository manager), create a repository that is specifically for production and isolate it from repositories that contain snapshots. Use network or firewall rules to make sure that no one (no one) can reach the development repository from production, even if they really need to.  This is the only way to truly ensure that snapshots never end up in production.
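Firewall rules are the hard guarantee, but it also helps if the deploy script refuses snapshots on its own. Here is a minimal sketch in Python, assuming Maven-style version strings where development builds end in "-SNAPSHOT" and releases are plain MAJOR.MINOR.PATCH numbers. The function and exception names are hypothetical, not part of Nexus or Artifactory:

```python
import re

# Maven-style development builds carry a "-SNAPSHOT" suffix, e.g. "2.4.0-SNAPSHOT".
SNAPSHOT_SUFFIX = "-SNAPSHOT"
# Assumption for this sketch: a cut release is a plain MAJOR.MINOR.PATCH number.
RELEASE_PATTERN = re.compile(r"\d+\.\d+\.\d+")


class ForbiddenDeployment(Exception):
    """Raised when something other than a cut release is headed for production."""


def assert_release_version(version: str) -> str:
    """Return the version unchanged if it is a cut release; raise otherwise."""
    if version.upper().endswith(SNAPSHOT_SUFFIX):
        raise ForbiddenDeployment(f"refusing to deploy snapshot build {version!r}")
    if not RELEASE_PATTERN.fullmatch(version):
        raise ForbiddenDeployment(f"{version!r} does not look like a cut release")
    return version


if __name__ == "__main__":
    print(assert_release_version("2.3.1"))  # prints 2.3.1
    try:
        assert_release_version("2.4.0-SNAPSHOT")
    except ForbiddenDeployment as err:
        print(err)  # prints refusing to deploy snapshot build '2.4.0-SNAPSHOT'
```

Run it as the first step of the production deploy job so a snapshot fails the job before any artifact is fetched. It complements the isolated repository rather than replacing it.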

Have an immediate way to find out what version has been deployed to production (and I mean immediate).  Have you ever seen a bug in production while people run around like crazy idiots trying to figure out what version of the code was deployed?   Your build tool should be able to write a git commit hash, or a branch and version name, to some file (and if it can’t, find another build tool).

Nothing looks worse to management than a team responsible for the build that can’t immediately tell you, with certainty, what version of code is running in production.  Don’t be that team.
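If your build tool can’t write that file out of the box, a few lines of script will. A minimal sketch in Python, assuming the build runs inside a git checkout; the file name version.json and its three fields are hypothetical choices:

```python
import json
import subprocess
from pathlib import Path


def git_output(*args: str) -> str:
    """Run a git command in the current directory and return its trimmed stdout."""
    result = subprocess.run(["git", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def build_version_info(version: str, commit: str, branch: str) -> dict:
    """Collect the three facts needed to answer 'what is running in production?'"""
    return {"version": version, "commit": commit, "branch": branch}


def write_version_file(path: Path, info: dict) -> None:
    """Write the info as stable, human-readable JSON to ship inside the artifact."""
    path.write_text(json.dumps(info, indent=2, sort_keys=True) + "\n")
```

A build step would call write_version_file(Path("version.json"), build_version_info(release_version, git_output("rev-parse", "HEAD"), git_output("rev-parse", "--abbrev-ref", "HEAD"))), where release_version (hypothetical) comes from your build. Serving that file from a well-known URL turns “what’s in production?” into a one-line request.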

Practice a rollback before every release. Many people reading this will think I’m joking because the reality in many organizations is that you’ve never, ever, ever done a rollback.  If you are testing properly, rollbacks should be rare, but software systems are complex and failure is unpredictable no matter how much testing happens.

If you never practice a rollback, there’s a good chance you couldn’t pull it off even if you needed to, and there’s also a chance that a rollback would put your databases into an unrecoverable state of highly visible failure. If the question “how quickly can we roll back?” produces an involuntary nervous chuckle from your deployment people, then it is time to schedule a drill.
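Here is what the code half of such a drill can look like, as a minimal sketch assuming a layout similar to the one Capistrano uses: each release is unpacked into its own directory under a releases folder, and a “current” symlink points at the live one. The function names are hypothetical, and the sketch deliberately ignores the database, which is usually the harder half of any rollback:

```python
import os
from pathlib import Path
from typing import List


def activate(releases_dir: Path, version: str) -> None:
    """Point releases_dir/current at releases_dir/<version> in one atomic step."""
    if not (releases_dir / version).is_dir():
        raise FileNotFoundError(f"release {version} is not on disk")
    tmp_link = releases_dir / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(version)  # relative link, resolved inside releases_dir
    os.replace(tmp_link, releases_dir / "current")  # rename is atomic on POSIX


def active_version(releases_dir: Path) -> str:
    """Name of the release the 'current' symlink points at."""
    return os.readlink(releases_dir / "current")


def rollback(releases_dir: Path, history: List[str]) -> str:
    """Re-activate the release deployed before the current one (history is oldest first)."""
    if len(history) < 2:
        raise RuntimeError("nothing to roll back to")
    previous = history[-2]
    activate(releases_dir, previous)
    history.pop()
    return previous


def rollback_drill(releases_dir: Path, history: List[str]) -> bool:
    """Roll back, confirm it took effect, then roll forward again and confirm that too."""
    current = active_version(releases_dir)
    previous = rollback(releases_dir, history)
    rolled_back = active_version(releases_dir) == previous
    activate(releases_dir, current)
    history.append(current)
    return rolled_back and active_version(releases_dir) == current
```

Running rollback_drill before every release is the cheap part; the point is that the mechanism gets exercised while nothing is on fire.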

Have a plan to verify your release. Have you ever deployed to production only to wonder if the deployment actually worked?  This is what happens when you don’t have a QA team.  If you have a QA team, they will often stay up all night with you to verify that a release has been deployed successfully. Be the team that has one, because the alternative is often error-prone.

If your deployment process involves software developers pushing code to production and then just “clicking around” to see if it all works, you are doing it wrong, and you are likely violating an important rule of software development: you (almost) need a good QA team more than you need a good team of developers.
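Automation won’t replace that QA team, but it can make sure nobody starts verifying against the wrong build. A minimal sketch in Python, assuming each deployment publishes its version and commit as JSON at a hypothetical /version.json path; the URL, field names, and functions are all illustrative:

```python
import json
import urllib.request
from typing import Callable, Dict, List


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body (raises on HTTP or network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def verify_release(base_url: str, expected: Dict[str, str],
                   fetch: Callable[[str], dict] = fetch_json) -> List[str]:
    """Compare what the server reports with what we meant to ship.

    Returns human-readable problems; an empty list means the check passed.
    """
    url = base_url.rstrip("/") + "/version.json"  # hypothetical endpoint
    try:
        deployed = fetch(url)
    except Exception as err:  # connection refused, HTTP 500, bad JSON, ...
        return [f"could not read {url}: {err}"]
    return [f"{key}: expected {want!r}, found {deployed.get(key)!r}"
            for key, want in expected.items() if deployed.get(key) != want]
```

A non-empty result should fail the deployment job; the human verification then starts from a build you know is actually live.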

Drill. Run drills. Make them unexpected. Watch a submarine movie, any submarine movie, because they are usually all the same.  Right after the initial dive and the singing of patriotic US or Russian songs, the captain whips around and runs a drill: all hands on deck, we just lost the database.

Developers and people involved in devops tend to think that these drills are silly and meant for operations.  They aren’t. When production goes haywire, it’s almost always the developers who get pulled into the war room and asked to guess what the problem is.  In the run-up to a production launch, get your systems into staging and make something fail on purpose.  This way you’ll know exactly who needs to be involved if a failure occurs.

Know your build tool. Don’t just use your build tool, understand the best practices around it. Most build tools come with an approach to deployments that encourages some of the steps I’ve listed in this entry.

drone.io vs. codeship.io

I’m a paying customer of both drone.io and codeship.io for now.  Here are some of the reasons I’m moving toward codeship.io over drone.io.  I found a dearth of good information when I was researching the two, so maybe this will help someone.

  1. I’ve been using drone.io for a while.  Drone.io is simple, and it’s worked for me.  I’ve used Drone to deploy to Heroku and also to S3 for a while.  I also use Drone to automate some Apache httpd config through SSH.   It works, and it has a very simple interface.
  2. I had a project – an AngularJS project – which was being deployed to S3.  In Drone you can configure an S3 deployment against a branch, and it’s a little annoying: it limits the number of files that can be uploaded to 500 or 1000, and it silently omits anything beyond that (which is ridiculous and has cost me multiple days).  This has been the case for a number of months, and I haven’t seen any roadmap for addressing this limitation.
  3. The last straw with Drone was when I was trying to configure two S3 deployments – one S3 deployment to a production bucket if the commit was to the production branch and another to the staging bucket if the commit was to the staging branch. Drone.io wouldn’t allow me to configure multiple S3 deployments against different branches.   When you try to configure two S3 deployments it fails with a frustrating, “There is already an S3 deployment for this project” message.
  4. Next step… “Hey, I’ll just create another project against the same repository and configure an independent S3 deployment.” No luck: drone.io doesn’t allow me to configure two projects against the same GitHub repository as a workaround for the one-S3-deployment-per-project limit. At that point, I tried an awful workaround, which was to fork the repo to create another drone.io build that would have another S3 deploy.  It was an awful workaround because I would have to tell a team to clone another remote and push to another branch.
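None of this is hard to express outside the CI product. As a minimal sketch in Python (not drone.io’s or codeship.io’s configuration), here is the logic the list above asks for: a per-branch bucket map, and a check that refuses to report success when files silently go missing. The bucket names are hypothetical, and upload stands in for whatever S3 client or plugin you use; it is assumed to return the keys it actually stored, for example by listing the bucket afterwards:

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical bucket names: one deployment target per branch.
BUCKET_FOR_BRANCH: Dict[str, str] = {
    "production": "example-app-production",
    "staging": "example-app-staging",
}

# upload(bucket, build_dir, keys) -> keys that actually landed in the bucket.
Uploader = Callable[[str, Path, List[str]], Iterable[str]]


def bucket_for(branch: str) -> Optional[str]:
    """Return the target bucket for a branch, or None if the branch is not deployed."""
    return BUCKET_FOR_BRANCH.get(branch)


def files_to_upload(build_dir: Path) -> List[str]:
    """Every file in the build output, as object keys relative to build_dir."""
    return sorted(p.relative_to(build_dir).as_posix()
                  for p in build_dir.rglob("*") if p.is_file())


def deploy(branch: str, build_dir: Path, upload: Uploader) -> int:
    """Upload the build for this branch; fail loudly if any file is missing afterwards."""
    bucket = bucket_for(branch)
    if bucket is None:
        return 0  # feature branches and the like are built but never deployed
    keys = files_to_upload(build_dir)
    stored = set(upload(bucket, build_dir, keys))
    missing = sorted(set(keys) - stored)
    if missing:
        raise RuntimeError(f"{len(missing)} of {len(keys)} files missing "
                           f"from {bucket}, e.g. {missing[:3]}")
    return len(keys)
```

Keeping the branch map in the repository also means adding a third environment is a one-line change rather than a second project.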

Then I completely lost my patience with drone.io.  It’s a simple interface, but you shouldn’t have to adapt your process to the limitations of a CI system.  I was also starting to notice that drone.io wasn’t going anywhere these days.  Most services and companies tend to surprise me with a new feature or two every couple of months. I depended on drone.io’s S3 behavior for many months, and it still felt like a beta.

I googled for “Alternatives to drone.io” and ended up on this Quora question: http://www.quora.com/Travis-CI/What-are-the-alternatives-to-Travis-CI

From there I evaluated Travis for a second or two, but it has a high price point for what I’m looking for.   Then I stumbled upon codeship.io. The important point is that I started using it first, and only afterward saw that it cost just $50 to get started. I signed up.

The whole process, from my first doubts about drone.io to deciding to leave it, took 2-3 days.   The switch from drone.io to codeship.io took about two hours start to finish.   That’s not entirely true; I still have a few projects on drone.io, but I’m migrating everything over the next two weeks.

First impressions of codeship.io?  The interface is more complex than drone.io’s, and it feels just as capable as drone.io, maybe more so, maybe not.  If drone.io and codeship.io were to compete on features, I have the feeling that codeship.io would win, but comparing features of CI systems seems like a waste – most users are focused on one or two languages.  If it builds Javascript well, that’s really all I’m looking for. I’m really not looking for a jalopy of features. If I needed that, I’d run Jenkins and install everything and the kitchen sink on it. My goal here is just a simple CI system that can hook up to GitHub, run builds, and not get in the way. Codeship.io did just that after 15 minutes of investment.

The UI of codeship.io isn’t as simple as drone.io’s, and you may be a bit confused by the different abstractions it throws at you.  For example, the ability to drill into different stages of build output is awesome, but at times it can be difficult to figure out how to get to a particular project’s settings.  The project settings page doesn’t jump up and tell you which project you are configuring.

The irony of the migration is that codeship.io doesn’t support S3 publishing as a built-in deployment option (I ended up solving this with a grunt-s3 plugin after the fact).  I moved because my initial frustration with drone.io opened the door – I evaluated the options and concluded that codeship.io shows more promise at the moment.

It’s codeship.io for now, but this market is frictionless so we’ll see what happens next month. The ease with which one can just point a CI system at GitHub and get going is great. It means that there is nearly zero cost for evaluating the competition.

Ruby on Rails vs. J2EE

At this stage in the game, they remind me of each other when it comes to web application development.  Here are some important similarities:

  1. When you start either a Ruby on Rails or a Java project, you’ll be selecting libraries and frameworks that you will have to put up with for years.  Start a RoR project in Rails 4.0 today and it’s very likely that you’ll be surfing along on the same version for years to come.   Start a JavaEE project on Tomcat 7, same result.  When you adopt a framework approach, I’ve noticed that it is difficult to justify upgrading to the latest and greatest release once your project is fully underway.
  2. Both have a community that is committed to supporting the projects and platforms that sustain the ecosystem. Neither RoR nor J2EE is going anywhere, even though you’ll hear many in both communities lamenting the existence of the other.
  3. Rails people have grown up to the point where many of them realize that Rails isn’t the only answer.  This took a few years to realize, but we’re beyond the age of rockstar Ruby programmers walking around as if they’ve discovered productivity.  Rails developers are now maybe as humble and defeated as Java programmers.
  4. Both frameworks will lose developers to the continued growth of Node.js.
  5. Tooling support is about the same at this point (e.g., IntelliJ supports both).

Here are some important differences:

  1. Java seems easier to decouple over a long-term timeline.  If I have a team pour a few years of development time into a Java system, there’s a reasonable chance that I can build interfaces and decouple if certain architectural choices were made up front.
  2. In the trillion years since RoR was released, J2EE has caught up to a certain point, but I still see more productivity during the first few months of a RoR project than of a Java project.
  3. Over the long-term, I see J2EE as being more maintainable.  This is likely related to #1, but I can decouple something written in Java and divide projects with dependencies.
  4. Rails projects grow to a certain size only to become terrible nightmares to maintain.  Java projects grow to a certain size only to become terrible nightmares to maintain, but there are people out there that understand how to manage this situation.
  5. No one talks about the build in Rails.  Everyone talks about the build in Java.
  6. Ruby seems easier to deploy into a PaaS because it provides a common interface.  Java should be easy to deploy into a PaaS, but there are so many ways to configure containers that it is marginally more difficult to solve this problem.


Is Grunt the new Ant?

I’ve heard this complaint a few times, but I don’t buy it.  Usually when someone says this there are a few implied statements:

  1. Grunt lacks a lifecycle – and by lifecycle I mean either the Maven lifecycle or the sort of “lifecycle” one can obtain by applying a Gradle plugin.
  2. Grunt, carried forward a few years, will result in huge, unmanageable Gruntfile.js disasters.
  3. Grunt isn’t comprehensive, and it doesn’t provide enough conventions.
  4. (I’m unwilling to admit that Javascript is taking over a big part of my previous job.)

While I can see how a single Grunt file used to manage a large, monolithic enterprise application would be a complete disaster, I also think that many of Grunt’s critics are coming at Grunt from a different place entirely.

They are assuming that Javascript projects using Grunt are similar to the Java projects they are replacing, and they are holding on to an antiquated notion that there is always going to be “one build to rule them all.”

You’ve all seen this: the big Java build that ties together a massive amount of source code into a single build artifact and requires a build tool (like Maven or Gradle) which can scale as the project grows.  This approach is still very valid for many systems – take Presto as an example.  Presto has a Maven build that just works; it generates the artifacts in a single consolidated build in which there are common standards (and technology) across an array of components.

Largish, enterprise web applications are a whole different matter. These days you are writing web applications in a mixture of Javascript and Java.  Your Javascript builds are in Grunt and your “application” is really just a series of server-side REST services supporting a collection of single-page applications written in Javascript.

While this approach provides the perfect opportunity to decouple projects from “the single build” approach, most developers are continuing to try to combine everything into one, perfect build.  (Much in the same way they are trying to cram everything into a single GitHub repository.)

I think that’s a bit of old-fashioned thinking. A better approach to scaling a large, heterogeneous web application is to divide it into many smaller projects. This reduction in scope allows developers to use whatever build tool makes the most sense for a particular module. If you’ve refactored a large AngularJS project into a series of libraries and modules, does it matter that each of them has a Gruntfile.js that might repeat some configuration values and define the same “default” task?

Aside from the points I’ve already outlined, I’m finding Grunt to be maybe a bit more declarative than Maven or Gradle when it comes to builds.  If you generate a default build with Yeoman these days, you’ll see that it generates a Gruntfile.js which automatically loads tasks and which has a very thin layer of tasks at the tail of the build file.  Most of your Gruntfile.js is just customizing a little configuration and tasks ship with default behavior – I’m really not telling Grunt what to do – the tasks ship with some built-in intelligence.

And Grunt brings an advantage that neither Maven nor Gradle has: my dependencies, including the versions of my plugins, are tracked in a separate package.json file.  I wish that was the norm in Java-based systems, where dependency issues are always conflated with build customization in a way that often drives me up a wall.

So, no, Grunt isn’t the new Ant. Also, there’s nothing wrong with Ant.  There, I said it.


Why do you still “Java”?

This is a serious question.  Why do you still “Java”?  I’d like to hear your answer; here’s mine.  It might not be what you’d expect.

Why do I still “Java”?

I rarely do, but when I do it is always server-side, and my deployment is either on TomEE or Tomcat.

I’m increasingly seeing people move away from server-side Java to Node.js to reduce the impedance mismatch between client-side applications and server-side code. OK, that’s not entirely true; not everyone is moving from Java to Node.js, but the teams that are closest to front-end web development seem to be leading what could turn into a quick transition from Java to Node.js.

I see teams that are responsible for back-office code like transaction processing sticking with Java.  Java seems to be here to stay for e-commerce systems – banks and governments are still heavy into Java. I don’t see this changing.  That being said, I know of a few government agencies that have made a huge investment in Node.js.  Node is popping up in places you wouldn’t expect.

The only thing I’m using Java for at the moment is to serve REST APIs to Javascript front ends, that’s it.  Everything client-side is Javascript, either Angular or Backbone: usually AngularJS at this point, and Backbone if a framework like that is too heavyweight.  I tend to question the sanity of people who propose using anything like Spring MVC or JavaEE for anything other than some plumbing to support a Javascript application.

The big exception to this trend is Big Data. Systems like Presto seem much more appropriate in Java than anything else, and I see code in that project that I wouldn’t ever want to migrate to anything other than Java. They use features of Java few others do to squeeze as much performance out of machines as possible.

Overall trend right now – Java is moving further back in the stack, but it’s still important.  Another important trend, no one is paying attention to the new commercial support options for Java; in fact, many businesses are just openly hostile to the idea of paying for Java support.  (In several cases this is a motivation to look for alternatives.)

As for the why? It’s familiar. When I develop in Java this familiarity often trumps the agility or novelty of an alternative language.  For example, I have a system running Ruby on Rails that just turned 3 years old – I often look at that application and just wish I had opted to use Java.

Maven 3.2.1 Provides a New Cure for Dependency Hell

If you’ve used Maven long enough you’ve seen this pattern.

  • You work somewhere that breaks up development into several independent groups.
  • Different teams have very different standards for dependencies and project organization.
  • One team sends over a JAR file that has transitive dependencies on everything from testing frameworks to unnecessary JDBC drivers.

I call this pattern the “Someone Needs to Train that Team on Maven” pattern, because that’s exactly what needs to happen. Usually this happens at large enterprises set up to support multiple levels of development, where one team is working on a project that supplies a client to another team. Say one team is developing a REST service and, as a convenience, they supply a JAR that contains a simple model object and some code to interact with the REST service.

Easy, right?  Wrong.  A good developer will make sure that this client artifact has as few dependencies as possible.  Lean dependency signatures are key in large enterprises.  If the interaction with that other system is via REST, then there’s no reason to include backend code to interact with a database or the code that actually implements the service. If the other team has a limited understanding of Maven dependencies – then there will be trouble.  You will get a client JAR that happens to include everything and the kitchen sink – your 8 MB WAR file bloats up to 200 MB because it includes several versions of Spring (even though you don’t use Spring).

In Maven, dependency hell is often not due to the tool itself; it is self-inflicted, and it quickly infects your entire organization’s Maven projects. One bad project, or one developer with a weak understanding of when and where to declare dependencies, can create a disaster that will bloat the dependency trees of every project that consumes that artifact.  Anyone who touches an artifact with bloated transitive dependencies gets bloated dependencies.

Jason van Zyl at Takari points to the solution.  

In Maven 3.2.1, you can exclude all of the transitive dependencies for a dependency.  This means that, if someone sends you a JAR artifact with an awful POM, you can cut the problem off at the root.

I’ve seen some people do similar things by declaring a dependency as “provided” in dependency management, but this is both time-consuming and incorrect.  The thinking here is that I can selectively cut out transitive dependencies by declaring each transitive dependency as “provided”.  It hacks at Maven’s model to get around this shortcoming.  Maven 3.2.1 has a more elegant solution to what I see as an unfortunate reality for most large-scale Maven projects.
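In the POM, the 3.2.1 feature is an <exclusion> whose <groupId> and <artifactId> are both *, declared on the offending dependency. To see why that stops the bloat at the root, here is a toy Python model of how exclusions prune a transitive tree. It is a deliberate simplification (hypothetical coordinates, no version mediation, scopes, or optional dependencies), not Maven’s actual resolver:

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Exclusion = Tuple[str, str]  # (groupId, artifactId); "*" matches any value


@dataclass
class Dep:
    group: str
    artifact: str
    children: List["Dep"] = field(default_factory=list)        # what its POM declares
    exclusions: List[Exclusion] = field(default_factory=list)  # applied to its subtree

    @property
    def coord(self) -> str:
        return f"{self.group}:{self.artifact}"


def is_excluded(dep: Dep, rules: List[Exclusion]) -> bool:
    """True if any rule matches; "*" stands for a whole groupId or artifactId."""
    return any(g in ("*", dep.group) and a in ("*", dep.artifact) for g, a in rules)


def resolve(dep: Dep, inherited: Tuple[Exclusion, ...] = ()) -> Set[str]:
    """Collect dep and everything below it, honoring exclusions declared on the path."""
    found = {dep.coord}
    rules = list(inherited) + dep.exclusions
    for child in dep.children:
        if not is_excluded(child, rules):
            found |= resolve(child, tuple(rules))
    return found


# A hypothetical client JAR whose POM drags in far too much.
client = Dep("com.example", "orders-client", children=[
    Dep("org.springframework", "spring-core"),
    Dep("mysql", "mysql-connector-java"),
    Dep("junit", "junit"),
])
print(len(resolve(client)))       # prints 4
client.exclusions = [("*", "*")]  # the Maven 3.2.1-style wildcard exclusion
print(sorted(resolve(client)))    # prints ['com.example:orders-client']
```

The contrast with the “provided” hack is visible in the model: one wildcard rule on the consumer’s side prunes the whole subtree, instead of one override per unwanted transitive dependency.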

And, that unfortunate reality is that most people using Maven have a limited understanding of how dependencies work. This is an ailment which is easily fixed with training.

Fixing Payment Systems with Competition

This Target hack is a BFD. I’m at the mall this weekend because I’m a very last-minute shopper and it was the only time I could find to shop. My wife calls me because she gets this email from Chase which I’ll paraphrase here:

You got hacked.  Lolz!  It ain’t our fault, really.  So sorry. So so sorry. Oh, BTW we’re putting new limits on how you can use your card in the middle of Christmas week because of Target. Hey hope this doesn’t screw you up, but I hope you weren’t planning on spending more than $100 a day with us.   Happy holidays.

Think about this for longer than a few minutes, think about how this affects millions of customers, and then you’ll realize that this Target hack could potentially ding a percent or two off of this holiday season for a few retailers.

When we look back at this time, we’re going to laugh at how silly our approach to payment systems was from about 1980 to 2013.  I think that the Target hack is likely just the beginning, but it is clear that (even with strict PCI compliance) we need radical change in payment.

Problems with Payment

  1. Our credit cards (at least in the US) are the technology equivalent of a cassette tape. I’m running around town with a smartphone that can read my fingerprint, yet whenever I shop I’m still using the equivalent of an 8-track tape to pay for everything. Instead of moving toward a system that uses my location and my fingerprint, we’re just walking around with wallets that are no more secure than an envelope labeled “My Credit Card Numbers” that is totally unprotected. Steal my wallet and you’ve got my credit card numbers… there’s a better way.
  2. We still have this irrational belief in the signature (and checkout clerks still eyeball them). This is our idea of identity verification: here’s a quill pen, why don’t you just sign this.  Now wait… there’s enough reliable location data flowing from my phone to enable every checkout clerk to say, “Welcome to the store, Mr. O’Brien” without me saying anything.  The store should know I’m there already, and the technology also exists to have the store take care of payment authorization every time I pick something up. My phone could generate a piece of data that vouches for not just who I am, but where I’ve been today and what the time is, down to the microsecond, as reported by several GPS satellites.
  3. Online payment systems that offer more security are tiny in comparison to the 50,000-lb gorillas that dominate the system.  No one uses these systems. Add up the value of all the innovative payment companies in the Bay Area (Square, PayPal, + a thousand others) and you still don’t touch Visa’s $6.9 trillion in total volume.  That’s $6.9 trillion flowing through billions of point-of-sale terminals (or “all the money”). Someone needs to figure out how to upgrade that instead of creating yet another payment system to trial in San Francisco and New York.
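Point 2 is essentially a signed assertion from the phone. As a rough sketch of that idea only, here is an HMAC over a user id, a coarse location, and a timestamp, with a freshness window to stop replays. It authenticates rather than encrypts, and the shared per-device key (which both the phone and the card issuer would have to hold), the field names, and the 120-second window are all assumptions; key management is out of scope here:

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_AGE_SECONDS = 120  # assumption: how stale a token may be before it is refused


def _sign(key: bytes, claims: dict) -> str:
    body = json.dumps(claims, sort_keys=True).encode()  # canonical byte form
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def make_token(device_key: bytes, user_id: str, lat: float, lon: float,
               timestamp: Optional[float] = None) -> dict:
    """Phone side: sign who, roughly where, and when."""
    claims = {"user": user_id, "lat": round(lat, 3), "lon": round(lon, 3),
              "ts": time.time() if timestamp is None else timestamp}
    return {"claims": claims, "sig": _sign(device_key, claims)}


def verify_token(device_key: bytes, token: dict, now: Optional[float] = None) -> bool:
    """Issuer side: the signature must match and the token must be recent."""
    if not hmac.compare_digest(_sign(device_key, token["claims"]), token["sig"]):
        return False  # forged or altered claims
    now = time.time() if now is None else now
    return 0 <= now - token["claims"]["ts"] <= MAX_AGE_SECONDS
```

Unlike a card number, a stolen token is useless a couple of minutes later, and changing any claim (a different user, a different place) breaks the signature.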

When I wrote about payment systems in 2010, the universal warning everyone was throwing at me was, “Don’t expect anything to change in the short-term.  The retail industry moves slowly and no one wants to make the capital investment necessary to upgrade point-of-sale.”  At the time I was talking to a senior manager at a well-known payment company based in the Bay Area about NFC payment systems.  According to him, the future was now and a revolution was upon us.  It wasn’t.

The solution

1. Ensure real competition in the payment processing space. Huge payment providers like the ones that have logos in your wallet have had a history of using confidentiality agreements with vendors and transaction fees as a tool to lock out the competition. For example, you are not allowed to offer discounts for different kinds of payment methods.  Whether or not this continues to happen after the interchange fee settlement is up for debate, but we need to make sure that new technologies are not locked out of the physical point-of-sale space.

2. Put all the risk on payment providers.  If you provide a card or a technology that people can use for payment, put all of the responsibility for a compromise on the payment provider. This will motivate payment providers to move away from the current, insecure methods of payment that we use today. Your credit card won’t just be a series of easy to copy numbers, it will make use of the technology we have available. Also, this would force dramatic changes to PCI.  “Storing a credit card #” at a merchant would go away, and instead your transactions would look more like PayPal’s authorization process for recurring payments.

With real competition, the payment processors that can control risk will be able to offer a significantly lower cost to the retailer, and retailers will provide the necessary motivation to consumers to adopt more secure technology.  If Square has the best risk management and fraud prevention technology available, a retailer should be able to offer consumers that use that technology a 1-2% discount if they pay with Square. Competition (not regulation) is the way out of this mess.


Whirr + Spot Prices + Thanksgiving Weekend means that I can run large m1.xlarge instances on the cheap.

<griping>Also, Whirr is essential, but the project has a sort of “forgotten Maven site” feel about it. It’s annoying when an open source project has several releases, but no one bothers to republish the site.  It’s even more annoying when the “Whirr in 5 minutes” tutorial takes 60 minutes because it doesn’t work.</griping>

The Fall Guy (or Representing Open Source in the Business)

The problem with being the developer who can write at an open source company is that you end up being enlisted into the whole “Please explain how open source works” discussion when the company hires non-technical managers.  You end up as the representative of this strange thing called “open source.” A VP (not yours) calls you up and says, “Hey, could you explain what open source is to our sales team?”

You seize upon this as an opportunity to spread the Gospel of FOSS. You prepare elaborate slides that speak of Cathedrals and Bazaars. You turn some Lessig into an inspirational dramatic monologue that will inspire these non-developers to start thinking of OSS as the heroic effort we are mounting to take back control from proprietary vendors and create an even larger sharing economy. You think that maybe it is appropriate to introduce some of the developers who work on the project the company is currently making money from…

…and then you show up at the “Sales Kick-off” meeting and you realize that this is more of a Glengarry Glen Ross joke festival than it is an audience receptive to the idea of profiting from a sharing economy.  You quickly try to revise slides about “Free as in Beer”, because you realize that any mention of beer is going to get this crowd derailed pretty quickly. They scheduled you at the end of the day, after the VP of Sales gave a speech that involved football metaphors and after the regional sales director had a loud fight about territory with the sales team.  You realize that no one really wants to hear about OSS because they are all about to go out on some sales team-building exercise that involves a lot of drinking and more discussion of sports.

You are summoned to present with “…Ok, some hippy developer is going to tell us what this freeware @#$# is all about anyway. Go ahead, show ‘em how to ‘make it rain.’”

If this is your job, you’ll find yourself in a room full of people asking you questions like “Alright, so do you geeks have anything better to do with your weekend?” and “Why are my customers getting all worked up over open source? I don’t get no commission on this crap.”

Some things that you’ll notice in the reaction:

  • People with a background in business and sales have no idea why you’ve been participating in open source for years.  Not only do they not understand it, some of them discount the entire idea (even if the company was built atop an OSS foundation).
  • Even if you think you’ve explained open source, there’s a large portion of the audience that either wasn’t listening or refuses to admit that it could ever work. (Someone will make a joke about how you are a communist.  It will be unclear whether that person was really joking or not.)
  • Jokes will be made about open source being about “free love,” “hippies,” and “unicorns.”
  • Invariably, someone from the 1980s will show up and talk about how they once made a lot of money selling business software.  This will be used as an attempt to show others that your generation just has it all wrong.

If just the right kind of manager is there, everything you say about the “possibilities of open source” will be dismissed as over-idealistic nonsense.  Even though you might have just delivered a presentation on how Hadoop has created billions of dollars in value and how organizations like the Apache Software Foundation act as foundries for innovations that drive computing, someone will invariably stand up right after you and say, “Ok, enough about this open source crap, how are we going to make money?”

You realize that your “open-source” stuff is just going to be used as a scapegoat by a sales team that has no idea what OSS is.  This is the reason why you see headlines about large companies canceling support for OSS projects and products.  It isn’t because they couldn’t find a way to “monetize”; no, it was often because they refused to understand the gold mine they were sitting on.

The Shift to Local Data Centers

In my post on Friday I wrote a fictional piece from 2020 predicting that the world’s IT infrastructure shifted to in-country data centers after the recent surveillance revelations.   It looks like this is going to happen faster than I expected.

What shall we name this trend?  How about “Jurisdictional Data Compliance” or “Jurisdictional Data Security”.  Walk up to your CIO today and ask what your JDC implementation plan is, given your clients’ new concerns about privacy.