Here’s a list of lowest-common-denominator rules for deployments that should apply across multiple languages and build tools.
Don’t deploy development “snapshots” to production. Deploy a real release. Actually “cut” a release. I know there are many people out there who will whine that this process takes too long for whatever reason: maybe your build tool has to run the build three times to cut a release, or maybe you just don’t want to be bothered with stopping everything to make a release.
So many people in the industry just skip releases and deploy straight from master or HEAD. Maybe this works if you are a ten-person startup, but if you run a real business this is an awful way to operate. All it takes is one critical bug to realize two things: A.) There is no rollback plan because rolling back isn’t possible, and B.) You don’t know how long it will take to get the system back up and running because of A. Have fun if you don’t cut releases. (Have fun getting your resume ready for your next job, that is.)
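If you want this rule enforced rather than merely remembered, the deploy script itself can refuse anything that isn’t a real release. Here is a minimal sketch, assuming releases are cut as git tags and deployments run from a checkout; the tag convention and the deploy step are assumptions about your process, not a prescription:

```python
# Sketch: refuse to deploy anything that is not an exact, tagged release.
# Assumes deployments run from a git checkout and that a cut release is
# marked with a tag (e.g. v1.4.2); adapt to your own conventions.
import subprocess
import sys

def current_release_tag():
    """Return the tag pointing at HEAD, or None if HEAD is untagged."""
    result = subprocess.run(
        ["git", "describe", "--exact-match", "--tags", "HEAD"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None

tag = current_release_tag()
if tag is None:
    sys.exit("Refusing to deploy: HEAD is not a tagged release. Cut one first.")
print(f"Deploying release {tag}")
```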
Make it impossible for development or snapshot builds to end up in production. Enforce this rule; otherwise, you’ll give people the option to ignore it. When it comes to production deployments, understand that people will always find a way around the rules if it is possible and expedient; short-term solutions often win out when a team is under pressure to deliver.
If you are using Nexus or Artifactory (or some other repository manager), create a repository that is specifically for production and isolate it from repositories that contain snapshots. Use network or firewall rules to make sure that no one, and I mean no one, can pull from the development repository into production, even when they think they really need to. This is the only way to truly ensure that snapshots never end up in production.
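A repository manager plus network rules does the heavy lifting, but the publish step can carry its own guard too. A minimal sketch, assuming Maven-style versioning where development builds carry a -SNAPSHOT suffix; the repository URLs are placeholders, not real endpoints:

```python
# Sketch: a publish gate that keeps development builds out of the release
# repository. Assumes Maven-style version strings, where development
# builds end in "-SNAPSHOT"; both URLs are hypothetical placeholders.
RELEASE_REPO = "https://repo.example.com/releases"    # hypothetical
SNAPSHOT_REPO = "https://repo.example.com/snapshots"  # hypothetical

def target_repository(version: str) -> str:
    """Route an artifact by its version string; snapshots never go to releases."""
    return SNAPSHOT_REPO if version.endswith("-SNAPSHOT") else RELEASE_REPO

assert target_repository("2.3.1") == RELEASE_REPO
assert target_repository("2.4.0-SNAPSHOT") == SNAPSHOT_REPO
```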
Have an immediate way to find out what version has been deployed to production. (And I mean immediate.) Have you ever seen a bug in production while people run around like crazy idiots trying to figure out what version of the code was deployed? Your build tool should be able to write a git commit hash or a branch and version name to some file (and if it can’t, find another build tool).
Nothing looks worse to management than a build team that can’t immediately tell you, with certainty, what version of the code is running in production. Don’t be that team.
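Here is one way to get there, sketched as a small build step that stamps git metadata into a file shipped with the artifact; the file name and fields are my assumptions, not a standard:

```python
# Sketch: stamp version metadata into a file at build time so production
# can always answer "what version is this?" The file name and fields are
# assumptions; use whatever your artifact format makes easy to ship.
import json
import subprocess

def git(*args: str) -> str:
    return subprocess.check_output(["git", *args], text=True).strip()

build_info = {
    "commit": git("rev-parse", "HEAD"),
    "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
    "version": git("describe", "--tags", "--always"),
}

with open("build-info.json", "w") as out:
    json.dump(build_info, out, indent=2)
```

Ship that file inside the artifact and expose it somewhere cheap to read (a /version endpoint, a startup log line) so the answer is one request away.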
Practice a rollback before every release. Many people reading this will think I’m joking, because the reality in many organizations is that a rollback has never, ever, ever been done. If you are testing properly, rollbacks should be rare, but software systems are complex and failure is unpredictable no matter how much testing happens.
If you never practice a rollback, there’s a good chance you couldn’t pull one off even if you needed to, and there’s also a chance that a rollback would put your databases into an unrecoverable state of highly visible failure. If the question “how quickly can we roll back?” produces an involuntary nervous chuckle from your deployment people, then it is time to schedule a drill.
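One way to make the drill routine is to rehearse the rollback in staging as part of every release. A sketch, with deploy() and smoke_test() as hypothetical hooks standing in for whatever your tooling actually provides:

```python
# Sketch: rehearse the rollback in staging before every release.
# deploy() and smoke_test() are hypothetical hooks into your own tooling;
# the point is that rolling back gets exercised, not just documented.
import sys
import time

def deploy(env: str, version: str) -> None:
    """Hypothetical hook: call whatever actually deploys `version` to `env`."""
    raise NotImplementedError("wire this to your deployment tooling")

def smoke_test(env: str) -> bool:
    """Hypothetical hook: return True if basic checks pass in `env`."""
    raise NotImplementedError("wire this to your verification checks")

def rehearse_rollback(candidate: str, previous: str) -> None:
    deploy("staging", candidate)   # start from today's release candidate
    started = time.monotonic()
    deploy("staging", previous)    # the actual rollback
    elapsed = time.monotonic() - started
    if not smoke_test("staging"):
        sys.exit(f"Rollback to {previous} left staging broken; fix before releasing.")
    print(f"Rollback to {previous} verified in {elapsed:.0f}s")
```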
Have a plan to verify your release. Have you ever deployed to production only to wonder whether the deployment actually worked? This is what happens when you don’t have a QA team. A good QA team will often stay up all night with you to verify that a release has been deployed successfully. Be that team, because the alternative is error-prone guesswork.
If your deployment process involves software developers pushing code to production and then just “clicking around” to see if it all works, you are doing it wrong, and you are likely violating an important rule of software development: you (almost) need a good QA team more than you need a good team of developers.
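At minimum, a verification plan can be a script that checks the running version against the one you meant to ship and then exercises a few known-good requests. A minimal sketch, assuming a /version endpoint like the build-info idea above; the URL and field name are assumptions about your own service:

```python
# Sketch: verify a deployment against the version you meant to ship.
# Assumes the service exposes build metadata on a /version endpoint;
# the URL and the "version" field are assumptions, not a standard.
import json
import sys
import urllib.request

expected = sys.argv[1]  # the release you just deployed, e.g. v1.4.2

with urllib.request.urlopen("https://app.example.com/version", timeout=10) as resp:
    running = json.load(resp)["version"]

if running != expected:
    sys.exit(f"Expected {expected} in production but found {running}")
print(f"Production is running {running}, as expected")
```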
Drill. Run drills. Make them unexpected. Watch a submarine movie, any submarine movie, because they are usually all the same: right after the initial dive and the singing of patriotic US or Russian songs, the captain whips around and runs a drill. All hands on deck, we just lost the database.
Developers and people involved in devops tend to think that these drills are silly and meant only for operations. They aren’t. When production goes haywire, it’s almost always the developers who get pulled into the war room and asked to guess what the problem is. Leading up to a production launch, get your systems into staging and make something fail on purpose. This way you’ll know exactly who needs to be involved when a failure occurs.
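A drill doesn’t need elaborate chaos tooling to be useful. A minimal staging sketch, with hypothetical service names and a systemctl stop standing in for however your environment actually runs things:

```python
# Sketch: an unannounced staging drill. Pick one dependency at random,
# take it down, and see who notices and how recovery goes. The service
# names and the systemctl-based stop are placeholders.
import random
import subprocess

SERVICES = ["database", "cache", "message-broker"]  # hypothetical names

victim = random.choice(SERVICES)
print(f"Drill: stopping {victim} in staging. Start the clock.")
subprocess.run(["systemctl", "stop", victim], check=True)
```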
Know your build tool. Don’t just use your build tool; understand the best practices around it. Most build tools come with an approach to deployments that encourages some of the steps I’ve listed in this entry.