The Invisibility of Near Legacy Systems

“Wait, that part of the site is written in Ruby?  What in the…”

“Yes, and guess what you are responsible for keeping this up and running.”

“Really? That’s terrifying.”

Common problem everywhere – legacy systems.  I’m not talking about systems from 1999. Those systems are, in some ways, easy to retire.  You can’t miss those systems because they give off so much noise.  I’m talking about systems from 2-3 years ago that didn’t survive the proof-of-concept stage, but which continue to run some part of a public-facing business.

“Oh, that thing sends JSON to the other thing that does the new stuff…”

Maybe the system was written in a poorly selected technology stack, or maybe the team responsible just decided to stop coming back to the office.  If you walk into any large company that writes software there are going to be those applications people keep in the back of the department chugging away at some part of the business.

I’m not talking about “Bimodal IT” which is a new buzzword from Gartner that describes companies with a ‘Startup’ track for software development right next to the ‘Dilbert’ track for IT service management.  I’m talking about the several “failed experiments” that continue to haunt your budget – these ghosts grow invisible to management until that day someone realizes what’s happening.

Software projects move much faster than they did just a decade ago, and compared to two decades ago software projects are finished in the blink of an eye.  With this acceleration comes a new class of applications – the near legacy.

Almost every company has this near-legacy “asteroid field” because people rarely get rewarded for turning things off.

 

New Rules for Your Signup/Cancellation Process

Make your sign-up process as fast as possible, but also make it easy for me to immediately cancel my account.

  1. I should be able to sign up for your service in 60 seconds.
  2. You are allowed to send an email verification, but that email better show up immediately.
  3. I should be able to delete my account in 120 seconds, and by delete I mean: never send me another email, pretend like I never signed up.
  4. Account deletion should not require anything more than a button and a confirmation dialog.
  5. All of your emails, including updates, should include a link to unsubscribe from the email… immediately.  Don’t hide it, don’t make it the same color as the text.  Don’t hide it in the middle of legal small print.

Maybe most important of all, here’s what I don’t want:

  1. Don’t send me a confirmation email that I’ve unsubscribed from your email list.  This is like a parting “slap in the face.”
  2. If the process for canceling an account involves talking to a human: no. Don’t do this.
  3. Don’t get creative.  I have to keep track of a million tiny cloud startup service logins, and I’m not going to be pleasantly surprised if your service redefines “signup.”

Selling an Open Source Project (or Why you should reconsider Foundations)

While I think Eran’s language is a bit on the bold side, I do agree with his conclusion – selling an established open source project changes the rules for existing community members in a way that is unacceptable.  It brings to the surface questions of ownership and authority in a way that poisons the community.  Alienate the active commitership on an open source project, and you’ll quickly see how open source projects die.

Wait. Who owns this project?

When you contribute to an open source project you very often pour a significant portion of your life into the effort. When I was more active in open source it was this irrational activity of dedicating nights and weekends to an effort simply because it attracted my interests at the time. Sometimes contributing to an open source project makes financial sense if it lines up with your current job.  At other times, it might just be a problem that attracts your interest or it is an idea you want to see finished. People not involved in open source often have this image of glamour – “Wow, you commit to that project?” In reality, having any responsibility in an open source effort often translates to a lot of drudgery – it is difficult work, it is often thankless.


6 Things to Agree On Before You Launch That Site

I’ve been in the middle of a few, very large site launch events and it’s almost always the same story. Everyone’s so focused on the lead up to the launch of a site that few people realize that there’s often much more work right after a high profile (and extremely high risk) launch event than there was before.  It’s these spikes of activity post-launch that can make or break the success of the launch.

Here are some things I’ve learned from my own personal experience being in the middle of a large release event:

1. What is the metric for success?  If you are working on an e-commerce site is it performance?  Is it some performance metric improving?  What sort of improvement are you expecting? If your release is a phased or ramped release what are the conditions for go/no-go for ramp up?

2. What is the metric for failure? And, what happens if your release decreases an important metric? Who makes the ultimate decision to rollback a release and is there a realistic plan for a rollback?

3. What tools will you use to measure the metrics defined for both failure and success? How and who communicates success or failure to the business?

4. Once your software is in production, who is responsible for production support? Are you expecting developers to handle inquiries from operations?  If so, have you made sure that these developers have enough time to respond to immediate requests from production-facing operations teams?

5. Who is responsible for devising coverage calendars and making sure that developers can always be available to support production? After a huge site launches you often see more than half the staff decide to take a long vacation. Who’s making sure that you have adequate coverage?

6. Do your developers know what to do if they get a call from someone in production operations who is experiencing a problem?  Do they have the appropriate tools to be able to answer questions about common failure scenarios?

My items are focused mostly on the developer’s unavoidable intersection with a production support team post-launch. This is where the dragons be.
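The first two questions get easier to answer under pressure if the go/no-go conditions are written down as something a machine can evaluate. Here is a minimal sketch of that idea in Python. The `MetricCheck` shape, the metric names, and the thresholds are all assumptions made up for illustration; the real numbers are exactly what the team has to agree on before launch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricCheck:
    """One agreed-upon metric and its rollback threshold (hypothetical shape)."""
    name: str
    baseline: float      # value measured before the release (assumed non-zero)
    current: float       # value measured on the new release
    max_drop_pct: float  # largest acceptable decrease, in percent


def failed_checks(checks):
    """Return the names of metrics that dropped more than the agreed limit."""
    failed = []
    for check in checks:
        drop_pct = (check.baseline - check.current) / check.baseline * 100
        if drop_pct > check.max_drop_pct:
            failed.append(check.name)
    return failed


def ramp_decision(checks):
    """'go' when every metric is within its limit, otherwise 'no-go'."""
    return "no-go" if failed_checks(checks) else "go"
```

The value of something like this is less the code than the argument it forces: someone has to name the metrics and sign off on the thresholds before the ramp starts.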

7 Simple Rules for Software Deployments

Here’s a list of lowest common denominator rules for deployments that should apply across multiple languages and/or build tools.

Don’t deploy development “snapshots” to production.  Deploy a real release.  Actually “cut” a release.  I know there are many people out there who will whine that this process takes too long for whatever reason – maybe your build tool has to run the build three times to cut a release?  Maybe you just don’t want to be bothered stopping everything to make a release.

So many people in the industry just skip releases and deploy straight from master or HEAD.  Maybe this works if you are a ten person startup, but if you run a real business this is an awful way to operate.  All it takes is one critical bug to realize two things: A.) There is no roll-back plan because a roll-back isn’t possible, and B.) You don’t know how long it will take to get the system back up and running because of A.  Have fun if you don’t cut releases.  (Have fun getting your resume ready for your next job, that is.)

Make it impossible for development or snapshot builds to end up in production. Enforce this rule; otherwise, you’ll give people the option to ignore it. When it comes to production deployments you should understand that people will always find a way around the rules if possible and expedient; short-term solutions often win out when a team is under pressure to deliver.

If you are using Nexus or Artifactory (or some other repository manager), create a repository that is specifically for production and isolate it from repositories that contain snapshots. Use network or firewall rules to make sure that no one, and I mean no one, can reach the development repository from production systems even if they really need to.  This is the only way to truly ensure that snapshots never end up in production.
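Network isolation is the real enforcement, but a cheap second line of defense is a check in the deployment script itself. The sketch below is only that, a sketch: it assumes releases are plain `MAJOR.MINOR.PATCH` strings and that anything else (a Maven-style `-SNAPSHOT` suffix, a bare commit hash, a branch name) is a development build. The function names are hypothetical and your versioning scheme may need a different pattern.

```python
import re

# Assumed convention: a real release is exactly MAJOR.MINOR.PATCH.
# Anything with a suffix such as "-SNAPSHOT" is treated as a development build.
_RELEASE_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_release_version(version):
    """True only for plain MAJOR.MINOR.PATCH version strings."""
    return _RELEASE_PATTERN.fullmatch(version) is not None


def assert_deployable(version):
    """Return the version unchanged, or raise if it is not a cut release."""
    if not is_release_version(version):
        raise ValueError(f"refusing to deploy non-release version: {version!r}")
    return version
```

Calling `assert_deployable("1.4.3-SNAPSHOT")` raises, which stops the script before anything touches production.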

Have an immediate way to find out what version has been deployed to production (and I mean immediate).  Have you ever seen a bug in production and people are running around like crazy idiots trying to figure out what version of the code was deployed?  Your build tool should be able to write a git commit hash or a branch and version name to some file (and if it can’t, find another one).

Nothing looks worse to management than when the team responsible for the build can’t immediately tell you with certainty what version of code is running in production.  Don’t be that team.
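If your build tool has no plugin for this, a few lines of script run during the build will do. This is a rough sketch, not a recommendation of a particular layout: the field names and the idea of a JSON stamp file are my own assumptions, and `current_commit` only works when the build runs inside a git checkout with `git` on the path.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit():
    """Ask git for the full hash of HEAD (must run inside a git checkout)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def build_stamp(version, commit, branch):
    """Collect the facts someone will ask for at 3 a.m. (hypothetical fields)."""
    return {
        "version": version,
        "commit": commit,
        "branch": branch,
        "built_at": datetime.now(timezone.utc).isoformat(),
    }


def write_stamp(path, stamp):
    """Write the stamp as JSON so it can be packaged with the release."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(stamp, handle, indent=2, sort_keys=True)
```

A build step would call something like `write_stamp("version.json", build_stamp("1.4.2", current_commit(), "release/1.4"))` and ship the file inside the artifact, where the application can serve it or an operator can read it directly.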

Practice a rollback before every release. Many people reading this will think I’m joking because the reality in many organizations is that you’ve never, ever, ever done a rollback.  If you are testing properly, rollbacks should be rare, but software systems are complex and failure is unpredictable no matter how much testing happens.

If you never practice a rollback there’s a good chance you couldn’t pull it off even if you needed to, and there’s also a chance that a rollback would put your databases into an unrecoverable state of highly-visible failure. If the question, “how quickly can we rollback?” produces an involuntary nervous chuckle from your deployment people then it is time to schedule a drill.

Have a plan to verify your release. Have you ever deployed to production only to wonder if the deployment actually worked?  This is what happens when you don’t have a QA team.  If you have a QA team they will often stay up all night with you to verify that a release has been deployed successfully. Be that team, because the alternative is often error prone.

If your deployment process involves software developers pushing code to production and then just “clicking around” to see if it all works, you are doing it wrong, and you are likely violating an important rule of software development: you (almost) need a good QA team more than you need a good team of developers.
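Automated checks don’t replace QA, but the most basic question, “is the version we meant to deploy actually running everywhere?”, can be answered by a script instead of by clicking around. The sketch below assumes each host exposes a JSON document with a `version` field at some URL; that endpoint and the function names are hypothetical, so treat this as one possible shape for such a check.

```python
import json
from urllib.request import urlopen


def fetch_deployed_version(url, timeout=5):
    """Read the 'version' field from a JSON version endpoint (hypothetical URL)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["version"]


def verify_release(expected, hosts, fetch=fetch_deployed_version):
    """Return {host: reported_version} for every host not running `expected`.

    An empty dict means every host reported the expected version.
    """
    mismatches = {}
    for host in hosts:
        try:
            reported = fetch(host)
        except Exception as error:  # an unreachable host also counts as unverified
            reported = f"error: {error}"
        if reported != expected:
            mismatches[host] = reported
    return mismatches
```

The `fetch` parameter is there so the check can be exercised against a fake in a test, without a live server.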

Drill. Run drills. Make them unexpected. Watch a submarine movie, any submarine movie because they are usually all the same.  Right after the initial dive and the singing of patriotic US or Russian songs, the captain whips around and runs a drill – all hands on deck, we just lost the database.

Developers and people involved in devops tend to think that these drills are silly and meant for operations.  They aren’t. When production goes haywire it’s almost always the developers being pulled into the war room being asked to guess what the problem is.  Toward a production launch, get your systems into staging and make something fail on purpose.  This way you’ll know exactly who needs to be involved if a failure occurs.

Know your build tool. Don’t just use your build tool, understand the best practices around it. Most build tools come with an approach to deployments that encourages some of the steps I’ve listed in this entry.

drone.io vs. codeship.io

I’m a paying customer of both drone.io and codeship.io for now.  Here are some of the reasons I’m moving toward codeship.io over drone.io.  I found a dearth of good information when I was trying to find info, so maybe this will help someone.

  1. I’ve been using drone.io for a while.  Drone.io is simple, it’s worked for me.  I’ve used Drone to deploy to heroku and also to S3 for a while.  I also use drone to automate some Apache httpd config through SSH.   It works, it’s a very simple interface.
  2. I had a project – an AngularJS project – which was being deployed to S3.  In drone you can configure an S3 deployment against a branch, and it’s a little annoying. It limits the number of files that can be uploaded to 500 or 1000 and it just silently omits anything more than that (which is ridiculous and has cost me multiple days).  This has been the case for a number of months and I haven’t seen any roadmap for addressing this limitation.
  3. The last straw with Drone was when I was trying to configure two S3 deployments – one S3 deployment to a production bucket if the commit was to the production branch and another to the staging bucket if the commit was to the staging branch. Drone.io wouldn’t allow me to configure multiple S3 deployments against different branches.   When you try to configure two S3 deployments it fails with a frustrating, “There is already an S3 deployment for this project” message.
  4. Next step… “hey I’ll just create another project against the same repository and configure an independent repository”….. drone.io doesn’t allow me to configure two projects against the same Github repository as a workaround to limiting my S3 deployments to 1 per project. At that point, I tried an awful work-around which was to fork a repo to create another drone.io build that would have another S3 deploy.  Awful work-around because I would have to tell a team to clone another remote and push to another branch.
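For what it’s worth, what I was asking for is not complicated. Here is a small Python sketch of the two pieces I wanted: a branch-to-bucket lookup, and a check that fails loudly when files are missing after an upload instead of silently dropping them. The bucket names and function names are made up for illustration, and this is not how drone.io or any other service implements it.

```python
# Hypothetical branch and bucket names, used only for illustration.
BUCKET_FOR_BRANCH = {
    "production": "example-app-production",
    "staging": "example-app-staging",
}


def bucket_for(branch):
    """Return the target bucket for a branch, or None when it should not deploy."""
    return BUCKET_FOR_BRANCH.get(branch)


def missing_uploads(local_files, uploaded_keys):
    """List files that were built locally but never made it to the bucket."""
    return sorted(set(local_files) - set(uploaded_keys))
```

A deploy step that finds anything in `missing_uploads(...)` should fail the build rather than report success.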

Then I completely lost my patience with drone.io.  It’s a simple interface, but you shouldn’t have to adapt your process to the limitations of a CI system.  I was also maybe noticing that drone.io isn’t going anywhere these days.  Most services and companies tend to surprise me with a new feature or two every couple of months. I depended on drone.io S3 behavior for many months, but it still felt like a beta. drone.io didn’t feel like it was going anywhere.

 I googled for “Alternatives to drone.io” and ended up on this Quora question: http://www.quora.com/Travis-CI/What-are-the-alternatives-to-Travis-CI

From there I evaluated Travis for a second or two – but they have a high price point for what I’m looking for.   I stumbled upon codeship.io – important point is that I started using it first, then I saw that it was only $50 to get started. I signed up.

That whole process of starting to have doubts about drone.io to deciding to leave drone took 2-3 days.   The switch from drone.io to codeship.io took about two hours start to finish.   That’s not entirely true, I still have a few projects on drone.io, but I’m migrating everything over the next two weeks.

First impressions of codeship.io?  The interface is more complex than drone.io, it feels just as capable as drone.io maybe more-so, maybe not.  If drone.io and codeship.io were to compete on features I have the feeling that codeship.io would win, but comparing features of CI systems seems like a waste – most users are focused on one or two languages for support.  If it builds Javascript well, that’s really all I’m looking for. I’m really not looking for a jalopy of features. If I needed that I’d run Jenkins and install everything and the kitchen sink on it. My goal here is just a simple CI system that can hook up to GitHub, run builds, and not get in the way. Codeship.io did just that after 15 minutes of investment.

The UI of codeship.io isn’t as simple as drone.io, and at times when using it you may be a bit confused by the different abstractions it throws at you.  For example, the ability to drill into different stages of build output is awesome, but it can be difficult to figure out how to get to a particular project’s settings at times.  The project settings page doesn’t jump up and tell you which project you are configuring.

The irony of the migration is that codeship.io doesn’t support S3 publishing as a built-in deployment option (I ended up solving this with a grunt-s3 plugin after the fact).  I moved because my initial frustration with drone.io opened the door – I evaluated the options and concluded that codeship.io shows more promise at the moment.

It’s codeship.io for now, but this market is frictionless so we’ll see what happens next month. The ease with which one can just point a CI system at GitHub and get going is great. It means that there is nearly zero cost for evaluating the competition.

Ruby on Rails vs. J2EE

At this stage in the game they both remind me of each other when it comes to web application development.  Here are some important similarities:

  1. When you start both a Ruby on Rails and a Java project you’ll be selecting libraries and frameworks that you will have to put up with for years.  Start a RoR project in Rails 4.0 today and it’s very likely that you’ll be surfing along on the same version for years to come.   Start a JavaEE project on Tomcat 7, same result.  When you adopt a framework approach I’ve noticed that it is difficult to justify upgrades and moving to the latest greatest release once your project is fully underway.
  2. Both have a community that is committed to supporting the projects and platforms that sustain the ecosystem. Neither RoR nor J2EE is going anywhere even though you’ll hear many in both communities lamenting the existence of the other.
  3. Rails people have grown up to the point where many of them realize that Rails isn’t the only answer.  This took a few years to realize, but we’re beyond the age of rockstar Ruby programmers walking around as if they’ve discovered productivity.  Rails developers are now maybe as humble and defeated as Java programmers.
  4. Both frameworks will lose developers to the continued growth of Node.js.
  5. Tooling support is about the same at this point (i.e. IntelliJ supports both)

Here are some important differences:

  1. Java seems easier to decouple over a long-term timeline.  If I have a team pour a few years of development time into a Java system, there’s a reasonable chance that I can build interfaces and decouple if certain architectural choices were made up front.
  2. In the trillion years since RoR has been released, J2EE has caught up to a certain point, but I still see more productivity during the first few months of a RoR project versus a Java project.
  3. Over the long-term, I see J2EE as being more maintainable.  This is likely related to #1, but I can decouple something written in Java and divide projects with dependencies.
  4. Rails projects grow to a certain size only to become terrible nightmares to maintain.  Java projects grow to a certain size only to become terrible nightmares to maintain, but there are people out there that understand how to manage this situation.
  5. No one talks about the build in Rails.  Everyone talks about the build in Java.
  6. Ruby seems easier to deploy into PaaS because it provides a common interface.  Java should be easy to deploy into a PaaS, but there are so many ways to configure containers that it is marginally more difficult to solve this problem.

 

Is Grunt the new Ant?

I’ve heard this complaint a few times, but I don’t buy it.  Usually when someone says this there are a few implied statements:

  1. Grunt lacks a lifecycle – and by Lifecycle I mean either the Maven lifecycle or the sort of “lifecycle” one can obtain by applying a Gradle plugin.
  2. Grunt carried forward a few years will result in huge, unmanageable build Gruntfile.js disasters.
  3. Grunt isn’t comprehensive and it doesn’t provide enough conventions.
  4. (I’m unwilling to admit that Javascript is taking over a big part of my previous job.)

While I can see how a single Grunt file used to manage a large, monolithic enterprise application would be a complete disaster, I also think that many of Grunt’s critics are coming at Grunt from a different place entirely.

They are assuming that Javascript projects using Grunt are similar to the Java projects they are replacing, and they are holding on to this antiquated notion that there is always going to be “one build to rule them all.”

You’ve all seen this, the big Java build that ties together a massive amount of source code into a single build artifact requires a build tool (like Maven or Gradle) which can scale as the project grows.  This approach is still very valid for many systems – take Presto as an example.  Presto has a Maven build that just works, it generates the artifacts in a single consolidated build in which there are common standards (and technology) across an array of components.

Largish, enterprise web applications are a whole different matter. These days you are writing web applications in a mixture of Javascript and Java.  Your Javascript builds are in Grunt and your “application” is really just a series of server-side REST services supporting a collection of single-page applications written in Javascript.

While this approach provides the perfect opportunity to decouple projects from “the single build” approach, most developers are continuing to try to combine everything into one, perfect build.  (Much in the same way they are trying to cram everything into a single GitHub repository.)

I think that’s a bit of old-fashioned thinking. A better approach to scaling a large, heterogeneous web application is to divide it into many smaller projects. This reduction in scope allows developers to use whatever build tool makes the most sense for a particular module. If you’ve refactored a large AngularJS project into a series of libraries and modules does it matter that you have a Gruntfile.js which might repeat some configuration values and define the same “default” task?

Aside from the points I’ve already outlined, I’m finding Grunt to be maybe a bit more declarative than Maven or Gradle when it comes to builds.  If you generate a default build with Yeoman these days, you’ll see that it generates a Gruntfile.js which automatically loads tasks and which has a very thin layer of tasks at the tail of the build file.  Most of your Gruntfile.js is just customizing a little configuration and tasks ship with default behavior – I’m really not telling Grunt what to do – the tasks ship with some built-in intelligence.

And, Grunt brings an advantage that neither Maven nor Gradle have – my dependencies, the versions of my plugins are tracked in a separate package.json file.  I wish that was the norm in Java-based systems because dependency issues are always conflated with build customization in a way that often drives me up a wall.

So, no, Grunt isn’t the new Ant. Also, there’s nothing wrong with Ant.  There, I said it.