Lift Now Has Plans

Two weeks ago I blogged about Lift as a good site to help people meet personal goals.   Now, Lift has announced a new feature “Plans”.

What I like about Lift is its simplicity.  It isn’t asking me to tweet every other second, and the mobile application hasn’t decided to ask me to write a review. (Have I mentioned I hate that?)

10 Steps to Get Your Crazy Logs Under Control

Two days ago I wrote a post about how “developers tailing the logs” is a common pattern.  A couple people responded to me directly asking me if I had some sort of telepathic ability because they were stuck in a war room tailing logs at that very moment.  It’s a common pattern.  As developers we understand that tailing log files is much like tasseomancy (reading tea leaves) – sometimes the logs roll by so quickly we have to use a sixth sense to recognize the errors.  We are “log whisperers.”

The problem here is that tailing logs is ridiculous for most of the systems we work with.  Ok, if you have 1-2 servers, go knock yourself out – tail away.  If you have 2,000 servers (or more), tailing a log to get any sort of idea about errors or behavior isn’t just inappropriate, it’s dangerously misleading.  It’s the kind of practice that just gives you and everyone around you the false reassurance that because one of your servers is working well, they are all fine.

So, how do we get a handle on this mess?

#1. Stop Tailing Logs @ Scale – If you have more than, say, 10 servers you need to get out of the business of tailing logs.   If you have a reasonable log volume up to a handful of GBs a day, throw it into some system based on Apache Solr and find a way to make the system as immediate as possible.  That’s the key, figure out a way to get logs indexed quickly (in a couple of seconds) because if you don’t?  You’ll have to go back to tailing logs.

You can also use Splunk.  Splunk works, but it’s also expensive, and they charge by daily bandwidth. If you don’t have the patience to figure out Solr, use Splunk, but you’re going to end up paying crazy money for something that you could get for free.

If you have more than a few GBs a day, on the order of tens of GBs, hundreds of GBs, or even a few TBs of data a day, you are in another league, and your definition of “logging” likely encompasses system activity.  There are companies that do this and they have campuses, and this isn’t the kind of “logging” I’m talking about here.

#2. If possible, keep logs “readable” – If you operate a very large web site this may not be possible (see the first recommendation), but you should be aiming for production log messages that are reasonable.   If you are running something on a smaller scale, or something that needs to be interactive, don’t write out a MB every 10 seconds.  Don’t print out meaningless gibberish.  When you are trying to come up with a log message, think about the audience, which is partially yourself, but mostly the sysadmin who will need to maintain your software far into the future.

#3. Control log volume – This is related to the previous idea. Especially in production, keep volume under control.  Remember that log activity is I/O activity, and if you don’t log correctly you are making your system wait around for disk (disk is crazy slow).   Also, if you are operating @ scale, all that extra logging is just going to slow down whatever consolidated indexing that is going on making it more difficult to get immediate answers from something like Solr.

#4. Log messages should be terse – Aim for a single line when possible and try to stay away from messages that need to wrap.  You shouldn’t print a paragraph of text to convey an idea that can be validated with a single integer identifier.  It should fit neatly on a single line if at all possible.   For example, your log messages don’t need to say:

"In an attempt to validate the presence or absence of a record for BLAH BLAH BLAH INC, it was noted that there was an empty value for corporation type.  Throwing an exception now."

Instead:

"Empty corp type: BLAH BLAH... (id: 5). Fix database."

#5. Don’t Log, Silence is Golden – I can’t remember who it was, but someone once commented on the difference between logging in Java and logging in Ruby (I think it was Charles Nutter talking about the difference between Rake and Maven).  When you run a command-line tool in Ruby it often doesn’t print out anything unless something goes horribly wrong.  When you run a tool like Rake it doesn’t print much if things go as planned.   When you run Maven?  It prints a lot of output, and this is output that no one ever reads. This is a key point.  Normal operation of a system shouldn’t really warrant that much logging at all.  If it “just works”, then don’t bother me with all that logging.

If you are operating a web site @ scale, this is an important concept to think about.  Your Apaches (or nginx) are already going to be logging something in an access log so do you really need to have a log that looks like this?

INFO: Customer 13 Logged In
INFO: Attempting to Access Resource 34
INFO: Resource Access for resource 34 from customer 13 approved
INFO: Sending email confirming purchase to customer 13

I don’t think you need this. First, you should have some record of these interactions elsewhere (in the httpd logs), and second, it’s just unnecessary.  In fact, I think those are all DEBUG messages.  Unless something fails – Unless something needs attention, you should strive for having zero effect on your log files. If you depend on your log files to convey activity, you should look elsewhere for a few reasons: 1. It doesn’t scale, and 2. It is inefficient.  Instead of relying on a log file to convey a sense of activity, tell operations to look at database activity over time.
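Here is a minimal sketch of what "those are all DEBUG messages" could look like. I'm assuming java.util.logging from the JDK (where FINE is roughly DEBUG); the class name, logger name, and the SMTP hint in the error message are all made up for illustration. The point is only that a production logger set to WARNING stays silent on the happy path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: routine activity is logged at FINE (JUL's rough DEBUG),
// so a production logger set to WARNING writes nothing unless something fails.
final class QuietPurchaseLog {
    private static final Logger LOG = Logger.getLogger("shop.purchase");

    /** Collects records in memory so we can see what would have been written. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override public void publish(LogRecord rec) {
            if (isLoggable(rec)) {
                messages.add(rec.getLevel() + ": " + rec.getMessage());
            }
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Runs the "purchase" flow and returns every message that got through. */
    static List<String> simulate(Level productionLevel, boolean emailFails) {
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false); // keep the demo off the console
        LOG.setLevel(productionLevel);
        LOG.addHandler(handler);
        try {
            LOG.fine("Customer 13 logged in");
            LOG.fine("Resource 34 access approved for customer 13");
            if (emailFails) {
                // Assumed wording; the command to the admin is illustrative.
                LOG.severe("Purchase email to customer 13 failed. Check SMTP relay, then retry.");
            }
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(simulate(Level.WARNING, false)); // happy path: nothing logged
        System.out.println(simulate(Level.WARNING, true));  // only the failure shows up
    }
}
```

Turn the level down to ALL on a staging box and the FINE messages come back, which is where they belong.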

#6. Give Someone (Else) a Command – This is something no one does, but everyone should.  Your logs should tell an administrator what to do next (and it rarely involves you). The new criterion for printing a log message in production is that either something has gone wrong or something needs serious attention. If you are printing a message about something that has gone wrong, don’t assume that the person reading this message has any understanding about the internals of the system. Give them a direct command.

Instead of this message:

ERROR: Failed to retrieve bean property for the customer object null.

Write this:

ERROR: Customer object was null after login. Call DBA, ask about customer #342R.

You see the difference? The second log gives the admin a fighting chance (it also shifts blame to the database).  In this case, someone sent you a corrupt customer record, so point someone at the DBA.  You’d likely redirect them there anyway.  This way the sysadmin can skip the call to engineering and go directly to the source of the problem.

If you do this right, you’ll minimize the production support burden. Trust me, you want to minimize your production support burden – if you don’t minimize this you won’t have much time for development because you will be fielding calls from production all the time.

#7. Provide Context – Unless you are logging a truly global condition like “Starting system…” or “Shutting down system…”, every log message should have some contextual data.  This is very often an identifier of a record, but what you should try to avoid is the log message that provides zero data or context.   The worst kind of message is something like this:

ERROR: End of File encountered in a stream.  Here's a stack trace and a line number...

This raises two questions: where is that stream from? What exactly were you trying to do? A better log message might be:

ERROR: (API: Customer, Op: Read, Server: api-www23, ID: 32) EOF Problem. Call API team.

In this second example, we’re using something like Log4J Nested Diagnostic Context to put some details into the log that will help diagnose the problem in production.
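To make the idea concrete without pulling in a logging library, here is a rough JDK-only stand-in for a nested diagnostic context. To be clear, this is not the Log4J API; LogContext and its methods are names I invented, and a real system would lean on the logging framework's own context support. It only shows the push/pop idea and how the context ends up in front of every message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.StringJoiner;

// Hypothetical stand-in for a nested diagnostic context, using only the JDK.
// It is NOT the Log4J API; it only illustrates the push/pop idea.
final class LogContext {
    // One context stack per thread, so concurrent requests don't mix details.
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void push(String key, Object value) {
        STACK.get().addLast(key + ": " + value);
    }

    static void pop() {
        STACK.get().pollLast();
    }

    static void clear() {
        STACK.get().clear();
    }

    /** Prefixes the message with whatever context is currently on the stack. */
    static String format(String level, String message) {
        StringJoiner ctx = new StringJoiner(", ", "(", ") ");
        ctx.setEmptyValue(""); // truly global messages carry no context block
        for (String entry : STACK.get()) {
            ctx.add(entry);
        }
        return level + ": " + ctx + message;
    }

    public static void main(String[] args) {
        push("API", "Customer");
        push("Op", "Read");
        push("Server", "api-www23");
        push("ID", 32);
        try {
            System.out.println(format("ERROR", "EOF Problem. Call API team."));
        } finally {
            clear(); // always clean up, or the next request inherits stale context
        }
    }
}
```

The code that hits the EOF doesn't need to know which API or record it was serving; whoever started the request pushed that on the way in.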

#8. Don’t Write Multiple Log Statements at a Time – Some developers see logs as an opportunity to have a running commentary on the system and they log everything that happens in a class. I dislike seeing this in both code and in a log.  Here’s the example, you have a single class somewhere and you see code like this:

log.info( "Retrieving customer record " + id );
Customer cust = retrieveCustomer( id );
if( cust != null ) {
     log.info( "Customer isn't null. Yay!" );
     log.info( "Doing fancy thing with customer." );
     doFancyThing( cust );
     log.info( "Fancy thing has been done." );
} else {
     log.error( "Customer " + id + " is null, what am I going to do?" );
}

And, in the log you have:

INFO: Retrieving customer record 3
INFO: Customer isn't null. Yay!
INFO: Doing fancy thing with customer.
INFO: Fancy thing has been done.

Consolidate all of these log messages into a single message and log the result (or don’t log at all unless something goes wrong).  Remember, logging is often a write to disk, and disks are insanely slow compared to everything else in production.   Every time you write a log to disk think of the millions of CPU cycles you are throwing into the wind.

If a developer is writing a class that prints 10 log statements one after another,  these log statements should be combined into a single statement.  Admins don’t really care to see every step of your algorithm described, that’s not why you pay them to maintain the system.
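As a sketch of the consolidated version, here is the same flow with the running commentary removed. I'm assuming java.util.logging, and Customer, retrieveCustomer, and doFancyThing are hypothetical stubs standing in for the ones in the snippet above. Success logs nothing; failure logs one message that tells someone what to do.

```java
import java.util.logging.Logger;

// Hypothetical rewrite of the chatty example: silence on success,
// one actionable message on failure.
final class CustomerJob {
    private static final Logger log = Logger.getLogger("CustomerJob");

    // Stub types and helpers, assumed for illustration only.
    static final class Customer {
        final int id;
        Customer(int id) { this.id = id; }
    }

    static Customer retrieveCustomer(int id) {
        return id > 0 ? new Customer(id) : null; // pretend non-positive ids are missing
    }

    static void doFancyThing(Customer cust) {
        // real work would happen here
    }

    /** Returns the single message that was logged, or null if nothing was worth logging. */
    static String process(int id) {
        Customer cust = retrieveCustomer(id);
        if (cust == null) {
            String msg = "Customer " + id + " not found. Call DBA, ask about customer #" + id + ".";
            log.severe(msg);
            return msg;
        }
        doFancyThing(cust);
        return null; // it just worked, so there is nothing to say
    }

    public static void main(String[] args) {
        process(3);  // no output
        process(-1); // one SEVERE line
    }
}
```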

#9. Don’t Use Logs for Range Checking – There’s a certain kind of logging that creeps into a system that has more to do with bad input than anything else.  If you find yourself constantly hitting NullPointerExceptions in something like Java you may end up trying to print out variables to help you evaluate how things failed in production.  After a few years of this, you’ll end up with a production system that logs the value of every variable in the system on every request.

You’ll end up with this:

Customer logged in value: { customer: { id: 3, name: "Charles", ......}
Purchasing products: { product: { id: 32, manufacturer: { id: 53, name: "Toyota"....}
Running through a list of orders: [ { order: { id: 325... }, { order: { id:2003...} ]

…and so on.  In fact, you may end up serializing the entire contents of your database to your log files using this method.

Programmers are usually doing this because they are trying to diagnose problems caused by bad input.  For example, if you read a customer record from the database, maybe you’ll just log the value of the customer record somewhere in the log so you can have it available when you are debugging some production failure. Have a process that takes a customer and a product? Well, why not print out both in the log just in case we need them?   There are issues with customer records that have null values, so… don’t do this, just create better database constraints.

This is the fastest road to unintelligible log files, and it also hints at another problem.  You have awful data.  If you are dealing with user data, check it on the way in.   If you are dealing with a database, take some time to add constraints to the table so that you don’t have to test to see if everything is null.  It’s an unachievable ideal, I know, but you should strive for it.
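On the application side, "check it on the way in" can be as simple as a value class that refuses to exist in a bad state. This is only a sketch; ValidatedCustomer and its rules (positive id, non-blank name) are assumptions I picked for the example, not rules from any real schema.

```java
import java.util.Objects;

// Hypothetical value class: bad input is rejected at the boundary, so code
// further in never needs to log fields "just in case" one of them is null.
final class ValidatedCustomer {
    private final int id;
    private final String name;

    ValidatedCustomer(int id, String name) {
        if (id <= 0) {
            throw new IllegalArgumentException("Customer id must be positive, got " + id);
        }
        Objects.requireNonNull(name, "Customer name is required");
        if (name.trim().isEmpty()) {
            throw new IllegalArgumentException("Customer name must not be blank (id: " + id + ")");
        }
        this.id = id;
        this.name = name;
    }

    int id() { return id; }

    String name() { return name; }

    public static void main(String[] args) {
        System.out.println(new ValidatedCustomer(3, "Charles").name());
        try {
            new ValidatedCustomer(3, null);
        } catch (NullPointerException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Once every customer in memory is known to be valid, the urge to dump the whole object into the log on every request mostly goes away.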

#10. Evaluate Logs Often – The things described in this post are really logging policy, and no one has it.  This is why we have these production logging disasters, and this is why we create systems that are tough to understand in production. To prevent this, you should put the evaluation of logging on some sort of periodic schedule.  Once every month, or once every release you should have some metric that tells you if log volume has outpaced customer growth.
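One possible shape for that metric, as a back-of-the-envelope sketch: compare the relative growth of log volume with the relative growth of customers between two releases. LogGrowthCheck, its method names, and the numbers in main are all invented for illustration; how you actually measure bytes logged and customers served is up to you.

```java
// Hypothetical release-time check: has log volume grown faster than the customer base?
final class LogGrowthCheck {

    /** Relative growth between two measurements, e.g. 0.5 for a 50% increase. */
    static double growth(long previous, long current) {
        if (previous <= 0) {
            throw new IllegalArgumentException("previous must be positive, got " + previous);
        }
        return (current - previous) / (double) previous;
    }

    /** True when logs grew by a larger percentage than customers did. */
    static boolean logsOutpaceCustomers(long prevLogBytes, long curLogBytes,
                                        long prevCustomers, long curCustomers) {
        return growth(prevLogBytes, curLogBytes) > growth(prevCustomers, curCustomers);
    }

    public static void main(String[] args) {
        // Made-up numbers: logs went from 10 GB to 15 GB a day, customers from 1,000 to 1,100.
        boolean flagged = logsOutpaceCustomers(10_000_000_000L, 15_000_000_000L, 1_000, 1_100);
        System.out.println(flagged ? "Log volume outpaced customer growth. Review new log statements."
                                   : "Log volume is in line with customer growth.");
    }
}
```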

You should conduct some investigations into how useful or how wasteful your current approach to logging is.   You should have some policy document that defines what each level means.  What is a DEBUG message in your system?  What should be an ERROR?  What does it mean to throw a FATAL? Prior to every release you should do a quick “sanity check” to make sure that you haven’t added some ridiculous log statement that is going to make maintaining the system awful.

But… most people don’t do these things which is why production logs end up being a disaster.

In the War Room, “Let me take a look at the logs…”

Not long ago, I had the opportunity to help a large company upgrade a fairly critical piece of software that was running everything.  I can tell you that the job involved Tomcat, but that’s about it. As a consultant, you learn to keep the specifics of any engagement close.  What I can tell you is that this was the kind of client that has a “War Room” that looks exactly like what you would expect – three rows of Mission Control-style desks all facing a huge wall of monitors. The lights were dimmed, and (minus the uniforms) it reminded me of War Games.  I’ve seen a few of these rooms, and if I ran a company I’d probably demand that they set one up that matches this movie set almost exactly. (Except in my War Room, I’d make uniforms mandatory.)

At this particular launch event I made a joke about the movie “War Games”, but I don’t think the audience was old enough to remember. They must have thought to themselves, “Why is the old guy saying, ‘Would you like to play a game?’ in a strange robot voice?”

In the War Room: Everything is Measured

At a company large enough to have a War Room, almost everything is graphed in real-time: databases, apaches, caches, temperature…  There are multiple teams that can handle emergencies around-the-clock without skipping a beat.  Toward the end of the day, you’ll jump on the conference call with teams on other continents preparing to pass the baton as the work never ends.  During a production rollout someone will get up in front of the room and bellow out commands.  There is a multi-hour process, there are multiple data centers with a large number of servers. It’s serious stuff.

There are checkpoints to be reached and decisions to be made…If metrics rise or fall in response to a deployment, people start to react.  If a graph starts to dip or spike, people start to react. Everyone’s on edge because it’s the War Room.  There are conference rooms off to the side where solemn meetings can be convened, and there’s no shortage of serious faces. It’s the War Room. Everything is measured down to the millisecond, or at least that’s the idea, but there’s always one thing that isn’t measured yet in all war rooms and in all production deployments I’ve been a part of, and that’s the application, because most of the time during a deployment you have guests in the War Room – us developers.

Developers Meet Operations

Developers don’t think like admins or operators. When a developer is working on a new system they often don’t stop and think about what the best way to measure it would be.  If you are creating a new database or writing a new programming language, the last thing on your mind is the graph that some admin has to look at to tell if your system is working.  You are working on features, maybe you are also writing some tests (hopefully).  You are not thinking about when or how to page someone if something goes awry.

Developers tend to just grab what is closest to the code.  In most cases, this is a series of log statements.  So, we write these systems that spit out a lot of ad-hoc log statements.  We’re so prolific with log messages that many operators understand that developer-driven log files are usually something of a mess.  Some systems I’ve worked on in the past print out 10 MB of text in five minutes – that’s less than helpful, and those logs don’t mean anything to admins because they are often full of errors and stacktraces.  Application logs contain a lot of miscategorized messages. (Debug messages as Error messages, messages that make no sense to anyone other than the developer who wrote them, etc…)

Ops: “Is it working?”  Dev: “Dude, I don’t know, let me look at the logs…”

Back to the War Room. You’ve been sitting there for a few hours.  People have brought a bajillion dollars’ worth of hardware offline to push a new deployment, and finally some guy on the other side has pressed a button that runs the release.  Everyone sort of waits around for a few minutes for hard drives and networks to do whatever it is that they are doing, and then people go back to watching some charts.  They always dip or blip a bit during a release, and people always tend to jump up and blame whatever application for whatever problem starts happening at this point.

Or, maybe things are working, maybe they aren’t, but generally there’s this 5-30 minute window period where people are just waiting for some shoe to drop. If you are a developer responsible for a high risk change this is the moment when, highly stressed, you find yourself tailing a log file.  Even in the most monitored of environments, it all tends to come down to a couple of developers tailing log files on a production machine looking for signs of an error. “Yep, it works.  I don’t see any errors.” Or, “Nope, something blew up.  Not sure what happened.”

I’m usually one of the people doing post-release production analysis, and I can tell you what we’re doing when we just tail the logs.  We’re just watching screenfuls of text roll by and we’re looking for hints of an error.  It’s totally unscientific, and it doesn’t make any sense given the fact that production networks often have an insane number of machines in them now.  Maybe it made sense when there were 10 machines and we could tail a few of them, but today those numbers are much larger (by a few orders of magnitude).  Still, we cling to this idea that we’ll always need a developer to just check a few servers. Tail a few logs and make sure nothing went horribly wrong.

I wonder if this is just how the world works.  When they launch a new rocket, I’m sure they measure everything down to the nanosecond, but I’m also sure there’s some person sitting in that NASA Mission Control center running “tail -f” on the booster rocket.   Or when you watch that landing operation for the last Mars rover, half of those people are just SSH’ing into some Solaris or Linux box and tailing “rover.log”.

There’s Got to be a Better Way

Yes, there’s probably a better way.  You could write post-deployment integration tests, you could build in some instrumentation that tests the internals of a system, but it rarely works that way.  You could write Splunk jobs ahead of time to track particular error conditions.  You could run more frequent deployments (i.e., multiple times a day) so that deployments don’t have to have all this ceremony. All of these are good ideas, but it’s more likely that developers will continue to just tail logs from now until we stop writing code.
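Even without Splunk, the "track particular error conditions ahead of time" idea can be sketched in a few lines: count ERROR lines by signature instead of watching them scroll by. This is a toy under stated assumptions. ErrorTally is a name I made up, it assumes lines start with a level prefix like the examples in my logging post, and it has nothing to do with Splunk's actual query language.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tally errors by signature rather than eyeballing a tail.
final class ErrorTally {

    /** Counts ERROR lines, collapsing digits so "Customer 13" and "Customer 42" group together. */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (!line.startsWith("ERROR:")) {
                continue; // only errors matter for the post-release check
            }
            String signature = line.replaceAll("\\d+", "#");
            counts.merge(signature, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "INFO: Customer 13 Logged In",
                "ERROR: Customer 13 missing",
                "ERROR: Customer 42 missing",
                "ERROR: EOF on stream");
        tally(lines).forEach((signature, count) ->
                System.out.println(count + "x " + signature));
    }
}
```

Run something like this across every server's log after a release and you get a count per error type, which is a lot more defensible than "I don’t see any errors" from one tail session.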

More often than not, your production network is just a whole different beast.  While staging might have tens of machines, maybe your production network has tens of thousands of machines and hundreds of millions of users.   The error conditions are different, the failure mechanisms can’t be anticipated. There’s just no way to recreate it, and, as a developer, you are the guy on the stardeck sitting there like “Data” from ST:TNG watching an overwhelming amount of log data roll by. “Captain, the logs show no anomalies.”

Break Through Server-side Bias and Surrender to Javascript

You have a server-side bias and you don’t even realize it.  I know this, and you need to know this.  It’s keeping you back a bit. Step one is to admit that you have a problem and that your addiction to easy server-side frameworks is ruining your performance.  You’ve used frameworks like Rails, Django, WordPress, or one of several hundred Java web application frameworks for years and you are resisting this move toward Javascript.  Yes, you’ve “embraced” Javascript throughout your applications, but you might be missing the larger point – Javascript isn’t here to make your server-side applications more “reactive”, it isn’t just a nice feature to add to a larger application.

That Javascript you keep on insisting to serve from your server-side framework…that is the application.  It’s taking over.    Your server-side framework won’t be doing anything resembling templating in a few years because that’s the job of the browser.  Yes, yes, you might have a few “high value” pages or pages that have to be hosted on a server because of security constraints, but if your web site requires a round-trip to a server to render a web page… well that’s old people thinking.

The “geographical center” of your application is no longer on the server-side.  You shouldn’t start your application thinking about what server-side framework it is going to be based on, because it won’t be based on a server-side framework.   In fact, you may use many, but they will only serve to support what will essentially be a client-side Javascript application.

I understand how you feel, you might read this and think – “No, we’re not moving everything to client-side Javascript frameworks, no.”  The willful ignorance you are embracing here is a defense mechanism.  As a Java developer in the middle of the last decade, I saw the next generation using PHP and Ruby and I tried to explain it away for a few years as just a passing fad.  It wasn’t, and I still see a lot of developers in my age group reflexively resisting dynamic languages.  Resisting change is a mistake – you’ll be obsolete before you can say, “Wait I didn’t realize that the browser could do…”

With the rise of ReactJS, AngularJS, Backbone, and a number of other good client-side Javascript frameworks I’m seeing a new kind of bias – server-side bias.  There are people out there who, be it from ignorance or otherwise, fail to realize that the days of having some dynamic templating engine on the server-side merge your data with some HTML… those days are coming to a swift conclusion.   It was fun while it lasted, but this “mail-merge” approach of retrieving a row from a database, packaging it up in some object, and then “merging” it into a template is on its way out.  The view layer of this is moving client-side, quickly.

And, it’s moving to the client because it’s an order of magnitude faster to serve what is essentially a static AngularJS application from a CDN than it is to muck around with serving the same from some server-farm (even with the help of memcache).  That’s the thing, once you start doing this you realize that you only really need a server-side framework for API calls.  That’s really it.  All your server-side frameworks are doing in five years’ time? JSON, and maybe futzing around with a few databases.

Don’t get me wrong, there will be people writing Rails RHTML and Java JSPs for many years to come much in the same way that COBOL developers still run systems packed away in government data centers.    But, Ruby and Java developers that fail to embrace this client-side Javascript trend will find themselves confined to internal applications – the Oracle Forms of 2020 is Ruby on Rails.

(An imperfect) Space-inspired OSS Project Analogy

At the risk of sounding like a raving lunatic, I decided to come up with a space-inspired taxonomy for characterizing OSS projects.  I came up with this after kicking around GitHub over the weekend trying to make sense of some new projects. Recently there’s been a huge influx of corporate-sponsored OSS projects that are released with a lot of fanfare.  While there’s a lot of good stuff happening in OSS land, it is also difficult to figure out which projects are truly “vibrant” open source projects and which are simply one engineer’s solo project.  While GitHub makes it easy to track things like a network, number of forks, etc., these metrics are still something of a popularity contest. When a company puts 40 projects in a GitHub account, what I’d appreciate is some upfront statement: “These are our four major OSS projects, and the rest of the repositories with silly names are just small projects that plug into our own infrastructure.”

As an exercise in lunacy, I decided to throw together a very flawed OSS project size/health/community analogy using space.   You can classify OSS projects using the following classifications:

Comets: Periodic Celebrities / OSS Projects that Won’t Last

Maybe there is an open source project that is suddenly very hot, but you can tell it’s not going to last very long.  I compare these projects to Comets.  The latest little Javascript utility may streak through industry news for a few weeks and then fizzle out.  Many startups in the OSS space think of themselves as a new planet, when in reality they are just a comet with excessive mass.   The thing about the OSS industry news cycle is that Comets often tend to dominate the news cycle as if they are full-fledged planets (because you can pay for coverage).   We’re all so used to the planets we already know… so when a comet comes into town everyone flips out.

Some comets burn out, some comets show up every few months or years and make a lot of noise attracting attention and contributors but ultimately returning back to the desolate reaches of the Oort cloud for a few months.  I’d name some OSS comets but then this post would attract a whole army of comment haters. If you find yourself attracting a community, losing a community, then attracting a new community, then losing it – you are in a dangerous orbit and you are a comet.

Asteroids: Where is everybody? Who’s running this project?

One person OSS projects.  Projects that are not completely connected to a community.   Projects that look substantial on radar, but then appear to be abandoned upon closer inspection.  Projects not large enough to attract a community (or in this case an atmosphere).   The majority of GitHub is comprised of a series of asteroid belts.

Think RubyGems, some RubyGems are so important they are moons of a planet (activesupport), or even planets in a system depending on your perspective (rails), but a lot of RubyGems are just one-person forks of someone else’s codebase floating around without a lot of discussion.   If you’ve ever found yourself trying to contribute to an OSS project only to find no response, there’s a good chance that you’ve stumbled upon an asteroid.

If you work for a company that just dumps OSS projects out there but doesn’t provide much in the way of support, you are effectively generating more asteroids.  Asteroids can be very useful to a consumer of OSS, but when you take on an asteroid, when you start mining that asteroid for minerals, you own the whole thing. If it breaks you have to fix it.  Also, if your healthy project (your planet) depends on an asteroid, you better keep track of it, or it’s going to impact you at some later date.

Planets: OSS Projects with an Ecosystem

Tomcat is a planet, and on the Tomcat planet live thousands of developers.  If something starts going wrong with Tomcat, the planet, a whole army of people show up to fix problems.   If someone wants to do something drastic to the planet, there’s a whole community which consists of that planet and any associated moons that show up and register an opinion. Taking care of a planet is tough work because there are so many interested parties.

This is the ideal size and scope for an OSS project.  Something large enough to attract a population, something large enough to sustain an atmosphere.  Yes, your planet is going to go through seasons of activity and inactivity, but there will always be signs of life on your project (as long as you do things like monitor the climate and make the necessary adjustments).

Moons: Your planet’s plugins.

Plugins for larger projects are moons.  Maybe.  Moons can gain so much velocity that they need to be rocketed into separate planetary orbits.    Maven plugins == moons.  Gradle plugins == moons.  Can’t think of anything more interesting to say about moons, so I’m moving on…

Systems:  Substantial OSS Projects Revolving Around a Central Idea or Project

Apache httpd is a system (maybe), Rails is a system (but it dominates the Ruby Galaxy).    Node.js is a system in the Javascript galaxy.

While Hadoop itself may have been a planet at one time, you can consider the entire Hadoop ecosystem to be its own system.    Or, maybe Hadoop is a planet in the Map/Reduce system.  Maybe Hadoop started out as a planet and quickly aggregated many moons, then underwent a sort of ignition point and became a star itself?

This may be where the whole analogy breaks down because if Hadoop is a star, what then is Hive?  A planet? You know what, I don’t know. It’s an analogy and it’s imperfect.  Maybe HDFS is like a singularity that tunnels between dimensions.

Now I’m just being facetious.  You get the gist.

Galaxies –  Galaxies are often more than just a project, they are an entire collection of systems.  For example, Hadoop is in the Java galaxy.    Maybe there is a PHP galaxy or a Javascript galaxy.

Listen and you’ll hear Cosmic Background Radiation?  That’s the constant bickering between proponents of BSD-style licenses and proponents of the GPL.

What is Dark Matter? Some people are convinced that OSS is dominated by corporate influence.  This influence is often very visible, but it is also something that is difficult to keep track of because it has a weak interaction with mailing lists.

What then is the Apache Software Foundation?   The Apache Software Foundation is like the Federation.  It spans many systems and dominates certain galaxies.  Except they often have a hard time deciding where to go next because none of the ships have a captain. Sulu can stand up at any time and say, “Kirk I’m going to have to -1 that order.”  (That was a joke ASF people… that was a joke.)

Here, watch a YouTube video of Carl Sagan…

A Web Developer from 2001 Wouldn’t Even Recognize this World

I work with people much younger than I, but the reality I’m discussing in this article is really just 12 years ago. It feels like another era entirely. This is especially true if you develop anything that touches the web.

When I started my career it was all about web applications that involved full round-trips to a server. You had a browser (or a WAP phone), your browser makes a request for a web page, waits a few seconds (a few seconds!) and you get a fully assembled HTML page in return. It didn’t matter because the Web was still so full of novelty we were just happy enough to be able to do things like read the news online. Maybe your local newspaper had a website, most likely they didn’t. There was no YouTube. Web pages weren’t really connected together in the way they are now. Back then it wasn’t like loading TheStreet.com required a bunch of asynchronous calls out to social networks to populate Like buttons – there was no social network. It was just HTML and Images, and it took forever. It was fine.

My first two jobs were developing an in-house cross promotional tool for an online gaming company named Kesmai in Charlottesville in 1997, and then I moved to New York to work for TheStreet.com in 1999. Web “applications” at that time were just an inch beyond putting some scripts in cgi-bin. At Kesmai it was Perl-based CGI scripts. Between Kesmai and TheStreet I was working on systems that used a proprietary Netscape server product. And, at TheStreet.com we were using Dynamo behind Apache, so we had JHTML and Droplets and that was my first encounter with a site that had to scale. We had a TV show on Fox and maybe something like 600-700 people could use the site at the same time. (Again, that was huge back then. How times have changed!) Everything was template-based, servlets were around, maybe, but I don’t really remember diving into the Servlet API and JSPs until Struts came along maybe in 2001.

Back then, companies like Forbes.com, which I moved to after TheStreet.com, invested a crazy amount of money in hardware infrastructure. There was still a lot of proprietary software involved in the core of a web site – expensive CMS systems, etc. Open source was around, yes – we ran Apache, but it isn’t like it is now. You likely paid a hefty sum of money for a large portion of your production stack. Around 2001 and 2002, a small group of people were starting to focus on speed, and the way you achieved speed at scale back then? Drop a few million on a couple of big Sun servers. It worked. It seems old-fashioned now, but as a developer I’d work with the operations team (then as now, the operations team didn’t know much about Java), and you’d help them size heaps and figure out how to make the best use of 64 GB of RAM on an E450. You’d run multiple JVMs on the thing, someone might install something like Squid to cache things at the web layer.

Back then, you could touch the servers. They were shipped to your offices. Companies like Sun and SGI invested a lot of money to make servers look all fancy. These things were purple, and they had blue LEDs (remember high-brightness blue LEDs were, at one point, really new to us all). I remember seeing advertisements for servers in programming magazines. Now if you look back at these, it’s as strange as seeing an advertisement for a station wagon in a National Geographic from 1985. These days, I don’t even know who makes the hardware that runs the sites I work on, and with the exception of the database server, I don’t even care if it is even a physical machine. Back then it was like everybody getting all excited about the E4500 that was in the foosball room.

There was no memcache, there was no AJAX, there was no AngularJS, there was no REST, SOAP was new and you probably didn’t use it yet, there was no Google Analytics, remember, Google was still a tiny startup at the time. I remember having a discussion about Ruby in 2001 with a colleague who was excited by it, but Rails didn’t exist yet. Perl and PHP were around, they’ve been around forever, but you really weren’t confronted with systems that spanned several languages. Javascript was around, but you probably weren’t going to dare use it because it wasn’t like there were any standards between browsers. HTML5, huh? This was back when people still used the blink tag. Need to crunch a huge amount of data: well first of all, huge is relative and you didn’t have things like Hadoop. Just didn’t exist yet. Big Data? Yeah, back then if you had 5-10 GBs you were running a huge database. Huge. XML was still a really good idea. Flash was about to really take off.

If we could travel back in time and snatch a web developer from 2001 and drop them into 2013, they’d flip out. They’d look at your production network and wonder what happened. We’d have to tell them things like, “you’ve missed a bunch of things, this kid at Harvard created a system to keep track of friends in PHP and that changed everything. Google now runs the world. Also, the site ‘Dancing Hamster’ isn’t what it used to be.”

I look at people that started working in 2007 or 2008 and I think about how strange it is that none of this is new – because I’m still living in 1993. I’m still amazed at the functionality of Gopher in 1993.

And you can thank Mark Smith for this YouTube video…

The 7 Rules of Software Development

Nuremberg_chronicles_f_180v_1

I hate simplistic blogs like “The 10 Rules of XYZ”, but who am I to buck the trend?  Why the picture?  Every time I read one of these simple lists of edicts it strikes me as being very “papal”.  Here I’ve simmered down development into 7 rules:

1. There is No Such Thing as Architecture, and you should avoid “Architects”

Good software developers work in small teams, they discuss everything, and the really good ones understand that “architecture” doesn’t really mean much.  Yes there are choices about what software “stack” you are going to use and how components of the system interact, but it is a fool’s errand to view “architecture” as something separate from development.  Software is built up piece-by-piece, and you make a thousand little decisions here and there, all of which end up in a final structure.  Developers can discuss “an architecture”, but it is just an abstraction, it doesn’t really “exist” outside of the system as it has been implemented, and it is more ceremony than anything else.

The problem with the whole idea is that it suggests an analogy to construction, you know “real” architecture. We’re not in the business of standing up houses and/or buildings, but, invariably, every time someone tries to create an architecture they end up drawing models that almost never line up with reality. Let’s retire the word “architect” as it applies to software development. I’m not saying that there can’t be a plan, or a loose collection of rules, but it is the really bad developers that get stuck on this idea that there is “an architect” who is telling “developers” what to do.  (That’s the worst kind of place to work.)

That’s not how it works.  You can make it work like this, but, trust me, your software will suck as a result and you won’t be able to grow a committed team.  Why? Because of the next rule…

2. Everybody Grows

Why is there no architect?  A couple reasons. First, every real software architect I’ve come into contact with is a train wreck. They used to be a developer and they were either so incompetent they had to be promoted out of development or they were the kind of individual that thought so highly of themselves they fell naturally into the role of dictating design decisions. Second, having someone be “the architect” gives everyone else a good excuse – something to blame when a system falls on its face. When you have an “architect” you’ll usually see disengaged teams, teams that don’t want to stay late because they are excited about the work they are doing, and a corporate culture ruled by political dysfunction.  Think of a room full of disaffected developers saying things like “Well, it wasn’t my decision to do it that way?” and “Well, I don’t know why we did that, go ask the architect.”

The working alternative is that everyone on the team makes a contribution, and everyone on the team is responsible for at least some portion of the architecture (which, remember, is a useless abstraction). The rule is that decisions are made by the people doing the work, and you should rarely override these decisions. This is essential for long-term success. If you force a strict hierarchy on to your development team, you’ll never give anyone the opportunity to grow. What you need is a self-aligning hierarchy (or as close to self-aligning as possible… people still need bosses, don’t get me wrong.)

3. Avoid Developers who think there is an Objective “Right”, There is only “Satisfies Current Requirement”

This is not to say that I believe in a sort of software relativity in which no solution is better than another.  I do believe that there are good ideas and bad ideas, but if anyone on your team decides that they are the arbiter of Right and Wrong you need to have a long conversation about how a team works. You need to remind them that there is no “architect” and that there are really just people responsible for different components of the system. Now, a little disagreement is alright, but in my travels throughout developerland I see a lot of developers running around with this idea that they’ve discovered the One True Way. They haven’t, and there isn’t one.

(Note: there’s an exception to this rule: Subject matter experts – real subject matter experts.  For example, if you are in a meeting with someone who, say, wrote a book about Tomcat, and you say something about Tomcat that causes that person to object.  Listen.)

If disputes seem unresolvable, you should absolutely intervene and nudge a decision along.  This is what management is for, but if you have people on your team constantly debating software development dogma, you should stop them.  Yes, there’s a right way to do File I/O.  Yes, there’s a right way to use a UITableViewController in iOS, and, yes, there’s a correct way to configure Tomcat. No, there isn’t one right way to implement an API, there are several hundred, and you have to give team members enough room to screw up.

4. One Metric for Success == The Number of Times you’ve screwed up

If you are writing software worth writing then you are likely doing something that’s never been done before.  (If this isn’t the case, you are likely writing software that shouldn’t be written but that’s a whole different post.)   If you are forging new ground, there’s a good chance that you are going to screw up… frequently. Good software development teams talk about failures in a way that doesn’t ostracize the people responsible. You want a team that can joke about how they made a bad decision.  You want a team that isn’t constantly trying to look invincible to management.  Everybody screws up.

I’m not telling you to reward your worst programmer with a special Nerf gun because he’s constantly screwing up production. You shouldn’t “celebrate” screwing up, but you should set the tone that mistakes are valuable. Read some Petroski, he’s written good books on engineering failure (if I weren’t such a lazy bastard I’d link to them).   Long story short, if you don’t want to read his books: failure is engineering progress.

5. “Is it done?” is to be met with Laughter.  There is no “Done” there is only “Satisfies Current Requirement”

I’ve worked on software long enough to realize that there is no “done”. Stop saying that word, it’s crazy. No matter how much you may want to forget about that code you wrote seven years ago, there’s no escaping it. Software is never done, and it will always break. There will always be something left to do.  If your management team is saying things like, “I thought we were done with that?” or “We don’t have to worry about that system, it was done years ago,” it is your job to remind them that software development is an endless pit of risk – an inescapable gravity well of budget allocation.

The two acceptable responses to “Is it done yet?” are “Software is never done” or “Welcome to the Jungle, we’ve got fun and games.”

6. Don’t Believe Anything You Read Online About Development (Avoid Dogmatic Process)

This is the most important rule of all.  Don’t find some easy software development dogma or list of rules and follow them… that includes this blog post. You should be very suspicious of any online manifesto that talks about ideal software development practices or anyone who takes time out of his weekend to write a blog post entitled “The 7 Rules of Software Development”. Please think for yourself.

One of the worst development teams I ever worked on was a team that was heavily influenced by a popular agile consulting firm that will remain nameless. First of all, the software being produced was awful, but I’m not sure that had much to do with the team as much as it had to do with the company that ran the project. There were architects, there was a guy running around telling everyone he knew what was right and wrong, the place adhered to dogma so much so that people quit over it. Including me.

If someone comes to you totally energized about some “Agile manifesto” they just read, or if they start preaching TDD as the One True Way, take them aside and let them know that the workplace is no place for religious proselytizing. While there are many good ideas in the software process community there are also many ideas designed to sell books and consulting services.  Every project and every team needs a slightly different process.  There is no cut and dried model for software development because there is endless variation in humanity.

7. When Everything in this Post is a Lie, It’s Time to Move On

Eventually, as the project matures you’ll experience a slow shift to maintenance mode.  “Management” will start focusing on “risk” and asking questions like, “Who is the architect of this system?” or “What is the Total Cost of Ownership for ActiveMQ?”  At this point, every single rule in this list will experience a rapid inversion.  There will be an architect, there will be strict definitions of senior and junior developers, someone will start telling everyone else they know the One True Way, programmers will be fired for making even slightly wrong design decisions, you’ll start having status meetings that talk about “finishing” the project, and someone will show up with a bunch of cultish books on process and convince management to send everyone to an Agile re-education retreat.

Let’s Start Lifting Each Other Up (…no, Seriously)

My constant affliction is my Inbox, several constantly-conflicting Google Calendars,  Skype, a cell phone that’s constantly beeping with Twitter, Facebook, and LinkedIn activity… and a lot of this activity is negative.  People whining about technology, society, government, people complaining about everything.  The internet is a 24/7 snark carnival and Twitter is the main stage. (I’m not innocent of this by any means, my popular posts are me calling Ruby on Rails names.)

In this overwhelming storm of negativity I find myself asking, “Where’s my list? When do I get to keep track of the tasks I promised myself I’d complete?”  For this, there is Lift.   Go check it out: http://lift.do – if you don’t like web browsers you can search for the Lift application on your phone, it’s there.

Here’s my Lift at the moment.  It’s simple, that’s what I like about it.  There’s not a lot of bells and whistles being thrown around, the app isn’t constantly trying to give me trophies or convince me to attend a webinar.  There’s no upsell, and I’m not getting overwhelmed by automated emails telling me to pay attention to it.  It’s simple, and it’s focused on the positive.  I’m trying to keep my list manageable at the moment: take some walks, learn Spanish, talk less, listen more, meditate, and write more. Clearly, I haven’t been using it enough, I pay for a gym membership, but I average two gym visits a year (that’s $550 per 20 minute recumbent bike session).

lift-dashboard

What’s great about Lift is that it is based on this idea that the best motivators are each other.  Some people use the platform to quit smoking, others use it as a way to set some personal goals.  It sounds a bit corny to the uninitiated, but you set some positive goals, you record that you’ve met them and random strangers fall out of nowhere to give you “props”.

Right, so in the last five minutes two people just decided to like the fact that I learned a bit more Spanish today. Doesn’t that sound incredibly silly? But, it isn’t. I’m motivated to keep on learning Spanish by these random acts of “drive-by” support.  Who could dislike a site that has this Aristotle quote on it?

Happiness does not consist in pastimes and amusements but in virtuous activities.

Lift was started by Tony Stubblebine some time ago.  Tony Stubblebine was the one who helped build that ancient O’Reilly social network that never panned out, and he was also a very close witness to the founding of Twitter at Odeo (maybe even a participant, at this point he probably doesn’t want to be asked about Twitter very much – I’m just guessing.)  Anyway, he went on to start a conference-focused social networking startup, and he’s the kind of guy who, if you’ve met him (even if only for a moment), you want him to succeed.

So, do it.   Let’s all Tweet a little less; let’s all focus on pastimes and amusements a bit less and let’s focus on “virtuous activity”.  If you were about to go think of the next super snarky Tweet that would gain you another follower: stop what you are doing, and go sign up for Lift.  I dare you, and I promise to give you mad props (I’m too old to say that in person BTW):

http://lift.do

Getting Off the Analytics Treadmill

Years ago, I had the idea that I should put Google Analytics on my own web site.  You know, why not track the readership, find out why people show up, track top referrers, maybe even define a couple of conversion goals.   At the time, maybe it made more sense than it does today.  I had this open source book that I had made available, it got a lot of traffic, and I was thinking about trying to convert readers into newsletter signups. Whatever. My plans were nebulous, and, predictably, those plans were put aside for paying client work, a couple of kids, and life in general.

These days, the idea of tracking my organic vs. direct vs. referral traffic and locating the top metropolitan areas of my blog’s audience just seems silly, so I got rid of it.  I turned off analytics, and now I’m realizing that there is one less dial to check.  One less meaningless number to pay attention to.  One less game to play every day when I’m looking for ways to waste time.  Here’s what freedom looks like.

freedom-from-analytics

After a week of this, I’m finding it easier to write.  I’m not tempted to go and dive into these meaningless traffic patterns.  Like some distracted data scientist, asking myself: “What is it about Parisians that attracts them to my complaints about Maven?”   No, I’ll write what I write, and if that garners an audience, great.  If it doesn’t, great. In some ways, who cares? This new idea is writing for writing’s sake – I’m not selling advertising, I’m not paying myself to write this blog – but, then it occurs to me…

…why not do the same for the businesses I help with blogging?  Why not turn off analytics for a month (or maybe two)?  Take a radical approach of just doing interesting things.  Produce content and don’t focus on bounce rate or returning vs. new browsers, just do it.  If someone asks about lead generation form conversions, laugh at them and say, “not my job.” Here’s the thing, I’ve worked for companies that have had amazing growth in traffic. (That Maven book had millions of unique viewers.)  I’ve been responsible for that growth over 24-36 months, and it didn’t correlate to us doing interesting things or even generating revenue. You can make traffic go up and you can make people like you and get high off of your Analytics graphs, but it’s really so worthless.  And, there’s so many graphs to look at you will always find one that is going up.  I’m starting to wonder if analytics is just a silly distraction.

What I’m wondering after this personal experiment of turning off statistics is whether analytics, marketing automation, conversion tracking, AdWords… all of this is just detracting from what should be the Prime Directive for a technology startup (or any startup).  Connect with some customers, do what they want you to do efficiently, and iterate on what works.  The only statistic that really matters is revenue, so what I’m contemplating is just turning off (or maybe more accurately not paying attention to) analytics not only for a personal blog but for a business blog as well.

At the end of the day, if you need some silly HTML counter to tell you if your ideas are working, you are not going to succeed. If you are deciding what to do based on a focus group or a poll, you should quit now.

(Some disclaimers.  There’s still a bit of statistics gathering on WordPress.  WordPress tells me how many readers I get, but it isn’t something I give more than a cursory glance.   (In fact, I wonder if there’s an option to turn it off.  I’m tempted.))

How Java Programmers “Feel” in 2013

My summary of general Java sentiment after attending JavaOne 2013.

“Everyone’s all excited that Java didn’t die. Yay! We made it!”

Ok, that’s not fair, how about:

“Everyone’s excited that Java has newfound energy and that Twitter ultimately had to migrate everything to Java. I mean Twitter is using Java! The fact that twenty-somethings are using the platform makes us feel a lot less old. Thanks.”

Ok, let’s try again…

“Everyone is excited that the Ruby on Rails kids are playing defense, that Oracle is paying a lot of attention to Java, and Java EE officially doesn’t suck anymore. Let’s go.”

I’m going to go with that last option.  While it is true that Oracle’s taking the platform in an incrementally more commercial direction (you can’t get 1.6.0_51 without a subscription and there are some new tools only available to paying subscribers), the platform does appear to be healthier than ever.   My own theory is that Oracle is doing a much better job enabling people like Reinhold and Gupta to innovate than Sun Microsystems ever could have.   There was a lot of pain between the Setting Sun and this New Java Renaissance, but we’re here.

After several industry luminaries predicted the death of Java, we’re still here and not only that, we’re still innovating.  We’re beyond that difficult period of ex-Sun employees griping about Oracle’s takeover and we’re now moving on to things like Java 8.  I wouldn’t have said this four years ago, but I’m generally optimistic about Java as a language and a platform.