Lift Now Has Plans

Two weeks ago I blogged about Lift as a good site to help people meet personal goals. Now, Lift has announced a new feature: “Plans”.

[Image: lift-plans]

What I like about Lift is its simplicity.  It isn’t asking me to tweet every other second, and the mobile application hasn’t decided to ask me to write a review. (Have I mentioned I hate that?)

10 Steps to Get Your Crazy Logs Under Control

Two days ago I wrote a post about how “developers tailing the logs” is a common pattern.  A couple of people responded to me directly, asking if I had some sort of telepathic ability because they were stuck in a war room tailing logs at that very moment.  It’s a common pattern.  As developers we understand that tailing log files is much like tasseomancy (reading tea leaves) – sometimes the logs roll by so quickly we have to use a sixth sense to recognize the errors.  We are “log whisperers.”

The problem here is that tailing logs is ridiculous for most of the systems we work with.  OK, if you have 1-2 servers, go knock yourself out – tail away.  If you have 2,000 servers (or more), tailing a log to get any sort of idea about errors or behavior isn’t just inappropriate, it’s dangerously misleading.  It’s the kind of practice that gives you and everyone around you the false reassurance that because one of your servers is working well, they are all fine.

So, how do we get a handle on this mess?

#1. Stop Tailing Logs @ Scale – If you have more than, say, 10 servers, you need to get out of the business of tailing logs.  If you have a reasonable log volume, up to a handful of GBs a day, throw it into a system based on Apache Solr and find a way to make that system as immediate as possible.  That’s the key: figure out a way to get logs indexed quickly (within a couple of seconds), because if you don’t?  You’ll go right back to tailing logs.

You can also use Splunk.  Splunk works, but it’s expensive, and they charge by daily data volume. If you don’t have the patience to figure out Solr, use Splunk, but you’re going to end up paying crazy money for something you could get for free.

If you have more than a few GBs a day, on the order of tens of GBs, hundreds of GBs, or even a few TBs of data a day, you are in another league, and your definition of “logging” likely encompasses system activity.  There are companies that do this, and they have campuses; that isn’t the kind of “logging” I’m talking about here.
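
To make the Solr route concrete, here is a rough sketch of what pushing a log line into Solr could look like with the SolrJ client.  The “logs” collection, the field names, and the LogIndexer class are all made up for illustration; your schema and commit strategy will differ.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.time.Instant;

public class LogIndexer {

    // The "logs" collection and these field names are hypothetical --
    // adjust them to whatever schema you define in your own Solr setup.
    private final SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build();

    public void index(String host, String level, String message) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("host", host);
        doc.addField("level", level);
        doc.addField("message", message);
        doc.addField("timestamp", Instant.now().toString());

        // commitWithin asks Solr to make the document searchable within ~2 seconds,
        // which is the "immediacy" that keeps people from drifting back to tail -f.
        solr.add(doc, 2000);
    }
}

The commitWithin hint is the important part: if the gap between a log line being written and being searchable grows past a few seconds, people will go back to tailing.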

#2. If possible, keep logs “readable” – If you operate a very large web site this may not be possible (see the first recommendation), but you should be aiming for production log messages that are reasonable.  If you are running something on a smaller scale, or something that needs to be interactive, don’t write out a MB every 10 seconds, and don’t print out meaningless gibberish.  When you are trying to come up with a log message, think about the audience, which is partly yourself, but mostly the sysadmin who will need to maintain your software far into the future.

#3. Control log volume – This is related to the previous idea. Especially in production, keep volume under control.  Remember that log activity is I/O activity, and if you don’t log carefully you are making your system wait around for disk (disk is crazy slow).  Also, if you are operating @ scale, all that extra logging is just going to slow down whatever consolidated indexing is going on, making it more difficult to get immediate answers from something like Solr.
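
One small habit that helps here is sketched below using SLF4J (the OrderService class and its methods are invented for illustration): use parameterized messages and guard anything expensive to compute, so messages below the active level cost close to nothing.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void process(long orderId) {
        // Parameterized message: the final string is only built if DEBUG is enabled,
        // so in production (root level at WARN) this line costs almost nothing.
        log.debug("Processing order {}", orderId);

        // Guard anything expensive to compute; otherwise you pay the cost
        // even when the message is ultimately discarded.
        if (log.isDebugEnabled()) {
            log.debug("Order detail: {}", buildExpensiveDetail(orderId));
        }

        // ... actual work ...
    }

    private String buildExpensiveDetail(long orderId) {
        return "order=" + orderId; // stand-in for something genuinely costly to build
    }
}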

#4. Log messages should be terse – Aim for a single line when possible and try to stay away from messages that need to wrap.  You shouldn’t print a paragraph of text to convey an idea that can be validated with a single integer identifier; it should fit neatly on one line if at all possible.  For example, your log messages don’t need to say:

"In an attempt to validate the presence or absence of a record for BLAH BLAH BLAH INC, it was noted that there was an empty value for corporation type.  Throwing an exception now."

Instead:

"Empty corp type: BLAH BLAH... (id: 5). Fix database."

#5. Don’t Log, Silence is Golden – I can’t remember who it was, but someone once commented on the difference between logging in Java and logging in Ruby (I think it was Charles Nutter talking about the difference between Rake and Maven).  When you run a command-line tool in Ruby, like Rake, it often doesn’t print much of anything unless something goes horribly wrong.  When you run Maven?  It prints a lot of output, and it is output that no one ever reads. This is a key point.  Normal operation of a system shouldn’t warrant much logging at all.  If it “just works”, then don’t bother me with all that logging.

If you are operating a web site @ scale, this is an important concept to think about.  Your Apaches (or nginx) are already going to be logging something in an access log, so do you really need a log that looks like this?

INFO: Customer 13 Logged In
INFO: Attempting to Access Resource 34
INFO: Resource Access for resource 34 from customer 13 approved
INFO: Sending email confirming purchase to customer 13

I don’t think you need this. First, you should already have a record of these interactions elsewhere (in the httpd logs), and second, it’s just unnecessary.  In fact, I think those are all DEBUG messages.  Unless something fails or needs attention, you should strive for having zero effect on your log files. If you depend on your log files to convey activity, you should look elsewhere for two reasons: it doesn’t scale, and it is inefficient.  Instead of relying on a log file to convey a sense of activity, tell operations to look at database activity over time.
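
In code, that might look like the sketch below (SLF4J again, with a made-up PurchaseFlow class): routine activity goes to DEBUG, which a production root level of WARN silently discards, and only failures produce output.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PurchaseFlow {

    private static final Logger log = LoggerFactory.getLogger(PurchaseFlow.class);

    public void confirmPurchase(long customerId, long resourceId) {
        // Routine, expected activity goes to DEBUG. With the production root
        // logger set to WARN, these lines never touch the disk.
        log.debug("Customer {} purchased resource {}", customerId, resourceId);

        // Only the unexpected earns a line in the production log.
        boolean emailSent = sendConfirmationEmail(customerId);
        if (!emailSent) {
            log.error("Confirmation email failed for customer {}. Check the mail relay.", customerId);
        }
    }

    private boolean sendConfirmationEmail(long customerId) {
        return true; // stand-in for the real mail call
    }
}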

#6. Give Someone (Else) a Command – This is something no one does, but everyone should.  Your logs should tell an administrator what to do next (and it rarely involves you). The new criterion for printing a log message in production is that either something has gone wrong or something needs serious attention. If you are printing a message about something that has gone wrong, don’t assume that the person reading it has any understanding of the internals of the system. Give them a direct command.

Instead of this message:

ERROR: Failed to retrieve bean property for the customer object null.

Write this:

ERROR: Customer object was null after login. Call DBA, ask about customer #342R.

You see the difference? The second log gives the admin a fighting chance (it also shifts blame to the database).  In this case, someone sent you a corrupt customer record, so point someone at the DBA.  You’d likely redirect them there anyway.  This way the sysadmin can skip the call to engineering and go directly to the source of the problem.
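
In code, the pattern is simply to put the instruction, not the stack-trace narration, into the message.  A small sketch (the LoginService class and its shape are invented for illustration; SLF4J assumed):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoginService {

    private static final Logger log = LoggerFactory.getLogger(LoginService.class);

    public void afterLogin(String customerId, Object customerRecord) {
        if (customerRecord == null) {
            // Tell the reader what to do next, not how the internals failed.
            log.error("Customer object was null after login. Call DBA, ask about customer #{}.", customerId);
            throw new IllegalStateException("Missing customer record: " + customerId);
        }
        // ... continue with the logged-in customer ...
    }
}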

If you do this right, you’ll minimize the production support burden. Trust me, you want that; otherwise you won’t have much time for development, because you will be fielding calls from production all the time.

#7. Provide Context – Unless you are logging a truly global condition like “Starting system…” or “Shutting down system…”, every log message should have some contextual data.  This is very often an identifier of a record, but what you should try to avoid is the log message that provides zero data or context.   The worst kind of message is something like this:

ERROR: End of File encountered in a stream.  Here's a stack trace and a line number...

This raises two questions: where is that stream from, and what exactly were you trying to do? A better log message might be:

ERROR: (API: Customer, Op: Read, Server: api-www23, ID: 32) EOF Problem. Call API team.

In this second example, we’re using something like Log4J’s Nested Diagnostic Context (NDC) to put details into the log that will help diagnose the problem in production.
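
For reference, here is roughly what that looks like with log4j 1.x’s NDC (log4j 2 offers ThreadContext for the same job).  The CustomerApi class and the context string are invented, and your appender’s pattern needs %x for the context to show up:

import org.apache.log4j.Logger;
import org.apache.log4j.NDC;

public class CustomerApi {

    private static final Logger log = Logger.getLogger(CustomerApi.class);

    public void readCustomer(int customerId, String serverName) {
        // Everything logged while this context is on the stack gets the prefix.
        NDC.push("API: Customer, Op: Read, Server: " + serverName + ", ID: " + customerId);
        try {
            // ... perform the read ...
            log.error("EOF Problem. Call API team.");  // emitted with the full context attached
        } finally {
            NDC.pop();
            NDC.remove(); // clear the thread's stack before it goes back to the pool
        }
    }
}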

#8. Don’t Write Multiple Log Statements at a Time – Some developers see logs as an opportunity for a running commentary on the system, and they log everything that happens in a class. I dislike seeing this both in code and in a log.  Here’s an example: you have a single class somewhere and you see code like this:

log.info( "Retrieving customer record" + id );
Customer cust = retrieveCustomer( id );
if( cust != null ) {
     log.info( "Customer isn't null. Yay!" );
     log.info( "Doing fancy thin with customer. );
     doFancyThing( cust );
     log.info( "Fancy thing has been done." );
} else {
     log.error( "Customer " + id + " is null, what am I going to do?" );
}

And, in the log you have:

INFO: Retrieving customer record 3
INFO: Customer isn't null. Yay!
INFO: Doing fancy thing with customer.
INFO: Fancy thing has been done.

Consolidate all of these log messages into a single message and log the result (or don’t log at all unless something goes wrong).  Remember, logging is often a write to disk, and disks are insanely slow compared to everything else in production.   Every time you write a log to disk think of the millions of CPU cycles you are throwing into the wind.

If a developer writes a class that prints 10 log statements one after another, those statements should be combined into a single one, as sketched below.  Admins don’t really care to see every step of your algorithm described; that’s not why they are maintaining the system.
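
Under that rule, the snippet above might collapse to something like this (reusing the same hypothetical retrieveCustomer and doFancyThing helpers):

Customer cust = retrieveCustomer( id );
if( cust != null ) {
     doFancyThing( cust );
     log.debug( "Did fancy thing for customer " + id );
} else {
     log.error( "Customer " + id + " is null after retrieval. Ask the DBA to check the customer table." );
}

One debug line for the happy path (or none at all) and one actionable error for the failure case is usually all anyone needs.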

#9. Don’t Use Logs for Range Checking – There’s a certain kind of logging that creeps into a system that has more to do with bad input than anything else.  If you find yourself constantly hitting NullPointerExceptions in something like Java you may end up trying to print out variables to help you evaluate how things failed in production.  After a few years of this, you’ll end up with a production system that logs the value of every variable in the system on every request.

You’ll end up with this:

Customer logged in value: { customer: { id: 3, name: "Charles", ......}
Purchasing products: { product: { id: 32, manufacturer: { id: 53, name: "Toyota"....}
Running through a list of orders: [ { order: { id: 325... }, { order: { id:2003...} ]

…and so on.  In fact, you may end up serializing the entire contents of your database to your log files using this method.

Programmers usually do this because they are trying to diagnose problems caused by bad input.  For example, if you read a customer record from the database, maybe you’ll just log the value of that record somewhere so you have it available when you are debugging some production failure. Have a process that takes a customer and a product? Well, why not print out both in the log, just in case we need them.  There are issues with customer records having null values, so… Don’t do this; just create better database constraints.

This is the fastest road to unintelligible log files, and it also hints at another problem: you have awful data.  If you are dealing with user data, check it on the way in.  If you are dealing with a database, take some time to add constraints to the table so that you don’t have to test whether everything is null.  It’s an unachievable ideal, I know, but you should strive for it.
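
The alternative is to reject bad data where it enters the system, so nothing downstream needs to dump objects into the log “just in case.”  A minimal sketch (the CustomerIntake class and its rules are hypothetical):

import java.util.Objects;

public class CustomerIntake {

    // A boundary check: reject bad records where they enter the system,
    // instead of logging whole objects on every request downstream.
    public Customer accept(Customer candidate) {
        Objects.requireNonNull(candidate, "customer record is missing");
        if (candidate.getId() <= 0) {
            throw new IllegalArgumentException("customer id must be positive: " + candidate.getId());
        }
        if (candidate.getName() == null || candidate.getName().isEmpty()) {
            throw new IllegalArgumentException("customer " + candidate.getId() + " has no name");
        }
        return candidate;
    }

    // Minimal stand-in for the real domain object.
    public static class Customer {
        private final long id;
        private final String name;

        public Customer(long id, String name) { this.id = id; this.name = name; }
        public long getId() { return id; }
        public String getName() { return name; }
    }
}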

#10. Evaluate Logs Often – The things described in this post are really logging policy, and no one has one.  This is why we have these production logging disasters, and this is why we create systems that are tough to understand in production. To prevent this, put the evaluation of logging on some sort of periodic schedule.  Once every month, or once every release, you should have some metric that tells you whether log volume has outpaced customer growth.

You should conduct some investigation into how useful or how wasteful your current approach to logging is.  You should have a policy document that defines what each level means.  What is a DEBUG message in your system?  What should be an ERROR?  What does it mean to throw a FATAL? Prior to every release, do a quick “sanity check” to make sure that you haven’t added some ridiculous log statement that is going to make maintaining the system awful.

But… most people don’t do these things, which is why production logs end up being a disaster.

Break Through Server-side Bias and Surrender to Javascript

You have a server-side bias and you don’t even realize it.  I know this, and you need to know this.  It’s holding you back a bit. Step one is to admit that you have a problem and that your addiction to easy server-side frameworks is ruining your performance.  You’ve used frameworks like Rails, Django, WordPress, or one of several hundred Java web application frameworks for years, and you are resisting this move toward Javascript.  Yes, you’ve “embraced” Javascript throughout your applications, but you might be missing the larger point – Javascript isn’t here to make your server-side applications more “reactive”; it isn’t just a nice feature to add to a larger application.

That Javascript you keep insisting on serving from your server-side framework… that is the application.  It’s taking over.  Your server-side framework won’t be doing anything resembling templating in a few years, because that’s the job of the browser.  Yes, yes, you might have a few “high value” pages, or pages that have to be hosted on a server because of security constraints, but if your web site requires a round-trip to a server to render a web page… well, that’s old-people thinking.

The “geographical center” of your application is no longer on the server-side.  You shouldn’t start your application thinking about what server-side framework it is going to be based on, because it won’t be based on a server-side framework.   In fact, you may use many, but they will only serve to support what will essentially be a client-side Javascript application.

I understand how you feel; you might read this and think, “No, we’re not moving everything to client-side Javascript frameworks, no.”  The willful ignorance you are embracing here is a defense mechanism.  As a Java developer in the middle of the last decade, I saw the next generation using PHP and Ruby, and I tried to explain it away for a few years as just a passing fad.  It wasn’t, and I still see a lot of developers in my age group reflexively resisting dynamic languages.  Resisting change is a mistake – you’ll be obsolete before you can say, “Wait, I didn’t realize the browser could do…”

With the rise of ReactJS, AngularJS, Backbone, and a number of other good client-side Javascript frameworks, I’m seeing a new kind of bias – server-side bias.  There are people out there who, be it out of ignorance or otherwise, fail to realize that the days of having some dynamic templating engine on the server merge your data with some HTML are coming to a swift conclusion.  It was fun while it lasted, but this “mail-merge” approach of retrieving a row from a database, packaging it up in some object, and then “merging” it into a template is on its way out.  The view layer is moving client-side, quickly.

And it’s moving to the client because it’s an order of magnitude faster to serve what is essentially a static AngularJS application from a CDN than it is to muck around with serving the same thing from some server farm (even with the help of memcache).  That’s the thing: once you start doing this, you realize that you only really need a server-side framework for API calls.  That’s really it.  All your server-side frameworks will be doing in five years’ time? JSON, and maybe futzing around with a few databases.

Don’t get me wrong, there will be people writing Rails RHTML and Java JSPs for many years to come, much in the same way that COBOL developers still run systems packed away in government data centers.  But Ruby and Java developers who fail to embrace this client-side Javascript trend will find themselves confined to internal applications – the Oracle Forms of 2020 will be Ruby on Rails.

Getting Off the Analytics Treadmill

Years ago, I had the idea that I should put Google Analytics on my own web site.  You know, why not track the readership, find out why people show up, track top referrers, maybe even define a couple of conversion goals.  At the time, maybe it made more sense than it does today.  I had this open source book that I had made available; it got a lot of traffic, and I was thinking about trying to convert readers into newsletter signups. Whatever. My plans were nebulous, and, predictably, they were put aside for paying client work, a couple of kids, and life in general.

These days, the idea of tracking my organic vs. direct vs. referral traffic and locating the top metropolitan areas of my blog’s audience just seems silly, so I got rid of it.  I turned off analytics, and now I’m realizing that there is one less dial to check.  One less meaningless number to pay attention to.  One less game to play every day when I’m looking for ways to waste time.  Here’s what freedom looks like.

[Image: freedom-from-analytics]

After a week of this, I’m finding it easier to write.  I’m not tempted to dive into meaningless traffic patterns like some distracted data scientist, asking myself: “What is it about Parisians that attracts them to my complaints about Maven?”  No, I’ll write what I write, and if that garners an audience, great.  If it doesn’t, great. In some ways, who cares? This new idea is writing for writing’s sake – I’m not selling advertising, I’m not paying myself to write this blog – but then it occurs to me…

…why not do the same for the businesses I help with blogging?  Why not turn off analytics for a month (or maybe two)?  Take the radical approach of just doing interesting things.  Produce content and don’t focus on bounce rate or returning vs. new visitors; just do it.  If someone asks about lead generation form conversions, laugh at them and say, “not my job.” Here’s the thing: I’ve worked for companies that have had amazing growth in traffic. (That Maven book had millions of unique visitors.)  I’ve been responsible for that growth over 24-36 months, and it didn’t correlate with us doing interesting things or even generating revenue. You can make traffic go up, you can make people like you, and you can get high off of your Analytics graphs, but it’s really so worthless.  And there are so many graphs to look at that you will always find one going up.  I’m starting to wonder if analytics is just a silly distraction.

What I’m wondering, after this personal experiment of turning off statistics, is whether analytics, marketing automation, conversion tracking, AdWords… all of it is just detracting from what should be the Prime Directive for a technology startup (or any startup): connect with some customers, do what they want you to do efficiently, and iterate on what works.  The only statistic that really matters is revenue, so what I’m contemplating is turning off (or, more accurately, not paying attention to) analytics not only for a personal blog but for a business blog as well.

At the end of the day, if you need some silly HTML counter to tell you whether your ideas are working, you are not going to succeed. If you are deciding what to do based on a focus group or a poll, you should quit now.

(Some disclaimers: there’s still a bit of statistics gathering on WordPress.  WordPress tells me how many readers I get, but it isn’t something I give more than a cursory glance.  In fact, I wonder if there’s an option to turn that off.  I’m tempted.)