Writing Books in XML


Andrew Savikas wrote about an internal discussion of XML in the book production pipeline at O’Reilly. I think he and many of the participants in that discussion miss a critical point: free books have a life beyond the print pipeline. Books like the Subversion Book or Maven: The Definitive Guide are developed outside of the publisher’s systems in DocBook format.

XML as “extra” work? Some authors prefer it…

He writes about the “extra” work required of authors choosing to write in DocBook, in this post:

I would never argue that authors and editors should or will become fluent in XML or be expected to manually mark-up their content. I naively tried fighting that battle before, and was consistently defeated soundly. It is simply too much “extra” work that gets in the way of the writing process.

Many of the authors I’ve worked with prefer DocBook precisely because of the hassle they went through to get the Word or OpenOffice templates to work properly. Writing in DocBook is straightforward once you get your mind around XMLMind, and retreating to a “wiki-like” markup ignores the fact that much of what an author is doing has little to do with presentation and much more to do with semantics. “Manually mark-up their content”? That sounds insane, and it also ignores the fact that there is a capable tool on the market.

I am likely one of the few authors who has tried to write a book in XML, Word, and a wiki-ish markup. Even though it seems counter to reason, I always had the easiest time with DocBook. The Word templates and macros consistently blew up on me once my chapters became numerous, weighty, and heavily cross-referenced. The wiki syntax was inscrutable… tell me, when you have to differentiate a code block from a screenblock from a classname… what do you do? You have to invent semantic decorations and bolt them onto your “simplified” wiki markup. With DocBook, the process of styling and producing a PDF is always very straightforward. Especially for a multi-author book, DocBook is always less work to manage once the authors are comfortable with the tools.
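
As a rough illustration of that semantic distinction, here is a small Python sketch. The DocBook fragment is made up for this example (the section title and the ProjectBuilder class are hypothetical), although classname, programlisting, and screen are real DocBook elements. It only shows that the three kinds of content stay distinguishable to a program; it is not how any publisher's toolchain actually works.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook fragment, written only for this illustration.
DOCBOOK = """
<section>
  <title>Running the build</title>
  <para>The <classname>ProjectBuilder</classname> class reads the project file.</para>
  <programlisting language="java">ProjectBuilder builder = new ProjectBuilder();</programlisting>
  <screen>$ mvn install</screen>
</section>
"""

def collect_semantics(xml_text):
    """Group text content by the DocBook element that wraps it."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in ("classname", "programlisting", "screen"):
        # iter() walks the whole tree, so inline and block elements are both found
        found[tag] = [element.text for element in root.iter(tag)]
    return found

semantics = collect_semantics(DOCBOOK)
for tag, texts in semantics.items():
    print(tag, "->", texts)
```

Because each kind of content carries its own element name, a stylesheet or script can treat a class name, a code listing, and terminal output differently without guessing from how they look. A wiki markup with a single "code" style would need extra, invented decorations to recover the same information.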

Going back to the larger issue…

Most print books do not have free online analogs hosted outside of the publisher’s infrastructure. A few do; take the Subversion Book and Maven: The Definitive Guide. These books have large daily traffic numbers in addition to print sales that often rival those of non-free books. Savikas writes about what O’Reilly does “after they receive a manuscript”… To me, that’s the problem. Old-school printers could assume that, at a certain point, the book was finished and ready to be printed on a stack of dead trees. For a few books today (and many more in the future), there will be an established “pipeline” that exists outside of the print publishing pipeline.

In the near future, more technical content will be free by default. Publishers that can adapt to external, more “continuous” publishing pipelines will succeed, and publishers that hold on to the idea of a lengthy “print” production phase will fail.