Building Book Production Platforms p2.

Amongst the core requirements for a book production platform are the source file format and the editor, and of course, these are intimately linked. The development team is usually faced with choosing the format first, then the editor.

Choosing a format

The choice is pretty much: HTML or not HTML?

Currently, HTML is the ruling choice of format for a web-based book production platform. HTML is native to the browser and has associated standards-compliant support, such as CSS and JavaScript. Conversely, not choosing HTML puts you in a bit of a hole and can create a lot of overhead.

It might be interesting to look back a little and learn from some others since there have already been projects in this space that started down non-HTML roads and then gave it up for HTML. Kathi Fletcher, originally the project manager and technical director for Connexions (now OpenStax) which built a custom XML editing environment for academic materials, later researched in-browser XML vs HTML editing environments for her Shuttleworth Foundation-funded OERPUB project. Kathi became convinced HTML was the way to go and did some great work on HTML editor usability with the Aloha HTML editor.

We have chosen to use HTML5 as the canonical format for open textbooks, because developers and tools are more plentiful for web technologies than XML technologies.

http://www.w3.org/2012/12/global-publisher/statements-of-interest/29-oerpub.html

The (closed source) O’Reilly Atlas platform also started with the complex AsciiDoc format (a lightweight markup language in the same family as Markdown) and eventually awoke to the power of HTML in 2012.

HTML5-based authoring offers a streamlined production workflow for producing both print and digital outputs, facilitates “digital first” content development, and is a perfect fit for creating a WYSIWYG, web-based writing experience.

http://radar.oreilly.com/2013/09/html5-is-the-future-of-book-authorship.html

They then got an extra dose of religion and started a project called HTMLBook, which is a suggested ‘spec’ for a subset of HTML elements to be used in books.

So far I have not seen a book production platform travel the reverse direction, from HTML to something else. Instead, we are seeing more and more platforms start with, or change to, HTML as a source file format.

Markdown

Markdown is sometimes put forward as the way to go but I’m not going to go into that in too much detail here. I have talked about this elsewhere. The only additional thing I will say is that markdown causes even more issues for book production platforms than those included in that article. Namely, in an in-browser markdown environment, the markdown will most likely be displayed as rendered HTML next to the authoring pane. That is a huge amount of lost screen space and extra UI junk for no apparent gain. Think of the UX cost. If you don’t have that rendered display then you will most likely only see pure markdown in a text field with no rendered display. The user won’t really know if their document looks right until it is rendered somewhere down the line, which is also a tremendous cost to the user for no apparent gain. Markdown: all pain, no gain.

NB: There is a possible good use case for markdown as a helpful add-on for HTML WYSI editors but I will cover that later.

LaTeX

There is a more valid use case for LaTeX in the browser since some scientists and academics will never use anything else, and you’ll never convince them to adopt HTML regardless of the benefits. You are up against the great Church of Knuth and I don’t fancy your chances. If your audience is comprised of LaTeX addicts, then I think you have no choice other than to support that.

Many times I have talked about remedies for unstructured MS Word documents (for scientific manuscripts) only to have someone earnestly comment that if everyone just learned LaTeX we would be in a much better position… They might be right, but I’m pretty sure it’s never going to happen.

The preference for LaTeX is a legacy issue, and problematic, but needs to be dealt with. (Unfortunately, today’s Markdown heroes are growing legacy issues like this with each passing day, and that is going to cost us down the road).

Recently there has been some interesting work on in-browser LaTeX editing, including the (closed source) Authorea platform and, most notably, the (open source) ShareLaTeX platform. ShareLaTeX round trips the LaTeX syntax displayed and edited in a text area (in the browser), renders that to a bitmap on the server, and returns it to the browser for a side-by-side ‘WYSIWYG’. The effect is that you can see a just-in-time rendered view of the LaTeX as you type. It’s a neat trick and effective if you insist on LaTeX in a web-based platform. Then you just have to live with the UI costs. However, you only need this approach if you wish to support the full LaTeX syntax. If you wish to just support LaTeX equations, you can use an HTML editor with a LaTeX plugin based on MathJax or Khan Academy’s KaTeX (and there are some other solutions such as Mathoid).
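To make the equations-only option a little more concrete, here is a minimal sketch in Python of the first step such a plugin has to perform: separating the maths from the surrounding text so that only the maths is handed to a renderer. The `\( ... \)` delimiter convention and the `split_math` helper are assumptions made for illustration; this is not how any particular editor or plugin is implemented.

```python
import re

# Inline maths delimited by \( ... \); this delimiter choice is an assumption.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)")

def split_math(text):
    """Return a list of ("text", s) and ("math", s) segments in document order."""
    segments = []
    pos = 0
    for match in INLINE_MATH.finditer(text):
        if match.start() > pos:
            # Ordinary prose between two equations stays as it is.
            segments.append(("text", text[pos:match.start()]))
        # Only the LaTeX between the delimiters goes to the maths renderer.
        segments.append(("math", match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

print(split_math(r"Energy is \(E = mc^2\) in this notation."))
```

Everything tagged "text" remains ordinary HTML content, which is why this route avoids the side-by-side preview that full LaTeX editing needs.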

Incidentally, if you need to support full LaTeX I highly recommend checking out ShareLaTeX over WriteLaTeX. They both have the same approach but WriteLaTeX is proprietary whereas you can pick up the ShareLaTeX code and integrate it straight away. You could even build your own ShareLaTeX-like interface; it’s not too tricky. A colleague, Rizwan Reza, and I (Riz did all the hard work) managed to develop a workable prototype in about 2 days, but there are many gotchas in setting up the LaTeX compiler correctly.

Not many book projects need LaTeX, so I will leave this as an interesting edge case. There are solutions if you need it, but not many people need it.

XML

I think I will just leave it to the words of the brilliant Dave Cramer (Hachette Book Group):

So we’ve chosen to describe our content with HTML, and build our production system around HTML.

When I tell people that, they smile condescendingly, and chuckle a bit. “That’s cute. Why don’t you use real XML?”

I then ask them what you can do in Docbook (or TEI, or NLM) that you can’t do in XHTML? I haven’t heard a good answer to that question yet. XHTML is XML, by definition. Calling something “para” rather than “p” doesn’t get you anything, except carpal tunnel syndrome and invoices from consultants

The problem with non-HTML XML is that it is essentially just XML the browser can’t use. Hence you lose all that other good stuff like WYSI editors, CSS design tools, cool tricks with JavaScript, and all the cool tools that are being developed for HTML. XML just can’t compete, plus you are going to need to convert the XML into HTML anyway. So don’t make life more complicated than it already is – continue your love affair with XML as long as it’s XHTML!

HTML

HTML is king in the browser and it gives you all you need to make books. I don’t want to spend a lot of time arguing the merits of HTML in this post as there is a lot to say and I want to bring that in at other points of the conversation. But in brief:

  • HTML is supported by JS and CSS.
  • The DOM is known natively by the browser.
  • HTML is standards-based.
  • It is straightforward.
  • HTML is easy to read and easy to clean.
  • HTML is the most popular file format on the planet.
  • You can use HTML to build structure in documents with assigned class and id values, or microdata formats.
  • HTML is the native file format for EPUB.
  • PDF can be rendered directly from HTML in the browser (more on this later).
  • HTML can be paginated in the browser.
  • CSS is moving towards supporting more and more page-based elements.
  • The browser can act as a design environment.
  • You can create real what-you-see-is (WYSI) production environments.
  • Basic editing is built into the format itself.
  • HTML is supported by an enormous number of tools for conversion (in and out).
  • HTML is supported by an enormous repository of examples (the web).
  • HTML is cheap to develop with.
  • Even book designers are getting used to it.
  • Some schools teach it.
  • It has a million free tutorials online to help you use it.
  • A lot of people know HTML.
  • HTML is supported by a rapidly proliferating body of JavaScript libraries for typography, graph production, animation, interactions, dynamic rendering, etc.
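
As a small illustration of the point about building structure with class and id values, here is a minimal sketch in Python using only the standard library. The sample markup, the class names (`chapter`, `sidebar`) and the `outline` helper are all hypothetical, chosen for this example; they are not taken from any particular platform or spec.

```python
from html.parser import HTMLParser

# Hypothetical book fragment: the class and id values are made up for this example.
SAMPLE = """
<section class="chapter" id="ch1"><h1>Choosing a format</h1><p>...</p></section>
<section class="sidebar" id="sb1"><h1>A note on LaTeX</h1><p>...</p></section>
<section class="chapter" id="ch2"><h1>Choosing an editor</h1><p>...</p></section>
"""

class OutlineParser(HTMLParser):
    """Collect (class, id, title) for every <section> that carries a class."""

    def __init__(self):
        super().__init__()
        self.entries = []       # finished (class, id, title) tuples
        self._current = None    # [class, id] of the section being read
        self._in_title = False  # True between <h1> and its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "section" and "class" in attrs:
            self._current = [attrs["class"], attrs.get("id", "")]
        elif tag == "h1" and self._current is not None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.entries.append((self._current[0], self._current[1], data.strip()))
            self._in_title = False
            self._current = None

def outline(html, wanted_class):
    """Return (id, title) pairs for sections with the given class, in order."""
    parser = OutlineParser()
    parser.feed(html)
    return [(id_, title) for cls, id_, title in parser.entries if cls == wanted_class]

print(outline(SAMPLE, "chapter"))
# [('ch1', 'Choosing a format'), ('ch2', 'Choosing an editor')]
```

The same class values that a script reads here are the ones a stylesheet would target, so one piece of markup serves both structure and design.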

The basic idea really comes down to this.

  • HTML is the cheapest format of our time.
  • HTML is the most popular format of our time.
  • HTML is the networked document format of our time.

Increasingly HTML is the way stories are told, whether that is in books or on the web. It’s a trite analogy perhaps, but HTML is the paper of our time. As Dave Cramer says:

why start with something other than HTML, when you have to turn it into HTML anyway?

It should be noted that Cramer also turns HTML into paper, and the Hachette Book Group have produced many beautiful paper books using HTML as the source format. Many of these books you will now find in the best-selling sections of your local brick and mortar bookstore.

Other print producers are also using HTML as the source. Print-on-demand services, used to producing very ugly books by ingesting MS Word and dealing with all that ugly conversion, are also adopting HTML production environments. Books on Demand, Germany’s largest Print on Demand service, adopted Booktype so their customers could have an easy in-browser book production environment. The source format is HTML but the users don’t know that, and the books look better. That’s the beauty of HTML.

Finally, helped a lot by the efforts of Dave Cramer and the Hachette Book Group, Sourcefabric, the people at O’Reilly, and others adopting HTML, we might be starting to see the very beginning of the changing of the guard.

HTML is the way to go for Book Production Platforms. If you choose another format you will find you inherit a lot of costs and additional overheads and, sadly, you will soon be left behind. There is just no format going forward at the same speed as HTML. Not even close. So, my advice is to first ask the question – can HTML do what you need? Push your team to answer that question. Will format X give you anything HTML can’t? As an exercise ask your team to prove HTML is a bad choice, and if the answer is not-HTML, then contact me and let me try and talk you into HTML!

Print on Demand vs Demanding Printers

I have been experiencing quite a strange phenomenon recently. On several occasions, I have found myself looking for printers that can print perfect bound books quickly. A ‘perfect bound’ book is a book that is normally called a ‘paperback’ – black and white interior, colour cover, and a nice thick one-piece cover that tightly hugs the outside of the book and is creased and folded along the spine.

[Image: Perfect bound books printed in less than 20 hours]

I have needed these services after a Book Sprint – typically I have spent 5 days in a room with half a dozen others and we have written a book of 300 pages or so. We output the content to book-formatted PDF with Objavi, and next, to make it a real party, we want to see the book the same day we finished it, or the next morning. It is entirely possible to do this, and I have done it many times. However, the one thing that might catch you out is actually finding the right type of printer that can make perfect bound books fast. This is not easy, and sometimes is made harder if you are in a non-English speaking country as the English term ‘perfect bound’ does not easily translate.

What I have found is that most large cities have these services. In Berlin, for example, there is a service about 5 blocks away from my house. In Paris, you need to travel out to the suburbs to find a service but there is one. In Palo Alto, Kinko’s does it (but doesn’t do it well), and so on.

While these services are relatively common, what I have found, time and time again, is that these services are very hard to find. The first issue is that they have no standard way of marketing their services. It is sometimes advertised as ‘print on demand’, sometimes ‘books on demand’ and sometimes they just don’t let people know they have these services until you ask. Hence trying to find a business that does this via a search engine, a phone book, or asking a local, just gets you nowhere. You have to call every printer one by one, carefully explaining exactly what you want. Sometimes this is also difficult since the operators might not be printers and so they don’t actually know the terminology, and I have found myself trying to explain what ‘perfect binding’ is to a ‘printer’.

The other issue, and this is the one that I find strange and has tripped me up so many times, is that often the locals – printers and non-printers alike – do not think this kind of service exists at all. That is, they think it’s impossible. This frustrates me the most.

Essentially there are two typical responses from printers that do not provide this kind of service. The first is from your typical ‘copy shop’ – they will tell you they provide these services and then, when you turn up to look at the samples, you find they are talking about spiral or tape binding. Ugh. After explaining this is not ‘perfect binding’ the normal response is a blank stare and a comment that ‘it is not possible’ and furthermore, if they acknowledge that maybe it is possible, the copy shop assistants, not usually knowing the printing industry very well, will have no idea who might be able to do this.

The next kind of response comes from your traditional offset printer. They will tell you they can make a book but you have to get 200 done, it will take a week, and it will cost you a lot per book on top of expensive set-up costs. When you explain that this is not what you want, they will understand what perfect binding is, and they do know the local print industry, but they will not think doing this is possible or have any idea who might be able to give more information about where to find such a service.

I have been through this process many times. My advice is – it can be done. You can find, in most large cities, printers that will print a book in hours and print it cheaply. Recently in Paris, we had 50 books (300 pages) printed for 6 Euros each, no setup costs, and delivered in less than 20 hours. It could have been faster if we had fewer printed. Often 1 book can be done ‘on the spot’. So don’t give up. It’s perfectly possible to get the job done: the hardest part is finding the people who can do it…

 

FLOSS Manuals and the Pursuit of Funky Docs

It is easy enough to point out what is wrong with something and harp on about how it should be. It’s another issue to actually do something about it.

To resolve this, I am involved in a not-for-profit foundation called FLOSS Manuals. We are a community of free documentation writers committed to writing excellent documentation about free software. Anyone can join FLOSS Manuals and anyone can edit the material we publish. All content is licensed under a free license (the GPL).

When we started (the actual point of genesis is hard to determine but we officially launched in October 2007), there was, and still is, no good publication platform for collaborative authoring. Some may say that there are too many Content Management Systems already and surely, SURELY, there must be a CMS to meet our needs?

Well, no. The closer you get to identifying the needs of collaborative publishing systems, the further you stray from the functionality of most Content Management Systems. So we have hacked our way into the wonderful TWiki and developed our own set of plugins. TWiki has proven to be a very good platform for online publication. It has all the structured content features and user administration that make it a good shell for authoring collaborative content. What was missing, and what is missing from other CMSes, is good copyright and credit tracking, easy ways to build indexes, and a nifty way to remix content.

However, we have remedied that now with our own custom plugins (which are available through the TWiki repository). There are still some things we need, in fact it’s quite a long list, but piece by piece we are turning TWiki into a publication engine. Currently, we are working on translation workflow features (also in plugin form).

Remixing
So, the word ‘remix’ may have caught your eye and you may have fleetingly thought ‘remixing manuals?!’. It might not seem intuitive at first glance but there are a lot of very good reasons why manuals are excellent material for remixing. I don’t mean remix in the William S Burroughs sense of cut-up… we do cherish linearity in the world of free documentation. I mean remix as in “re-combining multiple chapters from multiple disparate manuals to form one document.” Doing this enables you to create manuals specific to your needs whether they be for self-learning, teaching, in-house training or whatever purpose.
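
The idea of re-combining chapters can be sketched in a few lines of Python. This is only an illustration of the concept: the `MANUALS` dictionary, the manual and chapter names, and the `remix` function are hypothetical and say nothing about how the FLOSS Manuals plugins actually work.

```python
# Hypothetical in-memory library: manual name -> {chapter title: HTML body}.
MANUALS = {
    "audio-editing": {
        "Installing": "<p>Install steps...</p>",
        "Recording": "<p>Press record...</p>",
    },
    "audio-mixing": {
        "Installing": "<p>Setup steps...</p>",
        "Mixing": "<p>Use the mixer...</p>",
    },
}

def remix(selection, title):
    """Combine the selected (manual, chapter) pairs, in order, into one HTML document."""
    parts = [f"<h1>{title}</h1>"]
    for manual, chapter in selection:
        body = MANUALS[manual][chapter]  # raises KeyError if the chapter does not exist
        parts.append(f'<section class="chapter"><h2>{chapter}</h2>{body}</section>')
    return "\n".join(parts)

# Chapters keep the order given by the person remixing, so linearity is preserved.
doc = remix(
    [("audio-editing", "Recording"), ("audio-mixing", "Mixing")],
    "Audio workshop handout",
)
print(doc)
```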

The FLOSS Manuals remix feature (http://www.flossmanuals.net/remix) enables the remixing of content into indexed-PDF and downloadable-HTML (in zip or tar compressed form) with your own look and feel (CSS). Now we have also added a Remix API. This means that you can remix manuals and include them in your website by cutting and pasting a few lines of HTML – no messy FTP necessary…

This part of FLOSS Manuals is new and in test form, but it works very well and the possibility for combining remix with print-on-demand is an obvious next step. It can be done now as print-on-demand services use PDF as their source material, but the trick is in getting it to look nice in print form…

Print on Demand
In addition to the free online manuals, FLOSS Manuals material is also turned into books via a print-on-demand service. The books look very nice, having been tweaked to look good in print, and they are available at cost price (we don’t put any mark-up on the books so they cost what the print-on-demand company charges to produce and send to the buyer). This is pretty exciting and I hope that we will soon see FLOSS Manuals on the bookshelves of retailers: bookshops, after all, are a very important promotional venue for free software.

I find that the books themselves actually get the idea of what FLOSS Manuals is doing across very effectively to most people I talk to. Imagining a website is one thing, but handing over a book sparks the understanding and gets people excited. So books are an excellent promotional medium for FLOSS Manuals as much as for the software (it’s a symbiotic relationship after all).

I imagine print-on-demand will play a bigger role in the future of FLOSS Manuals. There are many possible paths, but, in the end, it comes down to capacity and we are at this stage a very small organisation. If you wish to get involved with this (exciting) part of our evolution then let me know…

Quality Control
Lastly, a word on quality. The manuals aim to be better than any available documentation (sometimes this is not hard as there is often no other available documentation!) Keeping this level of quality has some interesting issues when working with an open system. Anyone can contribute to FLOSS Manuals – it is completely open. You need to register but this is not a method for gating contributions, it is there so we can abide by the license requirements of the GPL to credit authorship. Additionally, credit should be given where contributions have been made so we also credit modifications in the manuals.

SPAM is an obvious issue with an open system, as is the possibility of malicious content. Incorrect or malicious information in Wikipedia might lead you to quote the wrong King of Scotland or may misinform you about the origins of potatoes, but incorrect information in documentation might lead you to wipe out your operating system. So we separate the ‘back end’ – where you can write manuals – from the ‘front end’ – where you can read manuals.

Manuals in the ‘WRITE’ section (http://www.flossmanuals.net/write) are in constant development. However, the same manual linked from the front page will be in the ‘stable’ form. This is managed by some existing TWiki tools that we twisted together to form a simple one-step publishing system. It works like this – every manual has a Maintainer. A Maintainer is a person – a volunteer – who keeps an eye on that particular manual. Edits and updates carry on through the WRITE section by anyone that wishes to contribute. When the Maintainer thinks the manual is in good form and an update is appropriate, they push the ‘publish’ button and all the material is copied to the ‘front end’ version of the manual.

This way, the reader gets stable reliable documentation, and the writers can continue working on those docs without the reader being confronted by half-finished content etc. It’s a simple and effective system.
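
The draft/stable separation described above can be modelled in a few lines. The following is a minimal sketch in Python, not the TWiki implementation: the `Manual` class, its method names, and the rule that only the Maintainer may publish are assumptions made for illustration.

```python
import copy

class Manual:
    """Keeps a draft that anyone can edit and a stable copy that readers see."""

    def __init__(self, title, maintainer):
        self.title = title
        self.maintainer = maintainer
        self.draft = {}    # chapter name -> text, the 'WRITE' side
        self.stable = {}   # chapter name -> text, the 'front end' side

    def edit(self, chapter, text):
        # Edits only ever touch the draft, never what readers see.
        self.draft[chapter] = text

    def publish(self, user):
        # Assumption: only the Maintainer may push the 'publish' button.
        if user != self.maintainer:
            raise PermissionError(f"{user} is not the maintainer of {self.title}")
        # One step: copy the whole draft over the stable version.
        self.stable = copy.deepcopy(self.draft)

manual = Manual("Example Manual", maintainer="alice")
manual.edit("Introduction", "First complete draft.")
manual.publish("alice")
manual.edit("Introduction", "Half-finished rewrite...")
print(manual.stable["Introduction"])  # readers still see the published text
```

Because publishing copies the draft rather than pointing at it, later edits cannot leak into the stable version until the next publish.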

How you can help
Good free documentation is a necessary component of all good free software. If you can’t program or don’t want to, but you love free software and want to help, then help make free documentation!

Knowing where to contribute is now easy! You can:
read manuals – http://www.flossmanuals.net
write manuals – http://www.flossmanuals.net/write
or remix manuals – http://www.flossmanuals.net/remix

We have a growing number of very talented contributors and Maintainers and good manuals available online, but we need more manuals and more contributors. Contributing is pretty easy, and if you would like to be a part of helping create good manuals, then register with the project (http://www.flossmanuals.net/register) and read our manual on FLOSS Manuals (http://www.flossmanuals.net/flossmanuals).

Anyone can contribute. You can spell-check documents, tidy up the layout, suggest ways of improving docs, test/review material, design icons, write or improve any material. Contribute in any way that you can and you will be helping not only to make the documentation better, but you will be assisting in the development and spread of free culture and free software.