October 2014 – Adam Hyde

PLOS

I’m working on a platform with the Public Library of Science (PLOS) in San Francisco. I’m Designer and Product Owner, and working with a talented team of approximately 15 full-time people. We are creating a platform for the production, processing, and publishing of science. It is a very versatile platform and could easily be utilised for many other purposes. Over the next months I’ll be blogging a little about some of the approaches we have adopted and highlighting some interesting technical solutions. The platform will be Open Source.

The platform is an HTML-first environment and includes ingestion of MS Word (and other formats) and conversion to HTML. I first presented some information about these strategies at Books in Browsers V last week in San Francisco. The video of my presentation can be found around the 26th minute here:
http://www.ustream.tv/recorded/54426830

Staticness as a Symptom of an Unwell Book

In the past few years, I’ve been constructing a set of practices around knowledge production. Its been a Lego-like process. I add one brick, move it a bit, choose one of another colour and try and work out where it fits… It’s not so much a process of deconstruction of publishing as the construction of something else. Mainly because I don’t know enough about publishing to deconstruct it, so I have to start with what I know.
Sometimes, however, I realise just how odd that construction is. Usually, this occurs when I see an articulation of ‘how things are’ in ‘the real world’ and I realise… oops! I don’t at all relate to that or see the sense in it. That occurred recently with a discussion on the Read 2.0 list. Someone made a throwaway comment about how books might be changing and one day we might not think of them as static objects. A few comments followed about what the future of the book might be. I was left feeling very much on the outside in my Lego- constructed world. The only thing I could add to this conversation would have to pull apart the founding assumptions of the future ponderings – and I just didn’t know where to begin.

Books are mostly static objects in this world. You make them, ship them, consume them. Next. However, my experience with FLOSS Manuals is that this is exactly what we are trying to avoid. Since 2006, we have been avoiding staticness – rather the aim was, and is, to keep books alive. To a certain extent manuals about software present an obvious case where the value of ‘live’ books is evident. However, I don’t think that advantage is restricted to books about software. Books should be living entities and grow with time, expanding or contracting with input from many people.

So, staticness, through the lens of FLOSS Manuals and a ‘living book’ practice is actually a symptom of an ‘unwell’ book. A book that is not growing is a neglected work. It is left alone on the shelf to gather dust and die, where, by comparison, healthy books are attended to. They have growth spurts, or sometimes slower, prolonged periods of affection. They may fork, or become a central discussion, they might transit into other contexts entirely, or traverse languages. They are alive and more useful to us, vibrant and engaging. They also reveal the fundamental humanity behind the text… the living book as a conversation between living beings. A book, at its best, is a thriving community.

So, I have learned to look for staticness, and when I find it I literally get sad. I see this as a failed work, something that we were not able to diagnose, or failed to get to in time. At the same time, each failed work is a study and we have much to learn about how and why books die.

I think it’s important to learn to look for staticness as an early symptom of a failed book.

Building Book Production Platforms p2.

Amongst the core requirements for a book production platform are the source file format and the editor, and of course, these are intimately linked. The development team is usually faced with choosing the format first, then the editor.

Choosing a format

The choice is pretty much HTML? or not HTML?

Currently, HTML is the ruling choice of format for a web-based book production platform. HTML is native to the browser and has associated standards-compliant support, such as CSS and javascript. Inversely, not choosing HTML puts you in a bit of a hole and can create a lot of overhead.

It might be interesting to look back a little and learn from some others since there have already been projects in this space that started down non-HTML roads and then gave it up for HTML. Kathi Fletcher, originally the project manager and technical director for Connexions (now OpenStax) which built a custom XML editing environment for academic materials, later researched in-browser XML vs HTML editing environments for her Shuttleworth Foundation-funded OERPUB project. Kathi became convinced HTML was the way to go and did some great work on HTML editor usability with the Aloha HTML editor.

We have chosen to use HTML5 as the canonical format for open textbooks, because developers and tools are more plentiful for web technologies than XML technologies.

http://www.w3.org/2012/12/global-publisher/statements-of-interest/29-oerpub.html

The (closed source) O’Reilly Atlas platform also started with the complex AsciiDoc format (a form of markdown) and eventually awoke to the power of HTML in 2012.

HTML5-based authoring offers a streamlined production workflow for producing both print and digital outputs, facilitates “digital first” content development, and is a perfect fit for creating a WYSIWYG, web-based writing experience.

http://radar.oreilly.com/2013/09/html5-is-the-future-of-book-authorship.html

They then got an extra dose of religion and started a project called HTML Book which is a suggested ‘spec’ for a subset of HTML elements to be used in books.

So far I have not seen a book production platform travel the reverse direction, from HTML to something else. Instead, we are seeing more and more platforms start with, or change to, HTML as a source file format.

Markdown

Markdown is sometimes put forward as the way to go but I’m not going to go into that in too much detail here. I have talked about this elsewhere. The only additional thing I will say is that markdown causes even more issues for book production platforms than those included in that article. Namely, in an in-browser markdown environment, the markdown will most likely be displayed as rendered HTML next to the authoring pane. That is a huge amount of lost screen space and extra UI junk for no apparent gain. Think of the UX cost. If you don’t have that rendered display then you will most likely only see pure markdown in a text field with no rendered display. The user won’t really know if their document looks right until it is rendered somewhere down the line, which is also a tremendous cost to the user for no apparent gain. Markdown: all pain, no gain.

NB: There is a possible good use case for markdown as a helpful add-on for HTML WYSI editors but I will cover that later.

LaTeX

There is a more valid use case for LaTeX in the browser since some scientists and academics will never use anything else, and you’ll never convince them to adopt HTML regardless of the benefits. You are up against the great Church of Knuth and I don’t fancy your chances. If your audience is comprised of LaTeX addicts, then I think you have no choice other than to support that.

Many times I have talked about remedies for unstructured MS Word documents (for scientific manuscripts) only to have someone earnestly comment that if everyone just learned LaTeX we would be in a much better position… They might be right, but I’m pretty sure it’s never going to happen.

The preference for LaTeX is a legacy issue, and problematic, but needs to be dealt with. (Unfortunately, today’s Markdown heroes are growing legacy issues like this with each passing day, and that is going to cost us down the road).

Recently there has been some interesting work on in-browser LaTeX editing including the (closed source) Authorea platform and, most notably the (open source) ShareLatex platform. ShareLatex round trips the LaTeX syntax displayed and edited in a text area (in the browser), renders that to a bitmap on the server, and returns it to the browser for a side-by-side ‘WYSIWYG’. The effect is that you can see a just-in-time rendered view of the LaTeX as you type. It’s a neat trick and effective if you insist on LaTeX in a web-based platform. Then you just have to live with the UI costs. However ,you only need this approach if you wish to support the full LaTeX syntax. If you wish to just support LaTeX equations, you can use an HTML editor with a LaTeX plugin based on MathJax or the Khan Academies KaTeX(and there are some other solutions such as Mathoid).

Incidentally, if you need to support full LaTeX I highly recommend checking out ShareLaTeX over WriteLaTeX. They both have the same approach but WriteLaTeX is proprietary whereas you can pick up the ShareLaTeX code and integrate it straight away. You could even build your own ShareLaTeX-like interface, it’s not too tricky – together with a colleague – Rizwan Reza – and I (Riz did all the hard work) we managed to develop a workable prototype in about 2 days, but there are many gotchas setting up the LaTeX compiler correctly.

Not many book projects need LaTeX, so I will leave this as an interesting edge case. There are solutions if you need it, but not many people need it.

XML

I think I will just leave it to the words of the brilliant Dave Cramer (Hachette Book Group):

So we’ve chosen to describe our content with HTML, and build our production system around HTML.

When I tell people that, they smile condescendingly, and chuckle a bit. “That’s cute. Why don’t you use real XML?”

I then ask them what you can do in Docbook (or TEI, or NLM) that you can’t do in XHTML? I haven’t heard a good answer to that question yet. XHTML is XML, by definition. Calling something “para” rather than “p” doesn’t get you anything, except carpal tunnel syndrome and invoices from consultants

The problem with non-HTML XML is that it is essentially just XML the browser can’t use. Hence you lose all that other good stuff like WYSI editors, CSS design tools, cool tricks with JavaScript, and all the cool tools that are being developed for HTML. XML just can’t compete, plus you are going to need to convert the XML into HTML anyway. So don’t make life more complicated than it already is – continue your love affair with XML as long as it’s XHTML!

HTML

HTML is king in the browser and it gives you all you need to make books. I don’t want to spend a lot of time arguing the merits of HTML in this post as there is a lot to say and I want to bring that in at other points of the conversation. But in brief:

HTML is supported by JS and CSS.
The DOM is known natively by the browser.
HTML is standards-based.
It is straightforward.
HTML is easy to read and easy to clean.
HTML is the most popular file format on the planet.
You can use HTML to build structure in documents with assigned class and id values, or microdata formats.
HTML is the native file format for EPUB.
PDF can be rendered directly from HTML in the browser (more on this later).
HTML can be paginated in the browser.
CSS is moving towards supporting more and more page based elements.
The browser can act as a design environment.
You can create real what-you-see-is (WYSI) production environments.
Basic editing is built into the format itself.
HTML is supported by an enormous number of tools for conversion (in and out).
HTML is supported by an enormous repository of examples (the web).
HTML is cheap to develop with.
Even book designers are getting used to it.
Some schools teach it.
It has a million free tutorials online to help you use it.
A lot of people know HTML.
HTML is supported by a rapidly proliferating body of JavaScripts for typography, graph production, animation, interactions, dynamic rendering etc etc etc etc

The basic idea really comes down to this.

HTML is the cheapest format of our time.
HTML is the most popular format of our time.
HTML is the networked document format of our time.

Increasingly HTML is the way stories are told, whether that is in books or on the web. It’s a trite analogy perhaps, but HTML is the paper of our time. As Dave Cramer says:

why start with something other than HTML, when you have to turn it into HTML anyway?

It should be noted that Cramer also turns HTML into paper, and the Hachette Book Group have produced many beautiful paper books using HTML as the source format. Many of these books you will now find in the best-selling sections of your local brick and mortar bookstore.

Other print producers are also using HTML as the source. Print-on-demand services, used to producing very ugly books by ingesting MS Word and dealing with all that ugly conversion, are also adopting HTML production environments. Books on Demand, Germany’s largest Print on Demand service, adopted Booktype so their customers could have an easy in-browser book production environment. The source format is HTML but the users don’t know that, and the books look better. That’s the beauty of HTML.

Finally, helped a lot by the efforts of Dave Cramer and the Hachette Book Group, Sourcefabric, the people at O’Reilly, and others adopting HTML, we might be starting to see the very beginning of the changing of the guard.

HTML is the way to go for Book Production Platforms. If you choose another format you will find you inherit a lot of costs and additional overheads and, sadly, you will soon be left behind. There is just no format going forward at the same speed as HTML. Not even close. So, my advice is to first ask the question – can HTML do what you need? Push your team to answer that question. Will format X give you anything HTML can’t? As an exercise ask your team to prove HTML is a bad choice, and if the answer is not-HTML, then contact me and let me try and talk you into it!

Building Book Production Platforms p1.

There are a number of web-based book production platforms out there at the moment. Booktype was the first. Its birth was around 2006-2007 and it started as a series of extensions (written in Perl) to the TWiki platform. TWiki was, and is, an old school wiki with a very pluggable architecture.

Booktype is now a stand-alone book production platform built with Python (Django). I founded the project and kept it going for many years, and I’m very proud of it. It is free software and now has a permanent new home at the Open Source, Berlin-based, development house Sourcefabric. I’ve since moved on to think about similar issues for the Public Library of Science.

Since Booktype, many platforms have come out of the woodwork including Atlas (closed source), PressBooks (semi open), gitbook, Inkling, PubSweet, Lexicon, Penflip (closed), Gitbook.io, FastPencil (closed), and many many others.

Seeing these evolve has been interesting. Most interesting is that I see a lot of mistakes still being made which we made in the early days of Booktype. I’m not saying Booktype is perfect, it is far from it, but I wanted to document what I consider to be a few high-level gotchas for those wanting to build this kind of platform. We need more book production platforms that improve the game. We need to avoid making the same mistakes over and over, so I hope the following will give those building tools from scratch at least something to think about.

This article is first in a series. It should be noted that I am targeting my comments at those that wish to build Open Source book production platforms. I’m not terribly interested (actively disinterested), in closed source platforms. Hence the narrative encourages you, the Open Source development team, to consider book production platforms as a distinct category of software – don’t use existing software such as MediaWiki, or WordPress etc to do the job.

Stand-alone platforms

I firmly believe that book production platforms should be built as stand-alone applications dedicated to the work at hand – building books. Book production platforms should be specialised software that ‘thinks like a book’. Your users want to make books, not blog posts or wiki articles, and they deserve a platform that is built with their goal as the central premise.

Don’t learn this the hard way as I did. Booktype started with a wiki and it did a pretty good job. At the time TWiki could be considered as a kind of rapid prototyping framework for Perl, much like Django or Rails is for Python and Ruby respectively. It provided essential login features and user management, plus a host of plugins to extend functionality with diffs, reports, permission management etc. It did pretty well for a while. However, the wiki-like paradigm soon started to grow old. Initially, I had pulled the guts out of the wiki and replaced wiki markup editors with HTML editors (more on that later) and did a whole lot of crazy URL redirects using .htaccess to get nice book-like book/chapter URLs and attending structure. That was OK, but when it came to other functionality that books needed, then the wiki paradigm was sagging pretty noticeably. It also took increasing amounts of time and effort to build an interface that was clearly intended for making books instead of wiki pages, not to mention the issues with updating a core application framework for security or performance improvements when the internals had been hacked so extensively it no longer really worked the same way as the original application.

A bit of template paint helped with some of these issues, and I commissioned some programmers to extend TWiki to fill in these book-wiki gaps, but, after a while, it became obvious it would be better to build a dedicated standalone book production platform from scratch. Given all my experience since then, I am convinced that if you want to make books you should have a software built specifically to do that.

Paradigm differences

Book production platforms are a specific category of software as the functional needs are quite different from those for producing wikis, blogs, or other types of CMS. When building book production software there are essential paradigmatic differences that just force you out of the existing categories and into the new…

The following are some examples, and I will document one or more at a time as this series continues. First of all, the Table of Contents.

Table Of Contents

Any book production software needs a way to create and manage a Table of Contents (ToC). Neither wikis, CMS, nor blogs have the need to structure content in this way: they are designed for non-linear, contextual navigation (you click on a link from within a paragraph to be taken to somewhere else). The closest these other software paradigms get to a Table of Contents is through navigation bars and breadcrumbs and hacking together ‘lists’ of associated content.

However, books have a very linear structure so one thing follows another in a vertical narrative. That’s not to say you can’t break the structure. However, if you intend to create paper books or electronic books, then it is very difficult to escape this very strict linearity. An easy and fluid way to manage this vertical linearity is a must for any book production platform.

There have been attempts at making this work with other types of software. Mediawiki makes an attempt to do this with the concept of ‘collections’. A user can save references to articles in a list then jockey them into order. Other software, such as PressBooks, has attempted the same with blogs – organising blog posts into a list. But the solutions are never satisfactory. It always feels like a compromise – the poor user is asked to work with something that feels like it’s actually built for another job, and it is.

A Table of Contents interface should be the heart of a book production platform, and the platform should own that space. There needs to be a fluid and fast interface for creating new chapters, moving items around, quickly seeing the structure of the book, and if you like the idea of concurrent production, then it needs also to be able to show at a glance who is working on what. The user needs to be able to add and remove chapters at a click, move chapters outside of the ToC while keeping them available for possible later use, or rename them – right from the ToC without the need to dig down into individual chapter settings. Additionally, the changes in structure need to be dynamic: updating immediately for every user without the need to refresh. The status of individual chapters needs to be ascertained from a single glance at the ToC. The interface for book production should be fast, clean (separate from additional blog or wiki cruft), and feel like it is an interface built specifically for the job it is attending to.

The ToC is how people get an overview of a book, and book production software necessarily gives them the tools to manage that overview. So start with the Table of Contents as a central UI element, and let the user go from there. The user will then feel that the platform is all about managing their book. The UI will communicate ‘book! book! book!’…not ‘wiki!…err…book!…errr…wiki!’. A ToC as a central motif communicates the purpose of the platform and gives the user a tool that gives an immediate overview of the work at hand, and the output feels like a book from the very beginning.

Book production platforms need to have a Table of Contents interface as a central part of the paradigm, and this should be reflected in the UI – it should not just be something that has been tacked on as a ‘workable solution’. Start building something to manage a ToC beautifully and work out from there.

Typography JavaScripts

A list of Javascripts for typography.

Hyphenation

Sweet Justice
License: BSD
Code: https://github.com/aristus/sweet-justice
WWW: http://carlos.bueno.org/2010/04/sweet-justice.html

Hyphenator
License: LGPL
Code: https://code.google.com/p/hyphenator/downloads/list
WWW: https://code.google.com/p/hyphenator/

Hypher
License: BSD
Code: https://github.com/bramstein/hypher
WWW: http://www.bramstein.com/projects/hypher/

Font Resizing

FlowType.js
License: MIT
Code: http://github.com/simplefocus/FlowType.JS
WWW: http://simplefocus.com/flowtype/

Squishy
License: unspecified (argh! please include a license file!)
Code: https://github.com/lemonmade/squishy
WWW: http://cmsauve.com/projects/squishy/

Fittext
License: WTFPL
Code: https://github.com/davatron5000/FitText.js
WWW: http://fittextjs.com/

SlabText
License: MIT
Code: https://github.com/freqDec/slabText/
WWW: http://freqdec.github.io/slabText/

Responsive Text
License: MIT
Code: https://github.com/ghepting/jquery-responsive-text
WWW: n/a

Line Spacing

Typeset
License: unspecified (argh… )
Code: https://github.com/bramstein/typeset
WWW: http://www.bramstein.com/projects/typeset/

Kerning

Kern.js
License: WTFPL
Code: https://github.com/bstro/kern.js
WWW: http://www.kernjs.com/

Kerning.js
License: MIT (license file is in the wrong place – check the README)
Code: https://github.com/endtwist/kerning.js
WWW: http://kerningjs.com/

TypeButter
License: CC-BY-SA 3.0
Code: https://github.com/hudsonfoo/typebutter
WWW: http://typebutter.com/

Drop Caps

Color

Color Font
Licence: WTFPL
Code: http://manufacturaindependente.com/colorfont/media/js/colorfont.js
WWW: http://manufacturaindependente.com/colorfont/

Font Tricks

Lettering JS
License: WTFPL
Code: https://github.com/davatron5000/Lettering.js
WWW: http://letteringjs.com/

jqisotext
License: MIT & GPL
Code: https://code.google.com/p/jqisotext/downloads/list
WWW: http://workshop.rs/2010/01/jqisotext-jquery-text-effect-plugin/

Arctext.js
License: MIT
Code: http://tympanus.net/Development/Arctext/Arctext.zip
WWW: http://tympanus.net/codrops/2012/01/24/arctext-js-curving-text-with-css3-and-jquery/

Base Lines

Baseline.js
License: WTFPL
Code: https://github.com/daneden/Baseline.js
WWW: n/a

Baseline CSS
License: CC-BY-SA 3.0
Code: http://baselinecss.com/download/baseline.zip
WWW: http://baselinecss.com/

HUGrid
License: GPL
Code: http://bohemianalps.com/tools/grid/HeadsUpGrid_download.zip
WWW: http://bohemianalps.com/tools/grid/

More coming…