HTML Typescript – redistributing labor

Apparently small changes in writing technologies can radicalise publishing workflows. Matthew Kirschenbaum makes a good case for this in his book “Track Changes: A Literary History of Word Processing“. There are a number of cases in the book that illustrate this. I’m about 1/3 of the way through and the example that astonished me was related to Issac Asimov.

The setting is mid 1981 and Asimov has just encountered his first word processor at age 61 (Kirschenbaum places 1981 as, pretty much, the start of the word processing revolution). Previous to this, Asimov had written with a typewriter. He was extremely fast and hence many errors were made that could not be corrected easily with a typewriter. Errors, in those days, were the problem of the copy editor who marked the manuscript up with corrections.

As it happens, Asimov finds typing faster with the new electronic word processor and so he makes more mistakes. The copy editor then gets frustrated and in turn frustrates Asimov with many more clarifying queries about the errors made. Dr Asimov then starts correcting his own text using find-and-replace to correct common errors…

“cold” suddenly becomes “could” and no sign exists that it was ever anything else.

The result is cleaner copy which elicits fewer inline queries from the copy editor. Right there, publishing workflows changed and labor was redistributed from the copy editor to the author.

A migration from a typewriter to a word processor is not a small change. However, on a higher, more abstract, level the ability to correct content ‘as you go’ is not a huge cognitive leap. When the technology enabled this, however, the change it effected was great.

Small design changes can have huge consequences for publishing workflows to the point of redefining roles. With the introduction of the word processor came the ability to change copy easily and consequently the author assumed more of this role and the copy editor’s role was redefined.

I think we might have something similar in the systems we are building at the moment – particularly Editoria, the platform for scholarly monograph production. I’m not claiming the system will have an effect on the same scale of the introduction of word processors, rather that a few, apparently small, design decisions will have a large impact on the workflow of those using the system. This change is not just about efficiency, which is certainly what we are aiming to achieve, but how things are done ie. a similar redistribution of labor.

In the current workflow of the University of California Press (UCP), with whom we are designing the system, the following steps occur:

  1. The acquisition dept acquires the book from the author in form of a collection of Microsoft Word (MS Word) files in docx format. One file per chapter. These are sent to the production dept.
  2. Production editors (in the production dept) open each file on the premises of UCP where MS Word is installed with custom macros. These macros enable the editors to select a part of the text and apply custom styles (see image below).
  3. The newly styled MS Word files are then pushed through the review cycle featuring the copy editor and author(s).
MS Word with custom styling macros
MS Word with custom styling macros

Step 2, styling, is a long manual process. Also, it can only occur on computers that have these macros installed. Further, the subsequent edits by the copy editor and author(s) might revert some styling changes. So some of the work will need to be done again. Lastly, when the version of MS Word is updated then the Macros must be checked, potentially rewritten to keep pace with the new version, and re-installed on each machine.

So…the small change we are introducing we call HTML Typescript. It is nothing fancy, nothing new, but it is a fresh look at how we can bring offline word processing workflows into the browser. The critical problem being, how to do you get the MS word files provided by an author into a web-based production workflow? In clearer, slightly more technical, terms how do you transform a document from MS Word to HTML?  Well… many people literally say “it can’t be done.” The criticism is not as literal as it sounds and really comes down to what you are trying to achieve. If you wish a one-to-one conversion of an author’s structural intent for the production a correctly marked up HTML document, then automated conversion through any process is not going to achieve this. Not even the very complex processes that go from docx to structured XML (of some type) and then to HTML, which is the approach file conversion specialists prefer, will achieve this.

This is where we think HTML Typescript solves some very interesting problems.

HTML Typescript offers a reframing of the problem that, consequently, enables processes that closely match the publisher’s current workflow. It also offers the potential to radically redefine that workflow.

Docx is a type of XML. It is, laughably, called OpenXML by Miscrosoft. Laughably, because while you can unzip a docx file (a docx file is a zip file containing a simple directory structure and some files), you can open the document.xml and poke around with it (see video demo below), but it is incomprehensible. There are two critical problems that contribute to this:

  1. The XML is pretty unreadable by humans. It is dense and verbose and very very messy.
  2. Due to the editing environment (MS Word) and its inability to constrain authors from using ‘font size 12, bold’ instead of marking the text correctly as a Heading Level 2 (for example), the resulting styles applied by the author, and described in the XML, are erratic at best.

Number 2 is the big problem, and issue number 1 helps obfuscate it.

So, when you want to transfer from docx to HTML, it is very difficult to determine the author’s strucural intent without looking at the original display version of the MS Word file. Is this indented single line marked ‘font size 16.6, bold’ a heading 1,2, or 3…or is it meant to be a block quote? For example…

This is exactly why the macros mentioned above must be used, because named styles are not used ‘upstream’ by the author. So the production editors must work their way through each file and apply the correct designated styles using the custom macros.

With XSLT, the conversion process of choice for file conversion pros, it is very possible to convert from docx to HTML. Nothing impossible there. You don’t end up losing content (there are a few gotchas for this which I will cover in later posts) but you do end up with messy unstructured HTML. This is because the original docx is messy and unstructured. This is what people mean when they say ‘it can’t be done’ ie. you cannot infer all the structural intent of the author from the docx file, because it is a mess, and by some magic subsequently convert it to lovely clean, structured, HTML.

But, that is actually ok. Converting to ‘messy’ HTML really puts you in about the same position as having a messy docx file. They are both equivalently unstructured.

File conversion specialists don’t like this position. They want, by their very nature, documents to be nicely structured. HTML does not, in its raw form, stipulate much structure. Sure you can have headings 1,2,3 but also you can have, as MS Word does, arbitrary font sizes and styles that look like a heading, but the underlying markup doesn’t explicitly state this is the case. In addition, HTML doesn’t worry about document sectional structure too much. What if you wish to identify a section of the text as a ‘method’ section? How do you deal with that? Custom XML can deal with this as you can design a mark-up structure to suit. But HTML, as it is described by the standards, doesn’t enable this (yet…web components will change this to some degree). However, with plain ole HTML you can use any sort of div or span you like to wrap around sections of a HTML document and call them what you like. For example, you can have ‘div class=”method”‘ and there you go…the section is defined, for your intents and purposes, as a method section. This does not have much currency in the outside world (the world outside your platform) since there is no standard way of describing a method section in HTML, but for the purposes of creating production systems, this is really ok. We can work with this. The important thing is that we can now apply document structure with HTML should we wish to. When we wish the document to leave the system we can make that conversion at export time (more on this below).

So… Where does that leave us? Ok… well, at the time of conversion we can move from docx to HTML and make some ‘educated’ guesses as to what the author’s intent is. For example, if there is one single line text with ‘font size 24, bold’ and then 16 of ‘font size 20, bold’ and 6 of ‘font size 14, bold’ and then a bunch of sentences groups with font size 12 – then we can map these to heading 1,2,3 respectively, and the last being standard paragraphs. It is not perfect -it will not catch all structure, and the process will make incorrect inferences. However, it will get us part of the way there. So we can already start improving on the structure of the MS Word file automagically.

Arguably, we are in a better position with an initial rules-based clean up of the file structure as it passes from docx to HTML. The good thing is that we can improve these rules over time. In time, the automated conversion will produce better results.

After a conversion of this kind, we have a partially structured file. This is where file conversion specialists often leave the conversation. They don’t like partially structured anything. It is not in their DNA. That is because they are primarily concerned with conforming file A into structure X. Their metric is ‘ is it well structured?’. It is a pass/fail binary. If you are to look primarily at whether file A or B is well structured then you want well-defined schemas that describe the structure, and you want files that subscribe perfectly to that schema. Anything that falls out of this is a fail. Partially structured documents are a fail.

But with the HTML Typescript approach we see partially structured documents as a strength, not a weakness. We know it is not possible to get to perfectly formed documents in one go, so rather than consider this a fail we accept we must get there progressively. That is the fundamental principle behind what we are calling HTML Typescript. It is the use of HTML as a document format that can be progressively improved to get to the structure you desire over time using both machine and manual processes.

HTML is the perfect format for this. HTML’s lack of formal structure, along with its ability to define any kind of structure you want, enables us to progressively add any kind of structure we need to a document. One part of this process is the automated clean up at conversion time, and the next part of the process is where it starts getting interesting… this is the manual application of structure.

This is where the apparent weakness of HTML becomes a strength – we can manually add structure over time. We can progressively structure the document. For this process, we can build, and the Coko team are building, a suite of tools in the production environment so that a production editor (for example) can click on an element (eg a heading) and choose the style they wish to apply from a menu – similar to how they currently work with macros.

The advantages of adding the correct structure in the browser vs MS Word? Well, firstly, as mentioned above, we can computationally improve the structure before the manual process. This results in less work to do. Secondly, we have complete control over the tools available to the platform’s distributed citizens (I don’t like the term ‘user’ and so are experimenting with other terms. Let’s try citizen for now). Hence we can make the tools available to everyone, not to just those citizens with the MS Word macros installed. That means:

  1. There is no need to update the (equivalent of) macros against the underlying desktop software version across many machines.
  2. If I wish to update the features that enable styling, then all users can leverage these updates immediately.

So, as a publisher, I’m not stuck in the harrowing, expensive, cycle of continual software upgrades and installs against random (or planned) updates of MS Word that maybe conducted by my org. That is already a saving. There is a second tier of savings here – we are building open source systems with the intention that they are used by a large number of orgs. If we share a common platform and common toolkit, we can also share the costs of maintenance and development, and free from vendor upgrade cycles. Each publisher doesn’t have to do their own software development to keep their own styling macros up to date. Yet you can still innovate and develop new features – while any new feature can potentially be available to everyone.

But…more importantly…just as with great power comes great responsibility, we could say that with a shared toolset comes shared responsibility. That doesn’t reference the sharing of costs mentioned above (although it could) but rather it references the roles of the people in the organisation using the toolset. If the tools for styling are available to all citizens, then so might the responsibility for using those tools. Much like Asimov noticed he could now do some of the copy editing, so might a publisher re-distribute the work of styling a text. This is where a small design decision might have a large impact. Publishers that are forced to design workflows based, literally, on where the tools are (what machines the macros are installed on in this case), can now design workflows dependent on who they think would be best placed to do the work. That is pretty interesting. Such a small design decision might actually cause pretty radical changes to workflow.

There are a couple of things I want to comment on with regard to this. Firstly, achieving efficiencies like the above are only available to you if you design systems rather than software. Designing software is a technical endeavor, designing systems is not. Designing systems requires an understanding of what is trying to be achieved, by whom, and with what constraints. It is very difficult, for example, to imagine and design a progressive approach to structuring files if you are simply focused on building software that produces well-structured files. It sounds like a distinction without a difference, but it’s not. The difference is profound.

Secondly, what I have described above requires, in this implementation, HTML Typescript. But HTML Typescript is not a format, it is merely a way of using an existing format – HTML – and seeing its apparent weakness (no strict formal structure) as a strength. So HTML Typescript is not really a technical solution so much as it is an approach — one which enables us to build interesting systems to solve the long-standing problem of ‘getting out of MS Word’ and into the web.

Lastly, there may be some concern that this kind of approach doesn’t get us to structured formats like JATS (or whatever) that may be required by your content sector. Well, that’s not a big deal. You can design your version of HTML with all the structure you need to make an easy conversion to whatever XML format you want. That’s not so tricky. HTML can contain that structure and you can build the production tools (the web-based equivalent of the MS Word macros) to suit. In other words, end-of-line formats (for distribution or syndication) should remain an end-of-line problem. When you press ‘export’ (or whatever), then that is the time, presumably, when the document is nicely structured, that you convert it to whatever you need in a relatively straightforward conversion.

In summary, I’ve been kicking around in this area for a long time and feel that there are some ideas that are surfacing that not only solve some tricky problems, but open the door to new possibilities. I think the HTML Typescript approach is one such case and I’m looking forward to finishing the implementation of the HTML Typescript approach with Editoria and trialing in the weeks to come. More thoughts to come.

HTML Typescript is an approach developed by myself and Wendell Piez.

Some publishing systems I have developed

developed_floss

I recently went over some publishing systems with Yannis and Christos from Coko. We looked at various systems and discussed them. As we did this, I realised that I’d actually built quite a few! Although we weren’t just talking about those I had built, I began to think through what I had done right and wrong when building those earlier systems. Each development is a learning process and you will always get things both wrong and right at the same time. The trick is to get less wrong the next time round…

In the ten years that I have been building these systems, I have worked with some pretty talented people, including Luka Frelih, Douglas Bagnall, Alexander Erkalovic, Johannes Wilm, Remko Siemerink, Juan Gutierrez, Lotte Meijer, Fred Chasen, Michael Aufreiter, Oliver Buchtala, Nokome Bentley, Andi Plantenberg, Mike Mazur, Rizwan Reza, Gina Winkler and many others including the entire team of Coko – the most talented bunch of them all. Coko team:

Kristen Ratan – CoFounder, San Francisco
Jure Triglav – Lead PubSweet Developer, Slovenia
Richard Smith-Unna – PubSweet Developer, Kenya
Yannis Barlas – PubSweet Frontend Developer, Athens
Christos Kokosias – PubSweet Frontend Developer, Athens
Charlie Ablett – INK Lead Developer, New Zealand
Wendell Piez – XSL-pro, East Coast USA
Julien Taquet – UX-pro, France
Henrik van Leeuwen – Designer, Netherlands
Kresten van Leeuwen – Designer, Netherlands
Juan Guteirez, Sys Admin, Nicaragua

Alex Theg – Process, San Francisco

All systems except, unfortunately, one (see below) are open source.

FLOSS Manuals

top_read

The first publishing system I designed and built didn’t have a name really. It was the glue behind the FLOSS Manuals English site. FLOSS Manuals was, and is still, a community focussed on building free manuals about free software. I started the development in English but the system needed to be useful to a number of different language communities, of which Farsi was the most interesting. Today, only the French and English communities are still operational.

I built the FLOSS Manuals system in Amsterdam in about 2006. It was based on TWiki, an open-source Perl-based wiki. I chose TWiki because back then it was the only wiki around that had good PDF-generation support – I think this came from some of its plugins. TWiki had a good plugin and template system and I came to think of it as a rapid prototyping application – it was pretty malleable if you knew how. I was a crap programmer so I cut and pasted my way to a system that became usefully functional.

developed_twiki

I leveraged the account creation and permission systems of TWiki, and ripped out the wiki markup editing and replaced it with an HTML WYSIWYG editor (I think it was TinyMCE). So in wiki world terms I had committed something of a crime. Throwing out wiki markup at the time was unheard of in the post-2004 euphoria of Wikipedia. But JavaScript WYSIWYG editors were coming along just fine, and I thought wiki markup no longer made any sense (in fact it had been invented to make making HTML easier than writing raw HTML). But y’know… wiki markup was no easier than using a WYSIWYG editor, despite the sacred status of Wiki markup in the Wikimedia community. And despite my heresy,the Wikimedia did give me a barnstar for my efforts, which was nice of them:)

developed_star

After I reached the limit of my coding skills, a friend, Aco (Alexander Erkalovic), helped build the next bits. I found some money to pay him and that is when things started to move forward. I can’t remember all parts of the system, although it’s still in up and running for some FLOSS Manuals language sites. The core of the system was the manual overview page. This contained a list of all chapters in a manual. You could also set the status, add new chapters, view overall progress etc. from this one page. We had a separate mechanism for creating an ordered table of contents (index) for a manual.

developed_fmindex

Index builder, essentially a dynamic drag and drop mechanism for arranging chapters

developed_fm_overview

The overview page

Essentially, you added a chapter on the overview page and then opened the index builder and arranged the chapters. When saved, the (refreshed) overview page displayed the correct (new) order of chapters. We had to do it like this because back then, in the era of the ‘page refresh,’ it wasn’t possible to synchronise dynamic elements across multiple clients. So we couldn’t have one ‘shared’ index builder that would dynamically update all user sessions simultaneously. Nevertheless, it worked pretty well.

From the overview page, you could choose a chapter to edit. When you did so, you locked the chapter and, through some JS trickery Aco cooked up, we wereable to do this across multiple clients so everyone could see in real time who was editing what.

developed_blog_chapterlist

When editing a chapter, you could save the document and then, when ready, publish it. This way you could have an ‘in progress’ state of a chapter and a published state. At publish time the chapter was copied across to the system’s web delivery cache.

We also added chapter status markers, as you can see from the above image. It was pretty basic but nifty.

Next I hacked in a live chat, initially using IRC (freenode) for a global FLOSS Manuals-wide chat:

developed_irc

developed_irc2-irc-en

Then I hacked a fancy AJAX script into the chapter edit interface so each manual could have its own chat with the interface present while you were editing a chapter. It also looked a little nicer than the IRC channel. From the beginning, I tried to make everything look like it was meant to be there, even if it was a fiddly hack.

developed_adamwindow

FLOSS Manuals also had a lot of other interesting goodies. We had federated content, for example. This was established so one language site could import an entire manual from another and set up a translation workflow.

developed_xchange

We also had a simple side-by-side editor set up for translation that worked pretty well for translators.

developed_fm_metaphorIn addition, we had a remix system for generating new versions using mixed content from other manuals. This was useful for workshops and for making personal manuals, for example.

develped_remix_dropdown

developed_renixed2

One of the cool things about the remix is that you could output in many formats such as PDF and zipped HTML, add your own styles through the interface, PLUS you could embed the remix in your own website 🙂 The embed worked similarly to methods used today to embed YouTube or Vimeo videos in websites (by cut and pasting a simple snippet). The page delivery was ‘live,’ so any updates to the original manual were also displayed in the embed. I thought that was pretty cool. No one used it of course 🙂

developed_remix3

The system also had diffs so you could compare two versions of a chapter. It was good for seeing what had been done and by whom. In addition, it was possible to translate the user interface of the entire system to any language using PO files and a translation interface we custom built:
developed_translateBut by far the most interesting thing for me was building FLOSS Manuals to support Farsi. It was interesting because, at the time, no PDF-renderer I could find would do right-to-left rendering. Behdad Esfahbod advised me to just use the browser to do it. Leslie Hawthorn from Google Open Source Programs Office introduced me to him after I went on a naive search in the free software world for ‘someone that knew something about RTL in PDF’. I was very lucky. The guy is a generous genius. He just suggested an approach a new way to think about it and later I worked with Douglas Bagnall (see below) to work out how to do it. The basic idea being that HTML supported RTL and the PDF print engines rendered it nicely…so…it was my first realisation that the browser could be a typesetting engine.

Implementing RTL in the FLOSS Manuals system was so very tricky, and I was unfortunately on my own for working out Farsi regex .htaccess redirects and other mind-numbing stuff. Just trying to think in right-to-left for normal text did my head in pretty fast, but somehow I got it working.

developed_fm_farsi

Outputs of all language books were PDF and HTML, later also EPUB. I initially used HTMLDoc for HTML-to-PDF conversion. It was pretty good but didn’t really think like a book renderer needs to. This was my first introduction to the overly long wait for a good HTML-to-PDF typesetting engine. Later I found some money and employed Luka Frelih and then Douglas Bagnall to make a rendering engine separate from the FLOSS Manuals site (see Objavi below). Inspired by my chats with Behdad (above) Douglas introduced some clever PDF tricks with the Webkit browser engine to get book formatted PDF from HTML. I can’t remember exactly how he did it but essentially he used xvfb frame buffer to run a ‘headless browser’. In short, and for those that don’t know those terms, he came up with a very clever way to run a browser on a server to render PDF. It did it by rendering HTML to PDF in pages (using the browser PDF renderer) and then rendered another set of slightly larger blank PDF with page numbers etc and embedded one within the other. Wizard. Hence we were able to make PDF books from HTML. It also supported RTL 🙂 I think I need to say here that this whole process was immensely innovative and we did it on a dime. Also, because we refused to use proprietary systems we were forced to innovate. That was a very good thing and I welcomed the constraint and the challenge.

Later we also used WKHTMLTOPDF to make PDF. It worked and we worked with that for a long time. I even tried to start a WKHTMLTOPDF consortium with Jacob Trulson, the founder of the project, it got some way but not far enough (I am very happy WKHTMLTOPDF is still going strong!).

I played a lot with Pisa and Reportlab for generating PDF and finally cracked it when I started book.js (more on this below). Actually, when it comes down to it I think I used everything that wasn’t proprietary to make book formatted PDF from HTML. It was the start of a long love affair with the approach. I promoted this approach for a long time, even calling for a consortium to be formed to assist the approach:
http://toc.oreilly.com/2012/10/the-new-new-typography.html
http://toc.oreilly.com/2012/11/gutenberg-regions.html

As it happens, all this time later I’m still doing the same thing 🙂
http://www.pagedmedia.org

We integrated FLOSS Manuals with the Lulu API (now defunct). This allowed us to generate books automatically from HTML using Objavi (below) and they would be automagically uploaded to the Lulu print-on-demand marketplace for sale…that was amazing! Ah…but also no one used it. Lulu shut down the API as soon as they realised no one was using it.

We made many many manuals with this setup. Many of them printed from the auto-PDF magic and were distributed as paper books. Many of the books were in Farsi. The One Laptop Per Child community even built a FLOSS Manuals app that was distributed on all OLPC laptops with the documentation made with FLOSS Manuals.

developed_books

The system produced loads of manuals and printed books about free software. All free content.

developed_stripofbooks

The FLOSS Manuals platform didn’t have a name. It was a hack of TWiki. While you could get all the plugins online, it would have been a nightmare to deploy. I did deploy it many times but I essentially copied one install to another directory and then cleaned out all the content and user reg etc. and went to town rewriting all the .htaccess redirects. It sounds stupid now, but I spent a lot of time doing URL redirects to ditch the native TWiki URLs (which were extremely messy) and make them readable. Hence the system was a pain to deploy or maintain. It was feeling like we needed a standalone, dedicated, system….

Booki

Booki was the next step. It was clear we needed something more robust and also it was clear no web-based book production system existed, hence the hackery Twiki approach. So it was about time to build one. At the time, though, I remember it being very difficult to convince anyone that this was a good idea. I didn’t have access to publishers, and everyone else thought books were just soooo 1440. What they didn’t realise is that we were building structured narratives and that, IMHO, will have a lot of value for a long time. Call it a book or not. Anyway, we built Booki on Django (a Python framework) and the first time we used it was a Book Sprint for the Transmediale Festival in Berlin.

developed_collab

Aco literally would be building the platform as it was being used in the Book Sprint – restarting when everyone paused to talk. It was an effective trick. We learned a lot working with the tool and building it as it was being used.

At the time I couldn’t imagine book production being anything other than collaborative. FLOSS Manuals collaboratively built community manuals. Book Sprints were short events building books collaboratively. So Booki was meant to be all about collaboration. Booki also was run as a website (booki.cc) which was freely available for anyone to use.

develope4d_booki1

Many groups used it which was cool.

developed_booki2

Booki.cc run from with the OLPC laptop

Booki had all the basic stuff the FLOSS Manuals system had and we advanced the feature set as our needs became more sophisticated, but the basics were really the same.

developed_booki3

The main differences were that we had a dynamic ‘table of contents’ where you could add and rearrange chapters etc and the updated ToC would be dynamically updated across all user sessions. Hence the ToC became a kind of ‘control panel’ for the book.

developed_bookitoc

We also experimented with data visualisations of book production processes.

developed_civicrm

developed_seo

We did some cool stuff with the visualisations. For example, during the Open Web Book Sprint (also in Berlin) we worked in the Hungarian Embassy. They had a huge window that you could backwards project onto so people could see the projections on the street. So we made a visualisation of text from the book being overlayed as we wrote it. Below is an image shot with us standing in front of the projector…I mean..c’mon…how cool is that? 🙂

developed_hungary

developed_hungary2

Booki also had federation. You could enter the target URL of a book from another install and Booki 1 would make a portable archive (booki.zip) and send it to Booki 2. Booki 2 then unpacked the zip and populated the book structure complete with images etc. I liked the idea very much of using EPUB instead as a transport technology instead and was later able to do so for Aperta and PubSweet 1 (see below).

In general, Booki didn’t advance much further from the FLOSS Manuals set up. It kind of did ‘more of the same’. I think the only stuff that went further than the previous system was the dynamic table of contents plus it was easier to install and maintain. Having groups was also new, but that wasn’t used much. It was in some ways just a slightly different version of the previous system.

Booki was also used for producing a tonne of books.

developed_2books

developed_collabfutures

Objavi 1 & 2

Alongside the development of the FLOSS Manuals system and Booki, I headed up the development of Objavi. Objavi is basically a separate code base that was used as a file conversion workhorse. Objavi 1 was a little bit of a hacky maze but it did a good job. It would basically accept a request and then pipe that through a preset conversion pipeline. It did a good job. What I found most useful from this is that each step left a dir that I could open in a browser to inspect the conversion results. So if the conversion needed several steps, this was very helpful in troubleshooting.

Objavi 2 was meant to be an abstraction. However I don’t think it really got there, and Booki, which later became Booktype, came to internalise these conversion processes after I left the project. I always thought internalising file conversion was a bad idea because file conversion is resource-intensive, making it better to throw it out to another external service. And having a separate conversion engine enables you to completely overhaul the book production code without changing the conversion code. Hence FLOSS Manuals could migrate to Booki but still use the same external conversion engine. This was a HUGE advantage. Further, all the FLOSS Manuals instances, as well as booki.cc could use a single Objavi install for their conversions.

Objavi was actually also the magic behind the federated content in both the FLOSS Manuals system and Booki. All content would be passed through Objavi for import and export so Objavi became the obvious conduit for passing a book from one system to another. This gave me a lot of ideas about federated publishing which I have written about elsewhere and archived here.

Booktype

developed_booktype1

Booktype was really Booki taken to market. I was frustrated by not getting much uptake, so I took Booki to Sourcefabric in Berlin and headed up the development there. Booktype gained a UX cleanup. The editor was changed after an event I organised in Berlin called WYSIWHAT. The event was meant to catalyse energy around the adoption of a shared editor for many projects. It was of many things I did in pursuit of the perfect editor. At that event, we chose the Aloha contenteditable editor. I don’t think that was as useful a change as expected at the time, but back then contenteditable looked like the way to go despite there being few editors that supported it. Since then TinyMCE, CKEditor and others all have contenteditable support, further econtenteditable became a bit of a lacklustre implementation in browsers.

Booktype was literally the same code as Booki but rebranded. So many of the same features persisted for quite some time until Booktype eventually took on a life of its own.

developed_booktype2

Displaying ‘diffs’ (differences) between 2 different versions of a chapter

developed_booktype3

 

Activity stream of a book

I prototyped some interesting things in Booktype but not much of it got built. For example, I was keen on editing content in the style of the final output. The example below is using the CSS layout of Open Design Now which I mentioned below with reference to BookJS.

developed_booktype4

I also made a task manager prototype based on kanban cards but it was never integrated into Booktype.

developed_booktype5

I think there are only really three parts of Booktype’s development that occurred while I was in charge of the project. First was the integration of a short messaging system into a user’s dashboard and into the editor. Essentially you could highlight text in a chapter, click on the msg widget and a short message could be sent with that text to whoever you wanted (in the system). It was intended for fact checking or editing snippets etc. The snippets were loaded into an editor to assist with this kind of use case.

developed_booktype6

A good idea but seldom used.

In addition, Booktype supported groups, so users could form groups which were populated by users and books. The idea behind this was that you could form a group to work on a collection of books collaboratively.

Lastly, the renderer was integrated in a more sophisticated way so you had more control of the output from within Booktype. Essentially you could choose a number of output options and style them from within Booktype.

developed_booktype7

These were all interesting additions, but in the end, Booktype was really only Booki taken a step further as a product without offering much that was new. I think we should have probably actually removed a lot of things rather than adding new things that didn’t get used.

Booktype developers added some interesting stuff after I left. I particularly like the image editor and the application of themes to the content.

The image editor enables you to resize and effect an image from within the chapter editor.

developed_booktype8

The theme editor allows you to choose from an array of styles/themes and edit them.

developed_booktype9

developed_booktype_10

Booktype 2 has since been released. It has become a standalone business and is doing good work. Also, the Omnibook service is a ‘booki.cc’ online service based on Booktype 2.

Booktype has been used by many organisations and individuals to produce books.

developed_cryptoparty

developed_booktypeangola

I think Booktype 2 is good software but I didn’t particularly like the direction of the Booktype UX after I left the project – it was becoming too ‘boxy’ and formal. ‘Good UX’ is not necessarily good UX. So I eventually developed another system with a much simpler approach, specifically for using with Book Sprints (but it has had other uses as well). More in this in a bit.

book.js

One innovation, and a further exploration of using the browser as a typesetting engine, that resulted from my time with Booktype is book.js. Essentially I had been looking for a new way to render books from HTML using the browser. I wanted to understand how Google docs could have a page in the browser and then render it to PDF with 1-to-1 accuracy. Surely the same could be done with books? However, no one could tell me how it was done. So I researched and eventually found out about the nascent CSS Regions that would allow you to flow text from one box to another in HTML. I worked with Remko Siemerink at a workshop in Amsterdam to explore PDF production from CSS. We made an interesting book with some of the Sandberg designers.

developed_bookjs

After more research and breaking down what I thought could happen, Remko worked with CSS Regions (& js) to replicate the page design of a book called “Open Design Now.” It was amazing. He got the complex design down plus it was all just HTML and CSS, it looked like a page AND when printed from the browser it retained a 1:1 co-relation. Awesome.

The following summer I was able to employ someone for Booktype to work on the tech and I hired a developer (Johannes Wilm) to work on the PDF rendering. From there we eventually had BookJS that enabled in-browser rendering of books which could then be exported to print-ready PDF by just printing from the browser. Whoot!

developed_contents

After time, BookJS (original site still available) could also formulate a table of contents etc. It was, and is, pretty cool and IMHO is the right way to do this. Unfortunately, Google Chrome decided to discontinue CSS Regions so if you now want to use BookJS you have to use a very old version of Google Chrome. However, better technologies have come along that support the same approach, namely Vivliostyle (which Johannes later worked on).

Github Editor

During the time with Booktype, I also did some experiments in other processes. One was using Github as the store for an EPUB and using the native zip export that Github offered to output EPUB (since EPUB is just a zip formatted in a specific way). Juan Gutierrez put together a quick demo and I published about it here:
http://toc.oreilly.com/2013/01/forking-the-book.html

Sadly the demo is no longer available but later a good buddy, Phil Schatz, adopted the idea and built something similar and (I think) much better:
http://philschatz.com/2013/06/03/github-bookeditor/

PubSweet 1

Since Booktype was going its own way, I wanted to develop a new, lighter, system for Book Sprints. Hence Juan and I worked out PubSweet, a simple PHP-based system. This would later be reformulated as a JavaScript system (see below).

developed_pubsweet1

PubSweet was very simple. Essentially 3 components – a dashboard, a table of contents manager, and an editor. It would later evolve to include more features but it was really the same as the systems I had developed earlier, though simpler. I wanted to get to a cleaner interface and bare basic features. I also wanted to retain some book elements, hence the table of contents manager looks a little like a book table of contents (except the garish colors ;).

PubSweet 1 is still in use by the Book Sprints team and functions well. It uses BookJS for rendering paginated books and for PDF rendering straight out of the system. It can also generate EPUB directly. Apart from that, it is pretty simple and effective. I used a basic card-based task manager for this, based on an earlier prototype I made when working with Booktype. It was a simple kanban type board but we never properly integrated it.

We included annotation using Nick Stennings Annotator project:

developed_pubsweet2

PubSweet 1 has been involved in producing more books than I can count, for everyone from OReilly books to Cisco, to the World Bank, UNECA, IDEA etc etc etc

developed_oreilly

developed_labcraft

 

developed_cisco

developed_f5

PleigOS

Somewhere along the way, I developed a simple system for creating Pleigos – one-page books created by folding a single piece of paper which has text and images. The system places text and images so that when you fold a single page after printing, a small book is formed. It was more an artistic experiment than anything but it was fun.
http://www.pliegos.net/

Note: the Pleigos project was by my good friend Enric Senabre Hidalgo, I just developed the initial Pleigos software.

Lexicon

The UNDP approached me to build software for developing a tri-lingual lexicon of electoral terminology. The languages to be supported were English, French, and regional Arabic. It sounded interesting so I used PubSweet as the basis for this.

The main difference between Lexicon and PubSweet was that you could choose to create a chapter from 2 different types of editor. Editor 1 (‘WYSI’) was a typical WYSIWYG editor. This was used for producing chapters with prose. The second type of editor (‘lexicon’) allowed you to create a list of terms, each with three different translations – English, French, and Arabic. This opened my eyes to the possibilities of having different types of editors for different content types, a strategy I hope to use again.

developed_lexiconThe ‘lexicon’ editor in action

The final print output looked pretty good:

developed_lexicon2

The system enabled a dozen or so people from different Arabic regions to discuss translations and work collaboratively through a list while at remote locations. We built a specific discussion forum for them as part of the system and had up and down voting etc. I liked this project a lot because it was very different from any other project I had worked on yet it employed many of the same strategies.

Aperta

Along the way I was approached by John Chodacki of PLOS to build a Journal system for them. I knew nothing about scientific journals, so I went through a process of re-education 🙂 It sounded interesting! I spent a year-and-a-half heading up a team to design and develop this system. This is pretty much the first time I wasn’t scraping together pennies to build a system.

developed_aperta

Journal systems aren’t that different from book production systems, in fact doing this project helped me realise that the production of knowledge, in general, follows a particular high-level conceptual schema:

  1. produce
  2. improve
  3. manage
  4. share

Each artifact (journal article, chapter, book, issue, legislation, grant application etc) follows its own kind of path, with its own unique processes, through these four stages. I have written more about this elsewhere in this blog so won’t go into more detail about it here.

Research articles, (Aperta wasn’t initially intended to deal with Journal Issues which are a collection of articles) come into a Journal as (predominantly) MS Word. They then need to follow a process of checks (eg. to make sure the article is right for the journal etc) before going through the hands of various editors (handling editors, academic editors, etc) and reviewers, including a back-and-forth with the author(s) and a final pass through a production team and/or external vendor to prep the files for publication. The biggest difference I found from my previous experience with book production systems is that there is more to-and-fro involved. Simply put, there is more workflow. So Aperta addresses this with a simple workflow engine based on the Trello/Wekan kanban card-based model.

Aperta was built in Rails with Ember JS. The front end was pretty much decoupled from the backend. It was pretty ambitious and we decided to use the Wikimedia Foundations Visual Editor at the heart of the project. It was the most sophisticated editor available at that moment. I have come to learn that you have to take what you have at the time and work with it. We committed to adopt it and make contributions. I hired the Substance.io chaps (Michael and Oliver) to work on the editor. They did a great job. (Subsequent development of their own editor libs has enabled us to work with them for Coko (more below). )

Aperta’s approach was to simplify the submission process that was managed by Aries Editorial Manager. We leveraged the OxGarage project for MSWord-to-HTML conversion and imported the manuscripts into Aperta. Then workflow templates could be set using kanban cards (as per above). These cards then enabled a flexible method for gathering information (submission data) from the authors at submission time. A lot of time was spent on getting good clean, simple, UX. The kanban card system was very modular – the cards themselves were their own applications, enabling anyone to build a card to their own requirements. This made the cards extremely powerful as they were essentially portable applications.

Further, I developed a model of sorting a lot of cards into columns much like Tweetdeck’s saved-columns model. It was all rather neat. A ton of innovation. Unfortunately, I don’t have any screenshots apart from what I see PLOS has on their website:
http://journals.plos.org/plosbiology/s/aperta-user-guide-for-authors
http://journals.plos.org/plosbiology/s/file?id=d45b/Aperta-Quickstart.pdf
http://journals.plos.org/plosbiology/s/aperta-user-guide-for-reviewers

Aperta is in production now for PLOS Bio but unfortunately the code still isn’t available.

I think Aperta radically rethought how Journal workflows might be managed. I learned a lot through this process and it informed decisions and architecture for the next platform I was to work with – PubSweet 2, developed by Coko.

I also think taking a step into another realm where many concepts were transferable but the use case was different was very good for me. It helped me abstract a few notions. It was also good to build a sophisticated framework in another language and to have the resources to try things out. It was an excellent incubation period where I learned a lot while building a pretty good platform.

During this period I also developed a kind of Objavi 3 but I’m not sure now what it is called or if it is used.

PubSweet 2

The following systems are a result of a team of very talented people at Coko. This has been the first group of people that I have worked with that enable the systems to get close to what I believe is an ideal state for the problems at hand, the time we live in, and with the technology available. If I have learned one thing over the past years, it is that, generally speaking, development teams often prevent good solutions from happening. The Coko team is a rare case where the solutions can flourish and become what they need to be. I’m very lucky to be working with such people.

PubSweet 2 is not actually a publishing platform, it is a de-coupled JavaScript framework that enables the production of multiple publishing platforms.

developed_pubsweet_2dev

With all the other platforms I have been involved with I have always learned something. So the next platform is always better. Strangely enough, because when I look back I remember, for example, thinking that Booki was the best platform ever, and perhaps the best I could do. But not so. Each subsequent platform has been much better. So… why not build a framework that enables the rapid development of many platforms at once? good idea! Imagine what you could learn… Indeed!

Using PubSweet, we have developed two platforms to date. Science Blogger and Editoria. For the purposes of this blog post I’ll just focus on Editoria as Science Blogger is more of a reference implementation. However, all these front components can be developed independently of each other, and independent of the core framework. Hence we have released some PubSweet NPM modules (sort of like front end plugins) online:
developed_node

If you would like to read more about PubSweet check here:
http://slides.com/eset/ismte

Editoria

https://editoria.pub/blog/

developed_editoria

Editoria Book Builder component

What is interesting about Editoria is that I didn’t design it. I facilitated the staff at the University of California Press to design it. And what is really interesting is that it is a better book production software than any of the above that I did design…. That’s not to say that I finally realised I was a crappy designer 😉 , rather I came to realise that given the right guidance and parameters, the people with the problems can solve the problem with more deep understanding and nuance than I could as an outsider to their processes. It was a great thing to realise.

In general, Editoria follows the same model as PubSweet 1. It is a simple collection of 3 interfaces – dashboard, book builder (in PubSweet 1 we refer to it as a table of contents manager) and a chapter editor. Very similar to PubSweet 1 and also these components fit into the conceptual schema I mentioned above of:

  1. produce – this happens outside the system and we convert authored MS Word files to HTML
  2. improve – styling, editing, reviewing, all occurs in the editor component
  3. manage – basic workflow managed by the Book Builder component
  4. share – when done we export to book formatted PDF, and EPUB

Editoria is a very simple and elegant solution. It’s the only one of the systems I have worked with that is not in production but I’m looking forward to seeing it up and running soon.

It uses the Substance.io libraries to build a custom made and elegant editor. Finally with some real $ to spend on moving this field forward, and as part of my 2015 Shuttleworth Fellowship I committed $60,000 USD (10k a month) or so to Substance, Coko followed it by a similar amount, so they could focus on the development of their libraries to 1.0. Coko also put considerable effort into founding a consortium around Substance (although I’m not convinced it is really working well yet).
http://substance.io/consortium/

Editoria brings some nice implementations to the table including the use of Vivliostyle to render PDF from HTML. Plus MSWord-to-HTML conversion, front and back matter divisions plus body content. Pagination information (left/right breaking) etc. In addition, we are building into Editoria various tools in the editor including its own annotation system and a host of publisher-specific markup options for different styles (quotes, headings etc). Coko has employed Fred Chasen for some part time work to contribute to the Vivliostyle development (although we are still working out what we should work on).

In many ways, Editoria is the system I always wanted to be involved in. It is better than any other system I have seen or been involved in and this is a result of two critical factors:

  1. the technologies have matured, including Vivliostyle and Substance.io, that enable critical solutions to be solved
  2. the people who need the system have designed it

There is more to come from Editoria, so stayed tuned…

INK

INK is like Objavi on steroids. It is possibly an Objavi 4 😉 but done the right way. INK is a Rails-based web service which is primarily built to manage file conversions. However, it can actually be used for the management of any job you wish to throw at it. These jobs are what we call steps, and steps can be compiled into recipes. For example, you might have a step ‘convert MS word to HTML’ and then a second step ‘Validate HTML’. Hence INK enables you to chain together these steps.

develop3ed_ink

Additionally, these steps are plugins written as Gems. A gem is a Ruby-based plugin architecture. Accordingly, you can develop gem steps and distribute them online for others to use. We hope in time there will be a free (as in beer) market around these steps.

Summary thoughts

I think I learned a lot from writing this personal ‘sense making’ piece. I didn’t realise just how strong some of the themes were that I pursued until a wrote them down…for example, I pursued collaborations from inter-organisational consortiums a number of times and none of them have worked. Huh. Interesting. I’ve also seen various practices evolve from an idea to being mature thoughts, prototypes, workable solutions and eventually, eventually, adopted. But over many more years than I expected. Also interesting.

It is also obvious that there are some big ticket technological problems that still need to be solved for good to really move forward. They are mostly there but still in need of work, the top two being:

  1. browser as typesetting engine
  2. sophisticated editor libraries

I add a third which I haven’t noted much in the above. I have worked on this since Aperta onwards and it is necessary only in the publishing world where content is authored in the legacy ‘elsewhere’ (ie MSWORD):

  • reusable, sensible, MSWord to HTML conversion

This has been solved many times but has not yet been solved well.

These all need to be open source solutions. At Coko we are trying to move all these on and we are making good contributions. We, as a sector, are nearly there. Although it would be good to work more together to solve these problems by either making contributions to the existing efforts, or using their technologies. This is really the only way things move forward.

Finally, in the tech world people sometimes quote “Being too far ahead of your time is indistinguishable from being wrong,”. I’m not sure who said it but when I look through the evolution of technologies I have written about above, seeing various things evolve from idea to a production implementation many years later, I’m also thinking that perhaps sometimes waiting might be indistinguishable from being right 🙂

The beauty of small teams

athens_meetup

I have been involved in setting up a team for the Collaborative Knowledge Foundation (Coko). We are about 12 all up now and spread around the globe. At our core we have :

Jure – Lead PubSweet Developer Slovenia
Richard – PubSweet Developer Kenya
Yannis – PubSweet Frontend Developer Athens
Christos – PubSweet Frontend Developer Athens
Charlie – INK Lead Developer New Zealand
Wendell – XSL-pro East Coast USA
Julien – UX-pro France
Henrik – Designer Netherlands
Juan – Sys Admin Nicaragua

and then we have myself, Kristen Ratan (co-Founder), and Alex (process manager) in San Francisco.

So… just looking at how development works… in the dev side of things we are pretty well distributed over the world and across a large variety of time zones, in fact, NZ, EU, USA is about as bad as it can get when setting up meetings at reasonable hours for all affected… so how do we manage to make this work?

Well, firstly we obviously rely on some remote tools. We all coalesce around Mattermost – an open source version of Slack (By the way, don’t use Slack – https://clearchat.com/blog/forget-applevsfbi-slack-gmail-have-backdoors/). This is a good avenue for remote chatting. We use it for both chit chat, y’know, hanging out and taking bullshit, and also work. Work mostly gets done on side channels, I wrote something about it here.

Okok..so that’s Mattermost. It’s a good tool but if we were just a bunch of people scattered around the world with a chat channel then it probably wouldn’t work as effectively.

So, I wanted to say something about creating space for people to operate and how remoteness can actually play into that rather nicely.

First of all, we are an organisation that likes people to like what they do. So we like to trust them and give them a bit of room to move. All people are creative and I believe all folks need challenges where they can exercise that creativity. Nothing new here – if you want to read more about this, with special reference to development teams, then read Peopleware (terrible title, great book – very old school… you can stop reading when they talk about optimum cubicle layouts, but the rest is awesome).

What I have found over the years is that responsible people work responsibly. So, I am assuming you work with some people like this 🙂 Otherwise return to zero and start again…. the thing about responsible people is that they like to get their head down into a flow and get on with it. They love hard problems and they love the trust given to them to solve these problems. But sure sure, this can happen in real space in a shared office…but…can it? I think there is an argument to made here that remote teams enable a more optimal working environment for this kind of team.

In our case, for example, we actually only have two people in the same place (not counting San Francisco). They are Yannis and Christos. Very nice chaps if you ever happen to be fortunate enough to meet them. Yannis and Christos work on developing the front-end components for the PubSweet publishing framework that Richard and Jure are building. We have an advantage here that we have carefully chosen to create a technical architecture that supports the autonomous development of front-end components which is separate to the backend that they rely on (the backend is known as PubSweet or sometimes we refer it to PubSweet core or ‘the backend’). Anyways… note to self, how you build what you build is a determining factor on how your team is structured and where they are… but… back to remoteness…

When we develop front-end components for clients, I go with Alex and/or Kristen to the client’s locale and then work through their design in a Collaborative Design Session. We then write this up and meet with Julien, Yannis and Christos remotely with a online whiteboard tool (BigBlueButton). And we have another collaborative session where we nut out the details. This is all a little by-the-by but just to say that we can segment our work quite nicely and the communication that does take place remotely is either:

  1. bullshit and fun
  2. high value and short

These whiteboard sessions are type 2 from the above. We set 2 hours to have the complete design sorted. When that meeting is done, everyone knows what they need to do. After that, we leave Yannis and Christos to get on with it and let us know when they are done. Any miscellaneous stuff gets sorted out bit-by-bit between the people involved. If, for example, some more UX mocks are needed, or some CSS detail, then Yannis or Christos will ping Julien on Mattermost and sort it out. No project management, no scheduled meetings. Short and sweet as per type (2).

At the same time, Jure and Richard are sorting out the backend. They communicate amongst themselves and solve the complex problems that such an abstracted system requires. All over Mattermost, too. Of course, there are various breakouts to real-time calls when required but it’s a minimum. Watching Jure and Richard tumble through difficult problems in Mattermost is a sight to behold. Both firing off each other and throwing ideas back and forth.

My job is largely to give each of these folks room to move.

So what about when each team needs something from the other? Being right up on the coal face of the client needs, I can see what things we need to address a little ahead of time and I call out to those affected and work out how we can sync the various threads. For example, we need to integrate PubSweet with INK (more about this below) for importing Word docs into PubSweet as HTML. So, seeing this, I talk to Jure mostly, and think through the issues. Jure interprets my partial developer-/partial user-speak and thinks about it. He then discusses it with all those necessary and they work out the best approach. Then we work out when it all needs to happen by and the order of play. Everyone then orientates around that timeline and away we go…in otherwords…the parallel teams ‘touch’ only when necessary – when they need something from each other. That’s not to say we try and stop people from talking to each other 🙂 Quite the opposite, we don’t gate any communication at all and leave people to sort it out. What we don’t do is build things before we need them or layer on complex management overhead. These are smart responsible people, let them work it out 🙂

athens_whiteboard

the above image drawn with the fabulous MyPaint (open source) on a touchscreen Thinkpad running Ubuntu.

We are also achieving the same process with INK. Charlie is down in the Southern hemisphere tinkering away. She’s like a deep sea diver in Ruby and coming up to explain to us surface dwellers what’s going on down there. You saw whaat? …an uncontrolled multi-threaded waaaaaat!?oh good, you killed it Just-in-Time???…phew….. Charlie is building a system for managing file conversion pipelines but it is very abstracted. Essentially it is a job-agnostic queue manager. You can throw anything at it and it will do it. We are first using it for MS Word to HTML conversion. Wendell and Alex are working together to get the conversion process into shape. Wendell is writing the code and Alex is testing. Then we are making this into what we call a ‘step’ – essentially a plugin to INK. So Charlie can build the architecture for INK and then we need to make a light INK wrapper for the conversion scripts Wendell is making and then Alex can (not happening yet but soon) actually use INK to test the whole sheeeeebang. Cool. So, they can all operate pretty well remotely to each other building the same machine.

The cool thing about all this is that, sure sometimes details are hard to transmit, but mostly what happens is that people have most of the day, every day, to work on what they need to work on. I think this is helped by being remote. We have to be efficient. We don’t get in each other’s way. We have to get clear understandings fast. Remote comms aren’t nice for long conversations. We need them short, sweet, and efficient. Then everyone can just get on with what they want to do most…solve interesting problems.

I’m not going to say we are perfect. I’m also not going to say this process is perfect. We see the issues. We plan to have at least one full in-person team meet every 0.8 years so everyone can hang out and fill up the camaraderie tank (important)…the last one was in Athens and it was a blast

athens_meal

athens_pause

…and I make sure I go and see everyone every few months to make sure we are all on the same page and feeling connected. This setup needs these additional things. But it works, and when I travel around to see everyone, I see it working well and I see people performing exceedingly well doing things they want to do.