We may be looking for another JS dev to add to the Kotahi team… We are not advertising it yet, but if you are looking for something cool to do and you are a good JS dev, just ping me.


Recently I’ve been amazed at what we can do with Editoria. The complexity of the content we can manage and output is really astounding. I’ll share some EPUB and PDF files soon that show what the content can look like.

This is the result of many years’ work. I have been developing book production systems for a long time. The first one I produced was for FLOSS Manuals, and it was me cutting and pasting Perl code to extend TWiki into a book engine. The thing about it is that it worked. It actually had all the main components Editoria has now: a dash, a book builder, a chapter editor, an export engine etc… We could, at the time, produce books which were printed. The output was determined by what the editors and PDF generation tools were capable of back then. I was quite happy with the results at the time, and so I figured this kind of process would soon become the norm in publishing. This was in 2007 or so.

In some sense the output of those systems did become the norm in some circles. The output, for example, was very similar to what you can see coming out of book production tools today like PressBooks.

Since then I’ve been involved in all sorts of publishing systems and learned a lot along the way. What I’ve learned is that the road from what we could do in 2007 to where we are today is long. At the time I didn’t understand this. I figured the books output to PDF were ‘good enough’. But then I started working directly with publishers. I was no longer an outsider, and I saw what they needed that was not in the systems I had built to date.

What those publishers needed actually sounds small. In general, they needed tools that would give them fine control over the text and semantics of the content. I didn’t understand how much effort getting the latter was going to take. What I didn’t understand was that the tools required to edit and create book display semantics, and to render them as PDF, would need to be exponentially more sophisticated.

By display semantics I mean this – a book as rendered in print form has various elements that form the structure, look and feel of the book. We are talking about titles for chapters (at the simplest level) or image captions. These things need to be identified in the underlying markup so that they can be styled and positioned when outputting the PDF.
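To make that concrete, here is a minimal sketch in Python of how a semantic role carried in the markup lets a renderer pick the right style. This is not Editoria’s actual code. The role names, the style values, and the `style_for` helper are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical style rules keyed by semantic role. The role names and
# values are illustrative assumptions, not Editoria's real vocabulary.
STYLE_RULES = {
    "chapter-title": {"font_size_pt": 24, "page_break_before": True},
    "image-caption": {"font_size_pt": 9, "italic": True},
    "paragraph": {"font_size_pt": 11},
}


@dataclass
class Block:
    role: str  # the display semantic carried in the markup
    text: str


def style_for(block: Block) -> dict:
    """Look up the style for a block, falling back to plain paragraph rules."""
    return STYLE_RULES.get(block.role, STYLE_RULES["paragraph"])


chapter = [
    Block("chapter-title", "Getting Started"),
    Block("paragraph", "Some body text."),
    Block("image-caption", "Figure 1: the dashboard"),
]

for block in chapter:
    print(block.role, style_for(block))
```

The point is only that the styling decision hangs off the role. If the role is lost or wrong in the markup, the renderer has nothing to go on.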

However, this is just the tip of the iceberg. There are also things like two-column content, footnotes, floating images, text flow around images, block images (that fill an entire page to the edge), two-page spreads, overlays, inserts, indexes etc… all of which must first be marked up in the right semantics (and these semantics must be maintained throughout the book’s production lifecycle without being screwed up) so that we can style and position that content on the page correctly at render time.

This is also not all of it. It might not be obvious, but in many books we have callouts to content that appears later in the book… How do we enable publishing staff to mark up this kind of relationship? How do we enable this kind of inclusion when it comes to rendering the PDF?
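One way to picture the callout problem is as a two-pass job: first record where every labelled piece of content lands, then fill in the callouts that point to it. The Python sketch below is purely illustrative. The `Element` shape, the `target_id` field, and the page numbers are assumptions, and a real typesetting engine would get page numbers from layout rather than having them handed over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    page: int                        # page the element lands on (assumed known after layout)
    text: str
    id: Optional[str] = None         # label for content that can be referenced
    target_id: Optional[str] = None  # set on callouts that point at other content


def resolve_callouts(elements: list[Element]) -> list[str]:
    # Pass 1: record the page of every labelled element.
    pages = {e.id: e.page for e in elements if e.id is not None}

    # Pass 2: render each callout now that forward targets are known.
    rendered = []
    for e in elements:
        if e.target_id is None:
            continue
        if e.target_id not in pages:
            raise KeyError(f"callout points at unknown id: {e.target_id}")
        rendered.append(f"{e.text} (see page {pages[e.target_id]})")
    return rendered


book = [
    Element(page=3, text="More on exports later", target_id="export-engine"),
    Element(page=42, text="The export engine", id="export-engine"),
]

print(resolve_callouts(book))
```

The callout on page 3 can only be completed once the whole book has been seen, which is why the relationship has to live in the markup and survive all the way to render time.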

We need a high resolution of semantics in the underlying content so we can achieve complex output for print. We then also need a typesetting engine that understands these semantics and can apply the right style rules to produce the design we are after. These tools have to work together, and they also have to understand referencing of content throughout the book for creation of indexes etc…

Well, you might think that an easy path is to stay with non-complex output. I did at the time. But, as it happens, publishers aren’t happy with that answer.

Also, I don’t believe many folks really understand how complex this problem is. What I found is that we needed a word processor and a typesetting engine of the kind that did not exist. We needed one for the web that met the standards publishers demand. The problem was – there weren’t any.

So the process of building a book production engine became the problem of building a word processor and a typesetting engine. These two technologies are not simple categories. These are sophisticated problems. Who are the organisations that have solved this in the past for publishers? Microsoft and Adobe (although Microsoft never solved this issue well, which is one of the many reasons we produced other tools like XSweet).

Anyways… it is a little ridiculous to have to build these things. They should have existed. But they didn’t. I have to say I had a large quantity of crap thrown at me by various folks over the years for not being able to solve this overnight. I guess it wasn’t their fault; they didn’t understand how difficult this problem was.

The path to building a word processor also wasn’t helped by the fact that we had to build it twice, as the first build was on third-party libraries that eventually fell into disarray. That was a tough one.

So it’s been a long journey. Not many will understand what we have achieved. It’s not just the individual components, but what we had to understand to do it and to put them all together in ecosystems (platforms) that work. It’s enormous. Really, really enormous. It’s also been, obviously, a huge team effort by Coko and Cabbage Tree Labs folks. But we did it. I still really don’t comprehend how big it is myself. How did we, a small rough-as team of nobodies, build tools that took Microsoft and Adobe decades to build and refine with all the resources they had to throw at the problems? Did we do it the way we wanted to, in a way that brought insights to the problem that very few have? I believe we did. It’s really, really astonishing.

However, this week I really felt we got there. It was a more profound feeling than I have had before of ‘getting there’. I mean, I kind of knew we had solved these problems for a while now. But this time the feeling was deep. It came because we recently made some relatively small changes to Editoria that bring it all together, and the results are quite astonishing. I’ll be sharing some content we have produced recently with Editoria, and it is exceptionally good. It puts Editoria in a class of its own. Editoria is now capable of very sophisticated output of the kind that will make many publishers happy.

The same is true of the EPUB and web-based HTML output coming out of Editoria… but more on all this soon…