Editoria Vid

If you missed the webinar, the video is here (it will also be available from the Coko site). It starts 9 min 30 s in:

Also, we had 40 or so live questions during the Q&A. We answered them all and Julien decided to do a quick dive into Right to Left (RTL) support. He posted this screenshot with mixed RTL and LTR text to show it works out of the box (still more to test):

[Screenshot: mixed RTL and LTR text]

What’s Involved in Building Editoria

We have a webinar coming up soon for Editoria. In anticipation of that I thought I’d write a little overview of what was involved in building the product. So… starting with a little hand-drawn schematic…

[Image: hand-drawn schematic of Editoria]
The above shows all the moving parts that we had to consider, and build, to make Editoria what it is. From the user’s perspective, it looks like one app – the Editoria interface – where they can upload a docx file (or files) and start work on producing a book.

From an ‘under the hood’ perspective, it looks like the following…

[Image: under-the-hood view]

As it happens, we had to build all of the above to make it possible… So, here is a list of all the parts starting from ingest…

Ingest

Ingest consists of 2 parts:

  1. INK
  2. XSweet

INK is a job processing workhorse. Essentially it will run any kind of job you want and ‘steps’ are written for specific kinds of jobs/processes. These are a kind of ‘wrapper’ around executable code. INK has an API that enables platforms like Editoria to send files to it and request a specific set of steps to be run, and then return the result. In this instance, INK runs a set of steps that wrap up XSweet.
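To make the idea of ‘steps’ concrete, here is a minimal sketch of steps as wrappers around executable code, run one after another. It is not INK’s actual API: the `Step` class, `run_steps` and the two toy steps are names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


# A "step" is a named wrapper around some executable code.
# Illustrative only: these names are assumptions, not INK's real API.
@dataclass
class Step:
    name: str
    run: Callable[[str], str]


def run_steps(steps: List[Step], payload: str) -> str:
    """Run each step in order, feeding each output into the next step."""
    for step in steps:
        payload = step.run(payload)
    return payload


# Two toy steps standing in for real conversion work.
strip_whitespace = Step("strip_whitespace", lambda text: text.strip())
wrap_paragraph = Step("wrap_paragraph", lambda text: f"<p>{text}</p>")

result = run_steps([strip_whitespace, wrap_paragraph], "  Hello, Editoria  ")
print(result)
```

A platform would request a named set of steps like this and get the final result back; swapping or adding a step does not change the caller.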

XSweet is a set of scripts that take a docx file and convert it to HTML. It is very modular and can be pretty easily extended and improved. It can also be customised per use case.

So, on the ingest stage it looks like the following to the user:

  1. the user logs in to Editoria
  2. they create a new book from the book dash
  3. they click on the new book
  4. from the book builder interface they choose ‘upload multiple word files’ or ‘upload word’.
  5. they select the word file(s) from a system dialog
  6. these files automagically appear ‘in the book’ and can be edited
    1. note: in the case of multiple docx files, if the files have been named according to a simple convention, Editoria will create each new chapter and automatically populate the structure of the book and load the content in the right place
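As a rough illustration of how a naming convention can drive the book structure, the sketch below assumes a purely hypothetical pattern of `<two-digit order>-<title>.docx`. This is not Editoria’s real convention, and `plan_chapters` is an invented helper.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL convention, assumed for this example only:
# "<two-digit order>-<title>.docx", e.g. "02-The Journey.docx".
PATTERN = re.compile(r"^(\d{2})-(.+)\.docx$")


def plan_chapters(filenames: List[str]) -> List[Tuple[int, str]]:
    """Return (position, title) pairs in book order, skipping other files."""
    chapters = []
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # file does not follow the convention
        chapters.append((int(match.group(1)), match.group(2)))
    return sorted(chapters)


plan = plan_chapters(["02-The Journey.docx", "01-Introduction.docx", "notes.txt"])
print(plan)
```

The point is only that the order and title of each chapter can be read off the filename, so the upload order does not matter.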

From the tech side the following happens:

  1. when a word doc file or files are selected, the files are sent to INK via the INK API, with a request to convert them to HTML using the XSweet steps
  2. INK runs the docx file(s) through the XSweet steps, creating HTML
  3. when the job is done, INK signals Editoria that the files are good to go
  4. Editoria grabs each file when it is ready and puts it in the right chapter, creating the chapter if multiple word files are uploaded at once
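The four steps above can be sketched as a toy, in-memory flow. Everything here is an assumption made for illustration: `ConversionService` stands in for INK, `convert_docx_to_html` for the XSweet steps, and `Book` for Editoria. The real system talks to INK over its API and waits for the job to finish, whereas this sketch calls back immediately.

```python
from typing import Callable, Dict, List


def convert_docx_to_html(filename: str) -> str:
    """Stand-in for the XSweet steps; a real conversion would read the docx."""
    return f"<h1>{filename}</h1>"


class ConversionService:
    """Toy stand-in for INK: takes a job and signals when each file is done."""

    def submit(self, filenames: List[str], on_done: Callable[[str, str], None]) -> None:
        for filename in filenames:
            html = convert_docx_to_html(filename)
            on_done(filename, html)  # signal that this file is good to go


class Book:
    """Toy stand-in for the Editoria side: chapters keyed by title."""

    def __init__(self) -> None:
        self.chapters: Dict[str, str] = {}

    def receive(self, filename: str, html: str) -> None:
        title = filename.rsplit(".", 1)[0]
        self.chapters[title] = html  # creates the chapter if it is missing


book = Book()
ConversionService().submit(["intro.docx", "chapter-one.docx"], book.receive)
print(list(book.chapters))
```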

So… we had to build all of the above. First, INK exists because, crazy as it sounds, we could not find an elegant open source pipeline manager to manage jobs like this. So we decided to build from zero. In addition, which also sounds crazy, there is not a comprehensive open source library out there for converting docx to HTML that was easy to extend or customise for different use cases. We certainly know of many scripts for docx->html (many are listed on the XSweet site), but they are either unmaintained, or difficult to extend. So, unfortunately, we had to build that also.

Editoria App

The Editoria App is what people think of ‘as Editoria’. Everything the user interacts with is, on the surface, Editoria interfaces in the browser. However, Editoria comprises 3 main parts:

  1. PubSweet
  2. the Editoria interfaces
  3. Wax

PubSweet is the ‘under the hood’ CMS that enables apps like Editoria to be built on top. PubSweet is a Node/React application written entirely in JavaScript; it sits on the server and provides a ‘wrapper’ for the Editoria interfaces. You can think of PubSweet as a ‘headless’ CMS that thinks like a publishing system. Once again, when we started (and still today) there was nothing out there that could realise multiple publishing use cases, as diverse as micropubs, books, journals, content aggregation etc, built on top of the same CMS. So we had to build it ourselves. PubSweet is now pretty mature and indeed supports multiple publishing platforms – at last count I think that totals 6 platforms built on top of PubSweet (and it is early days).

The Editoria interfaces are the ‘web pages’ that make up the Editoria application. This includes the dashboard, the book builder etc… these are all custom code written on top of PubSweet to realise the Editoria use case / platform.

Wax is an online editor. It is so sophisticated that we prefer to call it a web-based word processor. It is actually an Editoria interface, except that it is so complex, and such a critical part of any online publishing tool, that it is a product in its own right. Nothing like this existed either, so we built our own (typical WYSIWYG editors are not sophisticated enough for the publishing use case).

PagedJS

From Editoria, you can press a button and have a book appear in the browser. If you print from the browser, you will then get a PDF ready to take to the printers with all the typographical bells and whistles that books need. It looks magical. We currently use Vivliostyle for this but we have hit many limitations with the software. We reached out to the org that produces the code but they weren’t ready to collaborate to improve the product. Hence, unfortunately, we had to build our own. Soon we will replace Vivliostyle with this new offering. The speed of dev has been incredibly fast on paged.js (working title), so watch this space for some exciting news. There are some early demos in the pagedjs repo if you are interested.

Summary

So, as you can see, Editoria is more complex than it appears to the eye of the user. The crazy thing is that most of these parts should have existed already, but they didn’t. There has been no full-page, open source editor out there that gives the source control and extensibility required for the publishing use case. The docx conversion tools available do not yet do a good enough job of the conversion, and are not easy to customise. There is no headless CMS that ‘thinks like a publisher’ and would help us build Editoria. And we had to create a pagination engine (paged.js) because the one open source option couldn’t give us what we needed and we failed at trying to help them improve the product.

It’s not like we wanted to build all this, but the sad fact is that for the publishing industry the web is still an innovation they haven’t embraced (except as a distribution medium). So while all of these parts (or many of them) should exist, the sad truth is they did not. We had a pretty good look around to see if we could save ourselves the effort before embarking on these projects. We did, for example, bank on Substance Lens Writer for our editor, but it was not maintained and proved unstable, and our eventual assessment was that it was better to start again – hence Wax. We also went a long way down the path of using and promoting Vivliostyle, until it became apparent that neither the technology nor the technology providers could get us to where we needed to be… consequently, after implementing Vivliostyle (it is still baked into Editoria) we started a new approach to this problem – Pagedjs.

So, we had to build all of this ourselves… the good news is that all of these moving parts are reusable in other scenarios… so you can take whatever you like from the above and add it to your own product, and hopefully that will fuel the publishing sector’s ability to innovate and transform its products and workflows.

Some Links about Editoria

Editoria Webinar

Editoria is the book production platform we have been working on with the University of California Press and the California Digital Library. If you’d like to know more there is a webinar coming up (May 15) led by Editoria Community Manager Alison McGonagle-O’Connell.

Sign up for the webinar here: https://ucpress.zoom.us/webinar/register/WN_6UmmLJI2RviMh-hXcJTq_Q

Weird Scenes Inside the Vivliostyle

Strange days… We use Vivliostyle at Coko, but it has been a fraught experience. Vivliostyle is an open source JavaScript library for automating typesetting of HTML/CSS content. We use it in Editoria for automating the production of book-formatted PDF from an HTML source. You can read a little about how this works here.

The source code is open source, but it was apparent from the beginning that Vivliostyle-the-company were not into collaborating unless we could pay. We tried very hard to get inside their bubble and contribute, but were pretty well cold-shouldered. We offered meeting resources for community building, and developer and designer time, to help move it all on. Coko went to quite some effort to promote what they were doing (e.g. https://www.adamhyde.net/why-vivliostyle-is-important/ + the posts on PagedMedia.org by Coko’s Julien Taquet). But they wanted none of it. I was also pretty sure they were being paid to extend Vivliostyle but not putting those changes back into the common pool. Fauxpen, as they say.

We weren’t the only ones to feel things were odd – I spoke to many people who felt the same way and who had had similar experiences. As a result, to de-risk ourselves I founded another initiative – pagedmedia.org – so we would not be reliant on Vivliostyle. My thoughts on this were very similar to what I recently wrote about editors, i.e. if an open source project is not open to building community, something is wrong and you should carefully consider whether you should be involved with it. They could at any time change direction, and if there was no community at play, then you could be left high and dry. Sometimes you have to make hard decisions, and we made one…

No one really wants to build an automated typesetting solution from zero unless they have to. After trying to solve this problem for the last 10 years or so (with open source) I wanted to see it done…so… sigh… if it (unfortunately) wasn’t going to be Vivliostyle after all, then what choice did I have? So I committed to this new endeavor (which is going very well! some cool updates soon) and we launched at MIT Press earlier this year with a community-first approach. We are, similarly, currently considering our choice of editor libraries because of similar concerns.

So, as it happens, I guessed it right and moved at the right time… as that is exactly what has happened with Vivliostyle. A few weeks ago they mysteriously posted a new company name (Trim-Marks Inc) and site under the old URL and pointed vivliostyle.com to vivliostyle.org. No further information was forthcoming. Now there is an announcement about it all on vivliostyle.org… you can read it here:
https://vivliostyle.org/blog/2018/03/26/a-new-beginning/

Essentially Vivliostyle-the-company went off to form trim-marks.com.
I have no idea what they do, but it isn’t open source. Vivliostyle, the open source project, is apparently continuing under the new .org site, led by Shinyu Murakami and Florian Rivoal. I know them and like them both. Very talented and committed people. I met Shinyu when he was thinking about how to tackle this problem – we met at Books in Browsers a long time ago and talked about strategies for solving this problem. So, I wish them all the best, but I retain some initial skepticism – until I see them actively building community, I won’t be terribly interested in going down that path again… They are good folks, so who knows… fingers crossed (the more projects active in this space the better).

Two new PagedMedia Posts

One from Nellie McKesson on her awesome new project Hederis.

https://www.pagedmedia.org/introducing-hederis-and-why-we-care-so-much-about-pagination/

And another from Erich van Rijn about Editoria and pagination.

https://www.pagedmedia.org/editoria-building-a-book-in-a-browser/

Editoria vids

Editoria 1.2 released today! https://gitlab.coko.foundation/editoria/editoria/blob/master/CHANGELOG.md

Some new vids Alex Theg (Coko) made showing Editoria in action. Starting with export to PDF (styling still to be improved):

Track changes:

and editing of your own comments/annotations: