Wax

Roadmapping Athens to SF

Just been in Athens, hanging out with the Coko Athens crew – Yannis, Christos, Giannis and Alexis. A thoroughly good bunch.

I think I’ve mentioned before how proud I am to work with them and of what we are achieving. I mean, it’s a team of 4 people, and together they are developing 3 publishing platforms and a sophisticated web-based word processor … essentially 4 platforms – Editoria, xpub (Journals), micropubs, and Wax (editor)… I mean… talk about punching above your weight! More than that, we have fun doing it.

Every month I come to Athens and we discuss the coming month’s roadmaps. I come for a few days, we drink a lot of Freddo Espresso and sometimes (at the other end of the day) some margaritas. In between, we order some iffy delivery food and plan the future.

The roadmapping sessions involve us going through each platform together, looking at last month’s roadmap, discussing the next priorities and approaches, and then committing those to next month’s roadmap for that project. It’s pretty interesting, and it’s super great to have everyone involved in the process. We get a good, wide range of opinions and at the same time give everyone ownership of their own project. This is what collaboration is all about.

So some short notes here on this month’s roadmapping…

Wax – We have come very close to a fully functional web-based word processor based on the Substance libs. It is looking amazing, with support for the normal editor stuff – headings, images, bold etc – plus some amazingly sophisticated features including track changes, notes management (more complex than it sounds), thread-based commenting/annotations, diacritics support … and a lot more. However, we have decided to start building a new Wax based on the ProseMirror libs, mainly because there is a lack of community around the Substance.io libraries and we wish to de-risk ourselves going forward. So we’ll finish off Wax 1.0 with a Substance upgrade, which will also bring us table support, nested lists, and fixes for some other issues. At the same time, we will continue developing the new Wax (we already have something basic working), first by adding some interesting widgets that the ProseMirror community have built, and then by building a simple plugin structure for editor widgets. We have split the Wax roadmaps up, but the important stuff for the current Wax is listed in the Editoria repo – https://gitlab.coko.foundation/editoria/editoria#roadmap

xpub – we are now fixing a few bugs in the journal system and moving forward with migrating to the new shared data model that was collaboratively designed at the recent PubSweet meeting in Cambridge (UK). This will include some initial research into GraphQL (part and parcel of the migration). Part-way through August, Giannis is attending the Libero workshop at eLife and will work on Libero with the eLife team for 2 weeks. Libero is the open source web-delivery part of the publishing cycle that eLife has designed and is about to build, so we want to contribute to that effort and learn what we can. That also means the actual migration to the new data model will happen after those 2 weeks, i.e. we’ll start it on Sept 1. More info here – https://gitlab.coko.foundation/xpub/xpub#roadmap

Editoria – this is coming along fast. Alexis just added EPUB export and overhauled the workflow management tool. Editoria is pretty much ‘fully fledged’, although we have many ideas for new features; much of this will wait until the Editoria community meeting in San Francisco in October. In the meantime, we are being good open source citizens and writing a lot of under-the-hood tests. Alexis will also spend a day or so writing a new Authsome mode to match the Book Sprints workflow, to show that the auth app Jure (PubSweet lead dev) built is indeed plug-and-play. More info here – https://gitlab.coko.foundation/editoria/editoria#roadmap
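A sketch of what ‘plug-and-play’ authorization can look like: the application asks one question (may this user perform this operation on this object?), and the policy answering it is a swappable function. The types, the two toy policies and `makeAuthorizer` are hypothetical and only loosely modelled on the idea of an Authsome mode; they are not the authsome package’s real interface.

```typescript
// Hypothetical shapes, loosely inspired by the idea of an Authsome "mode";
// this is NOT the authsome package's real interface.
interface User {
  id: string;
  teams: string[]; // e.g. "book:42", "production-editor:42"
}
interface Target {
  type: string;
  bookId: string;
}
type Mode = (user: User, operation: string, target: Target) => boolean;

// Toy Book Sprints-style policy: anyone on the book's team may read and edit.
const bookSprintsMode: Mode = (user, operation, target) =>
  user.teams.includes(`book:${target.bookId}`) && (operation === "GET" || operation === "PATCH");

// Toy publisher-style policy: team members may read; only production editors may change things.
const publisherMode: Mode = (user, operation, target) =>
  operation === "GET"
    ? user.teams.includes(`book:${target.bookId}`)
    : user.teams.includes(`production-editor:${target.bookId}`);

// Application code only calls the function returned here; swapping the mode
// changes the policy without touching any calling code.
function makeAuthorizer(mode: Mode): Mode {
  return (user, operation, target) => mode(user, operation, target);
}

const author: User = { id: "u1", teams: ["book:42"] };
const chapter: Target = { type: "chapter", bookId: "42" };
console.log(makeAuthorizer(bookSprintsMode)(author, "PATCH", chapter)); // true
console.log(makeAuthorizer(publisherMode)(author, "PATCH", chapter)); // false
```

The same user and the same request get different answers purely because a different mode was plugged in, which is the property a Book Sprints mode would be demonstrating.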

Micropubs – we are developing a micropublications platform with the Wormbase crew. It’s early days, but it is on schedule with the first thin slice. It contains complex integrations and complex submission forms, but Yannis is making good progress, so more of the same! https://gitlab.coko.foundation/micropubs/wormbase#roadmap

It is, as I said above, an awesome team. I’m very proud of what we are achieving together, and I like that we punch above our weight. Much more information about all of this is coming soon as we go!

I’m spending a night in the UK tomorrow, and then it’s on to San Francisco to meet with Kristen, attend FOO camp and take care of many other bits n pieces.

CodeMirror Embedded in Wax

The Wax (a web-based word processor we are developing) lead dev, Christos Kokosias, yesterday embedded another editor in the editor. We need this for the PubSweet Book Sprint, where we will be including code snippets in the documentation we create, so we need a nice way to manage them. Consequently, Christos embedded the CodeMirror code editor into Wax, so code blocks now have things like syntax highlighting, line numbering, autocomplete etc. built in… it’s pretty cool… Christos made two short vids to show it in action.

Pretty cool stuff and extremely useful for writing documentation that includes code…ask yourself… can MS Word or Google Docs do that? #opensourceisbetter

What’s Involved in Building Editoria

We have a webinar coming up soon for Editoria. In anticipation of that, I thought I’d write a little overview of what was involved in building the product. So… starting with a little hand-drawn schematic…

The above shows all the moving parts that we had to consider, and build, to make Editoria what it is. From the user’s perspective, it looks like one app – the Editoria interface – where they can upload a docx file (or files) and start work on producing a book.

From an ‘under the hood’ perspective, it looks like the following…


As it happens, we had to build all of the above to make it possible… So, here is a list of all the parts starting from ingest…

Ingest

Ingest consists of 2 parts:

INK
XSweet

INK is a job processing workhorse. Essentially it will run any kind of job you want and ‘steps’ are written for specific kinds of jobs/processes. These are a kind of ‘wrapper’ around executable code. INK has an API that enables platforms like Editoria to send files to it and request a specific set of steps to be run, and then return the result. In this instance, INK runs a set of steps that wrap up XSweet.
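To make the ‘steps’ idea concrete, here is a minimal TypeScript sketch of a step pipeline. It is not INK’s API: `Step`, `runRecipe` and the two example steps are hypothetical names made up for illustration, and real INK steps wrap external executables (such as the XSweet scripts) rather than in-process string functions.

```typescript
// Hypothetical sketch: NOT the INK API. A "step" wraps some unit of work.
type Step = {
  name: string;
  run: (input: string) => string;
};

// A recipe is an ordered list of steps that a client asks to have run.
type Recipe = Step[];

interface JobResult {
  output: string;
  log: string[]; // names of the steps that ran, in order
}

// Run each step in turn, feeding one step's output into the next.
function runRecipe(recipe: Recipe, input: string): JobResult {
  const log: string[] = [];
  let current = input;
  for (const step of recipe) {
    current = step.run(current);
    log.push(step.name);
  }
  return { output: current, log };
}

// Illustrative steps only; real INK steps wrap external executables.
const trimWhitespace: Step = { name: "trim", run: (s) => s.trim() };
const wrapInBody: Step = { name: "wrap-body", run: (s) => `<body>${s}</body>` };

const result = runRecipe([trimWhitespace, wrapInBody], "  <p>Hello</p>  ");
console.log(result.output); // <body><p>Hello</p></body>
console.log(result.log.join(" -> ")); // trim -> wrap-body
```

Keeping each step small and named is what lets the same steps be reassembled into a different recipe for a different kind of job.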

XSweet is a set of scripts that take a docx file and convert it to HTML. It is very modular and can be pretty easily extended and improved. It can also be customised per use case.
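As a rough illustration of the ‘modular and customisable’ point, the sketch below maps Word paragraph styles to HTML elements, with per-use-case overrides layered on top of a default map. It is a toy built on assumptions: XSweet itself is a chain of XSLT stylesheets that reads the docx package directly, and its output is not this simple markup. `WordParagraph`, `defaultStyleMap` and `toHtml` are hypothetical, and the input is assumed to have already been extracted from word/document.xml.

```typescript
// Hypothetical simplified input: paragraphs already read out of
// word/document.xml (unzipping and XML parsing are out of scope here).
interface WordParagraph {
  style: string; // Word paragraph style name, e.g. "Heading1"
  text: string;
}

// Default mapping from Word style names to HTML elements (an assumption).
const defaultStyleMap: Record<string, string> = {
  Title: "h1",
  Heading1: "h1",
  Heading2: "h2",
  Quote: "blockquote",
};

// Escape characters that are significant in HTML text content.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Convert paragraphs to HTML. Styles without a mapping fall back to <p>;
// per-use-case customisation is a set of overrides merged over the defaults.
function toHtml(paras: WordParagraph[], overrides: Record<string, string> = {}): string {
  const styleMap = { ...defaultStyleMap, ...overrides };
  return paras
    .map((p) => {
      const tag = styleMap[p.style] ?? "p";
      return `<${tag}>${escapeHtml(p.text)}</${tag}>`;
    })
    .join("\n");
}

const doc: WordParagraph[] = [
  { style: "Heading1", text: "Chapter 1" },
  { style: "Normal", text: "Fish & chips" },
];
console.log(toHtml(doc));
// <h1>Chapter 1</h1>
// <p>Fish &amp; chips</p>
console.log(toHtml(doc, { Heading1: "h2" }));
// <h2>Chapter 1</h2>
// <p>Fish &amp; chips</p>
```

The override object is the whole customisation story here: a publisher with different house styles changes the map, not the conversion code.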

So, at the ingest stage, it looks like the following to the user:

the user logs in to Editoria
they create a new book from the book dashboard
they click on the new book
from the book builder interface they choose ‘upload multiple word files’ or ‘upload word’
they select the Word file(s) from a system dialog
these files automagically appear ‘in the book’ and can be edited
note: in the case of multiple docx files, if the files have been named according to a simple convention, Editoria will create each new chapter, automatically populate the structure of the book, and load the content in the right place

From the tech side the following happens:

when a Word doc file or files are selected, the files are sent to INK via the INK API, with a request to convert them from docx to HTML using the XSweet steps
INK runs the docx file(s) through the XSweet steps, creating HTML
when the job is done, INK signals Editoria that the files are good to go
Editoria grabs each file when it is ready and puts it in the right chapter, creating the chapter if multiple Word files are uploaded at once
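The tech-side sequence above can be sketched as an asynchronous flow: submit every file, wait for each job to finish, then place the HTML into a chapter, creating the chapter if it does not exist yet. The names (`ingest`, `ConvertFn`, `fakeInk`) are hypothetical, and a fake converter stands in for the INK API, whose real endpoints and signalling mechanism are not shown here.

```typescript
// Hypothetical types: NOT the real INK or Editoria APIs.
interface Chapter {
  title: string;
  html: string | null;
}

// Stand-in for "send the file to INK and resolve when INK signals the job is done".
type ConvertFn = (fileName: string) => Promise<string>;

// Submit every file, wait for each conversion, and place the HTML in a
// chapter named after the file, creating the chapter if it does not exist.
async function ingest(
  files: string[],
  book: Map<string, Chapter>,
  convert: ConvertFn,
): Promise<void> {
  await Promise.all(
    files.map(async (file) => {
      const html = await convert(file);
      const title = file.replace(/\.docx$/i, "");
      const chapter: Chapter = book.get(title) ?? { title, html: null };
      chapter.html = html;
      book.set(title, chapter);
    }),
  );
}

// Fake converter that simulates an asynchronous conversion job.
const fakeInk: ConvertFn = (fileName) =>
  new Promise((resolve) => setTimeout(() => resolve(`<p>converted ${fileName}</p>`), 10));

// "intro" already exists; "chapter-1" will be created by the upload.
const book = new Map<string, Chapter>([["intro", { title: "intro", html: null }]]);
ingest(["intro.docx", "chapter-1.docx"], book, fakeInk).then(() => {
  console.log(JSON.stringify(Array.from(book.keys()))); // ["intro","chapter-1"]
});
```

Starting all conversions at once and placing each file as its job completes mirrors the ‘grabs each file when it is ready’ step, rather than converting files strictly one after another.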

So… we had to build all of the above. First, INK exists because, crazy as it sounds, we could not find an elegant open source pipeline manager to manage jobs like this, so we decided to build one from zero. In addition, and this also sounds crazy, there was no comprehensive open source library out there for converting docx to HTML that was easy to extend or customise for different use cases. We certainly know of many scripts for docx->html (many are listed on the XSweet site), but they are either unmaintained or difficult to extend. So, unfortunately, we had to build that also.

Editoria App

The Editoria App is what people think of as ‘Editoria’. Everything the user interacts with is, on the surface, Editoria interfaces in the browser. However, Editoria comprises 3 main parts:

PubSweet
the Editoria interfaces
Wax

PubSweet is the ‘under the hood’ CMS that enables apps like Editoria to be built on top of it. PubSweet is a Node/React application written entirely in JavaScript; it sits on the server and provides a ‘wrapper’ for the Editoria interfaces. You can think of PubSweet as a ‘headless’ CMS that thinks like a publishing system. Once again, when we started (and still today) there was nothing out there that could realise multiple publishing use cases, as diverse as micropubs, books, journals, content aggregation etc., built on top of the same CMS. So we had to build it ourselves. PubSweet is now pretty mature and indeed supports multiple publishing platforms; at last count, I think 6 platforms are built on top of PubSweet (and it is early days).

The Editoria interfaces are the ‘web pages’ that make up the Editoria application. This includes the dashboard, the book builder etc… these are all custom code written on top of PubSweet to realise the Editoria use case / platform.

Wax is an online editor; it is so sophisticated that we prefer to call it a web-based word processor. It is actually an Editoria interface, except that it is so complex, and such a critical part of any online publishing tool, that it is a product in its own right. Nothing like this existed either, so we built our own (typical WYSIWYG editors are not sophisticated enough for the publishing use case).

PagedJS

From Editoria, you can press a button and have a book appear in the browser. If you print from the browser, you then get a PDF ready to take to the printers, with all the typographical bells and whistles that books need. It looks magical. We currently use Vivliostyle for this, but we have hit many limitations with the software. We reached out to the org that produces the code, but they weren’t ready to collaborate to improve the product. Hence, unfortunately, we had to build our own. Soon we will replace Vivliostyle with this new offering. Development on paged.js (working title) has been incredibly fast, so watch this space for some exciting news… there are some early demos in the pagedjs repo if you are interested.

Summary

… So, as you can see, Editoria is more complex than it appears to the user. The crazy thing is that most of these parts should have existed already, but they didn’t. There has been no full-page, open source editor out there that gives the control over the source and the extensibility required for the publishing use case. The docx conversion tools available do not yet do a good enough job of the conversion, and are not easy to customise. There is no headless CMS that ‘thinks like a publisher’ and would help us build Editoria. And we had to create a pagination engine (paged.js) because the one open source option couldn’t give us what we needed, and we failed at trying to help them improve the product.

It’s not like we wanted to build all this, but the sad fact is that, for the publishing industry, the web is still an innovation they haven’t embraced (except as a distribution medium). So while all of these parts (or many of them) should exist, the sad truth is they did not. We looked around pretty thoroughly to see if we could save ourselves the effort before embarking on these projects. We did, for example, bank on Substance Lens Writer for our editor, but it was not maintained and proved unstable, and our eventual assessment was that it was better to start again – hence Wax. We also went a long way down the path of using and promoting Vivliostyle, until it became apparent that neither the technology nor the technology providers could get us to where we needed to be… consequently, after implementing Vivliostyle (it is still baked into Editoria), we started a new approach to this problem – Pagedjs.

So, we had to build all of this ourselves… the good news is that all of these moving parts are reusable in other scenarios. You can take whatever you like from the above and add it to your own product, and hopefully that will fuel the publishing sector’s ability to innovate and transform its products and workflows.

Show and Tell for Wax from Coko people Jure and Yannis, using Jitsi open source video for the demo. It will be re-encoded to Ogg video and posted to the Coko blog shortly. We used YouTube (boooo!) to archive it this time around, since we couldn’t quite get the Jitsi recording working… we will get it right next time.

The big 4

I’ve been in the business of trying to work out how to get Publishing (capital ‘P’) into the web. From the start, there have been some ‘big ticket’ items that have needed to be solved. Some are more urgent than others, but by and large we are cracking these nuts one by one. I have considered for a long time the big 4 to be:

MS Word to HTML conversion
an open source web-based word processor
paginated output via the browser to print-ready copy
in-browser designer

Items 1, 2 and 3 are the ‘now’ critical items; number 4 is necessary but a little further down the line. Thankfully, at Coko we are solving these first 3 problems. To solve (1) we are building XSweet, a comprehensive (open source) XSLT conversion suite for converting MS Word to HTML. We are also building Wax to solve (2); Wax is an open source web-based word processor based on the Substance.io libs. And for (3) we are using Vivliostyle for in-browser rendering. Number 4 is still on the cards.

Interestingly, the pagination technology (3) might need re-evaluating since pagination will eventually be required for the editor and the in-browser designer.

While pagination inside a web-based word processor is not critical for publishers, it is critical for authors, small offices etc. If we are going to get publishers to use a web-based word processor, it would be better for them to share infrastructure with authors rather than sit on their own island of technology, i.e. eventually we need authors to use these tools too. By sharing infrastructure I don’t mean they need to use exactly the same tools, just compatible tools. So, eventually, we need to migrate authors into the web. It is not critical now, but over time, as the workflows for authors and publishers inevitably become more integrated, it will turn out to be necessary.

For in-browser design we also need pagination support, so we can work off a single source for the content and then design in the browser to output to the various formats publishers need. Think GIMP or InDesign in the browser. It’s not as far away as it might sound, but to do this we need to be able to paginate inside the browser and have that update live with style changes to the CSS.
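Why live style changes matter for pagination can be shown with a toy greedy paginator: the same blocks land on different pages when a CSS value such as line height changes, so every style edit means a reflow. This is a deliberately simplified sketch, not how paged.js or Vivliostyle work; `Block`, `paginate` and the pixel values are assumptions for illustration.

```typescript
// Toy model (an assumption): each block is a run of lines, and the page
// height and line height come from the current stylesheet.
interface Block {
  id: string;
  lines: number;
}

// Greedy pagination: put whole blocks on the current page until the next one
// would overflow, then start a new page. Real engines also split blocks across
// pages and handle widows, orphans, running heads etc.; this ignores all that.
// A block taller than a page simply gets a page of its own and overflows it.
function paginate(blocks: Block[], pageHeightPx: number, lineHeightPx: number): string[][] {
  const pages: string[][] = [[]];
  let used = 0;
  for (const block of blocks) {
    const height = block.lines * lineHeightPx;
    const current = pages[pages.length - 1];
    if (used + height > pageHeightPx && current.length > 0) {
      pages.push([]);
      used = 0;
    }
    pages[pages.length - 1].push(block.id);
    used += height;
  }
  return pages;
}

const blocks: Block[] = [
  { id: "a", lines: 10 },
  { id: "b", lines: 10 },
  { id: "c", lines: 10 },
];
// Same content, two stylesheets: changing the line height forces a reflow.
console.log(JSON.stringify(paginate(blocks, 600, 20))); // [["a","b","c"]]
console.log(JSON.stringify(paginate(blocks, 600, 24))); // [["a","b"],["c"]]
```

A 20% change in one style value moves block `c` to a new page; in an editor or designer this recalculation has to happen on every keystroke or style tweak, which is why the pagination engine needs to support fast, live reflow.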

So far, we are solving big ticket issues 1-3, but for the next stage we may need to change the tools we use for pagination, so we can live-update content and styles and reflow them in an editor and an in-browser designer. That may mean we need to start looking for a different pagination solution.

Cool stuff coming up on Editoria

While Yannis and Christos were in San Francisco, they worked on a number of features that will appear in the soon-to-be-released Editoria 1.1.

The two standout features include:

automagic book builder
diacritics interface

The automagic book builder, built primarily by Yannis, enables a user to populate the structure of a book automatically from a directory of MS Word files. Essentially, from the book builder component the user can click ‘upload word files’ and a system dialog opens. They can navigate to a folder on their computer and select the files they wish to use to create the book. Editoria then sends all these files to INK to convert to HTML, creates the structure of the book, and populates all the chapters and parts with the right content. At the moment it ‘knows’ from the file name whether an MS Word file belongs in the front, body or back matter, and whether it is a part or a chapter. We will add a config so that the rules about what a file is called and where in the book it lands can be determined per publisher or (perhaps) per user.

This feature means that a production editor can simply ‘point’ Editoria at a directory of MS Word files, which is what they are used to working with, and without doing anything else other than press ‘upload’ the book will be built for them in the correct structure (assuming they named everything right). It’s a lot less work than uploading every file individually.
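A minimal sketch of how file-name rules could drive the book structure, with the rules held in a config object as anticipated above. The naming convention here (`a`/`b`/`c` prefixes for front, body and back matter, a number giving the order, an optional `part_` marker) is invented for illustration and is not necessarily Editoria’s actual convention; `NamingConfig`, `placeFile` and `buildStructure` are hypothetical names.

```typescript
type Division = "front" | "body" | "back";
type Kind = "part" | "chapter";

interface Placement {
  division: Division;
  kind: Kind;
  order: number;
  title: string;
}

// Hypothetical per-publisher naming rules; NOT Editoria's actual convention.
interface NamingConfig {
  divisionPrefixes: Record<string, Division>;
  partMarker: string;
}

const exampleConfig: NamingConfig = {
  divisionPrefixes: { a: "front", b: "body", c: "back" },
  partMarker: "part",
};

// Parse "<prefix letter><order>_[<part marker>_]<title>.docx", e.g. "b01_part_the-sea.docx".
// Returns null for files that don't follow the convention (left for manual placement).
function placeFile(fileName: string, config: NamingConfig): Placement | null {
  const m = /^([a-z])(\d+)_(.+)\.docx$/i.exec(fileName);
  if (!m) return null;
  const division = config.divisionPrefixes[m[1].toLowerCase()];
  if (!division) return null;
  let rest = m[3];
  let kind: Kind = "chapter";
  const marker = config.partMarker + "_";
  if (rest.toLowerCase().startsWith(marker)) {
    kind = "part";
    rest = rest.slice(marker.length);
  }
  return { division, kind, order: parseInt(m[2], 10), title: rest.replace(/[-_]/g, " ") };
}

// Order placements front -> body -> back, then by the number in the file name.
function buildStructure(files: string[], config: NamingConfig): Placement[] {
  const rank: Record<Division, number> = { front: 0, body: 1, back: 2 };
  return files
    .map((f) => placeFile(f, config))
    .filter((p): p is Placement => p !== null)
    .sort((x, y) => rank[x.division] - rank[y.division] || x.order - y.order);
}

const structure = buildStructure(
  ["c01_index.docx", "b02_whales.docx", "a01_preface.docx", "b01_part_the-sea.docx", "notes.txt"],
  exampleConfig,
);
console.log(structure.map((p) => `${p.division}/${p.kind}: ${p.title}`).join("\n"));
// front/chapter: preface
// body/part: the sea
// body/chapter: whales
// back/chapter: index
```

Because the prefixes and part marker live in `NamingConfig` rather than in the parsing code, a different publisher’s convention would only need a different config object.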

The second feature is a Diacritics interface for the editor. Christos made this rather nicely, so that you can have per-publisher special characters, categorised and listed in a simple interface (opened from the editor). That is interesting in itself, as it might lend itself to other options like a special character ‘favorites’ list etc. But two other elements add some elegance to the implementation. The first is a simple checkbox that determines whether the Diacritics dialog stays open or closes automatically once a character is placed. The second is that the user can search through the Diacritics with keywords. For example, you could type ‘dash’ to get a list of all the dash characters (m-dash etc), or typing ‘turned g’ would give you the result: ᵷ

We observed that copy editors search Wikipedia for the special characters they want. Now they can search within the editor interface, using the naming conventions they are used to.
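Keyword search of the kind described can be as simple as matching every query word against a character’s Unicode name. The sketch below is an assumption about the approach, not Christos’s implementation; the five-character list and the `searchChars` function are hypothetical, though the names used are the real Unicode character names.

```typescript
interface SpecialChar {
  char: string;
  name: string; // Unicode character name
}

// A tiny illustrative list; a real per-publisher list would come from config.
const specialChars: SpecialChar[] = [
  { char: "\u2014", name: "EM DASH" },
  { char: "\u2013", name: "EN DASH" },
  { char: "\u2012", name: "FIGURE DASH" },
  { char: "\u1D77", name: "LATIN SMALL LETTER TURNED G" },
  { char: "\u00E9", name: "LATIN SMALL LETTER E WITH ACUTE" },
];

// A character matches when every word of the query is a whole word of its
// name, so "g" matches the word "G" but not every name containing the letter g.
function searchChars(query: string, list: SpecialChar[]): SpecialChar[] {
  const words = query.toUpperCase().split(/\s+/).filter((w) => w.length > 0);
  return list.filter((c) => {
    const nameWords = c.name.split(" ");
    return words.every((w) => nameWords.includes(w));
  });
}

console.log(searchChars("dash", specialChars).map((c) => c.name).join(", "));
// EM DASH, EN DASH, FIGURE DASH
console.log(searchChars("turned g", specialChars).map((c) => c.char).join("")); // ᵷ
```

Whole-word matching keeps short queries like ‘g’ precise; an empty query matches everything, which is a reasonable default for showing the full list when the dialog opens.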

The Diacritics feature is, of course, a Wax editor component… which is used as the editor in Editoria. We are working a lot on Wax, and it will become a substantial product in its own right.

Coko Products

I am currently planning how to keep all the Coko projects balanced and moving forward. It gave me a moment to reflect on just how productive we have been. At present we have 6 major products, all moving forward at an excellent pace:

PubSweet – the API toolkit for building publishing workflows (website coming soon).
INK – the file processing (conversion, extraction, enrichment etc) framework.
Editoria – the monograph production platform for publishers
XSweet – the XSLT conversion suite for converting MS Word to HTML (HTML Typescript)
Wax – the web based word processor built on top of Substance.io libs
xpub – the (early stage) journal platform.

All this in less than 18 months, which is amazing enough, but also consider that Coko was only 3 people (with Jure being the only developer) until 12 months ago. It’s kind of astonishing to me.