CodeMirror Embedded in Wax

The Wax (a web based word processor we are developing) lead dev -Christos Kokosias – yesterday embedded another editor, in the editor. We need this for the PubSweet Book Sprint where we will be including code snippets in  the documentation we create. So we need a nice way to manage that… consequently Christos embedded the CodeMirror code editor into Wax. So it has things like syntax highlighting, line numbering, auto complete etc all built into the code blocks…its pretty cool… below is are two short vids Christos made to show it in action:

 

Pretty cool stuff and extremely useful for writing documentation that includes code…ask yourself… can MS Word or Google Docs do that? #opensourceisbetter

Whats Involved in Building Editoria

We have a webinar coming up soon for Editoria. In anticipation of that I thought I’d write a little overview of what was involved in building the product. So… Starting with a little hand drawn schematic…

ed
The above shows all the moving parts that we had to consider, and build, to make Editoria what it is. From the user’s perspective, it looks like one app – the Editoria interface – where they can upload a docx file (or files) and start work on producing a book.

From an ‘under the hood’ perspective, it looks like the following…

xsweet

As it happens, we had to build all of the above to make it possible… So, here is a list of all the parts starting from ingest…

Ingest

Ingest consists of 2 parts:

  1. INK
  2. XSweet

INK is a job processing workhorse. Essentially it will run any kind of job you want and ‘steps’ are written for specific kinds of jobs/processes. These are a kind of ‘wrapper’ around executable code. INK has an API that enables platforms like Editoria to send files to it and request a specific set of steps to be run, and then return the result. In this instance, INK runs a set of steps that wrap up XSweet.

XSweet is a set of scripts that take a docx file and convert it to HTML. It is very modular and can be pretty easily extended and improved. It can also be customised per use case.

So, on the ingest stage it looks like the following to the user:

  1. user logins in to Editoria
  2. they create a new book from the book dash
  3. they vlivk on the new book
  4. from the book builder interface they choose ‘upload multiple word files’ or ‘upload word’.
  5. they select the word file(s) from a system dialog
  6. these files are automagically appear ‘in the book’ and can be edited
    1. note : in the case of multiple docs files, if the files have been named according to a simple convention, Editoria will create each new chapter and automatically populate the structure of the book and load the content in the right place

From the tech side the following happens:

  1. when a word doc file or files are selected, the files are sent to INK via the INK API, with a request to convert to docx using the XSweet steps
  2. INK runs the docx file(s) through the XSweet steps, creating HTML
  3. when the job is done, INK signals Editoria that the files are good to go
  4. Editoria grabs each file when it is ready and puts in the the right chapter, creating the chapter if multiple word files are uploaded at once

So… we had to build all of the above. First, INK exists because, crazy as it sounds, we could not find an elegant open source pipeline manager to manage jobs like this. So we decided to build from zero. In addition, which also sounds crazy, there is not a comprehensive open source library out there for converting docx to HTML that was easy to extend or customise for different use cases. We certainly know of many scripts for docx->html (many are listed on the XSweet site), but they are either unmaintained, or difficult to extend. So, unfortunately, we had to build that also.

Editoria App

The Editoria App is what people think of ‘as Editoria’. Everything the user interacts with is, on the surface, Editoria interfaces in the browser. However, Editoria comprises of 3 main parts:

  1. PubSweet
  2. the Editoria interfaces
  3. Wax

PubSweet is the ‘under the hood’ CMS that enables apps like Editoria to be built on top. PubSweet is a Node/React application written entirely in Javascript and it sits on the server and provides a ‘wrapper’ for the Editoria interfaces. You can think of PubSweet as a ‘headless’ CMS that thinks like a publishing system. Once again, when we started (and still today) there was nothing out there that could realise multiple publishing use cases, as diverse as micropubs, books, journals, content aggregation etc, built on top of the same CMS. So we had to build it ourselves. PubSweet is now pretty mature and indeed supports multiple publishing platforms – at last count I think that totals 6 platforms built on top of PubSweet (and it is early days).

The Editoria interfaces are the ‘web pages’ that make up the Editoria application. This includes the dashboard, the book builder etc… these are all custom code written on top of PubSweet to realise the Editoria use case / platform.

Wax is an online editor, it is so sophisticated we prefer to call it a web-based word processor. It is actually an Editoria interface, except that it is so complex and such a critical part of any online publishing tool, that it is a product in its own right. Nothing like this existed either, so we built our own (typical WYSIWYG editors are not sophisticated enough for the publishing use case).

PagedJS

From Editoria, you can press a button and have a book appear in the browser. If you print from the browser, you will then get a PDF ready to take to the printers with all the typographical bells and whistles that books need. It looks magical. We currently use Vivliostyle for this but we have hit many limitations with the software. We reached out to the org that produces the code but they weren’t ready to collaborate to improve the product. Hence, unfortunately, we had to build our own. Soon we will replace Vivliostyle with this new offering. The speed of dev has been incredibly fast on paged.js (working title) and so watch this space for some exciting news.. there are some early demos in the pagedjs repo if you are interested.

Summary

…. So, as you can see, Editoria is more complex than it appears to the eye of the user. Also, the crazy thing is that most of these parts should have existed already, but, crazy as it is, they didn’t exist. There has been no full page, open source, editor out there gives the source control and extensability required for the publishing use case. The docx conversion tools available do not yet do a good enough job of the conversion, and are not easy to customise. There is no headless CMS that ‘thinks like a publisher’ and would help us build Editoria; and we had to create a pagination engine (paged.js) because the one open source option couldn’t give us what we needed and we failed at trying to help them improve the product.

Its not like we wanted to build all this, but the sad fact is for the publishing industry the web is still an innovation they haven’t embraced (except as a distribution medium). So while all of these parts (or many of them) should exist, the sad truth is they did not exist. We did a pretty good look around to see if we could save ourselves the effort before embarking on these projects. We did, for example, bank on Substance Lens Writer for our editor, but it was not maintained and proved unstable, and our eventual assessment was that it was better to start again – hence Wax. We also have gone a long way down the path of using and promoting Vivliostyle, until it became apparent that neither the technology nor the technology providers could get us to where we needed to be… consequently, after implementing Vivliostyle (it is still baked into Editoria) we started a new approach to this problem – Pagedjs.

So, we had to build all of this ourselves… the good news is, that all of these moving parts are reusable in other scenarios… so you can take whatever you like from the above, and add it to your own product and hopefully that will fuel the publisher sectors ability to innovate and transform their products and workflows.

The big 4

I’ve been in the business of trying to work out how to get Publishing (capital ‘P’) into the web. From the start, there have been some ‘big ticket’ items that have needed to be solved. Some are more urgent than others, but by and large we are cracking these nuts one by one. I have considered for a long time the big 4 to be:

  1. MS Word to HTML conversion
  2. an open source web-based word processor
  3. paginated output via the browser to print-ready copy
  4. in-browser designer

1,2 an 3 are the ‘now’ critical items, number 4 is necessary but a little further down the line. Thankfully, at Coko we are solving these first 3 problems. To solve (1) we are building XSweet, a comprehensive (open source) XSLT conversion suite for converting MS Word to HTML. We are also building Wax to solve (2), Wax is an open source web based word processor based on the Substance.io libs. And for (3) we are using Vivliostyle for in-browser rendering. Number 4) is still on the cards.

Interestingly, the pagination technology (3) might need re-evaluating since pagination will eventually be required for the editor and the in-browser designer.

While pagination inside a web-based processor is not critical for publishers, it is critical for authors and small offices etc and if we are going to get publishers to use a web-based word processor then it would be better that they share infrastructure rather than sit on their own island of technology ie. eventually we need authors to use these tools too. By sharing infrastructure I don’t mean they need to use exactly the same tools, they just need to use compatible tools. So, eventually, we need to migrate authors into the web. It is not critical now, but over time, as the workflow for authors and publishers inevitably becomes more integrated, it will turn out to be necessary.

For in-browser design we need pagination support also so we can work off a single source for the content and then design in the browser to output to the various formats publishers need. Think Gimp or InDesign in the browser. It’s not as far away as it might sound, but to do this we need to be able to paginate inside the browser and have that update with live style changes to CSS.

So far, we are solving the big ticket issues 1-3, but for the next stage of changes we may need to change the tools we use for pagination so we can live-update content and styles and reflow in an editor and in-browser designer. That may mean we to start looking for a different pagination solution.

Cool stuff coming up on Editoria

While Yannis and Christos have been in San Francisco they worked on a number of features that will appear in the soon-to-be-released Editoria 1.1

The two standout features include:

  1. automagic book builder
  2. diacritics interface

The automagic book builder built primarily by Yannis enables a user to populate the structure of a book automatically from a directory of MS Word files. Essentially, from the book builder component the user can click ‘upload word files’ and a system dialog opens. They can navigate to a folder on their computer and select the files they wish to use to create the book. Editoria then sends all these files to INK to convert to HTML, creates the structure of the book, and populates all the chapters and parts with the right content. At this moment it ‘knows’ which MS Word file is in the front/body/back matter and whether it is a part or a chapter by the file name. We will add a config so the rules regarding what a file is called and where in the book it lands can be determined per publisher or (perhaps) per user.

This feature means that a production editor can simply ‘point’ Editoria at a directory of MS Word files, which is what they are used to working with, and without doing anything else other than press ‘upload’ the book will be built for them in the correct structure (assuming they named everything right). It’s a lot less work than uploading every file individually.

The second feature is a Diacritics interface for the editor. This rather nicely made by Christos so that you can have per-publisher assigned special characters categorised and listed in a simple interface (opened from the editor). That is in itself interesting as it might lend itself to other options like a special character ‘favorites’ list etc…but two other elements add some elegance to the implementation, the first is a simple checkbox that will determine if you leave the Diatrics dialog open or if it closes automatically once a character is placed. The second is that the user can search through the Diacritics with key words. For example you could type ‘dash’ to get a list of all the dash characters (m-dash etc), or typing ‘turned g’ would give you the result: 

We observed that copy editors search Wikipedia for the special characters they want. Now they can search within the editor interface, using the naming conventions they are used.

The Diacritics feature is, of course, a wax-editor component….which is used as the editor in Editoria. We are working a lot on Wax and it will become a substantial product in its own right.

Coko Products

I am currently planning how to keep all the Coko projects balanced and moving forward. It gave me a moment to reflect on just how productive we have been. At present we have 6 major products, all moving forward at an excellent pace, they include :

  1. PubSweet  – the API toolkit for building publishing workflows (website coming soon).
  2. INK – the file prosessing (conversion, extraction, enrichment etc) framework.
  3. Editoria – the monograph production platform for publishers
  4. XSweet – the XSLT production for converting MS Word to HTML (HTML Typescript)
  5. Wax – the web based word processor built on top of Substance.io libs
  6. xpub – the (early stage) journal platform.

All this in less than 18 months, which is amazing enough but also consider that Coko was only 3 people (with Jure being the only developer) until 12 months ago. Its kind of astonishing to me.