ProseMirror and the Guardian

This from the Guardian:

“Just over five years since Scribe was born, we have decided to bring to an end the active development of the project. We are working on implementing a new text editor based on ProseMirror that we hope to open source in the future.”

Wax 2 is the open source Scholarly Editor we are building on ProseMirror. Scholarly Editors have distinctive workflows (e.g. citation management) and tools (e.g. entity linking).

Anyone who wants to join the effort, please let me know!

Code Editor in Wax 2

Christos has been working on our new web-based word processor, Wax 2. It is based on the best-of-breed ProseMirror open source libs. A week or two ago, Christos integrated a code editor (CodeMirror) with it and sent me the following vid… looks great (see below). We also have a similar feature in Wax 1, but the integration isn’t as nice… check out the last bit of the video where Christos is ‘undoing’ items. It works seamlessly. In Wax 1 the code editor is a separate object embedded in the parent page, so the ‘undo’ history does not integrate as nicely as it does here.

The Beatles of Platform Dev

Every month I go to Athens to work with the Coko platform dev team. We talk through platform issues and set the month’s roadmap. I just met with them a few days ago. They are an awesome team. It deserves mention that pictured below are four folks and between them they develop and maintain no fewer than four publishing platforms – Editoria, xpub-collabra, Micropubs and Wax. Incredible.

Highly productive and also amazing to work with.


Editoria Feature Proposals

Proposals are now closed. Below is the synopsis of each proposal, cut and pasted from GitLab. Titles link to the GitLab issue. I didn’t copy across any images, so where you see a reference to an image, please check the original item.

All items can be found in GitLab also of course:

Create an account here:

1. Feature Proposal: Color code tracked changes by user

Feature request: Please display tracked changes in different colors by user.

The request originates from UCP for Editoria, where only three users are expected to be within a given book/chapter editing at any given time.

Suggest retaining the tooltip, too, but adding this additional ‘at a glance’ feature to allow editors to quickly see who entered which tracked change within a longer book, where checking each change by tooltip would become time-consuming.

2. Feature Proposal : enable “read only” chapter mode

Org: Book Sprints

User story: As an author, I want to review a chapter while someone else is editing it.

To do this, I’d like to click on a read-only link for each chapter in the book builder. It doesn’t matter about the layout of the content, it just needs to be readable so that I can read it without having to render the entire book. No edit tools or comment tools are necessary, just a read-only version. As many people as needed should be able to click on the read-only view. Ideally, the read-only link should appear next to the edit button, and be permanently available (no permissions necessary).

3. Feature Proposal : Indenting chapters in book builder

Org: Book Sprints

User story : As an author, I want to be able to see the difference between parts and chapters at one glance.

To do that, it would be great to indent the chapters more so that the difference is more easily visible.

4. Feature Proposal : Configure book builder to omit blue buttons

Org: Book Sprints

User story: As an author in a Book Sprint, I want the book builder to look as much like a table of contents as possible. We do not upload Word files in a Book Sprint, so we have no need for the blue buttons showing the upload status; this UI element is unnecessary and distracting.

We would like to have the option to omit the blue buttons.

5. Feature Proposal: Autocomplete for adding book team roles

punctum books proposes an autocomplete feature when typing in names for book team roles (see attached image).

For example, when one of the possible users is Vincent it would be great if when I type Vinc it already suggests the rest of the name. Especially with large teams, this can be useful because you may not remember the exact spelling of each team member.
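
To make the request concrete, here is a minimal sketch of the prefix matching such an autocomplete could use. It is not Editoria code: the function name `suggest_names`, the `limit` parameter and the extra team member names are assumptions for illustration only.

```python
def suggest_names(prefix: str, team_members: list[str], limit: int = 5) -> list[str]:
    """Return team member names that start with the typed prefix, ignoring case."""
    needle = prefix.strip().casefold()
    if not needle:
        # Nothing typed yet, so there is nothing to suggest.
        return []
    matches = [name for name in team_members if name.casefold().startswith(needle)]
    # Sort so the suggestions appear in a stable, predictable order.
    return sorted(matches)[:limit]


# Hypothetical team list; only "Vincent" comes from the proposal itself.
print(suggest_names("Vinc", ["Barbara", "Vincent", "Vincenzo", "Victor"]))
```

Matching on the start of the name keeps the suggestions short, which is the point when you only half remember the spelling.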

6. Feature Proposal: List User Roles under Users

In the overview of Users under the Users menu, include the different roles these users fulfill in the different Books.

For example, under Username Vincent I would see Production Editor of followed by a list, Author of followed by a list, etc. This allows a quick review of which Users are involved in which Books.

7. Feature Proposal: Error message when uploading incorrect file format

punctum books proposes: an error message when you try to upload a file format that is not DOCX (such as DOC), rather than XSweet trying to process it and then crashing.

8. Feature Proposal: Toggle invisible characters in Wax

punctum books proposes a toggle for invisible characters in the Wax editor.

This would allow quick discovery of double spaces, returns instead of enters, correctly inserting em/en spaces, weird tabs, and so on.

9. Feature Proposal: Author Style

punctum books proposes: An additional Wax paragraph style “author.”

10. Feature Proposal: Automatic typographer’s quotes in Wax

punctum books proposes that Wax automatically uses typographer’s quotes (for now, until there is proper language control).

When typing single or double quotes in Wax, they are not formatted as typographer’s quotes.
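
As a rough illustration of what “automatic typographer’s quotes” involves, here is a minimal sketch that converts straight quotes to curly ones. It assumes English-style quotes and a simple rule (a quote is an opening quote when it follows the start of the text, whitespace or an opening bracket); it is not how Wax does or would do it, and the name `smarten_quotes` is made up for this example.

```python
import re

OPEN_DOUBLE, CLOSE_DOUBLE = "\u201c", "\u201d"
OPEN_SINGLE, CLOSE_SINGLE = "\u2018", "\u2019"

# A quote counts as "opening" at the start of the text, or after whitespace,
# an opening bracket, or an opening curly quote that was already converted.
_OPENING_CONTEXT = r"(^|[\s(\[{\u201c\u2018])"


def smarten_quotes(text: str) -> str:
    """Replace straight quotes with curly quotes using a simple context rule."""
    text = re.sub(_OPENING_CONTEXT + '"', lambda m: m.group(1) + OPEN_DOUBLE, text)
    text = re.sub(_OPENING_CONTEXT + "'", lambda m: m.group(1) + OPEN_SINGLE, text)
    # Whatever is left is a closing quote or an apostrophe.
    return text.replace('"', CLOSE_DOUBLE).replace("'", CLOSE_SINGLE)


print(smarten_quotes('"Hello," she said. "It\'s fine."'))
```

A rule this simple gets some cases wrong (a leading apostrophe as in ’90s, for instance), and other languages need different marks, which is why the proposal treats it as a stopgap until there is proper language control.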

11. Feature proposal: Always show chapter name

punctum books proposes a feature that makes sure that the chapter name is always shown somewhere on the top of the page, so you know where you are.

12. Feature Proposal: Usernames allowed with special characters

punctum books proposes that usernames with underscores, periods, etc. are allowed.

Currently, it appears that usernames such as “John_Doe” or “Paul.Smith” are not allowed. When you try to create them, you get a 400 error without any further explanation.
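
A sketch of what the requested behaviour could look like: accept underscores and periods, and return a readable message instead of a bare error. The exact rule (letters, digits, underscores, periods and hyphens, starting and ending with a letter or digit) is an assumption made for this example, not Editoria’s actual validation, and `validate_username` is a hypothetical name.

```python
import re

# Assumed rule: starts and ends with a letter or digit; underscores,
# periods and hyphens are allowed in between.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def validate_username(username: str) -> tuple[bool, str]:
    """Return (is_valid, message); the message explains a rejection."""
    if USERNAME_PATTERN.fullmatch(username):
        return True, ""
    return False, (
        "Usernames may contain letters, digits, underscores, periods and "
        "hyphens, and must start and end with a letter or digit."
    )


for candidate in ["John_Doe", "Paul.Smith", "_bad"]:
    print(candidate, validate_username(candidate))
```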

13. Feature Proposal: Ask for first name and surname on sign up

punctum books proposes that new users are asked for first name and surname on signup.

Currently the only way to distinguish users is by username, and this quickly becomes problematic as the total number of collaborators on all book projects increases, especially if at some point users start to make their own usernames, which may not be directly comprehensible to Production Editors.

Registering first names and surnames allows for a better way of organizing users on the Users page.

14. Feature proposal: Kill automatic numbering in numbered list style

Proposed by University of California Press

Adding sequential numbers to text styled as a numbered list would be helpful if one were authoring in Editoria. However, authors who want a list numbered submit it that way in .docx files, so the automatic numbering feature in this style adds a duplicate set of numbers to list items.

15. Feature proposal: Front- and backmatter styles

Proposed by University of California Press

To move to the next stage of testing and put a whole book through Editoria, we need styles for frontmatter components and a few more backmatter ones.

I’d like a separate section each for front- and backmatter styles in the vertical styling toolbar, with the former at the top and the latter at the bottom. There do not need to be too many styles in each of these lists, since the styles in the other sections (display, general text, lists) should be context-sensitive (e.g., for backmatter level-head, use Heading 1).

Because the front- and backmatter styles are rarely used in body components, it would be great if those style lists could be collapsed. They should still be available because there are instances where one might want one of those styles in a body component (e.g., in an edited volume, you might have bibliographical listings at the end of a chapter).

FRONTMATTER Here are (most of) the styles we need that are not currently accounted for in the other lists (I am not including toc styles–that’s coming in another feature request):

  • Half Title
  • Title
  • Subtitle
  • Author [if Barbara’s proposal of adding an author style is implemented, I assume it would be in Display, and we could just use that]
  • Publisher
  • Series Title
  • Series Editor
  • Series List
  • Copyright Page
  • Frontmatter Signature

For the rest of the elements in frontmatter, we could use styles in other sections (on output, the system would impose the frontmatter versions of these elements):

BEP Book Prose Epigraph —> Epigraph: Prose

BEPSN BEP Source Note —> Source Note

BEPO Book Poetry Epigraph —> Epigraph: Poetry

BEPOSN BEPO Source Note —> Source Note

FMH Frontmatter Head —> Title

FMAU Frontmatter Author —> Author

FM1 Frontmatter 1 Head —> Heading 1

FMTXT Frontmatter Text —> General Text

FIL Frontmatter Illustration List —> Numbered list

FAB Frontmatter Abbreviation List —> Hoping we can add Abbreviations as a List style

BACKMATTER Here are (most of) the styles we need that are not currently accounted for in the other lists:

  • Bibliography
  • Glossary
  • Contributor list
  • Index

For the rest of the elements in backmatter, we could use styles in other sections (on output, the system would impose the backmatter versions of these elements):

Appendix Number —> Number [assuming we add a number style to Display]

Appendix Title —> Title

Appendix Subtitle —> Subtitle

Backmatter Head —> Title

Backmatter 1 Head —> Heading 1

Backmatter 2 Head —> Heading 2

Backmatter 3 Head —> Heading 3

Backmatter Text —> General text

Backmatter Prose Extract —> Extract: Prose

Backmatter prose extract Source Note —> Source note

Backmatter poetry —> Extract: Poetry

Backmatter poetry Source Note —> Source note

BM Numbered List Extract —> List / Bulleted

BM Unnumbered List Extract —> List / Unnumbered

BM Bulleted List Extract —> List / bulleted

Chronology —> Hoping we can add Chronology as a List style

Backmatter Abbreviation List —> Hoping we can add Abbreviations as a List style

I realize there’s a lot here. Please let me know if clarification or rethinking is needed. Thanks!!

16. Feature proposal: Automatically generated table of contents

Proposed by the University of California Press

Although authors usually provide a toc with their manuscripts, we would like one to be generated from the book components. I do think a placeholder needs to be present in the book builder so it can be specified for pagination. It could say something like “Table of Contents will be generated on final output.”

For frontmatter, toc should include every component that follows the toc and use text with the style Title.

For body text, toc should list chapter number and text styled as Title, Subtitle, and (if there is one) Author.

For backmatter, toc should include every component and use text with the style Title.

When a book has backnotes (most of our titles), it is sometimes a little tricky to figure out where they go (they could be preceded in the backmatter by appendixes, abbreviation lists, etc.), so maybe the easiest thing to do is add a placeholder component to the backmatter with the heading Notes styled as Title and boilerplate language saying something like “Backnotes will be gathered here on final output.” That way, we could specify placement, pagination, and the title could pull into a generated toc.

17. Feature Proposal: Toolbar button to change case

Proposed by University of California Press

Authors often format display type in all caps. If the design specifies title case, we need to specify how letters are treated (e.g., articles and prepositions lowercased)–it is an editorial decision, not one that can be automated. Right now, the only option seems to be to retype text to remove cap formatting. It’s time-consuming and a chance for introducing errors. Word’s options are uppercase, lowercase, sentence case (first letter only capped), and title case (first letter of all words capped–gets you most of the way there).
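
For reference, the four Word-style options mentioned above are simple string transforms. The sketch below assumes a hypothetical `change_case` helper and mode names; it is only meant to show what a toolbar button would do to the selected text, not how Wax would implement it.

```python
def change_case(text: str, mode: str) -> str:
    """Apply one of four Word-style case changes to a piece of text."""
    if mode == "upper":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "sentence":
        # First letter only capped.
        lowered = text.lower()
        return lowered[:1].upper() + lowered[1:]
    if mode == "title":
        # First letter of every word capped; small words are NOT special-cased.
        return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))
    raise ValueError(f"unknown case mode: {mode!r}")


print(change_case("THE ART OF COMMUNITY", "title"))
print(change_case("THE ART OF COMMUNITY", "sentence"))
```

The title option deliberately caps every word, so an editor would still lowercase articles and prepositions by hand, which is exactly the editorial step the proposal says cannot be automated.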

18. Feature proposal: No global italics in BookBuilder component listings

Proposed by University of California Press

Display component names in Book Builder in roman, not italic, type (italic type really shouldn’t be used unless there is a reason). Allow italic type to be within the name. See, e.g., in UCP deployment Wickes / Bible and Poetry chapter 1 title: Ephrem’s Madrashe on Faith in Context.

19. Feature proposal: Allow toggle to turn off reveal of spaces in record changes

Proposed by University of California Press

The revealed spaces between words make longer sections of inserted text difficult to read. Could we have a toggle to show these or not?

20. Feature Proposal—Media Library/Asset Manager

This is a placeholder feature proposal. Right now, Editoria expects camera-ready artwork to be placed inline in order to become a part of the book. At the moment, a caption can be included with the inserted artwork, but no additional metadata about the image. This feature would allow a user to insert artwork from a media library. The media library would be similar to WordPress’s media library and allow for media to be uploaded and inserted inline in a book via the Wax editor, which would then become part of the media library, or to be added directly to the media library. The media library would allow for some basic media editing as well. It should be a “media” library rather than an image library because ultimately Editoria should be able to support multimedia in addition to static images.

21. Feature Proposal—CSS Book Page Template Gallery

This is a placeholder feature request for a CSS template gallery that would allow users to select from a gallery of page templates that can be rendered to PDF using Editoria’s CSS/javascript-based automated typesetting process. These templates would include the following variables:

  • Page dimensions (translates to book trim size)
  • Display and text fonts
  • One or two-column designs
  • Various other differences

22. Feature Proposal—Epubcheck EPUB Output Validation

This is a placeholder feature request for Editoria to validate the EPUBs it produces against the EPUB standard, using the open source Epubcheck validation tool.

23. Feature Proposal—Book-level (and perhaps chapter-level) metadata

This is a placeholder feature request for a book- and/or chapter-level metadata entry mechanism in Editoria. This could be achieved either by integration with an API or export-driven feed from a publisher’s existing title management system or by adding a form to Editoria that would allow for the entry of this metadata. In some cases, this metadata will need to be added to create a valid EPUB file, so an MVP could include just those metadata elements necessary to create a valid EPUB. This would also be useful for output and distribution to other systems (e.g. a publisher’s digital asset management system).

24. Feature Proposal—Accessibility Validation and Tools

This is a placeholder feature proposal to develop tools for validating EPUB outputs against accessibility guidelines as codified by the W3C and other standards organizations. This could include anything from validating the inclusion of alt-text and alerting users to missing alt-text for images/media to validating whole book files against API-driven accessibility validation services such as those being developed by the DAISY consortium.

25. Feature Proposal: Allow components to move from body to front- or backmatter and vice versa

Longleaf Services proposes that component types be changeable. Currently, it’s possible to change the order of components within each type, but it is not possible to change the component type without deleting a file and reloading it. For example, Introductions are often moved from the main text into the front matter, and abbreviations lists are moved from front matter to the back matter. Being able to use the drag-and-drop function would be more efficient.

26. Feature Proposal: Add more information to the “Books” dashboard and make it sortable

Longleaf Services proposes that author name(s), team member names and roles, and current status be visible for each project, and that the page be sortable.

Currently only the title of a project is visible, and projects are listed alphabetically. Team members tend to think of projects by author, not title, so being able to see the author will be helpful. Being able to see the other team members for each project, and what stage the project is in, will help in managing the work. Finally, as the number of projects increases, it’ll be useful to be able to sort out those that are complete from those still in progress. Being able to sort on other attributes, such as series or imprint for example, will also be helpful.

In the interest of conserving space, a user could click on the project title and the list of team members could appear, as it does when one clicks on “Team Manager” on the individual page for each project.

27. Feature Proposal: Provide a larger text box for inputting figure captions

Longleaf Services proposes the caption text box be enlarged. Currently only a limited amount of the caption text is visible; a larger text box would make reviewing/editing the text easier. If there is a total character count limit, it would be helpful to provide the number of characters remaining.

Additionally, perhaps we could encourage accessibility/alt-friendly caption or descriptive entry? Maybe above the text box there could be a suggestion such as, “Consider providing a caption that is accessibility/alt-text friendly.”

28. Feature proposal: Export parts of a book

punctum books proposed the option to export only parts of a book.

Currently, only an entire book can be exported through paged.js. It would be great if you could select one or more chapters for export, which again would be very useful for edited collections.

29. Feature proposal: discretionary hyphens

punctum books proposes an additional special character: a discretionary hyphen.

Foreign names such as Nietzsche are often broken off erroneously by automatic hyphenation, for example Niet-zsche. A discretionary hyphen is an invisible character that shows paged.js where to properly hyphenate a word or name.
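
In Unicode the usual character for this is U+00AD SOFT HYPHEN. The sketch below shows how a name could be stored with its allowed break point, on the assumption that the renderer honours U+00AD when breaking lines (browsers generally do). The helper name and the “|” marker convention are made up for this example.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN: invisible unless a line breaks there


def add_discretionary_hyphens(text: str, word: str, marked: str) -> str:
    """Replace each occurrence of `word` with a version carrying soft hyphens.

    `marked` is the same word with "|" at every allowed break point.
    """
    if marked.replace("|", "") != word:
        raise ValueError("marked form must equal the word once '|' is removed")
    return text.replace(word, marked.replace("|", SOFT_HYPHEN))


# The break point shown here is illustrative; the editor decides where it goes.
result = add_discretionary_hyphens("Nietzsche wrote", "Nietzsche", "Nietz|sche")
print(repr(result))
```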

30. Feature proposal: Inline semantic markup

punctum books proposes inline semantic markup in Wax.

Currently, there is an asymmetry between paragraph styles, which have semantic values (Title, etc.), and character styles, which have typographical values (italic). Especially with an eye on future export to InDesign formats, it’s important that italics for emphasis and italics for book title, for example, are separated.

At the same time, authors are usually not familiar with this type of markup and it might cause confusion. I’m not entirely sure how to solve this practical problem.

31. Feature Proposal: Language control

punctum proposes a language control feature.

I guess this is a big one, and not an enormous priority for us, but something that needs to exist in the future because we often work with multilingual books. There needs to be a way to signal that an entire chapter or part of a chapter is written in a certain language in order to properly control hyphenation (upon export) and interpunction, such as the type of quote marks (curly, fishhook), non-breaking spaces before colons/semicolons (in French etc.), writing direction (Arabic/Hebrew), etc.

32. Feature Proposal: Tag what you want in Wax

punctum books proposes a feature that allows you to “tag what you want” in Wax.

This proposal is in line with (and expansion of) our proposal on inline semantic markup. The current number of standard paragraph and character styles is limited, and should remain so. However, there are frequently books that require custom definitions.

I suggest that in Wax you can add a new paragraph or character style by means of a (+) button, prompting you to give the style a name (New Fancy Style), from which a slug is generated (new_fancy_style) that you can then define in CSS, and a simple display format in Wax (italic, bold, indent, etc.). The display format doesn’t even have to be matching or fancy, as long as you can then control the tagged words/paragraphs in CSS.
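
The slug step is easy to picture. Here is a minimal sketch that turns a style name into a slug of the form shown above; the function name `style_slug` and the exact normalisation rules (lowercase, runs of other characters collapsed to a single underscore) are assumptions for illustration.

```python
import re


def style_slug(style_name: str) -> str:
    """Turn a human-readable style name into a lowercase, underscore-separated slug."""
    # Collapse every run of characters other than a-z and 0-9 into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", style_name.lower())
    # Drop underscores left over at either end.
    return slug.strip("_")


print(style_slug("New Fancy Style"))
```

The resulting slug could then serve as the hook (for example a class name) that the custom CSS targets.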

33. Feature Proposal: Indexing tool

This is a proposal for an indexing tool, written from the perspective of an indexer, not a production editor. It would be great to have some feedback from the production side. I do not know what happens to an index after it is submitted to a press.

Ideally, Editoria should accommodate, or improve on, the two most common professional back of the book indexing methods:

  1. Writing and editing an index using the final page proofs. This method has multiple approaches, which include a) marking the page proofs, then entering the headings and page numbers into an editor, and b) displaying the page proofs on a monitor, and entering the headings and page numbers directly into the indexing software/editor, which simplifies many aspects of the editing process and automates all of the formatting specifications, which can vary by publisher and/or text.
  2. Embedded indexing, where an indexer can create an interactive/XML index by marking up/tagging the text during the production process for exporting with the final page proofs. While I’m not as familiar with embedded indexing, I am aware that there is a general sentiment among indexers that this method does not produce the quality, well-structured, analyzed indexes desired for scholarly monographs. However, publishers cite benefits to the workflow. For example, first page proofs will be of the entire book, including a fully set index in page form; overall schedule time should be shorter; they have a fully usable index in whatever form the content is published in, now and in the future. Few indexers are trained in this method. If Editoria could simplify (GUI interface required) the process of compiling and editing a quality index during the production process, rather than as the final task before the book goes to press, it would be adopted widely. I’ve discovered a new commercial effort, Index Manager, that claims to do this, but I haven’t used it or seen it demonstrated.

METHOD 1, TRADITIONAL INDEXES Most indexes are created by professional freelance indexers (using one of the three highly sophisticated indexing software programs, Sky, Cindex and Macrex), who are hired by the authors. Freelancers may submit the final index file to the author or the production editor. Indexes are generally submitted to the press/author as a double-spaced single column file, with the requisite detailed formatting requirements. I do not know what production editors do with the index once it is received from the indexer, or how that process could be simplified for them.

  • Editoria should allow for an indexer to write and edit an index within Editoria using the final page proofs AND allow an indexer, or team manager, to submit into Editoria an RTF file of an index generated from the final page proofs

METHOD 2, EMBEDDED/XML INDEXES Any automated XML indexing tool should be able to handle these very basic and common indexing practices:

  • To create multiple index styles automatically – indented and run-on (examples can be supplied if necessary).
  • To create main headings and subheadings, including words and phrases that do not appear in the text.
  • To alphabetize all headings and subheadings and choose between letter-by-letter and word-by-word alphabetizing order.
  • To designate stop/ignored words in alphabetizing.
  • To create and auto-format cross reference style and placement, See and See also.
  • To indicate how page ranges should be handled (e.g., simple, aggressive, aggressive 2, Chicago, Chicago 2; examples of effects can be supplied if necessary).
  • To indicate the character to use between page numbers, hyphen or en dash.
  • Punctuation placement (commas, semicolons, colons) should be automated.
  • To indicate how many columns the index will be.
  • To generate an introductory note to describe special features of the index.
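
To illustrate two of the items above, the difference between letter-by-letter and word-by-word alphabetizing and the handling of ignored words, here is a minimal sketch. The function names, the mode names and the rule of skipping only leading stop words are assumptions made for this example; professional indexing software handles many more cases.

```python
import re


def sort_key(heading: str, mode: str = "word", stop_words: frozenset = frozenset()) -> tuple:
    """Build a sort key for an index heading."""
    words = re.findall(r"[a-z0-9]+", heading.lower())
    # Skip leading stop words ("the", "of", ...) but never the whole heading.
    while len(words) > 1 and words[0] in stop_words:
        words = words[1:]
    if mode == "letter":
        # Letter-by-letter: spaces are ignored, so compare one run of letters.
        return ("".join(words),)
    if mode == "word":
        # Word-by-word: compare the first words, then the second, and so on.
        return tuple(words)
    raise ValueError(f"unknown alphabetizing mode: {mode!r}")


def alphabetize(headings, mode="word", stop_words=frozenset()):
    """Return the headings sorted under the chosen alphabetizing mode."""
    return sorted(headings, key=lambda heading: sort_key(heading, mode, stop_words))


headings = ["Newark", "New York", "news", "New Zealand"]
print(alphabetize(headings, mode="word"))
print(alphabetize(headings, mode="letter"))
```

Word-by-word keeps the “New …” entries together ahead of “Newark”, while letter-by-letter interleaves them, which is why an indexer needs to be able to choose.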

As mentioned above, these are basic practices. Fine-tuning some of the more complex aspects of constructing and editing an index would be an ongoing effort. They concern what occurs after completing the rough index, when an indexer edits it for structure, clarity and consistency, formats it to specifications, and proofreads it. Anything to simplify the XML-index editing and output process in Editoria would be an improvement over the complex and time-consuming options that indexers currently have for XML indexing, which are not compatible with their indexing software programs.

I’ve attached a Word file of the proposal and a sample index, submitted to Cambridge.

34. Feature Proposal: Alert team members when workflow status changes

Longleaf Services proposes a function that allows team members to be notified when the status of a project changes. For example: “File _____ is available for editing.”

I’d think an email alert would be best, as it doesn’t require the person to be logged into Editoria to see the notification: An author, for example, wouldn’t necessarily bother logging in unless they knew the file was ready for them.

If feasible, development-wise, it would be useful to let the Production Editor decide who (that is, which team member) should get the notifications for a project. There are instances in which one user (generally the PE) performs multiple functions, such as also being the copyeditor, and wouldn’t want/need the notification that a file is ready to edit.

35. Feature Proposal: Configurable archive options for completed / abandoned books

Attaching our sketch and some notes from the community meeting:

Completed / abandoned books need a long-term archiving solution, and users probably won’t want these titles cluttering up their Books Dashboard, so we propose a new ‘archive’ option on the Books Dashboard that would spawn a modal with options for where to send the archive and what to do with the dashboard entry. These options could be configured by an admin to connect to various preservation solutions (e.g. LOCKSS, Portico, FTP).

There was also some debate regarding whether this functionality made more sense at the Book Dashboard level, or at the final export step; though export is something that – at least while it’s functioning as a preview, which it is now – is used repeatedly, while archiving likely happens once.

… of course there might be cases where an archived book comes back to life, so there would need to be a way to enable that, and / or perhaps a clone / fork feature (e.g. for a future edition).

36. Feature Proposal: Add a “Print” option

Longleaf Services proposes that the option to print be available.

Our pilot project requires that a print version of the final file be available to participating authors/presses to satisfy tenure committees, reviewers, and awards committees that don’t yet accept digital versions. Would it be possible to provide the option to download a project’s pdf version, next to the option to download the epub?

Athens Road Map Meet

Had a great monthly roadmap meeting with the Coko Greek crew yesterday in Athens. We discussed and planned micropubs, Editoria, Wax 2, xpub-columbia/collabra, and chatted about Dan Morgan’s pizza theory for workflow, and stuff and things. Drank lots of coffee and this time had some yummy food for lunch. Always fun. Always productive. Fantastic bunch 🙂


Cultural Method

I’ve been pondering some stuff in preparation for a presentation at Open Source Lisbon this week. In essence, I’m trying to understand Open Source and how it works… not to say I don’t know how Open Source works, we do it well at Coko… I mean to zoom up a level and really understand the theory and not just the mechanics. It is one thing to facilitate a bunch of people to meld into a community, it is quite another to understand why that is important, and what the upsides and downsides are on a meta level. If you take the ideals out and look purely at the mechanics from a bird’s eye view, then what, ultimately, makes Open Source a better endeavor than proprietary software? What exactly is going on?

I have some clues… some threads… but while each thread makes sense when you consider it on its own, when you combine them all it doesn’t exactly make a nice neat little montage. Or if it does, I am currently not at the right zoom level to see it clearly; instead I see lots of different threads criss-crossing each other.

Ok… so enough rambling… what is it I’m trying to understand… well, I think when you embark on making software there is this meta category of methods known as the Systems Development Life Cycle (SDLC). It’s a broad grouping that describes the path from conception of the idea, through to design, build, implementation, maintenance and back again etc…

Under this broad umbrella are a whole lot of methods. Agile is one which you may have heard of. As is Lean. Then there are things like Joint Application Design (JAD), and Spiral, Extreme Programming, and a whole lot more. Each has its own philosophy and if you know them you can sort of see them like a bookshelf of offerings… you browse it and intentionally choose the one you want. Except these days people don’t really choose, they go with the fashion. Agile and Lean being the most fashionable right now.

The point is, these are explicit, well documented, methods. You can even get trained and certified in many of them.

But… Open Source doesn’t have that. There isn’t a bookshelf of open source software development methods. There are a few books, with a few clues, but these are largely written to explain the mechanics of things and they seldom acknowledge context. I say that because the books I have read like this make a whole lot of assumptions and those assumptions are largely based on the ‘first wave’ of Open Source – the story of the lone programmer starting off and writing some code then finding out it’s a good idea to then build community instead of purely code therefore magnifying the effect. A la Linus Torvalds.

But it’s very down-on-the-ground stuff. I’m thinking of Producing Open Source Software by Karl Fogel, and The Art of Community by Jono Bacon. Both are very well known texts and I have found both very useful in the past. But they don’t provide a framework for understanding open source. I’ve also read some research articles on the matter that weren’t very good. They also tend to regurgitate first generation myths as if open source is this magic thing, and they struggle to understand ‘the magic’. In other words, I miss a ‘unified theory’, a framework, for open source…

I think it is particularly important these days as we are beyond the first generation and yet our imaginations are lagging behind us. There are many more models of open source now than when Eric Raymond described a kind of cultural method which he referred to as ‘the bazaar’ in his The Cathedral and the Bazaar. We now have a multitude of ways to make open source and so the license no longer prescribes a first generational approach; producing open source is much richer than that these days.

As it happens, Raymond’s text does attempt to provide some kind of coherent theory about why things work although it often mixes ‘the mechanical’ (do this) with an attempt to explain why these processes work. It doesn’t do a bad job, there is some good stuff in there, but it varies in level of description and explanation in a way that is uneven and sometimes unsatisfying. Also, as per above, it only addresses the first generation ‘bazaar’ model. While this model is still common today in open source circles, it needs a more thorough examination and updating to include the last 15 years of other emergent models for open source. There are, for example (and to stretch the metaphors to breaking point), many cathedral models in open source these days that seem to work, and some that look rather like bazaar-cathedral hybrids.

Recently Mozilla attempted to make some sense of these ‘new’ (-ish) models with their paper on ‘archetypes’.

Here they kind of describe what reads as Systems Development Life Cycle methods… indeed they even refer to them as methods:

The report provides a set of open source project archetypes as a common starting point for discussions and decision-making: a conceptual framework that Mozillians can use to talk about the goals and methods appropriate for a given project.

They have even given them names such as ‘Trusted Vendor’ and ‘Bathwater’, and the descriptions of each of these ‘types’ of open source project sound to me like a first stab at a taxonomy of open source cultural practices – so you can choose one, just like a proprietary project would choose, or self identify as, Agile or Lean. In fact, the video on the blog promoting this study pretty much says as much. It’s Mozilla’s attempt at constructing a kind of SDLC based on project type (which is like choosing a ‘culture’ instead of a method).

However, it doesn’t quite work. The paper compacts a whole lot of stuff into several categories and it is so dense that, while it is obvious a lot of thought has gone into it, it is pretty hard to parse. I couldn’t extract much about what one model meant vs the other, or how I would identify if a project was one or the other. It was just too dense.

Mozilla has effectively written a text that describes a number of different types of bazaars, and also some cathedrals, without actually explaining why they work – except in a few pages that sort of off-handedly comment on some reasons why Open Source works. I’m referring to the section that provides some light assertions as to why Open Source is good to:

  • Improve product quality.
  • Amplify or expand developer base.
  • Increase the size or quality of your organization’s developer hiring pool.
  • Improve internal collaboration within one’s own organization.
  • ..etc…

But this is the important stuff… if these things, and the other items listed in that section, are true (I believe they are), then why are they true? Why do they work? Under what conditions do they work and when do they fail?

In other words, I think the Mozilla doc is interesting, but it is cross cutting at the wrong angle. I think a definition of archetypes is probably going to yield as many archetypes as there are open source projects – so choosing one archetype is a hopeful thought. Also the boundaries seem a little arbitrary. I think it is the characteristics listed in the ‘Benefits of Open Source’ section of the Moz doc that are the important things to understand – this is where a framework could be built that would describe the elements that make open source work… allowing us to understand, in our own contexts, what things we may be doing well at, what we could improve, what we should avoid, which tools are useful, and so on.

The sort of thing I’m asking for is a structured piece of knowledge that can take each of the pieces of the puzzle and put them together with an explanation of why they work…not just that they exist and, at times, do work, or are sometimes/often grouped together in certain ways. An explanation of why things work would provide a useful framework for understanding what we are doing so we can improvise, improve our game, and avoid repeating errors that many have made before us.

With this a project could understand why open source works, and then drill down to design the operational mechanics for their context. They could design / choose how to implement an open source framework to meet their needs.

Such texts do exist in other sectors. Some of these actually could contribute to such a model. I think, for example, The Diffusion of Innovations by Everett Rogers is such a text, as is Open Innovation by Henry Chesbrough. These texts, while focused on other sectors, do explain some crucial reasons why open source works. Rogers explains why ‘open source can spread so quickly’ (as referenced in one line in the Moz doc), and Chesbrough provides substantial insights into why innovation can flourish in a healthy open source culture, and how system architecture might play a role in that.

Also the work of John Abele is important to look at and his ideas of collaborative leadership. As well as Eric Raymond’s text…but it all needs to be tied together in a cohesive framework…

This post isn’t meant to be a review of the Moz article. It reflects the enjoyment I have gained from understanding elements of open source by reading comprehensive analyses and explanations of phenomena like diffusion and open innovation. These texts are compelling and I have learned a lot from them, which has helped when developing the model for Coko because, at the end of the day, there is no archetype that exactly fits – it is better to construct your own framework, your own theory of open source, to guide how you put things together, than to try and second guess and copy another project from a distance. It’s for this reason that I would love to have a unified framework for open source that takes a stab at explaining why all these benefits of open source work, so I can decide for myself which ones fit, or how they fit, with the projects I am involved with.

Open Innovation

Reading Open Innovation – a thesis evolved by Henry Chesbrough in 2003. I have also the follow-up book published in 2006 which is a collaboration with other researchers going through his earlier thesis.

I’m researching this as I’m interested in what current literature exists that explains Open Source and why / how it works which is not from the Open Source domain. Books that emanate from the Open Source domain tend to be religious in nature and it is also true that most attacks against Open Source take it from the religious angle… so having literature that endorses the model which is not open source evangelicalism is very useful.

Previous to this I found a lot of value in The Diffusion of Innovations (originally published in 1962) by Everett Rogers.

Open Innovation and the Diffusion of Innovations separately explain quite a bit about why Open Source works, and I think I’ll post more about this as it becomes clearer in my head.

Chesbrough’s thesis can be summed up in one quote:

The Open Innovation paradigm treats R&D as an open system. Open Innovation suggests that valuable ideas can come from inside or outside the company and can go to market from inside or outside the company

Essentially it is the admission that any one company doesn’t have all the smart teams/people/ideas. So how about re-imagining innovation and releasing it from a so-called ‘vertical innovation’ model, where all the R&D is done in-house and where IP (Intellectual Property) is jealously guarded, to an open model where innovation essentially comes through collaboration with orgs and individuals outside the company.

From an Open Source point of view this is a ‘duh’ moment… Open Source has long expounded this approach. But…I have never found it well explained…

So it is good to find this argument made elsewhere and in clearer terms… but unfortunately the Chesbrough thesis was published in 2003 when Open Source was still very young. Consequently Chesbrough reads Open Source as an idealistic and altruistic movement… he doesn’t really consider open source projects to have a business model, and a business model is central to his thesis. It’s a pity, as Open Source has moved on since then and there are a lot of very successful and interesting examples of Open Source business models. But if you sorta squint while you are reading, and blur out the dated-ness, then there is a lot of stuff that could just be quoted verbatim that makes a strong argument for Open Source as seen through the lens of the Open Innovation thesis.

That’s pretty interesting as, combined with the Diffusion of Innovations, these two bodies of work explain the value of open source (and consequently provide a rationale which does not come from the open source sector directly). Open Innovation explains why open source is a good idea if you are a company whose core offerings require software to function, and the Diffusion of Innovations theory helps us understand why open source can beat closed source software in the arena of adoption.

The point is, if you can combine the two you have a winner – a model that enables rapid adoption and innovates faster than closed alternatives/competitors. If you can marry successful commercial activity to this you have something very powerful that can potentially wipe out the existing proprietary offerings – which is what we need in the publishing sector. The aim of what we are now doing in Coko, in this post-foundational stage, is to seed the commercial activity around the very healthy core of community technologies we have built.

Anyways… here are some quotes I liked from some of the chapters….Some of the quotes come from this chapter by Joel West and Scott Gallagher

Open Innovation is the use of purposive inflows and outflows of knowledge to accelerate internal innovation, and expand the markets for external use of innovation, respectively. Open Innovation is a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as they look to advance their technology. Open Innovation processes combine internal and external ideas into architectures and systems. They utilize business models to define the requirements for these architectures and systems. The business model utilizes both external and internal ideas to create value, while defining internal mechanisms to claim some portion of that value. Open Innovation assumes that internal ideas can also be taken to market through external channels, outside the current businesses of the firm, to generate additional value

…useful knowledge is scarce, hard to find, and hazardous to rely upon (a root cause of the NIH syndrome). In Open Innovation, useful knowledge is generally believed to be widely distributed, and of generally high quality

IP becomes a critical element of innovation, since IP flows in and out of the firm on a regular basis, and can facilitate the use of markets to exchange valuable knowledge. IP can sometimes even be given away through publication, or donation.

Recently, open source software has emerged as an important phenomenon that utilizes external knowledge in a network structure (Lerner and Tirole 2002; O’Mahoney 2003; Dedrick and West 2004; von Hippel 2005)

 Most software users would face significant switching costs in using some other software package, due to some combination of retraining user skills and converting data stored in proprietary file formats. As Arthur (1996) observes, software thus has tremendous positive returns to scale, generally allowing only one (or a small number) of winners to emerge.

These winners are tempted to extract rents from their customers by increasing prices and creating additional switching costs to protect those rents (Shapiro and Varian 1999). From these production economics, commercial software firms seek to build complete systems to meet a broad range of needs, in hopes of forestalling potential competitors and protecting high gross profit margins

In other cases, a system architecture will consist of various components. Some mature (or highly competitive) components may be highly commoditized, while other pieces are more rapidly changing or otherwise difficult to imitate and thus offer opportunities for capturing economic value. Two open source examples are the IBM’s WebSphere and Apple’s Safari browser…

…Customers access the WebSphere e-commerce software using standard web browsers, so IBM originally developed a proprietary httpd (web page) server. IBM later abandoned its server for the Apache httpd server, recognizing that it would be wasting resources trying to catch up to the better quality and larger market share enjoyed by Apache (West 2003). Today, IBM engineers are involved in the ongoing Apache innovation, both for the httpd server and also related projects hosted by the Apache Software Foundation ( website)

What is xPub?

So, there are a lot of product names at Coko – PubSweet, Editoria, XSweet, INK, xPub etc etc etc… so it is becoming tricky to keep track, but it seems there are quite a few people interested in xPub right now.

xPub is not really a product as such, it’s more a group of products – each of which relates to Journal workflows. The names for each product indicate those relationships: xpub-collabra, xpub-elife, xpub-faraday (Hindawi) and xpub-epmc.

Each of these is a platform, so yip, you read it right – there are actually no less than 4 journal platforms being developed in the Coko community, not including the Micropublications platforms (of which there are two – one developed with Wormbase, and the other with the Organisation for Human Brain Mapping).

So, right from the get go our ‘offer’ is not standard. We don’t offer one platform to rule them all – there are many journal platforms in production. All of these are built on top of PubSweet… PubSweet is a kind of ‘headless’ publishing platform… it’s more or less the backend brains, in that it is a kind of framework that ‘thinks like a publishing system’ but has not determined a workflow for you. So, each of the xpub-* platforms is actually a publishing platform built to meet a specific workflow and built on top of PubSweet…

Here is a crappy diagram (drawn in haste) to get the basic idea across:


In the above, you see a PubSweet platform (eg. xpub-elife). PubSweet itself is the whole app – it sort of ‘encapsulates’ everything. Really though, you get the PubSweet core and then extend it with front-end components to meet your workflow needs. A typical front-end component might be a login screen, or a dash, a submission info page, a reviewer form etc.

You can also extend PubSweet on the back end (eg to enable integration with external services etc). The following is a more accurate but slightly techie architectural overview (you can skip this image and the following paragraph if you want to avoid the slightly techie part of this article).


In case you are wondering – everything is written in Javascript. Why did we choose JS? It was a deliberate choice to go with a language that was prolific. Almost every dev around these days needs to know some Javascript (it’s the most popular language by far on GitHub), which makes finding a developer to work on your project as easy as we could possibly make it. JS is also a phenomenal language these days. Fast, sophisticated and more than capable of supporting large-scale publishing requirements. I mean, if it’s good enough for PayPal, Netflix, LinkedIn, Uber and eBay, then it is good enough for you.

So each PubSweet platform has its own collection of front and back end components to meet the workflow of someone’s dreams… the idea is that to achieve the platform of your dreams you can reuse what others have already built and then just build what isn’t already available. In a sense, you can ‘assemble’ your platform from existing parts.
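To make the ‘assemble from existing parts’ idea a little more concrete, here is a rough sketch in plain Javascript. To be clear, this is not the PubSweet API: the registry, the component names, the platform name and the functions assemblePlatform and shareComponent are all hypothetical, made up purely to illustrate the reuse-first way of thinking (check what already exists, build only what is missing, then share it back).

```javascript
// Hypothetical sketch only: none of these names come from PubSweet itself.

// Components already shared by the community (illustrative names).
const sharedComponents = new Map([
  ['login', { name: 'login', side: 'front' }],
  ['dashboard', { name: 'dashboard', side: 'front' }],
  ['submission-info', { name: 'submission-info', side: 'front' }],
]);

// Given the components a workflow needs, split them into those that can be
// reused from the shared registry and those that still have to be built.
function assemblePlatform(platformName, neededComponents, registry) {
  const reused = [];
  const toBuild = [];
  for (const name of neededComponents) {
    if (registry.has(name)) {
      reused.push(registry.get(name));
    } else {
      toBuild.push(name);
    }
  }
  return { platformName, reused, toBuild };
}

// Sharing a newly built component makes it reusable by the next team.
function shareComponent(registry, component) {
  registry.set(component.name, component);
}

// A made-up platform that needs three components, one of which is new.
const plan = assemblePlatform(
  'xpub-example',
  ['login', 'dashboard', 'reviewer-form'],
  sharedComponents
);

console.log(plan.reused.map((c) => c.name).join(', ')); // login, dashboard
console.log(plan.toBuild.join(', ')); // reviewer-form
```

In this toy version, once the hypothetical ‘reviewer-form’ has been built and passed to shareComponent, the next platform that needs it would find it in the registry and have nothing left to build for that part.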


The nice thing about this is that the organisations building Journal platforms are all sharing the following:

  1. the same back end/framework (PubSweet)
  2. various front (and back) end components
  3. lessons learned…

Each organisation has a vision of their ideal Journal workflow. They then design and build this on top of PubSweet, but as they do, they build various components (either page-based components such as a dashboard, or smaller UI components we call atoms and molecules) and they share these components with everyone. Hence, you should check the list of components before you start building in case the component you need already exists.

We have various agreed-upon ways to build and share components (see this as an example). These best practices are continuously evolving but you can read some of the latest discussions about this approach here –

Of course, all code is reusable in the Coko community because it’s all open source. The best practices are there to make the code easy to reuse.

All agreements as per above are made by consensus by the community. It is actually a pretty snappy process – don’t believe every crappy thing you read about how open source is built. Open source community processes can be elegant and fast, and the resulting code can be beautiful. Coko is a good example of this – a community of professionals collaborating to make fantastic open source software.


So… back to the xPub story. To make these decisions on how to share components etc, we have regular meetings with various workgroups (we keep the numbers in each group small so we can move fast), and we also have quite a few in-person meetings. Not only do we have PubSweet community meetings where all of these organisations meet, but we have various get-togethers on various topics if required. For example, we met in Cambridge recently to discuss Libero (the eLife delivery product), before that there was an EBI onboarding session in Athens, next month we have a special designers workgroup in-person meet, and so on. All this helps us keep in contact with each other, which helps build trust, but also turns out to lift energy levels and boost production. These meetings are fun.


Also at these meetings, we learn from each other. One of the big problems in this sector, and one of the reasons people ask ‘why are publishing platforms so hard to build?’ (after a number of high profile failures), is that there has not been a focused effort to share experiences on how to build publishing platforms. So, that is what we are doing – at each meeting we talk about what we have learned, how we are thinking about things, and show each other what we have done. At the last meeting, for example, each xPub platform team gave a deep dive to everyone in the group in a process known as speed geeking (it wasn’t so speedy as each table had 25-30 mins to go through their platform).



So, xPub isn’t a single platform, it’s more the community that is building journal platforms on top of PubSweet and sharing learnings and components. It is a very cool thing.

As a community, we also produced a book entitled ‘PubSweet – how to build a publishing platform’. You can get this for free here –

I can also send you a print copy (they look great!) if you send me a postal address.


The book covers many things – a whole lot of technical documentation for how to build and share components etc, as well as some information about how to think about workflow and how to map this into a PubSweet system. Personally, I don’t think the technology is very hard when it comes to building platforms – we knew what we wanted from the beginning with PubSweet and went about and built it (not to downplay the extraordinary job Jure Triglav has done in leading this effort). But the real hard stuff is actually thinking about workflow – not many publishing orgs have had the opportunity to think about designing their workflow to meet exactly what they want (rather than shoe-horning it into an existing system), and so we have had to beat this track for others to follow. It has been quite a journey, but the book pretty effectively outlines how to think about workflow (which of course can be accelerated using Workflow Sprints, a process I designed to help a publisher design their own workflow and PubSweet platform in one day).


In that book you can see some high level ‘architectures’ of each of the xPub platforms such as the following (xpub-collabra):


Or this (xpub-epmc):


So, you might be asking… at what stage of development is each of these platforms? … I’ll leave it to each of the teams to say exactly, but it has been shocking to see how fast things have come along. I mean, the xPub community only really got together for the first time mid-last year, and building started for most late last year and early this year. It looks like we will have a lot of platforms landing by the end of this year. In this sector that is lightning fast. EBI are particularly impressive – they started two months ago and are almost ready to go – if it weren’t for the fact we are all replacing the under-the-hood data model with the one we designed together at the last PubSweet meet, they would be good to go already. They can achieve such speed because they are reusing a lot of the components that the other teams built before them (and because EBI are just very good 🙂 ).

I can speak a little more about xpub-collabra since that is the platform Coko is building for and with the Collabra Psychology Journal. It’s looking pretty nice. We have some work tidying up the UI, replacing the data model and a few other things, but it’s looking rather good. We are also putting some time into making it a generic platform since the Collabra workflow follows a fairly ‘plain vanilla’ model. So we are building in some management interfaces and various bits and pieces to make it more widely useful – for example, Giannis just built in a Submission Info builder – enabling a Journal admin to build their own submission forms. It requires some hand-holding to use right now, but we’ll shape it to be very usable by your general journal adminy-type. We also have to integrate ORCID and DOIs, extend the range of submission file types etc… but it’s pretty close.

Below is a video showing xpub-collabra in action. It is a version from some months ago, but you can see the workflow pretty well in this demo.

I think most of the xpub community is going to offer their new platforms to the market in various ways. This will be interesting as there are very different approaches at play. Hindawi, for example, is looking to make their platform a multi-tenant platform, eLife will put JATS at the centre of the workflow etc… So look for more news on that also. For our part, Coko will offer xPub through a partnership with a hosting provider – probably with the same organisations we will work with for Editoria hosting services. Since everything we do is open source, we will also be supplying all the Docker, Helm and Kubernetes scripts so that you can set up your own commercial hosting service if that is your cup of tea (or you can extend your offering should you already be a hosting provider). Coko is pretty close to getting our first hosting partnership set up, so look for news of that soon!

One last thing – because of the modular nature of any platform built on top of PubSweet, it is possible to take any of the xPub platforms and customise it to meet your needs. No need to start building from zero. Additionally, the modularity means you can extend the systems with your own interesting new innovation – finally a place for innovation to call home in the publishing platform world….

So… there is a lot going on in the xPub world….we look quite different to the rest of the market because we are not building one platform – we have instead focused on building a community to support the development of multiple platform solutions for journal workflows. You can pick and choose which one you want, or build something else, reusing as much of the other systems as you can to reduce your development costs (we also spend a lot of time onboarding new folks and supporting them as much as we can).

But it’s not just about improving the game today – supporting the optimisation of workflows is one part of what we are trying to do, the other is to support future innovations.

If you want to know more, feel free to jump into the Coko community channel and chat –

Come and join the party. We are happy to support you and happy to learn from you…. not-for-profit or commercial we don’t care, build a better journal platform on PubSweet than the rest of us and we’d be happy!  We are in it for the mission – come to talk to us! no preciousness here 🙂

The Awesome Paged.js

So, I’ve been pursuing this dream for many years… ever since I started rendering books in the browser using an ad-hoc collection of tools around 10 years ago… then I instigated the book.js project (which unfortunately died due to lack of browser support for CSS Regions), and now… paged.js…

Built by the talented trio of Julie Blanc, Fred Chasen and Julien Taquet – it’s all open source and modular… there is a lot to this story, but we’ll get to that. Full release in a few weeks, this is a sneak peek:

Paged.js – sneak peeks

This project is entirely funded by the awesome Shuttleworth Foundation.