Building Book Production Platforms p2.

Amongst the core requirements for a book production platform are the source file format and the editor, and of course, these are intimately linked. The development team is usually faced with choosing the format first, then the editor.

Choosing a format

The choice is pretty much HTML? or not HTML?

Currently, HTML is the ruling choice of format for a web-based book production platform. HTML is native to the browser and has associated standards-compliant support, such as CSS and javascript. Inversely, not choosing HTML puts you in a bit of a hole and can create a lot of overhead.

It might be interesting to look back a little and learn from some others since there have already been projects in this space that started down non-HTML roads and then gave it up for HTML. Kathi Fletcher, originally the project manager and technical director for Connexions (now OpenStax) which built a custom XML editing environment for academic materials, later researched in-browser XML vs HTML editing environments for her Shuttleworth Foundation-funded OERPUB project. Kathi became convinced HTML was the way to go and did some great work on HTML editor usability with the Aloha HTML editor.

We have chosen to use HTML5 as the canonical format for open textbooks, because developers and tools are more plentiful for web technologies than XML technologies.

http://www.w3.org/2012/12/global-publisher/statements-of-interest/29-oerpub.html

The (closed source) O’Reilly Atlas platform also started with the complex AsciiDoc format (a form of markdown) and eventually awoke to the power of HTML in 2012.

HTML5-based authoring offers a streamlined production workflow for producing both print and digital outputs, facilitates “digital first” content development, and is a perfect fit for creating a WYSIWYG, web-based writing experience.

http://radar.oreilly.com/2013/09/html5-is-the-future-of-book-authorship.html

They then got an extra dose of religion and started a project called HTML Book which is a suggested ‘spec’ for a subset of HTML elements to be used in books.

So far I have not seen a book production platform travel the reverse direction, from HTML to something else. Instead, we are seeing more and more platforms start with, or change to, HTML as a source file format.

Markdown

Markdown is sometimes put forward as the way to go but I’m not going to go into that in too much detail here. I have talked about this elsewhere. The only additional thing I will say is that markdown causes even more issues for book production platforms than those included in that article. Namely, in an in-browser markdown environment, the markdown will most likely be displayed as rendered HTML next to the authoring pane. That is a huge amount of lost screen space and extra UI junk for no apparent gain. Think of the UX cost. If you don’t have that rendered display then you will most likely only see pure markdown in a text field with no rendered display. The user won’t really know if their document looks right until it is rendered somewhere down the line, which is also a tremendous cost to the user for no apparent gain. Markdown: all pain, no gain.

NB: There is a possible good use case for markdown as a helpful add-on for HTML WYSI editors but I will cover that later.

LaTeX

There is a more valid use case for LaTeX in the browser since some scientists and academics will never use anything else, and you’ll never convince them to adopt HTML regardless of the benefits. You are up against the great Church of Knuth and I don’t fancy your chances. If your audience is comprised of LaTeX addicts, then I think you have no choice other than to support that.

Many times I have talked about remedies for unstructured MS Word documents (for scientific manuscripts) only to have someone earnestly comment that if everyone just learned LaTeX we would be in a much better position… They might be right, but I’m pretty sure it’s never going to happen.

The preference for LaTeX is a legacy issue, and problematic, but needs to be dealt with. (Unfortunately, today’s Markdown heroes are growing legacy issues like this with each passing day, and that is going to cost us down the road).

Recently there has been some interesting work on in-browser LaTeX editing including the (closed source) Authorea platform and, most notably the (open source) ShareLatex platform. ShareLatex round trips the LaTeX syntax displayed and edited in a text area (in the browser), renders that to a bitmap on the server, and returns it to the browser for a side-by-side ‘WYSIWYG’. The effect is that you can see a just-in-time rendered view of the LaTeX as you type. It’s a neat trick and effective if you insist on LaTeX in a web-based platform. Then you just have to live with the UI costs. However ,you only need this approach if you wish to support the full LaTeX syntax. If you wish to just support LaTeX equations, you can use an HTML editor with a LaTeX plugin based on MathJax or the Khan Academies KaTeX(and there are some other solutions such as Mathoid).

Incidentally, if you need to support full LaTeX I highly recommend checking out ShareLaTeX over WriteLaTeX. They both have the same approach but WriteLaTeX is proprietary whereas you can pick up the ShareLaTeX code and integrate it straight away. You could even build your own ShareLaTeX-like interface, it’s not too tricky – together with a colleague – Rizwan Reza – and I (Riz did all the hard work) we managed to develop a workable prototype in about 2 days, but there are many gotchas setting up the LaTeX compiler correctly.

Not many book projects need LaTeX, so I will leave this as an interesting edge case. There are solutions if you need it, but not many people need it.

XML

I think I will just leave it to the words of the brilliant Dave Cramer (Hachette Book Group):

So we’ve chosen to describe our content with HTML, and build our production system around HTML.

When I tell people that, they smile condescendingly, and chuckle a bit. “That’s cute. Why don’t you use real XML?”

I then ask them what you can do in Docbook (or TEI, or NLM) that you can’t do in XHTML? I haven’t heard a good answer to that question yet. XHTML is XML, by definition. Calling something “para” rather than “p” doesn’t get you anything, except carpal tunnel syndrome and invoices from consultants

The problem with non-HTML XML is that it is essentially just XML the browser can’t use. Hence you lose all that other good stuff like WYSI editors, CSS design tools, cool tricks with JavaScript, and all the cool tools that are being developed for HTML. XML just can’t compete, plus you are going to need to convert the XML into HTML anyway. So don’t make life more complicated than it already is – continue your love affair with XML as long as it’s XHTML!

HTML

HTML is king in the browser and it gives you all you need to make books. I don’t want to spend a lot of time arguing the merits of HTML in this post as there is a lot to say and I want to bring that in at other points of the conversation. But in brief:

  • HTML is supported by JS and CSS.
  • The DOM is known natively by the browser.
  • HTML is standards-based.
  • It is straightforward.
  • HTML is easy to read and easy to clean.
  • HTML is the most popular file format on the planet.
  • You can use HTML to build structure in documents with assigned class and id values, or microdata formats.
  • HTML is the native file format for EPUB.
  • PDF can be rendered directly from HTML in the browser (more on this later).
  • HTML can be paginated in the browser.
  • CSS is moving towards supporting more and more page based elements.
  • The browser can act as a design environment.
  • You can create real what-you-see-is (WYSI) production environments.
  • Basic editing is built into the format itself.
  • HTML is supported by an enormous number of tools for conversion (in and out).
  • HTML is supported by an enormous repository of examples (the web).
  • HTML is cheap to develop with.
  • Even book designers are getting used to it.
  • Some schools teach it.
  • It has a million free tutorials online to help you use it.
  • A lot of people know HTML.
  • HTML is supported by a rapidly proliferating body of JavaScripts for typography, graph production, animation, interactions, dynamic rendering etc etc etc etc

The basic idea really comes down to this.

  • HTML is the cheapest format of our time.
  • HTML is the most popular format of our time.
  • HTML is the networked document format of our time.

Increasingly HTML is the way stories are told, whether that is in books or on the web. It’s a trite analogy perhaps, but HTML is the paper of our time. As Dave Cramer says:

why start with something other than HTML, when you have to turn it into HTML anyway?

It should be noted that Cramer also turns HTML into paper, and the Hachette Book Group have produced many beautiful paper books using HTML as the source format. Many of these books you will now find in the best-selling sections of your local brick and mortar bookstore.

Other print producers are also using HTML as the source. Print-on-demand services, used to producing very ugly books by ingesting MS Word and dealing with all that ugly conversion, are also adopting HTML production environments. Books on Demand, Germany’s largest Print on Demand service, adopted Booktype so their customers could have an easy in-browser book production environment. The source format is HTML but the users don’t know that, and the books look better. That’s the beauty of HTML.

Finally, helped a lot by the efforts of Dave Cramer and the Hachette Book Group, Sourcefabric, the people at O’Reilly, and others adopting HTML, we might be starting to see the very beginning of the changing of the guard.

HTML is the way to go for Book Production Platforms. If you choose another format you will find you inherit a lot of costs and additional overheads and, sadly, you will soon be left behind. There is just no format going forward at the same speed as HTML. Not even close. So, my advice is to first ask the question – can HTML do what you need? Push your team to answer that question. Will format X give you anything HTML can’t? As an exercise ask your team to prove HTML is a bad choice, and if the answer is not-HTML, then contact me and let me try and talk you into it!

Building Book Production Platforms p1.

There are a number of web-based book production platforms out there at the moment. Booktype was the first. Its birth was around 2006-2007 and it started as a series of extensions (written in Perl) to the TWiki platform. TWiki was, and is, an old school wiki with a very pluggable architecture.

Booktype is now a stand-alone book production platform built with Python (Django). I founded the project and kept it going for many years, and I’m very proud of it. It is free software and now has a permanent new home at the Open Source, Berlin-based, development house Sourcefabric. I’ve since moved on to think about similar issues for the Public Library of Science.

Since Booktype, many platforms have come out of the woodwork including Atlas (closed source), PressBooks (semi open), gitbook, Inkling, PubSweet, Lexicon, Penflip (closed), Gitbook.io, FastPencil (closed), and many many others.

Seeing these evolve has been interesting. Most interesting is that I see a lot of mistakes still being made which we made in the early days of Booktype. I’m not saying Booktype is perfect, it is far from it, but I wanted to document what I consider to be a few high-level gotchas for those wanting to build this kind of platform. We need more book production platforms that improve the game. We need to avoid making the same mistakes over and over, so I hope the following will give those building tools from scratch at least something to think about.

This article is first in a series. It should be noted that I am targeting my comments at those that wish to build Open Source book production platforms. I’m not terribly interested (actively disinterested), in closed source platforms. Hence the narrative encourages you, the Open Source development team, to consider book production platforms as a distinct category of software – don’t use existing software such as MediaWiki, or WordPress etc to do the job.

Stand-alone platforms

I firmly believe that book production platforms should be built as stand-alone applications dedicated to the work at hand – building books. Book production platforms should be specialised software that ‘thinks like a book’. Your users want to make books, not blog posts or wiki articles, and they deserve a platform that is built with their goal as the central premise.

Don’t learn this the hard way as I did. Booktype started with a wiki and it did a pretty good job. At the time TWiki could be considered as a kind of rapid prototyping framework for Perl, much like Django or Rails is for Python and Ruby respectively. It provided essential login features and user management, plus a host of plugins to extend functionality with diffs, reports, permission management etc. It did pretty well for a while. However, the wiki-like paradigm soon started to grow old. Initially, I had pulled the guts out of the wiki and replaced wiki markup editors with HTML editors (more on that later) and did a whole lot of crazy URL redirects using .htaccess to get nice book-like book/chapter URLs and attending structure. That was OK, but when it came to other functionality that books needed, then the wiki paradigm was sagging pretty noticeably. It also took increasing amounts of time and effort to build an interface that was clearly intended for making books instead of wiki pages, not to mention the issues with updating a core application framework for security or performance improvements when the internals had been hacked so extensively it no longer really worked the same way as the original application.

A bit of template paint helped with some of these issues, and I commissioned some programmers to extend TWiki to fill in these book-wiki gaps, but, after a while, it became obvious it would be better to build a dedicated standalone book production platform from scratch. Given all my experience since then, I am convinced that if you want to make books you should have a software built specifically to do that.

Paradigm differences

Book production platforms are a specific category of software as the functional needs are quite different from those for producing wikis, blogs, or other types of CMS. When building book production software there are essential paradigmatic differences that just force you out of the existing categories and into the new…

The following are some examples, and I will document one or more at a time as this series continues. First of all, the Table of Contents.

Table Of Contents

Any book production software needs a way to create and manage a Table of Contents (ToC). Neither wikis, CMS, nor blogs have the need to structure content in this way: they are designed for non-linear, contextual navigation (you click on a link from within a paragraph to be taken to somewhere else). The closest these other software paradigms get to a Table of Contents is through navigation bars and breadcrumbs and hacking together ‘lists’ of associated content.

However, books have a very linear structure so one thing follows another in a vertical narrative. That’s not to say you can’t break the structure. However, if you intend to create paper books or electronic books, then it is very difficult to escape this very strict linearity. An easy and fluid way to manage this vertical linearity is a must for any book production platform.

There have been attempts at making this work with other types of software. Mediawiki makes an attempt to do this with the concept of ‘collections’. A user can save references to articles in a list then jockey them into order. Other software, such as PressBooks, has attempted the same with blogs – organising blog posts into a list. But the solutions are never satisfactory. It always feels like a compromise – the poor user is asked to work with something that feels like it’s actually built for another job, and it is.

A Table of Contents interface should be the heart of a book production platform, and the platform should own that space. There needs to be a fluid and fast interface for creating new chapters, moving items around, quickly seeing the structure of the book, and if you like the idea of concurrent production, then it needs also to be able to show at a glance who is working on what. The user needs to be able to add and remove chapters at a click, move chapters outside of the ToC while keeping them available for possible later use, or rename them – right from the ToC without the need to dig down into individual chapter settings. Additionally, the changes in structure need to be dynamic: updating immediately for every user without the need to refresh. The status of individual chapters needs to be ascertained from a single glance at the ToC.  The interface for book production should be fast, clean (separate from additional blog or wiki cruft), and feel like it is an interface built specifically for the job it is attending to.

The ToC is how people get an overview of a book, and book production software necessarily gives  them the tools to manage that overview. So start with the Table of Contents as a central UI element, and let the user go from there. The user will then feel that the platform is all about managing their book. The UI will communicate ‘book! book! book!’…not ‘wiki!…err…book!…errr…wiki!’. A ToC as a central motif communicates the purpose of the platform and gives the user a tool that gives an immediate overview of the work at hand, and the output feels like a book from the very beginning.

Book production platforms need to have a Table of Contents interface as a central part of the paradigm, and this should be reflected in the UI – it should not just be something that has been tacked on as a ‘workable solution’. Start building something to manage a ToC beautifully and work out from there.

Typography JavaScripts

A list of Javascripts for typography.

Hyphenation

Sweet Justice
License: BSD
Code: https://github.com/aristus/sweet-justice
WWW: http://carlos.bueno.org/2010/04/sweet-justice.html

Hyphenator
License: LGPL
Code: https://code.google.com/p/hyphenator/downloads/list
WWW: https://code.google.com/p/hyphenator/

Hypher
License: BSD
Code: https://github.com/bramstein/hypher
WWW: http://www.bramstein.com/projects/hypher/

Font Resizing

FlowType.js
License: MIT
Code: http://github.com/simplefocus/FlowType.JS
WWW: http://simplefocus.com/flowtype/

Squishy
License: unspecified (argh! please include a license file!)
Code: https://github.com/lemonmade/squishy
WWW: http://cmsauve.com/projects/squishy/

Fittext
License: WTFPL
Code: https://github.com/davatron5000/FitText.js
WWW: http://fittextjs.com/

SlabText
License: MIT
Code: https://github.com/freqDec/slabText/
WWW: http://freqdec.github.io/slabText/

Responsive Text
License: MIT
Code: https://github.com/ghepting/jquery-responsive-text
WWW: n/a

Line Spacing

Typeset
License: unspecified (argh… )
Code: https://github.com/bramstein/typeset
WWW: http://www.bramstein.com/projects/typeset/

Kerning

Kern.js
License: WTFPL
Code: https://github.com/bstro/kern.js
WWW: http://www.kernjs.com/

Kerning.js
License: MIT (license file is in the wrong place – check the README)
Code: https://github.com/endtwist/kerning.js
WWW: http://kerningjs.com/

TypeButter
License: CC-BY-SA 3.0
Code: https://github.com/hudsonfoo/typebutter
WWW: http://typebutter.com/

Drop Caps

DropCap.js
License: confused (All rights reserved and apache??)
Code: https://github.com/adobe-webplatform/dropcap.js
WWW: http://blogs.adobe.com/webplatform/2014/10/02/drop-caps-are-beautiful/

Color

Color Font
Licence: WTFPL
Code: http://manufacturaindependente.com/colorfont/media/js/colorfont.js
WWW: http://manufacturaindependente.com/colorfont/

Font Tricks

Lettering JS
License: WTFPL
Code: https://github.com/davatron5000/Lettering.js
WWW: http://letteringjs.com/

jqisotext
License: MIT & GPL
Code: https://code.google.com/p/jqisotext/downloads/list
WWW: http://workshop.rs/2010/01/jqisotext-jquery-text-effect-plugin/

Arctext.js
License: MIT
Code: http://tympanus.net/Development/Arctext/Arctext.zip
WWW: http://tympanus.net/codrops/2012/01/24/arctext-js-curving-text-with-css3-and-jquery/

Base Lines

Baseline.js
License: WTFPL
Code: https://github.com/daneden/Baseline.js
WWW: n/a

Baseline CSS
License: CC-BY-SA 3.0
Code: http://baselinecss.com/download/baseline.zip
WWW: http://baselinecss.com/

HUGrid
License: GPL
Code: http://bohemianalps.com/tools/grid/HeadsUpGrid_download.zip
WWW: http://bohemianalps.com/tools/grid/

More coming…

The Case for HTML Word Processors

If you like my thoughts on formats, publishing systems, development methods, and open source subscribe to my newsletter.

Making a case for HTML editors as stealth desktop word processors… the strategy has been so stealthy that not even the developers realised what they were building.

We use all over-complicated software to create desktop documents. Microsoft Word, LibreOffice, whatever you like – we know them. They are one of the core apps in any user’s operating system. We also know that they are slow, unwieldy and have lots of quirky ways of doing things. However, most of us just accept that this is the way it is and we try not to bother ourselves by noticing just how awful this software actually is.

So, I think it might be interesting to ask just this simple question – what if we used desktop HTML editors instead of word processors to do word processing? It might sound like an irrational proposition… word processors are, after all, created for word processing. HTML editors are for creating…well, …HTML. But let’s just forget that. What if we could allow ourselves to imagine we used an HTML editor for all our word processing needs and HTML replaced .docx and .odt and all those other over-burdened word processing formats. What would we win and what would we lose?

The first thing to recognise is that word processors and HTML editors actually look and work in kinda the same way. They have a big blank page to start with – the empty text canvas. They have similar toolbars with similar tools. They both essentially just allow you to write words on a page and place other stuff on it. You can also change font sizes, styles, colours, backgrounds etc and add images, tables, whatever you like.

There are two big differences that are apparent in their interfaces, however:

  • in a Word Processor there are nice margins to write within
  • in an HTML editor there is a ‘view source’ allowing you to see the markup behind the display version

Seeing the source is quite nice for those that know how to edit raw HTML. That is certainly an advantage, but for those that do not know how to edit HTML then this difference means relatively little. However, having margins in the Word Processor feels pretty necessary. It is the legacy from the age of print that we just can’t seem to shake. We still need electronic word processors to create interfaces that conform to the standard paper sizes of our region. In the US, it’s default US Letter, and in Europe, it is A4. As crazy as it sounds, margin-less word processing is going to take a long time to take off because of our legacy attachment to paper. That is why Google Docs looks like a page. It doesn’t make sense but it does make a difference, especially in adoption.

The good news is… adding margins to an HTML editor is easy because we can just add CSS to the document and there you go… in fact I think you can add quite nice margins, much nicer (and easier to change) than you do for the typical word processor document. If you define the print region CSS to the page size you want to have printed, then HTML docs can look, feel, and work pretty much the same way word processor docs do. With some CSS trickery, it is even possible to include pagination.

But what about storage? Word processors store nice single files on your computer. HTML files, however, have all these messy attachments. CSS files, JS, images… scattered all over the show.

But but but!…. a .docx file, the format created by the latest MS Word, is just a compressed archive containing many files. It is actually a zip file. You can try this for yourself. Grab a .docx file, change the suffix from ‘.docx’ to ‘.zip’ and then open it with whatever you use to open zip archives. Taadaa! A folder containing a whole bunch of XML files and other crufty stuff. We think of .docx as a file but it is not, it is a collection of files stored in a compressed container (zip).

So, isn’t that cheating a little? It’s a cheap way to clean up a file system and lucky for us HTML has a companion technology that does just the same thing – EPUB. EPUB is an ebook format which is also just a zip file. You can do the same trick to open an EPUB as you used to open a .docx file. So why not use EPUB as a local storage format for desktop word processing?

But HTML editors don’t allow you to export to EPUB…well, that is one of these side steps we will have to recognise if we went down the path of the HTML word processor. We must think about the page, as well as the application as being the component that offers user functionality. If we can make this conceptual side step then we can see that it would be quite possible to add EPUB export functionality to the document using Javascript… we don’t have to build these features into the core application. The tools are already there… there are plenty of desktop HTML editors (BlueGriffin, Kompozer, Dreamweaver (eek! proprietary!) etc) out there we just need some really smart looking and easy to use, feature-full templates (HTML files)…

So, could an HTML editor with nice margins, and output stored as EPUBs on your file system to keep things clean, be used as a tool for word processing? I just can’t see a reason why not. The main thing in the way is our own stupidity. We think that:

  • HTML is for the web
  • HTML editors are for creating ‘web pages’
  • EPUBs are for ‘ebooks’

But these are conventions. Conventions don’t have to stand. We can pull them down if they don’t make sense and these particular class differences just don’t seem to make much sense. We are making very stupid category mistakes and it is preventing a lot of innovation and efficiencies.

If we could break the way we think of HTML editors down a little and re-imagine them as document creators whose format happens to be HTML, then we would get to some very interesting places very fast. It would help us break free of lock-in legacy ‘ways’ rained down upon us by the creators of out of date technologies like LibreOffice and MS Word.

What’s Wrong with Markdown?

Markdown (.MD) is a text format that lazy people use to write HTML. Unfortunately once those same lazies are used to the format, their eyes glaze over and they start to believe .MD is the solution for all the world’s problems. They share a lot in common with githubbies who think github is the solution for book production, open source, bad democracies etc..

.MD files are common in the geek world. Programmers love them. The design of .MD is simple and efficient. If you know the syntax, you can write basic text documents with headers and bullet lists, blockquotes, bold & emphasis etc. pretty quick. That makes it a handy tool for the elite of text workers – programmers – to develop simple text documents, quickly. So it’s a popular format for writing, for example, human-readable README files that tell you a little bit about the software you are about to install. However, that is where the use case ends. .MD is not a useful format for many other cases unless you want to prove to the programmers that you too can do tricky stuff in plain text. For the rest of us, it has little value.

Markdown was originally developed by John Gruber in 2004 and you can read about some of the reasons why he developed it here. The original purpose of .MD is that it can be read without converting it to another format like HTML. Presumably, John Gruber wanted a format that could be read easily by the eye, allowing the user to be able to quickly understand which part of a text is a heading, which part is a list, which part is a paragraph etc .MD is designed not to be rendered for display, it is meant to be the display.

For example, a list written in .MD would look something like this:

* item 1 
* item 2
* item 3

That actually looks like a list. In HTML we would do something similar and it would look like this:

  • item 1
  • item 2
  • item 3

An asterisk looks like a bullet, so there is little cognisant drag here. Pretty readable.

However, things start to fall apart pretty fast. Can we really parse at speed, for example, nested bold and emphasis in .MD like this:

The quick *brown fox* **jumps** *over* the lazy **dog**  

Is that or the following easier to read?

The quick brown fox jumps over the lazy dog

I’m going to say the second is easier to read. Way easier to read. So, that is just the start of the problems; from here on in it goes downhill pretty quick for use cases beyond simple READMEs.

MarkDown isn’t designed for creating HTML

So, let’s assume we can agree Markdown is readable in a very limited number of scenarios – and move on to rationalise (grasp at straws) other needs for the format. That is pretty much where we are today, with the big selling point being that Markdown is an easy way to create HTML. But let’s face it, even programmers don’t like reading raw Markdown and even in the most popular .MD repository of them all – github – the Markdown files are rendered in the browser as nicely formatted HTML. Great! A use case we can stand behind – use Markdown to create HTML.

However .MD is really a pretty bad way to create HTML. Firstly, you need something to convert the .MD into HTML. So if you use just a plain text editor to create .MD files and load it into the browser you will see just plain, boring Markdown. No nicely formatted documents for you. There are tools that programmers like, and so the rest of us are also expected to like them, for converting .MD to .html. After all, according to the technically gifted, converting a .MD file to HTML is “really really easy.”

One of the most common tools for doing this is Pandoc. Pandoc is a great software and extremely useful. However having to install and learn how to run Pandoc – a complex tool at best – to convert a text file to something readable – sounds a little like the long way home. And  that’s not where the rot ends, far from it, the rot has only just set in and the worst is to come. If everyone was to use Pandoc to convert .MD to HTML we would have consistent conversion results. Unfortunately, that’s not what happens. Each to their own, and we have a lot of different tools with which to do these conversions, and hence we have different results created from the same source. Ugh. That is a file format nightmare right there.

And let’s say you want to add a little colour to your text. Perhaps a highlight? Forget about it. Markdown lacks the tools to enable you to do it. Pandoc might help, however – let’s add some colour highlights to a code block with this easy to remember command from the Pandoc manual:

pandoc code.text -s --highlight-style pygments -o example18a.html

So, first of all, do you know what a command is? Do you know what a terminal is? Happy using one? Oh..that’s actually not ok for you? No problems, there are plenty of online introductory courses on the command line. So before you write that funding document, “about” page on your website or scholarly research document – just whip through a quick course on the command line and you’ll be all set! (don’t forget to read the sections about installing software from the command line, you’ll need it to get Pandoc working).

Problems with conversion tools aside – Markdown struggles to find a nice way to represent HTML. It’s just a bad fit. Use Markdown for creating HTML and you will find all sorts of little formatting gotchas that will cause you frustration. It is why many markdown environments/conversion tools also support HTML tags.

All HTML is valid Markdown. If you’re stuck, not able to format your content as you would like (for example using tables), you can always use plain HTML instead of Markdown. http://support.ghost.org/markdown-guide/

So if you want to really write HTML with Markdown you have to, well, write HTML. Klaro.

Markdown was never intended for writing HTML. It wasn’t designed that way and for good reason – it doesn’t do it well.

Codified text

As mentioned above, by design, the original markdown has a very small subset of elements that can be converted to HTML. As John Gruber says in his philosophy:

Markdown is not a replacement for HTML, or even close to it…The idea for Markdown is to make it easy to read, write, and edit prose.

So, Markdown is not actually designed to be a good format for creating HTML. And it lives up to its design. It is for this reason that some Markdown formats ‘extend’ Markdown to include HTML code, and there are also other forks of Markdown that do some really weird stuff that I can hardly explain. For example, Ghost Markdown, the version of Markdown used for the (Open Source) Ghost blogging platform, tries to wrangle image formatting into Markdown. To place an image you have to write the following:

![]()

Intuitive, right? Nope.

The above is really a leap from ‘readable’ to ‘codified’. It is codified text and in order to be able to work with it, you need to know how to de-code the text… I’m sorry, but I just don’t get it. Markdown adds another level of codified complexity which I then need to de-code first (according to non-standardised, and not-standard rules written in some help file somewhere if I’m lucky), so that I can then sally forth and read the content? No thanks.

Say no to codified text.

Non-standardised formats suck

Efforts to take Markdown and extend it to meet a wider variety of formatting needs are actually where the big trouble starts. Markdown has gone off in a hundred-and-one different directions, each with its own syntax.

That means, if I want to write a Ghost blog (I love Ghost by the way, no disrespect to them) in Markdown (their required format) then it is not enough to learn Markdown. I must learn Ghost markdown …their particular reading of what a good markdown format is… So, that leads us to one of the really big problems. Markdown is not standardised.

Can any of us think of another non-standardised text format and where that leads us? Does MS Word and ‘world of pain’ ring any bells? Yes, Markdown is non-standardised and that is a very big no-no. It is, in fact, quite shocking that programmers, big on standards, do not quite see that by advocating Markdown they advocate dropping some central best practices. Can’t say anything more about that really.

But Markdown is structured!

I often hear the term “structured text” when referring to Markdown. For example, the opening lines from the CommonMark pre-amble.:

Markdown is a plain text format for writing structured documents

Sounds good doesn’t it? Sounds very techy and convincing. But what is structured text? Structured text means basically that we can see if something is a heading or something is a bulleted list, or something is a paragraph. Huh? But that describes just about any text document. Structured text is the basic requirement of any text you create – without it, you just have a flat plain-text document with no headings, no bullet lists etc. So… we might as well start every sentence about documents designed to be read as being ‘structured’. I think tomorrow I will go and buy a structured book. Or perhaps I will write a structured narrative on my text-structuring word processor. Excuse me word processor sales person, do you have structured text word processors? Ugh. Meaningless.

What is left?

Markdown is good for limited use cases. Use it for README files on github.

If you have a good dose of Markdown cool aid then don’t let me bring you down from your sugar high. Markdown away. However, if you have heard that there is this cool format available and it cures all your textual needs and it is just really really easy to use and really really quick… then think about the elixir you are being offered and re-read this document before slurping away…

Some comments on this article on Hacker News here:
https://news.ycombinator.com/item?id=8403783

MarkDown, ugh

MarkDown…lets call it what it is – A Less Useful MarkUp (ALUMU?)…We should also invent a new category for both ALUMU and MarkUp. I don’t have a good name for this superset but it can be characterised easily as “non-standardised, codified text”.

Why anyone would want to use non-standardised, codified text is very hard to understand. I thought we had learnt that mistake from MS Word.

Detained Text

Desktop documents such as MS Word and ODT could be called ‘detained text’ formats. A work might exist on the desktop of any number of people at the same time. However, if more than one person needs to work on that file, we have a problem. There is no way to synchronise the changes made on every copy of the file across all those desktops.

The work-around is for one person to work on the file and then email it to the next person. Hence the text is detained by the current person working on the document. All those waiting down the line must wait until the text is released to them. The further down the line they are, the longer they wait.

Detained text slows down any process where more than one person is involved – whether it be writing, processing, editing, reviewing or other such activities. Detained text slows down text production.

The Collaborative Spectrum

We often talk about collaboration in text production as if it is a switch with default position ‘off’. There is no in-between state. Either collaboration is turned on, or it is turned off.

Collaboration is a broad spectrum of activities and affects all text production. Any given work can be discussed as a position on a spectrum of collaborative activity. At one end we have ‘weak collaboration’, and at the other ‘strong collaboration’. The character of production can be described by a collection of attributes which identify where it sits (at a given time) on this continuum.

Weak Collaboration

  • coordination through technical frameworks
  • little or no human coordination
  • coordination rather than collaboration
  • isolated producers
  • small contributions
  • no shared visions on the individual level
  • discrete parallelised production outputs

Strong collaboration

  • the technical framework is an enabler
  • strong human facilitation
  • intensely connected contributors
  • large contributions
  • shared vision
  • synthesised production

A text may move along this spectrum as it undergoes different stages of production.

Collaborative Knowledge Production

Collaborative production of any knowledge artefact could be called ‘Collaborative Knowledge Production’ or CKP. Understanding the right strategy for CKP is reliant on an understanding of the content to be produced, the greater context, and the resources available. From these, a process can be designed.