Math Typesetting

Typesetting math in HTML was for a long time one of those ‘I can’t believe this hasn’t been solved by now!’ issues. It seemed a bit wrong – wasn’t the Internet more or less invented by math geeks? Did they give up using the web back in 1996 because it didn’t support math? (That would explain the aesthetic of many ‘home pages’ for math professors.)

MathML is the W3C-recommended standard markup for equations – it’s like ‘HTML tags for math.’ While MathML has a long history and has been established in XML workflows for quite some time, it was only with HTML5 that mathematics finally entered the web as a first-class citizen. This will hopefully lead to some interesting developments as more users explore MathML and actually use it as creatively as they’ve used ‘plain’ text.

However, the support in browsers has been sketchy. Internet Explorer does not support MathML natively, Opera only supports a subset, and Konqueror does not support it. Webkit’s partial support only recently made it into Chrome 24 and was held back on Safari due to a font bug. Firefox wins the math geek trophy hands down with early (since FF 3) and best (though not yet complete) support for MathML.

What’s the hold-up?

It seems that equation support is underwhelmingly resourced in the browser development world. Webkit’s great improvements this year (including a complete re-write + the effort to get it through Chrome’s code review) have been due to a single volunteer, Dave Barton. Frédéric Wang plays the same heroic (and unpaid) role for Firefox. The question is: why is this critical feature being left to part-time voluntary developers? aren’t there enough people and organisations out there that would need math support in the browser (think of all the university math departments just for starters…) to pay for development?

As a result, we have no browser that fully supports MathML. While you cannot overstate the accomplishments of the volunteer work, it’s important to tell the full story. Recent cries that Chrome and Safari support MathML can give the wrong impression of the potential for MathML when users encounter bad rendering of basic constructs. The combination of native MathML support, font support, and authoring tools, [ see this more recent post for more detail] makes mathematical content one of the most complex situations for web typesetting.

There have been some valiant attempts to fix this lack of equation support with third-party math typesetting code (JavaScript), notably Asciimath by Peter Jipsen with its human-readable syntax and image fallbacks, jqmath by Dave Barton (same as above), and MathQuill.

However, by far the most comprehensive solution is the MathJax libraries (JavaScript) released as Open Source. While MathJax is developed by a small team (including Frédéric Wang above) they are bigger than most projects, with a growing community and sponsors. MathJax renders beautiful equations as HTML-CSS (using web fonts) or as SVG. The markup used can be either MathML, Asciimath or TeX. That means authors can write (ugly) markup like this:

J\alpha(x) = \sum\limits{m=0}^\infty \frac{(-1)^m}{m! , \Gamma(m + \alpha + 1)}{\left({\frac{x}{2}}\right)}^{2 m + \alpha}

into an HTML page and it will be rendered into a lovely looking scalable, copy-and-paste-able equation. You can see it in action here. 

So why is this really interesting to publishing?

Well… while EPUB3 specifications support MathML, this has not made it very far into e-reader devices. However, MathJax is being utilised in e-reader software that has JavaScript support. Since many ebook readers are built upon Webkit (notably Android and iOS apps, including iBooks) this strategy can work reasonably effectively. Image rendering of equations in ebook format is another strategy, and possibly the only guaranteed strategy at present, however, you can’t copy, edit, annotate or scale the equations and you cannot expect proper accessibility with images.

The only real solution is full equation support in all browsers and e-readers either through native MathML support or through the inclusion of libraries like MathJax. We can’t do anything to help the proprietary reader developments get this functionality other than by lobbying them to support EPUB3 (a worthwhile effort) but since more and more reading devices are using the Open Source WebKit, we can influence that directly.

The publishing industry should really be asking itself why it is leaving such an important issue to under-resourced volunteers and small organisations like MathJax to solve. Are there no large publishers who need equation support in their electronic books that could step forward and put some money on the table to assist WebKit support of equations? Couldn’t more publishers be as enlightened as Springer and support the development of MathJax? You don’t even have to be a large publisher to help – any coding or financial support would be extremely useful.

It does seem that this is a case of an important technological issue central to the development of the publishing industry ‘being left to others.’ I have the impression that it is because the industry as a whole doesn’t actually understand what technologies are at the centre of their business. Bold statement as it is, I just don’t see how such important developments could go otherwise untouched by publishers. It could all be helped enormously with relatively small amounts of cash. Hire a coder and dedicate them to assist these developments. Contact Dave Barton or MathJax and ask them how your publishing company can help move this along and back that offer up with cash or coding time. It’s actually that simple.

Many thanks to Peter Krautzberger for fact checking and technical clarifications.

Published on O’Reilly, 26 November 2012


InDesign vs. CSS

The explosion in web typesetting has been largely unnoticed by everyone except the typography geeks. One of the first posts that raised my awareness of this phenomenon was From Print to Web: Creating Print-Quality Typography in the Browser by Joshua Gross. It is a great article which is almost a year old yet still needs to be read by those that haven’t come across it. Apart from pointing to some very good JavaScript typesetting libraries, Joshua does a quick comparison of InDesign features vs. what is available in CSS and JS libs (at the time of writing).

It’s a very quick run down and shows just how close things are getting. In addition, Joshua points to strategies for working with baselines using grids formulated by JavaScript and CSS. Joshua focuses on Hugrid but also worth mentioning is the JMetronome library and BaselineCSS. These approaches are getting increasingly sophisticated and, of course, they are all Open Source.

It brings to our attention that rendering engines utilising HTML as a base file format are ready to cash in on some pretty interesting developments. However, it also highlights how rendering engines that support only CSS (such as the proprietary PrinceXML) are going to lose out in the medium and long term since they lack JavaScript support. JavaScript, as I mentioned before, is the lingua franca of typesetting. JS libs enable us to augment, improve, and innovate on top of what is available directly through the browser. Any typesetting engine without JavaScript support is simply going to lose out in the long run. Any engine that ignores JavaScript and/or is proprietary, loses out doubly so since it is essentially existing on a technical island and cannot take advantage of the huge innovations happening in this field which are available to everyone else.

Once again, this points the way for the browser as typesetting engine; HTML as the base file format for web, print, and ebook production; and CSS and Javascript as the dynamic duo lingua franca of typesetting. All that means Open Source and Open Standards are gravitating further and faster towards the core of the print and publishing industries. If you didn’t think Open Source is a serious proposition it might be a good idea to call time-out, get some JS and CSS wizards in, and have a heart-to-heart talk about the direction the industry is heading in.

Correction: The latest release of PrinceXML has limited beta Javascript support. Thanks to Michael Day for pointing this out.

Originally published on O’Reilly, 19 Nov 2012


Open API: The Blanche DuBois Economy

‘Open API’ is a well-known term that seldom gets challenged. It passes in conversation as an agreed-upon good. However, it should be recognised that there is no such thing as an Open API – the term is a euphemism for a specific kind of technical service offered with no contractual agreement to secure those services.

‘Open API’ trades off the term ‘open’ to imply some kind of transparency – which often is missing on both technical and business levels. Badly documented APIs with no guarantee of service are usually what you will be dealing with. To rely on an Open API is to  lack transparency – it is doing business by relying on the kindness of strangers (as Blanche DuBois would say).

In other words, the Web 2.0 economy encourages us to utilise third-party services with no contractual agreements in place for these critical, and sometimes core, business processes.

This week there was a very interesting case in point. announced they are  discontinuing support for their publishing API. In an email sent out early in the week, clients were informed:

We’d like to thank you for your participation in the Lulu API Program and share an important update. Over the last two years, we have made significant investments into building our APIs and making them useful to third-party developers, however they have not achieved the adoption which we had anticipated and we will no longer be supporting APIs.

We will discontinue support of our existing APIs on January 15th, 2013.

The Lulu API was, by the way, a very good API and very well documented. However, the above means that if you relied on the Lulu API for your business, then bad luck, you better find another way to do it.

Lulu, of course, is subject to its own business processes and they have obviously invested a lot in the development of an API for little return. Their approach has been very straightforward and they communicated to their clients well in advance that there was going to be a change in their API service. As far as things go, Lulu has done it pretty much by the book, however, my point is addressed to a wider concern: API providers should offer their clients a contract which guarantees the terms and conditions of technical services provided by their APIs. Without this contract, businesses are not in a position (and should not be tempted) to confidently build innovative models that leverage the powerful possibilities that these APIs can offer.

Originally posted on O’Reilly, 13 November 2012

Why Attribution is not Sustainable

It’s easy to understand the problems with attribution. In the one book = one author days, it was easy – put the author’s name on the cover. 50 years later, same book, same solution.

However, this does not work for collaboratively-produced works and it is one of the most difficult issues for free culture going forward. Already in FLOSS Manuals, we have books that have 8 pages of credits – each chapter individually referenced with the copyright and attribution data. Some books have 40-50 chapters and some chapters have 20+ collaborators, all of which need to be listed. For FLOSS Manuals, we solved the problem by generating these lists – when edits are done in Booktype, there is a digital record of who did what, so we just automatically generate this list. However, this is just the beginning and we are already asking ourselves – are 8 pages of credits really necessary?

The answer is no, it is not necessary. Attribution is not as important with collaboratively-produced works as one might have suspected. Those involved in the process are not too worried about it – there is some kind of excitement generated by pushing your name forward as a protagonist at some key moment in the life of a book but this can happen in the text, as part of the book’s story, or in press releases, blog posts and so on. We don’t actually need attribution – the prominent foregrounding of all the names of the individuals involved in the book’s production. What we need is backgrounding of this information and the ability to know the history and lineage of a book.

We need standards to maintain history records for a book’s development and we also need this to be stored in publicly- accessible repositories so we can check this information for interest’s sake or more meaningful uses such as historical records, or research.

These records of book history could be also very necessary for verification of content. If, for example, we use open licenses or (one day) no copyright then how do we verify quotes, for example? (In reality, provenance is orthogonal to copyright restrictions, and indeed such restrictions are an obstacle to maintaining provenance, but still. people often ask this question.)  How do I know that this book, which purports to be the thoughts of the author, actually does present the thoughts of the author and not some maliciously invented text masquerading as such ?

Currently, in free software circles, there are at least three major strategies for dealing with this kind of issue:

  • Use a publicly-trusted source for the distribution of content so people know that if the book comes from a trusted source it is what it says it is.
  • Use a ‘checksum’ – a method for verifying of any errors have been introduced in the text during the transmission (delivery) of the content.
  • Digitally-sign content so that it can be verified as coming from the purported source.
  • Libraries or public archives could play a very strong role here. Records of history would become very important in a federated or prolific free-content environment, not just for research but for providing public mechanisms and standards for the ongoing verification of sources.

Attribution is the star system of the single author = single book publishing system. With the breakdown of these models, or at least with the diversification of these models, author attribution will, out-of-necessity, become a more transient commodity. Collating version and contribution history, however, might become a business in itself.

Which is all to say that lowercase attribution (a useful bit of provenance info) is sustainable. By contrast, Attribution to Authors as creator-gods is suspect and superfluous.

Gutenberg Regions

With the onward march towards the browser typesetting engine, not an invention but a combination of technologies with some improvements (much like Gutenberg’s press), it’s interesting to think what that would mean for print production. Although there are various perspectives on how the printing press changed society – and they mostly reflect on the mass production of books – and while the first great demonstration involved a book (The Gutenberg Bible), the invention itself was really about automating the printing process. It affected all paper products that were formerly inscribed by hand and it brought in new print product innovations (no not just Hallmark birthday cards but also new ways of assembling and organising information).

The press affected not just the book but also print production. And now we have browsers able to produce print-ready PDF and the technology arriving to output PDF from the browser that directly corresponds with what you see in the browser window, we are entering the new phase of print (and book) production – networked in-browser design. We are not talking here of emulated template-like design but the 1:1 design process you experience with software like Scribus and InDesign.

The evolution of CSS

There are several JavaScript libraries that deal with this, as I have mentioned in earlier posts (here and here), and the evolution of CSS is really opening this field up. Of particular interest is what Adobe is doing behind the scenes with CSS Regions. These regions are part of the W3C specification for CSS and the browser adopting this specific feature the fastest is the Open Source browser engine Webkit, which is used by Chrome and Safari (not to mention Chromium and other browsers).

CSS Regions allow text to flow from one page element to another. That means you can flow text from one box to another box, which is what is leveraged in the book.js technology I wrote about here. This feature was included in Webkit by the Romanian development team working for Adobe. It appears that Adobe has seen the writing on the wall for desktop publishing although they might not be singing that song too loudly considering most of their business comes out of that market. Their primary (public facing) reason for including CSS Regions and other features in Webkit is to support their new product Adobe Edge. Adobe Edge uses, according to the website, Webkit as a ‘design surface’.

A moment of respect please for the Adobe team for contributing these features to Webkit. Also don’t forget in that moment to also reflect quietly on Konquerer (specifically KHTML), the ‘little open source project’ borne out of the Linux Desktop KDE which grew into Webkit. It’s an astonishing success story for Open Source and possibly in the medium term a more significant contribution to our world than Open Source kernels (I’m sure that statement will get a few bites).

”HTML-based design surface” is about as close to a carefully constructed non-market-cannibalising euphemism as I would care to imagine. Adobe Edge is in-browser design in action produced by one of the world’s leading print technology providers but the role of the browser in this technology is not the biggest noise being made at its release. Edge is ‘all about’ adaptive design but in reality it’s about the new role of the browser as a ‘target agnostic’ (paper, device, webpage, whatever – it’s all the same) typesetting engine.

A consortium, not an individual company

However, we should not rely on Adobe for these advances. It is about time a real consortium focused on Open Source and Open Standards started paying for these developments as they are critical to the print and publishing industries and for anyone else wanting to design for devices and/or paper. That’s pretty much everyone.

Gutenberg died relatively poor because someone else took his idea and cornered the market. Imagine if he put all the design documents out there for the printing press and said ‘go to it.’ Knowing what you know now, would you get involved and help build the printing press? If the answer is no, I deeply respect your honesty. But that’s where we are now with CSS standards like regions, exclusions, and the page-generated content module. The blueprints are there for the new typesetting engine, right there out in the open. The print and publishing industries should take that opportunity and make their next future.

Originally posted on Nov 6, 2012 on O’Reillys Tools of Change site: