The Case for HTML Word Processors


Making a case for HTML editors as stealth desktop word processors… the strategy has been so stealthy that not even the developers realised what they were building.

We all use over-complicated software to create desktop documents. Microsoft Word, LibreOffice, whatever you like – we know them. They are among the core apps on any user’s computer. We also know that they are slow and unwieldy, and have lots of quirky ways of doing things. However, most of us just accept that this is the way it is, and we try not to bother ourselves by noticing just how awful this software actually is.

So, I think it might be interesting to ask a simple question – what if we used desktop HTML editors instead of word processors to do word processing? It might sound like an irrational proposition… word processors are, after all, created for word processing. HTML editors are for creating… well… HTML. But let’s just forget that. What if we could allow ourselves to imagine that we used an HTML editor for all our word processing needs, and that HTML replaced .docx, .odt and all those other over-burdened word processing formats? What would we win and what would we lose?

The first thing to recognise is that word processors and HTML editors actually look and work in kinda the same way. They have a big blank page to start with – the empty text canvas. They have similar toolbars with similar tools. They both essentially just let you write words on a page and place other stuff on it. You can also change font sizes, styles, colours, backgrounds etc., and add images, tables, whatever you like.

There are two big differences that are apparent in their interfaces, however:

  • in a word processor there are nice margins to write within
  • in an HTML editor there is a ‘view source’ allowing you to see the markup behind the display version

Seeing the source is quite nice for those who know how to edit raw HTML. That is certainly an advantage, but for those who do not, the difference means relatively little. Having margins in the word processor, however, feels pretty necessary. It is the legacy of the age of print that we just can’t seem to shake. We still expect electronic word processors to present pages that conform to the standard paper size of our region: in the US the default is US Letter, and in Europe it is A4. As crazy as it sounds, margin-less word processing is going to take a long time to catch on because of our legacy attachment to paper. That is why Google Docs looks like a page. It doesn’t make sense, but it does make a difference, especially in adoption.

The good news is… adding margins to an HTML editor is easy, because we can just add CSS to the document and there you go… in fact, I think you can add quite nice margins, much nicer (and easier to change) than those of the typical word processor document. If you use print CSS to set the page size you want to print on, then HTML docs can look, feel, and work pretty much the same way word processor docs do. With some CSS trickery, it is even possible to include pagination.
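To make the margin idea concrete, here is a minimal sketch of a "template" generator in Python. It assumes the editor simply opens a standalone HTML file; the function name `page_template`, the paper-size table and the 25 mm default margin are illustrative choices, not features of any particular editor. The CSS `@page` rule sets the printed paper size (A4 or US Letter) and margins, while a screen-only rule draws the body as a sheet of paper, so the document looks like a page while you type.

```python
from html import escape

# Illustrative paper sizes in millimetres (width, height); the keys double
# as CSS @page size keywords, which are case-insensitive.
PAGE_SIZES = {"A4": (210, 297), "Letter": (215.9, 279.4)}


def page_template(title: str, paper: str = "A4", margin_mm: float = 25) -> str:
    """Return a standalone HTML document styled to look and print like a page.

    Hypothetical helper: the layout choices here are assumptions, not a standard.
    """
    width, height = PAGE_SIZES[paper]
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<style>
  /* Printed output: paper size and margins come from the @page rule. */
  @page {{ size: {paper}; margin: {margin_mm}mm; }}
  @media print {{ body {{ margin: 0; }} }}

  /* On screen: draw the body as a sheet of paper with the same margins. */
  @media screen {{
    html {{ background: #ddd; }}
    body {{
      box-sizing: border-box;
      width: {width}mm;
      min-height: {height}mm;
      margin: 2em auto;
      padding: {margin_mm}mm;
      background: #fff;
      box-shadow: 0 0 0.5em rgba(0, 0, 0, 0.3);
    }}
  }}

  /* Manual page break: the next printed page starts at this element. */
  .page-break {{ break-before: page; }}
</style>
</head>
<body>
<h1>{escape(title)}</h1>
<p>Start writing here.</p>
</body>
</html>
"""
```

For example, `Path("memo.html").write_text(page_template("Memo", paper="Letter"), encoding="utf-8")` (with `Path` from `pathlib`) writes a US Letter template that any HTML editor can open. Changing the margins is a one-argument change rather than a trip through a page-setup dialog.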

But what about storage? Word processors store nice single files on your computer. HTML files, however, have all these messy attachments. CSS files, JS, images… scattered all over the show.

But but but!… a .docx file, the default format of recent versions of MS Word, is just a compressed archive containing many files. It is actually a zip file. You can try this for yourself. Grab a .docx file, change the suffix from ‘.docx’ to ‘.zip’ and then open it with whatever you use to open zip archives. Ta-da! A folder containing a whole bunch of XML files and other crufty stuff. We think of a .docx as a file, but it is not; it is a collection of files stored in a compressed container (zip).
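The rename trick can also be done programmatically. This short Python sketch uses the standard-library `zipfile` module, which recognises an archive by its contents, so the file does not even need renaming; the helper name `list_container_parts` is just illustrative. Pointed at a real .docx, the listing typically includes `[Content_Types].xml` and `word/document.xml`, the XML file that holds the body text. It works just as well on an .epub.

```python
import zipfile


def list_container_parts(path: str) -> list[str]:
    """List the files packed inside a zip-based document such as .docx or .epub.

    Hypothetical helper for illustration; it works on any zip archive.
    """
    # zipfile checks the archive's contents, not its suffix, so no renaming is needed.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip-based document")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling `list_container_parts("report.docx")` returns the archive's member names in the order they are stored.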

So, isn’t that cheating a little? It’s a cheap way to keep a file system clean, and luckily for us HTML has a companion technology that does just the same thing – EPUB. EPUB is an ebook format which is also just a zip file (of HTML, CSS, images and a few XML metadata files). You can use the same trick to open an EPUB that you used to open the .docx file. So why not use EPUB as a local storage format for desktop word processing?

But HTML editors don’t let you export to EPUB… well, that is one of those side steps we would have to recognise if we went down the path of the HTML word processor. We must think of the page, as well as the application, as a component that offers user functionality. If we can make this conceptual side step, then we can see that it would be quite possible to add EPUB export functionality to the document itself using JavaScript… we don’t have to build these features into the core application. The tools are already there: there are plenty of desktop HTML editors out there (BlueGriffin, KompoZer, Dreamweaver (eek! proprietary!), etc.). We just need some really smart-looking, easy-to-use, feature-full templates (HTML files)…
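The post imagines EPUB export as JavaScript living inside the document template itself. To keep to a single language here, the following is a Python sketch of the same packaging step done outside the editor; `save_as_epub`, `xhtml_page` and their parameters are hypothetical, and the sketch assumes the body markup is already well-formed XHTML. It writes roughly the minimum an EPUB 3 reader expects: an uncompressed `mimetype` entry first in the archive, `META-INF/container.xml` pointing at the package file, a package document (`content.opf`) with the required metadata, a navigation document, and the content document itself.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Points EPUB readers at the package document.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def xhtml_page(title: str, body: str) -> str:
    """Wrap body markup (assumed to be well-formed XHTML) in an XHTML document."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{escape(title)}</title></head>
<body>{body}</body>
</html>
"""


def save_as_epub(path: str, title: str, body: str, lang: str = "en") -> None:
    """Package one XHTML document as a minimal EPUB 3 file (hypothetical helper)."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    package_opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(lang)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="doc" href="document.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="doc"/>
  </spine>
</package>
"""
    # EPUB 3 requires a navigation document; this one has a single entry.
    nav = xhtml_page(
        title,
        '<nav epub:type="toc"><ol><li><a href="document.xhtml">'
        f"{escape(title)}</a></li></ol></nav>",
    )
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as epub:
        # The mimetype entry must come first and be stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML)
        epub.writestr("OEBPS/content.opf", package_opf)
        epub.writestr("OEBPS/nav.xhtml", nav)
        epub.writestr("OEBPS/document.xhtml", xhtml_page(title, body))
```

For example, `save_as_epub("essay.epub", "My Essay", "<p>Hello</p>")` produces a single tidy file, and renaming it to .zip reveals the handful of files inside, just as with the .docx. A real template would also bundle its CSS and images, each listed in the manifest.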

So, could an HTML editor with nice margins, and output stored as EPUBs on your file system to keep things clean, be used as a tool for word processing? I just can’t see a reason why not. The main thing in the way is our own stupidity. We think that:

  • HTML is for the web
  • HTML editors are for creating ‘web pages’
  • EPUBs are for ‘ebooks’

But these are conventions. Conventions don’t have to stand. We can pull them down if they don’t make sense, and these particular category distinctions just don’t seem to make much sense. We are making very stupid category mistakes, and they are preventing a lot of innovation and efficiency.

If we could break the way we think of HTML editors down a little and re-imagine them as document creators whose format happens to be HTML, then we would get to some very interesting places very fast. It would help us break free of the legacy, lock-in ‘ways’ rained down upon us by the creators of out-of-date technologies like LibreOffice and MS Word.


21 thoughts on “The Case for HTML Word Processors”

  1. This is exactly what I think. Writing documents in HTML format sounds right, and the format should last longer because, in the end, it is text that focuses on content hierarchy.

    Some background: I write HTML5 games books and teach a web design class. So I’ve been thinking about how I can write in HTML format and publish to different outputs, instead of writing in word processors.

    1. That’s an interesting point, Thomas – the push towards ‘interactivity’ (for want of a better term – perhaps Bret Victor’s Explorable Explanations is better) would also suggest that HTML authoring environments for (primarily) text-based content are inevitable.

  2. What do you think about SeaMonkey Composer as an HTML word processor?
    To me it seems to be the right solution.
    Could it be a kind of answer?

  3. In fact, Google Docs today has an excellent function for downloading as zipped HTML or EPUB, doing exactly what you’re suggesting.

  4. This is exactly the design idea behind the word processor for HTML documents: ConstEdit (disclaimer: I am the author of the software).

    ConstEdit does what you are suggesting. It allows you to edit documents in HTML format without any prior knowledge of HTML coding. It works more or less the same way as a regular word processor. It makes use of HTML5 sectioning features to allow rearranging sections with drag and drop. And of course, it also makes good use of external CSS to separate formatting from content authoring. The EPUB part of your suggestion is not supported, though; only plain HTML files are saved.

    ConstEdit is implemented as a Google Chrome browser extension, available from the Chrome Web Store.

  5. Hello,
    I’m more or less on the same track :).
    Maybe you can tell me how to migrate from office documents to documents based on modern web technologies. The reason is clear: a push towards ‘interactivity’, which is far easier with HTML than with office tools.

        1. To do this you need a tool that provides more interesting functionality… we are producing such a thing – Wax – it is a web-based ‘word processor’, but more than that… it can be extended with JS to do, more or less, whatever you like…

  6. I totally agree with your points.
    One of my biggest reasons for wanting to use HTML is revision control. Most revision control systems treat every revision of a word processor file as one big binary blob. There is no way to look at the exact text changes with common tools. It is also very inefficient to store revisions this way. One could save the file as HTML, but word processors create overly complex HTML.

    1. Ha! So, I wondered when the grumpy old skool would turn up! LaTeX is a dying breed, matey! HTML is by FAR the dominant document format in the world. Get used to it.

      1. Good to know that it’s literally impossible to take anything you say seriously.

        Next, you’re going to tell us that HTML is a programming language.

        1. Well, you obviously took me seriously enough to spend some of your very valuable time to write a comment on my post! Or maybe you just have too much time to spare. Get a hobby.
