Wendell Piez and I just co-wrote a post for the Coko website about the quandary of going from MS Word to HTML:
In the days of the typewriter, a typescript was a typed copy of a work. The typescript copy was used to improve the document through the editorial process: for reviewing, commenting, fact and rights checking, and revision. For many years, since the advent of desktop publishing, the hand-typed…
The point is that publishers take badly structured Word documents and process them, adding structure and then ‘throwing them over the wall’ to outsourced vendors for conversion into other formats. When publishers add structure to documents, they often do it in MS Word with custom-built extensions. They simply click on part of the text, choose the right semantic tag, and move on to the next. Just imagine how many publishers have built these custom macros (it is very common), and then imagine that each of them must tweak the macro code with every new release of MS Word. Tricky and expensive!
So why not do that in the browser, using web-based editors? Doing so not only brings the content into an environment that enables new workflow efficiencies, it also means publishers don’t have to keep upgrading these macros. Further, if the browser-based tools for doing this are Open Source… well… you get the picture: share the burden, share the love.
So the article is a small semantic manoeuvre to steer the conversation away from the rather opinionated, but dominant, position that MS Word-to-HTML conversion is terrible because you can’t infer structure during the conversion process. The implication of that position is that HTML isn’t ‘good enough’. Our point is that you don’t need to infer the structure, because it wasn’t there in the first place. Plus, HTML is an excellent format for progressively adding structure because it is very forgiving: you can have as much, or as little, structure as you like. Hence we can look to shared efforts to build common browser-based tools for processing documents, rather than creating and maintaining one-off macros.