Books are Evil, Really Evil pt1

Right now books are something of an ironic artefact for me. I am involved in the rapid production of books through a process known as a Book Sprint. We create books. We throw a bunch of people in a room for a week, and carefully facilitate them through a process, progressing them step by step from zero to finished book in 5 days or less. Write a book in a week?! An astonishing proposal. Most people who attend a Book Sprint for the first time think it is impossible. Maybe, they think, they can get the table of contents done in that time. Maybe even some structure. But a book? 5 days later they have a finished book, and they are amazed.

There are many essential ingredients to a Book Sprint. An experienced Book Sprint facilitator is a must. A venue set up just so… Lightweight and easy-to-use book production software. A toolchain that supports rapid rendering of PDF and EPUB from HTML. Good food… A writing team… and a lot more.

One of the contributing factors to success is the terror caused by the seemingly impossible idea that the group will create a book. It is a huge motivator. Such is the enormity of the task in the participants’ minds that they follow the facilitator and dedicate themselves to extremely long hours, working on minute details even when exhausted. There is a lot of chemistry in there. Camaraderie and peer pressure are pushed to maximum effect as a motivational factor, as is fear of failure, especially fear of failure before your peers, both inside and outside the Sprint room. The pleasure of helping your peers is a strong motivator, as is the idea that together we will do this! But the number one motivator is the idea that we are going to produce a book.

We all know that books these days, paper books, are published from a PDF. You send a PDF to the printer, and the final output is a perfect bound book. This happens for most Book Sprints – we send the final PDF to a printer for them to produce the printed book. So what we are creating is actually a PDF (along with an EPUB) …but imagine if we were to call the event “PDF Sprint”. At the beginning of the PDF Sprint we could announce that we have gathered everyone together…so that…at the end of the week…they will have….(gasp!)…a PDF!

Nope. Doesn’t work. Doesn’t even nearly work. A book is the seemingly impossible outcome that Book Sprint participants have come to conquer. Even though the definition of ‘what a book is’ is completely up for grabs, it is a book they are determined to produce. A book is the pinnacle of knowledge products, and writing a book is about equal, in cerebral achievements, to climbing Everest. A PDF is merely getting to base camp, or perhaps the equivalent of planning the trip from your armchair.

So, what’s the problem? Books are good then! A great motivator for Book Sprints. Where exactly is the irony? How can I complain?

Book Sprints are extraordinary events. The people are not just put into a room and left to write. They are led through a process where notions of single authorship and ownership of content just no longer make sense. Such ideas are unsustainable and nonsensical in this environment, and participants slowly deconstruct ideas of authorship over the 5 days.

The participants actively collaborate during the event. Really collaborate. Book Sprints are a kind of collaborative therapy. Each participant learns to let go of their own voice so they can contribute to constructing a new shared voice with the rest of the team. They learn new ways to contribute to group processes, to communicate, to improve each other’s contributions, to synthesize, to empower and encourage others to improve the work without having to ask permission.

The resulting book has no perceivable author. It has been delivered by what is now a community. And as a result, most of the books, about 99% I would say, end up being freely licensed. A book born by sharing is more easily shared. More easily shared than a book created with the notions of author-ownership. The idea of sharing is embedded in the DNA of the Book Sprint, part of the genesis of the product, and sharing more often than not becomes part of the life of the book after the Book Sprint is completed.

But books are evil

So, how is it possible I can take the position that books are evil? Where exactly is the irony? It is a lovely story I just painted. Lots of flowers and warm fuzzy feelings. Wow. Sharing, sharing, sharing… it’s a book love-in!

Well… with some regret, I have to admit that most books do not come into the world this way. They are produced and delivered through legacy processes. Cultural norms shape the production and reception of books, and the ideas contained within them are not born into freedom. These books are, normatively, created by ‘single author geniuses’, born into All Rights Reserved knowledge incarceration, and you cannot recycle them.

Try as we may, we are a little group of people. A small band of Book Sprinters, and it is unlikely that we can sway the mainstream to our way of doing things. We have many victories – Cisco released one of its Book-Sprinted books freely online! Whoot! That’s massive! But… as big as Cisco is, one Cisco book in the sea of publishing is merely a grain of salt in the Pacific. By adding our special grain of salt to this ocean we are by no means making our point more salient.

Books are doomed to be the gatekeepers of knowledge. If you make a book, you are, more than likely, sentencing the words in it to life + 50 years (depending on where you live).

Books are in fact the very artefacts that maintain proprietary knowledge culture.

It comes down to these three issues for me:
1. books gave birth to copyright
2. books gave birth to industrialised knowledge production
3. books gave birth to the notion of the author genius

These three things together are the mainstays of proprietary knowledge culture, and proprietary knowledge culture has been firmly encased and sealed, with loving kisses, between the covers of the book. Ironically these three things, through the process of the Book Sprint, are what we are trying to deconstruct.

many thanks to Raewyn Whyte for improving this post

Building Book Production Platforms p4

The renderer

Note: this is an early version. It has been cleaned up some, but is still needing links and screenshots…. Apologies if the rawness offends you 🙂

This series is skipping around the toolchain, depending on what’s most in my mind at the moment. Today it’s file conversion, otherwise known as ‘rendering’. This is the process of converting one file type to another, for example, HTML-to-EPUB or Word-to-HTML, and so on.

File conversion is important in the book production world because we often want to convert HTML to a book format – book-formatted PDF, EPUB, mobi, and so on – or to import existing content from a file such as an MS Word document into a new document.

Manual conversions

It is, of course, quite possible to do all your file conversion manually.

Should you wish to convert HTML into a nice book-formatted PDF, one possible strategy is to go out to InDesign or Scribus and lay it all out like our ancestors did as recently as 2014. Or, if you want to convert MS Word, for example, to HTML, you can just save it as HTML in Word… Yes, Word copies across a lot of formatting junk, but you can clean it up using purpose-built, freely available software (such as HTML Tidy and CleanUp HTML), online services (like DirtyMarkup), or a handy app (such as Word HTML Cleaner)…

Manual conversion is not too bad a strategy, as long as it doesn’t take you too long, and it is often more efficient and faster than those convoluted hand-holding technical systems which promise to do it for you in one step. Despite the utopian promises made by automation… you often get better results doing the conversion manually.

I sometimes hear people in Book Sprints, for example, complain something to the tune of “why can’t I just click a button and import part of this paragraph from Wikipedia into the chapter, and then if the entry is updated in Wikipedia, I can just click the button again and it will be updated here”…

I try not to sigh too loudly when I hear this kind of ‘I have all the solutions!’ kind of ‘question’. Some day that may be feasible, but in the meantime, all the knowledge production platforms I have built have an OS-independent, trans-format import mechanism which allows those handy keyboard shortcuts ‘control c’ and ‘control v’… sigh. Don’t knock copy and paste! It can get you a long way.

You can also build an EPUB by hand…

But, who really wants to do any of this? Isn’t it better to just push a button and taaadaaa! out pops the format of choice! (I have all the solutions! haha).

I think we can agree it is better if you are able to use a smart tool to convert your files, and the good news is that within certain parameters and for loads of use cases, this is possible. But don’t under-estimate the amount of tweaking for individual docs that might, at times (not always), be required.

Import and export are the same thing

The process of ‘importing’ a document is also sometimes known as ingestion. Before delving into this, the first gotcha with file transformation is to avoid thinking about import and export as separate technical systems. That can cause, and has caused, a lot of extra work when building file conversion into a toolchain.

Both import and export are, actually, file conversion. The formats might differ – import might solely be Word-to-HTML in your system, and export HTML-to-EPUB – but the process of file conversion has many needs that can be abstracted and applied to both cases. A quick example: file conversion is often processor- and memory-intensive, so effective management of these processes is quite important, and in addition, fallbacks for errors or failures need to be managed nicely. These two measures are required independent of the file types you are converting from or to. So don’t think about pipelining specific formats; try to identify as many requirements as possible for building just one file conversion system, not an import system plus an export system.
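To make the ‘one system, not two’ idea concrete, here is a minimal sketch of a single converter registry that serves import and export alike, with one shared fallback path for unsupported conversions. The format names and stand-in transforms are purely illustrative, not taken from any real platform:

```python
# One registry of converters keyed by (source, destination) format.
CONVERTERS = {}

def converter(src, dst):
    """Register a function that converts the src format to the dst format."""
    def register(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return register

@converter("docx", "html")
def docx_to_html(data):
    return "<html>" + data + "</html>"   # stand-in for a real transform

@converter("html", "epub")
def html_to_epub(data):
    return "EPUB:" + data                # stand-in for a real transform

def convert(data, src, dst):
    """One entry point for 'import' and 'export' alike, with one
    fallback for conversions the system does not support."""
    fn = CONVERTERS.get((src, dst))
    if fn is None:
        raise ValueError("no converter for %s -> %s" % (src, dst))
    return fn(data)
```

Queueing, memory limits, and error recovery would then be written once, around `convert`, rather than twice for an importer and an exporter.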

Ingestion

In importing documents to an HTML system, the big use case is MS Word. Converting from MS Word is a road full of potholes and gotchas. The first problem is that there is no single ‘MS Word’ file format, rather there are many many different file formats that all call themselves MS Word. So to initiate a transformation, you need to know what variety of MS Word you are dealing with.

Your life is made much easier if you can stipulate that your system requires one variety – .docx. If you do have to deal with other forms of Word, then it is possible to do transformations on the backend from miscellaneous Word file type X to .docx, and then from .docx to HTML. LibreOffice, for example, offers binaries that do this in a ‘headless’ state (executed from the command line without the need to fire up the GUI). However, the more transformations you undertake, the more conversion errors you are likely to introduce. Obviously, this then causes QA issues and will increase your workload per transform required.
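As a sketch of that back-end chain, the headless LibreOffice invocations might be wrapped like this. The soffice flags are as documented for the binary; the file names and output directory are illustrative:

```python
import subprocess

def soffice_cmd(path, target, outdir="out"):
    """Build a headless LibreOffice command converting `path` to the
    `target` format ('docx', 'html', ...) into `outdir`."""
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", outdir, path]

# Two hops: legacy .doc -> .docx, then .docx -> HTML. Each run requires
# LibreOffice to be installed, so the calls are left commented out here.
# subprocess.run(soffice_cmd("legacy.doc", "docx"), check=True)
# subprocess.run(soffice_cmd("out/legacy.docx", "html"), check=True)
```

Each extra hop is another place for conversion errors to creep in, which is exactly the QA cost described above.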

Another real problem with MS Word versions before .docx is that .docx is transparent – it is actually just XML – so you can view what you are dealing with. Versions before this were horrible binaries: a big clump of ones and zeros, and after that a bunch of gunk. The same problem exists when you use binaries like soffice (the LibreOffice binary for headless conversions), as it is also a big bucket of numbers. You can’t easily get your head into improving transformations with soffice unless you want to learn to etch code into your CPU with a protractor.

If you have to deal with MS Word at all, I recommend stipulating .docx as the accepted MS Word format. I am not a file type expert, far from it, but from people who do know a lot about file formats I know that .docx looks like it has been designed by a committee… and possibly, a committee whose members never spoke to each other. Additionally, Microsoft, being Microsoft, likes to bully people into doing things their way. .docx is a notable move away from that strategy, and does make it substantially easier to interoperate with other formats, however, there are some horrible gotchas like .docx having its own non-standard version of MathML. Yikes. So, life in the .docx lane is easier, but not necessarily as easy as it should be if we were all playing in the same sandbox like grownups.

I have tried many strategies for Word to HTML conversion. There are many open source solutions out there, but oddly, not as many good ones as you would hope. Recently I looked at these three rather closely:

  • Calibre’s Python based ebook converter script
  • OxGarage
  • soffice (Libreoffice)

There are others…I can’t even remember which ones I have looked at in detail over the years. I have trawled Sourceforge and Github and Gitorious and other places. But the web is enormous these days and maybe there is just the oh-so-perfect solution that I have missed. If you know it then please email it to me, I’ll be ever so grateful (only Open Source solutions please!).

These three are all good solutions but, at the end of the day, I like OxGarage. I won’t go into too much detail about all of them, but a quick, top-of-mind list of whys and why-nots would include:

  • Calibre’s scripts are awesome and extendable if you know Python, however they don’t support MS MathML to ‘real’ MathML conversions. That’s a show stopper for me.
  • On the good side, though, Calibre’s developer community is awesome, and they are heroes in this field and deserve support, so if you are a Python coder or dev shop then, by all means, please pitch in and help them improve their .docx to HTML transforms. The world will be a better place for it.
  • soffice does an ok job but it’s a black box, who knows what magic is inside? It tends to make really complex HTML and it is also really heavy on your poor hardware. I have used it a lot but I’m not that big a fan.
  • OxGarage…well…I love OxGarage, so I really recommend this option…

OxGarage was developed by a European Commission-funded project and then, as is common for these kinds of projects, it dried up and was left on a shelf. Along came Sebastian Rahtz, a guru of file transformation, a big Open Source guy, and also a force behind the Text Encoding Initiative. Sebastian is also the head of Academic IT Services at Oxford University. The guy has credentials! Also, he’s a terribly nice and helpful guy. He has so much experience in this area that I feel the triviality of my questions about our .docx-to-HTML woes at PLOS… afraid he might absentmindedly swat me out of the way as if I were an inconsequential little midge… but he’s such a nice chap that instead he invites midges out to lunch.

So, Sebastian picked up the Java code and added some better conversions. OxGarage is essentially a Java framework that manages multiple different types of conversions. You feed it and are fed from it by a simple web API. It doesn’t have the best error handling, but it does do a good job. The .docx to HTML conversion is multi-step. First, the .docx is converted to TEI – a very rich, complex markup, and then from TEI via XSL to HTML. That means that all you really need to worry about is tweaking the XSL to improve the transformation and that’s not too tricky. It could be argued that the TEI conversion is a redundant step. I think it is. But OxGarage works out of the box and does a pretty good job so we have adopted it for the project I am working on for PLOS, and we are happy with it. We have added some special (Open) Sauce but I’ll get to that later. We are using it and will shoot for more elegant solutions later (and we have designed a framework to make this an easy future path).
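Since OxGarage is driven by a simple web API, hooking it into a platform amounts to constructing a conversion URL and posting the document to it. The route layout below is a hypothetical example for illustration only – check your OxGarage instance’s documentation for the real paths:

```python
def oxgarage_url(base, src_fmt, dst_fmt):
    """Build a conversion URL for an OxGarage-style web API.
    The /Conversions/<src>/<dst>/ path shape is an assumption,
    not a guaranteed route."""
    return "%s/Conversions/%s/%s/" % (base.rstrip("/"), src_fmt, dst_fmt)

# The upload itself would then be an ordinary HTTP multipart POST, e.g.:
# requests.post(oxgarage_url(BASE, "docx", "html"), files={"file": fh})
```

The appeal of this arrangement is that the platform only ever speaks HTTP; all the heavy, memory-hungry conversion work stays on the OxGarage side.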

If you are looking for Word-to-HTML conversion tools, I recommend OxGarage. I’m not saying it’s the optimal way to do things, but it will save you having to build another file conversion system from scratch, and from what I can tell from Sebastian, that would take considerable effort.

HTML to books

The other side of the tracks is the conversion of the HTML you have into a book file format. We live in a rather tangled semantic world when it comes to this part of the toolchain. Firstly, it’s hard to know what a book file format actually is these days… on a normal day, I would say a book file format is a file format that can display a human readable structured narrative. Yikes. That’s not particularly helpful… Let’s just say for now that a book file format is – EPUB, book formatted PDF, HTML, and Mobi.

So, transforming from HTML to HTML sounds pretty easy. It is! The question is really how do you want your book to appear on the web? Make that decision first, and then build it. Since you are starting with HTML this should be rather easy and could be done in any programming language.

The next easiest is EPUB. EPUB contains the content in HTML files stored in a zip file with the .epub suffix. That is also easy to create and, depending on your programming language, there are plenty of libraries to help you do this. So moving on…
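Because an EPUB is just zipped HTML plus a little packaging, creating one really is only a few lines of code. The sketch below writes the minimum scaffolding; the package document is left as a stub, so a production EPUB would need real metadata and a manifest on top of this:

```python
import zipfile

def make_minimal_epub(path, xhtml_body):
    """Write a bare-bones EPUB container around one XHTML chapter."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        # The spec requires 'mimetype' to be the first entry, stored
        # without compression.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml",
                   '<?xml version="1.0"?>\n'
                   '<container version="1.0" '
                   'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
                   '<rootfiles><rootfile full-path="content.opf" '
                   'media-type="application/oebps-package+xml"/>'
                   '</rootfiles></container>')
        z.writestr("content.opf", "<package/>")  # stub: real metadata goes here
        z.writestr("chapter1.xhtml", xhtml_body)
```

Most languages have a zip library that can do this, which is why EPUB export is the easy end of the pipeline.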

Mobi. Ok.. mobi is a proprietary format and rather horrible. It contains some HTML, some DB stuff…  I don’t know…  a bit of bad magic, frogs legs… that kind of thing. My recommendation is to first create your EPUB and then use Calibre’s awesome ebook converter script to create the mobi on the backend. Actually, if you use this strategy, you get all the other Calibre output formats for free, including (groan) .docx if you need it. Honestly, go give those Calibre guys all your love, some dev time, and a bit of cash. They are making our world a whole lot easier.
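In practice that strategy is one command per output format: Calibre’s ebook-convert tool infers the output format from the target file’s suffix. A small wrapper, with illustrative file names, might look like this:

```python
import os
import subprocess

def ebook_convert_cmd(epub_path, fmt):
    """Build an ebook-convert invocation; Calibre picks the output
    format ('mobi', 'azw3', 'docx', ...) from the target suffix."""
    base, _ = os.path.splitext(epub_path)
    return ["ebook-convert", epub_path, base + "." + fmt]

# Requires Calibre to be installed:
# subprocess.run(ebook_convert_cmd("book.epub", "mobi"), check=True)
```

Swap "mobi" for any other suffix Calibre supports and the same wrapper covers the whole family of outputs.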

Ok… the holy grail… people still like paper books, and paper books are printed from PDF. Paper these days is a post-digital artifact. So first you need that awkward sounding book-formatted PDF.

Here there is an array of options, and then there is this very exciting world that can open up to you if you are willing to live a little on the bleeding edge… I’m referring to CSS Regions… but let’s come back to that.

First, I want to say I am disappointed that some ‘Open Source’ projects use proprietary code for HTML-to-PDF conversion. That includes PressBooks and Wikipedia. Wikipedia is re-tooling its entire book-formatted-PDF conversion process to be based on LaTeX, and that is an awesome decision. However, right now it uses the proprietary PrinceML, as does PressBooks. I like both projects, but I get a little disheartened when projects with a shared need don’t put some effort into an Open Source solution for their toolchain.

All book production platforms that produce paper books need an HTML-to-PDF renderer to do the job. If it is closed source, then I think it needs to be stated that the project is partially Open Source. I’m a stickler for this kind of stuff, but also, I am saddened that adoption of proprietary components stops the effort to develop the Open Source solutions we need, while simultaneously enabling proprietary solutions to gain market dominance – which, if you follow the logic through, traps the effort to develop a competitive Open Source solution in a vicious circle. I wish that more people would try, like the Wikimedia Foundation is trying, to break that cycle.

The browser as renderer

There is one huge Open Source hero in this game. Jacob Truelson. He created WKHTMLTOPDF when he was a university tutor because he wanted his students to be able to write in HTML and give him nicely formatted PDF for evaluation. So he grabbed a headless Webkit, added some QT magic, some tweaks, and made a command line application that converts HTML to book-formatted PDF. We used it in the early days of FLOSS Manuals and it is still one of the renderer choices in the Booktype file conversion suite (Objavi). It was particularly helpful when we needed to produce books in Farsi which contain right to left text. No HTML to PDF renderer supported this at the time except WKHTMLTOPDF because it was based on a browser engine that had RTL support built in.
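A wkhtmltopdf run is a single command: HTML in, book-formatted PDF out, with page geometry set by flags. The wrapper below uses documented wkhtmltopdf options; the margin and page-size values are just illustrative defaults:

```python
def wkhtmltopdf_cmd(html_path, pdf_path, page_size="A5"):
    """Build a wkhtmltopdf invocation with a few book-ish options:
    page size, generous margins, and a page number in the footer
    ('[page]' is wkhtmltopdf's substitution variable)."""
    return ["wkhtmltopdf",
            "--page-size", page_size,
            "--margin-top", "20mm",
            "--margin-bottom", "20mm",
            "--footer-center", "[page]",
            html_path, pdf_path]

# subprocess.run(wkhtmltopdf_cmd("book.html", "book.pdf"), check=True)  # needs wkhtmltopdf installed
```

Because the engine underneath is a browser, the CSS that styled the HTML on screen is the same CSS that shapes the printed page.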

Some years later WKHTMLTOPDF was floundering, mainly because Jacob was too busy, and I tried to help create a consortium around the project to find developers and finance. However I didn’t have the skills, and there was little interest. Thankfully the problem solved itself over time, and WKHTMLTOPDF is now a thriving project and very much in demand.

WKHTMLTOPDF really does a lot of cool stuff, but more than this, I firmly believe the approach is the right approach. The application uses a browser to render the PDF… that is a HUGE innovation, and Jacob should be recognised for it. What this means is: if you are making your book in HTML in the browser, you have at your fingertips lots of really nice tools like CSS and JavaScript. So, for example, you can style your book with CSS, or add JavaScript to support the rendering of math, or use typography JavaScripts to do cool stuff… When you render your book to PDF with a browser, you get all that stuff for free. So your HTML authoring environment and your rendering environment are essentially the same thing… I can’t tell you how much that idea excites me. It is just crazy! This means that all those nice JavaScripts you used, and all that nice CSS which gave you really good-looking content in the browser, will give you the same results when rendered to PDF. This is the right way to do it, and there is even more goodness to pile on, as this also means that your rendering environment is standards-based and open source…

Awesome. This is the future. And the future is actually even brighter for this approach than I have stated. If you are looking to create dynamic content – let’s say cool little interactive widgets based on the incredible Tangle library – for ebooks (including web-based HTML)… if you use a browser to render the PDF, you can actually render the first display state of the dynamic content in your PDF. So, if you make an interactive widget, in the paper book you will see the ‘frozen’ version, and in the ebook/HTML version you get the dynamic version – without having to change anything. I tested this a long time ago and I am itching to get my teeth into designing content production tools to do this.

So many things to do. You can get an idea how it works by visiting that Tangle link above… try the interactive widgets in the browser, and then just try printing to PDF using the browser… you can see the same interactive widgets you played with also print nicely in a ‘static’ state. That gets the principle across nicely.

So a browser-based renderer is the right approach, and Prince – which is, it must be pointed out, partly owned by Håkon Wium Lie – is trying to be a browser by any other name. It started with HTML-and-CSS-to-PDF conversion and now… oo!… they added JavaScript… so… are they a browser? No? I think they are actually building a proprietary browser to be used solely as a rendering engine. It just sounds like a really bad idea to me. Why not drop that idea, contribute to an actual open source browser, and use that? And those projects that use Prince: why not contribute to an effort to create browser-based renderers for the book world? It’s actually easier than you think. If you don’t want to put your hands into the innards of WebKit, then do some JavaScript and work with CSS Regions (see below).

This brings us to another part of the browser-as-renderer story, but first I think two other projects need calling out for thanks. Reportlab was for a long time one of the only command-line book-formatted-PDF rendering solutions. It was proprietary but had a community license. That’s not all good news, but at least they had one foot in the Open Source camp. However, what really made Reportlab useful was Dirk Holtwick’s Pisa project, which provided a layer on top of Reportlab so you could convert HTML to book-formatted PDF.

The bleeding edge

So, to the bleeding edge. CSS Regions is the future for browser-based PDF rendering of all kinds. Interestingly, Håkon Wium Lie has said, in a very emphatic way, that CSS Regions is bad for the web… perhaps he means bad for the PrinceML business model? I’m not sure; I can only say he seemed to protest a little too much. As a result, Google pulled CSS Regions out of Chrome. Argh.

However, CSS Regions are supported in Safari, and in some older versions of Chrome and Chromium (which you can still find online if you snoop around). Additionally, Adobe has done some awesome work in this area (they were behind the original implementation of CSS Regions in WebKit – the browser engine that used to be behind Chrome and which is still used by Safari). Adobe built the CSS Regions polyfill – a JavaScript that plays the same role as built-in CSS Regions.

When CSS Regions came online in early 2012, Remko Siemerink and I experimented with CSS Regions at an event at the Sandberg (Amsterdam) for producing book-formatted PDF. I’m really happy to see that one of these experiments is still online (NB this needs to be viewed in a browser supporting CSS Regions).

It was obviously the solution for pagination on the web, and once you can paginate in the browser, you can convert those web pages to PDF pages for printing. This was the step needed for a really flexible browser-based book-formatted-PDF rendering solution. It must be pointed out, however, that it’s not just a good solution for books… at BookSprints.net we use CSS Regions to create a nicely formatted and paginated form in the browser to fill out client details. Then we print it to PDF and send it…

Adobe is on to this stuff. They seem to believe that the browser is the ‘design surface’ of the future, which seems to be why they are putting so much effort into CSS Regions. I’m not a terribly big fan of InDesign and proprietary Adobe strategies and products, but credit where credit is due. Without Adobe, CSS Regions would just be an idea, and they have done it all under open source licenses (according to Alan Stearns from Adobe, the Microsoft and IE teams also contributed to this quite substantially).

At the time CSS Regions were inaugurated, I was in charge of a small team building Booktype in Berlin, and we followed on from Remko’s work, grabbed CSS Regions, and experimented with a JavaScript book renderer. In late 2012, book.js was born (it was a small team but I was lucky enough to be able to dedicate one of my team, Johannes Wilm, to the task) and it’s a JavaScript that leverages CSS Regions to create paginated content in the browser, complete with a table of contents, headers, footers, left-right margin control, front matter, title pages…etc… we have also experimented with adding contenteditable to the mix so you can create paginated content, tweak it by editing it directly in the browser, and outputting to PDF. It works pretty well and I have used it to produce 40 or 50 books, maybe more. The Fiduswriter team has since forked the code to pagination.js which I haven’t looked at too closely yet as I’m quite happy with the job book.js does.

CSS Regions is the way to go. It means you can see the book in the browser and then print to PDF and get the exact same results. It needs some CSS wizardry to get it right, but when you get it right, it just works. Additionally, you can compile a browser in a headless state and run it on the command line if you want to render the book on the backend.

Wrapping it all up

There is one part of this story left to be told. If you are going to go down this path, I thoroughly recommend you create an architecture that will manage all these conversion processes and which is relatively agnostic to what is coming in and going out. For Booktype, Douglas Bagnall and Luka Frelih built the original Objavi, which is a Python based standalone system that accepts a specially formatted zip file (booki.zip) and outputs whatever format you need. It manages this by an API, and it serves Booktype pretty well. Sourcefabric still maintains it and it has evolved to Objavi 2.

However, I don’t think it’s the optimal approach. There are many things to improve with Objavi; possibly the most important is that EPUB should be the file format accepted, and then, after the conversion process takes place, EPUB should be returned to the book production platform with the assets wrapped up inside. If you can do this, you have a standards-based format for conversion transactions, and then any project that wants to can use it. More on this in another post. Enough to say that the team at PLOS is building exactly this, and adding some other very interesting things to make ‘configurable pipelines’ that might take format X through an initial conversion, through a clean-up process, and then a text-mining process, stash all the metadata in the EPUB, and return it to the platform. But that’s a story for another day…

Zero to Book in 3 Days

One of the burdens of book creation is the enormous time periods involved. Ask any publisher for a timeline for producing a book and you will be surprised if you get back an answer this side of 12 months. In this day, however, that timeline is looking increasingly glacial. How can we accelerate book production? How fast could it get? How does three days sound? Enter Book Sprints.

These three books were created in a three-day Book Sprint and output to paper, MOBI and EPUB on the third day.

Book Sprints bring together 4-12 people to work in an intensely collaborative way, going from zero to book in 3-5 days. There is no pre-production, and the group is guided by a facilitator from zero to published book in the time available. The books produced are made available immediately at the end of the sprint in print (using print-on-demand services) and ebook formats. Book Sprints produce great books, and they are a great learning environment and team-building process.

This kind of spectacular efficiency can only occur because of intense collaboration, facilitation and synchronous shared production environments. Forget mailing MS Word files around and recording changes. This is a different process entirely. Think contributors and facilitators, not authors and editors.

There are five main parts of a Book Sprint (thanks to Dr D. Berry and M. Dieter for articulating the following so succinctly):

  • Concept Mapping: development of themes, concepts, ideas, developing ownership, and so on.
  • Structuring: creating chapter headings, dividing the work, scoping the book (in Booktype, for example).
  • Writing: distributing sections/chapters, writing and discussion, but mostly writing (into Booktype, for example).
  • Composition: iterative process of re-structure, checking, discussing, copy editing, and proofing.
  • Publication

The emphasis is on ‘here and now’ production and the facilitator’s role is to manage interpersonal dynamics and production requirements through these phases (illustration and creation of other content types can take place along this timeline and following similar phases).

Since founding the Book Sprints method four years ago, I have refined the methodology greatly and facilitated more than 50 Book Sprints – each wildly different from the other. There have been sprints about software, activism, oil contract transparency, collaboration, workspaces, marketing, training materials, open spending data, notation systems, Internet security, making fonts, OER, art theory and many other topics.

People love participating in Book Sprints, partly because at the end of a fixed time they have been part of something special – making a book – but they are also amazed at the quality of the books made and proud of their achievement. The Book Sprints process releases them from the extended timelines (and burdens of guilt) required to produce single-authored works.

Here are some interesting write-ups that provide more detail on the process:

http://techblog.safaribooksonline.com/2012/12/13/0-to-book-in-3-days/

http://google-opensource.blogspot.com/2013/01/google-document-sprint-2012-3-more.html

http://www.booksprints.net/2012/09/everything-you-wanted-to-know/

Originally published on O’Reilly, 28 Jan 2013 http://toc.oreilly.com/2013/01/zero-to-book-in-three-days.html

Paying for Books that don’t Exist (yet)

Kickstarter.com has taken up the concept of crowdfunding with significant success. The premise is simple: an individual defines a project that needs funding, defines rewards for different levels of contribution, and sets a funding goal. If pledges meet the funding goal, the money is collected from pledgers and distributed to the project creator, who uses the funding to make the project. If the project does not reach the funding goal by the deadline, no money is transferred. Most projects aim for between $2,000 and $10,000.
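The all-or-nothing rule described above is simple enough to sketch in a few lines. This is only an illustration of the settlement logic, not Kickstarter's actual API; the function name and numbers are made up:

```python
# Minimal sketch of Kickstarter-style all-or-nothing funding:
# pledges are only collected if their total meets the goal,
# otherwise nothing is transferred.

def settle(goal: float, pledges: list[float]) -> float:
    """Return the amount transferred to the project creator."""
    total = sum(pledges)
    return total if total >= goal else 0.0

# A hypothetical book project seeking $5,000:
print(settle(5000, [25, 100, 4900]))  # goal met: 5025 collected
print(settle(5000, [25, 100, 500]))   # goal missed: 0.0 collected
```

The point of the mechanism is risk reduction for backers: no one pays for a project that did not attract enough support to plausibly happen.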

Kickstarter approaches have their issues, but they raise an interesting point – people are prepared to fund a book before it is produced. To put it another way, one which covers a wider spectrum of emerging book economics: people are willing to pay for books that don’t yet exist.

A while ago I worked with a Dutch organisation by the name of greenhost.nl. They are a small hosting provider based in Amsterdam with a staff list of about 8. The boss wanted to bring their team to Berlin to make a book about basic internet security so they hired me to facilitate a Book Sprint. We invited some locals to help and organised a venue for four days. In total, about 6 people were in attendance (including myself as facilitator) and we started one Thursday and finished the following Sunday – one day earlier than expected. The book is a great guide to the topic and quite comprehensive – 45,000 words or so written in 4 days with lots of nice illustrations.

The following morning the book went to the printers, and two days later it was presented in print form at the International Press Freedom Day in Amsterdam.

The presentation at International Press Freedom Day was complemented by a PR campaign driven by Greenhost. The attention paid off: the online version of the book received thousands of visits within a few hours (slowing our server down considerably at one point), and there was also a lot of very positive international and national (Dutch) press coverage. This was valuable to Greenhost, as this kind of promotional coverage is otherwise very hard to generate. It makes sponsoring Book Sprints a very good marketing opportunity for organisations.

Many of the organisations I work with approach Book Sprints with similar ideas in mind. They think about what kind of book their organisation would want to bring into the world, then design a PR strategy around the book. The book is often given away free in electronic form to their target market, maximising the reach and goodwill created.

Of course, this approach does not come without its issues. Organisations that pay to have something produced generally do not like it if the product disagrees with them. Worse is the mindset that this possibility can produce in the producers. Anticipating and avoiding disagreement is in effect a kind of self-policing that can stifle creativity, especially when you are working collaboratively. However, this can be mitigated by hiring a good facilitator.

Lastly, The Long Tail. The long tail was popularised in the age of the net by Chris Anderson. It’s the familiar strategy of selling a large number of books to small niche markets, the idea being that a lot of sales of niche items add up to a good profit – or, as he put it in his title, Why the Future of Business Is Selling Less of More.

However, there is another possible ‘long tail’ market here – instead of seeing a total inventory as having a ‘long tail’, each book in itself can be customised for resale over a number of smaller markets – one book distributed over several markets, each with its very own version of the book. We have experimented with this a little in FLOSS Manuals – customising the same book for specific markets. Remixing books can be considered exactly this strategy, but on a very small scale. Many workshop leaders use the remix feature of FLOSS Manuals to generate workbooks with content taken from several existing books. We have also encouraged consultants to take books from FLOSS Manuals, clone them, and customise the book to speak directly to their potential and existing customers. It is a powerful pre- and post-sales device. The long tail here has a market of 1 – the client. This is the very end of the long tail, but the return can be lucrative for the consultant who secures a sale or return sale because of their value-added services powered by customised documentation.

I believe there is a business here – either creating or customising content as a service or providing the tools for people to customise their own content. In either case, we are seeing a broad willingness for organisations and individuals to pay to get the content they want before it is available ‘off-the-shelf’.

Visualising your book

Booki provides an RSS feed for every book. This means you can follow a book and see the edits made. Each RSS feed is linked from the book’s info page. For example, the book about OpenMRS has an info page with the RSS feed linked from the bottom.
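Consuming such a feed programmatically is straightforward with any RSS parser. A minimal sketch follows; note that the sample feed below is invented for illustration (the exact fields Booki emits may differ), and a real script would fetch the feed URL rather than embed a string:

```python
# Parse an RSS 2.0 feed of book edits and list the changes.
# SAMPLE_FEED is a made-up stand-in for a real Booki feed.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Edits: OpenMRS Guide</title>
    <item>
      <title>Chapter 'Installation' edited</title>
      <pubDate>Mon, 28 Jan 2013 10:05:00 GMT</pubDate>
    </item>
    <item>
      <title>Chapter 'Introduction' edited</title>
      <pubDate>Mon, 28 Jan 2013 09:40:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def recent_edits(feed_xml: str):
    """Return (title, date) pairs for each edit item in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("pubDate"))
            for item in root.iter("item")]

for title, date in recent_edits(SAMPLE_FEED):
    print(f"{date}  {title}")
```

A visualisation tool only has to poll this feed on a timer and plot the edit events along a timeline.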

A few weeks ago, we asked for some help creating a visualisation using this source. Pierre Commenge responded and started developing a Processing visualisation of the RSS feed. Processing is free software widely used for creating visualisations.

Pierre has a prototype available that runs as a Java applet, and it looks pretty cool. The live version enables you to play a timeline and see the development of the book over the period of one day.

This not only looks cool, it also enables you to see how a book is being made. This is extremely interesting – imagine if we had all the data about how every book has been made up until now… it would tell us a lot about the book production process and the differences between different models. It’s a very exciting idea, and we hope to explore it further in the following weeks and months of experiments. Many thanks to Pierre for getting this underway.

Process vs Content Templates in Book Sprints

I have been working with a group of very interesting people over the last 3 days, producing a book that can be used for generating campaigns about Internet Literacy. We generated texts on a large and varied range of topics. More on all this later. One very interesting issue that has been more clearly illustrated for me in this process is the necessity to understand the role of templates when generating content. When I talk of templates here, I mean pre-configured templates that are meant to illustrate what the final product of a chapter or ‘content unit’ should look like.

I have always avoided using templates because I think it shuts down a lot of creative discourse about what the content could be and it kills those amazing surprises that can leap out of working in a freer manner. Perhaps even more importantly, templates can confuse people – sprint participants need to first just create what they know or are energised by – forcing output immediately into templates is not helpful to this process. However, I can see there is a role for templates, not as structure for the final content but as tools that can help the process of generating content.

In this particular Sprint, we generated a very lightweight template before the Sprint. This is something I really dislike doing for the reasons stated above, but the fear was (and I think it is justified in this instance, though I would want to be careful before advocating its usefulness in other contexts) that we would float too far into conceptual territory without any boundaries. We wanted very much to glue the creative discourse and thinking at the Sprint to defined actionable units (campaigns). So for this purpose, after discussion with one of the initiators of the Sprint, we generated a very lightweight template that prompted just 7 points – really just the ‘who, what, why’ material that campaigns need to address. This was then used as a process template – a template acting as a foundation for the Sprinters to define the context of their content – not a template that would become the structure for the final content.

It worked very well – enabling the participants to let their creative energies flow while providing a backdrop or context within which the content needed to rest. The ‘process template’ also allowed those who think conceptually to ‘build up’, so to speak, while those who think in more concrete terms could also define their content. It provided a common scaffold for Sprinters to build in the direction that most interests and energises them.

So while it does not change my mind regarding content templates, I think I have discovered a place for very lightweight process templates that can give some kind of framework for the participants to work with, refine, define, and fill.

Improving Dostoyevsky

Largely because of the cheapness of paper and the cultural context arising from this cost, combined with the standard print production process, we have come to worship the book as a static cultural artifact. It almost seems to us that ‘static-ness’ is part of a book’s genetics – so much so that many people find it hard even to pick up a pen and write notes in the margin of a book. We have forgotten that notes like this (‘marginalia’) were once very common – when paper was hard to come by, the margins were sometimes where books were written. There is even a discipline dedicated to reconstructing manuscripts (‘textual criticism’) which is in part focused on how to construct ‘the text’ from works where the author has commented on and changed their own work via the marginalia. It is hard to call these alterations ‘comments’, since they are direct interventions by ‘the author’. In the days when margins were used for notes by both readers and writers, it was sometimes difficult for the copyists (the profession, common before the printing press, that copied books) to know which additions were the author’s and which marginalia were ‘by others’. Hence textual criticism is often focused on the arguments surrounding which marginalia should be considered part of the ‘final’ work.

It would make some kind of sense that margin notes might come back into fashion since paper is so cheap that we can easily purchase clean copies of books to replace those ‘contaminated’ by marginalia. However, the choice has been to keep notes in note books, and leave the printed volume unaltered.

There are a few digital projects (notably Commentpress – http://www.futureofthebook.org/commentpress/ – and some ebook readers) that enable types of margin notes. In the case of Commentpress, these notes are the point of the book – a place to start discourse (almost literally) around the book.

The point is that now, through projects like Commentpress, we are in a position to start deconstructing the ‘unalterability’ of books. Ironically, we can welcome marginalia again, not because the price of paper is so high that we need to use the margins, or so low that it doesn’t matter if we do – but because we don’t need paper at all. There is an interesting historical irony at play, since we do not need ‘margins’ if we do not need paper. However, we can now feel marginalia is appropriate because it does not alter the source of a book.

It seems we are finding ways to have marginalia that do not contribute to the book itself but to the margins around it. Textual criticism in a few hundred years might be an easy job, since the textual critic can just parse the margin notes out of the source. The Long Now Foundation might have something to say about this, since they argue we are living in what will be known as ‘the Digital Dark Age’: digital data has a very short lifespan, and hence the data for digital-only texts might not exist at all, or might only be accessible through forensic means. Still, the point is, we are still not questioning the unalterability of books – we do not seem able to move towards changing the book itself, only working around its outlines. This remains unchallenged, even though we can ‘fork’ books (copy the entire text and work on it, leaving the original unaltered) and do with them as we like (especially now that free licenses are becoming more popular). We somehow still cannot bring ourselves to consider changing an existing book. Even harder is allowing ourselves to believe that we can improve a book.

Why not? Translation is a way of improving a text. If this were not done, many texts would hardly be understandable within their own language today. Ever try to read some old English? Know what this is?

Oure fadir þat art in heuenes halwid be þi name;
 þi reume or kyngdom come to be. Be þi wille don in herþe as it is doun in heuene.
 yeue to us today oure eche dayes bred.
 And foryeue to us oure dettis þat is oure synnys as we foryeuen to oure dettouris þat is to men þat han synned in us.
 And lede us not into temptacion but delyuere us from euyl.

It is this:

‘Our father which art in heaven, hallowed be thy name.
 Thy kingdom come. Thy will be done in earth as it is in heaven.
 Give us this day our daily bread.
 And forgive us our debts as we forgive our debtors.
 And lead us not into temptation, but deliver us from evil.’

The first text is in Middle English (which existed in the period between Old and Modern English). In effect, the work has been ‘improved’ so we can understand it (not a ‘literary’ improvement as such). Translation like this is a type of re-use: you take the text and transform it into another context. In this example, the new context is another time. Translation being what it is, we accept it can always be improved, even though sometimes there are ‘authoritative’ star translators – people who have translated a text with such nuance that it is considered hard to improve on their translation. The German translations of Dostoyevsky by Svetlana Geier (the subject of the film ‘The Woman with the 5 Elephants’) are almost considered ‘final’ works in themselves. Somehow Svetlana Geier has come to be regarded as a kind of manifestation of Dostoyevsky. Even so, her works are translations, and hence it is somehow easier for us to believe we can improve them, because they are not the original.

So why not? Why not improve the original? Can’t we take a book, any book, and improve it? Why is that idea so difficult for us to engage with? Why is it easier for us to consider improving a translated work but not the original? Why can we improve the work of Svetlana Geier but not Dostoyevsky?

Before going on, a few seconds to note a great irony here: we have the legal right to improve Dostoyevsky, since his works are in the public domain – the copyright has expired, so we are legally permitted to do what we like with them. However, we do not have the legal right to improve Svetlana Geier’s translations, since translated works are considered by copyright law to be original works. Svetlana’s works are still bound by copyright that will not expire for some time. And that, to me, illustrates that ‘free licenses’ have very little to do with free culture… but that’s another story…

One part of the puzzle is that publishing and authorship of static books build a robust, unalterable context for the authoritative version, i.e. the version born from the author. We (you or I) are not that author, and so we cannot know the author’s intent with all its nuances. We should not, therefore, meddle with a work, because we would be breaking our unspoken contract to preserve the author’s intent. Even though we have the tools and, in many cases, the licensed freedom to change a work, it would not be considered an appropriate thing to do. We do not have the authority to do it. The authority is inherent in the author alone – so much so that the role of the author to the book is analogous to the role of ‘god’ to its creation. The author is the creator.

In William Golding’s Lord of the Flies, the children use Piggy’s glasses as a magnifying glass to start a fire. However, Piggy was short-sighted, and starting fires with his glasses would therefore be impossible: glasses that correct short-sightedness are concave, and concave lenses disperse light. You cannot start a fire with a concave lens. And yet, would we allow anyone to alter the book to improve upon what is a rather trivial fact? No. No, because the book is Golding’s world, and in Golding’s world, concave lenses start fires. Golding is the creator. He has the authority to change his creation, and we do not.

So many layers to unravel. Let’s roll back a little to Book Sprints again – they are interesting here because the books are born from collaboration. There is no single author whose intent we need to imagine and hold dear. The authority is distributed from the outset. However, in my experience, it is still difficult to get people to cross that imaginary threshold and improve a work, even though the invitation is explicit. Many people still ask if they can improve a Book Sprinted work even though the mandate to change a work is obviously being passed by ‘the creators’ to anyone.

In fact, there is no guarantee that collaborative works pass on the mandate to change. Wikipedia is an interesting case in point. Wikis and Wikipedia have managed to introduce ideas of participative knowledge creation but, as Lawrence Liang (http://vimeo.com/10750350) has argued, Wikipedia is possibly trying to establish itself as an authoritative knowledge base, which also has the effect of revoking the mandate to change – as experienced by many new contributors who find their edits reverted.

I think we will leave this all behind in time but it’s going to be a long time.

All books can be improved – even the most sacrosanct literary works. This is a good example of the way change is often not a result of the possibilities of technology, but instead a result of the possibilities that have been closed to us through our internalisation of old technology. We have inherited a notion of Immovable Type. The only thing that can change that is the shock of possibility, necessity, or time.

Why repositories are important

Booki is for free books only (at least if you use the installation at www.booki.cc). The idea we are trying to engender is that when you create a book in Booki, you are also contributing to a body of re-usable material that can help others make books. Building re-usable repositories in this way is a well-known concept and it’s extremely powerful. However, it takes time to build a corpus that can actually work in this fashion; you really need a lot of material before re-use like this can start having a real effect.

I saw the first substantial use of Booki materials like this just last week, with the FLOSS Manuals implementation of Booki (http://www.flossmanuals.net), which is a repository of materials about how to use free software. We had a Book Sprint on Basic Internet Security, and we were able to import about 9 chapters from 3 other manuals – approximately 15,000 words that we did not have to create fresh. Of course, the material needed some work to fit the new context, but it was still a substantial time-saver and extended the scope of the book well beyond what we could have produced had we not had the material.

This was really quite amazing for me to see. The idea was imagined from the moment FLOSS Manuals was built but, 3 years later, this was the first real case of substantial re-use. It takes time to build up the materials for re-use like this to make sense, and after 3 or so years of waiting for the moment, I took a great deal of pleasure in seeing it happen for the first time.

‘Here-and-now’ Production

Book Sprints should not involve a lot of pre- or post-production. In an earlier post, I listed some reasons why too much pre-production is potentially harmful. Post-production is not really harmful – in fact, it’s usually a good thing – but it’s never guaranteed, and that’s the problem. If you want to finish a book in 2-5 days, then you must bring the focus to the people ‘in the Sprint’ – the book will be whatever they make it. That includes the text, images, formatting, credits, chapter titles, section titles, cover, and so on. In a Sprint, you should never leave a task ‘to be done in post-production’. It removes the emphasis that everything must be done now, by ‘us’ – and post-production, despite goodwill, seldom happens. As soon as everyone walks out the door to go home, you have lost 99% of the energy and commitment of the people involved. That’s just how it is.

So do not rely on pre- or post-production. Put the emphasis on ‘here-and-now’ production. If you cannot do it here and now with the people in this Sprint, then it’s not part of the book… You will be amazed at how good a book can be, and how many good decisions necessarily get made, because of these circumstances.

Why ISBN does not work

ISBN stands for “International Standard Book Number”. It is a 13-digit number that identifies your book; no two ISBNs are the same, and they usually appear on your book both in numeric form and as a bar code. Generally, you buy ISBNs, and each country manages this slightly differently. Some countries require you to be a publisher before you can order an ISBN. In the USA, I believe, you buy them in blocks of 10, whereas in New Zealand you just apply for them – they give them away.
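Part of what makes two ISBNs never the same is mechanical: the 13th digit is a check digit computed from the first twelve, which is what lets bar-code scanners catch misread numbers. A sketch of the standard ISBN-13 check-digit algorithm (the sample number is the widely used example ISBN 978-0-306-40615-7):

```python
# ISBN-13 check digit: weight the first 12 digits alternately
# by 1 and 3, sum them, and pick the digit that brings the
# total up to a multiple of 10.

def isbn13_check_digit(first12: str) -> int:
    """Check digit for the first 12 digits of an ISBN-13 (hyphens ignored)."""
    digits = [int(c) for c in first12 if c.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected 12 digits")
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10

def is_valid_isbn13(isbn: str) -> bool:
    """True if the 13-digit ISBN's final digit matches the computed check digit."""
    digits = [c for c in isbn if c.isdigit()]
    return (len(digits) == 13
            and isbn13_check_digit("".join(digits[:12])) == int(digits[12]))

print(isbn13_check_digit("978-0-306-40615"))  # 7
print(is_valid_isbn13("978-0-306-40615-7"))   # True
```

The check digit guards against transcription errors; it is the registration and administration around the number, not the arithmetic, that is slow.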

If you wish to distribute a book through established book channels, then you mostly need an ISBN. Book shops such as Barnes & Noble, or your local book shop, require an ISBN so they can track, sell and order stock. Most online retailers of any size also require one – Amazon, for example, requires an ISBN if you wish to sell through their channels. However, some online channels do not require an ISBN – lulu.com, for example.

The big problem with ISBN is that you need a new ISBN for every new edition. So if you release a book, then edit it and re-release it, you need two ISBNs. These can take a long time to order and process, and they can be expensive (depending on how you get your ISBN).

This is not the real issue. Admin takes a long time, we are all used to that. But sometimes an administrative system gets built to work for a certain model and when that model changes, then things stop making sense.

ISBN works well in a publishing world where books take years to produce and the products are identifiable as distinct bodies of work. However, in the world of Booki, this is not the typical process. For example, when working with a Book Sprint team, we typically write and release a book in 5 days. You can register the ISBN before the event, no problem. However, quite often after the event we may ‘release’ a new version of the book – 5, 10, 15 times in one day. Some of these releases may be substantial revisions. This quite clearly does not sit neatly with the slow ISBN process. Even with a more conservative development cycle for a ‘Booki book’, the implication is clear – ISBN expects content to be static; it does not expect books to ‘live’.

It’s a real problem for free content, and for content that exists in an environment where ongoing contributions to the source are encouraged. If you manage a book like this in Booki and you wish to distribute it through traditional distribution channels, then there is a point where you must ‘freeze’ the content and release a ‘snapshot’. This is not altogether satisfying, since you must then either make the book ‘die’ for a time so the printed work and the source remain equal, or acknowledge that the paper version is merely a soon-to-be-outdated archive.

Letting content die, or temporarily freezing contributions, can kill a book, which is not a very desirable result considering it often takes a lot of work soliciting ongoing contributions in the first place. The alternative, accepting that the printed book is an archive, is probably not going to make many distributors very happy since you are asking them to sell an out-of-date product (although this is conjecture since I have never tried this).

My answer to this dilemma is to actually walk away from traditional distribution channels. Free content should travel freely across media and in front of the eyes (and ears in the case of audio books) of whoever wants it and in whatever form they want it. Let the content go, don’t constrain it to these traditional channels.

However, these channels are typically pursued for ‘legacy’ reasons. Some you can’t escape – if you are an academic, you live off ISBN, and the education system will be slow to change that. However, if it’s a business model you are after, then don’t make the mistake of thinking that selling books is the only way to go… new models are emerging – get people to pay you to write the content, for example. One successful example of this is the Rural Design Collective, who raised $2,000 (US) via crowdfunding on Kickstarter.

So there are alternatives. ISBN is blocking the way, but it’s probably about time to start believing there are better ways….