A Trip Down Memory Bonkersville

Um yeah… for some reason I’ve been led down memory lane and have been rediscovering just how crazy some of my past projects were… for example, the Bookimobile…

So, Booki was the predecessor to Booktype, software I founded and brought to Sourcefabric in Berlin to develop further and market. It was online book production software built on the learnings of a similar system I cut and pasted together for FLOSS Manuals. The Bookimobile… well, it was a complete book production suite bundled into a VW T4 van. I set it up when I lived in Berlin.

The Bookimobile at Drumbeat, Barcelona (photo: Adam Hyde, CC-0)

There are some crazy stories related to this. First, the Bookimobile was based on Brewster Kahle and Archive.org’s Bookmobile… except it was called the Bookimobile after the Booki software. Second, I had to borrow 2 grand (euro) from my buddy Micz ’cause I didn’t have the readies… the crazy part was that it took me about 2 years to pay Micz back. He never asked for the money and I was as poor as a church mouse. When I was offered a job at PLOS, I finally had some $, and the next time I went back to Berlin, Micz and I went out and got hideously drunk, on margaritas of course (my fav poison). I had to fly out early, so I think we literally drank until I had to leave for the airport. What happened in the early hours before that, which neither of us clearly remembered until we saw some evidence (blurry photos at a cash machine in Neukölln), is that I withdrew 2000 euro from a cash machine and stuffed it all into Micz’s pockets. He woke up fully clothed the next morning with his lovely partner Laura standing over him. Still drunk and hung over, he started pulling hundreds and hundreds of euro out of his pockets, wondering what on earth had happened the night before (as did Laura). We later found some photos of Micz holding up all this money near the cash machine, money flowing out of his hands and a look of glee on his face.

Last mad story: my partner at the time (Laleh) and I drove around France and Spain visiting festivals with the Bookimobile and printing out books for people. And what do you know… today I found this video of an interview with me, showing off the Bookimobile at the first MozFest in Barcelona…

The Road to PubSweet 1.0

We are pretty close to PubSweet 1.0, with the RFC for PubSweet 2.0 now out and a PubSweet dev site release due next week.


It has been an amazing effort, particularly by Jure Triglav, the lead dev for PubSweet at Coko, but also fantastic work from Richard Smith-Unna, Alf Eaton, Yannis Barlas, and Christos Kokosias. More recently there have also been some great contributions from Alex Georgantas.


So, we are pretty much there, and this week I’m presenting in San Francisco as part of a small Coko event to reflect on the future of the framework and discuss the RFC. With that in mind, I thought I’d write a post to help me think through the reasoning that got us here.

So…the thinking behind PubSweet started when I came back from Antarctica around 2007 or so (I was there setting up an autonomous base for artist-scientist collaborations).


I decided I wanted to give up the art world and try something new. The something new turned out to be FLOSS Manuals, a community writing free manuals about free software. I started it when I was living in Amsterdam, somewhere around 2007. To execute on this mission I needed to sort out a few things: how to build community, processes for rapid book production, and the tooling.

The tooling started with me scratching around with TWiki, a wiki written in Perl that happened to have the best plugins for rendering PDF. I wrote some Perl, cut and pasted a whole lot more, and added some crazy .htaccess URL rewriting to produce a basic system for making books. It was pretty scratchy but it actually worked. Later a buddy helped extend the system, and later still I was able to pay him and others to extend it.

At the time it pretty much comprised a page (per book) for creating a table of contents.


and an interface to edit the content (chapters). I ripped out the native wiki markup editor and replaced it with a WYSIWYG editor (I think it was TinyMCE)…


As you can see, right-to-left content (in this case, Farsi) was also supported. There were also some basic things in place for keeping track of the status of a chapter, the version number, side-by-side diffs and, later, dynamic table of contents organisation and edit locks.

Add some basic PDF rendering and a way to push content from the ‘draft’ to the publishing front end, and we were away.


It actually had some other pretty cool stuff, such as side by side translation interfaces…



…a built-in live chat for talking with collaborators…


and even a way to send books between different instances (e.g. for sending a book from the FLOSS Manuals French community site to FLOSS Manuals Finnish for translation)…

We could even render book-formatted PDF and push it to the lulu.com print-on-demand service. I just checked, and some of the books are still there!

Not bad for a Perl-based system built on top of a wiki that wasn’t supposed to do this kind of thing, and built with very few resources. The TWiki extensions were contributed back upstream to the TWiki repo and it was all open source, but it was pretty hard to rebuild and no one I knew actually had a similar use case.

After this, I embarked on a journey to replace the system with a custom built solution specifically for book production. I can’t remember exactly when this started, maybe 2008 or 2009 or something. It was originally called Booki…


…which later became Booktype. Booki (and later Booktype) replaced the FLOSS Manuals tooling, although you can still see the old tool working here. That ole Perl code is still functional after 10 years with no maintenance; I can hardly believe it. The docs on how to use it also still exist.

Booki was built with Django (Python) and had pretty much all the same stuff, although the look and feel changed quite a bit in the transition. There aren’t many images of Booki around, but I did find these screenshots taken by someone using it on the OLPC XO! (FLOSS Manuals did all the docs for OLPC/Sugar OS etc.)



It was hard to get financial support for it. The Internet Archive gave us $25,000, which seemed like a fortune at the time. The evolution of Booki into Booktype came about when I took the project to Sourcefabric, a buddy’s Berlin-based org (I was living in Berlin at the time), and parked it there so I could get more resources to build it out.

Booki/Booktype pretty much had, and has, the same stuff as the FLOSS Manuals system, just purpose-built. So it had a table of contents manager…


And a book (chapter) editor…


…chat…


And the other stuff. Perhaps the only new features (compared to the FLOSS Manuals system) were a dashboard…


…groups…


and an interesting way to have Twitter-like messaging to pass snippets from chapters to other users.


Before I left Sourcefabric I wanted to get some other innovations built but didn’t get there. I did build some prototypes though. There was a task editor…


…and live in-browser book design…


Booktype is still going strong. It is now its own company (based in Berlin), and they also run the Omnibook commercial service using the software.

I left because John Chodacki and Kristen Ratan from PLoS invited me to come work for PLoS to design a new web-based journal submission system. I agreed…

But before I leave the book story behind for a bit… somewhere between leaving Sourcefabric and starting at PLoS, I had set up Book Sprints as a company and put a small amount of my own money into building two new book production systems. These two systems were PHP-based and Juan Gutierrez built them over some months.


I wanted to do this because I was a little frustrated by Booktype not moving forward, and the platform was also becoming more difficult to use. We were using it for Book Sprints, but after I left the product took a new UI direction and I found Book Sprints participants were not enjoying using the system. So I built a Book Sprints-specific system called… PubSweet… the namesake of the current Coko system, which eventually turned into something of a prototype for the new PubSweet. This new system was a lot simpler and easier to use than Booktype. It was initially meant to be modular, but I think we lost that somewhere along the way. Cleanly modular systems take a lot of extra effort and time to produce, so we gave in for speed of development’s sake.

The old PubSweet had a dashboard….


…table of contents manager…


and editor. Just like before!


We also introduced some new innovations including visualisations of the book production process…


Plus annotation (using Nick Stenning’s Annotator software)…


and other stuff… I think threaded discussions, outline views, a review page, an in-browser book renderer, book stats and I can’t remember what else.

Anyway… I also built a platform on top of this old PubSweet for the United Nations Development Programme (UNDP). It was called Lexicon. Lexicon was pretty interesting as it opened my mind for the first time to the idea that an editor is not an editor is not an editor: different content types (in a book) may require different editors or production environments.

Lexicon was built to collaboratively produce a trilingual (Arabic, French, English) lexicon of electoral terms for distribution in Arabic-speaking regions.


Lexicon had all the same stuff as the old PubSweet but with one major innovation: you could create chapters that were WYSIWYG-based, or you could create a chapter that let you add and sort individual terms and provide translations.


It was a pretty interesting idea and we were able to make a really cool book which the UNDP printed and distributed across many Arabic-speaking countries. I still have the book on my bookshelf.

The other interesting thing was that the total cost of building this on top of the old PubSweet was US$10,000. This was mostly because we could leverage all the existing stuff and just build the difference… interesting idea!

Ok, so then around 2013 or so I dropped book production systems for a while and went to work for PLoS on a system that was called Tahi and later became Aperta. The name Tahi came from the street I was living on in New Zealand while I was designing the system, before I had a US work visa: Reotahi Road (cool road). Reotahi means ‘one voice’ and ‘tahi’ means ‘one’ in Māori. It was built with Rails and Ember. The front end and back end were decoupled, which was really pushing the technology at the time. I designed the system and moved to San Francisco to manage the team building it.

Tahi (Aperta) had a dashboard (surprise!) and an editor, just like the book production systems, but I introduced two major innovations: cards, and card-based workflow management interfaces. Unfortunately, while I was asked to come and build an open source system, things went a little weird at PLoS and they closed the repos, effectively making it a closed platform. So I quit. That also means I don’t have any screenshots to show you. Pity. If you sign an NDA with PLoS, I believe they might show it to you.

However, you can picture it a little. Imagine something like Trello or Wekan, which are card-based kanban systems, but imagine you could custom-make cards to do anything. Cards were effectively first-class citizens of the platform: they could access the db, perform system operations, make external calls, do validations, whatever you wanted. In hindsight, I think they were as close to the idea of an ‘app’ as you could get in a browser platform, although that wasn’t the way I thought about them at the time. Additionally, cards were imported into the system, since each card was actually packaged as a Ruby gem. This meant any publisher could custom-make their own cards to do whatever they wanted and place them within the kanban-like workflow space (task manager). Pretty neat.

So, cards could be surfaced and used anywhere in the system. We used them for authors to enter submission data, but also for production staff to perform operations, for reviewers etc etc etc. They could also be placed on a kanban board to make a workflow. Cards could be moved around the workflow and deleted or new ones added at any time.
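To make the card idea a little more concrete, here is a minimal Python sketch of cards as pluggable, first-class objects on a kanban-like board. Tahi was a Rails/Ember app whose cards were Ruby gems, and its code is closed, so this is not how Aperta worked internally: the registry, `register_card`, `WorkflowBoard` and the two example card types are all hypothetical, chosen only to show a card type being registered as a plugin, validating its own data, and being added, moved or deleted on a workflow board.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Type

# Hypothetical plugin registry: maps a card-type name to its class.
# In Tahi each card was packaged as a Ruby gem; a decorator stands in for that here.
CARD_TYPES: Dict[str, Type["Card"]] = {}


def register_card(name: str) -> Callable[[Type["Card"]], Type["Card"]]:
    """Register a card class under `name` so it can be placed on a board."""
    def decorator(cls: Type["Card"]) -> Type["Card"]:
        CARD_TYPES[name] = cls
        return cls
    return decorator


@dataclass(eq=False)  # eq=False: cards compare by identity, not by field values
class Card:
    title: str
    data: dict = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the card is complete."""
        return []


@register_card("authors")
class AuthorsCard(Card):
    def validate(self) -> List[str]:
        return [] if self.data.get("authors") else ["at least one author is required"]


@register_card("reviewer_report")
class ReviewerReportCard(Card):
    def validate(self) -> List[str]:
        return [] if self.data.get("recommendation") else ["recommendation missing"]


class WorkflowBoard:
    """A kanban-like workflow: named columns, each holding an ordered list of cards."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: Dict[str, List[Card]] = {name: [] for name in columns}

    def add(self, column: str, card_type: str, title: str, **data) -> Card:
        """Instantiate a registered card type and append it to a column."""
        card = CARD_TYPES[card_type](title=title, data=data)
        self.columns[column].append(card)
        return card

    def delete(self, card: Card) -> None:
        """Remove a card from whichever column holds it."""
        for cards in self.columns.values():
            if card in cards:
                cards.remove(card)
                return

    def move(self, card: Card, to_column: str) -> None:
        """Move a card to another column (appended at the end)."""
        self.delete(card)
        self.columns[to_column].append(card)

    def incomplete(self) -> List[Card]:
        """Cards that still report validation problems, across all columns."""
        return [c for cards in self.columns.values() for c in cards if c.validate()]


if __name__ == "__main__":
    board = WorkflowBoard(["Submission", "Review", "Production"])
    authors = board.add("Submission", "authors", "Authors", authors=["A. Author"])
    report = board.add("Review", "reviewer_report", "Reviewer 1")
    print(report.validate())  # ['recommendation missing']
    board.move(authors, "Production")
    print([c.title for c in board.columns["Production"]])  # ['Authors']
```

The registry is what lets someone ship a new card type without touching the board code, and `incomplete()` hints at the kind of cross-article view the flow manager was meant to give.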

To manage all this, my other idea was to let these cards flow through a TweetDeck-like interface, so you could sort cards per role, per user, at volume.

Tahi essentially had four spaces: a dashboard, a submission page (which displayed the manuscript in an editor, with submission data entered through cards), a task manager (the workflow for the article, using cards as tasks), and a ‘flow manager’ (the TweetDeck-like interface for sorting all your cards across all your articles). The FLOSS Manuals, Booki and Booktype platforms were pretty much monolithic systems, and the old PubSweet was only sort of modular. Tahi did decouple the front end and back end, but I also wanted to break these four spaces into discrete components. That would have given the system enormous flexibility, but unfortunately I wasn’t able to do this before I left.

Anyways, Tahi/Aperta is a little old now but it was pretty cool. I don’t know what happened to Aperta but I believe it is now being used for PLoS Biology.

After I left PLoS I was offered a Fellowship by the Shuttleworth Foundation to continue on the mission to reform publishing. So I started Coko with Kristen Ratan (who was the publisher at PLoS)….


So there are some themes from building the past 7 or 8 publishing systems (depending on how you count it… there were also some other interesting experiments in between). First, the next system you build is always better. That is for sure. It’s an important thing to realise, because when I developed the FLOSS Manuals system I thought that was it. Nothing could be better! But I was wrong. Then came Booki/Booktype, and I felt the same thing. I was so proud of it and nothing could be better! haha… you get the picture. The reason it’s important to understand this is that I think it gives someone like me a bit of freedom. I can take some risks with systems, knowing you get some stuff right and some stuff wrong, but the next system will get the bit you got wrong, right. Taking this attitude also takes the pressure off and you can have more fun, which is good for your health, the team you are working with, and the system.

As far as technical lessons learned… well… after looking back at all these systems when we started Coko, I realised that the idea of independent ‘spaces’ for publishing workflows had a heap of currency. How many systems did I have to build with baked-in dashboards, task managers, editors, table of contents managers, etc etc etc before I realised it doesn’t make sense to do this over and over? I wanted to take the idea of these kinds of spaces forward and not have to build them again and again… so some kind of system where you could include whatever spaces/components you wanted would be ideal… This would have two very important side benefits:

  1. I could learn so much because if the next system you build is always better, what about a framework that would allow you to easily build a whole lot of systems at once! Or build a lot quickly over a short amount of time… just imagine how much you could learn…
  2. It would open the door for others to innovate. I have since given up the idea that my system (so to speak) was the best ever and no one could top it. That’s just the testosterone talking. I’m kinda over it (sorta). I want other people to be able to make better stuff than what I have produced so far, to bring in innovations I never thought of. I want to make that easy for them, and now that I understand a whole lot better how publishing workflows actually work, I’m in a very good position to do that.

That was a lot of the thinking behind the new PubSweet – PubSweet 1.0. But there is some other stuff too. Through my time at PLoS, I came to understand just how many variables affect workflow choices in journal publishing and that each publisher has slightly different conditions and roles that affect this. That means that the access control is complex. We might think there are various roles – author, editor, reviewer etc that shepherd an article through a process but it’s not that simple. Any number of conditions can affect who gets to see or do what and when. So we need to have a very sophisticated way to set and manage this.
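To illustrate what “conditions” means here, this is a minimal Python sketch of attribute-based access control (ABAC), where a decision depends on attributes of the user, the requested action and the manuscript rather than on a role alone. PubSweet is a JavaScript framework with its own authorization layer; the `User`/`Manuscript` attributes, statuses and rules below are hypothetical examples, not PubSweet’s API or any publisher’s real policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class User:
    id: str
    roles: frozenset  # roles are just one attribute among many


@dataclass(frozen=True)
class Manuscript:
    id: str
    status: str            # hypothetical states: "submitted", "under_review", ...
    author_ids: frozenset
    reviewer_ids: frozenset
    blind_review: bool


# A rule inspects attributes of the user, the requested action and the
# resource, and returns True to grant access.
Rule = Callable[[User, str, Manuscript], bool]

RULES: List[Rule] = [
    # Authors can always read their own manuscript...
    lambda u, a, m: a == "read" and u.id in m.author_ids,
    # ...but can only edit it before review has started.
    lambda u, a, m: a == "edit" and u.id in m.author_ids and m.status == "submitted",
    # Assigned reviewers can read it only while it is under review.
    lambda u, a, m: a == "read" and u.id in m.reviewer_ids and m.status == "under_review",
    # Reviewers see author names only when the review is not blind.
    lambda u, a, m: a == "see_authors" and u.id in m.reviewer_ids and not m.blind_review,
    # Anyone with the editor role can do anything.
    lambda u, a, m: "editor" in u.roles,
]


def can(user: User, action: str, manuscript: Manuscript) -> bool:
    """Default deny: allow only if at least one rule grants the request."""
    return any(rule(user, action, manuscript) for rule in RULES)


if __name__ == "__main__":
    ms = Manuscript("m1", "under_review", frozenset({"alice"}),
                    frozenset({"bob"}), blind_review=True)
    alice = User("alice", frozenset({"author"}))
    bob = User("bob", frozenset({"reviewer"}))
    print(can(alice, "edit", ms))        # False: review has started
    print(can(bob, "read", ms))          # True: assigned and under review
    print(can(bob, "see_authors", ms))   # False: blind review
```

A role such as `editor` is treated as just another attribute, next to manuscript status and reviewer assignment, so adding a new publisher-specific condition means adding a rule rather than inventing a new role.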

There was a lot of other stuff to take into account too, but I mention these two specifically because when I was recently talking to Jure (lead PubSweet dev) about PubSweet 1.0 and reflecting on how far we have come, he nailed it. He identified the two major innovations of the system as:

  1. reusable/sharable components (spaces)
  2. attribute-based access control

I agree entirely. I think I might add another:

  • developer experience

It is pretty easy, and getting easier, for developers to build publishing platforms/workflows (call them what you will) with PubSweet. I think that is pretty astonishing, and I think these 3 characteristics put together enable us to build multiple publishing systems fast and in parallel (with small teams), as well as opening the door for others to do the same, with huge opportunities for innovation.

If we are successful at building community this will be a huge contribution to the publishing sector.

In a future post, I’ll break PubSweet spaces / components down in more detail. There were also a lot of other similar stories regarding technical innovations on the way (eg Objavi->iHat->INK), but I’ll break them down into posts on another day.

I meant to also talk about Editoria here, the monograph production system built on top of PubSweet, and xpub – the PubSweet-based Journal system.


They are both pretty amazing and leverage so much more than the previous systems identified above.

Login page for our first Journal platform.

I think the main thing with them is that we are working extremely closely with publishers using the method I developed – the Cabbage Tree Method.

Editoria Design Session

This means that I am no longer involved in building what I would call naive publishing systems. Naive in the sense that publishers could use, for example, Booktype, but it’s not really built for publishers. It’s a general book production system built by someone who didn’t know much about publishing at the time. That’s great of course, there is a place for it. However, Editoria is not a naive system. It is designed by publishers for publishers and the difference is enormous.

But I will leave a longer rant about this for another post.

I do, however, want to say that I didn’t, of course, build any of the above systems by myself. There were many people involved and I have credited them elsewhere in this blog. I’m not going to do another roll call here except for Jure Triglav.

Jure and I sat down just over 18 months ago to discuss some of the lessons I learned as explained above. We jammed it out over post-its, whiteboards, coffee, and food in Slovenia and you can read a little more about that process in the PubSweet 2.0 RFC. But Jure trusted me, and I trusted him, and he took these ideas and, with a small team and at very good speed, made them a reality. As a result, I think PubSweet is an exciting system and will only get better. Congratulations Jure, you deserve special thanks and recognition for the absolutely amazing job you have done.


The Old Days

First Book Sprint using Booki, Berlin, 2010

Wow… I was browsing some old archives to update this new version of my site. I found the most incredible stuff in the Internet Archive’s Wayback Machine, including the outline of a description of Booki (2010), many years before it became Booktype. Amazing! I didn’t think I had the product manager in me, but it seems once upon a time I was really focused on this kind of acute product management detail. I had forgotten!

Forgive the long post, it’s just pure indulgent nostalgia for me. In any case, here is one of the emails I found really fascinating, from back in 2010, talking about features for Booki and Objavi (book renderer). This has been taken from the zip of a public list we used for dev at the time:
https://web.archive.org/web/20111029143503/http://lists.flossmanuals.net/pipermail/booki-dev-flossmanuals.net/

I’m so astonished how much of my thinking recorded in this email carries through to the way we are approaching product development for Coko now. The statement:

You might have noticed that I prefer to take the easy road for features, leaving as much open as possible, and then refine according to use. That is because from experience I have learned that when designing software it is better to be led by the user rather than force them into an imagined workflow.

Might as well be out of the Collaborative Product Design manifesto.

It’s kind of incredible. The email documents so much of how we were thinking at the time, including using HTML and CSS to create paginated books using browser engines:

* Objavi utilises Webkit for PDF generation. Later Gecko will be added.

…and later in the product description…

3.2.2 CSS Book Design
Status: High Priority, Implemented
Function: The default PDF rendering engine for Booki is now Webkit and will eventually be Mozilla Firefox hence there is full CSS support for creating book formatted PDF in Booki. This changes the language of design from Indesign to CSS - which means any web native can control the design of the book.

Pretty interesting, if only to me! Anyway, the email is below; it documents some features we built on commission for Sourcefabric before they eventually took over the project. Thank you for indulging me 🙂

From adam at flossmanuals.net Wed Jul 28 09:11:21 2010
From: adam at flossmanuals.net (adam hyde)
Date: Wed, 28 Jul 2010 18:11:21 +0200
Subject: [Booki-dev] notes to meeting
Message-ID: <1280333481.1582.143.camel@esetera>

hi Frank,

It was good to meet you and I'm glad Source Fabric is considering working with us and you to develop features they and we need (Aco is also keen for this). 

I have sent this email to the dev list and to you and Micz. It might be good for you both to consider joining the list.
http://lists.flossmanuals.net/listinfo.cgi/booki-dev-flossmanuals.net

Below the content of this email is a very basic requirements doc. It does not outline the notes tab, so I thought I would make some notes here for your (and Micz's) consideration should Source Fabric decide they wish to commission all or part of this development. 

In essence, I think that the notes tab could nest the following:
1. To do list
2. Book notes
3. Style guide

These could be hidden via a dropdown or accordion style interface. Our plan is to keep everything as simple as possible so I would imagine a page with three headings and clicking on each reveals the information behind it.

Some ideas:
1. To do list
The basic form could be a Jquery to do as we looked at today:
http://demo.tutorialzine.com/2010/03/ajax-todo-list-jquery-php-mysql-css/demo.php

If this is the format, it would be good enough as it is. The good news is that this is done using Jquery so I imagine this is a very easy implementation. What you would need to work out, however, is how Aco implements the dynamic updates so that when a to do is altered everyone has that info updated.

If there was room to take this development a step further, I would recommend considering adding the following fields:
* assigned to
* due date
* priority

I am not married to those ideas though as I think we need to ensure that the interface does not have too many things going on. So I would actually recommend we start with the basic implementation and move on. When users have tried it then we can consider extending it with these items.

2. Book Notes
Something like etherpad would be good but too complex (see http://piratepad.net/).
I would suggest considering either a) the same interface as we have now in the notes pad except with a very very simple WYSIWYG or b) a threaded comment system. I think the best would again be to do the easiest and simplest - what we have now with a WYSIWYG interface (and no need to press 'save'). Then when users use it we extend according to demand for most-needed features. 

3. Style Guide
This is pretty much the same as (2) except it would be used for storing the Style Guide. A style guide is optional but many people request it in FLOSS Manuals and some go out of their way to create one so I think this would be a very good feature to anticipate based on our user experience so far.


I think all of the 3 above are simple and I think Source Fabric's working process (especially for the forthcoming Sprints) would benefit a lot from them.

You might have noticed that I prefer to take the easy road for features, leaving as much open as possible, and then refine according to use. That is because from experience I have learned that when designing software it is better to be led by the user rather than force them into an imagined workflow.

It has worked well for us so far - everything you now see in Booki is pretty much that way because we have tried similar ideas in FLOSS Manuals and seen their effect. I would prefer to continue to work this way with Booki. 

So...there was one more feature we discussed - Chapter Level notes. I think this would be extremely useful for Source Fabric (but Micz needs to comment on this) but we need to be careful that we get it right because it is not so obvious how this might work. 

I think the notes have to be associated with the chapter page when you edit it - however there is very little space there. One possibility is to build this into the WYSIWYG editor - Xinha - as a 'notes server' or some such. ie. it opens from the WYSIWYG editor but stores the content (chapter notes) in the booki db. The risk here is that people will not know that the notes are there...so we need to consider this. Another possibility is to build this into a 'sliding tab' as Micz suggested. I think that might be ok but it would have to be done carefully as it might look too much like a gimmick.

The other issue with chapter level notes is that I strongly believe that an overview of all chapter notes for a book should be able to be seen somewhere, in one place. Otherwise it would mean checking each chapter which would be a tedious job (books easily have 30+ chapters). So if you consider Chapter notes then you must also consider how to do this. 

So on this I am not so clear what would work well for Chapter level notes and because of this I think it's not such a good feature for our first adventure working together. I would recommend instead the first three to be done all together - however this is up to Micz.

My feeling is that the first 3 are an extremely quick development, first however you need to know how it all fits together so I would suggest emailing this list when you have questions and I am sure Aco will answer your questions...

Also, Aco is currently working on the Booki site update so I expect the GIT repo is not updated but will be within the next days once the booki www is updated....

also you should meet Doug - doug is on this list and he is the Objavi (PDF generator) developer....doug - frank, frank - doug

also, meet John who does the Booki manual and other essential tasks intro intro :)


:)

adam






1 INTRODUCTION

1.1 Description
Booki is designed to help you produce books, either by yourself or collaboratively. A book in this context is a "comprehensive text" which can be output to book-formatted PDF (for book production), epub, odt, screen readable PDF, templated HTML and other formats.

Booki supports the rapid development of content. Booki has tools to support the development of content in 'Book Sprints'. Book Sprints are intensive collaborative events where collaborators in real and remote space focus on writing a book together in 3-5 days. 

While you can use Booki to support very traditional book production processes, the feature set matches the rapid pace of publishing possible in the era of print on demand and electronic readers. Booki can output content immediately to multiple electronic formats. Print ready source (book formatted PDF) can be immediately generated, and then uploaded to your favorite Print on Demand (PoD) service, taken to a local printer, or delivered to a publisher.

1.2 Purpose
Booki embraces social and collaborative networked environments as the new production spaces for comprehensive (book) content. 
 
1.3 Scope
Booki is available online as a networked service (http://www.booki.cc) for free. This service is a production tool for the creation of free content and not a publishing/hosting service. Content produced within Booki.cc is intended to be published elsewhere, either under another domain, in paper form (ie. books), distributed in electronic formats, or re-used in other content. 

Booki can be installed by anyone wishing to utilise this software under their own domain or within private or local networks. 
 
 
2 OVERALL DESCRIPTION

2.1 Product Perspective
Booki takes what was learned from building the FLOSS Manuals tool set and posits these lessons within a more suitable architecture. 

Booki is the name of the collaborative production environment, however there are 2 associated softwares that provide all the services required :
Booki - production environment
Objavi - import and export engine
This document refers to Booki 1.5 and Objavi 2.2

2.2 Booki Functions
* User account creation requiring minimal information
* One click book creation
* Drag and drop Table of Contents creation
* One click editing of chapters
* Chapter level locks
* Live chat on a book and group level
* Live book status reports (editing, saving, chapter creation) delivered to the chat window
* Drop down chapter status markers
* One click to join a group
* One click to add a book to a group
* One click exporting to epub, screen pdf, book formatted pdf, odt, html with default templates
* Easily accessible advanced styling options for export (CSS controlled)
* User profile control (status, image, bio)
* One click group creation
* Easy importing of book content from Archive.org, Mediawiki, other Booki installations
* Option to upload content to Archive.org
 

2.3 User Characteristics
2.3.1 Contributor
The majority of users will be contributors to an existing project. They may contribute to one or more project and may produce text and/or images, provide feedback or encouragement, proof, spell check, or edit content. These are the primary users and the tool set should first meet their needs.

2.3.2 Maintainer
These are advanced users that create their own books or have been elevated to maintainer status for a book by group admins. Maintainers have associated administrative tools for the books they maintain which are not available to other users.

2.3.3 Group admin
These are advanced users that wish to establish and administrate their own group. They have maintenance tools for every book in their group plus additional group admin tools.

2.4 Operating Environment
Booki is designed primarily for compatibility with standards-based Open Source browsers but is tested against other browsers.
 
2.5 General Constraints
* Booki and Objavi are Python-based.
* Booki is built with the (bare) Django framework.
* Booki uses jQuery for dynamic user interface elements.
* Booki uses Postgres as the database but sqlite3 can also be used
* Redis is used by Booki for persistent data storage to mediate dynamic data delivery to the user interface
* Objavi utilises WebKit for PDF generation. Later Gecko will be added.
* Rendering of .odt by Objavi requires OpenOffice to be installed with unoconv. 
* The Booki Web/IRC gateway may eventually (and optionally) require a dedicated standalone IRC service hosted on domain. 
* Content editing in Booki is done by default with the Xinha WYSIWYG editor
* XHTML is the file format for content. 
* Content will ultimately be stored in Git.
* Localisation in Booki is managed with Portable Object files (.po).
* The code repository for both projects is Git, with a dedicated Trac for bug reporting and milestone tracking:
http://booki-dev.flossmanuals.net
* A Dev mailing list is maintained here:
http://lists.flossmanuals.net/listinfo.cgi/booki-dev-flossmanuals.net 
* Developers can be reached in IRC (freenode, #flossmanuals)
* Each release will be as source. Beta and later releases will also be available as Debian .deb packages. 
* User and API Documentation will be maintained in the FLOSS Manuals Booki Group.
* For development we use Apache2 for http delivery
* The license is GPL2+ for all software

2.6 User Documentation
Maintained here: http://www.booki.cc/booki-user-guide/


3 SYSTEM FEATURES

3.1 Booki Features

3.1.1 Booki-zip (Internal File Format)
Status: High Priority, Implemented
Function: A Booki-specific file structure for describing books 
Interface: Used for internal data exchange between Booki and Objavi. 
Notes: booki-zip definition maintained here:
http://booki-dev.flossmanuals.net/git?p=objavi2.git;a=blob_plain;f=htdocs/booki-zip-standard.txt
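The booki-zip specification lives at the URL above and is not reproduced here. The sketch below is only a rough illustration. It assumes a zip archive with an `info.json` manifest at its root that lists the chapter files in reading order (a "spine"). Those field names are assumptions and should be checked against booki-zip-standard.txt before relying on them.

```python
import io
import json
import zipfile
from typing import Dict, List


def write_booki_zip(title: str, chapters: Dict[str, str]) -> bytes:
    """Pack XHTML chapters plus an assumed info.json manifest into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        spine = []
        for name, xhtml in chapters.items():  # dicts keep insertion order
            path = f"{name}.html"
            zf.writestr(path, xhtml)
            spine.append(path)
        info = {"metadata": {"title": title}, "spine": spine}  # assumed field names
        zf.writestr("info.json", json.dumps(info))
    return buf.getvalue()


def read_spine(data: bytes) -> List[str]:
    """Return chapter contents in the order given by the manifest's spine."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = json.loads(zf.read("info.json"))
        return [zf.read(path).decode("utf-8") for path in info["spine"]]
```

Keeping the manifest as JSON inside a plain zip means either side of the Booki/Objavi exchange can read it with standard libraries alone.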

3.1.2 Account Creation
Status: High Priority, Partially Implemented
Function: Quick access to a registration form from the front page for account creation 
Interface: Requires only username, password, email and real name (required for attribution). An email is sent to the user with an autogenerated link for verification.
Notes: email confirmation mechanism missing
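The missing confirmation step could work roughly as follows. In this hypothetical sketch, which uses only the Python standard library, the server signs the username with a secret key and mails a link containing the signature. It accepts the link only if the signature matches. The secret, the URL layout and the function names are all assumptions. A production version built on Django would more likely reuse Django's own signing or token utilities.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical server-side secret; never publish it


def make_token(username: str) -> str:
    """Sign the username so the link cannot be forged without SECRET_KEY."""
    return hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256).hexdigest()


def verification_link(domain: str, username: str) -> str:
    """Build the link emailed to a new user (URL layout is illustrative)."""
    return f"http://{domain}/account/verify/{username}/{make_token(username)}/"


def verify(username: str, token: str) -> bool:
    """Constant-time comparison avoids leaking how much of the token matched."""
    return hmac.compare_digest(make_token(username), token)
```

This sketch never expires a link. A real implementation would probably also sign a timestamp and reject old links.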

3.1.3 Sign in
Status: High Priority, Implemented
Function: Quick access to a sign-in form from the front page 
Interface: Username and password form and submit button. Username and password are remembered.

3.1.4 Profile Control
Status: Medium Priority, Implemented
Function: When logged in the user can access a profile settings page to set personal details (email, name, bio, image). Personal details can be browsed by other users
Interface: "My Settings" link in user-specific menu on left gives access to a form for changing the details.

3.1.5 Book Creation
Status: High Priority, Implemented
Function: Users can create a book from their homepage ("My Profile").
Interface: User can click on "My Profile" link from the user-specific menu on the left. On the Profile page a text field for the name of the book, and a license drop down menu (license *must* be set) is presented.
Clicking on "Create" adds the empty book with edit button to the listing of the user's books on the same page.

3.1.6 Archive.org Book Import
Status: Medium Priority, Implemented
Function: Users can import books from Archive.org
Interface: "My Books" link in the user-specific menu on the left presents the user with a field for inputting the ID of any book from Archive.org. The book is then imported when the user clicks "Import".
Notes: Interface is through Booki but Objavi does the importing and returns booki-zip to Booki. Relies on Archive.org successfully delivering an epub for each book, which does not always happen. Needs error catching and user-friendly progress/error messages.

3.1.7 Wikibooks Book Import
Status: Medium Priority, Implemented
Function: Users can import books from Wikibooks (http://en.wikibooks.org)
Interface: "My Books" link in the user-specific menu on the left presents the user with a field for inputting the URL of any book from Wikibooks. The book is then imported when the user clicks "Import".
Notes: Interface is through Booki but Objavi does the importing and returns booki-zip to Booki. Needs thorough testing as it sometimes fails, possibly due to time-outs. Needs error catching and user-friendly progress/error messages. Should be extended to be a "MediaWiki import" tool, not just for Wikibooks.

3.1.8 Epub Book Import
Status: Medium Priority, Implemented
Function: Users can import any epub available online
Interface: "My Books" link in the user-specific menu on the left presents the user with a field for inputting the URL of any epub. The book is then imported when the user clicks "Import".
Notes: Interface is through Booki but Objavi does the importing and returns booki-zip to Booki. Needs thorough testing as it sometimes fails, possibly due to time-outs. Needs error catching and user-friendly progress/error messages.
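All three import paths note intermittent failures (possibly time-outs) and a need for friendly progress and error messages. A common approach is to retry the request a few times with a growing delay, then turn the final failure into a readable message. The sketch below assumes the actual download is done by a caller-supplied `fetch` function, standing in for the request to Objavi. The names, attempt count and delays are illustrative.

```python
import time
from typing import Callable, Optional


class ImportFailed(Exception):
    """Raised with a message suitable for showing to the user."""


def import_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    attempts: int = 3,
    delay: float = 2.0,
    progress: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), retrying on time-outs and connection errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        progress(f"Importing {url} (attempt {attempt} of {attempts})...")
        try:
            return fetch(url)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay * 2 ** (attempt - 1))  # double the wait each time
    raise ImportFailed(
        f"Sorry, the book at {url} could not be imported after {attempts} attempts "
        f"({type(last_error).__name__}). Please try again later."
    ) from last_error
```

Passing `progress` and `sleep` in as parameters lets the same function drive a web progress indicator and be tested without real waiting.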

3.1.9 Group Creation
Status: High Priority, Implemented
Function: Users can create groups. 
Interface: "My Groups" link in the user-specific menu on the left presents user with 2 text fields - group name, and description. When a name for a group is entered and "Create" is clicked then the group is created.
Notes: Group admin features missing.

3.1.10 Joining Groups
Status: High Priority, Implemented
Function: Users can join groups with one click.
Interface: The "Groups" link in the general menu on the left presents a list of all groups; clicking a group's link takes the user to that group's homepage. At the bottom of the page the user can click "Join this group" to subscribe.

3.1.11 Adding Books to Groups
Status: High Priority, Implemented
Function: Users can add their own books to groups they belong to.
Interface: While on a Group page that the user is subscribed to the user can add their own books to the group. 
Notes: When Group Admin features are in place we will change this so that Group Admins set who can and cannot add books to groups. At present a book can only belong to one group.

3.1.12 Readable Book Display
Status: High Priority, Implemented
Function: Users can read stable content in Booki without the need to log-in.
Interface: Upon clicking on the "Books" link in the general menu on the left a page listing all books is presented. Clicking on any of these presents a basic readable version of the stable content. Alternatively users can link to a book on the url http://[booki install domain]/[book name]

3.1.13 Edit Page
Status: High Priority, Implemented
Function: Page for editing content.
Interface: The edit page is accessed by clicking on "edit" next to the name of a book in the "My Books" or "Books" (all books) listings. The user is then presented with a page with tabs for editing, notes, exporting and history.

3.1.14 Edit Tab
Status: High Priority, Implemented
Function: Edit interface for chapters.
Interface: Clicking "edit" on a chapter title will open the Xinha WYSIWYG editor with the content in place.

3.1.15 Notes Tab
Status: High Priority, Implemented
Function: A place for contributors to keep notes on the development of the book
Interface: User clicks on the Notes tab for a book and is presented with a shared notepad for recording issues or discussing the development.
Notes: Implemented but future direction TBD

3.1.16 History Tab
Status: High Priority, Implemented
Function: Shows edit history of the book
Interface: User clicks on the history tab and can see the edit history with edit notes. 
Notes: Implemented but unreadable. Users should also be able to access diffs here.
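For the missing diff view, Python's standard `difflib` module can already produce a line-by-line comparison of two saved revisions of a chapter. The sketch below is illustrative. It assumes nothing about how Booki stores revisions beyond the old and new text being available as strings. The function name and header labels are hypothetical.

```python
import difflib


def chapter_diff(old: str, new: str, chapter: str, old_rev: int, new_rev: int) -> str:
    """Return a unified diff between two revisions of one chapter."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"{chapter} (revision {old_rev})",
            tofile=f"{chapter} (revision {new_rev})",
        )
    )


print(chapter_diff("<p>Helo world</p>\n", "<p>Hello world</p>\n", "introduction", 3, 4))
```

A unified diff of raw XHTML is readable for small changes. For prose, a word-level or rendered side-by-side comparison (for example `difflib.HtmlDiff`) may suit non-technical users better.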

3.1.17 Export Tab
Status: High Priority, Implemented
Function: Export content to various formats
Interface: User clicks on the Export tab and is presented with a form for choosing export options. Clicking "Export" returns the desired output for download. 

3.1.18 Version Tab
Status: High priority, Not Implemented
Function: Maintainers can easily freeze content at stable stages while work continues on the unstable version.
Interface: From the Edit Page a maintainer sees an extra tab "Version".
From here a maintainer can click "create stable version": the last stable version is archived and the current version becomes the new stable version.
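The proposed Version tab amounts to a small state change. The current stable snapshot is archived, and a copy of the working version becomes the new stable one, while editing continues on the working copy. Here is a minimal sketch, with hypothetical class and field names and chapters modelled as a simple title-to-text mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Book:
    """Hypothetical model: one working copy, one stable snapshot, an archive."""
    working: Dict[str, str] = field(default_factory=dict)  # chapter -> XHTML
    stable: Optional[Dict[str, str]] = None
    archive: List[Dict[str, str]] = field(default_factory=list)

    def create_stable_version(self) -> int:
        """Archive the old stable snapshot and freeze the working copy.

        Returns the number of the new stable version (1, 2, ...).
        """
        if self.stable is not None:
            self.archive.append(self.stable)
        self.stable = dict(self.working)  # copy, so later edits don't leak in
        return len(self.archive) + 1
```

Copying the working chapters is what lets contributors keep editing without disturbing the frozen, readable version.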

3.1.19 Subscribe to edit notifications
Status: High Priority, Not Implemented
Function: Users can subscribe to edit notifications
Interface: User clicks "Subscribe to edit notifications" from the Edit Page for a book. If there are edits made a synopsis is emailed with basic edit information (time, chapter, person who made the change, version numbers) and a link to the diff.
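The edit synopsis described above could be assembled with Python's standard `email` package. This is a sketch under assumptions: the `Edit` record, the addresses and the diff URL pattern are hypothetical. Actually sending the message (for example with `smtplib` or Django's mail helpers) is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from email.message import EmailMessage
from typing import List


@dataclass
class Edit:
    """Hypothetical record of one saved change."""
    when: datetime
    chapter: str
    user: str
    old_version: int
    new_version: int


def synopsis(book: str, edits: List[Edit], to_addr: str, diff_base: str) -> EmailMessage:
    """Build (but do not send) an edit-notification email for one book."""
    lines = [f"Recent edits to {book}:", ""]
    for e in edits:
        # Basic edit information: time, chapter, person, version numbers.
        lines.append(
            f"{e.when:%Y-%m-%d %H:%M}  {e.chapter}  by {e.user}  "
            f"(version {e.old_version} -> {e.new_version})"
        )
        # Link to the diff; the URL layout is an assumption.
        lines.append(f"  diff: {diff_base}/{e.chapter}/{e.old_version}/{e.new_version}/")
    msg = EmailMessage()
    msg["Subject"] = f"[Booki] {len(edits)} edit(s) to {book}"
    msg["From"] = "noreply@booki.example"  # hypothetical sender address
    msg["To"] = to_addr
    msg.set_content("\n".join(lines))
    return msg
```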

3.1.20 Chat
Status: High priority, Implemented
Function: A real time chat (web / IRC gateway).
Interface: Persistent on the edit page for any book. 

3.1.21 Localisation
Status: High priority, Not Implemented
Function: Booki needs to be available in any language where it is needed. Hence we may integrate the Pootle code base into Booki to enable localisation of the environment.
Interface: TBD

3.1.22 Translation
Status: High priority, Not Implemented
Function: Content can be forked and marked for translation. A translation version of a book will provide links back to the original material, be placed in a translation workflow, and be edited in a side-by-side view where the translator can also see the original source.
Interface: TBD 

3.1.23 Copyright Tracking (Attribution)
Status: High Priority, Implemented 
Function: Any user contributions are recorded and attributed.
Interface: All attributions are automated in Booki. Book level attribution is output in any chapter that contains the string "##AUTHORS##"
Notes: There should be a syntax for producing attribution notes on a per-chapter basis, e.g. "##CHAPTER-AUTHORS##"
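Substituting the attribution marker is essentially a string replacement at export time. The sketch below assumes contributors are already known per chapter. Booki records them automatically, but the data shape here is hypothetical. The sketch also implements the proposed, not yet existing, "##CHAPTER-AUTHORS##" marker.

```python
from typing import Dict, List


def fill_attributions(chapters: Dict[str, str], authors: Dict[str, List[str]]) -> Dict[str, str]:
    """Replace ##AUTHORS## with all book contributors and the proposed
    ##CHAPTER-AUTHORS## with the contributors to that chapter only."""
    # Book-level list: every contributor once, in first-seen order.
    everyone: List[str] = []
    for names in authors.values():
        for name in names:
            if name not in everyone:
                everyone.append(name)
    filled = {}
    for title, text in chapters.items():
        text = text.replace("##CHAPTER-AUTHORS##", ", ".join(authors.get(title, [])))
        filled[title] = text.replace("##AUTHORS##", ", ".join(everyone))
    return filled
```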
 

3.2 Objavi Features

3.2.1 Book-Formatted PDF Output
Status: High Priority, Implemented
Function: The server-side creation of book-formatted PDF is a pivotal feature. This is managed by Objavi, which runs as a separate service. The book-formatted PDF supports Unicode, bi-directional text, and reverse binding for printing right-to-left texts on a left-to-right press and vice versa. The formatting engine outputs customisable sizes, including split-column PDF suitable for printing on large-scale newsprint.
Interface: This feature is managed by Objavi, an API is functional and feature rich but not well documented at present. Objavi also presents a web interface for those wanting more nuanced control (see http://objavi.flossmanuals.net/).

3.2.2 CSS Book Design
Status: High Priority, Implemented
Function: The default PDF rendering engine for Booki is now WebKit and will eventually be Mozilla's Gecko (the Firefox engine), hence there is full CSS support for creating book-formatted PDF in Booki. This changes the language of design from InDesign to CSS, which means any web native can control the design of the book.

3.2.3 Export Formats
Status: High Priority, Implemented 
Function: Users can also export to self-contained templated HTML (tar.gz), .odt (OpenDocument Text, the OpenOffice word processing format), epub, and screen-readable PDF. Other XML output options can be developed as required.


I guess I can never again claim not to have project management experience. Darn it.

Visualising your book

Booki provides an RSS feed for every book. This means you can follow a book and see the edits made. Each RSS feed is linked from the book's info page. For example, the book about OpenMRS has an info page here, and the RSS feed is linked from the bottom.

A few weeks ago, we asked for some help creating a visualisation using this source. Pierre Commenge responded and started developing a Processing visualisation of the RSS feed. Processing is a free software used a lot for creating visualisations.

Pierre has a prototype available that runs in a Java applet. It looks pretty cool. The live version enables you to play a timeline and see the development of the book over the period of one day.

This not only looks cool but it enables you to see how a book is being made. This is extremely interesting – imagine if we had all the data about how every book has been made up until now… it would tell us a lot about the book production process, the differences between different models, and so on. It’s a very exciting idea and we hope to be able to explore it in our experiments over the coming weeks and months. Many thanks to Pierre for getting this underway.
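Pierre's visualisation was built in Processing. To illustrate the same idea in a few lines, the Python sketch below reads an RSS 2.0 feed and counts edits per hour, which is the raw material for a timeline. It assumes each edit appears as an `<item>` with a standard `<pubDate>`. The actual fields in Booki's feed may differ, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict


def edits_per_hour(rss_xml: str) -> Dict[datetime, int]:
    """Count feed items per hour, keyed by the start of each hour (sorted)."""
    root = ET.fromstring(rss_xml)
    counts: Counter = Counter()
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # skip items without a date
        when = parsedate_to_datetime(pub)  # RFC 822 dates, as used by RSS
        counts[when.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))
```

Playing the resulting buckets in order gives the kind of one-day timeline described above.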

How Book Sprints work for sponsors

Manual examples

Last week I worked with a Dutch organisation by the name of Greenhost.nl. They are a small hosting provider based in Amsterdam. They wanted to bring their crew to Berlin to make a book on Basic Internet Security and they wanted me to facilitate the Book Sprint. We got a small team together and sprinted the book over four days. Started Thursday, finished Sunday. Actually one day earlier than expected. 45,000 words or so and lots of nice illustrations.

Illustrations in Basic Internet Security

You can see the book here (all generated with the Booki installation at http://booki.flossmanuals.net):

http://www.flossmanuals.net/basic-internet-security/

http://www.flossmanuals.net/_booki/basic-internet-security/basic-internet-security.epub

http://www.flossmanuals.net/_booki/basic-internet-security/basic-internet-security.pdf

And improve it here:
http://booki.flossmanuals.net/basic-internet-security/edit/

The following morning, the book went to the printers and then was presented the next day in print form at the International Press Freedom Day in Amsterdam.

Reading the bound book at International Press Freedom Day

The presentation at International Press Freedom Day was complemented by a little bit of PR from FLOSS Manuals and a little bit of PR from Greenhost. The attention seems to be working very well: we are getting thousands of visits to the manual and a lot of very nice press attention. Now, I don’t care one way or the other about press attention, except that in this instance it is working for the book (I believe people need to know about Basic Internet Security) and for the sponsor that put its muscle behind getting the book created. That makes sponsoring a Book Sprint a very good marketing opportunity for organisations. There are of course some issues here, the first being that this will only work for sponsors if they keep their marketing-speak out of the book itself. If they put marketing texts into the book they sponsor, they are going to look very, very bad – and let’s not forget it’s free content: if someone thinks your marketing rant is too much, they will probably remove it. Let the book do what it has to do and get the kudos by saying you made it happen. Anyway… here are some links from the last few hours of comments about the book:

http://www.bright.nl/omzeil-big-brother-met-een-boek#comment-292324

http://www.volkskrant.nl/vk/nl/2694/Internet-Media/article/detail/1884010/2011/05/03/Het-internet-wereldverbeteraar-of-bedreiging-van-de-vrijheid.dhtml

http://www.netzpolitik.org/2011/buch-grundlagen-der-sicherheit-im-internet/

https://flattr.com/thing/183622/Buch-Grundlagen-der-Sicherheit-im-Internet

http://thepiratebay.org/torrent/6369126

http://www.boingboing.net/2011/05/02/will-technology-make.html

http://www.tech-blog.net/sicherheit-im-internet-alles-was-mit-wissen-sollte/

http://metaowl.de/2011/05/05/buch-grundlagen-der-sicherheit-im-internet/

Lastly, this kind of press is also good because it raises the profile of the book and makes it known to people who can help improve it and distribute it. Take, for example, translation. The profile of a freely licensed book can make it seem a worthwhile prospect to translators. Not many people want to spend the hours needed to translate a book that won’t be read, but if it’s a book with an established high profile then it’s a better proposition. To demonstrate this by example, we already have two offers from groups to start the German and Farsi translations:
http://translate-new.flossmanuals.net/basic-internet-security_fa/_v/1.0/edit/
http://translate-new.flossmanuals.net/basic-internet-security_de/_v/1.0/edit/

In addition, in the links above, you may have noticed the link to a torrent file on Pirate Bay. We didn’t create this torrent – someone noticed the book, downloaded it, and made the torrent. So others are helping a lot to get the book out there. Nice.

So… think about what kind of book your organisation may want to bring into the world. Think of a great book that would help make the world a better place. For example, are you a design or typography company? Want to make a book about How to Make Fonts with Free Software? Are you a law firm? Want to make a book about basic rights in your country? … you get the idea…

Why repositories are important

Booki is for free books only (at least if you use the installation at www.booki.cc). The idea we are trying to engender is that when you create a book in Booki, you are also contributing to a body of re-usable material that can help others make books. The practice of building re-usable repositories in this way is a well-known concept and it’s extremely powerful. However, it takes time to build a corpus that can actually work in this fashion. You really need a lot of material before re-use like this can start having a real effect.

I saw the first substantial use of Booki materials like this just last week. It occurred with the FLOSS Manuals implementation of Booki (http://www.flossmanuals.net), which is a repository for materials about how to use free software. Last week we had a Book Sprint on Basic Internet Security and we were able to import about 9 chapters from 3 other manuals, totalling approximately 15,000 words that we did not have to create fresh. Of course, the material needed some work to fit the new context, but it was still a substantial time-saver and extended the scope of the book well beyond what we could have produced without it.

This was really quite amazing for me to see. The idea was imagined from the moment FLOSS Manuals was built but, 3 years later, this was the first real case of substantial re-use. It takes time to build up the materials to make sense of re-use in this way, however, after 3 or so years waiting for the moment, I took a great deal of pleasure in seeing it happen for the first time.

I have been working with a group of very interesting people over the last 3 days producing a book that can be used for generating campaigns about Internet Literacy. We generated texts on a large and varied range of topics. More on all this later. One very interesting issue that has been more clearly illustrated for me in this process is the necessity to understand the role of templates when generating content. When I talk of templates here I mean pre-configured templates that are meant to illustrate what the final product of a chapter or ‘content unit’ should look like.

I have always avoided using templates because I think it shuts down a lot of creative discourse about what the content could be and it kills those amazing surprises that can leap out of working in a freer manner. Perhaps even more importantly, templates can confuse people – Sprint participants need to first just create what they know or are energised by – forcing output immediately into templates is not helpful to this process. However, I can see there is a role for templates, not as structure for the final content but as tools that can help the process of generating content.

In this particular Sprint, we generated a very lightweight template before the Sprint. This is something I really dislike doing, for the reasons stated above, but the fear was that we would float too far into conceptual territory without any boundaries. (I think the fear was justified in this instance, but I would be careful before advocating this approach in other contexts.) We very much wanted to tie the creative discourse and thinking during the Sprint to defined, actionable campaigns. So for this purpose, after discussion with one of the initiators of the Sprint, we generated a very lightweight template that prompted only 7 points: really just the ‘who, what, why’ material that campaigns need to address. This was then used as a process template – a template acting as a foundation for the Sprinters to define the context of their content – not a template that would become the structure for the final content.

It worked very well – enabling the participants to let their creative energies flow while providing a backdrop or context within which the content needed to rest. The ‘process templates’ also allowed those who think conceptually to ‘build up,’ so to speak, while those who think in more concrete terms could also define their content. It provided a common scaffold for sprinters to build in the direction that most interests/energises them.

So while it does not change my mind regarding content templates, I think I have discovered a place for very lightweight process templates that can give some kind of framework for the participants to work with, refine, define, and fill.

The Art of Losing Control

The production of a book is usually very tightly controlled by the author(s) and publisher(s) that produce it. We have come to accept that as just the way it is. If you want to write a book, then naturally you have the right to decide what the text of that book will be. It seems almost uncontroversial.

So, it’s normal to be asked how you can exercise a similar amount of control over a book in Booki. It’s an understandable question but very difficult to answer. Difficult because the answer has to cross paradigms – the first paradigm being the established book production and publishing model that we all know, and the second being book production with free licenses in an open system. So I usually find myself answering questions like this with a simple “You can’t,” and waiting for the reaction. It’s intended to be a provocative answer, and the further the eyes roll back in the skull, the more I know I have to unwrap the concept of ‘publishing’ in the new(ish) era of free culture for whoever asked the question.

But the reality isn’t so simple – it’s much more interesting.

First, there often seems to be an unspoken assumption that control is necessary. Along with this comes the assumption that open content must be protected. Protected from harm – not just the malicious kind, but harm inflicted by contributions that lower the quality of the text. My experience from four years running an entirely open system (FLOSS Manuals) is that there is little to fear except spam. In all that time, I have not seen a single malicious edit. It seems to be the case that if people are not interested in your book they will leave you alone. If they are interested, I have found that their approaches to the text are sensitive and respectful, and more often than not they improve the work – sometimes in very surprising ways. On one book I worked on, a retired copy editor went through the 45,000-word text from top to bottom in his afternoons and made an incredible improvement to it. I would like to have thanked him but I never met him.

The trick is not to protect the text but to manage it. To do this, you must first decide what kind of development process this will be and what kind of contributions you would like. In my experience, the best strategy is to relinquish as much control as possible in order to attract the right kind and amount of contributions. To this end, Booki provides some very useful tools. If you want to keep your book very quiet, you can hide it so that it does not appear anywhere on Booki except on your profile page. Privacy through obscurity. If you want to keep things really, really quiet, you can grab the Booki sources and install Booki on your own server (or laptop) somewhere out of reach of anyone. Or, if you want the book totally open for anyone to jump in, that is the default position in Booki; all you have to do is get the word out as much as you can and invite people to contribute. When you create a new book or chapter, that information is broadcast on the front page of Booki; however, it is often harder than you think to attract attention and contributions. Success often depends on how effectively you get the word out and how attractive you make the offer. You need to reach out to people and inspire them. The more direct the approach the better – personal emails work best – and emphasising concrete outcomes is very likely to improve results, as is making the offer fun and relevant and illustrating a real need. But the usual rules for attracting volunteers in any realm apply – it’s a mix of luck and getting the tone and channels right.

Once the contributions start rolling in, it’s up to you to manage this process. To this purpose, there are a number of tools available in Booki – most importantly the history tab, where you can view changes and roll back to earlier versions of any chapter as you wish. If things get out of control, you can clone (copy) the entire book and decide on a more moderate development approach. However, the best tool for managing input and getting the book to where you want it to be is social management. You need to coax the contributors to come along with you and share your vision of what the book should be. At the same time, you also need to make the process satisfying to them. There are tools available to help with this communicative process (chat, notes etc.) but it’s often reliant on your tone and approach.

‘How to control’ a book is a question I would like to see asked more often with more nuance and colour to the question. However, I think if you can lose the feeling that you must control the book and instead relinquish as much control as possible, you will be surprised and very probably excited by the results. In a world of free culture, it’s all about the art of losing control…

 

Barcelona Bookimobile

Barcelona now has a Bookimobile. We introduced the new Bookimobile to Spain at the Kosmopolis 11 Literature Festival.

The stamp we put on all books made by Booki with the Bookimobile

Booki Mobile Barcelona

We have worked here now for two days doing workshops, helping people produce books in Booki, and talking to people about book production. Tomorrow we have workshops and presentations. It has been loads of fun; we made a lot of cool books and also spent a lot of time making pliegos. A pliego is a small book format that can be made from a single sheet of paper (folded into eight or sixteen pages and ripped along the edges to make a book). It is an extremely simple and beautiful format.

The Bookimobile was designed to take the ideas of Booki to people and make real books that have been created in Booki. It now exists in Berlin and also in Barcelona. The first Bookimobile (now in Berlin) was based on the Internet Archive’s Bookmobile. It was initially sponsored by Mozilla, CiviCRM, Archive.org, Francophonie.org, and iCommons. The binder for the Barcelona Bookimobile was donated by Google Summer of Code.

Why ISBN does not work

ISBN stands for “International Standard Book Number”. It is a 13-digit number that identifies your book. No two ISBN numbers are the same and they usually appear on your book in numeric form and as a bar code. Generally, you buy ISBN numbers and each country manages this slightly differently. Some countries require you to be a publisher before you can order an ISBN. In the USA, I believe, you buy them in blocks of 10, whereas in New Zealand you just apply for them – they give them away.
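The last of the 13 digits is a check digit, which lets software catch most mistyped numbers. The digits are weighted alternately 1 and 3, and the check digit is whatever brings the weighted sum up to a multiple of 10. Here is a short Python sketch (the function names are illustrative), using the commonly cited valid example 978-0-306-40615-7.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits (hyphens ignored)."""
    digits = [int(c) for c in first12 if c in "0123456789"]
    if len(digits) != 12:
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check a complete ISBN-13, ignoring hyphens and spaces."""
    digits = [c for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    return isbn13_check_digit("".join(digits[:12])) == int(digits[12])


print(isbn13_check_digit("978-0-306-40615"))  # prints 7
```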

If you wish to distribute a book through established book channels then you mostly need an ISBN. Book shops such as Barnes & Noble or your local book shop require ISBN so they can track, sell and order stock (books). Most online retailers of any size also require this – Amazon, for example, require an ISBN if you wish to sell through their channels. However, some online channels do not require ISBN – lulu.com for example.

The big problem with ISBN is that you need a new ISBN for every new edition. So if you release a book and then edit it and re-release it you need two ISBN numbers. This can take a long time to order and process and it can be expensive (depending on how you get your ISBN).

This is not the real issue. Admin takes a long time, we are all used to that. But sometimes an administrative system gets built to work for a certain model and when that model changes, then things stop making sense.

ISBN works well in a publishing world where books take years to produce and the products are identifiable as distinct bodies of work. However, in the world of Booki, this is not the typical process. For example, when working with a Book Sprint team, we typically write and release a book in 5 days. You can register the ISBN before the event, no problem. However, quite often after the event we may ‘release’ a new version of the book – 5, 10, 15 times in one day. Some of these releases may be substantial revisions. This quite clearly does not sit neatly with the slow ISBN process. Even with a more conservative development cycle for a ‘Booki book’ the implication is clear – ISBN expects content to be static, it does not expect books to ‘live’.

It’s a real problem for free content and content that exists in an environment where ongoing contributions to the source are encouraged. If you manage a book like this in Booki and you wish to distribute the book through traditional distribution channels, then there is a point where you must ‘freeze’ the content and release the ‘snapshot’. This is not altogether satisfying since then you must either make the book ‘die’ for a time so the printed work and the source remain equal, or you must acknowledge that the paper version is merely a soon-to-be-outdated archive.

Letting content die, or temporarily freezing contributions, can kill a book, which is not a very desirable result considering it often takes a lot of work soliciting ongoing contributions in the first place. The alternative, accepting that the printed book is an archive, is probably not going to make many distributors very happy since you are asking them to sell an out-of-date product (although this is conjecture since I have never tried this).

My answer to this dilemma is to actually walk away from traditional distribution channels. Free content should travel freely across media and in front of the eyes (and ears in the case of audio books) of whoever wants it and in whatever form they want it. Let the content go, don’t constrain it to these traditional channels.

Typically, however, these channels are pursued for ‘legacy’ reasons. Some you can’t escape – if you are an academic, you live off ISBN and the education system will be slow to change that. However, if it’s a business model you are after, then don’t make the mistake of thinking that selling books is the only way to go… new models are emerging – get people to pay you to write the content, for example. One successful example of this is the Rural Design Collective, who raised $2000 (US) via crowdfunding on Kickstarter.

So there are alternatives. ISBN is blocking the way, but it’s probably about time to start believing there are better ways….

 

Importing Archive.org Books with Booki

For some months, Booki has been able to import Archive.org books. This development was sponsored by Archive.org. When importing a book, Booki requests an ePub from Archive.org, converts this to the ‘native file format’ (booki-zip) and loads this into the Booki database. It is then possible to export the same book back into an ePub file.
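The first half of that pipeline, reading an ePub, relies on the standard EPUB container layout. META-INF/container.xml points to an OPF package file, and the `<spine>` in that file lists the content documents in reading order. The Python sketch below extracts the chapters in that order. It is not Objavi's code, and it leaves out the further step of writing the chapters out as booki-zip. It also assumes hrefs are plain relative paths (no URL-encoding).

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Tuple

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def epub_chapters(epub_bytes: bytes) -> List[Tuple[str, str]]:
    """Return (path, content) pairs for an ePub's spine, in reading order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile")
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
        # Map manifest ids to file paths inside the zip.
        hrefs = {
            item.get("id"): posixpath.join(base, item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        chapters = []
        for ref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            path = hrefs[ref.get("idref")]
            chapters.append((path, zf.read(path).decode("utf-8")))
        return chapters
```

Following the spine rather than the manifest matters. The manifest lists every file in the package, including images and stylesheets, but only the spine gives the reading order that a table of contents needs.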

So, if Booki can import an Archive.org ePub and then export it as an ePub, what is the point? It seems like Booki is an unnecessary conduit. Well, one point is that with Booki you can export the book into multiple formats, such as book-formatted PDF. That means you can take any of those luscious out-of-copyright books, import them into Booki and make real books from them. This is pretty exciting when you see just how lovely some of these books are. Take, for example, the copy of Cinderella in the American Libraries section of Archive.org.

Cinderella, original edition (Edward Dalziel, 1865)

This version of Cinderella is out of copyright, so you can republish it as you like. This is a pretty exciting prospect, opening the door for anyone to start their own publishing house by importing content into Booki, styling it, and exporting it to print-formatted PDF for printing.

However, there are a few steps you may need to go through first, and this is the real reason why we implemented importing from Archive.org. All the books in the Archive.org libraries have been digitised using OCR (Optical Character Recognition): the books are loaded onto book scanners and each page is scanned.

Archive.org Book Scanner.

However, the scanning process introduces a certain number of errors. OCR doesn’t render all text correctly and cannot tell the difference between text on a page and text in an image. Hence images with embedded text are usually split up, with the text elements saved as plain text and the surrounding image saved as multiple smaller images. So OCR-scanned books need proofing, and Booki's Archive.org import feature makes that possible: teams can get together remotely, choose a selection of Archive.org books, and get to work improving them.
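As a rough illustration of the kind of proofing aid this opens up, here is a hypothetical helper that points proofreaders at words OCR has likely mangled. It is only a sketch: the two patterns are illustrative assumptions, not rules used by Booki or Archive.org, and they will also flag some legitimate words (such as "3rd").

```python
import re

# Illustrative heuristics (assumptions), each aimed at a common OCR slip.
SUSPICIOUS = [
    # Letters and digits mixed in one word, e.g. "Cinderel1a" or "0nce".
    ("mixed letters and digits", re.compile(r"^(?=.*[A-Za-z])(?=.*\d)\w+$")),
    # Stray symbols inside a word, e.g. "prin|ted".
    ("symbol inside word", re.compile(r"[A-Za-z][|~^]+[A-Za-z]")),
]


def flag_ocr_suspects(text):
    """Hypothetical helper: return (line_number, word, reason) tuples
    for words a proofreader should double-check."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            # Ignore ordinary punctuation clinging to the word.
            word = word.strip(".,;:!?\"'()")
            for reason, pattern in SUSPICIOUS:
                if pattern.search(word):
                    flags.append((lineno, word, reason))
                    break  # one reason per word is enough
    return flags
```

Flagging rather than auto-correcting keeps a human in the loop, which suits a collaborative proofing workflow where the final call is the proofreader's.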

Although this all works, we want to build a tighter workflow and a few extra tools to assist the proofing process (if you are a developer familiar with Python and interested in helping us with this good cause, let us know). Douglas Bagnall (Booki/Objavi developer) recently extended the import functionality so that all the metadata imported from Archive.org is preserved. This opens the door to using this information to assist proofing of the content – we hope, for example, to eventually be able to show the complete digital image of the original scan, before it was reduced to OCR, alongside the OCR pages. Watch this space!
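Archive.org exposes an item's metadata as JSON through its public metadata API (`https://archive.org/metadata/<identifier>`), including a `files` list describing every file stored for that item. Below is a minimal sketch, assuming that JSON shape, of grouping those files by format so that, say, the scan images and the ePub of the same book could be looked up together. The helper names are hypothetical, this is not how Booki's importer necessarily works, and format labels vary between items, so inspect the JSON before relying on any particular one.

```python
import json
import urllib.request


def metadata_url(identifier):
    # Public Archive.org metadata endpoint for a single item.
    return f"https://archive.org/metadata/{identifier}"


def fetch_metadata(identifier):
    """Hypothetical helper: download an item's metadata as a dict (network call)."""
    with urllib.request.urlopen(metadata_url(identifier)) as resp:
        return json.load(resp)


def files_by_format(meta):
    """Hypothetical helper: group the item's file names by their 'format' field."""
    groups = {}
    for entry in meta.get("files", []):
        groups.setdefault(entry.get("format", "unknown"), []).append(entry["name"])
    return groups
```

Keeping the raw metadata alongside the imported book, rather than discarding it after conversion, is what makes this kind of lookup possible later.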

Incidentally, Booki can import any ePub, so the same proofing process could be applied to other OCR scanning projects. If you have a project like this, let us know; maybe we can help.