4am at Baltimore airport…yawn… I’ve never seen an airport so busy at this time of the morning…While in Baltimore, Kristen (co-Founder, Coko) and I had a really good meeting with the nice folks at Project MUSE. We are helping them with EPUB conversions using INK. I think it will lead to a community meeting soon as we are getting quite a bit of interest around it.
Now back to San Francisco. Hopefully I can snooze a little on the plane. Meet with UCP when back later in the morning to work through the list of items for development on Editoria. I’m really looking forward to this. Lots of good feedback from recent tests. I think we’ll also tweak the development process (Cabbage Tree Method) a little for this next phase and try user-dev-UX collaborative meetings facilitated online. It will be faster for picking up and fixing issues in the testing phase.
Many publishers start with MS Word docs and need to move to more structured file formats. To do this they are often advised to go straight from the badly structured markup of docx to a structured document format (some form of XML). However, I believe this to be a mistake.
Wendell Piez and I wrote about a strategy for MS Word-to-HTML conversions. We call it ‘HTML Typescript’. The core principle is to first make a faithful representation of the docx file in HTML, without rushing to interpolate structure or translating first into a descriptive XML intermediary format (which is the common way to do this). The idea is that clean HTML is a better place to start improving the structure of the document than either ugly docx markup or overly complex XML of some other variety.
Once we have clean HTML, we can then more easily work with it, manually or programmatically adding structure, and we can more easily parse it, for example, for entity extraction etc…
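To give a rough feel for what "more easily parse it" can mean, here is a minimal sketch in Python using only the standard library. It is not XSweet and not its output: the sample HTML, the class name `ParagraphCollector`, and the function `candidate_entities` are all made up for illustration, and the "entity extraction" is a deliberately naive regex that looks for runs of capitalised words. A real pipeline would likely use a proper NLP tool, but the point stands that clean paragraph-level HTML makes this kind of pass straightforward.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the plain text of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        # Text inside inline tags (<i>, <b>, ...) still lands in the buffer.
        if self._buffer is not None:
            self._buffer.append(data)


def candidate_entities(text):
    """Naive stand-in for entity extraction: runs of 2+ capitalised words."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", text)


# Invented sample input; real converter output will look different.
sample_html = (
    "<html><body>"
    "<p>Ada Lovelace met <i>Charles Babbage</i> in London.</p>"
    "<p>no names here</p>"
    "</body></html>"
)

collector = ParagraphCollector()
collector.feed(sample_html)
for paragraph in collector.paragraphs:
    print(paragraph, "->", candidate_entities(paragraph))
```

The same collector could just as easily be the first step in adding structure programmatically, for example by wrapping runs of paragraphs in sections once headings have been identified.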
I did a demo of this today at Project MUSE in Baltimore (for a very nice bunch of people) and I wanted to share the following screenshots that more viscerally illustrate the benefits of this strategy. Displayed below are two screenshots of source code. The first is the markup of a docx file, followed by a screenshot showing the result of converting that docx to HTML using the HTML Typescript converters (XSweet) that we built.
And the HTML result after running the same file through XSweet (docx -> HTML Typescript converter).
The question is – which would you rather work with?
One of the best arguments against using proprietary platforms is that you have no control over what they might do. I wrote about this earlier with regard to Medium and, as if I had asked them to prove a point, they pivoted a week later and left a lot of their users out in the cold.
That is the risk of using proprietary platforms – you just never know what they might do and, importantly, you have no say in what they might do.
Here is another example by way of GitHub and DMCA takedowns. Earlier this week a DMCA takedown complaint was filed against an open source project known as Gadgetbridge. (DMCA is the Digital Millennium Copyright Act. One of the things this US Act enables is ‘takedowns’; GitHub has a pretty good explanation of them on their site.)
It seems that someone had created an issue in the GitHub Gadgetbridge repo which included a screenshot of a competing product. The actual DMCA complaint is here. There is a discussion about it on reddit here.
It appears that there is little basis for arguing a copyright infringement.
The problem here is that GitHub appears to have a ‘keep our hands clean’ policy towards takedowns, i.e. they will just read the complaint to see if the process has been followed and then go ahead and take down the named repositories. In this case, they took down the entire Gadgetbridge repo. GitHub does have options here. They could have looked into the case further, decided that the complaint had no basis and, consequently, refused to take down the repositories. Alternatively, they could have limited the takedown to specific files rather than the entire repository.
Imagine what this would do to your community if you run an open source project and your entire workflow revolves around GitHub (as it does for most open source projects).
I highly recommend you don’t use GitHub or any other proprietary service for hosting your code. If you do so, you are vulnerable to these kinds of acts. Cynically, it is not unimaginable that proprietary competitors could leverage GitHub policies to get you taken offline. If you run GitHub Pages for your site, that would also mean your web presence would go down. At the very least, GitHub are not trusted stewards of your code. Host your code and all other services on your own instances of free software applications, e.g. Mattermost, GitLab, etc.
Some records I’m getting into at the moment. Nothing particularly new, but still pretty awesome. First, the fantastic Sleaford Mods from Nottingham, UK. Soooooooo good. They remind me of a perfect mix of The Goats, 8 bit hip hop, and classic 70s pop punk. If anyone knows Hamilton (NZ) band MSU …they remind me a lot of them in attitude and approach. I have just bought their album Key Markets but here are some newer clips.
And I’ve been listening to the ‘This is Kologo Power’ compilation coming out of Ghana… sooooooo good too! It features, amongst others, King Ayisoba. Think of a kookier, happier, Fela Kuti. Long, fantastic, stories with an irresistible groove. I’m *so* hooked!
I’ve been eating up apocalyptic fiction recently. A sign of the times perhaps, as it seems this kind of escapism is pretty popular at the moment. Some of it is pretty dark… Just read American War by Omar El Akkad. A slow dystopian end-of-the-world saga. Grim but a good read. My fav line: “Everyone fights an American War”. I think it was a little long. Somehow it felt like it had at least two points that could have been endings before it actually finished. But still very enjoyable.
Currently working through the following two books, both of which I’m really enjoying.
Since I also love non-fiction, and generally eat this stuff up as fast as it comes out, on the non-fiction bookshelf I am currently working through Killers of the Flower Moon by David Grann.
It is about the Osage tribe of Native North Americans, who were thrown off their land by the government of the time only to be given (unknowingly) land which was rich in oil. They became the richest people per head on earth. Then, ‘mysteriously’, they started being murdered at an astonishing rate. It is a pretty fascinating part of American history.
Also, to get out of all the gloomy escapism, I decided to try something a little lighter – The Long Way to a Small Angry Planet by California’s own Becky Chambers. 1/3 of the way through and it’s awesome! Highly recommended if you are looking for an entertaining and engaging, light (as in mood), read.