When I first encountered Dat, there was some mumble in the air about ‘git for data’. A nice elevator pitch. I’m not sure where that meme came from. I know some of the Dat people (Max Ogden and Karissa McKelvey specifically) and I don’t think I remember them using this phrase. It is often this way with tech – an idea gets out there, no one knows what it means, and then, before you know it, it’s everywhere. I have never been at the center of this kind of viral excitement but I have seen it many times, and, as always, the meme reflects very little truth about what is actually going on with the tech or how it might be useful.
In the case of Dat, it took me some time to work out what it was. At first I understood it as breed of peer-to-peer technology specifically for the distribution of datasets. Indeed, that is what they say on their website
Sure… so it’s this sciencey thing that is intended for use by researchers for sharing data. It sounds like many of the things we have also been discussing at Coko, to do with the early sharing of research data. So Kristen Ratan and I approached Dat and started up conversations which are leading us towards some interesting collaborations – not yet around implementing Dat in Coko projects but around developing open source-open science communities (more on this later).
However, it wasn’t until I spent some time in the Maasai Mara that I understood what Dat was all about.
I traveled to Kenya to spend some time with Richard Smith-Unna who had just joined the Collaborative Knowledge Foundation. We spent some time together just outside of Nairobi and then traveled down to the Maasai Mara for a few days camping.
It was a pretty basic camp in beautiful surroundings. Its a wonderful thing to sleep deeply at night while hippos and buffalo walk around your campsite and, on one night, around your tent. During the days we went exploring and talked tech while watching cheetahs or lions, or termite mounds.
Richard had been the primary developer behind Science Fair which uses Dat libraries. Science Fair is supported, to a small degree, by eLife. eLife received 25 million pounds earlier this year to ‘do their thing,’ some of which includes spending small amounts of money on technology innovations built by people like Substance.io or Richard.
So Richard, mostly, put together Science Fair. In essence, it is a desktop browser with a very specific content focus – research articles – and a very specific distribution strategy – which is where Dat comes in.
Science Fair, an entirely JS application running on your desktop, leverages the Dat JS libs….but to do what?
Dat enables content to be stored – you throw stuff into it and get it back later. However Dat isn’t just a content store, otherwise it wouldn’t be very interesting, since that problem is well solved. Dat is a peer to peer store.
In the case of Science Fair, when you download something to read (which is stored with Dat) you become a peer in that content’s network. You become a server for that piece of content. When someone else requests that same article then you may be the one serving the content to them (if you are the closest peer to them).
In other words, Dat is a kind of open source Content Distribution Network (CDN) technology. One with a few interesting extra features to leverage ie. a peer to peer design.
You don’t have to use the peer-to-peer functionality. You can just use Dat as a single content store – without replicating the content to other nodes. That is quite useful in itself but there are many other technologies that can do this – a normal file system on a server somewhere, for example. You could also use Dat purely as a CDN – a network of content stores which replicate and deliver your content closer to where your users are. Once again there are open source technologies that can do this like jsDelivr. However, what Dat can also do, is turn your CDN into a peer-to-peer network where users become the content servers. When a user fetches some content, they then become another node in that content’s delivery network.
That is pretty interesting. It means Science Fair, while looking like a search-and-read interface for content, also is a peer-to-peer content delivery node for that same content.
The question is – is that interesting or useful? Well… it is a fantastic example of federated content and, possibly in time, federated publishing. As researchers and/or publishers seed content into this network, the boundaries and roles of Journals may start to become a little fuzzy.
For example, Open Access (OA) is interesting because it is a movement for making research materials available for free. Free as in no cost, and free through the application of liberal Creative Commons licenses. However, OA still follows many of the norms of the publishing world, in that there are (capital P) Publishers which curate and control the access, display, and ‘functionality’ (although article functionality is a rather impoverished idea in this sector) for content. If an OA Publisher classifies article A as belonging to category B due to their internal taxonomy then that is where article A will go. If a Publisher enables annotation for ‘their’ content then you have annotation. If a Publisher enables threaded comments for discussion around the article then you have one place where you can discuss the findings. But…while Science Fair might sound like this – a place to find content (just like a Publisher) – it is not this. Science Fair distributes the content into a Dat network and how that content is surfaced, tagged, commented on etc is entirely up to the type of interface you use to access that content. If you wanted to share user-specific tagging taxonomies, for example, you can build that into Science Fair or a Science Fair-like interface. No need to wait for the Publisher.
The researchers, then, could have complete control on how content is curated, displayed, discussed etc since in some sense the users start to become the publishers.
That is a pretty big step sideways.
I’m aware that distribution is not the only thing Publishers do. But it is why they exist in their current form. If Publishers were not the branded content portals they are then it is unlikely they would exist in the form we know them now, rather they would be service providers that do all, or part, of the other services they currently provide like quality control, technical checks, conflict of interest checking, validation and normalization, review management, format conversion etc. The point is that at the core of these services currently is the Publisher – the brand holding this all together, so to speak. But what happens when one of their primary offerings – sharing/distribution of content – starts to be diminished by other channels? What if researchers decided this is not how they want to access content. What becomes of the Publishing model when faced with an erosion of one of their primary offerings?
Federated publishing breaks down all the ways that we think of publishing, as a way to access content, today. It fundamentally remaps ideas of centralised publishing and opens up many many interesting de-centralised possibilities and questions. It is a fundamental shift of power from the center to the periphery.
I find this interesting because at the time I wrote the piece on federated publishing, I mentioned that earlier there had been quite a bit of chatter about federation. Diaspora, status.net (now pump.io), and Thimbl were three projects that looked to the centralised power dynamics of social networks and saw federation as a way out. Ward Cunningham also evolved the Federated Wiki around the same time. Everyone felt, for one reason or another, that distributed power worked best. None of these projects were successful in terms of adoption, nor were my attempts to start federated publishing using Booki/Booktype. However, that is possibly exactly what makes Dat and Science Fair interesting.
I have watched many great ideas developed into softwares over the years and witnessed the death of those same projects. This kind of cycle reflects the well-known Silicon Valley mantra
to be right too early is the same as being wrong
While Dat, IPFS, Science Fair etc might actually herald in an interesting new era of federation one thing for sure, the change will not occur overnight. It requires persistence, strategy, and working closely with researchers to encourage them to use the tools and to shape them so they find them useful. A slow displacement of existing tools and their inherent politic is the better strategy for radical change. Radical change at a slow persistent pace is far more likely of success than a gangbusters approach that will soon lose energy if change isn’t instantly catalyzed. Persistence is the key.
While Dat is not restricted to the sharing of datasets as they imply, it is interesting to see how this idea has been realised in part by Science Fair as an interface for browsing and reading articles. The question for me is not ‘is this a good idea’ (it is) but rather, could the timing and execution be right this time? If it is, then could applications like Science Fair evolve more utility than publishers can provide? Could this, in turn, lead to these applications being widely used by researchers? Could this, in time, lead to a huge ‘unbound’ peer-to-peer content store of research data? Do the Science Fair and Dat teams have the patience to strategise and set their collective minds on a persistent, slow, change that will enable the radical reshaping of the power dynamics they are addressing? And if so, what happens then?
Coda: Dat is capable of more than what I have described above. Its has other very interesting features such the ability to cryptographically sign content and its ability to update content by updating only the difference between versions (as opposed to the entire file). The above post is not an audit of Dat and its total utility but rather a sense making piece reflecting on some features of Dat and what it could mean in this use case as exposed by Science Fair.