What’s a Crowdsourced Information Platform?
“Crowdsourced Information Platform” is my own shorthand for the recent wave of services that collect news reports, incident reports, personal stories and the like, whether as a web post (as in social media), an audio report recorded via an IVR system, a video uploaded from a smartphone or even a simple SMS…and all the others that are sure to follow.
Essentially, any form of two-way communication with a large user base qualifies. Note that one-way information dissemination systems (such as TV, print media and IVR systems that don’t let you record) don’t fall under my definition of a Crowdsourced Information Platform.
Some examples of Crowdsourced Information Platforms:
RuaiSMS (and other sites based on FrontlineSMS)
All kinds of Ushahidi websites
Facebook, Twitter et al
Google…all of it
Email (yep, and I’ll explain…in painful detail)
Why we need integration between platforms
We all know that most of the Web is user generated…simply because everyone on the Web IS a user. However, as we open more channels of content, the definition of user must undergo some evolution. Different users will have different access mechanisms and therefore potentially radically different views of the same information. In order to be able to provide some order to the chaos, mechanisms will be required to group and link content.
Attempts are already underway to set up repositories of crowdsourced data. However, the variance in media being the way it is, it is unlikely that any single centralized system will be able to cover everything and everyone.
A distributed system is needed where each node is able to communicate using a minimum acceptable standard. Where a medium cannot meet the peer communication standard, a translator can be introduced as an intervention.
By keeping the peer standard constant, the only requirement to add a new medium to the network remains to write a translator from that medium’s existing communication mode to the network standard.
Our (Mojolab’s) recommendation for that standard is email.
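To make the translator idea concrete, here is a minimal sketch of one for SMS, assuming email as the network standard. Everything specific in it is hypothetical: the `sms_to_email` function, the shape of the incoming SMS dict, the `X-Source-*` header names and the example.org addresses are ours for illustration, not part of any existing standard.

```python
from email.message import EmailMessage

def sms_to_email(sms: dict, node_address: str, network_address: str) -> EmailMessage:
    """Translate one incoming SMS (a hypothetical dict) into an email message."""
    msg = EmailMessage()
    msg["From"] = node_address
    msg["To"] = network_address
    msg["Subject"] = f"[sms] report from {sms['sender']}"
    # Custom X- headers carry medium-specific metadata alongside the payload.
    msg["X-Source-Medium"] = "sms"
    msg["X-Source-Sender"] = sms["sender"]
    msg.set_content(sms["text"])
    return msg

example = sms_to_email(
    {"sender": "+10000000000", "text": "Road to the village is flooded."},
    node_address="sms-node@example.org",
    network_address="network@example.org",
)
print(example)
```

A translator for another medium would look much the same: only the code that reads the medium’s native format changes, while the output stays a plain email.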
Email has been around for a looong time. It is one of the earliest applications of networking between computers and has been hammered into shape by, in IT timescales, the weight of millennia.
It’s been used effectively for
Unicasting – like when you send your wife a one liner from the convention saying “Wish you were here”
Multicasting – like when you erroneously copy in your girlfriend
Broadcasting – like when Facebook tells all your friends by email that you changed your marital status to “Divorced” and your Relationship Status to “Single” on the same day.
It can be used to send a payload plus metadata
You can send text and you can send binary data too. This makes email particularly attractive to multiplex different kinds of data, such as audio, video, text and pictures.
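As a rough illustration of payload plus metadata in one message, the sketch below uses Python’s standard `email` package to combine a text body, a custom header and a binary attachment, then serialises the message and reads it back. The addresses, the `X-Report-Language` header and the fake audio bytes are placeholders we made up for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "reporter@example.org"
msg["To"] = "desk@example.org"
msg["Subject"] = "Field report"
msg["X-Report-Language"] = "hi"          # hypothetical metadata header
msg.set_content("Summary of the report in plain text.")

fake_audio = bytes(range(256))           # stand-in for the bytes of a real MP3 file
msg.add_attachment(fake_audio, maintype="audio", subtype="mpeg",
                   filename="report.mp3")

wire = msg.as_bytes()                    # what actually travels between mail servers

# The receiving side gets the metadata and the text body back out of the same message.
received = message_from_bytes(wire, policy=policy.default)
language = received["X-Report-Language"]
body = received.get_body(preferencelist=("plain",)).get_content()
print(language, body.strip())
```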
Mail clients are cheap to write. Python has a nice IMAP library that will do most of your work for you. So do most of the other languages. It’s really easy to script email!
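To give a feel for how little code a scripted mail client needs, here is a sketch using the standard `imaplib` and `email` modules. The function names are ours, and `fetch_unseen` needs a real IMAP server and credentials, so only the parsing half is exercised in the demo at the bottom.

```python
import imaplib
from email import message_from_bytes, policy
from email.message import EmailMessage

def extract_attachments(raw: bytes) -> list:
    """Return (filename, content) pairs for every attachment in a raw message."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]

def fetch_unseen(host: str, user: str, password: str) -> list:
    """Fetch the raw bytes of unseen INBOX messages (needs a real IMAP server)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        _, data = conn.search(None, "UNSEEN")
        raws = []
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            raws.append(parts[0][1])     # the message bytes sit in the first tuple
        return raws

# Offline demo: build a message locally instead of fetching one.
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("see attachment")
demo.add_attachment(b"\x00\x01\x02", maintype="audio", subtype="mpeg",
                    filename="clip.mp3")
attachments = extract_attachments(demo.as_bytes())
print(attachments)
```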
Authentication and security are outsourced
With anything involving mass usership, authentication and identity management are a nightmare all by themselves. With email as the peer communication standard, that nightmare can be left to whoever is running the mail server, which is usually someone competent to handle it (or so we hope).
Users who live in bandwidth-abundant places like Sweden, Palo Alto and Bangalore (among others) will have trouble understanding how important asynchronicity is when you have a connection that’s bursty based on everything from weather conditions to political scams.
However, when it comes to bad-bandwidth areas, it’s great when your mail can queue up without holding up your interface and go out all at once when the connection gets better.
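The queue-and-flush behaviour described above can be sketched in a few lines. This is an illustrative toy, not how any particular mail client does it: the `Outbox` class is hypothetical, and the `send` callable stands in for something like a wrapper around `smtplib.SMTP.send_message`.

```python
from collections import deque
from email.message import EmailMessage
from typing import Callable

class Outbox:
    """Queue messages locally and deliver them whenever the link is usable."""

    def __init__(self, send: Callable[[EmailMessage], None]):
        self._send = send
        self._pending = deque()

    def enqueue(self, msg: EmailMessage) -> None:
        self._pending.append(msg)        # returns at once; no network involved

    def flush(self) -> int:
        """Try to send everything queued, in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:              # connection trouble: keep it for later
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)

# Demo with a fake sender that fails while the link is "down".
link = {"up": False}
delivered = []

def flaky_send(msg: EmailMessage) -> None:
    if not link["up"]:
        raise ConnectionError("no connectivity")
    delivered.append(msg["Subject"])

outbox = Outbox(flaky_send)
for i in range(3):
    m = EmailMessage()
    m["Subject"] = f"report {i}"
    outbox.enqueue(m)

first_try = outbox.flush()               # link is down, so the queue stays intact
link["up"] = True
second_try = outbox.flush()              # link is back, so everything goes out
```

Catching `OSError` covers ordinary socket failures and, in Python 3, `smtplib.SMTPException` as well, since that is a subclass of `OSError`.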
Case Study: Mojomail – Using Mailman and GMail together to create a Distributed Content Management System
CGNet Swara is a crowdsourced news portal for Central India focussing on Adivasi (indigenous) and other marginalized communities. It started off as a pilot in 2010 with a single IVR number linked to a blog, moderated by Shubhranshu Choudhary (Shu). Over time, Shu and his team have trained people in the field to use the platform and also to train others to use it to share relevant stories from the grassroots in media dark regions of Central India.
Users call the number and are presented with the option of recording fresh content, or listening to voice posts left by other users.
Each recording needs to be listened to, verified, quality edited (amplified, cleaned up) and summarized in text by a human moderator. Then it is published to the web interface (the blog) as well as the IVR interface.
The verification process could involve calling back the user and confirming things, or checking with other sources in the vicinity.
Initially, when the platform was new, the volume of incoming calls, and consequently the number of audio recordings coming in, was low: fewer than 10 a day. Today the platform receives 300-400 calls a day and over 60 recordings, each about 2-3 minutes in length.
The moderation effort has therefore increased sixfold.
Moreover, as the community gets more and more comfortable using the platform, they are more and more eager to get involved in the content management process as well as to own, replicate and customize the platform.
In 2012, we deployed 3 additional IVR servers to complement the existing one in Bangalore.
These correspond to the MP/CG, BR and AP telecom circles respectively.
At the same time, we added further lines to the existing server in Bangalore.
In 2013, we added on a further channel to the Bangalore server, called Adivasi Swara, which is in the Gondi language.
To move towards community moderation across all these channels, we needed a system that:
is accessible by many users – Loudblog, the existing interface on the IVR servers, is multiuser, but not really (as are most content management systems)
provides some form of centralized access control, so that the community can choose who to share content with
should work on all kinds of bandwidth, from 2G (about as bad as a 56k modem) to leased lines (4 Mbps stuff that we dream about)
should allow people to contribute in multiple languages and also send back binary information like photos or audio responses to attach to the incoming content
To design and implement a new system of this sort would be a fairly expensive exercise.
So we took some shortcuts.
We used what people already know
GMail is the interface part of our DMUCMS (Distributed Multi User Content Management System). Everyone who wants to moderate just gives us their GMail address, or creates a new one. That’s all it takes! No training manual, no learning curve while people figure out the form interface…this is the part where the node brings the communication know-how. Of course, this raises the bar on being a moderator…you have to know email.
Why we like GMail –
- Conversation view
- Drive support
- Hangouts and chats baked in
- Multimedia attachments
- Presents an individual view of shared data (mails on a mailing list)
We used what has worked through generations of other hardware and software
We make a mailing list, and put those addresses that people gave us on it. That’s it, no more needed. That’s our content management system database set up.
Why we like Mailman –
- Fully functional
- Keeps inboxes in sync
What we absolutely needed but couldn’t find…we wrote
Mojomail is essentially ABBOTS – A Big Bunch Of Tiny Scripts. It automates the formatting and sending of emails to the list we made in Mailman.
Each of our IVR servers gets two email addresses, one that it sends out on (like a public key) and one which it receives on (private key, known only to server admin).
Every time a new piece of content (i.e. a recording) comes in, the server simply creates an email from a template, attaches the MP3 of the recording to it and sends it out to the mailing list so that all the moderators get a copy.
Sort of like those very proper people who always write in the same format and include helpful tagging information so that you can search it better.
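The templating step could look something like the sketch below. To be clear, this is not the actual Mojomail code: the subject and body templates, the field names, the `build_recording_email` function and the example.org addresses are all assumptions made for illustration.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical templates; the real Mojomail formats are not reproduced here.
SUBJECT = Template("[$channel] Recording $recording_id")
BODY = Template(
    "New recording on $channel\n"
    "Caller: $caller\n"
    "Received: $received_at\n"
    "Reply on this thread with your summary, edits and verification notes.\n"
)

def build_recording_email(mp3_bytes, fields, outgoing_address, list_address):
    """Wrap one IVR recording in a templated email addressed to the moderators' list."""
    msg = EmailMessage()
    msg["From"] = outgoing_address       # the server's outgoing address
    msg["To"] = list_address             # the mailing list fans it out to moderators
    msg["Subject"] = SUBJECT.substitute(fields)
    msg.set_content(BODY.substitute(fields))
    msg.add_attachment(mp3_bytes, maintype="audio", subtype="mpeg",
                       filename=f"{fields['recording_id']}.mp3")
    return msg

fields = {
    "channel": "swara",
    "recording_id": "000123",
    "caller": "+10000000000",
    "received_at": "2013-01-01 10:00",
}
recording_mail = build_recording_email(
    b"fake mp3 bytes", fields,
    outgoing_address="ivr-out@example.org",
    list_address="moderators@example.org",
)
print(recording_mail["Subject"])
```

Actually sending the result is one more call, for example `smtplib.SMTP(host).send_message(recording_mail)` against whatever mail server the IVR box uses.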
That’s it. That’s our DMUCMS. The subject lines being preformatted means that GMail nicely organizes recordings into conversations, and each moderator can contribute to the processing of each message and update the conversation. They can also attach edited versions of the recording, other media…pretty much whatever is needed.
Finally, when the conversation reaches completion and the message is ready to publish, the owner of each interface, who is also a member of the list, takes the final version and releases it onto their respective interface, such as the IVR, a blog or social media.
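To see why a fixed subject format helps, here is a rough approximation of subject-based conversation grouping. It is only a sketch of the idea and makes no claim about how GMail threads messages internally; the function names and the sample subjects are made up.

```python
import re
from collections import defaultdict

# Matches any run of leading "Re:", "Fw:" or "Fwd:" prefixes, in any letter case.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes so replies share a key with the original."""
    return _REPLY_PREFIX.sub("", subject).strip()

def group_into_conversations(subjects):
    """Group subject lines into conversations keyed by their normalized subject."""
    conversations = defaultdict(list)
    for subject in subjects:
        conversations[normalize_subject(subject)].append(subject)
    return dict(conversations)

subjects = [
    "[swara] Recording 000123",
    "Re: [swara] Recording 000123",
    "RE: Fwd: [swara] Recording 000123",
    "[swara] Recording 000124",
]
conversations = group_into_conversations(subjects)
for key, members in conversations.items():
    print(key, len(members))
```

Because the server generates every subject from the same template, each recording gets its own stable key and every moderator reply lands in the same group.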
We are now in the process of automating publishing through various interfaces through email, which should be relatively easy since almost everything supports post by email.
We are also using email in similar ways in projects in other parts of the developing world.
In summary, email-based integration strategies allow you to
Share what you want
With who you want
When you want
Irrespective of simultaneous bandwidth availability
Without having to write too much code
Or spend too much of your capacity-building budget
We welcome all support in developing a content sharing standard around email as well as development support for Mojomail as an open source project at http://bitbucket.org/mojolab/mojomail