Op-interest: On Opinions + OpSec

One of my Webster Words-of-the-Day this week was “opine,” the act of having and stating an opinion. It’s something that I do often on my blog, but am encouraged to stem for a more objective perspective when it comes to professional data vis stuffs and news publications. Journalism solicits an ideally balanced representation of information, but as with any domain touched by human fallibility, it’s vulnerable to bias.



The Swedes have an interesting word for one-sided opinions: Åsiktstaliban, describing a group of people who tolerate only one opinion, a term that can be colloquially synonymized with global violence. And so this blog post is going to address opinion diversity and operational security, two poles of a global approach to citizen journalism and political activism. It seems an appropriate post in the days following September 11th and the tragic anniversary of the Westgate mall attacks, and definitely something that has piqued my [op]interest, with as yet feeble articulation, these past few months.

Like many developer-journos, I’ve been following the more tragic and graphic media reports out of Iraq, Gaza, and Syria lately. Jonathon, one of our developers at Ushahidi, and Chris, his partner on the CrisisNET project, created a timeline of ISIS happenings a few weeks ago, followed by subsequent investigations of conflict in Iraq and Gaza. This had me reading more about security and media verification for journalists in the Middle East and in other areas hostile to media and humanity.

I touched on these topics briefly during my panel at HOPE-X with Harlo Holmes and Barton Gellman (livestream here), and again during our workshop on opsec last week and the Buenos Aires Hacks/Hackers Conference.


But independent of my own stuff, there’s a recent trend in crowdsourced citizen journalism that I want to encourage and support professionally and just personally. Part of supporting that initiative is providing open source tools to enable citizen reporters (like those in Ushahidi’s Toolbox), but part of it is also just sharing information openly about authoritative sources.


This is a good place to promote Bellingcat, and other work aimed at armoring activists, newsies, and the general public with information. While it probably won’t keep extremists from more barbarous and cowardly expressions of violence, being informed is non-trivial in the fight against global rights violation. A lack of information is historically and consistently the root of epic geopolitical blunders, tragic massacres, and the ignorance and ignoring of massive human rights transgressions, globally. To that end, and in a modest objection to the wave of Åsiktstaliban media, I’ve assembled a small collection of links and sources to stay apprised of what is happening in places that are remote from my current locale. I’d love to solicit others, so I’ve made a form at the bottom of this blog for collecting relevant media sources and tracking the safety of embedded journalists in the Middle East.

  • The New York Times had pretty decent coverage; McClatchy’s wires on the Middle East and the Guardian’s Liveblog have been pretty consistently informative
  • On Twitter, I follow Blogs of War (@blogsofwar), and some specific journalists embedded in regions of interest (@BklynMiddleton, @IvanCNN, @Matthew__Barber, @Mudar_Zahran, @jrug, @abumuqawama, @joshuafoust, @combatjourno, @SajadJiyad, @RaquelEvita, @DrZuhdiJasser, @majidrafizadeh, @Reem_Abdellatif, @WalidShoebat)
  • I’ve started reading local bloggers and certainly Bellingcat
  • Vox had a pretty o.k. abbreviated breakdown of the current affairs vis-à-vis ISIS; HuffPo has a decent world roundup as well

But despite the intense media house coverage, I find myself often returning to individual blogs and the work of lone journalists; I think this trend is significant and, I’m sure, shared by many, given the popular response to citizen journo-projects like Bellingcat. I find embedded journalists and local citizens to be the most informative for thorough and unapologetically blunt coverage.




As a personal/pseudo-professional aside, we’ve (@Ushahidi) also been working on an implementation of some data visualizations for election monitoring in Yemen, and this had me researching more of the political climate there (some samples below).


preview of Ushahidi V3 Viz

I’ve been wanting to build a visualization of global disappeared populations, of which there are many, in almost every country. Those that we hear about more often date back to Colombia and Argentina circa the 1970s and persist through today, or are more recent: the 600+ Nigerian girls kidnapped by Boko Haram, the Yazidi women kidnapped by IS affiliates, or the Zone 9 Ethiopian journalists still detained in East Africa. When a country succumbs to brutal regime rule, it’s often the journalists, the vocal activists, and the outspoken citizenry spreading independent opinion and information about injustice that become the targets of violence and effacement tactics. Information becomes a target, and those who process and disseminate it are vulnerable to attack.


preview of Ushahidi V3 Viz – gender counts

Perhaps some of their anecdotes and needs are things we might accommodate in the newer version of Ushahidi, or in CrisisNET, our pretty rad aggregator of social and streaming data on the global crisis situation, unified in a single API. And while there are many visualizations and representations of the statistics around targeted terrorist groups, a direct comparison between the composition of the victim population and the terrorist perpetrators is something perhaps worth investigating. A recent open analysis of government documents about outstanding terrorist threats and the TIDE “watchlist” (see also TIME, The Atlantic) reveals some interesting statistics about the paucity of females associated with violence as terrorists, but the general density of females associated with violence as victims.

TIDE by the Numbers

2 of the 9 detained Ethiopian journalists were women, as were the 600+ Nigerian girls; a substantial portion of the limited documentation on Syrian disappeared citizens catalogs female adults and children, and coupled with the female rights violations in Yemen, the disappeared counts there are also substantial; the same goes for Turkey and countless other nations who’ve only begun to catalog disappearances publicly. The Support Yemen Project has done some work to publicize the circumstances surrounding human rights and free expression throughout Yemen, as has the AHA Foundation, in addition to logging the impact of terrorism and counter-terrorism, and yet much of my more informed perspective on women’s issues and violence in Yemen stems from the posts of a Yemeni female blogger. Again, my focus returns to local journalists independent of media affiliation. And while females are not the sole authors covering female rights, the dangers faced by female journalists in terror zones are corroborated by some recent reports from the NYTimes and the Huffington Post, as well as some more general blogposts on women’s rights violations authored by the aforementioned lone-journos I follow on Twitter. The circumstances demand a more responsible way to monitor and vet on-the-ground activity and reports, and increasingly social media monitoring and crowdsourcing applications are providing these windows to supplement the occasional blog post and media-supported piece.


This made me consider the unknown, and the absence of information, as an important root of some of the more brutal disappearances, and particularly led me to consider the position of citizen-journalists who seek to amplify information about a space and are subsequently punished by kidnapping, eradication, or imprisonment and public execution. Information is sometimes the most dangerous currency to smuggle from a vacuum: it can mobilize nations to send aid or commence peace talks, it can prompt the vicious reactions of groups who would execute victims to deter action, it can push citizens to technological circumvention tools in an effort to counter the habitual throttling of their internet access. It’s one of the more noble vocational pursuits to propagate honesty in a sea of redirection and rumor, and it’s something that can be enabled and aided by technology. Given my recent research and just general current events, I’m incredibly humbled that I get the opportunity to work on technology for crowdsourcing and spreading information, and so I wanted to address how we’re tackling the vulnerability of information providers with our tech at Ushahidi and our trainings at Internews.


In terms of self and source protection, Harlo and I have compiled some applications that can help with operational security for journalists, and this applies to citizen journos as well.



In terms of source verification, CrisisNET has prepped a roadmap series of features to integrate the likes of TinEye, and Twitter verification via TweetCred. To that end, the devs at CN wrote recently about this application of authenticity readings to the CN service vis-à-vis the IDF/Gaza conflict. We’re working to build more security into Ushahidi’s platform as well, and otherwise increase the availability of our technology through much needed translation efforts. Like most of the platforms we provide, we rely often on crowdsourcing and community participation to complete the arc of their utility, and we’re hoping our community will help make our products better.

Outside our own repertoire, there’s a beta product called Scraawl that also aims to provide streaming data about large scale graph and social media collections. There are, further, plenty of ways to contribute to crowdsourced journalism projects: join Open Reporter, a platform for free and open news, or Open Street Map, a crowdsourced program for mapping the globe, or Project Fission, an open source project to manage reporters’ notes and stats. Opine and add-to where possible; open information and citizen journalism still source some of the most up-to-date coverage of crises worldwide.


Meantime, I’ll close with a more positive piece, reblogged to oblivion, on Yemen, a link to some github to watch as we move more data viz into Ushahidi’s core, and request eagerly any blogs/sources to watch below:

There Must Be A Pony Somewhere: Digging in Data to Find a Story

Quote Investigator wrote a cute quip about the origins of this blog’s title quote (“…there must be a pony somewhere…”), and lately, it has me thinking about a job I share with many techy-journalists: digging through data (evidence) for a story (pony). I’ve commented on that a bit exhaustively in this blog, but the metaphor carries through to building a data journalism team, composed of a ragtag herd of unicorns, racehorses, and predominantly, ponies. Online Journalism Blog did a short piece about the taxonomy of journo-developers too, bulleting a few typical types (racehorses, unicorns, mules), to which I’d like to add ponies before diving a little deeper into what this means in terms of characterizing a professional population by its equine analog.

At this week’s MIT Civic Media Conference, Joi Ito kicked off an introductory talk with a nod to his coder fellow, a “unicorn” journalism-coder-analyst that had just joined the team, so the metaphor has stuck with some steady citation and I think it’s worth discussing here. In the next few sections, I’ll cover a few adventures in geo-journalism, talks and projects I’ve done around mapping in the past months. Moreover, this will be a blog about our equine habits and heroes in data journalism, and some musings on what media hackery earns in terms of recognition and reward.

Dev-Journo Taxonomies

There’s an understandable spectrum of personality types and professional competencies in Data Journalism. There are the fantastic anomalies: unicorns; the hardy worker hybrids: mules; the strange and rare portmanteaux whose skills define along a folksonomic schema: looking at you zorse, zebroids, donkras. I gave a talk on Data Journalism a few months ago (check vimeo below), and the thesis of my presentation echoed the essentially hybrid aspects of the job.

Those born under the sign of the Horse are a flexible group of people. They tend to be stubborn when it comes to their ideas, but they are also incredibly patient when it comes to hearing out what other people have to say. They favor straight-forward conversation, but avoid trouble where possible; a paradoxical combo, but one that makes the horse persistently fascinating as a sub-population in the animal kingdom.

Data Journalism in DR

So in the data space, why fixate on ponies as representative of some substantial sample population in the greater software engineering venn? Because ponies are slightly different than horses; capable of the same intelligence and empathy but perpetually twee-er and often assumed to be less mature. Some of the brilliance I’ve witnessed from millennials in the data journalism space has made me think that another branch from the taxonomic tree should recognize those whose aptitude is impressive in code but whose journalism background, and experience in general, perhaps seems premature.

Pony Projects

When social media steps down from the free speech party, and while governments and institutions of modern social exchange continue to use networks as a way of monitoring and managing society, it’s often the critics and the activists who have to pick up the slack to produce objective publications, and in this space the post-modern (and often, outsider/premature) workhorses of the data journalism space have something to contribute.

As a class, proto-journalists and data mungers have developed some tools to analyze trends and provide objective and uncensored criticism of the information they represent. Zeynep Tufekci’s talk at this year’s MIT Civic Conference on citizen investigative journalism in Turkey gave a nod to the use of social media (and Twitter feeds in particular) as infrastructure for collecting public opinion and fact-checking specious claims. Many tools for crowdsourcing, Ushahidi included, can be deployed to provide citizen journos-ponies, smaller breeds of self-taught but domain-proficient reporters, with tools for reporting. And while much of this citizen-driven practice is perhaps under-promoted in the contemporary news space, some of the most renegade journalism efforts are sustained by citizens running depolarization operations on social media platforms in their home countries, as Zeynep’s talk suggested.

Pony Hierarchies


Part of the persistent argument in discussions that blend net neutrality, privacy and surveillance censorship revolves around how important crowdsourced and social content has become for developing honest and unbiased alternative reporting models globally. Though not to be confused with incident data directly, social media reports like CrisisNET’s Syrian YouTube Map and Conflict Map’s tweet and social media tracking plan provide this kind of window into the world of social streaming to study crises. In analysing, contributing, and dissecting social media content, pony-journalism has become a more dominant approach to assessing conflict and geo-journalism at a global scale.

Muybridge Motion Studies

In fact, arguments around how to classify the oft-hyphenated and obscure titles applied to data-journalists are more about the hybridity of their job descriptions and the range of skills they deploy than about the elegance of the metaphor. As an equine-hybrid class, we’re often trying to find new ways of developing and pushing content, a nod to the aggressiveness and tirelessness of the horse behavioral type. But part of that race, maybe the most important part, is about designing content and news to appeal to people, to visualize data in new and yet intuitive ways. Our objective is to find ways to relate to populations, and in a sea of bar charts and statistical models, sometimes maps are the more affective way of relating complex digital data to a simple physical topography. That’s where the map making (mentioned above) comes in.

Two of the most relatable and persistently referenced data types in post-modern visualization are geo-data and time-series. Why? Because we relate to them, we can consider our perspective relative to time and space; they have become our touchstones for syncing digital and physical worlds. Overwhelmingly, the projects at this year’s Civic Media Conference demo sessions fell into some kind of mapping context, and I think that trend is telling for the direction of visualization schema and citizen journalism: What We Watch, a map of YouTube trends; Terra Incognita, a Chrome extension for mapping exploration; Media Cloud, a collection of tools for monitoring and mapping media globally; or Cliff, a project to automate media geo-parsing, being a few among many featured projects. Tools like We Feel and CrisisNET are aimed at facilitating this kind of study, enabling study of social media and reporting strategies. In each case, it will be interesting to watch how they compete in the investigative reporting space; the race seems primed to recognize their utility.

Pony Prizes

To address another interesting aspect of the data-journo ecosystem, I’ll now pivot to another curious theme in the MIT Civic Conference and others like it: the concept of work-“family.” In keeping with the metaphor of this post, I would argue that family, in the case of a company or sponsor, is more analogous to genus hierarchies than to social kinship models. People who share a company share a type and a goal; they’re a team, but one built on affinity, not consanguinity.

This is a family:

IRL family reference

This is a team:



A company/funder/sponsor/laboratory/media-outlet/workplace is a herd of ponies. As individual members, we are unique in our methods and backgrounds and generally attracted to the same trajectory, but probably more powerful in that dispassionate diversity which a team or herd-mentality affords, less complicated by emotional entanglements internally and therefore more competent at empathy externally (that is, with our users/subjects/sources for stories). In a recent HBR article, “Your Company is Not Your Family,” the author uses the analogy of sports teams, and the mentions of the Spurs made me think that the pony metaphor might be as ridiculously apt.

The Spurs stand out for the stability and longevity of their player relationships, yet even their current 13-man roster only includes one player from their first championship in 1999: power forward Tim Duncan.

To consider your company analogous to your family is to cripple it by a lack of adventure. Families, while wonderful, are a default; they usher you to growth, but if all goes well, you flourish on your own. You want to build a company of people who are flourishing, and will continue to do so under guidance and not parentage.

Joi Ito concluded the MIT Civic Conf with a series of “guiding principles” at the media lab, and those statements reinforce all-the-more why a lab/company isn’t a family. A team can be built on shared principles, but they’re not the same as those on which a family is founded.

Your family pushes you, educates you, and prefers (often) your safety over risk taking, whereas your work, and your class (genus/type/subgroups), often push you to independent and outlier achievements unsanctioned by precedent and rarely “safe” in practice. A total aside in this blogpost, to be sure, but I think often data journalism professionals (and by extension, other political/social-professionals who put position before the public they serve) seem to allow confused allegiance to cleave them from simple human and social empathies.

This is a point I treated in a recent interview with Danish news about the relationship between developers and journalists. Nothing revolutionary, but at the time I compared the ideal scenario to one of mutual respect in difference, and not to a familial metaphor. My collaborators aren’t my siblings, they’re my colleagues, and the relationship is pretty different in my mind.

We sometimes risk an allegiance to an editor or organization over an allegiance to the public, and it’s important to remember that the protection and privacy of your subjects and sources is just as precious as that of your employer-parents, regardless of who is paying our salaries. Too often, I’ve seen people at conferences too proprietarily motivated to share ideas, too proud to admit that many share the same ones and have started similar projects. There was a lot of overlap at this year’s Knight News Challenge award announcement, and I think it’s fair to ask overlapping orgs to collaborate and share their plans and programs of research as the year progresses, though I doubt they’ll be held to this. Sometimes, considering your company like your family can confuse your objective to do good in the world and supplant it with one to do good for your own.

This brings up another aspect of social good work and journalism worth mentioning here. Often, the competition in the data journalism space is built on a capitalistic motivation to secure funding and support and resist the superior publication of another outfit that prematurely scoops your content. In this fear, we privilege our company over our vocation, which is to spread solid news, to share it with the world. There’s no shortage of conflict and controversy worth commenting on, so the competition seems sad and contrived, especially in the social good and open source space. But recently, I’ve been reading economic coverage of the pay-gap issue and have come to appreciate that this competition has deep roots, founded in our cultural resistance to recognizing social-good as grant-worthy.

unicorn shower

some related items found on the Pinterest “unicorn” keyword search


The most prize-worthy ponies deserve reward, and I think it’s interesting to consider how we approach compensation when the goal of your work is social good. The resounding answer seems to be: we don’t.

Econ-Theorist David Graeber’s recent interview on the trends in our financial sector indicates that we rarely value work performed with altruistic motives, and that we waste most of our workforce on “bullshit jobs.” While our intentions might be genuine, study of our current workforce specialization schema indicates that we dole out few directly productive (as in “product-building”) positions, and most work is “administrative” or “managerial”: “…[l]ots of people [in Graeber’s interview pool] said their basic function was to create tasks for other people.” One quote that struck me as particularly insightful:

Geoff Shullenberger recently pointed out that in many companies, there’s now an assumption that if there’s work that anyone might want to do for any reason other than the money, any work that is seen as having intrinsic merit in itself, they assume they shouldn’t have to pay for it… ~David Graeber

You can read more about his provocative and well-argued perspective here, and while he applies his study to translation jobs, I think the scope can widen to anyone doing fulfilling, socially conscious, and context-driven journalism, globally; we’re all in the information translation/transformation/communication business at root.

You know, you’re describing what’s happened to journalism. Because people want to do it, it now pays very little. Same with college teaching. ~ Thomas Frank

Upshot: not compensating people doing good, critical, and socially beneficial things in the world is crippling our perspective on geopolitics and progress.

Problems with Ponies Abroad

Other than economic obstacles to pursuing social good, there are other hiccups in the hierarchies of investigative journalism that relate to how we privilege unicorns over the content they cover, and here we return to our discussion of mapping. When I was at a hackathon last month in Aarhus, Denmark, my team won the Guardian API award at the event not for building something incredibly revolutionary, but something quick that simplified news content into a digest for mobile journos.


Our app was called GeoNewsies, and its objective was to allow travelers to search by country and pull down a digest of the news in that nation prior to, or during, travel. A two-paneled webpage and Android app, it pulled in the top 10 articles from the Guardian relative to a particular place (panel left), next to the top trending tweet topics in that place (panel right); a bit like other RSS aggregate sites.
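
For the curious, the two-panel idea boils down to something like the sketch below. This is not the GeoNewsies code: the function name `build_digest`, the field names (`title`, `country`, `published`), the sample records, and the choice to rank articles by recency are all assumptions made for illustration, and the data is hard-coded rather than fetched from the Guardian or Twitter.

```python
from typing import Dict, List


def build_digest(
    country: str,
    articles: List[Dict[str, str]],
    trends: Dict[str, List[str]],
    limit: int = 10,
) -> Dict[str, List[str]]:
    """Assemble a two-panel digest for one country (hypothetical helper)."""
    wanted = country.strip().lower()
    # Left panel: articles tagged with the country, newest first.
    # ISO-8601 date strings sort correctly as plain text.
    matching = [a for a in articles if a["country"].lower() == wanted]
    matching.sort(key=lambda a: a["published"], reverse=True)
    left = [a["title"] for a in matching[:limit]]
    # Right panel: trending topics recorded for that country, if any.
    right = trends.get(wanted, [])[:limit]
    return {"left": left, "right": right}


# Invented sample data, for illustration only.
articles = [
    {"title": "Election update", "country": "Denmark", "published": "2014-06-02"},
    {"title": "Harbor festival opens", "country": "Denmark", "published": "2014-06-05"},
    {"title": "Transit strike", "country": "Kenya", "published": "2014-06-04"},
]
trends = {"denmark": ["#aarhus", "#valg"], "kenya": ["#nairobi"]}

digest = build_digest("Denmark", articles, trends)
print(digest["left"])
print(digest["right"])
```

A real version would swap the in-memory lists for API calls, but the flattening the post worries about is already visible here: whatever does not make the top `limit` simply disappears from the traveler's view.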


The interface was unstellar, simple, and arguably flattened the geo-political happenings in a place to a top 10 trends list, but our objective illustrated something tragic and important about how we process news media today, and maybe it’s not what you would expect. Our point wasn’t that people can only afford to read short blurbs and dramatic reductions of the richness available in pre-travel research, but more so: often, travelers fail to self-educate about the context they are about to enter, and this unfortunately extends even to traveling journalists working investigative beats abroad.


Sometimes, the best witness to activity in a particular place is someone on the ground and local; this is why so much social media analysis and source relations with citizen journalists remain important to our global understanding of news. Displacing a data journo-“unicorn” to code in a foreign environment is rarely as productive as sourcing information and accounts from the local population, and then enlisting the unicorns or racehorses to usher an idea to production; or better, training the local ponies and mules to race.

Scotland Tourism’s Sweater-Pony Campaign

Burak Arikan’s MonoVacation tourism visualizations speak to this touristic approach to documentation of place that has become our practice in journalism. Arikan built a projected mashup of the tourism videos/commercials of many nations, exploring typical symbols and their geo-contextual meanings relative to the nation of video production. Horses were a trend, repeatedly used in travel commercials to express freedom and tourist whimsy, perhaps. Abstracted a bit further from the original project focus, you might consider the horse comparison to data journalism as a sometimes apt description of investigative practice: short sprint production and reporting with often unfortunately abbreviated context: a tourist’s view of geo-politics. Often a foreign media outlet’s assessment of the on-the-ground occurrence in one place lacks the depth of historical and hyperlocal understanding that social media reporting/analysis can provide if controlled, curated, and harnessed to meaningful ends. Our attention span for international news is something that perhaps can’t be corrected, but our approach to economizing a broader range of opinion and local perspective is something that might be best achieved with social analysis and local data journalism training.

As someone who came rather late to code, I’m pretty comfortable advocating the premise that code can be trained, and not limited to the hierarchies of mythical creatures. I’d argue that researching for a story involves a healthy amount of logic that is more intuition and contextual/location knowledge than technical skill. Compelling news applications about a particular time and space are ones that root in a thorough knowledge of the geo-politics of a place, and often those come through most clearly from content generated by local mules, rather than unicorns.

Post-HorseRace: Project Persistence

It’s safe to say, however, that team assembly and the logic of our production pipeline aren’t the only concerns in developing sustainable news applications. With news apps, we deal in a particularly friable medium; one whose impact often limits to the extent that its API/library/dependency components have yet to deprecate. When we think about endurance and the persistence of applications, we sometimes think about the ephemerality of our work.

What happens when the horserace is over; how will we remember our efforts?

This worry is not new of course, and it’s one that’s been persistently suffered by media producers and providers globally. Born digital projects are so vulnerable to almost immediate atrophy, and while you may make history with a web-based piece, the probability of it outlasting even newsprint articles from 30 years ago is pretty pathetically weak.

We’re tackling that next month (July 23rd) at the 2014 Digital Preservation Conference in DC, so check it out if you’re interested. Our objective in presenting is both to survey the state of media production today and discuss preservation options, but also acknowledge some technological trends we should avoid. Contemporary product development is replete with light-bulb conspiracies of ‘planned obsolescence’ and, at the opposite spectral pole, stories of technology built for eternity. Somewhere in the middle, there’s a place for news apps in our geo-political history; a few pony programmers might just figure out how. 🙂


To sum up this (rather-too-longform) piece about pony personalities in the geo-newsroom, I’d say that a lot of our professional expectations as journalists and developers presume a few narrow ideas: firstly, that a simple taxonomy can define competence in global news coverage, secondly that companies can operate like parents, and thirdly that the integrity and sustainability of your work are secondary considerations to the general scheme and scope of a path defined by paternity.

I’ll close with a link to my MIT Civic Media Ignite slides (presentation, references); it’s a talk about teleportation and mapping, but no less fantastical than the expectations of data journos globally (that we tell the future, that we perform our pony tricks on demand, that we manage to t[rans/ele]port). An area of growing interest in the data journo world is how we manage to create compelling narratives about remote happenings, and often these are through our modern tools of teleportation (things like Ushahidi’s BRCK or OpenNews’ Keyblur for deploying networks without Internet, or applications like Crowdmap, CrisisNET, and Media Cloud Focus, helping us to understand global coverage and crowdsourcing context from operatives on the ground). These applications are among the suite of devices at our current disposal for feats of science fiction fantasy, bringing our ambitions of teleporting and unicorn reporting all the more close to our realities of remote monitoring and pony-journo practice.


538: Errors, Plotting Crises, and the Protocol of Re:Processing Data

There’s probably an HTTP error code for every situation; for this post, 538 seems to suit well. It’s a Windows error that returns a dialog about ABIOS (Basic I/O Subsystems), indicating invalid entries and corrupted drivers. Despite their obscurity to most of us, these are actually common and analogous issues in developing data projects for journalism… corrupted, dated, or invalid info being problematic in both cases. This is a post about one of those cases.


If you’ve been following journalistic tracking of the Nigeria kidnappings, then you might have come across 538, a collective of hackers and journalists that has been reporting on the topic and recently posted this set of maps using GDELT (Global Database of Events Language and Tone) data. This garnered a series of pretty solid rebuttals about the integrity of their assertions; see @charlie_simpson’s Storify feed and Daniel Solomon on Source. The problem with the piece in question (to summarize the previous links) is that it provides time-series and mapped analysis of kidnapping in Nigeria but skews representation of the actual data plotted.
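
To make the media-data-versus-incident-data distinction concrete, here is a minimal sketch of one sanity check: dividing the count of kidnapping reports by the total volume of reports in the same period, so that growth in coverage isn’t mistaken for growth in incidents. The function name `report_share` and every number below are invented for illustration; none of it is GDELT data or the 538 analysis, and a share of media reports is still not an incident count.

```python
from typing import Dict


def report_share(
    kidnapping_reports: Dict[int, int],
    total_reports: Dict[int, int],
) -> Dict[int, float]:
    """Return kidnapping reports as a fraction of all reports, per year."""
    shares = {}
    for year, count in sorted(kidnapping_reports.items()):
        total = total_reports.get(year, 0)
        # Skip years with no known total rather than divide by zero.
        if total > 0:
            shares[year] = count / total
    return shares


# Invented numbers, for illustration only: raw kidnapping reports quadruple,
# but so does the overall volume of reports collected.
kidnapping_reports = {2010: 100, 2011: 200, 2012: 400}
total_reports = {2010: 10_000, 2011: 20_000, 2012: 40_000}

shares = report_share(kidnapping_reports, total_reports)
for year, share in shares.items():
    print(year, kidnapping_reports[year], f"{share:.2%}")
```

In this made-up case the raw counts look like an explosion while the share stays flat, which is the kind of caveat a map of media mentions needs printed right next to it.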













As someone who works with journo orgs, crowdsourced crisis-mapping projects, data, and Africa, I thought I’d comment briefly on some of the fallibilities. The particular fumbles I see in the 538 representation of kidnapping incidents in Nigeria can be bundled under three issues that are persistently problematic in all data journalism projects.


A lot of issues with data mapping/graphing projects boil down to human representational error: what is your map actually showing and what are you saying it’s showing? In this case, the equivalence of GDELT media data and actual incident data is a superfail, but not only in the (mis)representation of the source used. The failure to buttress that representation with clear disclaimers and other data is also unfortunate, and worth commenting on here. The quotes below are taken from the 538 article in question.

Official kidnapping statistics for Nigeria aren’t available, and our numbers do provide a good relative picture; we can see where kidnappings in Nigeria are most prevalent.

This points to data paucity, which is fair, definitely a speedbump, but not entirely excusable. We’ve been spoiled perhaps by the assumption that everything should have a .csv download or an API endpoint, or that you can get all of the things from one aggregation feed, but some more context here would help.

The link in this quote, for example, should be bracketed in context; linking to a 404 (“aren’t available”) like this is unhelpful when you don’t know the query that led to it.


What about showing why/how your query was unsatisfactory? If you search in Prognoz (the Nigerian Statistical Open Data Portal that the 538 author used to search), you do find data under “Public Order and Safety” as a data category; indicators (search terms) like “kidnapping” result in graphs from 2006 onward.

Likewise, if a trend in one data set is notable, particularly a geographic density of “events” on the map, it’s worth looking at other data to supplement your assumptions.

One possible explanation is the region’s oil wealth, otherwise known as the curse of the black gold. The United Nations news service has also highlighted how oil extraction in the south of Nigeria has been accompanied by violence and criminality.

If a relationship to oil by region is of interest, Prognoz has data for that (Macro-Economic Data > Petroleum), or maybe there’s another relationship to geography worth exploring: topography, environmental influences. Perhaps a comparison with other mapping projects devoted to those data, like Oil Spill Monitor – Nigeria or flood and standing-water tracking in the regions where 538 notes a density of kidnappings, would be of interest. Are there other geographic factors that might affect crises worth exploring?

There’s a value in layering data sets and comparisons across mainstream and social media, and the real value of journalism’s take on these data is the comparative perspective it can provide, recognizing the weaknesses between data sets and using them to crosscheck each other rather than only “normalizing” to control for error in one set. 

This is a somewhat crude calculation. We’re counting all geolocated kidnappings in the GDELT database since 1982 and dividing that by each state’s current population.

So, does that mean that the current population in a region was the denominator for that division across all decades (because at the time of this post, the population link provided in the post doesn’t load)? Where is the data? How can people access it? Can I get a tooltip with counts and calcs in the time series (pretty sure CartoDB supports this; I mean, really, man.)?


This is the predominant criticism in both rebuttals, the refrain of all journo projects being a pretty neat alliterative philosophy: check, compare, contextualize.
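To make the denominator question concrete, here’s a minimal Python sketch of why it matters. Everything in it is an assumption for illustration: the counts and populations are made up (not GDELT or census figures), and `rate_per_100k` is a hypothetical helper, not anything from the 538 piece.

```python
# Hypothetical illustration: every number below is invented,
# NOT real GDELT output or Nigerian census data.

def rate_per_100k(event_count, population):
    """Events per 100,000 residents."""
    return event_count * 100_000 / population

# One imaginary state: media-reported kidnapping events per decade,
# and an assumed population for that same decade.
events_by_decade = {1980: 10, 1990: 20, 2000: 40, 2010: 80}
population_by_decade = {
    1980: 2_000_000,
    1990: 2_500_000,
    2000: 3_200_000,
    2010: 4_000_000,
}
current_population = population_by_decade[2010]

# Approach described in the quote: pool every event since the 1980s
# and divide by today's population.
pooled_rate = rate_per_100k(sum(events_by_decade.values()), current_population)

# Alternative: divide each decade's events by that decade's population.
decade_rates = {
    decade: rate_per_100k(count, population_by_decade[decade])
    for decade, count in events_by_decade.items()
}

print("pooled:", pooled_rate)
for decade, rate in sorted(decade_rates.items()):
    print(decade, rate)
```

With these invented numbers the pooled figure comes out higher than any single decade’s rate, because thirty-odd years of events are stacked over one population snapshot. Neither number is wrong, exactly, but they answer different questions, which is why the calculation needs to be shown alongside the map.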

Validate Your Data

As this has been well-covered by the other critics and is a pretty well-documented challenge in journalism (see: “verification by replication,” scientific method-style), I won’t belabor it here. Qualified outfits have written impressive how-tos (like this awesome one from ProPublica), though the process for bullet-proofing each piece is usually custom. There are also papers and projects like the Data Verification Handbook, and applications like Twittcred and Storyful aimed at affirming social media.

Early in this bullet-proofing process, it’s also helpful to take a look at comparative projects and use them to illustrate why your analysis is distinct, and how it contributes to a gap. Nigeria Security Tracker also has mapped violence and fatalities in a time series; Nigeria Watch provides a database of violence trends as well, and there are other authoritative and georeferenceable event data with downloadable datasets worth querying against to better verify GDELT.
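A rough sketch of what that kind of crosscheck could look like, in Python. The state names and counts are placeholders I made up, and `flag_divergence` and its 10-point threshold are hypothetical choices, not a method used by any of the projects above.

```python
# Hypothetical crosscheck: compare where media coverage concentrates
# against where an independent incident dataset concentrates.
# All names and numbers are placeholders, not real data.

def shares(counts):
    """Convert raw counts per state into each state's share of the total."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

def flag_divergence(media_counts, incident_counts, threshold=0.10):
    """Return states whose share of media mentions differs from their
    share of independently recorded incidents by more than `threshold`."""
    media = shares(media_counts)
    incidents = shares(incident_counts)
    flagged = {}
    for state in sorted(set(media) | set(incidents)):
        gap = media.get(state, 0.0) - incidents.get(state, 0.0)
        if abs(gap) > threshold:
            flagged[state] = round(gap, 2)
    return flagged

media_counts = {"State A": 60, "State B": 30, "State C": 10}
incident_counts = {"State A": 30, "State B": 30, "State C": 40}

flagged = flag_divergence(media_counts, incident_counts)
print(flagged)
```

A positive gap means a state gets more of the media conversation than of the recorded incidents; a negative gap means the reverse. A flag isn’t proof that either source is wrong, just a pointer to where the map needs a caveat or a closer look.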


Lastly, and predictably, there are always hiccups when plotting social and secondary media accounts as events.

what GDELT *will* tell you

Analytics on postings and general media circulation can be valuable for viewing the conversation around a topic online, but they can also be speciously spun to represent the density of actual crises or activity in an area. Counting the tweets related to #nigeria isn’t entirely useful for modeling a threat without filters or ways to validate those postings. Even GDELT, in its ambitious program “to provide the global research community with its first open global multi-decade quantitative database of human society,” is still researching how best to verify social data.

Let’s look at a more general example of mapping data. GDELT represents media activity around topics, like how Google Trends represents search activity on topics, but both can be confused with representing incidents. In the latter case, examples of secondary source and interpretive fumble abound.

Take Flu Trends:


or this Google Trends graph of a few JS libs (note the rise of Angular JS in recent times):

Angular vs. all other JS

What these graphs illustrate is not an actual density of flu incidents or a spike in public interest in Angular JS but rather the number of searches related to incidents, and perhaps public confusion about Angular JS. People who have the flu might also go straight to the doctor and not google it; people who understand and appreciate Angular are perhaps unlikely to google for Stack Overflow. Media discussion or focus on a topic does not always/often equate with actual activity, though the two are sometimes conflated.

Just as there’s a tendency to consider a social media campaign as solely-sufficient involvement in a crisis situation, there’s a tendency to tap a feed aggregation or media API as an authoritative representation of actual events. The distinction between social and mainstream media fuzzes when mainstream relies on social or secondary media as data, a problem in the 538 case, as they provide analysis of an aggregation feed of secondary media accounts of events.

Often, social media is incredibly powerful for plotting the general conversation about a topic (I’m looking at you, Westgate twitter tracking). Some of the most positive reactions to this crisis have been piloted by social media (#BringBackOurGirls), whose impact can be limited practically, but potentially epic as an indictment of what the government and mainstream media are doing comparatively. There’s little that’s more shameful in our digital world than having your government and formal press upstaged by hipster hashtag advocacy. That’s not to say, certainly, that these campaigns aren’t subject to their own epic blunders of failed verification (see: #yikes).

But beyond press campaigns and historical analyses of population/kidnapping trends, projects that pull in crowdsourced data are pretty impressively valuable for soliciting first-person information and sparking citizen-driven initiatives; Reuters’ blog just covered a bunch of them as relevant to the plight of Nigeria’s current victims. Ushahidi, for example, uses crowdsourced first-person reports that have been subcategorized and mapped by the admins of each instance’s deploy. It’s not a perfect representation of conflict, and it certainly has its limitations, but it is a distributed first-person reporting mechanism that can track violence relative to a geographic location depending on how the instance is customized. Secondary processors of this information can add a layer of interpretive error that weakens the integrity of the sources, if only by failing to admit their fallibilities. There are several Ushahidi projects that track violence in Nigeria, with their own foci and categorization schema (distinguishing between “trusted”/“verified” reports and public feeds), like Niger Delta Watch, Extrajudicial Killings – Nigeria, or Stop the Bribes, all of which provide first-person accounts of violence as mapped to regions in Nigeria.

No one is perfect all of the time, or capable of pleasing all the people, certainly. GDELT is an imperfect source of most things beyond tracking media reaction, so it fails in this effort to echo its output back as event data (see Source). However, media reaction is still interesting for other analyses, hence the media reaction to these maps; the integrity of a news organization and its output of (even aggregated) content is still worth indexing.

EOD, the ethics of data journalism and best practices haven’t been adequately codified for these kinds of stories. At last year’s Highway Africa conference, Peter Horrocks (BBC) talked about the best indices of quality media covering Africa being somewhere at the intersection of how an organization covers domestic events and how it covers its mistakes (see his full talk here). In this latter case, media reaction is important, if for a different reason. We’ll see how 538 reacts, and maybe learn something about how to manage future code-fumbles. I’m looking forward to more verification protocols: representational integrity, data bulletproofing, and secondary sourc-ery 😉 </ERROR>

* Thanks to J. Morgan, E. Constantaras, and J. Rotich for contributing data, time, and thoughts to this post

