One of the most exciting aspects of the new research analyzing algorithmic culture is the manner by which portions of that research are increasingly routing around the slow, painful process of traditional academic knowledge production. This is not to say that the credentialing process afforded by peer-reviewed journals or university presses is irrelevant, but it is to say that this aspect of scholarly work is only one form among the many others now emerging both on- and offline. Overviews of stand-alone talks are now available through a variety of blogs, while a double session at this year’s 4S on “The Politics of Algorithms” was illuminating for those of us lucky enough to attend. Journals like Limn blur the boundaries between old and new forms of scholarship. All of this is just a long way of saying that, since I first alluded to the notion of a “sociology of algorithms” in 2011, I’ve been stimulated by my contact with a much wider community doing great work on these issues, much of which I’ve encountered through this very blog.
Amidst all this ferment, though, it has become increasingly clear that my own research on algorithmic knowledge production is somewhat idiosyncratic, to say the least. When people ask me, “what is your next book about?” I usually answer by telling them that it is “a cultural history of journalistic uses of data / documents / algorithms,” with the choice of noun depending on my mood that day and whether I’m talking to a historian of print culture (“documents,” I say) or someone in software studies (“algorithms,” I claim). To echo Chris Kelty’s wonderful introduction to the second issue of Limn, what I’m really interested in is “the nature of representing and intervening in collectives”; in particular, studying how that representation is carried out by a particular occupational group (journalists) drawing on particular epistemological procedures and, perhaps most importantly, utilizing a variety of socio-technical artifacts as the raw data through which this representation occurs. And it is to the question of this raw data, the stuff that lies beneath computational, epistemological, or algorithmic processes, that my mind increasingly turns.
We need, in short, to pay attention to the materiality of algorithmic processes. By that, I do not simply mean the materiality of the algorithmic processing (the circuits, server farms, internet cables, super-computers, and so on) but also the materiality of the procedural inputs. The stuff that the algorithm mashes up, rearranges, and spits out. The objects of evidence at play within particular computational processes, and the way that particular occupational epistemologies come to terms with that evidence. This is where I think my own contribution to this conversation fits.
* * *
I want to illustrate what I mean by the materiality of algorithms through two very brief examples: the first taken from popular culture, and the second taken from my own ongoing research on the ontologies of journalism. In the aftermath of the 2012 election, in a development that would take even the most jaded political observer by surprise, stats guru and supposed quantitative wunderkind Nate Silver has achieved something close to cultural deification (for an example of his increasingly public persona, check out the Twitter hashtag #drunknatesilver). His accomplishment? Correctly predicting the electoral college outcome of every single state in the 2012 presidential election. I can personally testify to Silver’s impact; when I showed my students slides of the 538 prediction, followed by the actual map of the electoral results, they let out an audible gasp of surprise.
The most interesting pushback against the blossoming Silver cult comes from Slate. “Nate Silver didn’t nail it; the pollsters did,” writes Daniel Engber. “The vaunted Silver ‘picks’—the ones that scored a perfect record on Election Day—were derived from averaged state-wide data.” In other words, the focus should not be on the 538 algorithm for predicting electoral college “tipping point states,” but rather on the objects of evidence that algorithm uses to produce its results: the polls. And these polls are themselves built on a bedrock of particular evidentiary objects, namely, the quick interrogation of particular members of the voting populace about their electoral preferences, usually by telephone. The fact that polling has a long and (mostly) noble history in making predictions should not obscure the fact that it is simply one method amongst many for aggregating preferences, expressed in a particularly material format. As a point of comparison, look at the rise in the popularity of Intrade (which uses market-based cues to make political predictions) or the growth of social media analytics, some of which claim to utilize tweets to analyze electoral behavior. In each of these cases, not only are the epistemological processes and algorithmic procedures different, but the material substrata of evidence are different as well.
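Engber’s point, that the “picks” fall out of simple averaging of state polls rather than from any exotic algorithm, can be made concrete with a toy sketch. This is not Silver’s model (which weights pollsters and simulates outcomes); the poll numbers below are invented purely for illustration:

```python
# Toy sketch of poll averaging: call each state for whichever candidate
# leads the mean of its polls. Poll figures are invented for illustration.

def predict_states(polls):
    """polls maps state -> list of (dem_share, rep_share) from individual polls."""
    picks = {}
    for state, readings in polls.items():
        dem_avg = sum(d for d, _ in readings) / len(readings)
        rep_avg = sum(r for _, r in readings) / len(readings)
        picks[state] = "D" if dem_avg > rep_avg else "R"
    return picks

polls = {
    "Ohio":    [(50, 48), (49, 47), (51, 46)],
    "Florida": [(49, 50), (48, 49)],
}
print(predict_states(polls))  # {'Ohio': 'D', 'Florida': 'R'}
```

The epistemological weight here rests almost entirely on the input data, the polls themselves, rather than on the arithmetic that aggregates them.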
* * *
As a second example of what I mean by the materiality of algorithms, let me turn to a topic closer to my current research. One of my sites of ongoing analysis is a journalistic tool called the Overview Project, developed by computer scientist-turned-journalist Jonathan Stray. Overview is a tool that allows journalists to “clean, visualize and interactively explore large document and data sets,” in essence through clustering, threaded displays, and entity relationship diagrams. Overview thus has a particular focus on documents, which themselves have a strange and complex standing within the larger epistemology of journalistic reporting.
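To give a flavor of what document clustering of this kind involves, here is a schematic toy sketch: represent each document as a bag of words and group documents whose word overlap (cosine similarity) exceeds a threshold. This is emphatically not Overview’s code, whose pipeline is far more sophisticated, just an illustration of the underlying idea:

```python
# Toy illustration of document clustering by term overlap. Each document
# becomes a bag-of-words vector; documents whose cosine similarity to a
# cluster's first member exceeds a threshold join that cluster.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.5):
    vecs = [Counter(d.lower().split()) for d in docs]
    clusters = []
    for i, v in enumerate(vecs):
        for group in clusters:
            if cosine(v, vecs[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [
    "contractor incident report baghdad",
    "incident report baghdad contractor shooting",
    "budget memo procurement office",
]
print(cluster(docs))  # [[0, 1], [2]]: the two incident reports group together
```

Even in this toy form, the dependence on the material inputs is visible: the clusters are only as good as the text that goes in, which is exactly where OCR mess and layout damage intrude.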
The material underpinnings of the Overview algorithm are obvious when watching it work. As Stray notes, while “the right algorithms are crucial … things like data clean-up and import are often bigger real-world obstacles to getting something out of a huge document dump.” We can see the particularities of document-based algorithmic processes when we examine the process by which Stray prepared 4500 pages of documents on Iraqi security contractors for analysis. One particularly thorny problem turned out to be document cover pages:
The recovered text [from these documents] is a mess, because these documents are just about the worst possible case for OCR [optical character recognition]: many of these documents are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.
The next step is combining pages into their original multi-page documents. We don’t yet have a general solution, but we were able to get good results with a small script that detects cover pages, and splits off a new document whenever it finds one.
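One can imagine what such a splitting script looks like in miniature. The sketch below is a hypothetical reconstruction, not Stray’s actual code, and the cover-page heuristic (a routing header string) is an invented stand-in for whatever pattern the real documents exhibited:

```python
# Hypothetical sketch of the step Stray describes: walk through OCR'd pages
# in order and start a new document whenever a page looks like a cover page.
# The marker string is an invented stand-in, not the real script's heuristic.

def looks_like_cover(page_text):
    # Invented heuristic: suppose cover pages in this dump carry a routing header.
    return "MEMORANDUM FOR" in page_text.upper()

def split_into_documents(pages):
    documents, current = [], []
    for page in pages:
        if looks_like_cover(page) and current:
            documents.append(current)  # close out the previous document
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents

pages = ["Memorandum for record ...", "details p2",
         "Memorandum for cdr ...", "details p2"]
print(len(split_into_documents(pages)))  # 2 documents of 2 pages each
```

Notice how completely the logic depends on the physical conventions of the paperwork: the script works only because government record keeping happens to put a recognizable cover sheet on each document.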
In other words, while the algorithmic processing of large document sets lies at the root of my interest in Overview, these processes are themselves shaped, understood, and transformed through the specific material forms of government record keeping.
* * *
The question of the materiality of algorithms, finally, casts some interesting light on recent debates right here at Culture Digitally, particularly those on affordances and technological agency. I also caught whispers of this debate in the first Politics of Algorithms panel at this year’s 4S. Basically, the question is: when we say our focus is on the materiality of algorithms, or on the “objects of journalism,” are we simply reverting to a modernist ontology in which the existence of the “thingness of things” (Heidegger) is taken as unproblematic, or even “deterministic”?
Without the space to answer these important questions here, I can simply gesture at one possible answer. To the degree that “determinism” of any kind is seen as bad, it is usually because it flattens causality in unforgivable ways, sacrificing context and nuance on the altar of a single causal force. When I argue that we should pay more attention to the materiality of algorithms, I want to encourage the reverse: not to flatten the process by which knowledge is socio-technically constructed, but rather to add new objects to the universe of algorithmic operation. To understand algorithms we need to analyze their affordances, meanings, ethics, legal infrastructures, and computational processes, but we also need to understand the very material fuel that keeps them powered in the first place.