Gizmodo has released two important articles (1, 2) about the people who were hired to manage Facebook’s “Trending” list. The first reveals not only how Trending topics are selected and packaged on Facebook, but also the peculiar working conditions this team experienced, the lack of guidance or oversight they were provided, and the directives they received to avoid news that addressed Facebook itself. The second makes a more pointed allegation: that along the way, conservative topics were routinely ignored, meaning the trending algorithm had identified user activity around a particular topic, but the team of curators chose not to publish it as a trend.
This is either a boffo revelation, or an unsurprising look at how the sausage always gets made, depending on your perspective. The promise of “trends” is a powerful one. Even as the public gets more and more familiar with the way social media platforms work with data, and even with more pointed scrutiny of trends in particular, it is still easy to think that “trends” means an algorithm is systematically and impartially uncovering genuine patterns of user activity. So, to discover that a handful of j-school graduates were tasked with surveying all the topics the algorithm identified, choosing just a handful of them, and dressing them up with names and summaries, feels like an unwelcome intrusion of human judgment into what we wish were analytic certainty. Who are these people? What incredible power they have to dictate what is and is not displayed, what is and is not presented as important! Wasn’t this supposed to be just a measure of what users were doing, what the people found important? Downplaying conservative news is the most damning charge possible, since it has long been a commonplace accusation leveled at journalists. But the revelation is that there are people in the algorithm at all.
But the plain fact of information algorithms like the ones used to identify “trends” is that they do not work alone, they cannot work alone — in so many ways that we must simply discard the fantasy that they do, or ever will. In fact, algorithms do surprisingly little, they just do it really quickly and with a whole lot of data. Here’s some of what they can’t do:
Trending algorithms identify patterns in data, but they can’t make sense of it. The raw data is Facebook posts, likes, and hashtags. Looking at this data, there will certainly be surges of activity that can be identified and quantified: words that show up more than other words, posts that get more likes than other posts. But there is so much more to figure out…
(1) What is a topic? To decide how popular a topic is, Facebook must decide which posts are about that topic. When do two posts or two hashtags represent the same story, such that they should be counted together? An algorithm can only do so much to say whether a post about Beyonce and a post about Bey and a post about Lemonade and a post about QueenB and the hashtag BeyHive are all the same topic. And that’s an easy one, a superstar with a distinctive name, days after a major public event. Imagine trying to determine algorithmically if people are talking about European tax reform, enough to warrant calling it a trend.
(2) Topics are also composed of smaller topics, endlessly down to infinity. Is the Republican nomination process a trending topic, or the Indiana primary, or Trump’s win in Indiana, or Paul Ryan’s response to Trump’s win in Indiana? According to one algorithmic threshold these would be grouped together; by another they would be separate. The problem is not that an algorithm can’t tell. It’s that it can support both interpretations, all interpretations, equally well. So, an algorithm could be programmed to decide, to impose a particular threshold for the granularity of topics. But would that choice make sense to readers, would it map onto their own sense of what’s important, and would it work for the next topic, and the next?
(3) How should a topic be named and described, in a way that Facebook users would appreciate or even understand? Computational attempts to summarize are notoriously clunky, and often produce the kind of phrasing and grammar that scream “a computer wrote this.”
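To make the granularity problem in (2) concrete, here is a toy sketch in Python. It is not Facebook’s method, which the Gizmodo articles do not describe at this level; the function names, the keyword sets standing in for posts, and the thresholds are all invented for illustration. It groups posts whenever their keyword overlap (Jaccard similarity) meets a threshold, and shows that the same four posts become one topic, two topics, or four, depending only on where that threshold is set.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of keywords two posts have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def group_topics(posts, threshold):
    """Single-link grouping: any two posts whose overlap meets the
    threshold land in the same topic, directly or through a chain."""
    parent = list(range(len(posts)))

    def find(i):
        # Follow parent links to the group's representative post.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(posts)), 2):
        if jaccard(posts[i], posts[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical posts, reduced to keyword sets (an assumption of this sketch).
posts = [
    {"republican", "nomination", "primary"},          # 0: nomination process
    {"indiana", "primary", "republican"},             # 1: Indiana primary
    {"trump", "win", "indiana", "primary"},           # 2: Trump's win in Indiana
    {"ryan", "response", "trump", "win", "indiana"},  # 3: Ryan's response
]

for threshold in (0.3, 0.45, 0.6):
    print(threshold, group_topics(posts, threshold))
# 0.3  -> [[0, 1, 2, 3]]        one sprawling topic
# 0.45 -> [[0, 1], [2, 3]]      two topics
# 0.6  -> [[0], [1], [2], [3]]  four separate topics
```

Every one of these groupings is a defensible reading of the same data. Nothing in the code says which threshold is right; someone has to choose it.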
What trending algorithms can identify isn’t always what a platform wants to identify. Facebook, unlike Twitter, chose to display trends that identify topics, rather than single hashtags. This was already a move weighted towards identifying “news” rather than topics. It already strikes an uneasy balance between the kind of information they have (billions of posts and likes surging through their system) and the kind they’d like to display (a list of the most relevant topics). And it already sets up an irreconcilable tension: what should they do when user activity is not a good measure of public importance? It is not surprising, then, that they’d try to focus on articles being circulated and commented on, and from the most reputable sources, as a way to lean on those sources’ curation and authority to pre-identify topics. Which opens up, as Gizmodo identifies, the tendency to discount some sources as non-reputable, which can have unintentionally partisan implications.
“Trending” is also being asked to do a lot of things for Facebook: capture the most relevant issues being discussed on Facebook, and conveniently map onto the most relevant topics in the worlds of news and entertainment, and keep users on the site longer, and keep up with Twitter, and keep advertisers happy. In many ways, a trending algorithm can be an enormous liability, if allowed to be: it could generate a list of dreadful or depressing topics; it could become a playground for trolls who want to fill it with nonsense and profanity; it could reveal how little people use Facebook to talk about matters of public importance; it could reveal how depressingly little people care about matters of public importance; and it could help amplify a story critical of Facebook itself. It would take a whole lot of bravado to set that loose on a system like Facebook, and let it show what it shows unmanaged. Clearly, Facebook has a lot at stake in producing a trending list: while the list should look like an unvarnished report of what users are discussing, Facebook must also massage it into something that represents the company well.
So: people are in the algorithm because how could they not be? People produce the Facebook activity being measured, people design the algorithms and set their evaluative criteria, people decide what counts as a trend, people name and summarize them, and people look to game the algorithm with their next posts.
The thing is, these human judgments are all part of traditional news gathering as well. Choosing what to report in the news, how to describe it and feature it, and how to honor both the interests of the audience and the sense of importance, has always been a messy, subjective process, full of gaps in which error, bias, self-interest, and myopia can enter. The real concern here is not that there are similar gaps in Facebook’s process as well, or that Facebook hasn’t yet invented an algorithm that can close those gaps. The real worry is that Facebook is being so unbelievably cavalier about it.
Traditional news organizations face analogous problems and must make analogous choices, and can make analogous missteps. And they do. But two countervailing forces work against this, keep them more honest than not, more on target than not: a palpable and institutionalized commitment to news itself, and competition. I have no desire to glorify the current news landscape, which in many ways produces news that is dishearteningly less than what journalism should be. But there is at least a public, shared, institutionally rehearsed, and historical sense of purpose and mission, or at least there’s one available. Journalism schools teach their students about not just how to determine and deliver the news, but why. They offer up professional guidelines and heroic narratives that position the journalist as a provider of political truths and public insight. They provide journalists with frames that help them identify the way news can suffer when it overlaps with public relations, spin, infotainment, and advertising. There are buffers in place to protect journalists from the pressures that can come from upper management, advertisers, or newsmakers themselves, because of a belief that independence is an important foundation for newsgathering. Journalists recognize that their choices have consequences, and they discuss those choices. And there are stakeholders who regularly check these efforts for possible bias and self-interest: public editors and ombudspeople, newswatch organizations and public critics, all trying to keep the process honest. Most of all, there are competitors who would gleefully point out a news organization’s mistakes and failures, which gives editors and managers real incentive to work against the temptations to produce news that is self-serving, politically slanted, or commercially craven.
Facebook seemed to have thought of absolutely none of these. Based on the revelations in the two Gizmodo articles, it’s clear that they hired a shoestring team, lashed them to the algorithm, offered little guidance for what it meant to make curatorial choices, provided no ongoing oversight as the project progressed, imposed self-interested guidelines to protect the company, and kept the entire process inscrutable to the public, cloaked in the promise of an algorithm doing its algorithm thing.
The other worry here is that Facebook is engaged in a labor practice increasingly common in Silicon Valley: hiring information workers through third parties, under precarious conditions and without access to the institutional support or culture their full-time employees enjoy, and imposing time and output demands on them that can only fail a task that warrants more time, care, expertise, and support. This is the troubling truth about information workers in Silicon Valley and around the world, who find themselves “automated” by the gig economy: not just clickworkers on Mechanical Turk and drivers on Uber, but even those “inside” the biggest and most established companies on the planet. It is also a dangerous tendency given the kind and scale of information projects that tech companies are willing to take on, without having the infrastructure and personnel to adequately support them. It is not uncommon now for a company to debut a new feature or service, only weeks in development and supported only by its design team, with the assumption that it can quickly hire and train a team of independent, hourly workers. Not only does this put a huge onus on those workers, but it means that, if the service finds users and begins to scale up quickly, little preparation is in place, and the overworked team must quickly make some ad hoc decisions about what are often tricky cases with real, public ramifications.
Trending algorithms are undeniably becoming part of the cultural landscape, and revelations like Gizmodo’s are helpful steps in helping us shed the easy notions of what they are and how they work, notions the platforms have fostered. Social media platforms must come to fully realize that they are newsmakers and gatekeepers, whether they intend to be or not, whether they want to be or not. And while algorithms can chew on a lot of data, it is still a substantial, significant, and human process to turn that data into claims about importance that get fed back to millions of users. This is not a realization that they will ever reach on their own — which suggests to me that they need the two countervailing forces that journalism has: a structural commitment to the public, imposed if not inherent, and competition to force them to take such obligations seriously.
Addendum 1 (May 8): Techcrunch is reporting that Facebook has responded to Gizmodo’s allegations, suggesting that it has “rigorous guidelines in place for the review team to ensure consistency and neutrality.” This makes sense. But while consistency and neutrality are fine as concepts, they’re vague and insufficient in practice. There could have been Trending curators at Facebook who deliberately tanked conservative topics and knew that doing so violated policy. But (and this has long been known in the sociology of news) the greater challenge in producing the news, whether generating it or just curating it, is how to deal with the judgments that happen while being consistent and neutral. Making the news always requires judgments, and judgments always incorporate premises for assessing the relevance, legitimacy, and coherence of a topic. Recognizing bias in our own choices or across an institution is extremely difficult, but knowing whether you have produced a biased representation of reality is nearly impossible, as there’s nothing to compare it to. And that is setting aside that Facebook is actually trying to do something even harder: produce a representation of the collective representations of reality of their users, and ensure that somehow it also represents reality, as other reality-representers (be they CNN or Twitter users) have represented it. Were social media platforms willing to acknowledge that they constitute public life rather than hosting or reflecting it, they might look to those who produce news, educate journalists, and study news as a sociological phenomenon, for help thinking through these challenges.
Addendum 2 (May 9): The Senate Committee on Commerce, Science, and Transportation has just filed an inquiry with Facebook, raising concerns about their Trending Topics based on the allegations in the Gizmodo report. The letter of inquiry is available here, and has been reported by Gizmodo and elsewhere. In the letter they ask Mark Zuckerberg and Facebook to respond to a series of questions about how Trending Topics works, what kind of guidelines and oversight they provided, and whether specific topics were sidelined or injected. Gizmodo and other sites are highlighting the fact that this Committee is run by a conservative and has a majority of members who are conservative. But the questions posed are thoughtful ones. What they make so clear is that we simply do not have a vocabulary with which to hold these services accountable. For instance, they ask “Have Facebook news curators in fact manipulated the content of the Trending Topics section, either by targeting news stories related to conservative views for exclusion or by injecting non-trending content?” Look at the verbs. “Manipulated” is tricky, as it’s not exactly clear what the unmanipulated Trending Topics even are. “Targeting” sounds like they excluded stories, when what Gizmodo reports is that some stories were not selected as trending, or not recognized as stories. If trending algorithms can only highlight possible topics surging in popularity, but Facebook and its news curators constitute that data into a list of topics, then language that takes trending to be a natural phenomenon, that Facebook either accurately reveals or manipulates, can’t quite grip how this works and why it is so important. It is worth noting, though, that the inquiry pushes on how (whether) Facebook is keeping records of what is selected: “Does Facebook maintain a record of curators’ decisions to inject a story into the Trending Topics section or target a story for removal? If such a record is not maintained, can such decisions be reconstructed or determined based on an analysis of the Trending Topics product? a. If so, how many stories have curators excluded that represented conservative viewpoints or topics of interest to conservatives? How many stories did curators inject that were not, in fact, trending? b. Please provide a list of all news stories removed from or injected into the Trending Topics section since January 2014.” This approach, I think, does emphasize to Facebook that these choices are significant, enough so that they should be treated as part of the public record and open to scrutiny by policymakers or the courts. This is a way of demanding that Facebook take its role in this regard more seriously.