I’ve just returned from the “Social, Cultural, and Ethical Dimensions of Big Data” event, held by the Data & Society Initiative (led by danah boyd), and spurred by the efforts of the White House Office of Science and Technology Policy to develop a comprehensive report on issues of privacy, discrimination, and rights around big data. And my head is buzzing. (Oh boy. Here he goes.) There must be something about me and workshops aimed at policy issues. Even though this event was designed to be wide-ranging and academic, I always get this sense of urgency or pressure that we should be working towards concrete policy recommendations. It’s something I rarely do in my scholarly work (to its detriment, I’d say, wouldn’t you?). But I don’t tend to come up with reasonable, incremental, or politically viable policy recommendations anyway. I get frustrated that the range of possible interventions feels so narrow, that so many players must be left untouched, so many underlying presumptions left unchallenged. I don’t want to suggest some progressive but narrow intervention, and in the process confirm and reify the way things are – though believe me, I admire the people who can do this. I long for there to be a robust vocabulary for saying what we want as a society and what we’re willing to change, reject, regulate, or transform to get it. (But at some point, if it’s too pie in the sky, it ceases being a policy recommendation, doesn’t it?) And this is especially true when it comes to daring to restrain commercial actors who are doing something that can be seen as publicly detrimental, but who somehow have a presumed right to engage in this activity because they have the right to profit. I want to be able to say, in some instances, “sorry, no, this simply isn’t a thing you get to profit on.”
All that said, I’m going to propose a policy recommendation. (It’s going to be a politically unreasonable one, you watch.)
I find myself concerned about this hazy category of stakeholders that, at our event, were generally called “data brokers.” There are probably different kinds of data brokers we might think about: companies that buy up and combine data about consumers; companies that scrape public data from wherever it is available and create troves of consumer profiles. I’m particularly troubled by the kind of companies that Kate Crawford discussed in her excellent editorial for Scientific American a few weeks ago — like Turnstyle, a company that has set up dummy wifi transponders in major cities to pick up all those little pings your smartphone gives off when it’s looking for networks. Turnstyle coordinates those pings into a profile of how you navigated the city (e.g., you and your phone walked down Broadway, spent twenty minutes in the bakery, then drove to the south side), then aggregates those navigation profiles into data about consumers and their movements through the city, and sells them to marketers. (OK, that is particularly infuriating.) What defines this category for me is that data brokers do not gather data as part of a direct service they provide to those individuals. Instead, they gather at a point once removed from the data subjects: purchasing the data gathered by others, scraping our public utterances or traces, or tracking the evidence of our activity we give off. I don’t know that I can be much more specific than that, or that I’ve captured all the flavors, in part because I’ve only begun to think about them (oh good, then this is certain to be a well-informed suggestion!) and because they are a shadowy part of the data industry, relatively distant from consumers, with little need to advertise or maintain a particularly public profile.
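To make concrete what this kind of passive collection amounts to, here is a minimal sketch of how captured wifi probe pings could be grouped into per-phone movement profiles. This is an illustration of the general technique, not Turnstyle’s actual system; the MAC addresses, locations, and timestamps are entirely invented.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical probe-request records: (device MAC, sensor location, timestamp).
# A tracker captures these passively; your phone emits them automatically.
pings = [
    ("aa:bb:cc:dd:ee:ff", "Broadway & 4th", "2014-03-17T09:02"),
    ("aa:bb:cc:dd:ee:ff", "Bakery on Broadway", "2014-03-17T09:10"),
    ("11:22:33:44:55:66", "South Side Mall", "2014-03-17T09:12"),
    ("aa:bb:cc:dd:ee:ff", "South Side Mall", "2014-03-17T09:45"),
]

def movement_profiles(pings):
    """Group pings by device and sort by time: one navigation trace per phone."""
    traces = defaultdict(list)
    for mac, place, ts in pings:
        traces[mac].append((datetime.fromisoformat(ts), place))
    return {mac: [p for _, p in sorted(events)] for mac, events in traces.items()}

profile = movement_profiles(pings)
print(profile["aa:bb:cc:dd:ee:ff"])
# ['Broadway & 4th', 'Bakery on Broadway', 'South Side Mall']
```

The point of the sketch is how little is needed: no account, no consent, no service relationship, just a unique hardware identifier broadcast in public and a few listening posts.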
I think these stakeholders are in a special category, in terms of policy, for a number of reasons. First, they are important to questions of privacy and discrimination in data, as they help to move data beyond the settings in which we authorized its collection and use. Second, they are outside of traditional regulations that are framed around specific industries and their data use (like HIPAA provisions that regulate hospitals and medical record keepers, but not data brokers who might nevertheless traffic in health data). Third, they’re a newly emergent part of the data ecosystem, so they have not been thought about in the development of older legislation. But most importantly, they are businesses that offer no social value to the individuals or the society whose data is being gathered. (Uh oh.) In all of the more traditional instances in which data is collected about individuals, there is some social benefit or service presumed to be offered in exchange. The government conducts a census, but we authorized that, because it is essential to the provision of government services: proportional representation of elected officials, fair imposition of taxation, etc. Verizon collects data on us, but they do so as a fundamental element of the provision of telephone service. Facebook collects all of our traces, and while that data is immensely valuable in its own right and to advertisers, it is also an important component in providing their social media platform. I am by no means saying that there are no possible harms in such data arrangements (I should hope not), but at the very least, the collection of data comes with the provision of service, and there is a relationship (citizen, customer) that provides a legally structured and sanctioned space for challenging the use and misuse of that data — a class action lawsuit, regulatory oversight, protest, or just switching to another phone company. (Have you tried switching phone companies lately?)
Some services that collect data have even voluntarily sought to do additional, socially progressive things with that data: Google looking for signs of flu outbreaks, Facebook partnering with researchers looking to encourage voting behavior, even OkCupid giving us curious insights about the aggregate dating habits of their customers. (You just love infographics, don’t you.) But the third-party data broker who buys data from an e-commerce site I frequent, or scrapes my publicly available hospital discharge record, or grabs up the pings my phone emits as I walk through town, is building commercial value on my data while offering no value to me, my community, or society in exchange.
So what I propose is a “pay it back tax” on data brokers. (Huh?! Does such a thing exist, anywhere?) If a company collects, aggregates, or scrapes data on people, and does so not as part of a service back to those people (but is that distinction even a tenable one? who would decide and patrol which companies are subject to this requirement?), then they must grant access to their data and devote 10% of their revenue to non-profit, socially progressive uses of that data. They could partner with a non-profit, providing it funds and access to data to conduct research. Or, they could make the data and dollars available as a research fund that non-profits and researchers could apply for. Or, as a nuclear option, they could avoid the financial requirement by providing an open API to their data. (I thought your concern about these brokers was that they aggravate the privacy problems of big data, but you’re making them spread that collected data further?) I think there could be valuable partnerships: Turnstyle’s data might be particularly useful for community organizations concerned about neighborhood flow or access for the disabled; health data could be used by researchers or activists concerned with discrimination in health insurance. There would need to be parameters for how that data was used and protected by the non-profits who received it, and perhaps an open access requirement for any published research or reports.
This may seem extreme. (I should say so. Does this mean any commercial entity in any industry that doesn’t provide a service to customers should face a similar tax?) Or, from another vantage point, it could be seen as quite reasonable: companies that collect data on their own have to spend an overwhelming amount of their revenue providing whatever service it is that justifies this data collection; governments that collect data on us are in our service, and make no profit. This asks for merely 10%, and the sharing of a valuable resource. (No, it still seems extreme.) And, if I were aiming more squarely at the concerns about privacy, I’d be tempted to say that data aggregation and scraping could simply be outlawed. (Somebody stop him!) In my mind, it at the very least reasserts the idea that collecting data on individuals, and using it as a primary resource from which to profit, must on balance provide some service in return, be it customer service, social service, or public benefit.
This is cross-posted from the Social Media Collective blog at Microsoft Research, New England.