Reddit: Centralization erodes privacy

Nick Bolduc
on Jun 01, 2021

"the internet, where everybody has a chance of being heard, a more democratic system"
- Aaron Swartz, co-founder of Reddit

Reddit is a social media platform, news aggregator, collection of several thousand forums, cat picture distributor, and occasional host of original content that styles itself as "the front page of the internet." Reddit has 50 billion monthly views (this is like everybody in the world checking Reddit 1-2 times a week) and 52 million daily active users.[0]
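As a quick sanity check on that "1-2 times a week" comparison, here is a back-of-the-envelope sketch. The world population figure (roughly 7.8 billion in 2021) and the average month length are assumptions, not numbers from Reddit.

```python
# Figure quoted in the text above.
monthly_views = 50e9

# Assumption (not from Reddit): world population of roughly 7.8 billion in 2021.
world_population = 7.8e9

# Assumption: an average month is about 30.44 days long.
weeks_per_month = 30.44 / 7

views_per_person_per_week = monthly_views / world_population / weeks_per_month
print(f"{views_per_person_per_week:.2f} views per person per week")
```

Under those assumptions the result lands between one and two views per person per week, which matches the comparison.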

In the early stages of its life, Reddit was known as a defender of privacy and anonymity, and an opponent of censorship. As it has grown, Reddit's centralized nature has led to selling ads, collecting user data, and censoring on behalf of (and surrendering user data to) state powers.

Judging by its privacy policy, Reddit is not taking as much license with your data as, e.g., WhatsApp, but its centralized nature has made some of these problems inevitable. It's a large company that needs to be sustainable; to sell ads, it collects personal information about users that it doesn't strictly need in order to be Reddit; and because it is a centralized point of permissioned access to this data, governments can demand access.

On concessions of personal data, an important question is "What would it mean for me, if all of the data I give this company is made public?"

Surrendering personal data is not just a question of what a company will do with it; it is a question of what the world would do with your data: what your neighbors would do with your data, what your professors would do with your data, what your students would do with your data... but also, what a person trying to blackmail you would do with your data. What a government trying to capture or execute whistleblowers would do with your data. What an advertising agency trying to control your behavior would do with your data.

If this makes you think twice about what you're doing, reading, or posting, that's a well-known consequence of surveillance, and Reddit, through centralization, advertising, and government mandates, does surveil its users.

Because Reddit is a large company that (presumably) needs to sustain itself on more than just Reddit Gold payments—or is otherwise just trying to maximize profit, as centralized for-profit companies typically do—Reddit sells advertising space. It sweetens the deal by offering personalized and optimized ads, and analytics information to advertisers.

"Reddit only shares nonpublic information about you in the following ways. [....]
We may share information with vendors, consultants, and other service providers who need access to such information to carry out work for us. Their use of personal data will be subject to appropriate confidentiality and security measures. A few examples: [...] (iii) third-party ads measurement providers who help us and advertisers measure the performance of ads shown on our Services."

"We use information about you to: [...]
• Monitor and analyze trends, usage, and activities in connection with our Services;
• Measure the effectiveness of ads shown on our Services; and
• Personalize the Services, and provide and optimize advertisements, content, and features that match user profiles or interests. [source]"

This advertising leads to a regularization of content via a process of progressive censorship, as financially powerful advertisers condition partnering with Reddit on removal of content that conflicts with their brand images.

This is one way that centralization leads to a separation of user and company incentives. As Facebook and Google have swelled to their near omnipresent states, their operations involve more and more data collection, analysis, user profiling, inference, advertising sales, etc. The social media platform and search engine evidently still exist (at time of writing 😈), but instead of improving these core services to provide a better user experience, improvements are done to these other ends. User experience is only a prerequisite, if that (see how Google feels about user location data, for example).

There are some arguments that advertising isn't inherently misaligned with user incentives, but ultimately it comes down to a question of whether advertisers can express a desire to censor content that eclipses users' desire to see that content (and whether the host platform, Reddit, thinks it can cope with the resulting alienation of users). Advertisers often do desire to censor content, because they want to protect their brand image.

For example, consider:

  1. Users want to see cat pics
  2. Dog Company Ltd. wants to advertise on Reddit
  3. Dog Company Ltd. doesn't want its brand image near cat pics
  4. Dog Company Ltd. pays Reddit for ad space conditional on censorship of cat pics

Whether or not users are well served by Dog Company Ltd. ads, they can no longer see cat pics on Reddit. Repeat this process as a centralized company grows, and it ends up serving only content from the intersection of what each of its advertisers considers "acceptable content."
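To make the shrinking-intersection idea concrete, here is a minimal sketch using plain Python sets. Apart from Dog Company Ltd. and its dislike of cat pics, every advertiser name and content category below is hypothetical and exists only for illustration.

```python
# Hypothetical content categories, for illustration only.
all_content = {"cat pics", "dog pics", "news", "memes", "politics"}

# Each advertiser's set of content it is willing to appear next to.
# Only Dog Company Ltd. comes from the example above; the rest are made up.
advertiser_acceptable = {
    "Dog Company Ltd.": all_content - {"cat pics"},
    "Hypothetical Bank": all_content - {"politics"},
    "Hypothetical Soda": all_content - {"news", "politics"},
}

def servable_content(all_content, advertiser_acceptable):
    """Return the content that every advertiser accepts at the same time."""
    servable = set(all_content)
    for acceptable in advertiser_acceptable.values():
        servable &= acceptable  # keep only what this advertiser also allows
    return servable

remaining = servable_content(all_content, advertiser_acceptable)
print(sorted(remaining))  # ['dog pics', 'memes']
```

Each additional advertiser can only shrink the result or leave it unchanged. It can never add content back.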

And because advertisers, not users, pay Reddit, the realm of acceptable content will continue shrinking.[1] Should our moral system for acceptable content be centralized in the hands of advertisers?

Because Reddit collects personal information to "analyze trends, usage, and activities," "measure the effectiveness of ads," and "provide and optimize advertisements [...] that match user profiles or interests," it's likely[2] that Reddit has personal information that it doesn't strictly need to act as "the front page of the internet" (i.e. provide its non-advertising functionality). This is in contrast to personal information that is inextricably linked to the general function of Reddit, for example: "If you create a Reddit account, we may require you to provide a username and password."

Reddit is a centralized entity. Reddit controls this data: both the "necessary" and the "unnecessary" personal information collected from 52 million daily active users.[0]

Concentrations of user information like this pose some serious vulnerabilities to user privacy.

For example, a government has only to issue a legal mandate to Reddit to gain access to your information. (This applies to the US government at least; Reddit has previously indicated a lack of cooperation with other governments on surrendering user information.[3]) Such a mandate could be motivated by as little as coincidence, e.g. you are suspected of a crime that has nothing to do with you. Is it acceptable that particularly unlucky people have their privacy violated in this way?

As of January 29, 2015, reddit has never received a National Security Letter, an order under the Foreign Intelligence Surveillance Act, or any other classified request for user information. If we ever receive such a request, we would seek to let the public know it existed. [source]

Reddit included the above "warrant canary" in their first transparency report, in 2014. Their next transparency report did not include such a canary (perhaps related to the two AMAs that Edward Snowden did in 2015?). In fact, none of their subsequent transparency reports have. While there is some contention about the legal validity of such notices, Reddit CEO Steve Huffman replied to mention of the canary's removal: "Even with the canaries, we're treading a fine line." Draw your own conclusions.

(Even if Reddit didn't control this data, as /u/yishan points out, depending on a centralized storage system like AWS means that the data is still susceptible to government seizure or inspection, because governments can just go to Amazon.)

Of course, we don't have to speculate about whether Reddit has been forced to surrender users' personal information to governments: thanks to other information from their transparency reports, we know this for a fact. In 2020, the year covered by the latest transparency report so far, Reddit surrendered user information in response to 256 subpoenas, 27 court orders, 86 search warrants, and 1 pen register/trap and trace order from the US government,[4] as well as 60 international user information requests.

[Chart: the general trend of increasing information requests from the US government that Reddit has complied with, starting with fewer than 25 subpoenas in 2014 and ending with over 250 subpoenas in 2020.]
Number of info requests/US government mandates Reddit complied with. See [5] for sources.

Another issue posed by such centralized concentrations of user information is a classic that many readers will be unhappily familiar with: human error (if you consider using SMS-based authentication to be human error). Stop using SMS authentication!

In 2018, Reddit suffered a serious data breach: all user data from the site's inception in 2005 through 2007 was leaked, as well as some users' emails from 2018 (which would dox all of these users). [source 1][source 2]

The lesson is clear: personal data surrendered is always a risk.

If services aren't incentivized to collect user data for purposes other than the user experience, then services can churn through less personal information. If services use decentralized storage, we won't have to rely on the continued goodwill of centralized infrastructure giants like AWS. Decentralized services can work with the same incentives as their users.

Or, centralized services can continue collecting personal data. Advertisers can become more effective at personalizing ads. Privacy can erode so much as to become unrecognizable, if it continues to exist at all. And someday, a company like Palantir might decide you're acting like a criminal.

0: [source][archive]

1: besides Reddit Gold users, but this comparison doesn't matter that much anyway because Reddit Gold buyers aren't able to express their content preferences to Reddit as effectively as advertisers

2: while it is technically possible that all user information Reddit collects is necessary for Reddit to provide its non-advertising functionality, this case is invalidated by considering "we may tell an advertiser how many people saw their ad."

3: see the section heading "All of this leads to...", point 6. here

4: every year that pen register/trap and trace numbers are reported in Reddit's transparency reports, there's exactly one. Who could that be? 🤔

5: Reddit transparency reports: 2020, 2019, 2018, 2017, 2016, 2015, 2014.
