Earlier this month, California Gov. Gavin Newsom signed into law amendments to the California Consumer Privacy Act (CCPA), the most sweeping state data privacy regulations in the country. The law, which takes effect on Jan. 1, regulates how data is collected, managed, shared and sold by companies and entities doing business with or compiling information about California residents. Some observers contend that because no business would want to exclude selling to Californians, the CCPA is de facto a national law on data privacy, absent an overarching federal regulation protecting consumer information.
“The new privacy law is a big win for data privacy,” says Joseph Turow, a privacy scholar and professor of communication at the Annenberg School for Communication at the University of Pennsylvania. “Though it could be even stronger, the California law is stronger than anything that exists at the federal level.” Among other stipulations, the CCPA requires businesses to inform consumers regarding the types of personal data they’ll collect at the time they collect it and also how the information will be used. Consumers have the right to ask firms to disclose with whom they share the data and also opt out of their data being sold.
The CCPA comes on the heels of the EU’s General Data Protection Regulation (GDPR), which took effect in May 2018. According to the United Nations Conference on Trade and Development, 107 countries have data privacy rules in place including 66 developing nations. In the U.S., there was a “significant” increase in data privacy bills being introduced this year, with at least 25 states and Puerto Rico starting such legislation, according to the National Conference of State Legislatures. Notably, this bill count doesn’t include related legislation on topics such as cybersecurity.
Meanwhile, the increase in data privacy regulations has companies worried about how they are to comply and how much it would cost. According to a 2019 survey of senior executives by Gartner, the acceleration of privacy regulation and related regulatory burdens is the top emerging risk faced by companies globally. Sixty-four percent of executives identified it as a key risk, especially those from the banking, financial services, technology, telecommunications, and food, beverage and consumer goods sectors. Moreover, they view this regulatory risk as having “very rapid velocity” — meaning it could bring potentially large fines and brand damage if firms violate the rules.
In part to head off more stringent laws, companies and computer scientists are collaborating to provide computational and business solutions to strengthen data protections while not hampering innovation and operational efficiency. Google, Facebook, Amazon, Apple and others have come forward with changes that give users more control over how they are being tracked and how their data is being used. “Privacy is personal, which makes it even more vital for companies to give people clear, individual choices around how their data is used,” wrote Google CEO Sundar Pichai in a May 2019 opinion piece for The New York Times.
Some recent changes include the following: Google users can now choose to opt in to save their audio data collected by Google Assistant, which uses it to better recognize their voices over time. They can also delete their interactions at any time and can agree to have a human reviewer listen to the audio. This month, Facebook’s Instagram rolled out a new feature that lets users manage which third-party apps have access to their data. In September, Facebook said it had suspended tens of thousands of sketchy apps from 400 developers. Amazon is also cracking down on third-party apps for breaking its privacy rules, while Apple said it will no longer retain audio recordings that Siri collects by default, among other things.
But even as businesses seek to self-regulate, data privacy laws remain necessary because companies have to be prodded to adopt them, says Michael Kearns, computer and information science professor at Penn Engineering and a founding director of the Warren Center for Network and Data Sciences, a research center of Penn faculty who study innovation in interconnected social, economic and technological systems. That’s because these changes “come at a cost,” notes Kearns, who is co-author of The Ethical Algorithm. Not only will companies have to change the way they operate, but their data analyses will be less accurate as well, which can affect their bottom line. Targeted ads could miss hitting the most lucrative customers, for example, leading to lost sales.
What Companies Collect
Most people don’t know how much of their activities are being tracked. “Most companies are collecting data these days on all the interactions, on all the places that they touch customers in the normal course of doing business,” says Elea Feit, senior fellow at Wharton Customer Analytics and a Drexel marketing professor. For example, a retailer would be keeping track of all the emails it sends you and whether you click on any of the links inside the email; it tracks your visits to its site and any purchases in a store if the retailer, say, has a loyalty card program. “Every time you interact with the company, you should expect that the company is recording that information and connecting it to you,” she notes.
Whether it’s a mom and pop shop — the corner tailor keeps track of clients’ shirt sizes and preferences — or a big corporation like Walmart, companies track their customers to give them a better customer experience and provide relevant goods and services. “They want to look at a customer’s purchasing pattern so they can tailor experiences to that customer,” Feit says. “Companies are trying to get to know their customers by collecting data on every interaction. … And most companies see this as so vital to their business that this is something they do internally.”
Companies have legitimate business purposes for tracking consumers — and it brings benefits. For example, a business that knows you’re a pet owner based on your searches for cat food could send you coupons. Companies can also use your data to improve product designs and performance, Feit says. Smartphone companies monitor how devices are working on an ongoing basis to see how they can improve upon the battery life, for instance. Carmakers also will often collect data on driving performance for such things as improving a vehicle’s fuel economy, she adds. Of course, algorithms do all the tracking, not human beings.
Using data helps a company’s bottom line as well, Feit explains. With more information about a person, a business can send ads to people who are more likely to buy or use the service. “You can actually reduce the cost of your advertising spend,” she says. Or at the very least, annoy fewer people with marketing emails because you’re targeting folks to whom these ads are relevant, Feit adds. Also, by tracking what people buy, companies can do better inventory management, which makes them more efficient.
It might surprise some to know that many major corporations don’t actually sell their consumer data, precisely because it is valuable, Feit notes. Nor is it standard practice to look at raw, individual data; instead, analysts run queries on datasets to get insights. For example, “when I’ve done joint research with Google, I was never allowed to touch the data myself because I wasn’t using a Google-owned computer system,” she explains. Also, most big companies have a data governance policy that defines who can access this information and how it will be used, Feit says. “If I worked at Target and I wanted to look up what my best friend buys at Target, I wouldn’t be allowed to do that.”
But companies that do sell their data work with third-party data brokers, such as subsidiaries of the major credit rating agencies, Feit notes. Buyers of this data gather information about a customer’s behavior across multiple interactions with various entities: the credit card issuer, car dealership, online shopping site and others. “You get a very rich sense of the customer’s behavior,” Feit says. “It’s really a problem in my mind because the consumer doesn’t necessarily know that their data is being sold to this third-party broker” and to whom the broker sells it.
While many don’t sell their data, they often do share access to it. For example, PayPal disclosed that it shares consumer data (such as name, address, phone number, date of birth, IP address, bank account information, recent purchases) with hundreds of entities around the world.
How You Are Tracked
The most common way a user is tracked is through the placement of ‘cookies,’ or files that a website or web service places on your device. So when you return to the website, you don’t have to re-enter your password to log on, for example, because you’re recognized, according to Sebastian Angel, professor of computer and information science at the University of Pennsylvania. “It’s for convenience,” he says. “But because they’re putting these cookies in your devices, it now allows, [say,] Facebook to know where you’re going on the internet.”
If an online blog that you read has a Facebook ‘Like’ button on it, and you click on it, “what your browser is doing under the covers is sending this cookie, this file, to Facebook,” Angel notes. “Now, Facebook is able to learn that you visited this blog, which has nothing to do with Facebook aside from having a ‘Like’ button. Through this mechanism, large social networks and other companies can track where you go on the internet and can get an idea of what your interests are, what times of the day you’re active [and all sorts of other data,] which they can transform into better understanding you and therefore give you better ads.”
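The mechanism Angel describes can be sketched as a toy simulation. This is not how any real social network's code works: the `Tracker` class, the `uid` cookie name and the example page addresses are all hypothetical, and the browser is reduced to a simple dictionary of cookies.

```python
import uuid
from collections import defaultdict


class Tracker:
    """Toy third-party server whose widget (say, a 'Like' button) is embedded on many sites."""

    def __init__(self):
        # cookie id -> list of pages from which the widget was requested
        self.visits = defaultdict(list)

    def serve_widget(self, cookies, embedding_page):
        """Handle one widget request; `cookies` is the browser's cookie jar for this tracker."""
        if "uid" not in cookies:
            # First contact: hand the browser a random identifier to store.
            cookies["uid"] = uuid.uuid4().hex
        # The browser sends the cookie back on every later request,
        # so each embedding page gets linked to the same identifier.
        self.visits[cookies["uid"]].append(embedding_page)
        return cookies


tracker = Tracker()
browser_cookie_jar = {}  # cookies this browser holds for the tracker's domain

for page in ["blog.example/cat-food-review",
             "news.example/local-election",
             "shop.example/running-shoes"]:
    browser_cookie_jar = tracker.serve_widget(browser_cookie_jar, page)

uid = browser_cookie_jar["uid"]
history = tracker.visits[uid]
print(history)
```

Even in this stripped-down form, three unrelated sites end up in a single browsing history keyed to one identifier, which is the raw material for inferring interests.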
Even when text, voice or email messages are encrypted (meaning only the sender and receiver can see or hear the content of the message), the metadata around them can be revealing, Angel says. Metadata refers to the information around the content, such as the identity of the sender and recipient, the time of day it was sent and how often the communication occurred. Metadata might seem harmless, but it can be privacy invasive. For example, if the metadata shows that you called an oncologist, one could infer that you or someone you know has or might have cancer.
Even when websites or web services offer a way to opt out of being tracked, it offers limited protection because other signals give a person’s identity away. “There’s no real way to opt out,” Angel points out. For example, opening a browser in ‘incognito mode’ deletes the cookies so you cannot be tracked in this manner. However, algorithms can look at other signals. For example, they can track the resolution of your computer screen, the size of the browser and how you move your mouse around, among other things. “All of this is very unique and it becomes a unique fingerprint of who you are,” he says. “This is known as device fingerprinting.”
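A minimal sketch of the fingerprinting idea follows, assuming a handful of made-up signals. The `device_fingerprint` function and the signal names are hypothetical; real fingerprinting systems draw on many more signals (including behavioral ones such as mouse movement) and tolerate small changes, which this exact-match hash does not.

```python
import hashlib
import json


def device_fingerprint(signals):
    """Hash a dict of browser/device signals into a short, stable identifier."""
    # Sort keys so the same signals always serialize identically.
    canonical = json.dumps(signals, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


visit_normal = {
    "screen": "2560x1440",
    "browser_window": "1280x900",
    "timezone": "America/New_York",
    "fonts": ["Arial", "Helvetica", "Fira Code"],
}
# Same device later in incognito mode: the cookies are gone, the signals are not.
visit_incognito = dict(visit_normal)

# A second device that differs only in screen resolution.
other_device = dict(visit_normal, screen="1920x1080")

fp_normal = device_fingerprint(visit_normal)
fp_incognito = device_fingerprint(visit_incognito)
fp_other = device_fingerprint(other_device)

print(fp_normal == fp_incognito)  # compares the same device with and without cookies
print(fp_normal == fp_other)      # compares two devices with different screens
```

No cookie is stored anywhere in this sketch, yet the same device produces the same identifier on every visit.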
Whether a company looks at aggregate or individual data depends on what they want to do, Angel continues. If they want to find market trends, then grouped data would work. But if they want to send customized services, then individual information is key. When companies share data, they often don’t provide the raw information but would do things like let folks run queries on it. For example, one query could be finding out what were the top 10 purchases on a shopping site over the last 12 days, he says.
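The kind of aggregate query Angel mentions might look like the sketch below. The `top_purchases` function, the order records and the dates are invented for illustration; the point is that the person running the query gets back item counts, not the underlying customer rows.

```python
from collections import Counter
from datetime import date, timedelta


def top_purchases(orders, today, days=12, n=10):
    """Return the n most-purchased items in the last `days` days as (item, count) pairs."""
    cutoff = today - timedelta(days=days)
    counts = Counter(item for customer, item, day in orders if day >= cutoff)
    return counts.most_common(n)


today = date(2019, 10, 28)  # hypothetical "now"
orders = [
    ("alice", "cat food", date(2019, 10, 27)),
    ("bob", "cat food", date(2019, 10, 25)),
    ("carol", "batteries", date(2019, 10, 20)),
    ("alice", "umbrella", date(2019, 9, 1)),  # older than 12 days, so ignored
]

result = top_purchases(orders, today)
print(result)
```

Customer names are read by the query but never appear in its answer.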
Four Tactics and Seduction
Furthermore, companies don’t make it easy for consumers to find out exactly how their data is being used. Turow’s research revealed four commonly used corporate strategies whose goal is to distract the consumer from questioning data privacy practices.
The first tactic is placation. Companies seek to allay the fears of the consumer by, say, putting out statements like, ‘Your trust is very important to us, that’s why we aim to be clear and transparent about why we collect information,’ Turow says. The company could reinforce the statement with a video on its website where a smiling employee repeats that data privacy is important. These two assurances could be enough to placate most people, who are busy and don’t want to spend an afternoon digging through the legalese of privacy policies. But if they did read the policy, they could find such invasive practices as the intention to share third-party cookies and collect personally identifiable information like name and address, he notes.
For example, some privacy policies disclose that they gather information from sources such as consumer search firms and public databases. So users know that the company gets information about them from other places. But where exactly? “Most people would have no idea what that means,” Turow says. “It seems straightforward, but what do public databases [refer to?] … This really tells you nothing under the guise of telling you what they’re doing.”
These four tactics — placation, diversion, misnaming and using jargon — contribute to a feeling of resignation among consumers. Since they can’t fight this data collection and tracking, they might as well give up, according to another paper Turow co-authored, “The Corporate Cultivation of Digital Resignation.” It was first published in March 2019 in the New Media & Society journal. The research may explain why initial grassroots efforts to quit Facebook — sparked by the Cambridge Analytica data privacy scandal — died out.
The reason is that users feel resigned. “What it does is make people throw their hands up,” Turow says. It happens every day: An app, online service or website will not let consumers use its service or access its content until they accept the terms of service, which most do. Throw in a freebie such as a 10% discount in exchange for data (like a phone number or email) and it becomes even harder not to share one’s information. Turow calls this “seduction.” “The seduction [aspect] of it overwhelms the surveillance part.”
Finally, the last salvo is what Turow calls the “hidden curriculum,” which he defines as “an education that people get without being told that they’re being taught.” People are being trained to give up data to get something or fit in with society, he says. “This is just the rehearsal for everything else. You get used to giving up your data in stores, you get used to giving up your data online,” Turow notes. “It becomes second nature.” And it’s going to get worse. “All these kinds of things we’ve been talking about have been the product of the last 15 years,” he says. “This is just the beginning of tracking and the beginning of personalization.”
A computational solution that’s been gaining ground is differential privacy, according to Aaron Roth, computer and information science professor at Penn Engineering and co-author with Kearns of The Ethical Algorithm. There are two kinds of differential privacy. The centralized model is applied when there is trust between the user and data collector (say, between a consumer and a shopping site) while the local or ‘coin flip’ version is used when there’s less trust, such as in cases where the data could be used for unstated purposes, says Roth, who is on a Facebook privacy advisory committee and was a consultant to Apple on differential privacy.
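One classic way to realize a local, coin-flip scheme is a technique called randomized response. We assume here that this is roughly what the ‘coin flip’ label refers to; the function names, the 50/50 coin and the simulated population below are illustrative choices, not details from Roth.

```python
import random


def coin_flip_report(truth, rng):
    """Local randomization: answer honestly on heads, answer at random on tails."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports):
    """Undo the known noise: observed = 0.5 * true + 0.25, so true = 2 * observed - 0.5."""
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated population in which 30% of users have some sensitive trait.
true_answers = [i < 3000 for i in range(10000)]
reports = [coin_flip_report(t, rng) for t in true_answers]
estimate = estimate_true_rate(reports)
print(round(estimate, 3))
```

Any single report is deniable, because a “yes” may have come from the second coin rather than from the truth, yet the collector can still recover the population-wide rate to within a small error.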
In general, differential privacy adds ‘noise’ in the form of positive and negative numbers to mask the data being collected. That means individual data becomes jumbled so it is not useful to the company. But on an aggregate level, the random numbers added zero out so the trend is revealed. Apple uses the local model of differential privacy when collecting usage data from iPhones, as does Google Chrome, Roth points out. That means the data is mixed up before it is sent to Apple. “The iPhone doesn’t send your data to Apple, but only noisy answers,” he says. But it is less accurate than the centralized model, which collects data from the individual first, puts it on company servers, and then adds noise.
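The trade-off between the two models can be illustrated with a rough simulation. It assumes Laplace-distributed noise, a common choice in the differential privacy literature, and a hypothetical noise scale of 1.0; the mechanisms Apple and Google actually deploy are more elaborate than this.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace noise, sampled as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rng = random.Random(1)  # fixed seed so the run is repeatable
scale = 1.0  # hypothetical noise level; larger means more privacy, less accuracy

# Each user holds one bit, e.g. "used this feature today" (1) or not (0).
true_bits = [1 if i % 4 == 0 else 0 for i in range(20000)]
true_total = sum(true_bits)

# Local model: every user adds noise before anything leaves the device.
noisy_reports = [b + laplace_noise(scale, rng) for b in true_bits]
local_total = sum(noisy_reports)

# Centralized model: the collector sees the raw bits and adds noise once to the total.
central_total = true_total + laplace_noise(scale, rng)

print(true_total, round(local_total), round(central_total))
```

In the local model the positive and negative noise values largely cancel across 20,000 users, but the leftover error is still much larger than in the centralized model, where noise is added only once. That gap is the accuracy cost of not having to trust the collector with raw data.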
Differential privacy is gaining ground among the tech giants. “The industry is actively figuring it out,” Roth says. While there aren’t any agreed upon industry standards yet, “that work is ongoing.” The beauty of differential privacy is that it “provides a means to extract generalizable, statistical facts about the world to perform statistical analyses to train machine learning algorithms in a way that offers a mathematical guarantee that it doesn’t reveal too much about any particular individual.”
Meanwhile, Angel and his colleagues are designing systems that “essentially prevent a collection of this data [in the first place],” he says. “The system is designed from the ground up not to leak information to external observers.” One project they call PUNG will make instant messages more private. “We have mechanisms that allow the message to be routed to the right person without the provider knowing who the right person was in the first place.”
Here’s how it works. Typically, the content of messages is encrypted but what’s revealed is the metadata: who is the caller, who is the recipient, when was the call made, for example. Under PUNG, the metadata will be masked as well. How? Imagine if you wanted to hide the identity of the person to whom you’re sending a text message. One way to do it would be to send everyone in the world the text. Only the right recipient would have a decoder to actually read the text. PUNG simulates a way to do this without actually sending everyone a text, Angel notes. PUNG can scale to tens or even hundreds of millions of users, but it needs a lot of computing power, which he is working to reduce.
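The thought experiment of sending the text to everyone can be sketched in a few lines. To be clear, this is the naive broadcast idea only, not PUNG itself, which according to Angel achieves the same effect without actually sending everyone the message. The cipher here is a toy built from standard-library hashing for illustration and is not vetted cryptography; the names and keys are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key, plaintext):
    """Encrypt and tag a message so only the holder of `key` can recognize and read it."""
    nonce = os.urandom(16)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    # A real design would use separate keys for encryption and authentication.
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce, body, tag


def try_open(key, packet):
    """Return the plaintext if the packet was sealed with `key`, else None."""
    nonce, body, tag = packet
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


# Each user shares a secret key with the sender (hypothetical setup).
keys = {name: os.urandom(32) for name in ["ana", "ben", "chloe"]}
packet = seal(keys["ben"], b"meet at noon")

# The provider delivers the same packet to everyone, so it never learns who it was for.
inbox = {name: try_open(key, packet) for name, key in keys.items()}
print(inbox)
```

Only the intended recipient's key opens the packet; everyone else gets nothing. The obvious drawback is that delivery cost grows with the number of users, which is why simulating the broadcast cheaply is the hard part.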
For law enforcement concerned about criminals being able to hide their communication, there is a solution that Angel believes strikes the right balance between privacy and justice. To get at the information, law enforcement has to go through a procedure that basically tips off the people communicating that they are under surveillance. “Personally, I think this is the right balance because it allows them to figure out who communicated with whom, but at the same time they can’t do it in the shadows.”
Last semester, Angel taught a computer science course on how to build tools for anonymity and privacy. One idea was to build a version of Netflix that would shield your movie choices from the company and yet give you access to the full roster of content. “We can build it,” he says. “It’s technologically feasible” to find a way to stream it to millions or billions of people. But such a system would be costly. For example, if Netflix needed 10,000 computers in their data center to serve movies to everyone, under Angel’s version, it might need 10 times more, or 100,000. His team is working to reduce the number of computers to two times more rather than 10. “Privacy has a cost,” he notes.
Regulatory and Societal Response
But it takes more than technology to protect data privacy. “We don’t believe technology solves everything,” Kearns says. Differential privacy might be promising, but computational solutions only come into play once companies have decided what type of information they will gather from users. “Companies still need internal policies about what kinds of data they are even going to collect in the first place, for example,” he notes. Is it even kosher to collect that type of data? And how long is it appropriate to retain the data? “There are many, many things about the whole data pipeline … that are extra-scientific,” Kearns adds.
For example, businesses can choose to throw away a user’s information after serving a targeted ad, says Turow. “If you know I’m in New York, it would be great to get an ad from a restaurant there. But don’t keep track of exactly where I’m headed. Throw [the data] away after you use it.” Feit adds that companies also could delete old data, which tend not to be relevant anymore anyway. “Expunge older records about a customer so that my history of what I bought at Target when I was 22 is not saved until I’m 72,” she says. “That’s very old information about a customer and it’s not that informative about how that customer is going to behave now.” Another tactic is to throw out the data once an analysis is completed, Feit says.
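A retention rule of the kind Feit describes could be as simple as the sketch below. The `expunge_old_records` function, the two-year window and the sample records are assumptions made for illustration; a real policy would be set by the company's data governance rules and any applicable law.

```python
from datetime import date, timedelta


def expunge_old_records(records, today, max_age_days):
    """Keep only records inside the retention window; older ones are dropped."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["date"] >= cutoff]


today = date(2019, 10, 28)  # hypothetical "now"
records = [
    {"customer": "c-17", "item": "dorm lamp", "date": date(2009, 8, 30)},
    {"customer": "c-17", "item": "cat food", "date": date(2019, 10, 1)},
]

# Hypothetical policy: retain two years of purchase history.
kept = expunge_old_records(records, today, max_age_days=730)
print([r["item"] for r in kept])
```

The decade-old purchase is discarded while the recent one, which is more likely to predict current behavior, is kept.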
Would companies do it? There are headwinds to voluntary deployment: it could lower profits, be hard to implement, yield less accurate analyses and be costly. Regulators could force the issue, but to properly monitor these tech giants, they have to be more equally matched, Kearns notes. While regulators are more tech-savvy today than decades ago, they are still not an equal counterweight to the tech armies employed by Google, Facebook, Amazon and the like.
So Kearns sees “a future in which regulators themselves start employing algorithmic tools.” That’s because “when the companies you’re trying to regulate are doing things with massive amounts of data and at a massive scale, and you’re trying to spot misbehavior, you have to be ready to spot misbehavior at that speed and scale also,” he says. Certainly, the government should hire more people with doctorates in computer science, math and statistics. “I think regulators need Ph.D.s in machine learning these days,” he adds, noting that most government Ph.D.s are in economics.
Kearns points to a precedent in finance in which regulators already use tech tools to enforce the law. “Wall Street, despite people’s impressions of it, is one of the most heavily regulated industries already,” he says. “Many of the regulators of the finance industry will use technological tools to spot violations.” For example, regulators deploy algorithms to spot suspicious areas of the market for violations such as ‘pump and dump’ schemes in which a stock is inflated to lure other investors, only for the perpetrator to sell at a high before the shares fall. This is possible because financial regulators have a window into trades.
How could this work in tech? One example is to let regulators have “much more direct, unfettered access” to Facebook’s ad platform, Kearns notes. Currently, regulators can sign up as a Facebook advertiser to see whether, say, racial or gender biases are at play. But they can’t see more deeply into the platform. “Imagine a future where regulators are allowed to have a much more detailed view of the real, underlying targeting algorithm that takes advertisers’ specifications and decides exactly where things are shown and furthermore, measures empirically whether there’s racial discrimination resulting from that.”
As for consumer responsibility, Kearns believes that people can do little by themselves to protect their data privacy. “Even following all your best practices isn’t going to be enough if you want to use Google, you want to use email, you want to use social media, you want to use navigation apps,” he says. “If you really want to have true privacy and security, you have to go offline.” Breaking up tech giants is not the solution. “Why would that magically cause the resulting pieces to have better privacy?”
For Angel, the bottom line is that society must value privacy much more than it does now — and spark a systemic change in the way companies collect, share, sell and use data. “It’s really bizarre that we are unwilling to pay 50 cents for an app in the app store but we are totally okay with paying $5 or $6 for a cup of coffee,” he points out. “Because of this psychology, it’s really hard to ask people to pay for electronic things they expect to be free.” It’s not even the amount of money at issue here, he adds, it’s the idea of paying for things people are used to getting gratis.
Since people are unwilling to pay, “companies have no choice but to monetize these services through things like advertising,” Angel says. For meaningful data privacy to take hold, society has to be ready to accept tradeoffs. “What are we willing to pay for using these services? Currently, the answer seems to be nothing. Until that changes, I don’t see that we can find a good balance [between user privacy and companies’ need for data].… This is a problem that’s [rooted] deep in the way we act as a society.”