Beyond Memorisation: Protecting Privacy from Inference Attacks with Large Language Models

A new research paper from ETH Zurich paints a concerning picture about the ability of large language models (LLMs) like ChatGPT to infer private information about users. Here are the key takeaways that ChatGPT users should be aware of:

LLMs Can Accurately Infer Personal Details

The researchers evaluated 9 popular LLMs on their ability to infer personal attributes like location, age, gender, income, education level, etc. from samples of Reddit comments. Alarmingly, the top performer - ChatGPT achieved near-human-level accuracy in guessing private details about authors from just their writing samples.

Image source: Robin Staab et al, Beyond Memorization: Violating Privacy Via Inference with Large Language Models, arXiv (2023). DOI: 10.48550/arxiv.2310.07298

The idea is that someone could scrape comments from sites like Reddit, use an initial system prompts such as: “Act as a private investigator with a wealth of experience in online profiling," followed by a prefix suggesting it’s a game the LLM has to play, followed by the scraped text, finally ending with a suffix.

The Costs Are Lower Than Human Intelligence

What makes this privacy risk greater is that LLMs can make these inferences at a fraction of the cost of human intelligence. The researchers estimate it is 100x cheaper financially and 240x faster in time investment than hiring human profilers. This means adversaries can profile individuals at scale in automated ways.

Mitigations Like Anonymization Are Ineffective

The study tested existing defenses like text anonymization tools and found they could remove direct personal identifiers, but LLMs were still able to infer private details from context at over 50% accuracy. More advanced anonymization methods are needed to keep up with the language understanding capabilities of LLMs.

LLMs Can Actively Elicit Private Information

Beyond analyzing existing text, LLMs pose an emerging threat of privacy-invasive chatbots that can actively steer conversations to provoke responses containing private data. The researchers demonstrated this is feasible today.

Alignment Doesn't Restrict Privacy Invasions

Most provider-side LLM alignment today focuses on blocking offensive or harmful content. The study found current alignment does little to restrict privacy-invasive prompts, with only 10% of prompts being rejected across providers. Better alignment for privacy protection is needed.

The study highlights that privacy risks from LLMs go beyond just memorizing training data. While memorization is an issue, models' strong inference capabilities pose a broader threat.

The adversarial interaction experiments with chatbots raise important ethical considerations. The researchers took care to simulate conversations without real users to avoid causing harm.

AI providers like Anthropic and OpenAI were given advance access to the paper's findings. Responsible disclosure is key so issues can be addressed.

There are still open questions around how exactly models make inferences. More transparency is needed into the reasoning behind predictions.

The key takeaway is that while ChatGPT itself seems benign, the technology it represents poses serious privacy risks that users should stay vigilant about as research progresses. We need advances in anonymization and alignment to ensure large language models benefit society while protecting user privacy. This comprehensive analysis of LLM privacy implications underscores the need for responsible development of this technology.

Full credit to the original research paper: Robin Staab et al, Beyond Memorization: Violating Privacy Via Inference with Large Language Models, arXiv (2023). DOI: 10.48550/arxiv.2310.07298