As large language models (LLMs) become increasingly embedded within sociotechnical systems, it is essential to scrutinize the privacy biases they may exhibit. We define privacy bias as the appropriateness of information flows in the responses generated by LLMs. When the observed privacy bias deviates from expected norms, this discrepancy, referred to as the privacy bias delta, may signal potential privacy violations.
Using privacy bias as an auditing metric offers several benefits:
(a) it enables model trainers to assess the ethical and societal implications of LLMs,
(b) it aids service providers in selecting context-appropriate LLMs,
(c) it helps policymakers evaluate the privacy appropriateness of deployed models.
To address these concerns, we pose a novel research question: How can we reliably measure privacy biases in LLMs, and what factors influence them? In response, we introduce a contextual-integrity-based methodology for evaluating responses from various LLMs. Our approach accounts for sensitivity across prompt variations, a major obstacle in privacy bias assessment. Finally, we examine how model capacity and optimization choices shape these biases.
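To make the auditing setup concrete, here is a minimal Python sketch of how contextual-integrity-style audit prompts could be enumerated over information-flow parameters. The parameter values, prompt template, and function names are illustrative assumptions, not the exact instrument used in the paper.

```python
from itertools import product

# Illustrative contextual integrity (CI) parameters for an IoT scenario (assumed values).
SENDERS = ["fitness tracker", "personal assistant"]
INFO_TYPES = ["its owner's heart rate", "its owner's location"]
RECIPIENTS = ["the device manufacturer", "a third-party advertiser"]
PRINCIPLES = ["if the owner gives consent", "stored indefinitely", "used for advertising"]

LIKERT = ["strongly unacceptable", "somewhat unacceptable", "neutral",
          "somewhat acceptable", "strongly acceptable"]

# Hypothetical prompt template; the paper's exact wording may differ.
TEMPLATE = ("Indicate how acceptable the following information flow is: "
            "a {sender} shares {info} with {recipient}, {principle}. "
            "Answer with exactly one of: {scale}.")

def build_prompts():
    """Enumerate one audit prompt per CI information flow
    (sender, information type, recipient, transmission principle)."""
    for sender, info, recipient, principle in product(SENDERS, INFO_TYPES,
                                                      RECIPIENTS, PRINCIPLES):
        yield (sender, info, recipient, principle), TEMPLATE.format(
            sender=sender, info=info, recipient=recipient,
            principle=principle, scale=", ".join(LIKERT))
```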
Distribution of Responses: Responses across LLMs and prompt variations before filtering with thresholds.
Prompt Sensitivity: re-ordering the Likert scale. LLMs show significant variance due to prompt variation, with three random Likert-scale orders per prompt.
We observe significant variance in responses due to paraphrasing and changing the Likert scale order, which hinders the reliable evaluation of privacy biases.
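One way to handle this, sketched below under the assumption that Likert answers have already been mapped to ordinal scores (-2 for strongly unacceptable up to +2 for strongly acceptable), is to keep only flows whose scores stay within a variance threshold across paraphrases and scale orderings. The threshold value and function name are placeholders, not the paper's exact filtering rule.

```python
import statistics

def filter_stable_flows(scores_by_flow: dict[tuple, list[int]],
                        max_std: float = 0.5) -> dict[tuple, float]:
    """Keep flows whose ordinal scores are consistent across prompt variations
    (paraphrases and Likert-scale re-orderings); drop prompt-sensitive flows."""
    stable = {}
    for flow, scores in scores_by_flow.items():
        if statistics.pstdev(scores) <= max_std:
            stable[flow] = statistics.mean(scores)
    return stable
```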
Privacy biases for senders “fitness tracker” and “personal assistant” in gpt-4o-mini (top-right triangle) and llama-3.1-8B (bottom-left triangle) for the IoT scenario. Axes: senders (left), subjects and information types (bottom), recipients (top), and transmission principles (right).
gpt-4o-mini and llama-3.1-8B exhibit signals of several notable privacy biases. Across all senders, information types, and recipients, for fixed transmission principles such as “stored indefinitely” and “used for advertising”, gpt-4o-mini is less conservative, with privacy biases ranging from strongly acceptable to somewhat acceptable. In contrast, llama-3.1-8B is more conservative, with responses rated somewhat unacceptable. For both LLMs, the privacy biases for a transmission principle such as “if the owner gives consent” are identified as somewhat or strongly acceptable.
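A small sketch of the kind of model-to-model comparison the heatmap triangles encode: for a fixed transmission principle, take the per-flow difference in mean acceptability between two models. The function name and data layout are assumptions for illustration, not the paper's code.

```python
def bias_delta_between_models(scores_a: dict[tuple, float],
                              scores_b: dict[tuple, float],
                              principle: str) -> dict[tuple, float]:
    """Per-flow difference in mean acceptability between two models, restricted to
    flows governed by one transmission principle (the last element of the flow tuple).
    Positive values mean model A rates the flow as more acceptable than model B."""
    return {flow: scores_a[flow] - scores_b[flow]
            for flow in scores_a.keys() & scores_b.keys()
            if flow[-1] == principle}

# e.g. bias_delta_between_models(gpt4o_mini_scores, llama31_8b_scores, "used for advertising")
```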
Base LLMs with different capacities. Each square indicates a privacy bias for a specific information flow. Privacy biases can also be identified across a column, row, or matrix by fixing different parameters. We include tulu-2-7B (top triangle) and tulu-2-13B (bottom triangle).
Base vs. Aligned LLMs: tulu-2-7B (top), tulu-2-13B (right), tulu-2-dpo-7B (bottom), and tulu-2-dpo-13B (left).
Base vs. Quantized LLMs: tulu-2-7B (top), tulu-2-13B (right), tulu-2-7B-AWQ (bottom), and tulu-2-13B-AWQ (left).
Each square indicates a privacy bias for a specific information flow. Privacy biases can be identified across a column, row, or matrix by fixing different parameters. Senders (left), subjects and their information (bottom), recipients (top), and transmission principles (right).
Privacy biases vary across different capacities and optimizations, even with a similar training dataset. Model trainers need to consider these effects when choosing their LLM configuration.
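As a rough illustration of such a configuration-level comparison, one could average the per-flow scores of each model variant (base, DPO-aligned, AWQ-quantized) and inspect how the aggregate shifts. This summary statistic is our own simplification for illustration, not the paper's metric.

```python
import statistics

def mean_bias_by_config(scores_by_model: dict[str, dict[tuple, float]]) -> dict[str, float]:
    """Average acceptability score per model configuration: a coarse view of how
    capacity (7B vs. 13B), alignment (DPO), and quantization (AWQ) shift privacy bias."""
    return {model: statistics.mean(flow_scores.values())
            for model, flow_scores in scores_by_model.items()}

# e.g. mean_bias_by_config({"tulu-2-7B": ..., "tulu-2-dpo-7B": ..., "tulu-2-7B-AWQ": ...})
```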
Access the full project source code here.
Shvartzshnaider, Y., & Duddu, V. (2026). Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric. Proceedings on Privacy Enhancing Technologies.