Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric

Yan Shvartzshnaider (York University) and Vasisht Duddu (University of Waterloo)

Abstract

As large language models (LLMs) become increasingly embedded within sociotechnical systems, it is essential to scrutinize the privacy biases they may exhibit. We define privacy bias in terms of the appropriateness of the information flows reflected in the responses generated by LLMs. When the observed privacy bias deviates from expected norms, the discrepancy, referred to as the privacy bias delta, may signal potential privacy violations.

Using privacy bias as an auditing metric offers several benefits: (a) it enables model trainers to assess the ethical and societal implications of LLMs,
(b) it aids service providers in selecting context-appropriate LLMs,
and (c) it helps policymakers evaluate the privacy appropriateness of deployed models.

To address these concerns, we pose a novel research question: How can we reliably measure privacy biases in LLMs, and what factors influence them? In response, we introduce a contextual-integrity-based methodology for evaluating responses from various LLMs. Our approach accounts for sensitivity to prompt variations, a major obstacle in privacy bias assessment. Finally, we examine how model capacity and optimization choices shape these biases.
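As a minimal illustration of the metric (with a hypothetical label-to-score mapping and example values, not the paper's exact formulation), privacy bias can be read off a Likert-scale judgment of an information flow's appropriateness, and the privacy bias delta is its deviation from the expected norm:

```python
# Sketch: map Likert labels to scores and compute the privacy bias delta as
# the gap between an LLM's judged appropriateness of an information flow and
# the expected norm. Labels and scores here are illustrative assumptions.

LIKERT_SCORE = {
    "strongly unacceptable": -2,
    "somewhat unacceptable": -1,
    "neutral": 0,
    "somewhat acceptable": 1,
    "strongly acceptable": 2,
}

def privacy_bias_delta(model_label: str, expected_norm_label: str) -> int:
    """Deviation of the model's appropriateness judgment from the expected norm."""
    return LIKERT_SCORE[model_label.lower()] - LIKERT_SCORE[expected_norm_label.lower()]

# The model deems a flow somewhat acceptable although the norm deems it
# somewhat unacceptable: a positive delta flags a potential privacy violation.
print(privacy_bias_delta("somewhat acceptable", "somewhat unacceptable"))  # 2
```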

Results and main takeaways

Demonstrating Prompt Sensitivity

Figure 3: Distribution of responses across LLMs and prompt variations, before filtering with thresholds.

Figure 2: Prompt sensitivity from re-ordering the Likert scale. LLMs show significant variance across prompt variations, with three random Likert-scale orders per prompt.

Takeaway

We observe significant variance in responses due to paraphrasing and changing the Likert scale order, which hinders the reliable evaluation of privacy biases.
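One way to operationalize this filtering (a minimal sketch with illustrative prompt templates, flow values, and threshold, not the paper's exact procedure) is to pose each information flow through paraphrased prompts and re-ordered Likert scales, and keep a flow only when the responses agree beyond a chosen threshold:

```python
import random
from collections import Counter

LIKERT_LABELS = [
    "strongly unacceptable", "somewhat unacceptable", "neutral",
    "somewhat acceptable", "strongly acceptable",
]

# Two paraphrases of the same contextual-integrity vignette (illustrative wording only).
TEMPLATES = [
    "A {sender} shares {info} with {recipient} {principle}. How acceptable is this flow?",
    "How acceptable is it for a {sender} to send {info} to {recipient} {principle}?",
]

def prompt_variants(flow: dict, n_orders: int = 3) -> list[str]:
    """Paraphrased prompts, each paired with a randomly re-ordered Likert scale."""
    variants = []
    for template in TEMPLATES:
        for _ in range(n_orders):
            options = random.sample(LIKERT_LABELS, k=len(LIKERT_LABELS))
            question = template.format(**flow)
            variants.append(f"{question}\nOptions: {', '.join(options)}")
    return variants

def consistent_label(responses: list[str], threshold: float = 0.75) -> str | None:
    """Majority Likert label if it covers at least `threshold` of responses, else discard."""
    label, count = Counter(responses).most_common(1)[0]
    return label if count / len(responses) >= threshold else None

flow = {"sender": "fitness tracker", "info": "the owner's heart rate",
        "recipient": "a third-party advertiser", "principle": "if the owner gives consent"}
print(prompt_variants(flow)[0])
print(consistent_label(["somewhat acceptable"] * 5 + ["strongly acceptable"]))
```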

Identifying Privacy Biases

Figure 7: Privacy biases for the senders “fitness tracker” and “personal assistant” in gpt-4o-mini (top-right triangle) and llama-3.1-8B (bottom-left triangle) for IoT. The layout shows senders (left), subjects and information types (bottom), recipients (top), and transmission principles (right).

Takeaway

gpt-4o-mini and llama-3.1-8B exhibit signals of several notable privacy biases. Across all senders, information types, and recipients, for fixed transmission principles such as stored indefinitely and used for advertising, gpt-4o-mini is less conservative, with privacy biases ranging from strongly acceptable to somewhat acceptable. In contrast, llama-3.1-8B is more conservative, rating these flows as somewhat unacceptable. For both LLMs, the privacy biases for a transmission principle such as if the owner gives consent are identified as somewhat or strongly acceptable.
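To illustrate how such a comparison can be read out of per-flow scores (the dictionaries below hold made-up scores purely for illustration), one can average each model's appropriateness over senders, information types, and recipients while holding the transmission principle fixed:

```python
from statistics import mean

# Illustrative per-flow scores on a -2 (strongly unacceptable) .. 2 (strongly
# acceptable) scale, keyed by (sender, info, recipient, transmission principle).
gpt_4o_mini = {
    ("fitness tracker", "heart rate", "advertiser", "used for advertising"): 1,
    ("personal assistant", "location", "manufacturer", "used for advertising"): 2,
    ("fitness tracker", "heart rate", "advertiser", "if the owner gives consent"): 2,
}
llama_3_1_8b = {
    ("fitness tracker", "heart rate", "advertiser", "used for advertising"): -1,
    ("personal assistant", "location", "manufacturer", "used for advertising"): -1,
    ("fitness tracker", "heart rate", "advertiser", "if the owner gives consent"): 1,
}

def bias_by_principle(scores: dict) -> dict:
    """Average appropriateness per transmission principle, over all other parameters."""
    grouped: dict[str, list[int]] = {}
    for (_, _, _, principle), score in scores.items():
        grouped.setdefault(principle, []).append(score)
    return {principle: mean(vals) for principle, vals in grouped.items()}

print(bias_by_principle(gpt_4o_mini))   # less conservative for "used for advertising"
print(bias_by_principle(llama_3_1_8b))  # more conservative for the same principle
```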

Demonstrating Impact of LLM Configuration

Figure 8: Base LLMs with different capacities: tulu-2-7B (top triangle) and tulu-2-13B (bottom triangle). Each square indicates a privacy bias for a specific information flow; privacy biases can also be identified across a column, row, or the full matrix by fixing different parameters.

Figure 9: Base vs. Aligned LLMs: tulu-2-7B (top), tulu-2-13B (right), tulu-2-dpo-7B (bottom), and tulu-2-dpo-13B (left).

Figure 10: Base vs. Quantized LLMs: tulu-2-7B (top), tulu-2-13B (right), tulu-2-7B-AWQ (bottom), and tulu-2-13B-AWQ (left).

Each square indicates a privacy bias for a specific information flow; privacy biases can be identified across a column, row, or the full matrix by fixing different parameters. The layout shows senders (left), subjects and their information types (bottom), recipients (top), and transmission principles (right). Empty blocks indicate that at least one of the four LLMs did not give consistent responses. See Appendix Figures 13 and 14 for the complete set.

Takeaway

Privacy biases vary across different capacities and optimizations, even with a similar training dataset. Model trainers need to consider these effects when choosing their LLM configuration.
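Once per-flow biases have been collected for each configuration, spotting these effects can be as simple as diffing the bias matrix of a base model against its aligned or quantized variant. The sketch below uses abbreviated flow keys and made-up scores purely for illustration:

```python
def disagreements(base: dict, variant: dict) -> dict:
    """Information flows on which two configurations assign different privacy biases."""
    return {
        flow: (base[flow], variant[flow])
        for flow in base
        if flow in variant and base[flow] != variant[flow]
    }

# Illustrative per-flow biases (-2..2) for tulu-2-7B and its DPO-aligned variant.
tulu_2_7b = {"flow_a": 1, "flow_b": -1, "flow_c": 0}
tulu_2_dpo_7b = {"flow_a": 1, "flow_b": 1, "flow_c": -1}

print(disagreements(tulu_2_7b, tulu_2_dpo_7b))  # {'flow_b': (-1, 1), 'flow_c': (0, -1)}
```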

Results and code

Access the full project source code here.

Citation

Shvartzshnaider, Y., & Duddu, V. (2026). Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric. Proceedings on Privacy Enhancing Technologies.