TY - GEN
T1 - Watermarking Language Models for Many Adaptive Users
AU - Cohen, Aloni
AU - Hoover, Alexander
AU - Schoenbach, Gabe
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception [1], prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of [2] is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances at the same time. It detects shorter snippets just as well as the original scheme, and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness - that marked text is detectable even after edits. We introduce a new unifying abstraction called AEB-robustness. AEB-robustness provides that the watermark is detectable whenever the edited text 'approximates enough blocks' of model-generated output.
AB - We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception [1], prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of [2] is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances at the same time. It detects shorter snippets just as well as the original scheme, and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness - that marked text is detectable even after edits. We introduce a new unifying abstraction called AEB-robustness. AEB-robustness provides that the watermark is detectable whenever the edited text 'approximates enough blocks' of model-generated output.
KW - fingerprinting codes
KW - generative ai
KW - language models
KW - watermarking
UR - https://www.scopus.com/pages/publications/105009318870
UR - https://www.scopus.com/pages/publications/105009318870#tab=citedBy
U2 - 10.1109/SP61157.2025.00084
DO - 10.1109/SP61157.2025.00084
M3 - Conference contribution
AN - SCOPUS:105009318870
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 2583
EP - 2601
BT - Proceedings - 46th IEEE Symposium on Security and Privacy, SP 2025
A2 - Blanton, Marina
A2 - Enck, William
A2 - Nita-Rotaru, Cristina
T2 - 46th IEEE Symposium on Security and Privacy, SP 2025
Y2 - 12 May 2025 through 15 May 2025
ER -