
Watermarking Language Models for Many Adaptive Users

  • The University of Chicago

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting, that is, when a user queries a language model more than once, as even benign users do. And with just a single exception [1], prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes, and we prove that the undetectable zero-bit scheme of [2] is adaptively robust. Importantly, our scheme provides both zero-bit and multi-user assurances at the same time: it detects shorter snippets just as well as the original zero-bit scheme, and traces longer excerpts to individuals. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks. Ours is the first generic reduction between watermarking schemes for language models. A challenge for such reductions is the lack of a unified abstraction for robustness, the property that marked text remains detectable even after edits. We introduce a new unifying abstraction called AEB-robustness, which guarantees that the watermark is detectable whenever the edited text 'approximates enough blocks' of model-generated output.

Original language: English
Title of host publication: Proceedings - 46th IEEE Symposium on Security and Privacy, SP 2025
Editors: Marina Blanton, William Enck, Cristina Nita-Rotaru
Pages: 2583-2601
Number of pages: 19
ISBN (Electronic): 9798331522360
State: Published - 2025
Event: 46th IEEE Symposium on Security and Privacy, SP 2025 - San Francisco, United States
Duration: 12 May 2025 → 15 May 2025

Publication series

Name: Proceedings - IEEE Symposium on Security and Privacy
ISSN (Print): 1081-6011

Conference

Conference: 46th IEEE Symposium on Security and Privacy, SP 2025
Country/Territory: United States
City: San Francisco
Period: 12/05/25 → 15/05/25

Keywords

  • fingerprinting codes
  • generative ai
  • language models
  • watermarking
