TY - GEN
T1 - Do LLMs Know to Respect Copyright Notice?
AU - Xu, Jialiang
AU - Li, Shenglan
AU - Xu, Zhaozhuo
AU - Zhang, Denghui
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Prior studies show that LLMs sometimes generate content that violates copyright. In this paper, we study another important yet underexplored problem: will LLMs respect copyright information in user input and behave accordingly? This research problem is critical, as a negative answer would imply that LLMs will become the primary facilitator and accelerator of copyright infringement. We conducted a series of experiments using a diverse set of language models, user prompts, and copyrighted materials, including books, news articles, API documentation, and movie scripts. Our study offers a conservative evaluation of the extent to which language models may infringe copyright when processing user input containing copyright-protected material. This research emphasizes the need for further investigation and the importance of ensuring that LLMs respect copyright regulations when handling user input, to prevent unauthorized use or reproduction of protected content. We also release a benchmark dataset that serves as a test bed for evaluating the copyright behavior of LLMs, and we stress the need for future alignment.
UR - https://www.scopus.com/pages/publications/85217802606
U2 - 10.18653/v1/2024.emnlp-main.1147
DO - 10.18653/v1/2024.emnlp-main.1147
M3 - Conference contribution
AN - SCOPUS:85217802606
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 20604
EP - 20619
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Y2 - 12 November 2024 through 16 November 2024
ER -