Authorship similarity detection from email messages

Xiaoling Chen, Peng Hao, R. Chandramouli, K. P. Subbalakshmi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

33 Scopus citations

Abstract

It is easy to hide the true identity of the author of an email. The author's actual name, email address, etc. can be changed arbitrarily to deceive an email receiver. For example, a sender can change his/her identity in the email header to send different emails to various recipients. Therefore, in this paper, we investigate techniques for authorship similarity detection from the text content of a short length, topic-free email. 150 stylistic cues are identified for this problem. A frequent pattern and machine learning based method is proposed. Extensive experiment results are also presented for the Enron email data set.

Original language: English
Title of host publication: Machine Learning and Data Mining in Pattern Recognition - 7th International Conference, MLDM 2011, Proceedings
Pages: 375-386
Number of pages: 12
State: Published - 2011
Event: 7th International Conference on Machine Learning and Data Mining, MLDM 2011 - New York, NY, United States
Duration: 30 Aug 2011 to 3 Sep 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6871 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference on Machine Learning and Data Mining, MLDM 2011
Country/Territory: United States
City: New York, NY
Period: 30/08/11 to 3/09/11

Keywords

  • Authorship similarity
  • Enron email
  • Frequent pattern
  • SVM
