On the structure and semantics of identifier names containing closed syntactic category words

  • Christian D. Newman
  • Anthony Peruma
  • Eman Abdullah AlOmar
  • Mahie Crabbe
  • Syreen Banabilah
  • Reem S. Alsuhaibani
  • Michael J. Decker
  • Farhad Akhbardeh
  • Marcos Zampieri
  • Mohamed Wiem Mkaouer
  • Jonathan I. Maletic

Research output: Contribution to journal › Article › peer-review

Abstract

Identifier names are crucial components of code, serving as primary clues for developers to understand program behavior. This paper investigates the linguistic structure of identifier names by extending the concept of grammar patterns, which represent the part-of-speech (PoS) sequences underlying identifier phrases. The specific focus is on closed syntactic categories (e.g., prepositions, conjunctions, determiners), which are rarely studied in software engineering despite their central role in general natural language. To study these categories, the Closed Category Identifier Dataset (CCID), a new manually annotated dataset of 1,275 identifiers drawn from 30 open-source systems, is constructed and presented. The relationship between closed-category grammar patterns and program behavior is then analyzed using grounded-theory-inspired coding, statistical, and pattern analysis. The results reveal recurring structures that developers use to express concepts such as control flow, data transformation, temporal reasoning, and other behavioral roles through naming. This work contributes an empirical foundation for understanding how linguistic resources encode behavior in identifier names and supports new directions for research in naming, program comprehension, and education.
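
To make the idea of a grammar pattern concrete, the following is a minimal sketch that splits an identifier into words and maps each word to a part-of-speech tag. Everything in it is an assumption made for illustration: the tag labels (`P`, `CJ`, `DT`), the small hand-picked lexicon, and the function names are hypothetical and are not the tagset, the data, or the tooling used in the paper. Words outside the lexicon are marked `X` (unknown open-class word) rather than guessed.

```python
import re

# Hypothetical, hand-picked lexicon of closed-category words.
# The tag labels are illustrative only, not the paper's tagset.
CLOSED_CATEGORY = {
    "to": "P", "from": "P", "in": "P", "on": "P", "with": "P", "by": "P",
    "and": "CJ", "or": "CJ",
    "the": "DT", "all": "DT", "each": "DT", "this": "DT",
}


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def grammar_pattern(name):
    """Return the tag sequence for an identifier; 'X' marks unknown words."""
    return " ".join(CLOSED_CATEGORY.get(w, "X") for w in split_identifier(name))


if __name__ == "__main__":
    for ident in ("convertToString", "sort_by_name", "is_empty_or_null"):
        print(ident, "->", grammar_pattern(ident))
```

Under these assumptions, `convertToString` yields the pattern `X P X`, where the preposition in the middle hints at a data transformation. A real study would rely on manual annotation or a trained tagger instead of a lookup table.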

Original language: English
Article number: 148
Journal: Empirical Software Engineering
Volume: 30
Issue number: 5
State: Published - Sep 2025

Keywords

  • Closed category terms
  • Identifier naming
  • Naming conventions
  • Part of speech tagging
  • Program comprehension
  • Software linguistics
  • Software maintenance and evolution
