Scalar: A Part-of-Speech Tagger for Identifiers

  • Christian D. Newman
  • Brandon Scholten
  • Sophia Testa
  • Joshua A.C. Behler
  • Syreen Banabilah
  • Michael L. Collard
  • Michael J. Decker
  • Mohamed Wiem Mkaouer
  • Marcos Zampieri
  • Eman Abdullah Alomar
  • Reem Alsuhaibani
  • Anthony Peruma
  • Jonathan I. Maletic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language that developers use to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with that of a previous version of the tagger, as well as that of a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub at https://github.com/SCANL/scanl-tagger.
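To make the training setup in the abstract concrete, the following is a minimal sketch of how a per-word tagger could be fit with scikit-learn's GradientBoostingClassifier on an oracle of identifiers and grammar patterns. It is not SCALAR's implementation: the feature set, the three-tag label set (V, NM, N), the toy oracle, and the helper names `split_identifier`, `word_features`, `train`, and `tag_identifier` are all assumptions made for illustration.

```python
import re
from sklearn.ensemble import GradientBoostingClassifier


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs (HTTP), capitalized or lowercase words, then digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def word_features(words, index, context):
    """Hypothetical numeric features for one word; not SCALAR's feature set."""
    return [
        index,                              # position of the word
        len(words) - 1 - index,             # number of words after this one
        1 if context == "function" else 0,  # kind of identifier
    ]


# Toy stand-in for a manually curated oracle:
# (identifier, context, grammar pattern). Tags are illustrative only.
ORACLE = [
    ("getUserName", "function", ["V", "NM", "N"]),
    ("setMaxValue", "function", ["V", "NM", "N"]),
    ("parseInputFile", "function", ["V", "NM", "N"]),
    ("userCount", "variable", ["NM", "N"]),
    ("fileName", "variable", ["NM", "N"]),
    ("max_item_count", "variable", ["NM", "NM", "N"]),
]


def train(oracle):
    """Fit one classifier that predicts a tag for each word of an identifier."""
    X, y = [], []
    for name, context, pattern in oracle:
        words = split_identifier(name)
        if len(words) != len(pattern):
            raise ValueError(f"pattern length mismatch for {name!r}")
        for i, tag in enumerate(pattern):
            X.append(word_features(words, i, context))
            y.append(tag)
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X, y)
    return model


def tag_identifier(model, name, context):
    """Return (word, tag) pairs, i.e. the predicted grammar pattern."""
    words = split_identifier(name)
    if not words:
        return []
    rows = [word_features(words, i, context) for i in range(len(words))]
    return list(zip(words, (str(t) for t in model.predict(rows))))


model = train(ORACLE)
print(tag_identifier(model, "getUserName", "function"))
```

A realistic oracle would be far larger, and the features would likely need lexical information about each word rather than position alone; the sketch only shows the overall shape of training a classifier on identifier words and reading back a tag sequence.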

Original language: English
Title of host publication: Proceedings - 2025 IEEE/ACM 33rd International Conference on Program Comprehension, ICPC 2025
Pages: 367-371
Number of pages: 5
ISBN (Electronic): 9798331502232
State: Published - 2025
Event: 33rd IEEE/ACM International Conference on Program Comprehension, ICPC 2025 - Ottawa, Canada
Duration: 27 Apr 2025 to 28 Apr 2025

Publication series

Name: IEEE International Conference on Program Comprehension
ISSN (Print): 2643-7147
ISSN (Electronic): 2643-7171

Conference

Conference: 33rd IEEE/ACM International Conference on Program Comprehension, ICPC 2025
Country/Territory: Canada
City: Ottawa
Period: 27/04/25 to 28/04/25

Keywords

  • identifier naming
  • natural language processing
  • part-of-speech tagging
  • program comprehension
  • software evolution
  • software maintenance
