TY - GEN
T1 - Scalar
T2 - 33rd IEEE/ACM International Conference on Program Comprehension, ICPC 2025
AU - Newman, Christian D.
AU - Scholten, Brandon
AU - Testa, Sophia
AU - Behler, Joshua A.C.
AU - Banabilah, Syreen
AU - Collard, Michael L.
AU - Decker, Michael J.
AU - Mkaouer, Mohamed Wiem
AU - Zampieri, Marcos
AU - Alomar, Eman Abdullah
AU - Alsuhaibani, Reem
AU - Peruma, Anthony
AU - Maletic, Jonathan I.
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually curated oracle of identifier names and their grammar patterns. This training specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub at https://github.com/SCANL/scanl-tagger.
KW - identifier naming
KW - natural language processing
KW - part-of-speech tagging
KW - program comprehension
KW - software evolution
KW - software maintenance
UR - https://www.scopus.com/pages/publications/105009083412
U2 - 10.1109/ICPC66645.2025.00045
DO - 10.1109/ICPC66645.2025.00045
M3 - Conference contribution
AN - SCOPUS:105009083412
T3 - IEEE International Conference on Program Comprehension
SP - 367
EP - 371
BT - Proceedings - 2025 IEEE/ACM 33rd International Conference on Program Comprehension, ICPC 2025
Y2 - 27 April 2025 through 28 April 2025
ER -