Artificial Intelligence and Machine Learning Based Methods and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 44826

Special Issue Editors

Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, 1st Mihail Kogălniceanu Street, 400084 Cluj-Napoca, Romania
Interests: computer vision; deep learning; convolutional neural networks; unsupervised learning methods; facial feature analysis; mathematical models
Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, 1st Mihail Kogălniceanu Street, 400084 Cluj-Napoca, Romania
Interests: deep learning; biometrics; visual surveillance; facial feature analysis

Special Issue Information

Dear Colleagues,

Recent developments in artificial intelligence, and especially in machine learning, have taken these fields from purely theoretical research to fully applied industrial practice, not only in computer science but in virtually every other domain as well.

Globally, artificial intelligence (AI) has become one of the core areas providing fundamental building blocks for computer vision systems, computational modeling, security threat assessment, systems mimicking biological intelligence, multiagent systems, data transformation methods, etc.

The purpose of this Special Issue is to provide a publishing venue for articles presenting the latest developments in both the theoretical and mathematical aspects of AI and machine learning and their practical applications, including computer vision and vision systems, statistical learning, reinforcement learning, deep learning, data analysis and filtering, data transformation, speech processing, clustering and classification, knowledge extraction and discovery, natural language processing, and parallel and distributed AI methods.

Contributions are welcome on both theoretical and practical models. The selection criteria will be based on formal and technical soundness, experimental support, and the relevance of the contribution.

Dr. Adrian Sergiu Darabant
Dr. Diana-Laura Borza
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • classification and clustering
  • reinforcement learning
  • learning algorithms
  • pattern recognition
  • data filtering and transformation
  • parallelization in learning algorithms
  • probabilistic and statistical methods
  • deep neural networks
  • convolutional neural networks
  • adversarial systems
  • intelligent agents
  • evolutionary programming
  • text analysis
  • natural language processing (NLP)
  • feature extraction and analysis

Published Papers (25 papers)


Research

36 pages, 4271 KiB  
Article
Automated Classification of Agricultural Species through Parallel Artificial Multiple Intelligence System–Ensemble Deep Learning
by Keartisak Sriprateep, Surajet Khonjun, Paulina Golinska-Dawson, Rapeepan Pitakaso, Peerawat Luesak, Thanatkij Srichok, Somphop Chiaranai, Sarayut Gonwirat and Budsaba Buakum
Mathematics 2024, 12(2), 351; https://doi.org/10.3390/math12020351 - 22 Jan 2024
Viewed by 1126
Abstract
The classification of certain agricultural species poses a formidable challenge due to their inherent resemblance and the absence of dependable visual discriminators. The accurate identification of these plants holds substantial importance in industries such as cosmetics, pharmaceuticals, and herbal medicine, where the optimization of essential compound yields and product quality is paramount. In response to this challenge, we have devised an automated classification system based on deep learning principles, designed to achieve precision and efficiency in species classification. Our approach leverages a diverse dataset encompassing various cultivars and employs the Parallel Artificial Multiple Intelligence System–Ensemble Deep Learning model (P-AMIS-E). This model integrates ensemble image segmentation techniques, including U-Net and Mask-R-CNN, alongside image augmentation and convolutional neural network (CNN) architectures such as SqueezeNet, ShuffleNetv2 1.0x, MobileNetV3, and InceptionV1. The culmination of these elements results in the P-AMIS-E model, enhanced by an Artificial Multiple Intelligence System (AMIS) for decision fusion, ultimately achieving an impressive accuracy rate of 98.41%. This accuracy notably surpasses the performance of existing methods, such as ResNet-101 and Xception, which attain 93.74% accuracy on the testing dataset. Moreover, when applied to an unseen dataset, the P-AMIS-E model demonstrates a substantial advantage, yielding accuracy rates ranging from 4.45% to 31.16% higher than those of the compared methods. It is worth highlighting that our heterogeneous ensemble approach consistently outperforms both single large models and homogeneous ensemble methods, achieving an average improvement of 13.45%. This paper provides a case study focused on the Centella Asiatica Urban (CAU) cultivar to exemplify the practical application of our approach. 
By integrating image segmentation, augmentation, and decision fusion, we have significantly enhanced accuracy and efficiency. This research holds theoretical implications for the advancement of deep learning techniques in image classification tasks while also offering practical benefits for industries reliant on precise species identification. Full article
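The decision-fusion step described in this abstract can be illustrated with a minimal weighted-averaging sketch. The weights stand in for the AMIS-optimized fusion weights; all names and numbers below are invented for illustration, not taken from the paper.

```python
# Sketch of weighted decision fusion over an ensemble of classifiers.
# The fusion weights stand in for the AMIS-optimized weights described
# in the abstract; the values are illustrative only.

def fuse_predictions(prob_vectors, weights):
    """Combine per-model class-probability vectors by weighted averaging."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_vectors, weights):
        for i, p in enumerate(probs):
            fused[i] += w * p / total
    return fused

# Three hypothetical CNN outputs over two classes.
models_out = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
weights = [0.5, 0.2, 0.3]
fused = fuse_predictions(models_out, weights)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

A heterogeneous ensemble helps precisely because differently biased models disagree on hard inputs, and the fused probabilities average out individual errors.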

23 pages, 14459 KiB  
Article
Investigating Effective Geometric Transformation for Image Augmentation to Improve Static Hand Gestures with a Pre-Trained Convolutional Neural Network
by Baiti-Ahmad Awaluddin, Chun-Tang Chao and Juing-Shian Chiou
Mathematics 2023, 11(23), 4783; https://doi.org/10.3390/math11234783 - 27 Nov 2023
Cited by 1 | Viewed by 809
Abstract
Hand gesture recognition (HGR) is a challenging and fascinating research topic in computer vision with numerous daily life applications. In HGR, computers aim to identify and classify hand gestures. The limited diversity of the dataset used in HGR is due to the limited number of hand gesture demonstrators, acquisition environments, and hand pose variations despite previous efforts. Geometric image augmentations are commonly used to address these limitations. These augmentations include scaling, translation, rotation, flipping, and image shearing. However, research has yet to focus on identifying the best geometric transformations for augmenting the HGR dataset. This study employed three commonly utilized pre-trained models for image classification tasks, namely ResNet50, MobileNetV2, and InceptionV3. The system’s performance was evaluated on five static HGR datasets: DLSI, HG14, ArabicASL, MU HandImages ASL, and Sebastian Marcell. The experimental results demonstrate that many geometric transformations are unnecessary for HGR image augmentation. Image shearing and horizontal flipping are the most influential transformations for augmenting the HGR dataset and achieving better classification performance. Moreover, ResNet50 outperforms MobileNetV2 and InceptionV3 for static HGR. Full article
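The two transformations the study singles out, horizontal flipping and shearing, can be sketched on a tiny integer "image". Real augmentation pipelines use image libraries with interpolation; this pure-Python version is illustrative only.

```python
# Minimal sketch of the two augmentations found most effective in the
# study: horizontal flipping and image shearing, on a toy 3x3 "image".

def hflip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def shear_x(img, factor):
    """Integer horizontal shear: shift row i right by int(factor * i),
    padding with zeros and cropping back to the original width."""
    w = len(img[0])
    out = []
    for i, row in enumerate(img):
        shift = int(factor * i)
        out.append(([0] * shift + row)[:w])
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
flipped = hflip(img)
sheared = shear_x(img, 1)
```

Both operations preserve the gesture's identity while changing its pixel layout, which is why they enlarge the effective training set without mislabeling samples.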

12 pages, 1271 KiB  
Article
Predictive Prompts with Joint Training of Large Language Models for Explainable Recommendation
by Ching-Sheng Lin, Chung-Nan Tsai, Shao-Tang Su, Jung-Sing Jwo, Cheng-Hsiung Lee and Xin Wang
Mathematics 2023, 11(20), 4230; https://doi.org/10.3390/math11204230 - 10 Oct 2023
Viewed by 1197
Abstract
Large language models have recently gained popularity in various applications due to their ability to generate natural text for complex tasks. Recommendation systems, one of the frequently studied research topics, can be further improved using the capabilities of large language models to track and understand user behaviors and preferences. In this research, we aim to build reliable and transparent recommendation system by generating human-readable explanations to help users obtain better insights into the recommended items and gain more trust. We propose a learning scheme to jointly train the rating prediction task and explanation generation task. The rating prediction task learns the predictive representation from the input of user and item vectors. Subsequently, inspired by the recent success of prompt engineering, these predictive representations are served as predictive prompts, which are soft embeddings, to elicit and steer any knowledge behind language models for the explanation generation task. Empirical studies show that the proposed approach achieves competitive results compared with other existing baselines on the public English TripAdvisor dataset of explainable recommendations. Full article
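The "predictive prompt" idea, prepending a predictive representation as soft (continuous) embeddings in front of the token embeddings fed to a language model, can be sketched as follows. The dimensions and values are invented; a real system would learn both the predictive vectors and the model jointly.

```python
# Sketch of soft prompting: predictive representations are prepended as
# continuous embeddings before the token-embedding sequence. All vectors
# here are toy 2-d stand-ins for learned embeddings.

def prepend_soft_prompt(prompt_vecs, token_embeddings):
    """Concatenate soft prompt vectors before the token embeddings."""
    return prompt_vecs + token_embeddings

# Two soft prompt vectors (e.g., derived from user/item representations)
# followed by three token embeddings.
soft_prompt = [[0.1, 0.2], [0.3, 0.4]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
model_input = prepend_soft_prompt(soft_prompt, tokens)
```

Because soft prompts live in embedding space rather than vocabulary space, they can carry continuous rating information that no discrete token sequence could express.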

30 pages, 699 KiB  
Article
Item Difficulty Prediction Using Item Text Features: Comparison of Predictive Performance across Machine-Learning Algorithms
by Lubomír Štěpánek, Jana Dlouhá and Patrícia Martinková
Mathematics 2023, 11(19), 4104; https://doi.org/10.3390/math11194104 - 28 Sep 2023
Cited by 1 | Viewed by 1287
Abstract
This work presents a comparative analysis of various machine learning (ML) methods for predicting item difficulty in English reading comprehension tests using text features extracted from item wordings. A wide range of ML algorithms are employed within both the supervised regression and the classification tasks, including regularization methods, support vector machines, trees, random forests, back-propagation neural networks, and Naïve Bayes; moreover, the ML algorithms are compared to the performance of domain experts. Using f-fold cross-validation and considering the root mean square error (RMSE) as the performance metric, elastic net outperformed other approaches in a continuous item difficulty prediction. Within classifiers, random forests returned the highest extended predictive accuracy. We demonstrate that the ML algorithms implementing item text features can compete with predictions made by domain experts, and we suggest that they should be used to inform and improve these predictions, especially when item pre-testing is limited or unavailable. Future research is needed to study the performance of the ML algorithms using item text features on different item types and respondent populations. Full article
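The evaluation protocol named in the abstract, cross-validation scored by RMSE, can be sketched with plain Python. The fold count and data below are illustrative, not from the study.

```python
import math

# Sketch of the evaluation loop used to compare predictors: simple
# contiguous k-fold index splitting plus the RMSE metric.

def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold = n // k
    idx = list(range(n))
    for f in range(k):
        test = idx[f * fold:(f + 1) * fold]
        train = idx[:f * fold] + idx[(f + 1) * fold:]
        yield train, test

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted difficulties."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
splits = list(kfold_indices(6, 3))
```

Cross-validated RMSE is what lets the paper compare elastic net, forests, and human experts on equal footing: every model is scored on items it never saw during fitting.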

14 pages, 2721 KiB  
Article
BAE: Anomaly Detection Algorithm Based on Clustering and Autoencoder
by Dongqi Wang, Mingshuo Nie and Dongming Chen
Mathematics 2023, 11(15), 3398; https://doi.org/10.3390/math11153398 - 03 Aug 2023
Viewed by 1127
Abstract
In this paper, we propose an outlier-detection algorithm for detecting network traffic anomalies based on a clustering algorithm and an autoencoder model. The BIRCH clustering algorithm is employed as the pre-algorithm of the autoencoder to pre-classify datasets with complex data distribution characteristics, while the autoencoder model is used to detect outliers based on a threshold. The proposed BIRCH-Autoencoder (BAE) algorithm has been tested on four network security datasets, KDDCUP99, UNSW-NB15, CICIDS2017, and NSL-KDD, and compared with representative algorithms. The BAE algorithm achieved average F-scores of 96.160, 81.132, and 91.424 on the KDDCUP99, UNSW-NB15, and CICIDS2017 datasets, respectively. These experimental results demonstrate that the proposed approach can effectively and accurately detect anomalous data. Full article
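The autoencoder-stage decision rule in BAE, flagging a sample when its reconstruction error exceeds a threshold, can be sketched as below. The "autoencoder" here is a trivial stand-in function and the threshold is invented; only the thresholding logic mirrors the abstract.

```python
# Sketch of threshold-based outlier detection on reconstruction error,
# the autoencoder stage of the BAE pipeline. The reconstruction function
# and threshold are illustrative stand-ins.

def reconstruction_error(x, x_hat):
    """Squared-error distance between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def detect_outliers(samples, reconstruct, threshold):
    """Return indices of samples whose reconstruction error exceeds threshold."""
    return [i for i, x in enumerate(samples)
            if reconstruction_error(x, reconstruct(x)) > threshold]

# Toy "autoencoder" that reconstructs everything as the origin, so points
# far from the origin register a large error and look anomalous.
reconstruct = lambda x: [0.0] * len(x)
samples = [[0.1, 0.1], [0.2, 0.0], [3.0, 4.0]]
outliers = detect_outliers(samples, reconstruct, threshold=1.0)
```

Pre-clustering with BIRCH, as the paper does, means each cluster gets an autoencoder fitted to one mode of the data, so "hard to reconstruct" more reliably means "anomalous".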

11 pages, 481 KiB  
Article
Global-Local Dynamic Adversarial Learning for Cross-Domain Sentiment Analysis
by Juntao Lyu, Zheyuan Zhang, Shufeng Chen and Xiying Fan
Mathematics 2023, 11(14), 3130; https://doi.org/10.3390/math11143130 - 15 Jul 2023
Viewed by 776
Abstract
As one of the most widely used applications of domain adaptation (DA), cross-domain sentiment analysis (CDSA) aims to overcome the scarcity of sentiment-labeled data. Applying an adversarial network to DA to reduce the distribution discrepancy between source and target domains is a significant advance in CDSA. This adversarial DA paradigm utilizes a single global domain discriminator or a series of local domain discriminators to reduce marginal or conditional probability distribution discrepancies. In general, each discrepancy has a different effect on domain adaptation; however, existing CDSA algorithms ignore this point. Therefore, in this paper, we propose an effective, novel, unsupervised adversarial DA paradigm, Global-Local Dynamic Adversarial Learning (GLDAL), which can quantitatively evaluate the weights of the global distribution and every local distribution. We also study how to apply GLDAL to CDSA. As GLDAL can effectively reduce the distribution discrepancy between domains, it performs well in a series of CDSA experiments and improves classification accuracy compared to similar methods. The effectiveness of each component is demonstrated through ablation experiments and a quantitative analysis of the dynamic factor. Overall, this approach achieves the desired DA effect under domain shift. Full article

14 pages, 1943 KiB  
Article
Greenhouse Micro-Climate Prediction Based on Fixed Sensor Placements: A Machine Learning Approach
by Oladayo S. Ajani, Member Joy Usigbe, Esther Aboyeji, Daniel Dooyum Uyeh, Yushin Ha, Tusan Park and Rammohan Mallipeddi
Mathematics 2023, 11(14), 3052; https://doi.org/10.3390/math11143052 - 10 Jul 2023
Cited by 5 | Viewed by 1841
Abstract
Accurate measurement of micro-climates that include temperature and relative humidity is the bedrock of the control and management of plant life in protected cultivation systems. Hence, the use of a large number of sensors distributed within the greenhouse or mobile sensors that can be moved from one location to another has been proposed, which are both capital and labor-intensive. On the contrary, accurate measurement of micro-climates can be achieved through the identification of the optimal number of sensors and their optimal locations, whose measurements are representative of the micro-climate in the entire greenhouse. However, given the number of sensors, their optimal locations are proven to vary from time to time as the outdoor weather conditions change. Therefore, regularly shifting the sensors to their optimal locations with the change in outdoor conditions is cost-intensive and may not be appropriate. In this paper, a framework based on the dense neural network (DNN) is proposed to predict the measurements (temperature and humidity) corresponding to the optimal sensor locations, which vary relative to the outdoor weather, using the measurements from sensors whose locations are fixed. The employed framework demonstrates a very high correlation between the true and predicted values with an average coefficient value of 0.91 and 0.85 for both temperature and humidity, respectively. In other words, through a combination of the optimal number of fixed sensors and DNN architecture that performs multi-channel regression, we estimate the micro-climate of the greenhouse. Full article

13 pages, 5619 KiB  
Article
A Semi-Federated Active Learning Framework for Unlabeled Online Network Data
by Yuwen Zhou, Yuhan Hu, Jing Sun, Rui He and Wenjie Kang
Mathematics 2023, 11(8), 1972; https://doi.org/10.3390/math11081972 - 21 Apr 2023
Viewed by 1022
Abstract
Federated Learning (FL) is a newly emerged federated optimization technique for distributed data in a federated network. The participants in FL that train the model locally are classified into client nodes. The server node assumes the responsibility to aggregate local models from client nodes without data moving. In this regard, FL is an ideal solution to protect data privacy at each node of the network. However, the raw data generated on each node are unlabeled, making it impossible for FL to apply these data directly to train a model. The large volume of data annotating work prevents FL from being widely applied in the real world, especially for online scenarios, where the data are generated continuously. Meanwhile, the data generated on different nodes tend to be differently distributed. It has been proved theoretically and experimentally that non-independent and identically distributed (non-IID) data harm the performance of FL. In this article, we design a semi-federated active learning (semi-FAL) framework to tackle the annotation and non-IID problems jointly. More specifically, the server node can provide (i) a pre-trained model to help each client node annotate the local data uniformly and (ii) an estimation of the global gradient to help correct the local gradient. The evaluation results demonstrate our semi-FAL framework can efficiently handle unlabeled online network data and achieves high accuracy and fast convergence. Full article
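The gradient-correction idea in semi-FAL, nudging each client's local gradient toward a server-provided estimate of the global gradient to counter non-IID drift, can be sketched as a convex combination. The mixing coefficient and vectors are invented for illustration.

```python
# Sketch of local-gradient correction against non-IID drift: each client
# mixes its local gradient with the server's estimate of the global
# gradient. The mixing coefficient alpha is an illustrative choice.

def correct_gradient(local_grad, global_grad, alpha=0.5):
    """Convex combination of the local and estimated global gradients."""
    return [(1 - alpha) * l + alpha * g
            for l, g in zip(local_grad, global_grad)]

local = [1.0, -2.0]        # gradient computed on one client's skewed data
global_est = [0.0, 0.0]    # server-side estimate of the global gradient
corrected = correct_gradient(local, global_est, alpha=0.5)
```

With alpha near 0 the client trusts its own data; with alpha near 1 it follows the server estimate, which is the lever that tames non-IID divergence across clients.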

15 pages, 2946 KiB  
Article
MelodyDiffusion: Chord-Conditioned Melody Generation Using a Transformer-Based Diffusion Model
by Shuyu Li and Yunsick Sung
Mathematics 2023, 11(8), 1915; https://doi.org/10.3390/math11081915 - 18 Apr 2023
Cited by 3 | Viewed by 2488
Abstract
Artificial intelligence, particularly machine learning, has begun to permeate various real-world applications and is continually being explored in automatic music generation. The approaches to music generation can be broadly divided into two categories: rule-based and data-driven methods. Rule-based approaches rely on substantial prior knowledge and may struggle to handle large datasets, whereas data-driven approaches can solve these problems and have become increasingly popular. However, data-driven approaches still face challenges such as the difficulty of considering long-distance dependencies when handling discrete-sequence data and convergence during model training. Although the diffusion model has been introduced as a generative model to solve the convergence problem in generative adversarial networks, it has not yet been applied to discrete-sequence data. This paper proposes a transformer-based diffusion model known as MelodyDiffusion to handle discrete musical data and realize chord-conditioned melody generation. MelodyDiffusion replaces the U-nets used in traditional diffusion models with transformers to consider the long-distance dependencies using attention and parallel mechanisms. Moreover, a transformer-based encoder is designed to extract contextual information from chords as a condition to guide melody generation. MelodyDiffusion can automatically generate diverse melodies based on the provided chords in practical applications. The evaluation experiments, in which Hits@k was used as a metric to evaluate the restored melodies, demonstrate that the large-scale version of MelodyDiffusion achieves an accuracy of 72.41% (k = 1). Full article

35 pages, 5054 KiB  
Article
Performance Analysis of Long Short-Term Memory Predictive Neural Networks on Time Series Data
by Roland Bolboacă and Piroska Haller
Mathematics 2023, 11(6), 1432; https://doi.org/10.3390/math11061432 - 15 Mar 2023
Cited by 9 | Viewed by 2369
Abstract
Long short-term memory neural networks have been proposed as a means of creating accurate models from large time series data originating from various fields. These models can further be utilized for prediction, control, or anomaly-detection algorithms. However, finding the optimal hyperparameters to maximize different performance criteria remains a challenge for both novice and experienced users. Hyperparameter optimization algorithms can often be a resource-intensive and time-consuming task, particularly when the impact of the hyperparameters on the performance of the neural network is not comprehended or known. Teacher forcing denotes a procedure that involves feeding the ground truth output from the previous time-step as input to the current time-step during training, while during testing feeding back the predicted values. This paper presents a comprehensive examination of the impact of hyperparameters on long short-term neural networks, with and without teacher forcing, on prediction performance. The study includes testing long short-term memory neural networks, with two variations of teacher forcing, in two prediction modes, using two configurations (i.e., multi-input single-output and multi-input multi-output) on a well-known chemical process simulation dataset. Furthermore, this paper demonstrates the applicability of a long short-term memory neural network with a modified teacher forcing approach in a process state monitoring system. Over 100,000 experiments were conducted with varying hyperparameters and in multiple neural network operation modes, revealing the direct impact of each tested hyperparameter on the training and testing procedures. Full article
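The teacher-forcing procedure defined in this abstract, feeding ground truth back during training but the model's own predictions during testing, reduces to a one-line difference in the rollout loop. The `step` function below is a toy stand-in for an LSTM cell, and all numbers are illustrative.

```python
# Sketch of teacher forcing vs. free-running prediction for a one-step
# recurrent model. `step` is a toy stand-in for an LSTM cell.

def rollout(step, x0, targets, teacher_forcing):
    """Predict len(targets) steps; feed ground truth back when forcing."""
    preds, x = [], x0
    for t in targets:
        y = step(x)
        preds.append(y)
        x = t if teacher_forcing else y   # the key difference
    return preds

step = lambda x: 2 * x + 1     # slightly biased toy dynamics model
targets = [2, 4, 8]            # ground-truth sequence for x0 = 1
forced = rollout(step, 1, targets, teacher_forcing=True)
free = rollout(step, 1, targets, teacher_forcing=False)
```

The forced rollout stays close to the targets because each step restarts from the truth, while the free-running rollout compounds the model's bias, which is exactly the train/test mismatch the paper's modified teacher forcing addresses.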

16 pages, 2718 KiB  
Article
TwoViewDensityNet: Two-View Mammographic Breast Density Classification Based on Deep Convolutional Neural Network
by Mariam Busaleh, Muhammad Hussain, Hatim A. Aboalsamh, Fazal-e-Amin and Sarah A. Al Sultan
Mathematics 2022, 10(23), 4610; https://doi.org/10.3390/math10234610 - 05 Dec 2022
Cited by 2 | Viewed by 2146
Abstract
Dense breast tissue is a significant factor that increases the risk of breast cancer, yet classifying breast density remains difficult, and current mammographic density classification approaches do not provide sufficient accuracy. This paper proposes TwoViewDensityNet, an end-to-end deep learning-based method for mammographic breast density classification. The craniocaudal (CC) and mediolateral oblique (MLO) views of screening mammography provide two different views of each breast; as the two views are complementary, and dual-view-based methods have proven efficient, we use both views for breast density classification. The loss function plays a key role in training a deep model; we employ the focal loss function because it focuses learning on hard cases. The method was thoroughly evaluated on two public datasets using 5-fold cross-validation, achieving an overall performance of F-score 98.63%, AUC 99.51%, and accuracy 95.83% on DDSM, and F-score 97.14%, AUC 97.44%, and accuracy 96% on INbreast. The comparison shows that TwoViewDensityNet outperforms state-of-the-art methods for classifying breast density into BI-RADS classes. It provides healthcare providers and patients with more accurate information and will help improve the diagnostic accuracy and reliability of mammographic breast density evaluation in clinical care. Full article
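The focal loss mentioned in this abstract down-weights easy, well-classified examples so training concentrates on hard cases. A minimal binary sketch, with an illustrative focusing parameter gamma:

```python
import math

# Sketch of the binary focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
# The gamma value is an illustrative choice, not the paper's setting.

def focal_loss(p, y, gamma=2.0):
    """p: predicted probability of class 1; y: true label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident and correct -> tiny loss
hard = focal_loss(0.1, 1)   # confident and wrong   -> large loss
```

The `(1 - p_t)^gamma` factor is what shrinks the loss of already well-classified samples; with gamma = 0 the expression reduces to ordinary cross-entropy.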

35 pages, 1598 KiB  
Article
A Comparison of Several AI Techniques for Authorship Attribution on Romanian Texts
by Sanda-Maria Avram and Mihai Oltean
Mathematics 2022, 10(23), 4589; https://doi.org/10.3390/math10234589 - 03 Dec 2022
Cited by 2 | Viewed by 1690
Abstract
Determining the author of a text is a difficult task. Here, we compare multiple Artificial Intelligence techniques for classifying literary texts written by multiple authors by taking into account a limited number of speech parts (prepositions, adverbs, and conjunctions). We also introduce a new dataset composed of texts written in the Romanian language on which we have run the algorithms. The compared methods are artificial neural networks, multi-expression programming, k-nearest neighbour, support vector machines, and decision trees with C5.0. Numerical experiments show, first of all, that the problem is difficult, but some algorithms are able to generate acceptable error rates on the test set. Full article
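The feature extraction this abstract relies on, frequencies of a small closed set of function words, can be sketched directly. The word list and text below are illustrative English stand-ins for the Romanian prepositions, adverbs, and conjunctions the paper tracks.

```python
# Sketch of function-word feature extraction for authorship attribution:
# relative frequency of each word in a small closed set. The word list
# and sample text are illustrative stand-ins for the Romanian originals.

FUNCTION_WORDS = ["and", "but", "on", "in", "very"]

def function_word_features(text):
    """Relative frequency of each tracked function word in the text."""
    words = text.lower().split()
    n = len(words)
    return [words.count(w) / n for w in FUNCTION_WORDS]

features = function_word_features("In theory and in practice but very rarely")
```

Function words are attractive authorship features because writers use them habitually and largely independently of topic, so the resulting vectors characterize style rather than subject matter.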

21 pages, 4079 KiB  
Article
Polynomial Fuzzy Information Granule-Based Time Series Prediction
by Xiyang Yang, Shiqing Zhang, Xinjun Zhang and Fusheng Yu
Mathematics 2022, 10(23), 4495; https://doi.org/10.3390/math10234495 - 28 Nov 2022
Cited by 1 | Viewed by 1335
Abstract
Fuzzy information granulation transfers the time series analysis from the numerical platform to the granular platform, which enables us to study the time series at a different granularity. In previous studies, each fuzzy information granule in a granular time series can reflect the average, range, and linear trend characteristics of the data in the corresponding time window. In order to get a more general information granule, this paper proposes polynomial fuzzy information granules, each of which can reflect both the linear trend and the nonlinear trend of the data in a time window. The distance metric of the proposed information granules is given theoretically. After studying the distance measure of the polynomial fuzzy information granule and its geometric interpretation, we design a time series prediction method based on the polynomial fuzzy information granules and fuzzy inference system. The experimental results show that the proposed prediction method can achieve a good long-term prediction. Full article

12 pages, 2519 KiB  
Article
Novel Reinforcement Learning Research Platform for Role-Playing Games
by Petra Csereoka, Bogdan-Ionuţ Roman, Mihai Victor Micea and Călin-Adrian Popa
Mathematics 2022, 10(22), 4363; https://doi.org/10.3390/math10224363 - 20 Nov 2022
Cited by 3 | Viewed by 1469
Abstract
The latest achievements in the field of reinforcement learning have encouraged the development of vision-based learning methods that compete with human-provided results obtained on various games and training environments. Convolutional neural networks together with Q-learning-based approaches have managed to solve and outperform human players in environments such as Atari 2600, Doom or StarCraft II, but the niche of 3D realistic games with a high degree of freedom of movement and rich graphics remains unexplored, despite having the highest resemblance to real-world situations. In this paper, we propose a novel testbed to push the limits of deep learning methods, namely an OpenAI Gym-like environment based on Dark Souls III, a notoriously difficult role-playing game, where even human players have reportedly struggled. We explore two types of architectures, Deep Q-Network and Deep Recurrent Q-Network, providing the results of a first incursion into this new problem class. The source code for the training environment and baselines is made available. Full article

22 pages, 3864 KiB  
Article
On Methods for Merging Mixture Model Components Suitable for Unsupervised Image Segmentation Tasks
by Branislav Panić, Marko Nagode, Jernej Klemenc and Simon Oman
Mathematics 2022, 10(22), 4301; https://0-doi-org.brum.beds.ac.uk/10.3390/math10224301 - 16 Nov 2022
Cited by 2 | Viewed by 1465
Abstract
Unsupervised image segmentation is one of the most important and fundamental tasks in many computer vision systems. The mixture model is a compelling framework for unsupervised image segmentation: a segmented image is obtained by clustering the pixel color values of the image with an estimated mixture model. Problems arise when the selected optimal mixture model contains a large number of components, so that several components of the estimated model together describe a single segment of the image. We investigate methods for merging the components of the mixture model and their usefulness for unsupervised image segmentation. We define a simple heuristic for optimal segmentation with merging of the components of the mixture model. The experiments were performed on gray-scale and color images. The reported results and comparisons with popular clustering approaches show clear benefits of merging mixture model components for unsupervised image segmentation.
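One standard building block when reducing an over-fitted mixture is moment matching: several components are collapsed into a single Gaussian that preserves the total weight, mean, and variance. A minimal 1-D sketch (the paper's merging criteria and heuristic are not reproduced here) might be:

```python
import numpy as np

def merge_components(w, mu, var):
    """Moment-matching merge of 1-D Gaussian mixture components into one
    component: the merged Gaussian keeps the pooled weight, the
    weight-averaged mean, and the total (within + between) variance."""
    w = np.asarray(w, float)
    mu = np.asarray(mu, float)
    var = np.asarray(var, float)
    w_m = w.sum()
    mu_m = np.sum(w * mu) / w_m
    var_m = np.sum(w * (var + (mu - mu_m) ** 2)) / w_m
    return w_m, mu_m, var_m

# two equally weighted unit-variance components at 0 and 2
w_m, mu_m, var_m = merge_components([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

Deciding *which* components to merge is the hard part the paper studies; this snippet only shows what a merge does once a pair is chosen.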

15 pages, 632 KiB  
Article
Effective Online Knowledge Distillation via Attention-Based Model Ensembling
by Diana-Laura Borza, Adrian Sergiu Darabant, Tudor Alexandru Ileni and Alexandru-Ion Marinescu
Mathematics 2022, 10(22), 4285; https://0-doi-org.brum.beds.ac.uk/10.3390/math10224285 - 16 Nov 2022
Cited by 1 | Viewed by 2150
Abstract
Large-scale deep learning models have achieved impressive results on a variety of tasks; however, their deployment on edge or mobile devices remains a challenge due to limited memory and computational capability. Knowledge distillation is an effective model compression technique that can boost the performance of a lightweight student network by transferring knowledge from a more complex model or an ensemble of models. Due to its reduced size, the lightweight model is more suitable for deployment on edge devices. In this paper, we introduce an online knowledge distillation framework that relies on an original attention mechanism to effectively combine the predictions of a cohort of lightweight (student) networks into a powerful ensemble, and uses this ensemble as the distillation signal. The proposed aggregation strategy uses the predictions of the individual students, as well as ground truth data, to determine the weights for ensembling these predictions. This mechanism is used only during training; at inference time, a single lightweight student is extracted and used. The extensive experiments we performed on several image classification benchmarks, both by training models from scratch (on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets) and using transfer learning (on the Oxford Pets and Oxford Flowers datasets), showed that the proposed framework consistently improves the accuracy of knowledge-distilled students, demonstrating the effectiveness of the proposed solution. Moreover, in the case of the ResNet architecture, we observed that the knowledge-distilled model achieves higher accuracy than a deeper, individually trained ResNet model.
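As an illustration of the aggregation idea only, assuming the per-student attention scores are already available (in the paper they come from a learned module fed with student predictions and ground truth), ensembling the student predictions could be sketched as:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_ensemble(student_logits, scores):
    """Weight each student's predictive distribution by a softmax over
    attention scores; the resulting ensemble distribution is what the
    framework would use as the distillation target during training."""
    weights = softmax(np.asarray(scores, float))
    probs = np.array([softmax(np.asarray(l, float)) for l in student_logits])
    return weights @ probs

# two toy students with symmetric, disagreeing logits and equal scores
ens = attention_ensemble([[2.0, 0.0], [0.0, 2.0]], scores=[0.0, 0.0])
```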

17 pages, 1976 KiB  
Article
Interpretable Deep Learning for Discriminating Pneumonia from Lung Ultrasounds
by Mohamed Abdel-Basset, Hossam Hawash, Khalid Abdulaziz Alnowibet, Ali Wagdy Mohamed and Karam M. Sallam
Mathematics 2022, 10(21), 4153; https://0-doi-org.brum.beds.ac.uk/10.3390/math10214153 - 06 Nov 2022
Cited by 3 | Viewed by 1622
Abstract
Lung ultrasound images have shown great promise as an operative point-of-care test for the diagnosis of COVID-19, because the procedure is easy to perform, requires negligible personal protective equipment, and simplifies disinfection. Deep learning (DL) is a robust tool for modeling infection patterns from medical images; however, existing COVID-19 detection models are complex and therefore hard to deploy on the mobile platforms frequently used in point-of-care testing. Moreover, most COVID-19 detection models in the existing DL literature are implemented as black boxes and are hence hard for the healthcare community to interpret or trust. This paper presents a novel interpretable DL framework that discriminates COVID-19 infection from other cases of pneumonia and normal cases using patient ultrasound data. In the proposed framework, novel transformer modules are introduced to model the pathological information in ultrasound frames using an improved window-based multi-head self-attention layer. A convolutional patching module transforms input frames into a latent space rather than partitioning the input into patches. A weighted pooling module scores the embeddings of the disease representations obtained from the transformer modules, attending to the information most valuable for the screening decision. Experimental analysis on the public three-class lung ultrasound dataset (PCUS dataset) demonstrates the discriminative power (accuracy: 93.4%, F1-score: 93.1%, AUC: 97.5%) of the proposed solution, which outperforms the competing approaches while maintaining low complexity. The proposed model obtained very promising results in comparison with the rival models. More importantly, it gives explainable outputs; therefore, it can serve as a candidate tool for empowering the sustainable diagnosis of COVID-19-like diseases in smart healthcare.
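The improved window-based multi-head layer is more involved than a snippet allows; the ingredient it builds on, single-head scaled dot-product self-attention over a sequence of patch embeddings, can be sketched as:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each row of X attends to all
    rows, producing a weighted mix of value vectors. Windowing and
    multiple heads (as in the paper) partition and replicate this."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # rows sum to 1
    return A @ V

X = np.eye(3)  # three toy patch embeddings; identity projections below
out = self_attention(X, np.eye(3), np.eye(3), np.eye(3))
```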

31 pages, 10190 KiB  
Article
An Improved Machine Learning Model with Hybrid Technique in VANET for Robust Communication
by Gagan Preet Kour Marwah, Anuj Jain, Praveen Kumar Malik, Manwinder Singh, Sudeep Tanwar, Calin Ovidiu Safirescu, Traian Candin Mihaltan, Ravi Sharma and Ahmed Alkhayyat
Mathematics 2022, 10(21), 4030; https://0-doi-org.brum.beds.ac.uk/10.3390/math10214030 - 30 Oct 2022
Cited by 21 | Viewed by 1810
Abstract
The vehicular ad hoc network (VANET) is one of the most popular and promising technologies in intelligent transportation today. However, VANET is susceptible to several vulnerabilities that result in intrusions, which must be addressed before VANET technology can be adopted. In this study, we suggest a unique machine learning technique to improve VANET's effectiveness. The proposed method incorporates two phases. Phase I detects DDoS attacks using a novel machine learning technique called SVM-HHO, which provides information about the vehicle. Phase II mitigates the impact of a DDoS attack and allocates bandwidth using a reliable resource management technique based on the hybrid whale dragonfly optimization algorithm (H-WDFOA). The proposed model could be an effective technique for predicting and utilizing reliable information in smart vehicles. The novel machine learning-based technique was implemented through the MATLAB and NS2 platforms. Network quality measurements included congestion, transit, collision, and QoS awareness cost, and a cost framework was designed based on these constraints. In addition, data preprocessing of the QoS factor and total routing costs were considered. Rider integrated cuckoo search (RI-CS) is a novel optimization algorithm that combines the concepts of the rider optimization algorithm (ROA) and cuckoo search (CS) to determine the optimal route with the lowest routing cost. The enhanced hybrid ant colony optimization routing protocol (EHACORP) is a networking technology that increases efficiency by utilizing the shortest route; its shortest path has the lowest communication overhead and the fewest hops between sending and receiving vehicles. EHACORP involves two stages: in phase 1, it calculates the distances between cars; in phase 2, starting-point ant colony optimization guides the ants to develop the shortest route with the fewest connections for sending information. This short route increases protocol efficiency in every way. The pairing of DCM and SBACO in H-WDFOA-VANET accelerated packet processing, reduced ant search time, eliminated blind broadcasting, and prevented stagnation issues. The delivery ratio and throughput of H-WDFOA-VANET benefit from its use of the shortest channel without stagnation, its rapid packet processing, and its rapid convergence speed. In conclusion, the proposed hybrid whale dragonfly optimization approach (H-WDFOA-VANET) was compared with established models, namely rider integrated cuckoo search (RI-CS) and the enhanced hybrid ant colony optimization routing protocol (EHACORP). The proposed system achieved higher throughput, energy consumption of 2.00000 mJ, latency of 15.61668 s, and a drop of 0.15759 at node 60; at node 80 it reduces the drop value to 0.15504, the delay to 15.64318 s, and the energy consumption to 2.00000 mJ. These outcomes demonstrate that the proposed system is more efficient than existing systems.
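The metaheuristics above (RI-CS, EHACORP, H-WDFOA) are beyond a short snippet, but their shared routing objective, the route with the lowest total cost, can be illustrated with plain Dijkstra on a toy link-cost graph (the graph and costs below are invented for illustration):

```python
import heapq

def cheapest_route(graph, src, dst):
    """Dijkstra over per-link costs: the minimum-total-routing-cost
    objective that the paper's metaheuristics approximate. `graph` maps
    node -> {neighbor: cost}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, c in graph.get(u, {}).items():
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # walk back from dst to reconstruct the cheapest path
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[dst]

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}}
path, cost = cheapest_route(graph, "A", "D")
```

The ant colony and swarm variants trade this exactness for scalability and adaptivity on large, changing vehicular topologies.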

15 pages, 1605 KiB  
Article
Residual Information in Deep Speaker Embedding Architectures
by Adriana Stan
Mathematics 2022, 10(21), 3927; https://0-doi-org.brum.beds.ac.uk/10.3390/math10213927 - 23 Oct 2022
Cited by 1 | Viewed by 2026
Abstract
Speaker embeddings are vector representations extracted from a speech signal, intended to pertain to the speaker identity alone. The embeddings are commonly used to classify and discriminate between different speakers. However, there is no objective measure of a speaker embedding's ability to disentangle the speaker identity from the other speech characteristics. This means that the embeddings are far from ideal: they are highly dependent on the training corpus and still include a degree of residual information pertaining to factors such as the linguistic content, recording conditions, or speaking style of the utterance. This paper analyzes six sets of speaker embeddings extracted with some of the most recent and high-performing deep neural network (DNN) architectures, and in particular the degree to which they truly disentangle the speaker identity from the speech signal. To evaluate the architectures fairly, a large multi-speaker parallel speech dataset is used, comprising 46 speakers uttering the same set of prompts, recorded either in a professional studio or in their home environments. The analysis examines the intra- and inter-speaker similarity measures computed over the different embedding sets, as well as whether simple classification and regression methods can extract several residual information factors from the speaker embeddings. The results show that the discriminative power of the analyzed embeddings is very high, yet across all the analyzed architectures, residual information is still present in the representations in the form of a high correlation with the recording conditions, linguistic content, and utterance duration. However, we show that this correlation, although not ideal, could still be useful in downstream tasks. The low-dimensional projections of the speaker embeddings show similar behavior patterns across the embedding sets with respect to intra-speaker data clustering and utterance outlier detection.
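The intra- vs inter-speaker similarity analysis can be sketched with cosine similarity over toy embeddings (the paper's embeddings come from DNN architectures; the vectors and labels below are invented):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_stats(embeddings, labels):
    """Mean intra- vs inter-speaker cosine similarity over all pairs:
    a well-disentangled embedding should score high within a speaker
    and low across speakers."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if labels[i] == labels[j] else inter
            bucket.append(cosine(embeddings[i], embeddings[j]))
    return np.mean(intra), np.mean(inter)

emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["spk1", "spk1", "spk2", "spk2"]
intra, inter = similarity_stats(emb, labels)
```

Residual-information probes then ask whether a simple classifier can still predict, e.g., the recording condition from these same vectors.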

19 pages, 2027 KiB  
Article
Predicting Subgrade Resistance Value of Hydrated Lime-Activated Rice Husk Ash-Treated Expansive Soil: A Comparison between M5P, Support Vector Machine, and Gaussian Process Regression Algorithms
by Mahmood Ahmad, Badr T. Alsulami, Ramez A. Al-Mansob, Saerahany Legori Ibrahim, Suraparb Keawsawasvong, Ali Majdi and Feezan Ahmad
Mathematics 2022, 10(19), 3432; https://0-doi-org.brum.beds.ac.uk/10.3390/math10193432 - 21 Sep 2022
Cited by 3 | Viewed by 1306
Abstract
Resistance value (R-value) is one of the basic subgrade stiffness characterizations expressing a material's resistance to deformation. In this paper, artificial intelligence (AI)-based models, specifically the M5P, support vector machine (SVM), and Gaussian process regression (GPR) algorithms, are built for R-value evaluation meeting the high precision and rapidity requirements of highway engineering. The dataset of this study comprises seven parameters: hydrated lime-activated rice husk ash, liquid limit, plastic limit, plasticity index, optimum moisture content, and maximum dry density. The available data are divided into three parts: training set (70%), test set (15%), and validation set (15%). The output (i.e., R-value) of the developed models is evaluated using the coefficient of determination (R2), mean absolute error (MAE), relative squared error (RSE), root mean square error (RMSE), relative root mean square error (RRMSE), performance indicator (ρ), and a visual framework (Taylor diagram). GPR is concluded to be the best performing model (R2, MAE, RSE, RMSE, RRMSE, and ρ equal to 0.9996, 0.0258, 0.0032, 0.0012, 0.0012, and 0.0006, respectively, in the validation phase), very closely followed by SVM and M5P. The aforementioned approaches for predicting the R-value are also compared with a recently developed artificial neural network model from the literature. The analysis of performance measures on the R-value dataset demonstrates that all the AI-based models achieved comparatively better and more reliable results and thus should be encouraged in further research. Sensitivity analysis suggests that all the input parameters significantly influence the output, with maximum dry density having the greatest influence.
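The M5P, SVM, and GPR models themselves are not reproduced here, but the headline metrics used to rank them follow standard formulas, for example:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, MAE and RMSE as commonly defined (the paper additionally
    reports RSE, RRMSE, a performance indicator and a Taylor diagram,
    which are omitted in this sketch)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    return r2, mae, rmse

r2, mae, rmse = regression_metrics([10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
```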

10 pages, 365 KiB  
Article
Co-Occurrence-Based Double Thresholding Method for Research Topic Identification
by Christian-Daniel Curiac, Alex Doboli and Daniel-Ioan Curiac
Mathematics 2022, 10(17), 3115; https://0-doi-org.brum.beds.ac.uk/10.3390/math10173115 - 30 Aug 2022
Cited by 3 | Viewed by 1250
Abstract
Identifying possible research gaps is a main step in problem framing; however, it is increasingly tedious and expensive given the continuously growing amount of published material. This situation suggests a critical need for methodologies and tools that can assist researchers in selecting future research topics. Related work mostly focuses on trend analysis and impact prediction, but less on research gap identification. This paper presents a first approach to the automated identification of feasible research gaps, using a double-threshold procedure to eliminate research gaps that are currently difficult to study or offer little novelty. Gaps are then found by extracting subgraphs for the less frequent co-occurrences and correlations of key terms describing domains. A case study applying the methodology to the electronic design automation (EDA) domain is also discussed in the paper.
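A minimal sketch of the double-thresholding idea, assuming co-occurrence is counted once per document and both thresholds are user-chosen (both simplifications of the paper's procedure):

```python
from itertools import combinations
from collections import Counter

def candidate_gaps(docs, low, high):
    """Keep key-term pairs whose co-occurrence count lies between two
    thresholds: above `low` (not infeasible to study) but below `high`
    (still under-explored, hence novel). Each doc is a list of terms."""
    counts = Counter()
    for terms in docs:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if low <= c <= high}

docs = [
    ["ml", "eda"], ["ml", "eda"],          # well-studied pairing
    ["ml", "nlp"], ["ml", "verification"], # rare pairings -> candidates
]
gaps = candidate_gaps(docs, low=1, high=1)
```

The paper then extracts subgraphs over the surviving low-frequency pairs rather than treating each pair in isolation.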

24 pages, 9948 KiB  
Article
Online Support Vector Machine with a Single Pass for Streaming Data
by Lisha Hu, Chunyu Hu, Zheng Huo, Xinlong Jiang and Suzhen Wang
Mathematics 2022, 10(17), 3113; https://0-doi-org.brum.beds.ac.uk/10.3390/math10173113 - 30 Aug 2022
Cited by 1 | Viewed by 1631
Abstract
In this paper, we focus on training a support vector machine (SVM) online with a single pass over streaming data. Traditional batch-mode SVMs require the training data to be prepared in advance, so these models may be unsuitable for streaming data. Online SVMs are effective tools for solving this problem: they receive data streams continuously and update the model weights accordingly. However, most online SVMs require multiple passes over the data before the updated weights converge to stable solutions, and may therefore be unable to handle high-rate data streams. This paper presents OSVM_SP, a new online SVM with a single pass over streaming data, and three budgeted versions that bound the space requirement using support vector removal principles. The experimental results obtained on five public datasets show that OSVM_SP outperforms most state-of-the-art single-pass online algorithms in terms of accuracy and is comparable to batch-mode SVMs. Furthermore, the proposed budgeted algorithms achieve comparable predictive performance with only 1/3 of the space requirement.
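OSVM_SP's update and budget rules are more involved; as a generic single-pass baseline for contrast, a Pegasos-style hinge-loss SGD over a stream (not the paper's algorithm) can be sketched as:

```python
import numpy as np

def single_pass_svm(stream, dim, lam=0.01):
    """One pass of hinge-loss stochastic gradient descent for a linear
    SVM: each (x, y) with y in {-1, +1} is seen exactly once, and the
    weight vector is updated and then shrunk by the regularizer."""
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        eta = 1.0 / (lam * t)               # decaying step size
        if y * (w @ x) < 1:                 # margin violated: move toward x
            w = (1 - eta * lam) * w + eta * y * np.asarray(x, float)
        else:                               # margin satisfied: shrink only
            w = (1 - eta * lam) * w
    return w

stream = [(np.array([1.0, 0.0]), 1), (np.array([-1.0, 0.0]), -1)]
w = single_pass_svm(stream, dim=2)
```

Budgeted variants additionally cap the number of stored support vectors, discarding the least useful ones, which is what bounds the space requirement.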

19 pages, 2119 KiB  
Article
Automatic Speech Emotion Recognition of Younger School Age Children
by Yuri Matveev, Anton Matveev, Olga Frolova, Elena Lyakso and Nersisson Ruban
Mathematics 2022, 10(14), 2373; https://0-doi-org.brum.beds.ac.uk/10.3390/math10142373 - 06 Jul 2022
Cited by 9 | Viewed by 2750
Abstract
This paper introduces an extended description of a database of emotional speech in the Russian language by younger school age (8–12-year-old) children and describes the results of validating the database with classical machine learning algorithms such as the Support Vector Machine (SVM) and the Multi-Layer Perceptron (MLP). The validation follows standard procedures and scenarios similar to those used for other well-known databases of children's emotional acting speech. Performance evaluation of automatic multiclass recognition on the four emotion classes "Neutral (Calm)—Joy—Sadness—Anger" shows that both SVM and MLP outperform the results of the perceptual tests. Moreover, the results of automatic recognition on the test dataset used in the perceptual test are even better. These results prove that emotions in the database can be reliably recognized both by experts and automatically using classical machine learning algorithms such as SVM and MLP, which can serve as baselines for comparing emotion recognition systems based on more sophisticated modern machine learning methods and deep neural networks. The results also confirm that this database can be a valuable resource for researchers studying affective reactions in speech communication during child-computer interaction in the Russian language, and can be used to develop various applications in edutainment, healthcare, and other fields.

13 pages, 2182 KiB  
Article
LSTM-Based Broad Learning System for Remaining Useful Life Prediction
by Xiaojia Wang, Ting Huang, Keyu Zhu and Xibin Zhao
Mathematics 2022, 10(12), 2066; https://0-doi-org.brum.beds.ac.uk/10.3390/math10122066 - 15 Jun 2022
Cited by 12 | Viewed by 2937
Abstract
Prognostics and health management (PHM) is gradually being applied to production management as industrial production transforms into intelligent production, increasing the demands on the reliability of industrial equipment. Remaining useful life (RUL) prediction plays a pivotal role in this process: accurate predictions provide information about the condition of the equipment on which intelligent maintenance can be based, and many methods have been applied to this task. However, inadequate feature extraction and poor correlation between prediction results and data still limit prediction accuracy. To overcome these obstacles, we constructed a new fusion model, named the B-LSTM, that extracts data features based on a broad learning system (BLS) and embeds long short-term memory (LSTM) to process time-series information. First, the LSTM controls the transmission of information from the data through the gate mechanism, and the retained information generates the mapped features and forms the feature nodes. Then, the random feature nodes are supplemented by an activation function that generates enhancement nodes with greater expressive power, increasing the nonlinear capacity of the network; finally, the feature nodes and enhancement nodes are jointly connected to the output layer. The B-LSTM was evaluated on the C-MAPSS dataset, and comparisons with several mainstream methods showed that the new model achieves significant improvements.
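A skeleton of the broad learning side, with random feature nodes standing in for the paper's LSTM-generated features (an acknowledged simplification), could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def broad_learning_fit(X, Y, n_feat=20, n_enh=20, reg=1e-3):
    """Broad-learning skeleton: random feature nodes, tanh enhancement
    nodes, and ridge-regression output weights over their concatenation.
    The B-LSTM replaces the random feature mapping with LSTM outputs."""
    Wf = rng.standard_normal((X.shape[1], n_feat))
    Z = X @ Wf                        # feature nodes
    We = rng.standard_normal((n_feat, n_enh))
    H = np.tanh(Z @ We)               # enhancement nodes (nonlinear)
    A = np.hstack([Z, H])             # joint layer fed to the output
    W_out = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W_out

X = rng.standard_normal((50, 5))      # toy sensor windows
Y = X @ rng.standard_normal((5, 1))   # toy RUL targets, linear in X
Wf, We, W_out = broad_learning_fit(X, Y)
```

The closed-form ridge solve is the appeal of BLS: unlike end-to-end deep training, only the output weights are learned.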

16 pages, 537 KiB  
Article
Correlation Assessment of the Performance of Associative Classifiers on Credit Datasets Based on Data Complexity Measures
by Francisco J. Camacho-Urriolagoitia, Yenny Villuendas-Rey, Itzamá López-Yáñez, Oscar Camacho-Nieto and Cornelio Yáñez-Márquez
Mathematics 2022, 10(9), 1460; https://0-doi-org.brum.beds.ac.uk/10.3390/math10091460 - 26 Apr 2022
Cited by 3 | Viewed by 1372
Abstract
One of the four basic machine learning tasks is pattern classification. Selecting the proper learning algorithm for a given problem is a challenging task, formally known as the algorithm selection problem (ASP). In particular, we are interested in the behavior of associative classifiers derived from Alpha-Beta models applied to the financial field. In this paper, the behavior of four associative classifiers was studied: the One-Hot version of the Hybrid Associative Classifier with Translation (CHAT-OHM), the Extended Gamma (EG), the Naïve Associative Classifier (NAC), and the Assisted Classification for Imbalanced Datasets (ACID). Performance was established using the area under the curve (AUC), F-score, and geometric mean. The four classifiers were applied to 11 datasets from the financial area, and the performance of each was analyzed in relation to measures of data complexity covering six categories based on specific aspects of the datasets: feature, linearity, neighborhood, network, dimensionality, and class imbalance. The correlations between the complexity measures of the datasets and the performance measures of the associative classifiers were established and expressed with Spearman's Rho coefficient. The experimental results indicated clear correlations between the data complexity measures and the performance of the associative classifiers.
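Spearman's Rho itself is standard; a small self-contained implementation of the coefficient used in the analysis (no-ties case, which the closed formula assumes):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation via the closed formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
    difference between the ranks of x_i and y_i (valid without ties)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        return r
    rx = ranks(np.asarray(x, float))
    ry = ranks(np.asarray(y, float))
    n = len(rx)
    d = rx - ry
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

Applied here, x would be a complexity measure across the 11 datasets and y a classifier's AUC (or F-score, or geometric mean) on the same datasets.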