Article

Software Requirements Classification Using Machine Learning Algorithms

Department of Computer Science, University of Brasília (UnB), P.O. Box 4466, Brasília 70910-900, Brazil
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Received: 12 August 2020 / Revised: 2 September 2020 / Accepted: 3 September 2020 / Published: 21 September 2020
The correct classification of requirements has become an essential task in software engineering. This study compares text feature extraction techniques and machine learning algorithms on the requirements classification problem in order to answer two questions: “Which works best (Bag of Words (BoW) vs. Term Frequency–Inverse Document Frequency (TF-IDF) vs. Chi Squared (CHI2)) for classifying Software Requirements into Functional Requirements (FR) and Non-Functional Requirements (NF), and into the sub-classes of Non-Functional Requirements?” and “Which Machine Learning Algorithm provides the best performance for the requirements classification task?”. The data used for the research was PROMISE_exp, a recently created dataset that expands the well-known PROMISE repository of labeled software requirements. All documents in the dataset were cleaned with a set of normalization steps. The two feature extraction techniques used were BoW and TF-IDF, and the feature selection technique used was CHI2. The algorithms used for classification were Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB) and k-Nearest Neighbors (kNN). The novelty of our work lies in the data used for the experiment, the detailed steps given to reproduce the classification, and the comparison between BoW, TF-IDF and CHI2 on this repository, which has not been covered by other studies. This work will serve as a reference for the software engineering community and will help other researchers understand the requirements classification process. We found that TF-IDF followed by LR gave the best classification results, with an F-measure of 0.91 in binary classification (tying with SVM in that case), 0.74 in NF classification and 0.78 in general classification. As future work, we intend to compare more algorithms and explore new ways to improve the precision of our models.
Keywords: functional requirements; non-functional requirements; text normalization; feature extraction; machine learning; support vector machines
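To make the difference between the BoW and TF-IDF representations named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the whitespace tokenizer stands in for the paper's normalization steps, the unsmoothed formula idf = log(N / df) is an assumed textbook variant (libraries often apply smoothing), and the two example requirement sentences and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real normalization)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by idf = log(N / df), an assumed unsmoothed variant."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, weighted


# Hypothetical requirement sentences, not taken from PROMISE_exp.
requirements = [
    "the system shall export reports",
    "the system shall respond within two seconds",
]

vocab, bow = bag_of_words(requirements)
_, tfidf = tf_idf(requirements)
```

In this toy corpus, terms shared by both sentences ("the", "system", "shall") keep a count of 1 under BoW but receive a TF-IDF weight of zero, while terms unique to one sentence ("export", "respond") keep a positive weight. That down-weighting of ubiquitous terms is the property that distinguishes TF-IDF from raw counts.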

MDPI and ACS Style

Dias Canedo, E.; Cordeiro Mendes, B. Software Requirements Classification Using Machine Learning Algorithms. Entropy 2020, 22, 1057. https://0-doi-org.brum.beds.ac.uk/10.3390/e22091057
