Article

Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE

1 Department of Electric Engineering, Universidad Tecnológica de Pereira, Pereira 660002, Colombia
2 Department of Engineering, Corporación Instituto de Administración y Finanzas (CIAF), Pereira 660002, Colombia
3 Department of Electronics and Computer Science, Pontificia Universidad Javeriana Cali, Cali 760031, Colombia
* Author to whom correspondence should be addressed.
Received: 21 September 2020 / Revised: 5 October 2020 / Accepted: 7 October 2020 / Published: 4 December 2020
The recurrent use of databases with categorical variables in different applications demands new alternatives for identifying relevant patterns. Classification is an interesting approach for recognizing this type of data. However, few methods for this purpose exist in the literature, and those techniques focus only on kernels, with accuracy problems and high computational cost. For this reason, we propose an identification approach for categorical variables that uses conventional classifiers (LDC, QDC, KNN, SVM) and different mapping techniques to increase the separability of classes. Specifically, we map the initial features (categorical attributes) to another space, using the Chi-square (C-S) as a measure of dissimilarity. Then, we employ t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the data to two or three features, which significantly reduces the computational time of the learning methods. We evaluate the performance of the proposed approach in terms of accuracy for several experimental configurations and public categorical datasets downloaded from the UCI repository, and we compare it with relevant state-of-the-art methods. Results show that C-S mapping and t-SNE considerably reduce the computational time in recognition tasks while preserving accuracy. Also, when we apply only the C-S mapping to the datasets, the separability of classes is enhanced, and thus the performance of the learning algorithms clearly increases.
Keywords: Chi-square; classification; t-SNE; categorical data; dissimilarity
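
The Chi-square mapping described in the abstract can be illustrated with a minimal sketch. It assumes one common form of the chi-square distance (the one used in correspondence analysis, computed on one-hot indicator rows); the article's exact definition may differ. The function names `one_hot` and `chi_square_dissimilarity` and the toy records are hypothetical and chosen only for illustration.

```python
from math import sqrt


def one_hot(records):
    """Encode categorical records as 0/1 indicator rows.

    One column is created per (attribute position, value) pair.
    """
    columns = sorted({(j, v) for rec in records for j, v in enumerate(rec)})
    index = {col: k for k, col in enumerate(columns)}
    rows = []
    for rec in records:
        row = [0.0] * len(columns)
        for j, v in enumerate(rec):
            row[index[(j, v)]] = 1.0
        rows.append(row)
    return rows


def chi_square_dissimilarity(records):
    """Pairwise chi-square distances between categorical records.

    Assumed form: d(i, k)^2 = sum_j (p_ij - p_kj)^2 / c_j, where p_i is the
    row profile (row divided by its sum) and c_j is the mass of column j.
    """
    rows = one_hot(records)
    total = sum(sum(r) for r in rows)
    n_cols = len(rows[0])
    # Column masses are never zero: every column comes from an observed value.
    col_mass = [sum(r[j] for r in rows) / total for j in range(n_cols)]
    profiles = [[x / sum(r) for x in r] for r in rows]
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            d2 = sum(
                (profiles[i][j] - profiles[k][j]) ** 2 / col_mass[j]
                for j in range(n_cols)
            )
            dist[i][k] = dist[k][i] = sqrt(d2)
    return dist


# Toy data (hypothetical): two categorical attributes per record.
records = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
]
D = chi_square_dissimilarity(records)
```

Each row of `D` can then serve as the mapped feature vector of a record. In a full pipeline, these rows would be reduced to two or three dimensions with t-SNE (for example scikit-learn's `TSNE` with `n_components=2`) before training a conventional classifier; that step is omitted here to keep the sketch dependency-free.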
  • Externally hosted supplementary file 1
    Doi: https://0-doi-org.brum.beds.ac.uk/10.1007/978-3-030-15127-0_46
MDPI and ACS Style

Cardona, L.A.S.; Vargas-Cardona, H.D.; Navarro González, P.; Cardenas Peña, D.A.; Orozco Gutiérrez, Á.Á. Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE. Computation 2020, 8, 104. https://0-doi-org.brum.beds.ac.uk/10.3390/computation8040104

AMA Style

Cardona LAS, Vargas-Cardona HD, Navarro González P, Cardenas Peña DA, Orozco Gutiérrez ÁÁ. Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE. Computation. 2020; 8(4):104. https://0-doi-org.brum.beds.ac.uk/10.3390/computation8040104

Chicago/Turabian Style

Cardona, Luis A.S., Hernán D. Vargas-Cardona, Piedad Navarro González, David A. Cardenas Peña, and Álvaro Á. Orozco Gutiérrez 2020. "Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE" Computation 8, no. 4: 104. https://0-doi-org.brum.beds.ac.uk/10.3390/computation8040104
