Article

Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM

by Lars Hillebrand 1,2,*,†, David Biesner 1,2,*,†, Christian Bauckhage 1,2 and Rafet Sifa 1
1 Fraunhofer IAIS, 53757 Sankt Augustin, Germany
2 Department of Computer Science, University of Bonn, 53113 Bonn, Germany
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mach. Learn. Knowl. Extr. 2021, 3(1), 123-167; https://0-doi-org.brum.beds.ac.uk/10.3390/make3010007
Received: 30 November 2020 / Revised: 8 January 2021 / Accepted: 13 January 2021 / Published: 19 January 2021
(This article belongs to the Special Issue Selected Papers from CD-MAKE 2020 and ARES 2020)
Unsupervised topic extraction is a vital step in automatically distilling concise, content-bearing information from large text corpora. Existing topic extraction methods lack the capability of linking relations between these topics, which would further aid text understanding. Therefore, we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
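The constrained factorization described in the abstract, S ≈ A R Aᵀ with a non-negative, row-stochastic loadings matrix A, can be sketched with alternating gradient descent in NumPy. This is a minimal illustration under stated assumptions: the learning rate, iteration count, and the clip-then-renormalize projection used to enforce the constraints are hypothetical choices, not the authors' exact training procedure.

```python
import numpy as np

def dedicom(S, k, n_iter=500, lr=1e-2, seed=0):
    """Minimal sketch of constrained DEDICOM: S ≈ A @ R @ A.T,
    with A kept non-negative and row-stochastic by projection.
    Hyperparameters here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))
    A /= A.sum(axis=1, keepdims=True)       # row-stochastic initialization
    R = rng.random((k, k))                  # asymmetric relation matrix
    for _ in range(n_iter):
        E = A @ R @ A.T - S                 # reconstruction error
        grad_A = E @ A @ R.T + E.T @ A @ R  # gradient of 0.5 * ||E||_F^2 w.r.t. A
        grad_R = A.T @ E @ A                # gradient of 0.5 * ||E||_F^2 w.r.t. R
        A = np.clip(A - lr * grad_A, 1e-12, None)  # project onto non-negativity
        A /= A.sum(axis=1, keepdims=True)          # re-normalize rows to sum to 1
        R -= lr * grad_R                           # R is unconstrained
    return A, R
```

In the paper's setting, S would be a (shifted) pointwise mutual information matrix of the corpus; the rows of A are then interpretable as topic-membership distributions over the vocabulary, and R encodes relations between the latent topics.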
Keywords: matrix factorization; tensor factorization; word embeddings; topic modeling; NLP
MDPI and ACS Style

Hillebrand, L.; Biesner, D.; Bauckhage, C.; Sifa, R. Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM. Mach. Learn. Knowl. Extr. 2021, 3, 123-167. https://0-doi-org.brum.beds.ac.uk/10.3390/make3010007

