Special Issue "Information Theory and Machine Learning"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: 28 February 2022.

Special Issue Editors

Prof. Dr. Lizhong Zheng
Guest Editor
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Interests: wireless communications; space-time codes; network information theory; wireless networks
Prof. Dr. Chao Tian
Guest Editor
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
Interests: a computational approach to information theoretic converses; coding for distributed data storage; joint source-channel coding; an approximate approach to network information theory; lossy multiuser source coding problems

Special Issue Information

Dear Colleagues,

A number of significant steps in the development of machine learning have benefited from information-theoretic analysis and from the insights into information processing that it brings. While we expect information theory to play an even more significant role in the next wave of growth in machine learning and artificial intelligence, we also recognize the new challenges that this entails. There is indeed a set of lofty goals: to take a holistic view of data processing; to work with high-dimensional data and inaccurate statistical models; to incorporate domain knowledge; to provide performance guarantees, robustness, security, and fairness; to reduce the use of computational resources; to generate reusable and interpretable learning results; and so on. Correspondingly, theoretical studies will need new formulations, new mathematical tools, new analysis techniques, and perhaps even new metrics for evaluating the guidance and insights they offer.

The goal of this Special Issue is to collect new results on using information-theoretic thinking to solve machine learning problems. We are also interested in papers presenting new methods and new concepts, even if these ideas are not yet fully developed or do not yet come with the most compelling set of supporting experimental results.

Some of the topics of interest are listed below:

  • Understanding gradient descent and general iterative algorithms;
  • Sample complexity and generalization errors;
  • Utilizing knowledge of data structure in learning;
  • Distributed learning, communication-aware learning algorithms;
  • Transfer learning;
  • Multimodal learning and information fusion;
  • Information theoretic approaches in active and reinforcement learning;
  • Representation learning and its information theoretic interpretation;
  • Method and theory for model compression.

Prof. Dr. Lizhong Zheng
Prof. Dr. Chao Tian
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)

Research

Article
An Information Theoretic Interpretation to Deep Neural Networks
Entropy 2022, 24(1), 135; https://0-doi-org.brum.beds.ac.uk/10.3390/e24010135 - 17 Jan 2022
Abstract
With the unprecedented performance achieved by deep learning, it is commonly believed that deep neural networks (DNNs) attempt to extract informative features for learning tasks. To formalize this intuition, we apply a local information-geometric analysis and establish an information-theoretic framework for feature selection, which demonstrates the information-theoretic optimality of DNN features. Moreover, we conduct a quantitative analysis to characterize the impact of network structure on the feature extraction process of DNNs. Our investigation naturally leads to a performance metric for evaluating the effectiveness of extracted features, called the H-score, which illustrates the connection between the practical training process of DNNs and the information-theoretic framework. Finally, we validate our theoretical results through experiments on synthetic data and the ImageNet dataset.
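
Since the H-score is the paper's central quantity, a small numerical sketch may help. The snippet below computes an empirical H-score in the form commonly used in the related literature, tr(cov(f(X))^{-1} cov(E[f(X)|Y])); the paper's exact definition and normalization may differ, and the function name and interface here are purely illustrative.

```python
import numpy as np

def h_score(features, labels):
    """Empirical H-score of a feature matrix w.r.t. discrete labels (illustrative sketch).

    features: (n_samples, d) array of feature values f(x_i)
    labels:   (n_samples,) array of discrete labels y_i
    Computes tr( cov(f)^{-1} cov(E[f|Y]) ); the paper's exact normalization may differ.
    """
    features = features - features.mean(axis=0)             # center the features
    cov_f = np.cov(features, rowvar=False)                   # d x d feature covariance
    classes = np.unique(labels)
    priors = np.array([np.mean(labels == y) for y in classes])
    class_means = np.stack([features[labels == y].mean(axis=0) for y in classes])
    # cov(E[f|Y]) = sum_y P(y) * mu_y mu_y^T  (features are already centered)
    cov_cond = (class_means * priors[:, None]).T @ class_means
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_cond))

# Toy usage: label-informative features should score higher than pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
informative = y[:, None] + 0.5 * rng.standard_normal((1000, 3))
noise = rng.standard_normal((1000, 3))
print(h_score(informative, y), ">", h_score(noise, y))
```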

Article
Probabilistic Deterministic Finite Automata and Recurrent Networks, Revisited
Entropy 2022, 24(1), 90; https://0-doi-org.brum.beds.ac.uk/10.3390/e24010090 - 06 Jan 2022
Abstract
Reservoir computers (RCs) and recurrent neural networks (RNNs) can mimic any finite-state automaton in theory, and some workers demonstrated that this can hold in practice. We test the capability of generalized linear models, RCs, and Long Short-Term Memory (LSTM) RNN architectures to predict the stochastic processes generated by a large suite of probabilistic deterministic finite-state automata (PDFA) in the small-data limit according to two metrics: predictive accuracy and distance to a predictive rate-distortion curve. The latter provides a sense of whether or not the RNN is a lossy predictive feature extractor in the information-theoretic sense. PDFAs provide an excellent performance benchmark in that they can be systematically enumerated, the randomness and correlation structure of their generated processes are exactly known, and their optimal memory-limited predictors are easily computed. With less data than is needed to make a good prediction, LSTMs surprisingly lose at predictive accuracy, but win at lossy predictive feature extraction. These results highlight the utility of causal states in understanding the capabilities of RNNs to predict.
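
To make the setting concrete, here is a minimal example of the kind of object being benchmarked: a two-state PDFA (the "even process", in which 1s occur in blocks of even length) together with a sampler for the stochastic process it generates. This particular machine and the code interface are illustrative only and are not taken from the paper's enumerated suite.

```python
import random

# A minimal probabilistic deterministic finite automaton (PDFA): the "even process".
# From state A, emit 0 with prob. 0.5 (stay in A) or 1 with prob. 0.5 (go to B);
# from state B, always emit 1 and return to A, so 1s occur in even-length blocks.
TRANSITIONS = {
    "A": [(0.5, "0", "A"), (0.5, "1", "B")],
    "B": [(1.0, "1", "A")],
}

def sample_pdfa(length, state="A", seed=0):
    """Generate a symbol sequence of the given length from the PDFA above."""
    rng = random.Random(seed)
    out = []
    for _ in range(length):
        r, acc = rng.random(), 0.0
        for prob, symbol, nxt in TRANSITIONS[state]:
            acc += prob
            if r <= acc:
                out.append(symbol)
                state = nxt
                break
    return "".join(out)

print(sample_pdfa(40))
```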

Article
Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
Entropy 2021, 23(11), 1419; https://0-doi-org.brum.beds.ac.uk/10.3390/e23111419 - 28 Oct 2021
Cited by 1
Abstract
Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the value of the parameters is unimportant, and only the out-of-sample performance is considered. On the other hand, much of the literature on model estimation assumes that the parameters themselves have intrinsic value, and is thus concerned with the bias and variance of parameter estimates, which may not have any simple relationship to out-of-sample model performance. Therefore, within supervised machine learning, heavy use is made of ridge regression (i.e., L2 regularization), which requires the estimation of hyperparameters and can be rendered ineffective by certain model parameterizations. We introduce an objective function, which we refer to as Information-Corrected Estimation (ICE), that reduces KL-divergence-based generalization error for supervised machine learning. ICE attempts to directly maximize a corrected likelihood function as an estimator of the KL divergence. Such an approach is proven, theoretically, to be effective for a wide class of models, with only mild regularity restrictions. Under finite sample sizes, this corrected estimation procedure is shown experimentally to lead to significant reductions in generalization error compared to maximum likelihood estimation and L2 regularization.
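
As a rough illustration of what a corrected-likelihood objective of this type looks like, consider a generic Takeuchi-style trace penalty; this is only the general flavor of correction described in the abstract, not necessarily the paper's exact ICE objective. Here \hat{I} and \hat{J} denote the empirical Hessian and the empirical score covariance of the negative log-likelihood:

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\;
\left[ -\frac{1}{n}\sum_{i=1}^{n} \log p\!\left(y_i \mid x_i, \theta\right)
\;+\; \frac{1}{n}\,\operatorname{tr}\!\left( \hat{I}(\theta)^{-1} \hat{J}(\theta) \right) \right]
```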

Article
Population Risk Improvement with Model Compression: An Information-Theoretic Approach
Entropy 2021, 23(10), 1255; https://0-doi-org.brum.beds.ac.uk/10.3390/e23101255 - 27 Sep 2021
Cited by 1
Abstract
It has been reported in many recent works on deep model compression that the population risk of a compressed model can be even better than that of the original model. In this paper, an information-theoretic explanation for this population risk improvement phenomenon is [...] Read more.
It has been reported in many recent works on deep model compression that the population risk of a compressed model can be even better than that of the original model. In this paper, an information-theoretic explanation for this population risk improvement phenomenon is provided by jointly studying the decrease in the generalization error and the increase in the empirical risk that results from model compression. It is first shown that model compression reduces an information-theoretic bound on the generalization error, which suggests that model compression can be interpreted as a regularization technique to avoid overfitting. The increase in empirical risk caused by model compression is then characterized using rate distortion theory. These results imply that the overall population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. A linear regression example is presented to demonstrate that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest a way to improve a widely used model compression algorithm, i.e., Hessian-weighted K-means clustering, by regularizing the distance between the clustering centers. Experiments with neural networks are provided to validate our theoretical assertions. Full article
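
The bookkeeping behind this statement can be written out directly. With w the original weights, w_c the compressed weights, and gen(w) = L_pop(w) - L_emp(w) the generalization error (notation introduced here for illustration, not taken from the paper):

```latex
L_{\mathrm{pop}}(w_c) - L_{\mathrm{pop}}(w)
  \;=\; \underbrace{\big[L_{\mathrm{emp}}(w_c) - L_{\mathrm{emp}}(w)\big]}_{\text{increase in empirical risk}}
  \;-\; \underbrace{\big[\mathrm{gen}(w) - \mathrm{gen}(w_c)\big]}_{\text{decrease in generalization error}}
```

so the population risk improves, i.e., L_pop(w_c) < L_pop(w), exactly when the second bracket exceeds the first.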

Article
On Supervised Classification of Feature Vectors with Independent and Non-Identically Distributed Elements
Entropy 2021, 23(8), 1045; https://0-doi-org.brum.beds.ac.uk/10.3390/e23081045 - 13 Aug 2021
Abstract
In this paper, we investigate the problem of classifying feature vectors with mutually independent but non-identically distributed elements that take values from a finite alphabet set. First, we show the importance of this problem. Next, we propose a classifier and derive an analytical upper bound on its error probability. We show that the error probability goes to zero as the length of the feature vectors grows, even when only one training feature vector per label is available. Thereby, we show that at least one asymptotically optimal classifier exists for this important problem. Finally, we provide numerical examples showing that the proposed classifier outperforms conventional classification algorithms when the number of training data points is small and the length of the feature vectors is sufficiently large.
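
For readers who want to see the setting in code, below is a naive per-element likelihood classifier over a finite alphabet, with Laplace smoothing so that it can be trained from a single feature vector per label. It only illustrates the independent, non-identically distributed setting; it is not the classifier proposed and analyzed in the paper, and all names and parameters are hypothetical.

```python
import numpy as np

def train_per_element(train_vectors, alphabet_size, smoothing=1.0):
    """Estimate, for each label, a separate categorical distribution per element position.

    train_vectors: dict mapping label -> (m, n) array of symbols in {0, ..., alphabet_size-1};
    m may be as small as 1 (one training vector per label).
    Returns dict label -> (n, alphabet_size) matrix of smoothed per-position probabilities.
    """
    models = {}
    for label, vecs in train_vectors.items():
        vecs = np.atleast_2d(vecs)
        counts = np.stack([np.bincount(vecs[:, j], minlength=alphabet_size)
                           for j in range(vecs.shape[1])])
        models[label] = (counts + smoothing) / (vecs.shape[0] + alphabet_size * smoothing)
    return models

def classify(x, models):
    """Pick the label whose per-element model gives the test vector x the highest log-likelihood."""
    scores = {label: np.sum(np.log(probs[np.arange(len(x)), x]))
              for label, probs in models.items()}
    return max(scores, key=scores.get)

# Toy usage with alphabet {0, 1}, vector length n = 200, one training vector per label.
rng = np.random.default_rng(1)
p0, p1 = rng.uniform(0.1, 0.9, 200), rng.uniform(0.1, 0.9, 200)  # non-identical element distributions
models = train_per_element({0: rng.binomial(1, p0), 1: rng.binomial(1, p1)}, alphabet_size=2)
print(classify(rng.binomial(1, p0), models))  # most likely prints 0
```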

Planned Papers

The list below contains only planned manuscripts; some of them have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

1. Prof. Alfred O. Hero, University of Michigan
