
The Role of Signal Processing and Information Theory in Modern Machine Learning

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Signal and Data Analysis".

Deadline for manuscript submissions: closed (30 November 2020) | Viewed by 27495

Special Issue Editors


Prof. Nariman Farsad
Guest Editor
1. Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
2. Department of Computer Science, Ryerson University, Toronto, ON M5B 2K3, Canada
Interests: machine learning; signal processing; communication systems; data science

Prof. Marco Mondelli
Guest Editor
Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
Interests: information theory; machine learning; data science; wireless communication systems; coding theory

Dr. Morteza Mardani
Guest Editor
Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
Interests: machine learning; statistical signal processing; artificial intelligence; medical imaging

Special Issue Information

Dear Colleagues,

Breakthroughs in modern machine learning are rapidly changing science, industry, and society, yet fundamental understanding in this area has lagged behind. For example, one of the central tenets of the field, the bias–variance trade-off, appears to be at odds with the observed behavior of methods used in practice, and the black-box nature of deep neural network architectures defies explanation. As these technologies are integrated ever more deeply into devices and services used by millions of people worldwide, there is an urgent need to provide theoretical guarantees for machine-learning techniques and to explain, based on empirical observation, why and how these techniques work.

Recently, powerful tools from signal processing, information theory, and statistical mechanics have provided insight into the inner workings of modern machine learning. This Special Issue aims to be a forum for the presentation of new and improved techniques at the intersection of signal processing, information theory, statistical mechanics, and machine learning. In particular, the theory of deep learning, novel uses of signal processing and information theory in machine learning, explainable deep learning, and active and adversarial learning all fall within the scope of this Special Issue.

Prof. Nariman Farsad
Prof. Marco Mondelli
Dr. Morteza Mardani
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Theory of Deep Learning
  • Information Theory in Machine Learning
  • Signal Processing in Machine Learning
  • Active Learning
  • Explainable Deep Learning
  • Adversarial Learning
  • Distributed Machine Learning
  • Statistics
  • Optimization

Published Papers (8 papers)


Research

40 pages, 1007 KiB  
Article
Phase Transitions in Transfer Learning for High-Dimensional Perceptrons
by Oussama Dhifallah and Yue M. Lu
Entropy 2021, 23(4), 400; https://doi.org/10.3390/e23040400 - 27 Mar 2021
Cited by 5 | Viewed by 3422
Abstract
Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the generalization performance of the target task. This happens when the two tasks are sufficiently dissimilar. In this paper, we present a theoretical analysis of transfer learning by studying a pair of related perceptron learning tasks. Despite the simplicity of our model, it reproduces several key phenomena observed in practice. Specifically, our asymptotic analysis reveals a phase transition from negative transfer to positive transfer as the similarity of the two tasks moves past a well-defined threshold. Full article
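As a flavor of the negative-to-positive transfer transition described above, here is a minimal numerical sketch, not the paper's asymptotic analysis: two perceptron teachers with tunable correlation rho, a ridge-style target fit pulled toward the source estimate, and the resulting target generalization error. All function names, the ridge-style transfer mechanism, and the parameter values are illustrative assumptions.

```python
# Minimal sketch: transfer between two perceptron teachers with correlation rho.
# "Transfer" is modeled here as a ridge fit anchored to the source estimate.
import numpy as np

def generalization_error(w_hat, w_teacher):
    # Disagreement probability of two linear classifiers on Gaussian inputs:
    # the angle between the weight vectors divided by pi.
    cos = w_hat @ w_teacher / (np.linalg.norm(w_hat) * np.linalg.norm(w_teacher))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def ridge_fit(X, y, w_anchor, lam):
    # Minimizes ||Xw - y||^2 + lam * ||w - w_anchor||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_anchor)

rng = np.random.default_rng(0)
d, n_src, n_tgt = 200, 2000, 100
for rho in [0.0, 0.5, 0.9, 1.0]:
    w_src = rng.standard_normal(d)
    w_tgt = rho * w_src + np.sqrt(1 - rho**2) * rng.standard_normal(d)
    Xs = rng.standard_normal((n_src, d)); ys = np.sign(Xs @ w_src)
    Xt = rng.standard_normal((n_tgt, d)); yt = np.sign(Xt @ w_tgt)
    w_source_only = ridge_fit(Xs, ys, np.zeros(d), 1.0)      # estimate from the source task
    w_scratch = ridge_fit(Xt, yt, np.zeros(d), 1.0)          # target-only baseline
    w_transfer = ridge_fit(Xt, yt, w_source_only, 10.0)      # target fit anchored to the source
    print(f"rho={rho:.1f}  scratch={generalization_error(w_scratch, w_tgt):.3f}  "
          f"transfer={generalization_error(w_transfer, w_tgt):.3f}")
```

For small rho the anchored fit tends to do worse than training from scratch (negative transfer), while for rho near one it helps, loosely mirroring the threshold behavior the paper characterizes exactly.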

46 pages, 1342 KiB  
Article
Sharp Guarantees and Optimal Performance for Inference in Binary and Gaussian-Mixture Models
by Hossein Taheri, Ramtin Pedarsani and Christos Thrampoulidis
Entropy 2021, 23(2), 178; https://doi.org/10.3390/e23020178 - 30 Jan 2021
Cited by 2 | Viewed by 2053
Abstract
We study convex empirical risk minimization for high-dimensional inference in binary linear classification under both discriminative binary linear models and generative Gaussian-mixture models. Our first result sharply predicts the statistical performance of such estimators in the proportional asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove bounds on the best achievable performance. Notably, we show that the proposed bounds are tight for popular binary models (such as signed and logistic) and for the Gaussian-mixture model by constructing appropriate loss functions that achieve them. Our numerical simulations suggest that the theory is accurate even for relatively small problem dimensions and that it enjoys a certain universality property. Full article
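A toy experiment in the spirit of the abstract, assuming isotropic Gaussian features and a logistic label model: it compares two convex ERM losses (square and lightly regularized logistic) by the correlation of the estimate with the true direction. It illustrates the setting only; the paper's sharp asymptotic predictions and optimal loss constructions are not reproduced here, and all parameter values are arbitrary.

```python
# Toy comparison of two convex ERM losses under isotropic Gaussian features
# and a logistic label model; performance is correlation with the true direction.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
d, n = 100, 500                                   # proportional regime: n/d kept fixed
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.where(rng.random(n) < 1 / (1 + np.exp(-5 * X @ w_star)), 1.0, -1.0)

def correlation(w):
    return abs(w @ w_star) / np.linalg.norm(w)

# Square loss has a closed-form minimizer (least squares on the +/-1 labels).
w_square, *_ = np.linalg.lstsq(X, y, rcond=None)

# Logistic loss (with a tiny ridge so the minimizer stays bounded), generic solver.
logistic = lambda w: np.mean(np.logaddexp(0.0, -y * (X @ w))) + 1e-3 * w @ w
w_logistic = minimize(logistic, np.zeros(d), method="L-BFGS-B").x

print(f"correlation with w_star: square={correlation(w_square):.3f}, "
      f"logistic={correlation(w_logistic):.3f}")
```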

24 pages, 456 KiB  
Article
Common Information Components Analysis
by Erixhen Sula and Michael C. Gastpar
Entropy 2021, 23(2), 151; https://doi.org/10.3390/e23020151 - 26 Jan 2021
Cited by 3 | Viewed by 1938
Abstract
Wyner’s common information is a measure that quantifies the commonality between two random variables. Based on this, we introduce a novel two-step procedure to construct features from data, referred to as Common Information Components Analysis (CICA). The first step can be interpreted as an extraction of Wyner’s common information. The second step is a form of back-projection of the common information onto the original variables, leading to the extracted features. A free parameter γ controls the complexity of the extracted features. We establish that, in the case of Gaussian statistics, CICA precisely reduces to Canonical Correlation Analysis (CCA), where the parameter γ determines the number of CCA components that are extracted. In this sense, we establish a novel rigorous connection between information measures and CCA, and CICA is a strict generalization of the latter. It is shown that CICA has several desirable features, including a natural extension beyond just two data sets. Full article
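Since the abstract states that CICA reduces to CCA in the Gaussian case (with γ setting the number of components), a compact CCA sketch conveys what that special case computes: whitening each block followed by an SVD of the cross-covariance. The Wyner common-information step of CICA itself is not implemented here, and the data-generation choices below are arbitrary.

```python
# Compact CCA via whitening + SVD of the cross-covariance; per the abstract,
# CICA reduces to this in the Gaussian case (gamma would set k).
import numpy as np

def cca(X, Y, k):
    """Top-k canonical directions and correlations for data matrices X, Y."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    Cxx, Cyy, Cxy = X.T @ X / len(X), Y.T @ Y / len(Y), X.T @ Y / len(X)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T   # whitening for the X block
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T   # whitening for the Y block
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))                  # shared latent component
X = z @ rng.standard_normal((2, 5)) + 0.5 * rng.standard_normal((1000, 5))
Y = z @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((1000, 4))
A, B, corr = cca(X, Y, k=2)
print("canonical correlations:", np.round(corr, 3))
```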

28 pages, 775 KiB  
Article
Information-Theoretic Generalization Bounds for Meta-Learning and Applications
by Sharu Theresa Jose and Osvaldo Simeone
Entropy 2021, 23(1), 126; https://doi.org/10.3390/e23010126 - 19 Jan 2021
Cited by 20 | Viewed by 3427
Abstract
Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like model-agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and the corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning. Full article
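For orientation, the conventional-learning bound that this line of work extends (the mutual-information bound of Xu and Raginsky) has the form below for a σ-sub-Gaussian loss; the paper's meta-generalization bounds are analogous in flavor, with the MI taken between the meta-learner's output and the meta-training data, plus per-task MI terms in the joint train/test case. The display reproduces the single-task result only, not this paper's bounds.

```latex
% Single-task MI generalization bound (Xu & Raginsky, 2017): for a
% \sigma-sub-Gaussian loss, an algorithm with output hypothesis W trained
% on the n-sample data set S satisfies
\bigl|\mathbb{E}\bigl[\mathrm{gen}(W,S)\bigr]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W;S)}{n}},
\qquad
\mathrm{gen}(W,S) \;=\; L_{\mathcal{D}}(W) - L_{S}(W).
```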

34 pages, 612 KiB  
Article
No Statistical-Computational Gap in Spiked Matrix Models with Generative Network Priors
by Jorio Cocola, Paul Hand and Vladislav Voroninski
Entropy 2021, 23(1), 115; https://doi.org/10.3390/e23010115 - 16 Jan 2021
Cited by 4 | Viewed by 2574
Abstract
We provide a non-asymptotic analysis of the spiked Wishart and Wigner matrix models with a generative neural network prior. Spiked random matrices have the form of a rank-one signal plus noise and have been used as models for high dimensional Principal Component Analysis (PCA), community detection and synchronization over groups. Depending on the prior imposed on the spike, these models can display a statistical-computational gap between the information theoretically optimal reconstruction error that can be achieved with unbounded computational resources and the sub-optimal performances of currently known polynomial time algorithms. These gaps are believed to be fundamental, as in the emblematic case of Sparse PCA. In stark contrast to such cases, we show that there is no statistical-computational gap under a generative network prior, in which the spike lies on the range of a generative neural network. Specifically, we analyze a gradient descent method for minimizing a nonlinear least squares objective over the range of an expansive-Gaussian neural network and show that it can recover in polynomial time an estimate of the underlying spike with a rate-optimal sample complexity and dependence on the noise level. Full article
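A small PyTorch sketch of the algorithmic idea, not the paper's analysis or guarantee: gradient descent over the latent code of a fixed random ReLU generator to fit a rank-one spike observed through Wigner noise, reporting the overlap with the planted signal. The generator architecture, objective normalization, optimizer, and step counts are all illustrative assumptions.

```python
# PyTorch sketch: gradient descent over the latent code of a fixed random ReLU
# generator to fit a rank-one spike observed through Wigner noise.
import torch

torch.manual_seed(0)
k, n, snr = 10, 200, 5.0
W1, W2 = torch.randn(64, k), torch.randn(n, 64)      # fixed (untrained) generator weights

def G(z):                                             # expansive ReLU generator
    return W2 @ torch.relu(W1 @ z)

z_star = torch.randn(k)
x_star = G(z_star); x_star = x_star / x_star.norm()
noise = torch.randn(n, n); noise = (noise + noise.T) / (2 * n ** 0.5)
Y = snr * torch.outer(x_star, x_star) + noise         # spiked Wigner observation

z = torch.randn(k, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(2000):
    x = G(z)
    loss = torch.sum((Y - snr * torch.outer(x, x) / (x @ x)) ** 2)  # nonlinear least squares in z
    opt.zero_grad(); loss.backward(); opt.step()

x_hat = G(z).detach(); x_hat = x_hat / x_hat.norm()
print("overlap |<x_hat, x_star>|:", float(torch.abs(x_hat @ x_star)))
```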

18 pages, 2691 KiB  
Article
Deep Task-Based Quantization
by Nir Shlezinger and Yonina C. Eldar
Entropy 2021, 23(1), 104; https://doi.org/10.3390/e23010104 - 13 Jan 2021
Cited by 28 | Viewed by 4757
Abstract
Quantizers play a critical role in digital signal processing systems. Recent works have shown that the performance of acquiring multiple analog signals using scalar analog-to-digital converters (ADCs) can be significantly improved by processing the signals prior to quantization. However, the design of such hybrid quantizers is quite complex, and their implementation requires complete knowledge of the statistical model of the analog signal. In this work, we design data-driven task-oriented quantization systems with scalar ADCs, which determine their analog-to-digital mapping using deep learning tools. These mappings are designed to facilitate the task of recovering underlying information from the quantized signals. By using deep learning, we circumvent the need to explicitly recover the system model and to find the proper quantization rule for it. Our main target application is multiple-input multiple-output (MIMO) communication receivers, which simultaneously acquire a set of analog signals, and are commonly subject to constraints on the number of bits. Our results indicate that, in a MIMO channel estimation setup, the proposed deep task-based quantizer is capable of approaching the optimal performance limits dictated by indirect rate-distortion theory, achievable using vector quantizers and requiring complete knowledge of the underlying statistical model. Furthermore, for a symbol detection scenario, it is demonstrated that the proposed approach can realize reliable bit-efficient hybrid MIMO receivers capable of setting their quantization rule in light of the task. Full article
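A toy end-to-end sketch of the task-based quantization idea, assuming a linear observation model and one-bit scalar quantizers trained with a straight-through gradient estimator; the paper's architecture, quantizer model, and MIMO channel-estimation setting are not reproduced, and all module and parameter names are hypothetical.

```python
# Toy task-based quantizer: a learned analog combiner, one-bit scalar quantizers
# with a straight-through gradient, and a learned digital decoder, trained
# jointly for the task of estimating x from y = Hx + noise.
import torch
from torch import nn

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        return torch.sign(v)                 # hard one-bit quantization
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                      # straight-through gradient estimate

class TaskQuantizer(nn.Module):
    def __init__(self, n_meas, n_bits, n_target):
        super().__init__()
        self.analog = nn.Linear(n_meas, n_bits, bias=False)          # pre-quantization combining
        self.digital = nn.Sequential(nn.Linear(n_bits, 64), nn.ReLU(),
                                     nn.Linear(64, n_target))         # task-oriented decoder
    def forward(self, y):
        return self.digital(SignSTE.apply(self.analog(y)))

torch.manual_seed(0)
n_x, n_y, n_bits = 4, 16, 12
H = torch.randn(n_y, n_x)
model = TaskQuantizer(n_y, n_bits, n_x)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(3000):
    x = torch.randn(256, n_x)
    y = x @ H.T + 0.1 * torch.randn(256, n_y)
    loss = nn.functional.mse_loss(model(y), x)       # the "task" is recovering x
    opt.zero_grad(); loss.backward(); opt.step()
print("final task MSE:", float(loss))
```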

13 pages, 792 KiB  
Article
Deep Ensemble of Weighted Viterbi Decoders for Tail-Biting Convolutional Codes
by Tomer Raviv, Asaf Schwartz and Yair Be’ery
Entropy 2021, 23(1), 93; https://doi.org/10.3390/e23010093 - 10 Jan 2021
Cited by 6 | Viewed by 2581
Abstract
Tail-biting convolutional codes extend the classical zero-termination convolutional codes: both encoding schemes force the equality of start and end states, but under tail-biting every state is a valid termination. This paper proposes a machine learning approach to improve the state-of-the-art decoding of tail-biting codes, focusing on the widely employed short-length regime as in the LTE standard. This standard also includes a CRC code. First, we parameterize the circular Viterbi algorithm (CVA), a baseline decoder that exploits the circular nature of the underlying trellis. An ensemble combines multiple such weighted decoders, and each decoder specializes in decoding words from a specific region of the channel words’ distribution. A region corresponds to a subset of termination states; the ensemble covers the entire state space. A non-learnable gating satisfies two goals: it filters easily decoded words and mitigates the overhead of executing multiple weighted decoders. The CRC criterion is employed to choose only a subset of experts for decoding purposes. Our method achieves a frame error rate (FER) improvement of up to 0.75 dB over the CVA in the waterfall region for multiple code lengths, adding negligible computational complexity compared to the circular Viterbi algorithm at high signal-to-noise ratios (SNRs). Full article
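The gating-plus-ensemble control flow described in the abstract can be summarized schematically as below; the circular Viterbi decoder, the learned experts, and the CRC check are placeholders (the weighted CVA itself is not implemented here).

```python
# Schematic of the gated ensemble flow; decoders and CRC check are placeholders.
def decode_with_ensemble(llrs, cva_baseline, experts, crc_ok):
    word = cva_baseline(llrs)            # cheap pass: plain circular Viterbi decode
    if crc_ok(word):                     # gating: easily decoded words stop here
        return word
    for expert in experts:               # hard words: specialized weighted decoders
        candidate = expert(llrs)
        if crc_ok(candidate):            # CRC picks which expert output to trust
            return candidate
    return word                          # fall back to the baseline decision
```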

14 pages, 1523 KiB  
Article
Examining the Causal Structures of Deep Neural Networks Using Information Theory
by Scythia Marrow, Eric J. Michaud and Erik Hoel
Entropy 2020, 22(12), 1429; https://doi.org/10.3390/e22121429 - 18 Dec 2020
Cited by 3 | Viewed by 4620
Abstract
Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring “what does what” within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN’s causal structure as it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the “causal plane”, which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable. Full article
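A rough sketch of the effective-information idea described above: inject a maximum-entropy (uniform) perturbation at a layer's input and estimate input–output mutual information by histogram binning, here summed over input–output unit pairs as a simple edge-level decomposition. The bin count, perturbation range, activation, and aggregation are illustrative choices; the paper's estimator may differ.

```python
# Sketch: effective-information-style estimate for one feedforward layer, using a
# uniform (max-entropy) input perturbation and histogram MI estimates per edge.
import numpy as np

def mutual_information(a, b, bins=16):
    # Histogram-based MI estimate (in bits) between two scalar samples.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

def layer_effective_information(W, activation=np.tanh, n_samples=100_000):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, W.shape[1]))   # max-entropy perturbation
    y = activation(x @ W.T)
    return sum(mutual_information(x[:, i], y[:, j])            # edge-level MI terms
               for i in range(W.shape[1]) for j in range(W.shape[0]))

W = np.random.default_rng(1).normal(size=(4, 4))
print("EI estimate (bits):", round(layer_effective_information(W), 2))
```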
