
Theory and Application of the Information Bottleneck Method

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 17851

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors

Dr. Jan Lewandowsky
Fraunhofer Institute for Communication, Information Processing and Ergonomics FKIE, 53343 Wachtberg, Germany
Interests: information bottleneck method; signal processing; channel coding; modulation; machine learning

Prof. Dr. Gerhard Bauch
Hamburg University of Technology, 21073 Hamburg, Germany
Interests: information bottleneck method; signal processing; channel coding; modulation; machine learning

Special Issue Information

Dear Colleagues,

Even though more than two decades have passed since Tishby et al. introduced the information-theoretic setup termed the information bottleneck method, it remains an extremely interesting and active research topic. Conceptually, the method aims to compress a random variable while preserving the mutual information it carries about a second random variable of interest; this second variable defines what is relevant. A particularly fascinating aspect of the concept is its generality, which has made the information bottleneck method a versatile framework with many applications over the last few years. Researchers from different disciplines quickly recognized the usefulness and power of this rather theoretical concept and pushed it into practice. Applications of the information bottleneck method now cover many different fields, for example, machine learning, deep learning, neuroscience, multimedia and image processing, data processing, source coding, channel coding and information processing. In addition, the theoretical foundations of the method, its generalizations and algorithmic approaches have become fruitful research topics in their own right. Today, we have a rich and powerful framework with algorithms and theory available. Now is our chance to discover and investigate novel applications that can profit from this fascinating method and to reveal additional insights into its theory.
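
For readers less familiar with the framework, the trade-off can be stated compactly in the standard Lagrangian form of Tishby et al. (notation ours; beta >= 0 balances compression against relevance):

    % Information bottleneck: compress X into a representation T while
    % preserving information about the relevant variable Y,
    % subject to the Markov chain T - X - Y.
    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)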

This Special Issue will consolidate the latest ideas and findings on applications and theory of the information bottleneck method. Intentionally, we do not narrow the scope to a particular field, but encourage submissions from all engineering disciplines. We especially appreciate contributions on machine learning and signal processing in communications.

Dr. Jan Lewandowsky
Prof. Dr. Gerhard Bauch
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information bottleneck method
  • data processing
  • signal processing
  • information processing
  • machine learning
  • quantization
  • clustering

Published Papers (12 papers)


Editorial

Jump to: Research

3 pages, 160 KiB  
Editorial
Theory and Application of the Information Bottleneck Method
by Jan Lewandowsky and Gerhard Bauch
Entropy 2024, 26(3), 187; https://doi.org/10.3390/e26030187 - 22 Feb 2024
Viewed by 534
Abstract
In 1999, Naftali Tishby et al. [...] Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)

Research

Jump to: Editorial

62 pages, 2676 KiB  
Article
The Information Bottleneck’s Ordinary Differential Equation: First-Order Root Tracking for the Information Bottleneck
by Shlomi Agmon
Entropy 2023, 25(10), 1370; https://doi.org/10.3390/e25101370 - 22 Sep 2023
Viewed by 613
Abstract
The Information Bottleneck (IB) is a method of lossy compression of relevant information. Its rate-distortion (RD) curve describes the fundamental tradeoff between input compression and the preservation of relevant information embedded in the input. However, it conceals the underlying dynamics of optimal input encodings. We argue that these typically follow a piecewise smooth trajectory when input information is being compressed, as recently shown in RD. These smooth dynamics are interrupted when an optimal encoding changes qualitatively, at a bifurcation. By leveraging the IB’s intimate relations with RD, we provide substantial insights into its solution structure, highlighting caveats in its finite-dimensional treatments. Sub-optimal solutions are seen to collide or exchange optimality at its bifurcations. Despite the acceptance of the IB and its applications, there are surprisingly few techniques to solve it numerically, even for finite problems whose distribution is known. We derive anew the IB’s first-order Ordinary Differential Equation, which describes the dynamics underlying its optimal tradeoff curve. To exploit these dynamics, we not only detect IB bifurcations but also identify their type in order to handle them accordingly. Rather than approaching the IB’s optimal tradeoff curve from sub-optimal directions, the latter allows us to follow a solution’s trajectory along the optimal curve under mild assumptions. We thereby translate an understanding of IB bifurcations into a surprisingly accurate numerical algorithm. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
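
As background for this entry (a standard property of the information curve rather than a result of the paper): under the common Lagrangian convention, the multiplier beta is tied to the slope of the optimal trade-off curve, which is what makes tracking solutions by a differential equation in beta natural. A sketch of the relation, where the curve is smooth:

    % Slope of the optimal IB curve I(T;Y) versus I(X;T),
    % for the Lagrangian I(X;T) - beta * I(T;Y):
    \frac{\mathrm{d}\, I(T;Y)}{\mathrm{d}\, I(X;T)} \;=\; \frac{1}{\beta}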

51 pages, 7091 KiB  
Article
Exact and Soft Successive Refinement of the Information Bottleneck
by Hippolyte Charvin, Nicola Catenacci Volpi and Daniel Polani
Entropy 2023, 25(9), 1355; https://doi.org/10.3390/e25091355 - 19 Sep 2023
Viewed by 956
Abstract
The information bottleneck (IB) framework formalises the essential requirement for efficient information processing systems to achieve an optimal balance between the complexity of their representation and the amount of information extracted about relevant features. However, since the representation complexity affordable by real-world systems may vary in time, the processing cost of updating the representations should also be taken into account. A crucial question is thus the extent to which adaptive systems can leverage the information content of already existing IB-optimal representations for producing new ones, which target the same relevant features but at a different granularity. We investigate the information-theoretic optimal limits of this process by studying and extending, within the IB framework, the notion of successive refinement, which describes the ideal situation where no information needs to be discarded for adapting an IB-optimal representation’s granularity. Thanks in particular to a new geometric characterisation, we analytically derive the successive refinability of some specific IB problems (for binary variables, for jointly Gaussian variables, and for the relevancy variable being a deterministic function of the source variable), and provide a linear-programming-based tool to numerically investigate, in the discrete case, the successive refinement of the IB. We then soften this notion into a quantification of the loss of information optimality induced by several-stage processing through an existing measure of unique information. Simple numerical experiments suggest that this quantity is typically low, though not entirely negligible. These results could have important implications for (i) the structure and efficiency of incremental learning in biological and artificial agents, (ii) the comparison of IB-optimal observation channels in statistical decision problems, and (iii) the IB theory of deep neural networks. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
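
For orientation, the classical notion the paper builds on (Equitz and Cover's successive refinement, transplanted here to the IB setting) requires that a coarse representation T_1 be obtainable from a fine one T_2 without returning to the source; roughly:

    % Successive refinability: nested optimal representations forming a
    % Markov chain, so that coarsening discards no information that the
    % fine representation did not already encode.
    X \;-\; T_2 \;-\; T_1 \quad \text{(Markov chain)}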

28 pages, 1966 KiB  
Article
On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches
by Zhaoyan Lyu, Gholamali Aminian and Miguel R. D. Rodrigues
Entropy 2023, 25(7), 1063; https://doi.org/10.3390/e25071063 - 14 Jul 2023
Viewed by 1383
Abstract
It is widely acknowledged that the neural network learning process—along with its connections to fitting, compression, and generalization—is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, involving the replacement of mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) by other more tractable ones, including (1) the minimum mean-squared error associated with the reconstruction of the network input data from some intermediate network representation and (2) the cross-entropy associated with a certain class label given some network representation. We then conducted an empirical study in order to ascertain how different network models, network learning algorithms, and datasets may affect the learning dynamics. Our experiments show that our proposed approach appears to be more reliable in comparison with classical information bottleneck ones in capturing network dynamics during both the training and testing phases. Our experiments also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
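
A minimal sketch of the two surrogate measures named in the abstract, assuming a PyTorch model with an auxiliary decoder and classifier head (all names here are illustrative, not the authors' code):

    import torch.nn.functional as F

    def surrogate_measures(z, x, y, decoder, classifier_head):
        """z: intermediate representation; x: input batch; y: integer labels."""
        # (1) Reconstruction error of the input from z: a tractable proxy for
        #     the MMSE term that replaces the compression measure I(X; Z).
        mmse_proxy = F.mse_loss(decoder(z), x)
        # (2) Cross-entropy of the label given z: a proxy for H(Y | Z), hence
        #     for the relevance measure I(Z; Y) up to the constant H(Y).
        ce_proxy = F.cross_entropy(classifier_head(z), y)
        return mmse_proxy.item(), ce_proxy.item()

Tracking these two scalars over training epochs yields the fitting/compression picture discussed in the abstract without any high-dimensional mutual information estimation.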

23 pages, 2779 KiB  
Article
In-Network Learning: Distributed Training and Inference in Networks
by Matei Moldoveanu and Abdellatif Zaidi
Entropy 2023, 25(6), 920; https://doi.org/10.3390/e25060920 - 10 Jun 2023
Viewed by 904
Abstract
In this paper, we study distributed inference and learning over networks which can be modeled by a directed graph. A subset of the nodes observes different features, which are all relevant/required for the inference task that needs to be performed at some distant end (fusion) node. We develop a learning algorithm and an architecture that can combine the information from the observed distributed features, using the processing units available across the networks. In particular, we employ information-theoretic tools to analyze how inference propagates and fuses across a network. Based on the insights gained from this analysis, we derive a loss function that effectively balances the model’s performance with the amount of information transmitted across the network. We study the design criterion of our proposed architecture and its bandwidth requirements. Furthermore, we discuss implementation aspects using neural networks in typical wireless radio access and provide experiments that illustrate benefits over state-of-the-art techniques. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
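
As a generic illustration of the kind of objective described above (not the paper's exact derivation; the weight lam and the rate terms are placeholders), a sketch in Python:

    def in_network_loss(task_loss, link_rates, lam=0.1):
        # task_loss: e.g., cross-entropy of the prediction at the fusion node;
        # link_rates: per-link estimates of the information sent (e.g., KL terms);
        # lam: trades inference accuracy against bandwidth.
        return task_loss + lam * sum(link_rates)

    print(in_network_loss(0.42, [1.3, 0.7, 2.1]))  # 0.83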

24 pages, 662 KiB  
Article
Counterfactual Supervision-Based Information Bottleneck for Out-of-Distribution Generalization
by Bin Deng and Kui Jia
Entropy 2023, 25(2), 193; https://doi.org/10.3390/e25020193 - 18 Jan 2023
Viewed by 1506
Abstract
Learning invariant (causal) features for out-of-distribution (OOD) generalization has attracted extensive attention recently, and among the proposals, invariant risk minimization (IRM) is a notable solution. In spite of its theoretical promise for linear regression, the challenges of using IRM in linear classification problems remain. By introducing the information bottleneck (IB) principle into the learning of IRM, the IB-IRM approach has demonstrated its power to solve these challenges. In this paper, we further improve IB-IRM from two aspects. First, we show that the key assumption of support overlap of invariant features, which IB-IRM uses to guarantee OOD generalization, can be dispensed with: it is still possible to achieve the optimal solution without this assumption. Second, we illustrate two failure modes where IB-IRM (and IRM) could fail in learning the invariant features, and to address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that recovers the invariant features. By requiring counterfactual inference, CSIB works even when accessing data from a single environment. Empirical experiments on several datasets verify our theoretical results. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)

19 pages, 3143 KiB  
Article
Minimum-Integer Computation Finite Alphabet Message Passing Decoder: From Theory to Decoder Implementations towards 1 Tb/s
by Tobias Monsees, Oliver Griebel, Matthias Herrmann, Dirk Wübben, Armin Dekorsy and Norbert Wehn
Entropy 2022, 24(10), 1452; https://doi.org/10.3390/e24101452 - 12 Oct 2022
Cited by 1 | Viewed by 1434
Abstract
In Message Passing (MP) decoding of Low-Density Parity Check (LDPC) codes, extrinsic information is exchanged between Check Nodes (CNs) and Variable Nodes (VNs). In a practical implementation, this information exchange is limited by quantization using only a small number of bits. In recent investigations, a novel class of Finite Alphabet Message Passing (FA-MP) decoders has been designed to maximize the Mutual Information (MI) using only a small number of bits per message (e.g., 3 or 4 bits), with a communication performance close to high-precision Belief Propagation (BP) decoding. In contrast to the conventional BP decoder, operations are given as discrete-input discrete-output mappings which can be described by multidimensional LUTs (mLUTs). A common approach to avoiding an exponential increase in mLUT size with the node degree is the sequential LUT (sLUT) design, i.e., using a sequence of two-dimensional Lookup-Tables (LUTs), at the cost of a slight performance degradation. Recently, approaches such as Reconstruction-Computation-Quantization (RCQ) and Mutual Information-Maximizing Quantized Belief Propagation (MIM-QBP) have been proposed to avoid the complexity drawback of mLUTs by using pre-designed functions that require calculations over a computational domain. It has been shown that these calculations are able to represent the mLUT mapping exactly by executing computations with infinite precision over real numbers. Based on the framework of MIM-QBP and RCQ, the Minimum-Integer Computation (MIC) decoder design generates low-bit integer computations that are derived from the Log-Likelihood Ratio (LLR) separation property of the information maximizing quantizer to replace the mLUT mappings either exactly or approximately. We derive a novel criterion for the bit resolution that is required to represent the mLUT mappings exactly. Furthermore, we show that our MIC decoder matches the communication performance of the corresponding mLUT decoder exactly, but with much lower implementation complexity. We also perform an objective comparison between the state-of-the-art Min-Sum (MS) and the FA-MP decoder implementations for throughput towards 1 Tb/s in a state-of-the-art 28 nm Fully-Depleted Silicon-on-Insulator (FD-SOI) technology. Furthermore, we demonstrate that our new MIC decoder implementation outperforms previous FA-MP decoders and MS decoders in terms of reduced routing complexity, area efficiency and energy efficiency. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
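
A toy sketch of the sLUT idea mentioned in the abstract, assuming the two-dimensional tables have been designed offline (here they are random dummies purely for illustration):

    import numpy as np

    def slut_node_update(messages, luts):
        """Combine quantized messages through a chain of 2-D lookup tables,
        avoiding one multidimensional LUT that grows exponentially with degree."""
        acc = messages[0]
        for msg, lut in zip(messages[1:], luts):
            acc = lut[acc, msg]   # each step is only a two-dimensional lookup
        return acc                # outgoing quantized message

    # Degree-4 node with 3-bit messages (alphabet size 8) and dummy tables:
    rng = np.random.default_rng(0)
    luts = [rng.integers(0, 8, size=(8, 8)) for _ in range(3)]
    print(slut_node_update([1, 5, 2, 7], luts))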

37 pages, 1158 KiB  
Article
The Double-Sided Information Bottleneck Function
by Michael Dikshtein, Or Ordentlich and Shlomo Shamai (Shitz)
Entropy 2022, 24(9), 1321; https://doi.org/10.3390/e24091321 - 19 Sep 2022
Viewed by 2004
Abstract
A double-sided variant of the information bottleneck method is considered. Let (X,Y) be a bivariate source characterized by a joint pmf P_XY. The problem is to find two independent channels P_U|X and P_V|Y (inducing the Markov chain U - X - Y - V) that maximize I(U;V) subject to constraints on the relevant mutual information expressions: I(U;X) and I(V;Y). For jointly Gaussian X and Y, we show that Gaussian channels are optimal in the low-SNR regime but not for general SNR. Similarly, it is shown that for a doubly symmetric binary source, binary symmetric channels are optimal when the correlation is low and are suboptimal for high correlations. We conjecture that Z and S channels are optimal when the correlation is 1 (i.e., X = Y) and provide supporting numerical evidence. Furthermore, we present a Blahut–Arimoto type alternating maximization algorithm and demonstrate its performance for a representative setting. This problem is closely related to the domain of biclustering. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
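
In compact form, the problem treated in the paper reads as follows (the constraint levels R_U and R_V are our illustrative names):

    % Double-sided information bottleneck:
    \max_{P_{U \mid X},\, P_{V \mid Y}} \; I(U;V)
    \quad \text{s.t.} \quad I(U;X) \le R_U, \quad I(V;Y) \le R_V,
    \qquad U - X - Y - V \ \text{(Markov chain)}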

20 pages, 449 KiB  
Article
Symmetry-Breaking Bifurcations of the Information Bottleneck and Related Problems
by Albert E. Parker and Alexander G. Dimitrov
Entropy 2022, 24(9), 1231; https://doi.org/10.3390/e24091231 - 02 Sep 2022
Cited by 1 | Viewed by 1147
Abstract
In this paper, we investigate the bifurcations of solutions to a class of degenerate constrained optimization problems. This study was motivated by the Information Bottleneck and Information Distortion problems, which have been used to successfully cluster data in many different applications. In the problems we discuss in this paper, the distortion function is not a linear function of the quantizer. This leads to a challenging annealing optimization problem, which we recast as a fixed-point dynamics problem of a gradient flow of a related dynamical system. The gradient system possesses an S_N symmetry due to its invariance under relabeling of the N representative classes. Its flow hence passes through a series of bifurcations with specific symmetry breaks. Here, we show that the dynamical system related to the Information Bottleneck problem has an additional spurious symmetry that requires a more challenging analysis of the symmetry-breaking bifurcation. For the Information Bottleneck, we determine that when bifurcations occur, they are only of pitchfork type, and we give conditions that determine the stability of the bifurcating branches. We relate the existence of subcritical bifurcations to the existence of first-order phase transitions in the corresponding distortion function as a function of the annealing parameter, and provide criteria with which to detect such transitions. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
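
For context, the textbook pitchfork normal form makes the supercritical/subcritical distinction concrete and illustrates the link to phase transitions drawn in the abstract:

    % Pitchfork normal form; lambda is the annealing/bifurcation parameter.
    % Minus sign (supercritical): new branches emerge continuously, akin to
    % a second-order transition. Plus sign (subcritical): the system jumps
    % to a distant branch, akin to a first-order transition.
    \dot{x} \;=\; \lambda x \mp x^{3}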

15 pages, 347 KiB  
Article
Revisiting Sequential Information Bottleneck: New Implementation and Evaluation
by Assaf Toledo, Elad Venezian and Noam Slonim
Entropy 2022, 24(8), 1132; https://doi.org/10.3390/e24081132 - 16 Aug 2022
Cited by 1 | Viewed by 1096
Abstract
We introduce a modern, optimized, and publicly available implementation of the sequential Information Bottleneck clustering algorithm, which strikes a highly competitive balance between clustering quality and speed. We describe a set of optimizations that make the algorithm computation more efficient, particularly for the common case of sparse data representation. The results are substantiated by an extensive evaluation that compares the algorithm to commonly used alternatives, focusing on the practically important use case of text clustering. The evaluation covers a range of publicly available benchmark datasets and a set of clustering setups employing modern word and sentence embeddings obtained by state-of-the-art neural models. The results show that in spite of using the more basic Term-Frequency representation, the proposed implementation provides a highly attractive trade-off between quality and speed that outperforms the alternatives considered. This new release facilitates the use of the algorithm in real-world applications of text clustering. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
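
A didactic re-implementation of one sequential-IB sweep (hard clustering, relevance term only; a sketch after Slonim's algorithm, not the optimized package evaluated in the paper):

    import numpy as np

    def weighted_js(p, q, w1, w2):
        # Weighted Jensen-Shannon divergence, the sIB merge-cost kernel.
        m = w1 * p + w2 * q
        kl = lambda a, b: float(np.sum(a[a > 0] * np.log(a[a > 0] / b[a > 0])))
        return w1 * kl(p, m) + w2 * kl(q, m)

    def sib_sweep(pyx, px, assign, n_clusters):
        # pyx[i]: row p(y|x_i); px[i]: prior p(x_i); assign: current cluster ids.
        for i in range(len(px)):
            assign[i] = -1                      # draw item i out of its cluster
            pt = np.zeros(n_clusters)           # cluster priors p(t) without i
            pyt = np.zeros((n_clusters, pyx.shape[1]))
            for j, t in enumerate(assign):
                if t >= 0:
                    pt[t] += px[j]
                    pyt[t] += px[j] * pyx[j]
            costs = [0.0 if pt[t] == 0 else
                     (px[i] + pt[t]) * weighted_js(pyx[i], pyt[t] / pt[t],
                                                   px[i] / (px[i] + pt[t]),
                                                   pt[t] / (px[i] + pt[t]))
                     for t in range(n_clusters)]
            assign[i] = int(np.argmin(costs))   # reassign where the loss is least
        return assign

    pyx = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
    px = np.full(4, 0.25)
    print(sib_sweep(pyx, px, np.array([0, 1, 0, 1]), n_clusters=2))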

30 pages, 598 KiB  
Article
Information Bottleneck Signal Processing and Learning to Maximize Relevant Information for Communication Receivers
by Jan Lewandowsky, Gerhard Bauch and Maximilian Stark
Entropy 2022, 24(7), 972; https://doi.org/10.3390/e24070972 - 14 Jul 2022
Cited by 1 | Viewed by 1647
Abstract
Digital communication receivers extract information about the transmitted data from the received signal in subsequent processing steps, such as synchronization, demodulation and channel decoding. Technically, the receiver-side signal processing for conducting these tasks is complex and hence causes bottleneck situations in terms of power, delay and chip area. Typically, many bits per sample are required to represent and process the received signal accurately in the digital receiver hardware. In addition, demanding arithmetical operations are required in the signal processing algorithms. A popular recent trend is designing entire receiver chains or some of their crucial building blocks from an information-theoretic perspective. Signal processing blocks with very simple mathematical operations can be designed to directly maximize the relevant information that flows through them. At the same time, a strong quantization reduces the number of bits processed in the receiver to further lower the complexity. The described system design approach follows the principle of the information bottleneck method. Different authors have proposed various ideas to design and implement mutual-information-maximizing signal processing units. The first important aim of this article is to explain the fundamental similarities between the information bottleneck method and the functionalities of communication receivers. Based on that, we present and investigate new results on an entire receiver chain that is designed following the information bottleneck design principle. Afterwards, we give an overview of different techniques following the information bottleneck design paradigm from the literature, mainly dealing with channel decoding applications. We analyze the similarities of the different approaches for information bottleneck signal processing. This comparison leads to a general view on information bottleneck signal processing, which amounts to learning the parameters of trainable functions that maximize the relevant mutual information under compression. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
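
To make the design principle concrete, here is a brute-force sketch of the core step: choosing a coarse deterministic quantizer that maximizes the relevant mutual information for a small discrete joint pmf (illustrative only; practical designs use the efficient algorithms surveyed in the article):

    import numpy as np
    from itertools import combinations

    def mutual_information(pxt):
        # I(X;T) in bits from a joint pmf over (X, T).
        px = pxt.sum(axis=1, keepdims=True)
        pt = pxt.sum(axis=0, keepdims=True)
        m = pxt > 0
        return float(np.sum(pxt[m] * np.log2(pxt[m] / (px @ pt)[m])))

    def best_quantizer(pxy, n_levels):
        # pxy: joint pmf of relevant X and channel output Y (Y sorted by LLR);
        # search all contiguous groupings of Y into n_levels regions.
        n_y = pxy.shape[1]
        best, best_mi = None, -1.0
        for bounds in combinations(range(1, n_y), n_levels - 1):
            edges = (0, *bounds, n_y)
            pxt = np.stack([pxy[:, a:b].sum(axis=1)
                            for a, b in zip(edges, edges[1:])], axis=1)
            mi = mutual_information(pxt)
            if mi > best_mi:
                best, best_mi = bounds, mi
        return best, best_mi

    # Binary input, 8-level soft channel output, quantized to 3 regions:
    pxy = np.array([[0.04, 0.06, 0.10, 0.15, 0.08, 0.04, 0.02, 0.01],
                    [0.01, 0.02, 0.04, 0.08, 0.15, 0.10, 0.06, 0.04]])
    print(best_quantizer(pxy, n_levels=3))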

30 pages, 530 KiB  
Article
Distributed Quantization for Partially Cooperating Sensors Using the Information Bottleneck Method
by Steffen Steiner, Abdulrahman Dayo Aminu and Volker Kuehn
Entropy 2022, 24(4), 438; https://doi.org/10.3390/e24040438 - 22 Mar 2022
Cited by 2 | Viewed by 1837
Abstract
This paper addresses the optimization of distributed compression in a sensor network with partial cooperation among sensors. The widely known Chief Executive Officer (CEO) problem, where each sensor has to compress its measurements locally in order to forward them over capacity-limited links to a common receiver, is extended by allowing sensors to mutually communicate. This extension comes along with modified statistical dependencies among the involved random variables compared to the original CEO problem, such that well-known outer and inner bounds no longer hold. Three different inter-sensor communication protocols are investigated. The successive broadcast approach allows each sensor to exploit instantaneous side-information of all previously transmitting sensors. As this leads to dimensionality problems for larger networks, a sequential point-to-point communication scheme is considered that forwards instantaneous side-information to only one successor. Thirdly, a two-phase transmission protocol separates the information exchange between sensors from the communication with the common receiver. Inspired by algorithmic solutions for the original CEO problem, the sensors are optimized in a greedy manner. It turns out that partial communication among sensors improves the performance significantly. In particular, the two-phase transmission can reach the performance of a fully cooperative CEO scenario, where each sensor has access to all measurements and the knowledge about all channel conditions. Moreover, exchanging instantaneous side-information increases the robustness against bad Wyner–Ziv coding strategies, which can lead to significant performance losses in the original CEO problem. Full article
(This article belongs to the Special Issue Theory and Application of the Information Bottleneck Method)
