Article

On a 2-Relative Entropy

James Fullwood
School of Mathematical Sciences, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
Submission received: 4 December 2021 / Revised: 27 December 2021 / Accepted: 27 December 2021 / Published: 31 December 2021
(This article belongs to the Special Issue Measures of Information II)

Abstract

We construct a 2-categorical extension of the relative entropy functor of Baez and Fritz, and show that our construction is functorial with respect to vertical morphisms. Moreover, we show that such a ‘2-relative entropy’ satisfies natural 2-categorical analogues of convex linearity, vanishing under optimal hypotheses, and lower semicontinuity. While relative entropy is a relative measure of information between probability distributions, we view our construction as a relative measure of information between channels.

1. Introduction

Let $X$ and $Y$ be finite sets which are the input and output alphabets of a discrete memoryless channel $X \xrightarrow{f} Y$ with probability transition matrix $f_{yx}$, representing the probability of the output $y$ given the input $x$. Every input $x$ then determines a probability distribution on $Y$, which we denote by $f_x$, so that $f_x(y) = f_{yx}$ for all $x \in X$ and $y \in Y$. The channel $X \xrightarrow{f} Y$ together with the choice of a prior distribution $p$ on $X$ will be denoted $(f|p)$, and such data then determine a distribution $\vartheta(f|p)$ on $X \times Y$ given by $\vartheta(f|p)(x,y) = p_x f_{yx}$. Given a second channel $X \xrightarrow{g} Y$ with prior distribution $q$ on $X$, the chain rule for relative entropy says that the relative entropy $D\big(\vartheta(f|p), \vartheta(g|q)\big)$ is given by

$$D\big(\vartheta(f|p), \vartheta(g|q)\big) = D(p,q) + \sum_{x \in X} p_x \, D(f_x, g_x). \tag{1}$$

As the RHS of (1) involves precisely the datum of the channels $f$ and $g$ together with the prior distributions $p$ and $q$, we view the quantity $D\big(\vartheta(f|p), \vartheta(g|q)\big)$ as a relative measure of information between the channels $(f|p)$ and $(g|q)$. In particular, since from a Bayesian perspective $D(p,q)$ may be thought of as the amount of information gained upon discovering that the assumed prior distribution $p$ is actually $q$, it seems only natural to think of $D\big(\vartheta(f|p), \vartheta(g|q)\big)$ as the amount of information gained upon learning that the assumed channel $(f|p)$ is actually the channel $(g|q)$.
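As a concrete sanity check on (1), the following Python sketch verifies the chain rule numerically on randomly generated data. It is an illustration of ours rather than anything from [1] or [3], and it assumes the convention that a channel is stored as a column-stochastic matrix whose entry f[y, x] equals $f_{yx}$, so that column x is the distribution $f_x$.

```python
import numpy as np

def D(p, q):
    # Relative entropy D(p, q) in bits, with the convention 0 log 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

rng = np.random.default_rng(0)

def random_dist(n):
    v = rng.random(n)
    return v / v.sum()

def random_channel(nx, ny):
    # Column-stochastic matrix: column x is the distribution f_x on Y.
    return np.column_stack([random_dist(ny) for _ in range(nx)])

nx, ny = 3, 4
p, q = random_dist(nx), random_dist(nx)
f, g = random_channel(nx, ny), random_channel(nx, ny)

# Joint distributions theta(f|p)(x, y) = p_x f_{yx}, flattened over X x Y.
joint_fp = (f * p).T.ravel()
joint_gq = (g * q).T.ravel()

lhs = D(joint_fp, joint_gq)
rhs = D(p, q) + sum(p[x] * D(f[:, x], g[:, x]) for x in range(nx))
assert np.isclose(lhs, rhs)  # the chain rule (1)
```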
To make such a Bayesian interpretation more precise, we build upon the work of Baez and Fritz [1], who formulated a type of Bayesian inference as a process $X \to Y$ (including a set of conditional hypotheses on the outcomes of the process) which, given a prior distribution $p$ on $X$, yields distributions $r$ on $Y$ and $q$ on $X$ in such a way that the relative entropy $D(p,q)$ has an operational meaning as a quantity associated with a Bayesian updating with respect to the process $X \to Y$. (Here, $X$ may be thought of more generally as the set of possible states of some system to be measured, while $Y$ may be thought of as the set of possible outcomes of the measurement.) Baez and Fritz then proved that, up to a constant multiple, the map on such processes given by

$$(X \to Y) \;\longmapsto\; \mathrm{RE}(X \to Y) := D(p,q) \tag{2}$$

is the unique map satisfying the following axioms.
  • Functoriality: Given a composition of processes $X \to Y \to Z$,
    $$\mathrm{RE}(X \to Y \to Z) = \mathrm{RE}(X \to Y) + \mathrm{RE}(Y \to Z).$$
  • Convex Linearity: Given a collection of processes $U_x \to V_x$ indexed by the elements $x \in X$ of a finite probability space $(X, p)$,
    $$\mathrm{RE}\Big(\sum_{x \in X} p_x (U_x \to V_x)\Big) = \sum_{x \in X} p_x \, \mathrm{RE}(U_x \to V_x).$$
  • Vanishing Under Optimal Hypotheses: If the conditional hypotheses associated with a process $X \to Y$ are optimal, then
    $$\mathrm{RE}(X \to Y) = 0.$$
  • Continuity: The map $(X \to Y) \mapsto \mathrm{RE}(X \to Y)$ is lower semicontinuous.
While Baez and Fritz facilitated their exposition using the language of category theory [2], knowing that a category consists of a class of objects together with a class of composable arrows (i.e., morphisms) between objects is all that is needed for an appreciation of their construction. From such a perspective, the aforementioned processes $X \to Y$ are morphisms in a category FinStat, and the relative entropy assignment given by (2) is then a map from morphisms in FinStat to $[0, \infty]$.
In what follows, we elevate the construction of Baez and Fritz to the level of 2-categories (or, more precisely, double categories), whose 2-morphisms may be viewed as certain processes between processes, or rather, processes which connect one channel to another. In particular, in Section 2 we formally introduce the category FinStat of Baez and Fritz, and in Section 3 we review their functorial characterization of relative entropy using FinStat. In Section 4, we construct a category FinStat₂ which is a 2-level extension of FinStat, and in Section 5, we define a convex structure on FinStat₂. In Section 6, we define a relative measure of information between channels which we refer to as conditional relative entropy, and show that it is convex linear and functorial with respect to vertical morphisms in FinStat₂. The conditional relative entropy is then used in Section 7 to define a relative entropy assignment RE₂ on 2-morphisms via the chain rule (1) (for more on the chain rule for relative entropy, one may consult Chapter 2 of [3]). Moreover, we show that such a ‘2-relative entropy’ satisfies the natural 2-level analogues of axioms 1–4 satisfied by the relative entropy map RE.
As abstract as a relative entropy of processes between processes may seem, Shannon's Noisy Channel Coding Theorem, a cornerstone of information theory, is essentially a statement about transforming a noisy channel into a noiseless one via a sequence of encodings and decodings. From such a viewpoint, information theory is fundamentally about processes (i.e., sequences of encodings and decodings) between processes (i.e., channels), and it is precisely this viewpoint with which we will proceed. Furthermore, there has been growing recent interest in axiomatic and categorical approaches to information theory [4,5,6,7,8,9,10,11,12,13], and the present work is a direct outgrowth of such activity.

2. The Category FinStat

In this section, we introduce the first-level structure of interest, which is the category FinStat introduced by Baez and Fritz [1]. Though we use the language of categories, knowing that a category consists of a class of composable arrows between a class of objects is sufficient for the comprehension of all categorical notions in this work.
Definition 1.
Let $X$ and $Y$ be finite sets. A discrete memoryless channel (or simply channel for short) $X \xrightarrow{f} Y$ associates every $x \in X$ with a probability distribution $f_x$ on $Y$. In such a case, the sets $X$ and $Y$ are referred to as the set of inputs and the set of outputs of the channel $f$, respectively, and $f_x(y)$ is the probability of receiving the output $y$ given the input $x$, which will be denoted by $f_{yx}$.
Definition 2.
If $X \xrightarrow{f} Y$ and $Y \xrightarrow{g} Z$ are channels, then the composition $X \xrightarrow{g \circ f} Z$ is given by

$$(g \circ f)_{zx} = \sum_{y \in Y} g_{zy} f_{yx}$$

for all $x \in X$ and $z \in Z$.
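Under the column-stochastic convention of the Introduction, Definition 2 says that composition of channels is ordinary matrix multiplication. A minimal sketch, with hypothetical toy matrices of our own:

```python
import numpy as np

f = np.array([[0.9, 0.2],      # f : X -> Y with |X| = |Y| = 2;
              [0.1, 0.8]])     #   column x holds the distribution f_x
g = np.array([[0.7, 0.0],      # g : Y -> Z with |Z| = 3
              [0.2, 0.5],
              [0.1, 0.5]])

gf = g @ f                     # (g o f)_{zx} = sum_y g_{zy} f_{yx}
assert np.allclose(gf.sum(axis=0), 1.0)   # columns are still distributions
```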
Remark 1.
If $X \xrightarrow{f} Y$ is a channel such that for every $x \in X$ there exists a $y \in Y$ with $f_{yx} = 1$, then such a $y$ is necessarily unique given $x$, and as such, $f$ may be identified with a function $f : X \to Y$. In such a case, we say that the channel $f$ is pure (or deterministic), and from here on we will not distinguish between a pure channel and the associated function from its set of inputs to its set of outputs.
Definition 3.
If $\star$ denotes a set with a single element, then a channel $\star \xrightarrow{p} X$ is simply a probability distribution on $X$, and in such a case, we will use $p_x$ to denote the probability of $x$ as given by $p$ for all $x \in X$. The pair $(X, p)$ is then referred to as a finite probability space.
Notation 1.
The datum of a channel $X \xrightarrow{f} Y$ together with a prior distribution $\star \xrightarrow{p} X$ on its set of inputs will be denoted $(f|p)$.
Definition 4.
Let FinStat denote the category whose objects are finite probability spaces, and whose morphisms $(X, p) \to (Y, q)$ consist of the following data:
  • A function $f : X \to Y$ such that $f \circ p = q$;
  • A channel $Y \xrightarrow{s} X$ such that $f \circ s = \mathrm{id}_Y$. In other words, $Y \xrightarrow{s} X$ is a stochastic section of $f : X \to Y$.
A morphism in FinStat is then summarized by a diagram of the form

[Diagram (3): the function $f : X \to Y$ and the stochastic section $Y \xrightarrow{s} X$, together with the priors $\star \xrightarrow{p} X$ and $\star \xrightarrow{q} Y$]

and a composition of morphisms in FinStat is obtained via function composition and composition of stochastic sections (it is straightforward to show that a composite of stochastic sections is again a stochastic section). The morphism corresponding to diagram (3) will often be denoted $(f, p, s)$.
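For readers who like to compute, here is a minimal numerical sketch of the two conditions defining a morphism $(f, p, s)$, with the deterministic $f$ encoded as a 0/1 column-stochastic matrix. The particular numbers are our own, and the final assertion anticipates the notion of optimal hypothesis defined below (Definition 5).

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])            # prior on X = {0, 1, 2}
f = np.array([[1., 1., 0.],              # deterministic f : X -> Y with
              [0., 0., 1.]])             #   f(0) = f(1) = 0 and f(2) = 1
q = f @ p                                # q = f o p, the pushforward of p

s = np.array([[0.5/0.8, 0.0],            # s : Y -> X, supported on the
              [0.3/0.8, 0.0],            #   fibres of f
              [0.0,     1.0]])

assert np.allclose(f @ s, np.eye(2))     # f o s = id_Y: s is a section
assert np.allclose(s @ q, p)             # this s is even optimal (Def. 5)
```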
Remark 2.
Note that in diagram (3), a straight arrow is used for $X \xrightarrow{f} Y$, as $f$ is a function, as opposed to a noisy channel.
Remark 3.
The operational interpretation of diagram (3) is as follows. The set $X$ is thought of as the set of possible states of the system, $f : X \to Y$ as a measurement process, and $Y$ as the set of possible states of some measuring apparatus. The stochastic section $Y \xrightarrow{s} X$ is then thought of as a set of hypotheses about the state of the system given a state of the measuring apparatus. In particular, $s_{xy}$ is the probability that the system was in state $x$ given the state $y$ of the measuring apparatus.
Definition 5.
If the stochastic section $Y \xrightarrow{s} X$ in diagram (3) is such that $s \circ q = p$, then $s$ will be referred to as an optimal hypothesis for $(f|p)$.
Definition 6.
Let $(X, p)$ be a finite probability space, and let $U_x \xrightarrow{\mu_x} V_x$ be a collection of channels with prior distributions $\star \xrightarrow{q_x} U_x$ indexed by $X$. The convex combination of $(\mu_x | q_x)$ with respect to $(X, p)$ is the channel

$$\coprod_{x \in X} U_x \xrightarrow{\;\coprod_{x \in X} \mu_x\;} \coprod_{x \in X} V_x$$

with prior distribution $\sum_{x \in X} p_x q_x$, where $\coprod_{x \in X} \mu_x$ is the channel given by

$$\Big(\coprod_{x \in X} \mu_x\Big)_{vu} = \begin{cases} (\mu_x)_{vu} & \text{if } (v, u) \in V_x \times U_x \text{ for some } x \in X \\ 0 & \text{otherwise}, \end{cases}$$

and the prior distribution $\star \xrightarrow{\sum_{x \in X} p_x q_x} \coprod_{x \in X} U_x$ is given by

$$\Big(\sum_{x \in X} p_x q_x\Big)_u = p_{x_u} (q_{x_u})_u,$$

where $x_u$ is such that $u \in U_{x_u}$. Such a convex combination will be denoted $\sum_{x \in X} p_x (\mu_x | q_x)$.
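In matrix terms, the convex combination of Definition 6 is a block-diagonal channel on the disjoint union of the $U_x$, paired with the $p$-weighted concatenation of the priors $q_x$. A sketch for two summands, assuming the same conventions as before (the toy data are hypothetical, and SciPy's block_diag stands in for the disjoint union):

```python
import numpy as np
from scipy.linalg import block_diag

p = np.array([0.6, 0.4])                        # prior on X = {0, 1}
mu0 = np.array([[0.9, 0.3], [0.1, 0.7]])        # mu_0 : U_0 -> V_0
mu1 = np.array([[0.5], [0.5]])                  # mu_1 : U_1 -> V_1
q0, q1 = np.array([0.2, 0.8]), np.array([1.0])  # priors on U_0 and U_1

mu = block_diag(mu0, mu1)                       # the channel on U_0 ⊔ U_1
prior = np.concatenate([p[0] * q0, p[1] * q1])  # (sum_x p_x q_x)_u = p_{x_u} (q_{x_u})_u

assert np.allclose(mu.sum(axis=0), 1.0)         # columns are distributions
assert np.isclose(prior.sum(), 1.0)             # the weighted prior is a distribution
```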

3. The Baez and Fritz Characterization of Relative Entropy

We now recall the Baez and Fritz characterization of relative entropy in FinStat .
Definition 7.
Let $(X, p)$ be a finite probability space, and let

$$U_x \xrightarrow{\mu_x} V_x, \qquad \star \xrightarrow{q_x} U_x, \qquad V_x \xrightarrow{s_x} U_x$$

be a collection of morphisms in FinStat indexed by $X$. The convex combination of $(\mu_x, q_x, s_x)$ with respect to $(X, p)$ is the morphism $\sum_{x \in X} p_x (\mu_x, q_x, s_x)$ in FinStat corresponding to the diagram

[Diagram: the convex combination $\coprod_{x \in X} U_x \to \coprod_{x \in X} V_x$ together with its weighted priors and stochastic sections]

where $r_x = \mu_x \circ q_x$ for all $x \in X$.
Definition 8.
Let $(f, p, s)$ be a morphism in FinStat, and let $r = s \circ f \circ p$. The relative entropy of $(f, p, s)$ is the non-negative extended real number $\mathrm{RE}(f, p, s) \in [0, \infty]$ given by

$$\mathrm{RE}(f, p, s) = D(p, r),$$

where $D(p, r) = \sum_x p_x \log(p_x / r_x)$ is the relative entropy between the distributions $p$ and $r$ on $X$.
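As a worked numerical instance of Definition 8, the following self-contained sketch computes $\mathrm{RE}(f, p, s) = D(p, s \circ f \circ p)$ for a deliberately non-optimal hypothesis $s$; the matrices are our own toy data.

```python
import numpy as np

def D(p, q):
    # Relative entropy with the convention 0 log 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])            # prior on X
f = np.array([[1., 1., 0.],              # deterministic f : X -> Y
              [0., 0., 1.]])
s = np.array([[0.7, 0.0],                # a non-optimal hypothesis s,
              [0.3, 0.0],                #   still a section of f
              [0.0, 1.0]])

assert np.allclose(f @ s, np.eye(2))     # f o s = id_Y
r = s @ (f @ p)                          # r = s o f o p
print(D(p, r))                           # RE(f, p, s) > 0 here; an optimal
                                         #   hypothesis would give r = p, RE = 0
```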
Definition 9.
Let $F : \mathrm{FinStat} \to [0, \infty]$ be a map from the morphisms in FinStat to the extended non-negative reals $[0, \infty]$.
  • $F$ is said to be functorial if and only if for every composition $(g \circ f, p, s \circ t)$ of morphisms in FinStat we have
    $$F(g \circ f, p, s \circ t) = F(f, p, s) + F(g, f \circ p, t).$$
  • $F$ is said to be convex linear if and only if for every convex combination $\sum_{x \in X} p_x (\mu_x, q_x, s_x)$ of morphisms in FinStat we have
    $$F\Big(\sum_{x \in X} p_x (\mu_x, q_x, s_x)\Big) = \sum_{x \in X} p_x F(\mu_x, q_x, s_x).$$
  • $F$ is said to be vanishing under optimal hypotheses if and only if for every morphism $(f, p, s)$ in FinStat with $s$ an optimal hypothesis we have
    $$F(f, p, s) = 0.$$
  • $F$ is said to be lower semicontinuous if and only if for every sequence of morphisms $(f, p_n, s_n)$ in FinStat converging to a morphism $(f, p, s)$ we have
    $$F(f, p, s) \le \liminf_{n \to \infty} F(f, p_n, s_n).$$
Theorem 1
(The Baez and Fritz Characterization of Relative Entropy). Let $\mathcal{S}$ be the collection of maps from the morphisms in FinStat to $[0, \infty]$ which are functorial, convex linear, vanishing under optimal hypotheses, and lower semicontinuous. Then, the following statements hold.
1. The relative entropy RE is an element of $\mathcal{S}$;
2. If $F \in \mathcal{S}$, then $F = c \, \mathrm{RE}$ for some non-negative constant $c \in \mathbb{R}$.

4. The Category FinStat₂

In this section, we introduce the second-level structure of interest, namely, the double category FinStat₂, which is a 2-level extension of FinStat.
Definition 10.
Let FinStat₂ denote the 2-category whose objects and 1-morphisms coincide with those of FinStat, and whose 2-morphisms are constructed as follows. Given 1-morphisms

[Diagram (4): 1-morphisms $(\mu, p, s) : (X, p) \to (X', p')$ and $(\nu, q, t) : (Y, q) \to (Y', q')$]

a 2-morphism ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$ consists of channels $X \xrightarrow{f} Y$ and $X' \xrightarrow{f'} Y'$ such that
  • $f \circ p = q$;
  • $f' \circ p' = q'$;
  • $\nu \circ f = f' \circ \mu$.
The 2-morphism ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$ may then be summarized by the following diagram.

[Diagram (5): the square formed by $\mu$, $\nu$, $f$, $f'$, together with the priors $p$, $p'$, $q$, $q'$ and the outer ‘wings’ $s$ and $t$]
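The three bulleted conditions are straightforward to test numerically. Below is a sketch of a toy 2-morphism, assuming the column-stochastic conventions used earlier, with the pure channels $\mu$ and $\nu$ encoded as 0/1 matrices; all matrices are hypothetical.

```python
import numpy as np

mu = np.array([[1., 1., 0.],             # pure mu : X -> X', collapsing
               [0., 0., 1.]])            #   x = 0, 1 to a single point
nu = np.eye(2)                           # pure nu : Y -> Y' (identity here)
f  = np.array([[0.9, 0.9, 0.2],          # f : X -> Y
               [0.1, 0.1, 0.8]])
fp = np.array([[0.9, 0.2],               # f' : X' -> Y'
               [0.1, 0.8]])

p  = np.array([0.5, 0.3, 0.2])           # prior on X
pp = mu @ p                              # p' = mu o p
q, qp = f @ p, fp @ pp                   # q = f o p and q' = f' o p'

assert np.allclose(nu @ f, fp @ mu)      # the square nu o f = f' o mu commutes
```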
Remark 4.
A crucial point is that in the above diagram, all arrows necessarily commute except for any compositions involving the outer ‘wings’ $s$ and $t$. For example, the compositions $s \circ \mu \circ p$ and $t \circ f' \circ \mu$ need not be equal to $p$ and $f$, respectively.
Remark 5.
Diagram (5) should be thought of as a flattened out pyramid, whose base is the inner square and whose vertex is obtained by the identification of the upper and lower stars in the diagram.
Remark 6.
For an operational interpretation of a 2-morphism in FinStat₂ as given by diagram (5), one may consider $X$ and $Y$ as sample spaces associated with all possible outcomes of experiments $E_X$ and $E_Y$. As the sets $X$ and $Y$ are endowed with prior distributions $p$ and $q$, the maps $\mu : X \to X'$ and $\nu : Y \to Y'$ are then random variables with values in $X'$ and $Y'$, and the stochastic sections $s$ and $t$ then represent conditional hypotheses about the outcomes of the measurements corresponding to $\mu$ and $\nu$. The channels $f : X \to Y$ and $f' : X' \to Y'$ then represent stochastic processes such that taking the measurement $\mu$ followed by $f'$ results in the same process as first letting $X$ evolve according to $f$ and then taking the measurement $\nu$.
Example 1.
For a real-life scenario which realizes a 2-morphism in FinStat₂, suppose two experimenters Alice and Bob are collaborating on a project to verify the predictions of a theory. As such, Alice and her data analyst partner Alicia travel to a mountain in Brazil during a solar eclipse to perform experiments, while Bob and his data analyst partner Bernie travel to a mountain in Montenegro at the same time for the same purpose. Alice and Bob will then perform experiments in their separate locations and hand their results over to Alicia and Bernie, who will then analyze the data to produce numerical results. At the end of each day, Alice will report her results to Bob over a noisy channel, while Alicia will report her results to Bernie over a noisy channel, so that Bob and Bernie may compare their results with Alice and Alicia's. We then summarize such a scenario with the following 2-morphism in FinStat₂:

[Diagram (6): the 2-morphism summarizing the Alice/Alicia and Bob/Bernie scenario]

In diagram (6), $p$ and $q$ are assumed prior distributions on Alice and Bob's measurements, while $s$ and $t$ are empirical conditional distributions on Alice and Bob's measurements given the data outcomes of Alicia and Bernie's analysis. Moreover, if the communication channel $f$ is less reliable than $f'$, then the composition $t \circ f' \circ \mu$ provides a Bayesian updating for the channel $f$.
For vertical composition of 2-morphisms, suppose ♣ $: (\mu', p', s') \Rightarrow (\nu', q', t')$ is the 2-morphism summarized by the following diagram.

[Diagram: the 2-morphism ♣ with channels $X' \xrightarrow{f'} Y'$ and $X'' \xrightarrow{f''} Y''$]

The vertical composition ♣ ∘ ♠ is then summarized by the following diagram.

[Diagram: the vertical composition ♣ ∘ ♠ $: (\mu' \circ \mu, p, s \circ s') \Rightarrow (\nu' \circ \nu, q, t \circ t')$]

For horizontal composition, let ♦ $: (\nu, q, t) \Rightarrow (\xi, r, u)$ be a 2-morphism summarized by the following diagram.

[Diagram: the 2-morphism ♦ $: (\nu, q, t) \Rightarrow (\xi, r, u)$]

The horizontal composition of ♠ and ♦ is then summarized by the following diagram.

[Diagram: the horizontal composition of ♠ and ♦]

5. Convexity in FinStat₂

We now generalize the convex structure on morphisms in FinStat to 2-morphisms in FinStat₂. For this, let $(X, p)$ be a finite probability space, and let $\spadesuit_x$ be a collection of 2-morphisms in FinStat₂ indexed by $X$, where $\spadesuit_x$ is summarized by the following diagram.

[Diagram: the 2-morphism $\spadesuit_x$ with source 1-morphism $(\mu_x, q_x, s_x)$ and channels $f_x$, $f'_x$]

Definition 11.
The convex sum $\sum_{x \in X} p_x \spadesuit_x$ is the 2-morphism in FinStat₂ summarized by the following diagram.

[Diagram: the convex sum of the $\spadesuit_x$, built from the disjoint unions of the underlying channels together with the weighted priors]

6. Conditional Relative Entropy in FinStat₂

We now introduce a measure of information associated with 2-morphisms in FinStat₂ which we refer to as ‘conditional relative entropy’. The results proved in this section are essentially all lemmas for the results of the next section, where we introduce a 2-level extension of the relative entropy map RE and show that it satisfies the 2-level analogues of the characterizing axioms of relative entropy.
Definition 12.
With every 2-morphism

[Diagram: ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$ with channels $X \xrightarrow{f} Y$ and $X' \xrightarrow{f'} Y'$]

we associate the non-negative extended real number $\mathrm{CE}(\spadesuit) \in [0, \infty]$ given by

$$\mathrm{CE}(\spadesuit) = \sum_{x \in X} p_x \, D\big(f_x, (t \circ f' \circ \mu)_x\big), \tag{7}$$

where $D(\cdot, \cdot)$ is the standard relative entropy. We refer to $\mathrm{CE}(\spadesuit)$ as the conditional relative entropy of ♠.
Remark 7.
We refer to $\mathrm{CE}(\spadesuit)$ as conditional relative entropy, as its defining formula (7) is structurally similar to the defining formula for conditional entropy. In particular, if $X \xrightarrow{f} Y$ is a channel with prior distribution $\star \xrightarrow{p} X$, then the conditional entropy $H(f|p)$ is given by

$$H(f|p) = \sum_{x \in X} p_x H(f_x),$$

where $H(f_x)$ is the Shannon entropy of the distribution $f_x$ on $Y$.
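As a numerical illustration of (7), the following self-contained sketch computes $\mathrm{CE}(\spadesuit)$ for a toy 2-morphism in which $X'$ and $Y'$ are one-point sets, so that $t \circ f' \circ \mu$ is the constant channel determined by $t$; the numbers are our own.

```python
import numpy as np

def D(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p  = np.array([0.6, 0.4])                # prior on X = {0, 1}
mu = np.array([[1., 1.]])                # pure mu : X -> X' = {*}
nu = np.array([[1., 1.]])                # pure nu : Y -> Y' = {*}
f  = np.array([[0.8, 0.3],               # f : X -> Y
               [0.2, 0.7]])
fp = np.array([[1.]])                    # f' : X' -> Y' (trivial)
t  = np.array([[0.6],                    # t : Y' -> Y, a stochastic
               [0.4]])                   #   section of nu

assert np.allclose(nu @ t, np.eye(1))    # nu o t = id_{Y'}
g = t @ fp @ mu                          # the channel t o f' o mu : X -> Y
ce = sum(p[x] * D(f[:, x], g[:, x]) for x in range(2))
print(ce)                                # CE of the 2-morphism, Equation (7)
```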
Proposition 1.
Conditional relative entropy in FinStat₂ is convex linear, i.e., if $(X, p)$ is a finite probability space and $\spadesuit_x$ is a collection of 2-morphisms in FinStat₂ indexed by $X$, then

$$\mathrm{CE}\Big(\sum_{x \in X} p_x \spadesuit_x\Big) = \sum_{x \in X} p_x \, \mathrm{CE}(\spadesuit_x).$$
Proof. 
Suppose $\sum_{x \in X} p_x \spadesuit_x$ is summarized by the following diagram

[Diagram: the convex sum $\sum_{x \in X} p_x \spadesuit_x$]

and let $U = \coprod_{x \in X} U_x$, $\mu = \coprod_{x \in X} \mu_x$, $f = \coprod_{x \in X} f_x$, $f' = \coprod_{x \in X} f'_x$ and $t = \coprod_{x \in X} t_x$. We then have

$$\begin{aligned} \mathrm{CE}\Big(\sum_{x \in X} p_x \spadesuit_x\Big) &= \sum_{u \in U} q_u \, D\big(f_u, (t \circ f' \circ \mu)_u\big) \\ &= \sum_{x \in X} \sum_{u_x \in U_x} p_x (q_x)_{u_x} D\big(f_{u_x}, (t \circ f' \circ \mu)_{u_x}\big) \\ &= \sum_{x \in X} p_x \sum_{u_x \in U_x} (q_x)_{u_x} D\big((f_x)_{u_x}, (t_x \circ f'_x \circ \mu_x)_{u_x}\big) \\ &= \sum_{x \in X} p_x \, \mathrm{CE}(\spadesuit_x), \end{aligned}$$
as desired. □
Theorem 2.
Conditional relative entropy in FinStat₂ is functorial with respect to vertical composition, i.e., if ♣ ∘ ♠ is a vertical composition in FinStat₂, then $\mathrm{CE}(\clubsuit \circ \spadesuit) = \mathrm{CE}(\spadesuit) + \mathrm{CE}(\clubsuit)$.
Lemma 1.
Let $X \xrightarrow{f} Y \xrightarrow{g} Z$ be a composition of channels.
1. If $f$ is a pure channel, then $(g \circ f)_{zx} = g_{zf(x)}$;
2. If $g$ is a stochastic section of a pure channel $Z \xrightarrow{h} Y$, then $(g \circ f)_{zx} = g_{zh(z)} f_{h(z)x}$.
Proof. 
Statements 1 and 2 follow immediately from the definitions of pure channel and stochastic section. □
Lemma 2.
Let ♠ be a 2-morphism in FinStat₂ as summarized by the diagram

[Diagram: ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$ with channels $X \xrightarrow{f} Y$ and $X' \xrightarrow{f'} Y'$]

Then,
1. $\mathrm{CE}(\spadesuit) = \displaystyle\sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{t_{y\nu(y)} f'_{\nu(y)\mu(x)}}$;
2. $p'_{x'} f'_{y'x'} = \displaystyle\sum_{x \in \mu^{-1}(x')} \sum_{y \in \nu^{-1}(y')} p_x f_{yx}$ for all $(x', y') \in X' \times Y'$.
Proof. 
To prove item (1), let $x \in X$ and $y \in Y$. Then,

$$(t \circ f' \circ \mu)_{yx} = \big(t \circ (f' \circ \mu)\big)_{yx} = \sum_{y' \in Y'} t_{yy'} (f' \circ \mu)_{y'x} = \sum_{y' \in Y'} t_{yy'} f'_{y'\mu(x)} = t_{y\nu(y)} f'_{\nu(y)\mu(x)},$$

where the third equality follows from Lemma 1 since $\mu$ is a pure channel, and the fourth equality also follows from Lemma 1 since $t$ is a stochastic section of a pure channel. We then have

$$\mathrm{CE}(\spadesuit) = \sum_{x \in X} p_x D\big(f_x, (t \circ f' \circ \mu)_x\big) = \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{(t \circ f' \circ \mu)_{yx}} = \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{t_{y\nu(y)} f'_{\nu(y)\mu(x)}},$$

as desired.
To prove item (2), note that the condition $\nu \circ f = f' \circ \mu$ is equivalent to the equation $(\nu \circ f)_{y'x} = (f' \circ \mu)_{y'x}$ for all $y' \in Y'$ and $x \in X$. In addition, since

$$(f' \circ \mu)_{y'x} = f'_{y'\mu(x)}$$

and

$$(\nu \circ f)_{y'x} = \sum_{y \in Y} \nu_{y'y} f_{yx} = \sum_{y \in \nu^{-1}(y')} \nu_{y'y} f_{yx} = \sum_{y \in \nu^{-1}(y')} f_{yx},$$

it follows that $f'_{y'\mu(x)} = \sum_{y \in \nu^{-1}(y')} f_{yx}$; thus, for all $x' \in X'$ and $y' \in Y'$, it follows that $f'_{y'x'} = \sum_{y \in \nu^{-1}(y')} f_{yx}$ for all $x \in \mu^{-1}(x')$. As such, we have

$$p'_{x'} f'_{y'x'} = \sum_{x \in \mu^{-1}(x')} p_x \sum_{y \in \nu^{-1}(y')} f_{yx} = \sum_{x \in \mu^{-1}(x')} \sum_{y \in \nu^{-1}(y')} p_x f_{yx},$$
as desired. □
Proof of Theorem 2.
Suppose ♠ and ♣ are such that the vertical composition ♣ ∘ ♠ is summarized by the following diagram.

[Diagram: the vertical composition ♣ ∘ ♠, with channels $X \xrightarrow{f} Y$ and $X'' \xrightarrow{f''} Y''$ and composite ‘wings’ $s \circ s'$ and $t \circ t'$]

By item (1) of Lemma 2, we then have

$$\mathrm{CE}(\clubsuit \circ \spadesuit) = \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{(t \circ t')_{y(\nu' \circ \nu)(y)} \, f''_{(\nu' \circ \nu)(y)(\mu' \circ \mu)(x)}},$$

and since $t$ is a section of the pure channel $\nu$, by item (2) of Lemma 1, it follows that for every $y \in Y$ we have

$$(t \circ t')_{y(\nu' \circ \nu)(y)} = t_{y\nu(y)} \, t'_{\nu(y)(\nu' \circ \nu)(y)}.$$

As such, we have

$$\begin{aligned} \mathrm{CE}(\clubsuit \circ \spadesuit) &= \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{t_{y\nu(y)} \, t'_{\nu(y)(\nu'\circ\nu)(y)} \, f''_{(\nu'\circ\nu)(y)(\mu'\circ\mu)(x)}} \\ &= \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \left( \frac{f_{yx}}{t_{y\nu(y)} f'_{\nu(y)\mu(x)}} \cdot \frac{f'_{\nu(y)\mu(x)}}{t'_{\nu(y)(\nu'\circ\nu)(y)} \, f''_{(\nu'\circ\nu)(y)(\mu'\circ\mu)(x)}} \right) \\ &= \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{t_{y\nu(y)} f'_{\nu(y)\mu(x)}} + \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f'_{\nu(y)\mu(x)}}{t'_{\nu(y)(\nu'\circ\nu)(y)} \, f''_{(\nu'\circ\nu)(y)(\mu'\circ\mu)(x)}} \\ &= \sum_{x \in X} \sum_{y \in Y} p_x f_{yx} \log \frac{f_{yx}}{t_{y\nu(y)} f'_{\nu(y)\mu(x)}} + \sum_{x' \in X'} \sum_{y' \in Y'} p'_{x'} f'_{y'x'} \log \frac{f'_{y'x'}}{t'_{y'\nu'(y')} \, f''_{\nu'(y')\mu'(x')}} \quad (\text{by item (2) of Lemma 2}) \\ &= \sum_{x \in X} p_x D\big(f_x, (t \circ f' \circ \mu)_x\big) + \sum_{x' \in X'} p'_{x'} D\big(f'_{x'}, (t' \circ f'' \circ \mu')_{x'}\big) \\ &= \mathrm{CE}(\spadesuit) + \mathrm{CE}(\clubsuit), \end{aligned}$$

as desired (the second-to-last equality follows from item (1) of Lemma 2). □

7. Relative Entropy in FinStat₂

In this section, we introduce a 2-level extension of the relative entropy map RE of Baez and Fritz, and show that it satisfies the natural 2-level analogues of functoriality, convex linearity, vanishing under optimal hypotheses, and lower semicontinuity.
Definition 13.
With every 2-morphism

[Diagram: ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$]

we associate the non-negative extended real number $\mathrm{RE}_2(\spadesuit) \in [0, \infty]$ given by

$$\mathrm{RE}_2(\spadesuit) = \mathrm{RE}(\mu, p, s) + \mathrm{CE}(\spadesuit), \tag{8}$$

which we refer to as the 2-relative entropy of ♠. We note that the quantity $\mathrm{RE}(\mu, p, s)$ appearing on the RHS of (8) is the relative entropy associated with the morphism $(\mu, p, s)$ in FinStat, so that

$$\mathrm{RE}(\mu, p, s) = D(p, s \circ \mu \circ p),$$

where $D(\cdot, \cdot)$ is the standard relative entropy.
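Combining the two summands of (8), the following sketch computes $\mathrm{RE}_2(\spadesuit)$ for the same toy 2-morphism used in Section 6, now equipped with a hypothesis $s$ for $\mu$; as before, the data are hypothetical.

```python
import numpy as np

def D(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p  = np.array([0.6, 0.4])                # prior on X
mu = np.array([[1., 1.]])                # pure mu : X -> X' = {*}
s  = np.array([[0.5], [0.5]])            # s : X' -> X, a hypothesis for mu
f  = np.array([[0.8, 0.3], [0.2, 0.7]])  # f : X -> Y
fp = np.array([[1.]])                    # f' : X' -> Y' = {*}
t  = np.array([[0.6], [0.4]])            # t : Y' -> Y, a section of nu

re1 = D(p, s @ (mu @ p))                 # RE(mu, p, s) = D(p, s o mu o p)
g = t @ fp @ mu                          # the channel t o f' o mu : X -> Y
ce = sum(p[x] * D(f[:, x], g[:, x]) for x in range(2))
print(re1 + ce)                          # RE_2 of the 2-morphism, Equation (8)
```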
Proposition 2.
2-relative entropy is convex linear, i.e., if $(X, p)$ is a finite probability space and $\spadesuit_x$ is a collection of 2-morphisms in FinStat₂ indexed by $X$, then

$$\mathrm{RE}_2\Big(\sum_{x \in X} p_x \spadesuit_x\Big) = \sum_{x \in X} p_x \, \mathrm{RE}_2(\spadesuit_x).$$
Proof. 
Suppose $\sum_{x \in X} p_x \spadesuit_x$ is summarized by the following diagram.

[Diagram: the convex sum $\sum_{x \in X} p_x \spadesuit_x$]

By Theorem 1, we know that the relative entropy RE is convex linear over 1-morphisms in FinStat₂, and by Proposition 1, we know that conditional relative entropy is convex linear over 2-morphisms in FinStat₂; thus

$$\mathrm{RE}\Big(\sum_{x \in X} p_x (\mu_x, q_x, s_x)\Big) = \sum_{x \in X} p_x \, \mathrm{RE}(\mu_x, q_x, s_x) \quad \text{and} \quad \mathrm{CE}\Big(\sum_{x \in X} p_x \spadesuit_x\Big) = \sum_{x \in X} p_x \, \mathrm{CE}(\spadesuit_x). \tag{9}$$

We then have

$$\begin{aligned} \mathrm{RE}_2\Big(\sum_{x \in X} p_x \spadesuit_x\Big) &= \mathrm{RE}\Big(\sum_{x \in X} p_x (\mu_x, q_x, s_x)\Big) + \mathrm{CE}\Big(\sum_{x \in X} p_x \spadesuit_x\Big) \\ &\overset{(9)}{=} \sum_{x \in X} p_x \, \mathrm{RE}(\mu_x, q_x, s_x) + \sum_{x \in X} p_x \, \mathrm{CE}(\spadesuit_x) \\ &= \sum_{x \in X} p_x \Big( \mathrm{RE}(\mu_x, q_x, s_x) + \mathrm{CE}(\spadesuit_x) \Big) \\ &= \sum_{x \in X} p_x \, \mathrm{RE}_2(\spadesuit_x), \end{aligned}$$
as desired. □
Theorem 3.
The 2-relative entropy is functorial with respect to vertical composition, i.e., if ♣ ∘ ♠ is a vertical composition in FinStat₂, then $\mathrm{RE}_2(\clubsuit \circ \spadesuit) = \mathrm{RE}_2(\spadesuit) + \mathrm{RE}_2(\clubsuit)$.
Proof. 
Suppose ♠ and ♣ are such that the vertical composition ♣ ∘ ♠ is summarized by the following diagram.

[Diagram: the vertical composition ♣ ∘ ♠]

Then,

$$\begin{aligned} \mathrm{RE}_2(\clubsuit \circ \spadesuit) &= \mathrm{RE}(\mu' \circ \mu, p, s \circ s') + \mathrm{CE}(\clubsuit \circ \spadesuit) \\ &= \mathrm{RE}(\mu, p, s) + \mathrm{RE}(\mu', p', s') + \mathrm{CE}(\spadesuit) + \mathrm{CE}(\clubsuit) \\ &= \mathrm{RE}_2(\spadesuit) + \mathrm{RE}_2(\clubsuit), \end{aligned}$$
where the second equality follows from Theorems 1 and 2. □
Proposition 3.
Let ♠ $: (\mu, p, s) \Rightarrow (\nu, q, t)$ be a 2-morphism in FinStat₂, and suppose $s$ and $t$ are optimal hypotheses for $(\mu|p)$ and $(\nu|q)$ (as defined in Definition 5). Then, $\mathrm{RE}_2(\spadesuit) = 0$.
Proof. 
Since $s$ and $t$ are optimal hypotheses, it follows that $\mathrm{RE}(\mu, p, s) = \mathrm{CE}(\spadesuit) = 0$, from which the proposition follows. □
Proposition 4.
The 2-relative entropy $\mathrm{RE}_2$ is lower semicontinuous.
Proof. 
Since the 2-relative entropy $\mathrm{RE}_2$ is a linear combination of 1-level relative entropies, and 1-level relative entropies are lower semicontinuous by Theorem 1, it follows that $\mathrm{RE}_2$ is lower semicontinuous. □

8. Conclusions, Limitations and Future Research

In this work, we have constructed a 2-categorical extension $\mathrm{RE}_2$ of the relative entropy functor RE of Baez and Fritz [1], yielding a new measure of information which we view as a relative measure of information between noisy channels. Moreover, we have shown that our construction satisfies natural 2-level analogues of functoriality, convex linearity, vanishing under optimal hypotheses, and lower semicontinuity. As the relative entropy functor of Baez and Fritz is uniquely characterized by such properties, it is only natural to ask whether our 2-level extension $\mathrm{RE}_2$ of RE is also uniquely characterized by the 2-level analogues of such properties. It would also be interesting to investigate alternative versions of 2-morphisms in FinStat₂ where the 2-morphisms are less restrictive, such as where the base square of the associated pyramid of a 2-morphism is not assumed to commute. While the 2-relative entropy associated with such morphisms would not be functorial, such less restrictive morphisms would provide more flexibility for potential applications. Finally, as there are many other categories of interest with respect to information theory [8,14,15], it would be interesting to investigate 2-level extensions of such categories as well.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Baez, J.C.; Fritz, T. A Bayesian characterization of relative entropy. Theory Appl. Categ. 2014, 29, 422–457.
  2. Mac Lane, S. Categories for the Working Mathematician, 2nd ed.; Springer: New York, NY, USA, 1998.
  3. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  4. Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
  5. Coecke, B.; Fritz, T.; Spekkens, R. A mathematical theory of resources. Inform. Comput. 2016, 250, 59–86.
  6. Faddeev, D.K. On the concept of entropy of a finite probabilistic scheme. Uspekhi Mat. Nauk 1956, 11, 227–231.
  7. Fong, B. Causal Theories: A Categorical Perspective on Bayesian Networks. Master's Thesis, University of Oxford, Oxford, UK, 2012.
  8. Fritz, T. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Adv. Math. 2020, 370, 107239.
  9. Fullwood, J. An axiomatic characterization of mutual information. arXiv 2021, arXiv:2108.12647.
  10. Fullwood, J.; Parzygnat, A.P. The Information Loss of a Stochastic Map. Entropy 2021, 23, 1021.
  11. Leinster, T. A short characterization of relative entropy. J. Math. Phys. 2019, 60, 023302.
  12. Leinster, T. Entropy and Diversity: The Axiomatic Approach; Cambridge University Press: Cambridge, UK, 2021.
  13. Parzygnat, A.P. A functorial characterization of von Neumann entropy. arXiv 2020, arXiv:2009.07125.
  14. Cho, K.; Jacobs, B. Disintegration and Bayesian inversion via string diagrams. Math. Struct. Comput. Sci. 2019, 29, 938–971.
  15. Golubtsov, P.V. Information transformers: Category-theoretical structure, informativeness, decision-making problems. Hadron. J. Suppl. 2004, 19, 375–424.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
