Article

Deformed Algebras and Generalizations of Independence on Deformed Exponential Families

Hiroshi Matsuzoe 1,* and Tatsuaki Wada 2
1 Department of Computer Science and Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan
2 Department of Electrical and Electronic Engineering, Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki 316-8511, Japan
* Author to whom correspondence should be addressed.
Entropy 2015, 17(8), 5729-5751; https://0-doi-org.brum.beds.ac.uk/10.3390/e17085729
Submission received: 1 February 2015 / Revised: 1 February 2015 / Accepted: 4 August 2015 / Published: 10 August 2015
(This article belongs to the Special Issue Entropic Aspects in Statistical Physics of Complex Systems)

Abstract:
A deformed exponential family is a generalization of exponential families. Since useful classes of power law tailed distributions are described by deformed exponential families, they are important objects in the theory of complex systems. Deformed exponential families are defined by deformed exponential functions, but these functions do not satisfy the law of exponents in general; deformed algebras have therefore been introduced based on the deformed exponential functions. In this paper, after summarizing such deformed algebraic structures, we clarify how deformed algebras work on deformed exponential families. In fact, deformed algebras lead to generalizations of expectations. Three kinds of expectations for random variables are introduced in this paper, and we discuss why these generalized expectations are natural from the viewpoint of information geometry. In addition, deformed algebras lead to generalizations of independence. Whereas it is difficult to check the well-definedness of a deformed independence in general, the κ-independence is always well defined on κ-exponential families. This is one of the advantages of κ-exponential families in complex systems. Consequently, we can generalize the maximum likelihood method for the κ-exponential family from the viewpoint of information geometry.

1. Introduction

An exponential family is a set of probability distributions and an important statistical model in mathematical sciences. For example, the set of all Gaussian distributions is an exponential family. A deformed exponential family is a generalization of exponential families, and it has been studied in anomalous statistical physics (cf. [1]) and in machine learning theory (cf. [2,3]). A useful class of power law tailed distributions, such as the set of all Student's t-distributions, is a deformed exponential family.
In the study of deformed exponential families, a deformed exponential function and a deformed logarithm function play important roles. However, these functions do not satisfy the law of exponents in general. Hence, deformed algebraic structures and deformed differentials have been introduced in anomalous statistical physics (cf. [4,5,6,7]). In addition, a random variable that follows a power law tailed distribution may not have a finite mean or variance. To overcome this problem, a deformed probability distribution called an escort distribution (cf. [1,8]) has been introduced, and expectations with respect to the escort distribution have been studied.
In this paper, after summarizing such deformed algebraic structures, we clarify how a deformed algebra works on deformed exponential families. In particular, we elucidate that a deformed sum works on the sample space and a deformed product works on the target functional space. This difference makes clear how to use deformed algebras.
Since the deformed sum works on the sample space (i.e., the domain of random variables), the sample space can be regarded as some algebraic space, not the standard Euclidean space. This deformation causes generalizations of expectations of random variables. In this paper, we consider three kinds of expectations, which include the expectation with respect to the escort distribution mentioned above. Then, we elucidate why these expectations are natural from the viewpoint of information geometry. Here, information geometry is a differential geometric method for statistical estimation (cf. [9]). As a consequence, generalized expectations give local coordinate systems of deformed exponential families, and such coordinate systems have close relations to a dually-flat structure and to a projective structure of deformed exponential families (see also [10,11,12,13], etc.).
The deformed product works on the target space of probability distributions. This deformation causes generalizations of independences. Though it is difficult to check the well-definedness of deformed independence, the κ-independence for the κ-exponential family is always well defined. This is an advantage of κ-exponential families among the class of deformed exponential families. Hence, we consider κ-generalization of the maximum likelihood method. In information geometry, it is known that the maximum likelihood estimator for a curved exponential family is characterized by the Kullback–Leibler divergence projection from the observed data. Based on this fact, we give a κ-generalization of the divergence projection-type theorem for the κ-maximum likelihood estimator.
In this paper, new contributions are stated as theorems (i.e., Theorems 1, 3, 4 and 7), whereas known results are stated as propositions.

2. Deformed Exponential Families

In this section, we give definitions of deformed exponential functions and deformed exponential families. For more details, see [1,10,11,14], for example. We assume that all functions are real-valued and that all variables take values in the real number field, since we consider probability distributions over the real numbers.
Let χ be a strictly increasing function from $\mathbf{R}_{>0}$ to $\mathbf{R}_{>0}$. We define a χ-logarithm function (or a deformed logarithm function) by:
\[ \ln_\chi s := \int_1^s \frac{1}{\chi(t)}\,dt. \]
The inverse of the χ-logarithm function is called a χ-exponential function (or a deformed exponential function), and it is given by:
\[ \exp_\chi t := 1 + \int_0^t \lambda(s)\,ds, \]
where the function λ(s) is given by $\lambda(\ln_\chi s) = \chi(s)$.
We remark that the χ-logarithm function ln χ and the χ-exponential function exp χ are usually called ϕ-logarithm and ϕ-exponential, respectively (cf. [1,15]). However, the symbol ϕ is used as the dual Hessian potential function in information geometry, so we use χ as the deformation function in this paper.
Example 1. Suppose that a deformation function χ ( s ) is given by:
\[ \chi(s) = \frac{2s}{s^{\kappa} + s^{-\kappa}}, \qquad (-1 < \kappa < 1,\ \kappa \neq 0). \]
Then, the deformed logarithm and the deformed exponential are given by:
\[ \ln_\kappa s := \frac{s^{\kappa} - s^{-\kappa}}{2\kappa} \quad (s > 0), \qquad \exp_\kappa t := \left(\kappa t + \sqrt{1 + \kappa^2 t^2}\right)^{\frac{1}{\kappa}}, \]
respectively. The function $\ln_\kappa s$ is called a κ-logarithm function and $\exp_\kappa t$ a κ-exponential function (cf. [6]). By taking the limit $\kappa \to 0$, these functions coincide with the standard logarithm and the standard exponential, respectively.
While $s > 0$ is needed for defining the κ-logarithm function $\ln_\kappa s$, the κ-exponential function $\exp_\kappa t$ is defined on all of $\mathbf{R}$, since $\kappa t + \sqrt{1 + \kappa^2 t^2}$ is always positive.
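To make the κ-deformed functions concrete, here is a small numerical sketch (our illustration, not part of the original text, with an arbitrarily chosen κ): it implements $\ln_\kappa$ and $\exp_\kappa$, checks that they are mutual inverses, and shows that they approach the ordinary logarithm and exponential as κ tends to 0.

```python
import math

def ln_kappa(s, kappa):
    # kappa-logarithm: (s^kappa - s^(-kappa)) / (2 kappa), defined for s > 0
    return (s**kappa - s**(-kappa)) / (2.0 * kappa)

def exp_kappa(t, kappa):
    # kappa-exponential: (kappa t + sqrt(1 + kappa^2 t^2))^(1/kappa), defined for all real t
    return (kappa * t + math.sqrt(1.0 + kappa**2 * t**2)) ** (1.0 / kappa)

kappa = 0.3
for s in [0.1, 1.0, 2.5, 10.0]:
    # mutual inverse relation: exp_kappa(ln_kappa(s)) = s
    assert abs(exp_kappa(ln_kappa(s, kappa), kappa) - s) < 1e-9

for t in [-2.0, 0.5, 3.0]:
    # for very small kappa the deformed exponential is close to the ordinary one
    print(exp_kappa(t, 1e-6), math.exp(t))
```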
Example 2. Suppose that χ(s) is given by a power function $\chi(s) = s^q$ $(q > 0,\ q \neq 1)$. Then, the deformed logarithm and the deformed exponential are given by:
\[ \ln_q s := \frac{s^{1-q} - 1}{1-q} \quad (s > 0), \qquad \exp_q t := \left(1 + (1-q)t\right)^{\frac{1}{1-q}} \quad (1 + (1-q)t > 0). \]
The function $\ln_q s$ is called a q-logarithm, and $\exp_q t$ a q-exponential (cf. [1,8]). Taking the limit $q \to 1$, these functions coincide with the standard logarithm and the standard exponential, respectively.
The condition $s > 0$ is needed for defining $\ln_q s$. In the q-exponential case, the condition:
\[ 1 + (1-q)t > 0 \tag{1} \]
is also necessary, since the base of the exponential function must be positive. Condition (1) is called the anti-exponential condition for the q-exponential function.
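Similarly to the κ-sketch above, the following small example (again our addition, with an arbitrarily chosen q) implements the q-logarithm and q-exponential with the anti-exponential condition (1) made explicit.

```python
def ln_q(s, q):
    # q-logarithm: (s^(1-q) - 1) / (1 - q), defined for s > 0
    return (s**(1.0 - q) - 1.0) / (1.0 - q)

def exp_q(t, q):
    # q-exponential: (1 + (1-q) t)^(1/(1-q)), valid only under Condition (1)
    base = 1.0 + (1.0 - q) * t
    if base <= 0.0:
        raise ValueError("anti-exponential condition 1 + (1-q)t > 0 is violated")
    return base ** (1.0 / (1.0 - q))

q = 2.0                          # a power-law case with 1 < q < 3
print(exp_q(ln_q(3.0, q), q))    # inverse relation: prints 3.0
print(exp_q(0.5, q))             # allowed: 1 + (1-q)*0.5 = 0.5 > 0
# exp_q(2.0, q) would raise, since 1 + (1-q)*2.0 = -1 violates Condition (1)
```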
Let Ω be a total sample space. We say that a statistical model S χ on Ω is a χ-exponential family or a deformed exponential family if S χ is a set of probability density functions, such that:
\[ S_\chi := \left\{ p(x;\theta) \;\middle|\; p(x;\theta) = \exp_\chi\!\left[\sum_{i=1}^n \theta^i F_i(x) - \psi(\theta)\right],\ \theta \in \Theta \subset \mathbf{R}^n \right\}, \]
where $F_1(x), \ldots, F_n(x)$ are functions on Ω, $\theta = \{\theta^1, \ldots, \theta^n\}$ is a parameter and ψ(θ) is the normalization with respect to the parameter θ. We assume that $S_\chi$ is a statistical model in the sense of information geometry; that is, a probability density $p(x;\theta) \in S_\chi$ has support entirely on Ω. See Chapter 2 in [9] for more details. The normalization function ψ is convex, but it may not be strictly convex in general. We assume that ψ is strictly convex in this paper; then we can induce a Riemannian metric from this normalization function ψ (see Section 6). In addition, the functions $F_1(x), \ldots, F_n(x)$, ψ(θ) and the parameter θ must satisfy the anti-exponential condition. For example, in the q-exponential case with $q > 1$,
\[ \sum_{i=1}^n \theta^i F_i(x) - \psi(\theta) < \frac{-1}{1-q}. \]
We remark that it is a rather difficult problem to determine how the anti-exponential condition restricts the domain of $\{\theta^i\}$ and the range of $\{F_i(x)\}$. We give a further discussion at the end of this section.
We say that a deformed exponential family is a κ-exponential family if its deformed exponential function is a κ-exponential function exp κ and a q-exponential family if its deformed exponential function is a q-exponential function exp q . These deformed exponential families are denoted by S κ and S q , respectively.
Suppose that M χ is a submanifold of S χ , that is,
\[ M_\chi := \left\{ p(x;\theta(u)) \;\middle|\; p(x;\theta(u)) = \exp_\chi\!\left[\sum_{i=1}^n \theta^i(u) F_i(x) - \psi(\theta(u))\right],\ u \in U \subset \mathbf{R}^m \subset \mathbf{R}^n \right\}. \]
The submanifold M χ is called a curved χ-exponential family of S χ . From similar arguments, we can define a curved q-exponential family M q in S q and a curved κ-exponential family M κ in S κ .
Example 3 (Discrete distributions (cf. [10])). Suppose that Ω = { x 0 , x 1 , , x n } is a finite sample space. Denote by S n the set of all probability distributions on Ω:
\[ S_n = \left\{ p(x;\eta) \;\middle|\; \eta_i > 0,\ \sum_{i=0}^n \eta_i = 1,\ p(x;\eta) = \sum_{i=0}^n \eta_i \delta_i(x) \right\}. \]
The natural parameters and the normalization are given by:
\[ \theta^i = \ln_\chi p(x_i) - \ln_\chi p(x_0) = \ln_\chi \eta_i - \ln_\chi\!\left(1 - \sum_{j=1}^n \eta_j\right), \qquad \psi(\theta) = -\ln_\chi \eta_0 = -\ln_\chi\!\left(1 - \sum_{j=1}^n \eta_j\right). \]
Then, we obtain:
\[ \ln_\chi p(x;\theta) = \ln_\chi\!\left(\sum_{i=0}^n \eta_i \delta_i(x)\right) = \sum_{i=1}^n \left(\ln_\chi \eta_i - \ln_\chi \eta_0\right)\delta_i(x) + \ln_\chi \eta_0 = \sum_{i=1}^n \theta^i \delta_i(x) - \psi(\theta). \]
This implies that $S_n$ is a χ-exponential family for any χ.
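The following short sketch (our addition; the probabilities are hypothetical) illustrates Example 3 for the q-deformation: it computes the natural parameters θ^i and the normalization ψ(θ) from given probabilities η_i and checks that ln_q p(x_i) = θ^i − ψ(θ) for i ≥ 1 and ln_q p(x_0) = −ψ(θ).

```python
def ln_q(s, q):
    return (s**(1.0 - q) - 1.0) / (1.0 - q)

q = 0.7
eta = [0.5, 0.3, 0.2]            # probabilities of x0, x1, x2 (summing to 1)

theta = [ln_q(eta[i], q) - ln_q(eta[0], q) for i in range(1, len(eta))]
psi = -ln_q(eta[0], q)

assert abs(ln_q(eta[0], q) - (-psi)) < 1e-12                     # ln_q p(x0) = -psi
for i in range(1, len(eta)):
    assert abs(ln_q(eta[i], q) - (theta[i - 1] - psi)) < 1e-12   # ln_q p(xi) = theta^i - psi
print(theta, psi)
```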
Example 4 (Student t-distributions). Fix a parameter q $(1 < q < 3)$. A probability density function $p(x;\mu,\sigma)$ on $\Omega = \mathbf{R}$ is said to be a Student t-distribution or a q-normal distribution if:
\[ p(x;\mu,\sigma) := \frac{1}{Z_q(\sigma)}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}}, \]
where $(\mu, \sigma)$ are parameters such that $-\infty < \mu < \infty$ and $0 < \sigma < \infty$, and $Z_q(\sigma)$ is the normalization of the probability density defined by:
\[ Z_q(\sigma) := \sqrt{\frac{3-q}{q-1}}\,\mathrm{Beta}\!\left(\frac{3-q}{2(q-1)},\ \frac{1}{2}\right)\sigma. \]
By taking the limit $q \to 1$, a Student t-distribution converges to a normal distribution.
The set of all Student t-distributions $S_q$ is a q-exponential family. In fact, the natural parameters are given by:
\[ \theta^1 := \frac{2}{3-q}\,\{Z_q(\sigma)\}^{q-1}\,\frac{\mu}{\sigma^2}, \qquad \theta^2 := -\frac{1}{3-q}\,\{Z_q(\sigma)\}^{q-1}\,\frac{1}{\sigma^2}, \]
respectively. Then, we obtain:
\[
\begin{aligned}
\ln_q p(x) &= \frac{1}{1-q}\left(\{p(x)\}^{1-q} - 1\right)
= \frac{1}{1-q}\left[\frac{1}{\{Z_q(\sigma)\}^{1-q}}\left(1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right) - 1\right] \\
&= \frac{2\mu\{Z_q(\sigma)\}^{q-1}}{(3-q)\sigma^2}\,x - \frac{\{Z_q(\sigma)\}^{q-1}}{(3-q)\sigma^2}\,x^2 - \frac{\{Z_q(\sigma)\}^{q-1}}{3-q}\,\frac{\mu^2}{\sigma^2} + \frac{\{Z_q(\sigma)\}^{q-1} - 1}{1-q} \\
&= \theta^1 x + \theta^2 x^2 - \psi(\theta),
\end{aligned}
\]
where ψ is the normalization defined by:
\[ \psi(\theta) := -\frac{(\theta^1)^2}{4\theta^2} - \frac{\{Z_q(\sigma)\}^{q-1} - 1}{1-q}. \]
Hence, the set of all Student t-distributions S q is a q-exponential family.
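The identities of Example 4 can be verified numerically. The sketch below (our illustration, with arbitrarily chosen q, μ, σ) evaluates $Z_q(\sigma)$ through gamma functions, forms θ¹, θ², ψ(θ) as above, and checks that the Student t-density equals $\exp_q(\theta^1 x + \theta^2 x^2 - \psi(\theta))$.

```python
import math

def exp_q(t, q):
    return (1.0 + (1.0 - q) * t) ** (1.0 / (1.0 - q))

def beta_fn(a, b):
    # Euler beta function via gamma functions
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

q, mu, sigma = 1.5, 0.7, 2.0
Zq = math.sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma

def p_student(x):
    return (1.0 / Zq) * (1.0 - (1 - q) / (3 - q) * (x - mu) ** 2 / sigma ** 2) ** (1.0 / (1.0 - q))

theta1 = 2.0 / (3 - q) * Zq ** (q - 1) * mu / sigma ** 2
theta2 = -1.0 / (3 - q) * Zq ** (q - 1) / sigma ** 2
psi = -theta1 ** 2 / (4 * theta2) - (Zq ** (q - 1) - 1.0) / (1.0 - q)

for x in [-3.0, 0.0, 1.2, 5.0]:
    # Student t-density coincides with the q-exponential representation
    assert abs(p_student(x) - exp_q(theta1 * x + theta2 * x ** 2 - psi, q)) < 1e-10
```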
Let us give further considerations about deformed exponential families. In the case 0 < q < 1 , a q-normal distribution has the following form:
\[ p(x;\mu,\sigma) := \frac{1}{Z_q(\sigma)}\left[1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{\frac{1}{1-q}} = \frac{1}{Z_q(\sigma)}\exp_q\!\left[-\frac{(x-\mu)^2}{(3-q)\sigma^2}\right], \]
where the normalization $Z_q(\sigma)$ is given by:
\[ Z_q(\sigma) := \sqrt{\frac{3-q}{1-q}}\,\mathrm{Beta}\!\left(\frac{2-q}{1-q},\ \frac{1}{2}\right)\sigma. \]
The anti-exponential condition for this q-normal distribution is:
\[ 1 - \frac{1-q}{3-q}\,\frac{(x-\mu)^2}{\sigma^2} > 0, \]
hence the domain of the random variable x is given by:
\[ \mu - \sqrt{\frac{3-q}{1-q}}\,\sigma < x < \mu + \sqrt{\frac{3-q}{1-q}}\,\sigma. \tag{2} \]
In this case, the set of q-normal distributions S q = { p ( x ; μ , σ ) } is not a statistical model in the sense of information geometry [9], since the support of p ( x ; μ , σ ) depends on its parameter ( μ , σ ) .
On the other hand, for a q-normal distribution, fix the parameters q, μ, σ. By introducing a new parameter α $(0 < \alpha < q/(1-q))$, we set:
\[ q_\alpha = \frac{q - \alpha(1-q)}{1 - \alpha(1-q)}, \qquad \sigma_\alpha^2 = \frac{3-q}{3-q-2\alpha(1-q)}\,\sigma^2. \tag{3} \]
The transformation $(q, \sigma) \mapsto (q_\alpha, \sigma_\alpha)$ defined by (3) is called a τ-transformation [16]. From straightforward calculations, we have:
\[ \frac{3-q_\alpha}{1-q_\alpha}\,\sigma_\alpha^2 = \frac{3-q}{1-q}\,\sigma^2. \]
This equation implies that, from Equation (2), the domain of the random variable x is invariant under τ-transformations. Hence, a one-dimensional statistical model is defined by:
\[ S_{q_\alpha} = \left\{ p(x;\alpha) \;\middle|\; p(x;\alpha) := \frac{1}{Z_{q_\alpha}(\sigma_\alpha)}\exp_{q_\alpha}\!\left[-\frac{(x-\mu)^2}{(3-q_\alpha)\sigma_\alpha^2}\right],\ 0 < \alpha < \frac{q}{1-q} \right\}. \]
However, S q α is not a deformed exponential family in our setting, since the exponent q α of the deformed exponential function depends on the parameter α.
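A two-line numerical check (our addition, with arbitrary admissible values of q, σ and α) of the invariance above, and hence of the fixed support, under the τ-transformation:

```python
q, sigma, alpha = 0.6, 1.5, 0.8          # 0 < q < 1 and 0 < alpha < q/(1-q)
q_a = (q - alpha * (1 - q)) / (1 - alpha * (1 - q))
sigma_a2 = (3 - q) / (3 - q - 2 * alpha * (1 - q)) * sigma ** 2

print((3 - q_a) / (1 - q_a) * sigma_a2)  # both prints agree,
print((3 - q) / (1 - q) * sigma ** 2)    # so the support of x is unchanged
```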

3. Non-Additive Differentials

In this section, we consider deformed algebras and deformed differential equations to characterize deformed exponential functions.

3.1. κ-Deformed Algebras and κ-Exponential Functions

We begin with the κ-exponential case. For more details about κ-deformed algebras, see [6].
Let $\exp_\kappa$ be a κ-exponential function and $\ln_\kappa$ a κ-logarithm function. Since $\exp_\kappa$ and $\ln_\kappa$ do not satisfy the law of exponents, we introduce the κ-sum $\tilde{\oplus}_\kappa$ and the κ-product $\otimes_\kappa$ as follows:
\[
\begin{aligned}
x_1\,\tilde{\oplus}_\kappa\, x_2 &:= \ln_\kappa\!\left(\exp_\kappa x_1 \cdot \exp_\kappa x_2\right) = x_1\sqrt{1+\kappa^2 x_2^2} + x_2\sqrt{1+\kappa^2 x_1^2}, \\
y_1 \otimes_\kappa y_2 &:= \exp_\kappa\!\left(\ln_\kappa y_1 + \ln_\kappa y_2\right), \qquad (y_1 > 0\ \text{and}\ y_2 > 0).
\end{aligned}
\tag{4}
\]
The conditions $y_1 > 0$ and $y_2 > 0$ are necessary for defining the κ-logarithm function. On the other hand, such conditions are not necessary for defining the κ-exponential function.
From the definitions of the κ-deformed algebras, we have the following deformed law of exponents:
\[
\begin{aligned}
\exp_\kappa(x_1\,\tilde{\oplus}_\kappa\, x_2) &= \exp_\kappa x_1 \cdot \exp_\kappa x_2, & \ln_\kappa(y_1 \cdot y_2) &= \ln_\kappa y_1\,\tilde{\oplus}_\kappa\, \ln_\kappa y_2, \\
\exp_\kappa(x_1 + x_2) &= \exp_\kappa x_1 \otimes_\kappa \exp_\kappa x_2, & \ln_\kappa(y_1 \otimes_\kappa y_2) &= \ln_\kappa y_1 + \ln_\kappa y_2.
\end{aligned}
\]
Since the inverse element of x with respect to the κ-sum is $-x$, we define the κ-difference $\tilde{\ominus}_\kappa$ by:
\[ x_1\,\tilde{\ominus}_\kappa\, x_2 := x_1\,\tilde{\oplus}_\kappa\,(-x_2) = x_1\sqrt{1+\kappa^2 x_2^2} - x_2\sqrt{1+\kappa^2 x_1^2}. \tag{5} \]
By taking a limit with respect to the κ-difference, we define a (non-additive) κ-differential as follows.
\[ \frac{d_\kappa}{d_\kappa x} f(x) := \lim_{x' \to x} \frac{f(x') - f(x)}{x'\,\tilde{\ominus}_\kappa\, x}. \tag{6} \]
We remark that the non-additive κ-differential $d_\kappa/d_\kappa x$ characterizes the κ-exponential function. Consider the following deformed differential equations:
\[ \frac{d_\kappa}{d_\kappa x} f(x) = f(x), \tag{7} \]
\[ \frac{d}{dx} f(x) = \frac{1}{\sqrt{1+\kappa^2 x^2}}\, f(x). \tag{8} \]
Then, the eigenfunction f(x) of both equations is the κ-exponential function. That is,
\[ f(x) = \exp_\kappa x = \left(\kappa x + \sqrt{1+\kappa^2 x^2}\right)^{\frac{1}{\kappa}}. \]
In fact, from the definition of the κ-difference (5), we have:
\[ \frac{d_\kappa}{d_\kappa x} = \sqrt{1+\kappa^2 x^2}\,\frac{d}{dx}, \]
hence the two deformed differential equations (7) and (8) are essentially equivalent. We call the non-additive differential equation (7) the non-additive representation and the deformed differential equation (8) the escort representation.
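The κ-deformed law of exponents can be checked directly. The sketch below (our illustration, with an arbitrary κ and arbitrary test points) implements the κ-sum and the κ-product and verifies the identities above numerically.

```python
import math

kappa = 0.4

def ln_k(s):   return (s**kappa - s**(-kappa)) / (2 * kappa)
def exp_k(t):  return (kappa * t + math.sqrt(1 + kappa**2 * t**2)) ** (1 / kappa)

def ksum(x1, x2):
    # kappa-sum, acting on the domain (sample space)
    return x1 * math.sqrt(1 + kappa**2 * x2**2) + x2 * math.sqrt(1 + kappa**2 * x1**2)

def kprod(y1, y2):
    # kappa-product, acting on the target space (y1, y2 > 0)
    return exp_k(ln_k(y1) + ln_k(y2))

x1, x2, y1, y2 = 0.7, -1.3, 2.0, 0.6
print(exp_k(ksum(x1, x2)), exp_k(x1) * exp_k(x2))    # equal
print(exp_k(x1 + x2), kprod(exp_k(x1), exp_k(x2)))   # equal
print(ln_k(y1 * y2), ksum(ln_k(y1), ln_k(y2)))       # equal
print(ln_k(kprod(y1, y2)), ln_k(y1) + ln_k(y2))      # equal
```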
Remark 1. A κ-sum works on the domain of a κ-exponential function (i.e., the sample space Ω), and a κ-product works on the target space. This implies that the sample space can be regarded as some deformed algebraic space, not the standard Euclidean space. In fact, the sample space and the target space are regarded as commutative fields (equivalently, Abelian fields in the usage of [7]). The κ-sum is an additive group structure of a commutative field structure on the sample space, and the κ-product is a multiplicative group structure on the target space (see also Remark 2 and [7]). We consider that this fact is very important in the theory of non-extensive statistical physics.
Recall the definition of Napier’s constant. The standard exponential function has the following infinite product expression:
\[ \exp x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n. \]
In the κ-exponential case, we have the following.
Theorem 1. Fix a real number $x \in \mathbf{R}$. Suppose that $n > |x|$ and $n \in \mathbf{N}$. Then, we have:
\[ \exp_\kappa x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n}, \]
where:
\[ \left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} := \underbrace{\left(1 + \frac{x}{n}\right) \otimes_\kappa \cdots \otimes_\kappa \left(1 + \frac{x}{n}\right)}_{n\ \mathrm{times}}. \]
Proof. From the assumption, the inequality $1 + x/n > 0$ always holds. Hence, we have:
\[ \ln_\kappa\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} = n \ln_\kappa\left(1 + \frac{x}{n}\right) = n\,\frac{\left(1 + \frac{x}{n}\right)^{\kappa} - \left(1 + \frac{x}{n}\right)^{-\kappa}}{2\kappa}. \tag{9} \]
(The assumption $n > |x|$ guarantees that the κ-logarithm in Equation (9) is defined.) From the definition of the κ-product (4), using the asymptotic expansions:
\[ \left(1 + \frac{x}{n}\right)^{\kappa} = 1 + \frac{\kappa x}{n} + O\!\left(\left(\frac{x}{n}\right)^2\right), \qquad \left(1 + \frac{x}{n}\right)^{-\kappa} = 1 - \frac{\kappa x}{n} + O\!\left(\left(\frac{x}{n}\right)^2\right), \]
and substituting them into (9), we have:
\[ \ln_\kappa\left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} = x + \frac{n}{2\kappa}\cdot O\!\left(\left(\frac{x}{n}\right)^2\right). \]
Hence, we have:
\[ \left(1 + \frac{x}{n}\right)^{\otimes_\kappa n} = \exp_\kappa\!\left[x + \frac{n}{2\kappa}\cdot O\!\left(\left(\frac{x}{n}\right)^2\right)\right]. \]
By taking the limit $n \to \infty$, we obtain the result. □
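Theorem 1 can also be observed numerically. In the sketch below (our addition; the κ-helpers are repeated so the snippet is self-contained, and the values of κ and x are arbitrary), the n-fold κ-product of (1 + x/n) is computed as $\exp_\kappa(n \ln_\kappa(1 + x/n))$ and converges to $\exp_\kappa x$ as n grows.

```python
import math

kappa, x = 0.25, 1.7

def ln_k(s):   return (s**kappa - s**(-kappa)) / (2 * kappa)
def exp_k(t):  return (kappa * t + math.sqrt(1 + kappa**2 * t**2)) ** (1 / kappa)

def kappa_power(base, n):
    # n-fold kappa-product of `base` with itself
    return exp_k(n * ln_k(base))

for n in [10, 100, 1000, 10000]:
    if n > abs(x):                        # assumption of Theorem 1
        print(n, kappa_power(1 + x / n, n), exp_k(x))
```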

3.2. q-Deformed Algebras and q-Exponential Functions

Let us consider the q-exponential case (cf. [4]). Let $\exp_q$ be a q-exponential function, and let $\ln_q$ be a q-logarithm function. The q-deformed algebras, i.e., the q-sum $\tilde{\oplus}_q$ and the q-product $\otimes_q$, are defined as follows:
\[
\begin{aligned}
x_1\,\tilde{\oplus}_q\, x_2 &:= \ln_q\!\left(\exp_q x_1 \cdot \exp_q x_2\right) = x_1 + x_2 + (1-q)x_1 x_2, \\
y_1 \otimes_q y_2 &:= \exp_q\!\left(\ln_q y_1 + \ln_q y_2\right) = \left[y_1^{1-q} + y_2^{1-q} - 1\right]^{\frac{1}{1-q}},
\end{aligned}
\tag{10}
\]
where the conditions $1 + (1-q)x_1 > 0$, $1 + (1-q)x_2 > 0$ and $y_1^{1-q} + y_2^{1-q} - 1 > 0$ are needed for defining the q-exponential functions, and $y_1 > 0$, $y_2 > 0$ for the q-logarithms. Under the q-deformed algebras, the q-deformed law of exponents holds:
\[
\begin{aligned}
\exp_q(x_1\,\tilde{\oplus}_q\, x_2) &= \exp_q x_1 \cdot \exp_q x_2, & \ln_q(y_1 \cdot y_2) &= \ln_q y_1\,\tilde{\oplus}_q\, \ln_q y_2, \\
\exp_q(x_1 + x_2) &= \exp_q x_1 \otimes_q \exp_q x_2, & \ln_q(y_1 \otimes_q y_2) &= \ln_q y_1 + \ln_q y_2.
\end{aligned}
\]
The inverse element of x with respect to the q-sum is given by:
\[ [-x]_q := \ln_q\frac{1}{\exp_q x} = \frac{-x}{1 + (1-q)x}. \]
Hence, the q-difference $\tilde{\ominus}_q$ should be defined by:
\[ x_1\,\tilde{\ominus}_q\, x_2 := x_1\,\tilde{\oplus}_q\,[-x_2]_q = x_1 - \frac{1+(1-q)x_1}{1+(1-q)x_2}\,x_2. \]
By taking a limit with respect to the q-difference, we define a (non-additive) q-differential as follows.
\[ \frac{d_q}{d_q x} f(x) := \lim_{x' \to x} \frac{f(x') - f(x)}{x'\,\tilde{\ominus}_q\, x}. \]
Let us consider the following deformed differential equations:
\[ \frac{d_q}{d_q x} f(x) = f(x), \tag{12} \]
\[ \frac{d}{dx} f(x) = \{f(x)\}^q. \tag{13} \]
Then, the eigenfunction f(x) of both equations is the q-exponential function. That is,
\[ f(x) = \exp_q x = \left(1 + (1-q)x\right)^{\frac{1}{1-q}}. \]
In the same way as the κ-exponential, we say that the non-additive differential equation (12) is the non-additive representation and the deformed differential equation (13) is the escort representation.
We remark again that a q-sum (a deformed sum) works on the domain of a q-exponential function and that a q-product (a deformed product) works on the target space. Hence, the sample space Ω may not be the standard Euclidean space.
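Analogously to the κ-case, the q-deformed operations and the q-deformed law of exponents can be checked numerically; the following sketch (our addition, with an arbitrary q and test values satisfying the anti-exponential conditions) does this.

```python
q = 1.8

def ln_q(s):   return (s**(1 - q) - 1) / (1 - q)
def exp_q(t):  return (1 + (1 - q) * t) ** (1 / (1 - q))      # assumes 1 + (1-q)t > 0

def qsum(x1, x2):
    # q-sum on the domain
    return x1 + x2 + (1 - q) * x1 * x2

def qprod(y1, y2):
    # q-product on the target space (y1, y2 > 0)
    return (y1**(1 - q) + y2**(1 - q) - 1) ** (1 / (1 - q))

x1, x2, y1, y2 = 0.4, -0.2, 1.5, 0.8
print(exp_q(qsum(x1, x2)), exp_q(x1) * exp_q(x2))    # equal
print(exp_q(x1 + x2), qprod(exp_q(x1), exp_q(x2)))   # equal
print(ln_q(y1 * y2), qsum(ln_q(y1), ln_q(y2)))       # equal
```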
An infinite product expression of the q-exponential function is given as follows.
Proposition 2 (cf. [17]). For all integers $n \in \mathbf{N}$, suppose that:
\[ n\left(1 + \frac{x}{n}\right)^{1-q} - (n-1) > 0. \tag{14} \]
Then, we have:
\[ \exp_q x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^{\otimes_q n}, \]
where:
\[ \left(1 + \frac{x}{n}\right)^{\otimes_q n} := \underbrace{\left(1 + \frac{x}{n}\right) \otimes_q \cdots \otimes_q \left(1 + \frac{x}{n}\right)}_{n\ \mathrm{times}}. \]
Proof. From the definition of the q-product (10) and the anti-exponential condition (14), we have:
\[ \ln_q\left(1 + \frac{x}{n}\right)^{\otimes_q n} = n \ln_q\left(1 + \frac{x}{n}\right) = n\,\frac{\left(1 + \frac{x}{n}\right)^{1-q} - 1}{1-q}. \]
Using the asymptotic expansion:
\[ \left(1 + \frac{x}{n}\right)^{1-q} = 1 + (1-q)\frac{x}{n} + O\!\left(\left(\frac{x}{n}\right)^2\right), \]
we have:
\[ \ln_q\left(1 + \frac{x}{n}\right)^{\otimes_q n} = x + \frac{n}{1-q}\cdot O\!\left(\left(\frac{x}{n}\right)^2\right). \]
Hence, we have:
\[ \left(1 + \frac{x}{n}\right)^{\otimes_q n} = \exp_q\!\left[x + \frac{n}{1-q}\cdot O\!\left(\left(\frac{x}{n}\right)^2\right)\right]. \]
By taking the limit $n \to \infty$, we obtain the result. □
Remark 2. If a deformed sum and a deformed product are well-defined, then we can give similar arguments for any χ-exponential functions. However, it is difficult to describe the anti-exponential conditions in general. If we can admit a complex number field for the domain and the target of statistical model (cf. [18]), then the deformed algebras are well defined [7]. In fact, we can define the following commutative field structures if all of the objects are well defined:
\[
\begin{aligned}
x_1\,\tilde{\oplus}_\chi\, x_2 &:= \ln_\chi\!\left[\exp_\chi x_1 \cdot \exp_\chi x_2\right], &
x_1\,\tilde{\otimes}_\chi\, x_2 &:= \ln_\chi \exp\!\left[\ln(\exp_\chi x_1) \cdot \ln(\exp_\chi x_2)\right], \\
y_1 \oplus_\chi y_2 &:= \exp_\chi \ln\!\left[\exp(\ln_\chi y_1) + \exp(\ln_\chi y_2)\right], &
y_1 \otimes_\chi y_2 &:= \exp_\chi\!\left[\ln_\chi y_1 + \ln_\chi y_2\right].
\end{aligned}
\]
The usages of the multiplicative group structure $\tilde{\otimes}_\chi$ on the sample space and of the additive group structure $\oplus_\chi$ on the target space are not clear. We may need algebraic probability theory to clarify these group structures (for usages of algebraic structures in statistics, see [19,20], for example).

4. Expectation Functionals

As we have seen in the previous section, the sample space Ω may not be the standard Euclidean space. Let us consider suitable expectations for deformed exponential families.
For a χ-exponential probability p ( x ; θ ) S χ , we define the escort distribution P χ ( x ; θ ) and the normalized escort distribution P χ e s c ( x ; θ ) of p ( x ; θ ) by:
\[ P_\chi(x;\theta) := \chi\{p(x;\theta)\}, \qquad P^{esc}_\chi(x;\theta) := \frac{1}{Z_\chi(\theta)}\,\chi\{p(x;\theta)\}, \qquad Z_\chi(\theta) := \int_\Omega \chi\{p(x;\theta)\}\,dx, \]
respectively. The χ-canonical expectation $E_{\chi,p}[*]$ and the normalized χ-escort expectation $E^{esc}_{\chi,p}[*]$ are defined by:
\[
\begin{aligned}
E_{\chi,p}[f(x)] &:= \int_\Omega f(x)\,P_\chi(x;\theta)\,dx = \int_\Omega f(x)\,\chi\{p(x;\theta)\}\,dx, \\
E^{esc}_{\chi,p}[f(x)] &:= \int_\Omega f(x)\,P^{esc}_\chi(x;\theta)\,dx = \frac{1}{Z_\chi(\theta)}\int_\Omega f(x)\,\chi\{p(x;\theta)\}\,dx.
\end{aligned}
\]
Even though the χ-canonical expectation is an integration with respect to a positive density that is not normalized, as we will see in later sections, this expectation is natural from the viewpoint of differential geometry. On the other hand, we call the standard expectation with respect to p(x;θ) a simple expectation and denote it by:
\[ E_p[f(x)] := \int_\Omega f(x)\,p(x;\theta)\,dx. \]
A χ-canonical expectation and a normalized χ-escort expectation with respect to a κ-exponential probability p(x;θ) are called the κ-canonical expectation and the normalized κ-escort expectation and are denoted by $E_{\kappa,p}[*]$ and $E^{esc}_{\kappa,p}[*]$, respectively. In the q-exponential case, they are called the q-canonical expectation and the normalized q-escort expectation and denoted by $E_{q,p}[*]$ and $E^{esc}_{q,p}[*]$, respectively.
For a Student t-distribution $p(x;\mu,\sigma) \in S_q$, the normalized q-escort mean $\mu_q$ and the normalized q-escort variance $\sigma_q^2$ are given by:
\[ \mu_q := E^{esc}_{q,p}[x] = \mu, \qquad \sigma_q^2 := E^{esc}_{q,p}\!\left[(x-\mu)^2\right] = \sigma^2, \]
respectively. Hence, the normalized q-escort expectation $E^{esc}_{q,p}[*]$ is a natural generalization of the simple expectation $E_p[*]$.
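This can be confirmed numerically. The sketch below (our illustration; the parameter values are arbitrary and the integrals are approximated by the trapezoidal rule on a truncated interval) computes the normalized q-escort mean and variance of a Student t-density and recovers μ and σ².

```python
import numpy as np
from math import gamma, sqrt

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

q, mu, sigma = 1.5, 0.7, 2.0
Zq = sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma

x = np.linspace(mu - 300.0, mu + 300.0, 2_000_001)
p = (1.0 / Zq) * (1.0 - (1 - q) / (3 - q) * (x - mu) ** 2 / sigma ** 2) ** (1.0 / (1.0 - q))

escort = p ** q                                  # chi(p) = p^q, un-normalized escort density
Z_esc = np.trapz(escort, x)
mu_q  = np.trapz(x * escort, x) / Z_esc          # normalized q-escort mean
var_q = np.trapz((x - mu_q) ** 2 * escort, x) / Z_esc
print(mu_q, var_q)                               # close to mu = 0.7 and sigma^2 = 4.0
```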
Next, we consider non-additive integrals to elucidate the relations between the deformed algebras and the escort expectations. In particular, we discuss the κ-exponential case.
Let f ( x ) be a function on the sample space Ω. Then, we define a (non-additive) κ-integral (cf. [5]) by the following formula:
\[ \int_\Omega f(x)\,d_\kappa x := \int_\Omega \frac{f(x)}{\sqrt{1+\kappa^2 x^2}}\,dx = \int_\Omega f(x)\,w_\kappa(x)\,dx, \]
where $w_\kappa(x)$ is the weight function defined by:
\[ w_\kappa(x) = \frac{1}{\sqrt{1+\kappa^2 x^2}}. \]
Obviously, this is the inverse operation of the non-additive κ-differential (6).
When Ω is a discrete set, $\Omega = \{x_0, x_1, \ldots, x_n\}$, we define a (non-additive) κ-summation by:
\[ \sum_{i=0}^{n}{}^{\kappa}\, f(x_i) := \sum_{i=0}^{n} \frac{f(x_i)}{\sqrt{1+\kappa^2 x_i^2}} = \sum_{i=0}^{n} f(x_i)\,w_\kappa(x_i). \]
From the definition of the κ-exponential function, we have the following.
Theorem 3. Suppose that $\chi(s) = 2s/(s^{\kappa} + s^{-\kappa})$ is the deformation function associated with the κ-logarithm function. Then, $\chi(\exp_\kappa x)$ is the κ-exponential function multiplied by the weight function $w_\kappa(x)$ of the non-additive κ-integral. That is, the following formula holds:
\[ \chi(\exp_\kappa x) = \frac{\exp_\kappa x}{\sqrt{1+\kappa^2 x^2}} = \exp_\kappa(x)\,w_\kappa(x). \]
By the above theorem, we think that the canonical expectation $E_{\kappa,p}[*]$ assigns a suitable weight to the sample space Ω. One may consider a non-additive χ-integral in the general setting (in the q-exponential case, the corresponding q-integral is introduced in [4]). However, the well-definedness of the χ-integral has to be checked carefully, since the anti-exponential condition must be satisfied.

5. Geometry of χ-Exponential Families with Simple Expectations

In this section, we consider the geometry of χ-exponential families by generalizing the e-representation and the m-representation of probability densities. For more details, see [11].
Let $S_\chi$ be a χ-exponential family. We define a χ-score function $s_\chi(x;\theta) : S_\chi \to \mathbf{R}^n$, $s_\chi(x;\theta) = {}^t\!\left((s_\chi)_1(x;\theta), \ldots, (s_\chi)_n(x;\theta)\right)$, by:
\[ (s_\chi)_i(x;\theta) := \frac{\partial}{\partial\theta^i}\ln_\chi p(x;\theta), \qquad (i = 1, \ldots, n). \]
Under suitable conditions, we can define Riemannian metrics on $S_\chi$ by:
\[ g^E_{ij}(\theta) := \int_\Omega \partial_i \ln_\chi p(x;\theta)\;\partial_j \ln_\chi p(x;\theta)\;\chi\{p(x;\theta)\}\,dx = E_{\chi,p}\!\left[(s_\chi)_i(x;\theta)\,(s_\chi)_j(x;\theta)\right], \tag{17} \]
\[ g^M_{ij}(\theta) := \int_\Omega \partial_i p(x;\theta)\;\partial_j \ln_\chi p(x;\theta)\,dx, \tag{18} \]
\[ g^N_{ij}(\theta) := \int_\Omega \frac{1}{\chi\{p(x;\theta)\}}\,\partial_i p(x;\theta)\;\partial_j p(x;\theta)\,dx, \tag{19} \]
where $\partial_i = \partial/\partial\theta^i$.
In the same manner as for an invariant statistical manifold, the differential $\partial_i p(x;\theta)$ and the χ-score function $\partial_i \ln_\chi p(x;\theta)$ are regarded as tangent vectors of the χ-exponential family $S_\chi$. Hence, the χ-score function is a generalization of the e-representation of p(x;θ).
Theorem 4. The Riemannian metrics $g^E$, $g^M$ and $g^N$ on $S_\chi$ coincide. That is,
\[ g^E(\theta) = g^M(\theta) = g^N(\theta). \]
Proof. For a χ-exponential distribution p ( x ; θ ) , its differential is given as follows:
\[ \frac{\partial}{\partial\theta^i} p(x;\theta) = \chi(p(x;\theta))\left(F_i(x) - \frac{\partial}{\partial\theta^i}\psi(\theta)\right), \qquad \frac{\partial}{\partial\theta^i} \ln_\chi p(x;\theta) = F_i(x) - \frac{\partial}{\partial\theta^i}\psi(\theta). \]
By substituting the above formulas into (17)–(19), we obtain the results. □
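Spelling out the substitution (an intermediate step added here; it uses only the two formulas above and the definitions (17)–(19)):
\[ g^M_{ij}(\theta) = \int_\Omega \partial_i p(x;\theta)\,\partial_j \ln_\chi p(x;\theta)\,dx = \int_\Omega \chi\{p(x;\theta)\}\bigl(F_i(x) - \partial_i\psi(\theta)\bigr)\bigl(F_j(x) - \partial_j\psi(\theta)\bigr)\,dx, \]
and the same integral is obtained from (17) and from (19), since $\partial_i p(x;\theta) = \chi\{p(x;\theta)\}\,\partial_i \ln_\chi p(x;\theta)$.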
We remark that the integrations above are carried out with respect to the un-normalized χ-escort distribution. If we defined the Riemannian metrics using normalized χ-escort expectations, they would not coincide in general; the resulting Riemannian metrics are conformally equivalent (cf. [11]).
By differentiating Equation (18), we can define dual affine connections $\nabla^{M(e)}$ and $\nabla^{M(m)}$ on $S_\chi$ by:
\[ \Gamma^{M(e)}_{ij,k}(\theta) := \int_\Omega \partial_k p(x;\theta)\;\partial_i\partial_j \ln_\chi p(x;\theta)\,dx, \qquad \Gamma^{M(m)}_{ij,k}(\theta) := \int_\Omega \partial_i\partial_j p(x;\theta)\;\partial_k \ln_\chi p(x;\theta)\,dx. \]
From the definitions of the χ-exponential family and the χ-logarithm function, we obtain $\Gamma^{M(e)}_{ij,k}(\theta) \equiv 0$. Hence, the parameter $\theta = \{\theta^i\}$ is a $\nabla^{M(e)}$-affine coordinate system, and the connection $\nabla^{M(e)}$ is flat. This implies that the triplet $(S_\chi, \nabla^{M(e)}, g^M)$ is a Hessian manifold. The cubic form $C^M_{ijk}$ of $(S_\chi, \nabla^{M(e)}, g^M)$ is:
\[ C^M_{ijk} = \Gamma^{M(m)}_{ij,k} - \Gamma^{M(e)}_{ij,k} = \Gamma^{M(m)}_{ij,k}. \]
To give Hessian potential functions of $(S_\chi, \nabla^{M(e)}, g^M)$, we define functions $I_\chi$ and Ψ by:
\[ I_\chi(p_\theta) := -\int_\Omega \left[ V_\chi(p(x;\theta)) + \bigl(p(x;\theta) - 1\bigr)V_\chi(0) \right] dx, \qquad \Psi(\theta) := \int_\Omega p(x;\theta)\ln_\chi p(x;\theta)\,dx + I_\chi(p_\theta) + \psi(\theta), \]
where the function $V_\chi(t)$ is given by:
\[ V_\chi(t) := \int_1^t \ln_\chi(s)\,ds. \]
We call I χ a generalized entropy functional and Ψ a generalized Massieu potential.
Proposition 5 (cf. [21,22]). For a χ-exponential family $S_\chi$, (1) the generalized Massieu potential Ψ(θ) is the potential of $g^M$ and $C^M$ with respect to $\{\theta^i\}$:
\[ g^M_{ij}(\theta) = \partial_i\partial_j \Psi(\theta), \qquad C^M_{ijk}(\theta) = \partial_i\partial_j\partial_k \Psi(\theta). \]
(2) Let η i be the simple expectation of F i ( x ) , i.e., η i : = E p [ F i ( x ) ] . Then, { η i } is the dual affine coordinate system of { θ i } with respect to g M , and each η i is given by:
\[ \eta_i = \partial_i \Psi(\theta). \]
(3) Let Φ ( η ) be the negative generalized entropy functional, i.e., Φ ( η ) : = - I χ ( p θ ) . Then, Φ ( η ) is the potential of g M with respect to { η i } .
Let us consider a divergence function on a χ-exponential family. The canonical divergence D on $(S_\chi, \nabla^{M(e)}, g^M)$ is defined by:
\[ D(p, r) = \Psi(\theta(p)) + \Phi(\eta(r)) - \sum_{i=1}^n \theta^i(p)\,\eta_i(r). \]
On the other hand, the χ-divergence (or U-divergence) on S χ is defined by:
\[ D_\chi(p, r) = \int_\Omega \left[ U_\chi(\ln_\chi r(x)) - U_\chi(\ln_\chi p(x)) - p(x)\bigl(\ln_\chi r(x) - \ln_\chi p(x)\bigr) \right] dx, \]
where the function $U_\chi(s)$ is given by:
\[ U_\chi(s) := \int_0^s \exp_\chi(t)\,dt. \]
Then, the χ-divergence $D_\chi$ coincides with the canonical divergence D on $(S_\chi, \nabla^{M(m)}, g^M)$. We remark that the χ-divergence is naturally constructed from a bias-corrected χ-score function; see [11,23] for more details.
In the q-exponential case, the χ-divergence is given by:
\[ D_{1-q}(p, r) = \frac{1}{(1-q)(2-q)}\int_\Omega p(x)^{2-q}\,dx - \frac{1}{1-q}\int_\Omega p(x)\,r(x)^{1-q}\,dx + \frac{1}{2-q}\int_\Omega r(x)^{2-q}\,dx. \]
The divergence D 1 - q ( p , r ) is called a β-divergence ( β = 1 - q ) or a density power divergence in statistics [24]. This divergence is useful in robust statistics.
We remark that the generalization of e- and m-representations through an arbitrary monotone embedding function was first studied in [25]. For further generalizations through monotone embedding functions, see [26,27]. These generalizations of e- and m-representations are also related to the U-geometry in information geometry (cf. [21,22]). When the embedding function χ ( t ) is identity ( q = 1 in the q-exponential case and κ = 0 in the κ-exponential case), the results in this section reduce to the standard results in exponential families [11].

6. Geometry of Deformed Exponential Families with χ-Escort Expectation

Since a χ-exponential distribution has a normalization term ψ ( θ ) , we induce geometric structures directly from the potential function ψ. For more details, see [10,11]. When the embedding function χ ( t ) is identity, the results in this section also reduce to the standard results in exponential families [11].
We define a χ-Fisher metric g χ and a χ-cubic form C χ by:
\[ g^{\chi}_{ij}(\theta) := \partial_i\partial_j \psi(\theta), \qquad C^{\chi}_{ijk}(\theta) := \partial_i\partial_j\partial_k \psi(\theta), \]
respectively. Denote by $\Gamma^{\chi(0)}_{ij,k}$ the Christoffel symbol of the Levi–Civita connection with respect to the χ-Fisher metric $g^{\chi}$. From standard arguments in Hessian geometry [28], we can define mutually dual flat connections by:
\[ \Gamma^{\chi(e)}_{ij,k}(\theta) := \Gamma^{\chi(0)}_{ij,k}(\theta) - \frac{1}{2}C^{\chi}_{ijk}(\theta) \equiv 0, \qquad \Gamma^{\chi(m)}_{ij,k}(\theta) := \Gamma^{\chi(0)}_{ij,k}(\theta) + \frac{1}{2}C^{\chi}_{ijk}(\theta) = C^{\chi}_{ijk}(\theta), \]
respectively. We call $\nabla^{\chi(e)}$ a χ-exponential connection and $\nabla^{\chi(m)}$ a χ-mixture connection. In this case, $\{\theta^i\}$ is a $\nabla^{\chi(e)}$-affine coordinate system, and the triplets $(S_\chi, \nabla^{\chi(e)}, g^{\chi})$ and $(S_\chi, \nabla^{\chi(m)}, g^{\chi})$ are mutually dual Hessian manifolds.
Proposition 6 (cf. [10,11]). For a χ-exponential family S χ ,
(1)
ψ ( θ ) is the potential of g χ and C χ with respect to { θ i } .
(2)
Let η i be the normalized χ-escort expectation of F i ( x ) , i.e., η i : = E χ , p e s c [ F i ( x ) ] . Then, { η i } is the dual affine coordinate system of { θ i } with respect to g χ , and each η i is given by:
\[ \eta_i = \partial_i \psi(\theta). \]
(3)
Let ϕ(η) be the negative χ-deformed entropy, i.e., $\phi(\eta) := E^{esc}_{\chi,p}[\ln_\chi p(x;\theta)]$.
Then, ϕ ( η ) is the potential of g χ with respect to { η i } .
Let us consider divergence functions. The canonical divergence of $(S_\chi, \nabla^{\chi(e)}, g^{\chi})$ is given by:
\[ D(p, r) = \psi(\theta(p)) + \phi(\eta(r)) - \sum_{i=1}^n \theta^i(p)\,\eta_i(r). \]
On the other hand, a χ-relative entropy (or a generalized relative entropy) $D_\chi(p,r)$ on $S_\chi$ is defined by:
\[ D_\chi(p, r) := E^{esc}_{\chi,p}\!\left[\ln_\chi p(x) - \ln_\chi r(x)\right]. \tag{20} \]
If the deformation function χ is the identity function $\chi(s) = s$, then the χ-relative entropy coincides with the Kullback–Leibler divergence. In addition, the χ-relative entropy $D_\chi$ coincides with the canonical divergence on $(S_\chi, \nabla^{\chi(m)}, g^{\chi})$. In fact, in the same way as for a standard exponential family, we have:
\[
\begin{aligned}
D_\chi(p(\theta), p(\theta')) &= E^{esc}_{\chi,p}\!\left[\left(\sum_{i=1}^n \theta^i F_i(x) - \psi(\theta)\right) - \left(\sum_{i=1}^n (\theta')^i F_i(x) - \psi(\theta')\right)\right] \\
&= \left(\sum_{i=1}^n \theta^i \eta_i - \psi(\theta)\right) - \left(\sum_{i=1}^n (\theta')^i \eta_i - \psi(\theta')\right) \\
&= \psi(\theta') + \phi(\eta) - \sum_{i=1}^n (\theta')^i \eta_i = D(p(\theta'), p(\theta)).
\end{aligned}
\]
In the κ-exponential case, we call a χ-relative entropy (20) a κ-relative entropy and denote it by D κ .
On the other hand, in the q-exponential case, the χ-relative entropy for a q-exponential family is called a normalized Tsallis relative entropy, which is given by:
\[ D^T_q(p, r) := E^{esc}_{q,p}\!\left[\ln_q p(x) - \ln_q r(x)\right] = \int_\Omega P^{esc}_q(x)\bigl(\ln_q p(x) - \ln_q r(x)\bigr)dx = \frac{1}{(1-q)Z_q(p)}\left(1 - \int_\Omega p(x)^q\, r(x)^{1-q}\,dx\right), \]
where $Z_q(p)$ is the normalization of the escort distribution $P^{esc}_q(x)$ of p(x). Denote by $(S_q, \nabla^{q(e)}, g^q)$ and $(S_q, \nabla^{q(m)}, g^q)$ the Hessian manifolds induced from the normalization ψ(θ). Then, the normalized Tsallis relative entropy coincides with the canonical divergence for the Hessian manifold $(S_q, \nabla^{q(m)}, g^q)$.
For a q-exponential family, we can also define an α-divergence ( α = 1 - 2 q ) by:
\[ D^{(1-2q)}(p, r) := \frac{1}{q}\,E_{q,p}\!\left[\ln_q p(x) - \ln_q r(x)\right] = \frac{1}{q}\int_\Omega P_q(x)\bigl(\ln_q p(x) - \ln_q r(x)\bigr)dx = \frac{1}{q(1-q)}\left(1 - \int_\Omega p(x)^q\, r(x)^{1-q}\,dx\right). \]
It is known that the α-divergence $(\alpha = 1 - 2q)$ induces an invariant statistical manifold $(S_q, \nabla^{(1-2q)}, g)$.
Remark 3. For a q-exponential family $S_q$, the normalized Tsallis relative entropy induces a Hessian manifold (i.e., a flat statistical manifold) $(S_q, \nabla^{q(m)}, g^q)$, whereas the α-divergence induces an invariant statistical manifold $(S_q, \nabla^{(1-2q)}, g)$. Since a constant multiplication is not essential in differential geometry, the difference is caused by the normalization of the escort distribution:
\[ D^T_q(p, r) = \frac{q}{Z_q(p)}\, D^{(1-2q)}(p, r). \]
In this case, the two statistical manifolds $(S_q, \nabla^{q(m)}, g^q)$ and $(S_q, \nabla^{(1-2q)}, g)$ are $(-1)$-conformally equivalent (cf. [29,30]). This implies that the normalization of a probability density is not a trivial problem. The normalization does affect the induced geometric structures and, consequently, the estimating methods for statistical inference.
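The conformal relation above can be checked on a toy example. In the sketch below (our addition; we use discrete distributions, so the integrals in the closed forms become finite sums), the normalized Tsallis relative entropy and the α-divergence are computed directly and compared through the factor q/Z_q(p).

```python
import numpy as np

q = 1.6
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.4, 0.4, 0.2])

overlap = np.sum(p**q * r**(1 - q))
Zq_p = np.sum(p**q)                            # normalization of the escort of p

D_T     = (1.0 - overlap) / ((1 - q) * Zq_p)   # normalized Tsallis relative entropy
D_alpha = (1.0 - overlap) / (q * (1 - q))      # alpha-divergence, alpha = 1 - 2q

print(D_T, q / Zq_p * D_alpha)                 # equal
```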

7. Discussion about Expectations

We now give a further discussion of expectation functionals. Since a deformed exponential family $S_\chi$ is regarded as a manifold, we can choose an arbitrary local coordinate system for $S_\chi$. From this point of view, the simple expectations $\{E_p[F_i(x)]\}$ and the normalized χ-escort expectations $\{E^{esc}_{\chi,p}[F_i(x)]\}$ are nothing but local coordinates of the statistical model. However, in differential geometry, we often use appropriate coordinates depending on the background geometry, e.g., Darboux coordinates in symplectic geometry and isothermal coordinates in the geometry of minimal surfaces. From Propositions 5 and 6, the simple expectations $\{E_p[F_i(x)]\}$ and the normalized χ-escort expectations $\{E^{esc}_{\chi,p}[F_i(x)]\}$ give appropriate coordinates for $(S_\chi, g^M, \nabla^{M(e)}, \nabla^{M(m)})$ and $(S_\chi, g^{\chi}, \nabla^{\chi(e)}, \nabla^{\chi(m)})$, respectively, since they are the dual affine coordinates of the natural parameters $\{\theta^i\}$.
From the assumptions on deformed exponential families, there always exists a dually flat structure $(S_\chi, g^{\chi}, \nabla^{\chi(e)}, \nabla^{\chi(m)})$, but $(S_\chi, g^M, \nabla^{M(e)}, \nabla^{M(m)})$ does not exist in general (see [31] for more details). In addition, from Theorem 3, the deformed algebra on the sample space Ω is reflected in the canonical expectation $E_{\chi,p}[*]$. Hence, we think that the canonical expectation $E_{\chi,p}[*]$ and the normalized χ-escort expectation $E^{esc}_{\chi,p}[*]$ are more natural than the simple expectation $E_p[*]$.

8. Maximum κ-Likelihood Estimators

In Section 3, we discussed deformed algebras for deformed exponential functions. As a consequence, it is natural to regard that a sample space is not the standard Euclidean space. In this section, we construct a maximum likelihood method that is in accordance with the deformed algebras.
Suppose that X is a random variable that follows a probability p 1 ( x ) , and Y follows p 2 ( y ) . We say that two random variables X and Y are independent if the joint probability p ( x , y ) coincides with the product of marginal distributions p 1 ( x ) and p 2 ( y ) :
\[ p(x, y) = p_1(x)\,p_2(y). \]
Suppose that $p_1(x)$ and $p_2(y)$ have support entirely on Ω, that is, $p_1(x) > 0$ and $p_2(y) > 0$ hold on all of Ω. The independence is given by a duality of an exponential function and a logarithm function:
\[ p(x, y) = \exp\left[\ln p_1(x) + \ln p_2(y)\right]. \]
We generalize the notion of independence using the χ-exponential and χ-logarithm.
Suppose that X i is a random variable on Ω i , which follows p i ( x ) ( i = 1 , 2 , , N ) . Random variables X 1 , X 2 , , X N may not be independent on the standard algebra. Let p ( x 1 , x 2 , , x N ) be the joint probability density of X 1 , X 2 , , X N .
We say that X 1 , X 2 , , X N are χ-independent with m-normalization if:
\[ p(x_1, x_2, \ldots, x_N) = \frac{p_1(x_1) \otimes_\chi p_2(x_2) \otimes_\chi \cdots \otimes_\chi p_N(x_N)}{Z_{p_1, p_2, \ldots, p_N}}, \]
where $Z_{p_1, p_2, \ldots, p_N}$ is the normalization of $p_1(x_1) \otimes_\chi p_2(x_2) \otimes_\chi \cdots \otimes_\chi p_N(x_N)$ defined by:
\[ Z_{p_1, p_2, \ldots, p_N} := \int \cdots \int_{\mathrm{Supp}\{p(x_1, x_2, \ldots, x_N)\}\,\subset\,\Omega_1 \times \cdots \times \Omega_N} p_1(x_1) \otimes_\chi p_2(x_2) \otimes_\chi \cdots \otimes_\chi p_N(x_N)\; dx_1 \cdots dx_N. \]
We remark that the domain of integration may not be entirely Ω 1 × × Ω N because of the anti-exponential conditions. In addition, N is not an arbitrary integer. The maximum number of N depends on the deformation function χ.
Example 5 (Bivariate Student t-distributions (cf. [32])). Suppose that X and Y are random variables that follow Student t-distributions $p_q(x;\mu_x,\sigma_x)$ and $p_q(y;\mu_y,\sigma_y)$, respectively. Even if X and Y are independent, the joint distribution $p(x,y) = p_q(x)\,p_q(y)$ is not a bivariate Student t-distribution. On the other hand, if X and Y are q-independent with m-normalization, then the joint distribution:
\[ p_q(x, y) = \frac{p_q(x) \otimes_q p_q(y)}{Z_{p_q(x), p_q(y)}} \]
is a bivariate Student t-distribution. Note that neither $p_q(x)$ nor $p_q(y)$ is the marginal distribution, because:
\[ \int_{\Omega_Y} p_q(x, y)\,dy \neq p_q(x). \]
However, in this paper, we say that $p_q(x)$ and $p_q(y)$ are the q-marginal distributions of the joint distribution $p_q(x,y)$.
Recall that we cannot take infinitely many q-products to define a joint distribution. In the case of Student t-distributions, the number of q-marginal distributions must satisfy $N < 2/(q-1)$; otherwise, the normalization Z diverges.
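The following sketch (our illustration; means, scales and the grid truncation are arbitrary, and all integrals are crude trapezoidal approximations) builds the m-normalized q-product of two Student t-densities on a grid and shows that the ordinary marginal of the joint density differs from the q-marginal $p_q(x)$.

```python
import numpy as np
from math import gamma, sqrt

q = 1.5

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def p_q(x, mu, sigma):
    # one-dimensional Student t-density written as in Example 4
    Z = sqrt((3 - q) / (q - 1)) * beta_fn((3 - q) / (2 * (q - 1)), 0.5) * sigma
    return (1 / Z) * (1 - (1 - q) / (3 - q) * (x - mu) ** 2 / sigma ** 2) ** (1 / (1 - q))

def qprod(a, b):
    # q-product of two positive arrays
    return (a ** (1 - q) + b ** (1 - q) - 1.0) ** (1 / (1 - q))

x = np.linspace(-50, 50, 2001)
y = np.linspace(-50, 50, 2001)

joint = qprod(p_q(x[:, None], 0.0, 1.0), p_q(y[None, :], 0.5, 2.0))
Z = np.trapz(np.trapz(joint, y, axis=1), x)       # m-normalization Z_{p1,p2}
p_joint = joint / Z

marginal_x = np.trapz(p_joint, y, axis=1)
# clearly nonzero: the q-marginal p_q(x) is not the ordinary marginal of the joint
print(np.max(np.abs(marginal_x - p_q(x, 0.0, 1.0))))
```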
Let us consider the κ-exponential case. We say that random variables X 1 , X 2 , , X N are κ-independent with m-normalization if:
\[ p(x_1, x_2, \ldots, x_N) = \frac{p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N)}{Z_{p_1, p_2, \ldots, p_N}}, \]
where $Z_{p_1, p_2, \ldots, p_N}$ is the normalization of $p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N)$ defined by:
\[ Z_{p_1, p_2, \ldots, p_N} := \int_{\Omega_1}\!\cdots\!\int_{\Omega_N} p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N)\; dx_1 \cdots dx_N. \]
In the κ-exponential case, the domain of integration is the whole of $\Omega_1 \times \cdots \times \Omega_N$, since the κ-exponential function is defined entirely on $\mathbf{R}$.
Similarly, we say that X 1 , X 2 , , X N are κ-independent with e-normalization (or exponential normalization) if:
\[ p(x_1, x_2, \ldots, x_N) = \exp_\kappa\!\left[\ln_\kappa p_1(x_1) + \ln_\kappa p_2(x_2) + \cdots + \ln_\kappa p_N(x_N) - c\right] = p_1(x_1) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N) \otimes_\kappa \exp_\kappa(-c), \]
where c is the normalization constant of $p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N)$ determined by:
\[ \int_{\Omega_1}\!\cdots\!\int_{\Omega_N} p_1(x_1) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N) \otimes_\kappa \exp_\kappa(-c)\; dx_1 \cdots dx_N = 1. \]
We remark that the e-normalization is different from the m-normalization in general. See [33] for further discussion.
A normalization of the joint distribution is not required in several problems. In these cases, we define a joint positive distribution (not a probability distribution) by the κ-marginal probability distributions,
\[ f(x_1, x_2, \ldots, x_N) := p_1(x_1) \otimes_\kappa p_2(x_2) \otimes_\kappa \cdots \otimes_\kappa p_N(x_N), \]
and we say that X 1 , X 2 , , X N are simply κ-independent.
Remark 4. As we mentioned in Remark 2, it is difficult to describe explicitly the anti-exponential conditions for the χ-exponential case. Though several authors have introduced χ-independence (which is called U-independence in [2,3] and F-independence in [34]), they did not mention the anti-exponential conditions. Hence, the χ-independence was not well defined in their papers.
On the other hand, the anti-exponential condition of the κ-deformed algebra (4) is always satisfied, since p ( x ; θ ) S κ can be defined entirely on R. Therefore, the κ-independence is well defined for a κ-exponential family. This is an advantage of the κ-exponential families.
Before we discuss a generalization of maximum likelihood methods, we recall the difference between Gauss’ law of error and the maximum likelihood method.
In the case of Gauss’ law of error, we consider the following likelihood function:
\[ L(\theta) := p(x_1 - \theta)\,p(x_2 - \theta)\cdots p(x_N - \theta). \]
Suppose that N observations $\{x_1, \ldots, x_N\}$ are obtained. If the likelihood function L(θ) attains its maximum at the sample mean $\theta = \bar{x}_N = (x_1 + \cdots + x_N)/N$, then the probability density function p must be a Gaussian distribution. Hence, we specify a probability distribution from a given likelihood function and observed data. Generalizations of Gauss's law of error in non-extensive statistical physics have been obtained in [35,36], etc.
On the other hand, in the case of the maximum likelihood method, we suppose a statistical model S = { p ( x ; θ ) } and define a likelihood function L ( θ ) by:
\[ L(\theta) := p(x_1; \theta)\,p(x_2; \theta)\cdots p(x_N; \theta). \]
Suppose that N observations $\{x_1, \ldots, x_N\}$ are obtained. If the likelihood function attains its maximum at $\hat{\theta}$, then the probability distribution $p(x;\hat{\theta})$ is expected to be closest to the true distribution within the given statistical model. Hence, we specify a parameter on a given statistical model from a likelihood function and observed data.
Later in this section, we consider a κ-generalization of the maximum likelihood method and give a characterization of the maximum κ-likelihood estimator from the viewpoint of information geometry.
Let $S_\kappa = \{p(x;\theta) \mid \theta \in \Theta\}$ be a κ-exponential family, and let $\{x_1, \ldots, x_N\}$ be N observations from $p(x;\theta) \in S_\kappa$. We define a κ-likelihood function $L_\kappa(\theta)$ and a κ-logarithm κ-likelihood function $l_\kappa(\theta)$ by:
\[ L_\kappa(\theta) := p(x_1; \theta) \otimes_\kappa p(x_2; \theta) \otimes_\kappa \cdots \otimes_\kappa p(x_N; \theta), \]
\[ l_\kappa(\theta) := \ln_\kappa L_\kappa(\theta) = \sum_{i=1}^N \ln_\kappa p(x_i; \theta), \]
respectively. By taking the limit $\kappa \to 0$, $L_\kappa$ reduces to the standard likelihood function of θ.
The maximum κ-likelihood estimator θ ^ is the maximizer of κ-likelihood function. We assume the existence of θ ^ in this paper. Since the parameter space Θ is assumed to be an open subset, θ ^ should be an interior point in Θ. From the monotonicity of the κ-logarithm ln κ , θ ^ is also the maximizer of κ-logarithm κ-likelihood function:
\[ \hat{\theta} := \operatorname*{argmax}_{\theta \in \Theta} L_\kappa(\theta) = \operatorname*{argmax}_{\theta \in \Theta} \ln_\kappa L_\kappa(\theta). \]
Theorem 7. Let $S_\kappa = \{p(x;\theta) \mid \theta \in \Theta\}$ be a κ-exponential family. Suppose that $M_\kappa = \{p(x;\theta(u)) \mid u \in U\}$ is a curved κ-exponential family of $S_\kappa$ and that $\{x_1, \ldots, x_N\}$ are N observations from $p(x;\theta(u)) \in M_\kappa$. Then,
(1)
the maximum κ-likelihood estimator for S κ in η-coordinates is given by:
\[ \hat{\eta}_i = \frac{1}{N}\sum_{j=1}^N F_i(x_j). \]
(2)
The κ-likelihood attains the maximum if and only if the κ-relative entropy attains the minimum.
Proof. (1) The κ-logarithm κ-likelihood function is given by:
\[ l_\kappa(\theta) = \sum_{j=1}^N \ln_\kappa p(x_j;\theta) = \sum_{j=1}^N \left(\sum_{i=1}^n \theta^i F_i(x_j) - \psi(\theta)\right) = \sum_{i=1}^n \theta^i \sum_{j=1}^N F_i(x_j) - N\psi(\theta). \]
Hence, we obtain the κ-logarithm κ-likelihood equation:
\[ \partial_i\, l_\kappa(\theta) = \sum_{j=1}^N F_i(x_j) - N\,\partial_i \psi(\theta) = 0. \]
From Proposition 6, the maximum κ-likelihood estimator for $S_\kappa$ is given by:
\[ \hat{\eta}_i = \frac{1}{N}\sum_{j=1}^N F_i(x_j). \]
(2) Denote by $p(\hat{\eta}) = p(x;\hat{\eta}) \in S_\kappa$ the probability distribution whose parameter is determined by the maximum κ-likelihood estimator $\hat{\eta}$. Since the κ-relative entropy coincides with the canonical divergence, we obtain:
\[
\begin{aligned}
D_\kappa(p(\hat{\eta}), p(\theta(u))) &= D(p(\theta(u)), p(\hat{\eta})) = \psi(\theta(u)) + \phi(\hat{\eta}) - \sum_{i=1}^n \theta^i(u)\,\hat{\eta}_i \\
&= \phi(\hat{\eta}) - \frac{1}{N}\ln_\kappa L_\kappa(\theta(u)).
\end{aligned}
\]
This implies that the κ-likelihood attains its maximum if and only if the κ-relative entropy attains its minimum. □
Since the κ-relative entropy attains the minimum at the κ-maximum likelihood estimator, we say that Theorem 7 is a divergence projection theorem for the κ-exponential family. We remark again that similar arguments hold for any χ-exponential families if the χ-independence is well defined.
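As a concrete illustration of Theorem 7 (1), the following sketch (our addition, not from the original paper; it uses the discrete sample space of Example 3 with hypothetical counts, and scipy only as a generic optimizer) maximizes the κ-likelihood numerically and compares the normalized κ-escort probabilities at the maximizer with the sample relative frequencies.

```python
import numpy as np
from scipy.optimize import minimize

kappa = 0.5

def ln_k(s):   return (s**kappa - s**(-kappa)) / (2 * kappa)
def exp_k(t):  return (kappa * t + np.sqrt(1 + kappa**2 * t**2)) ** (1 / kappa)
def chi(s):    return 2 * s / (s**kappa + s**(-kappa))      # deformation function of ln_k

def solve_psi(theta):
    # normalization: exp_k(-psi) + sum_i exp_k(theta^i - psi) = 1, solved by bisection
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        total = exp_k(-mid) + sum(exp_k(t - mid) for t in theta)
        lo, hi = (mid, hi) if total > 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

def probs(theta):
    psi = solve_psi(theta)
    return np.array([exp_k(-psi)] + [exp_k(t - psi) for t in theta])

counts = np.array([4, 3, 3])                 # hypothetical observations of x0, x1, x2
N = counts.sum()

def neg_kappa_log_likelihood(theta):
    return -np.sum(counts * ln_k(probs(theta)))

res = minimize(neg_kappa_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
p_hat = probs(res.x)

escort = chi(p_hat) / chi(p_hat).sum()       # normalized kappa-escort probabilities
print(escort, counts / N)                    # approximately equal, as Theorem 7 (1) states
```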

9. Conclusion

In this paper, we discussed deformed algebras and generalizations of expectations for χ-exponential families. In particular, we clarified how to use deformed algebraic structures for deformed exponential families. We introduced the canonical expectation for χ-exponential families, whereas the normalized χ-escort expectation has been known in anomalous statistical physics. We then considered information geometric properties of deformed exponential families. Though the canonical expectation is not an expectation with respect to a probability density, it naturally characterizes a generalized Fisher metric and the α-divergence.
In addition, we studied the generalization of independence and introduced a generalized maximum likelihood method for the κ-exponential family. In particular, a divergence projection-type theorem was obtained in the case of the κ-maximum likelihood method. A deformed independence is not defined explicitly in general, since it is difficult to describe the anti-exponential conditions for χ-exponential functions. On the other hand, the κ-independence for the κ-exponential family is always well defined. This is an advantage of the κ-exponential family in the class of χ-exponential families.

Acknowledgments

The authors would like to express their sincere gratitude to the anonymous reviewers for the constructive comments that improved this paper. Hiroshi Matsuzoe is partially supported by The Ministry of Education, Culture, Sports, Science and Technology (MEXT) Grants-in-Aid for Scientific Research (KAKENHI) Grant Numbers 23740047, 26108003 and 15K04842. Tatsuaki Wada is partially supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 25400188.

Author Contributions

This work has been conceived of and prepared by both authors. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Naudts, J. Generalised Thermostatistics; Springer: London, UK, 2011. [Google Scholar]
  2. Fujimoto, Y.; Murata, N. A generalization of independence in naive Bayes model. In Intelligent Data Engineering and Automated Learning—IDEAL 2010, Proceedings of 11th International Conference on Intelligent Data Engineering and Automated Learning, Paisley, UK, 1–3 September 2010; Lecture Notes in Computer Science. Volume 6283, pp. 153–161.
  3. Fujimoto, Y.; Murata, N. A generalisation of independence in statistical models for categorical distribution. Int. J. Data Min. Model. Manag. 2012, 4, 172–187. [Google Scholar] [CrossRef]
  4. Borges, E.P. A possible deformed algebra and calculus inspired in nonextensive thermostatistics. Physica A 2004, 340, 95–101. [Google Scholar] [CrossRef]
  5. Kaniadakis, G. Non-linear kinetics underlying generalized statistics. Physica A 2001, 296, 405–425. [Google Scholar] [CrossRef]
  6. Kaniadakis, G. Theoretical foundations and mathematical formalism of the power-law tailed statistical distributions. Entropy 2013, 15, 3983–4010. [Google Scholar] [CrossRef]
  7. Scarfone, A.M. Entropic forms and related algebras. Entropy 2013, 15, 624–649. [Google Scholar] [CrossRef]
  8. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009. [Google Scholar]
  9. Amari, S.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA; Oxford University Press: Oxford, UK, 2000. [Google Scholar]
  10. Amari, S.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometry. Physica A 2012, 391, 4308–4319. [Google Scholar] [CrossRef]
  11. Matsuzoe, H.; Henmi, M. Hessian structures and divergence functions on deformed exponential families. In Geometric Theory of Information, Signals and Communication Technology; Nielsen, F., Ed.; Springer: Basel, Switzerland, 2014; pp. 57–80. [Google Scholar]
  12. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A 2007, 370, 184–193. [Google Scholar] [CrossRef]
  13. Ohara, A. Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation. Eur. Phys. J. B. 2009, 70, 15–28. [Google Scholar] [CrossRef]
  14. Matsuzoe, H.; Henmi, M. Hessian structures on deformed exponential families. In Geometric Science of Information, Proceedings of First International Conference on Geometric Science of Information (GSI 2013), Paris, France, 28–30 August 2013; Lecture Notes in Computer Science. Volume 8085, pp. 275–282.
  15. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 2004, 5, 102. [Google Scholar]
  16. Tanaka, M. Meaning of an escort distribution and τ-transformation. J. Phys. Conf. Ser. 2010, 201, 012007. [Google Scholar] [CrossRef]
  17. Suyari, H. Fundamental Mathematics for Complex Systems; Makinoshoten: Tokyo, Japan, 2010. (in Japanese) [Google Scholar]
  18. Wilk, G.; Wlodarczyk, Z. Tsallis distribution with complex nonextensivity parameter q. Physica A 2014, 413, 53–58. [Google Scholar] [CrossRef]
  19. Lauritzen, S.L. Extremal Families and Systems of Sufficient Statistics; Lecture Notes in Statistics; Volume 49, Springer: New York, NY, USA, 1988. [Google Scholar]
  20. Pistone, G. κ-exponential models from the geometrical viewpoint. Eur. Phys. J. B 2009, 70, 29–37. [Google Scholar] [CrossRef]
  21. Murata, N.; Takenouchi, T.; Kanamori, T.; Eguchi, S. Information geometry of U-boost and Bregman divergence. Neural Comput. 2004, 16, 1437–1481. [Google Scholar] [CrossRef] [PubMed]
  22. Ohara, A.; Wada, T. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A 2010, 43, 035002. [Google Scholar] [CrossRef]
  23. Matsuzoe, H. Statistical manifolds and geometry of estimating functions. In Prospects of Differential Geometry and Its Related Fields, Proceedings of the 3rd International Colloquium on Differential Geometry and Its Related Fields; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific: Hackensack, NJ, USA, 2013; pp. 187–202. [Google Scholar]
  24. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559. [Google Scholar] [CrossRef]
  25. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195. [Google Scholar] [CrossRef] [PubMed]
  26. Harsha, K.V.; Subrahamanian Moosath, K.S. Dually flat geometries of the deformed exponential family. Physica A 2015, 433, 136–147. [Google Scholar]
  27. Zhang, J. A note on monotone embedding in information geometry. Entropy 2015, 17, 4485–4499. [Google Scholar] [CrossRef]
  28. Shima, H. The Geometry of Hessian Structures. World Scientific: Hackensack, NJ, 2007. [Google Scholar]
  29. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tôhoku Math. J. 1994, 46, 427–433. [Google Scholar] [CrossRef]
  30. Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and Its Related Fields, Proceedings of the 2nd International Colloquium on Differential Geometry and Its Related Fields; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific: Hackensack, NJ, USA, 2011; pp. 55–71. [Google Scholar]
  31. Matsuzoe, H. Hessian structures on deformed exponential families and their conformal structures. Differ. Geom. Appl. 2014, 35 Supplement. 323–333. [Google Scholar] [CrossRef]
  32. Sakamoto, M.; Matsuzoe, H. A generalization of independence and multivariate Student’s t-distributions. To appear in Lecture Notes in Comput. Sci.
  33. Takatsu, A. Behaviors of φ-exponential distributions in Wasserstein geometry and an evolution equation. SIAM J. Math. Anal. 2013, 45, 2546–2556. [Google Scholar]
  34. Harsha, K.V.; Subrahamanian Moosath, K.S. Geometry of F-likelihood estimators and F-Max-Ent theorem. AIP Conf. Proc. 2015, 1641, 263–270. [Google Scholar]
  35. Suyari, H.; Tsukada, M. Law of error in Tsallis statistics. IEEE Trans. Inf. Theory 2005, 51, 753–757. [Google Scholar] [CrossRef]
  36. Wada, T.; Suyari, H. κ-generalization of Gauss’s law of error. Phys. Lett. A 2006, 348, 89–93. [Google Scholar] [CrossRef]
