Article

Qualitative Properties of Randomized Maximum Entropy Estimates of Probability Density Functions

by Yuri S. Popkov 1,2,3
1 Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, 119333 Moscow, Russia
2 Institute of Control Sciences of Russian Academy of Sciences, 117997 Moscow, Russia
3 Department of Software Engineering, ORT Braude College, 2161002 Karmiel, Israel
Submission received: 28 January 2021 / Revised: 24 February 2021 / Accepted: 2 March 2021 / Published: 5 March 2021
(This article belongs to the Special Issue Control, Optimization, and Mathematical Modeling of Complex Systems)

Abstract: The problem of randomized maximum entropy estimation of the probability density function of random model parameters from real data with measurement noises is formulated. This estimation procedure maximizes an information entropy functional on a set of integral equalities depending on the real data set. A technique based on Gâteaux derivatives is developed to solve this problem in analytical form. The probability density function estimates depend on Lagrange multipliers, which are obtained by balancing the model's output with the real data. A global theorem on the implicit dependence of these Lagrange multipliers on the data sample's length is established using the rotation of homotopic vector fields. A theorem on the asymptotic efficiency of the randomized maximum entropy estimates in terms of stationary Lagrange multipliers is formulated and proved. The proposed method is illustrated on the problem of forecasting the evolution of the thermokarst lake area in Western Siberia.

1. Introduction

Estimating the characteristics of models is a widespread and, at the same time, important problem in science. It arises in applications with unknown parameters that have to be estimated from real data sets. In particular, such problems have turned out to be fundamental in machine learning procedures [1,2,3,4,5]. The core of these procedures is a parametrized model trained by statistically estimating the unknown parameters based on real data. Most econometric problems associated with reconstructing functional relations and forecasting also reduce to estimating model parameters; for example, see [6,7].
The problems described above are solved using traditional mathematical statistics methods, such as the maximum likelihood method and its derivatives, the method of moments, Bayesian methods, and their numerous modifications [8,9].
Among the aforementioned mathematical tools for parametric estimation, a special place is occupied by entropy maximization methods for finite-dimensional probability distributions [10,11].
Consider a random variable $x$ taking discrete values $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$, respectively, and $r$ functions $f_1(x), \ldots, f_r(x)$ of this variable with discrete values. The discrete probability distribution $p = \{p_1, \ldots, p_n\}$ is defined as the solution of the problem
$$H(p) = -\sum_{i=1}^{n} p_i \ln p_i \to \max, \qquad \sum_{i=1}^{n} p_i f_k(x_i) = q_k, \quad k = 1, \ldots, r,$$
where $q_1, \ldots, q_r$ are given constants.
If $f_k(x_i) \equiv x_i^k$, then the system of equalities specifies constraints on the $k$th moments of the discrete random variable $x$. In the case of equality constraints, some modifications of this problem adapted to different applications were studied in [10,11,12,13]. Since this problem is conditionally extremal, it can be solved using the Lagrange method, which leads to a system of equations for the Lagrange multipliers. The latter often turn out to be substantially nonlinear functions, and hence rather sophisticated techniques are needed for their numerical calculation [14,15].
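To make this concrete, here is a minimal numerical sketch (not from the paper; the grid, moment functions, and moment values are illustrative) that recovers the entropy-optimal distribution $p_i \propto \exp(-\sum_k \lambda_k f_k(x_i))$ by solving the moment-balance system for the Lagrange multipliers with a standard root finder:

```python
# A minimal sketch: discrete maximum-entropy distribution under moment
# constraints, solved for the Lagrange multipliers with scipy's fsolve.
# The grid, moment functions f_k, and moments q_k below are illustrative.
import numpy as np
from scipy.optimize import fsolve

x = np.linspace(0.0, 1.0, 50)          # support points x_1, ..., x_n
f = np.stack([x, x**2])                # f_k(x_i) = x_i^k for k = 1, 2
q = np.array([0.4, 0.2])               # prescribed moments q_k

def p_of_lam(lam):
    # Entropy-optimal form: p_i proportional to exp(-sum_k lam_k f_k(x_i))
    w = np.exp(-f.T @ lam)
    return w / w.sum()

def moment_balance(lam):
    # Residuals of the constraints sum_i p_i f_k(x_i) = q_k
    return f @ p_of_lam(lam) - q

lam_star = fsolve(moment_balance, np.zeros(2))
print(f @ p_of_lam(lam_star))          # recovered moments, approximately q
```

The nonlinearity mentioned above is visible here: the multipliers enter the balance equations through the normalized exponential family, so a numerical root finder is the natural tool.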
In the case of inequality constraints, this problem belongs to the class of mathematical programming problems [16].
The entropy maximization principle is adopted to estimate the parameters of a priori distributions when constructing Bayesian estimates [17,18] or maximum likelihood estimates.
The parameters of probability distributions (continuous or discrete) can be estimated using various mathematical statistics methods, including entropy maximization. Their efficiency in hydrological problems was compared in [19]; entropy maximization apparently yields the best results in such problems due to the structure of hydrological data.
The problem of estimating model characteristics from real data was further developed in connection with the appearance of new machine learning methods called randomized machine learning (RML) [20]. These methods are based on models with random parameters, and the probability density functions of these parameters must be estimated. The estimation algorithm (RML algorithm) is formulated in terms of functional entropy-linear programming [21].
The original statement of this problem concerned the estimation of probability density functions (PDFs) in RML procedures. More recently, however, it has been considered in a more general context: maximizing entropy functionals to construct estimates of continuous probability density functions using real data (randomized maximum entropy (RME) estimation).
In this paper, the general RME estimation problem is formulated; its solutions, numerical algorithms, and the asymptotic properties of the solutions are studied. The theoretical results are illustrated by an important application—estimating the evolution of the thermokarst lake area in Western Siberia.

2. Statement of the RME Estimation Problem

Consider a scalar continuous function $\varphi(x, \theta)$ with parameters $\theta = \{\theta_1, \ldots, \theta_n\}$. Assume that this function is a characteristic of an object's model with an input $x$ and an output $\hat{y}$. Let $x^{(r)} = \{x[1], \ldots, x[r]\}$ and $y^{(r)} = \{y[1], \ldots, y[r]\}$ be given measurements at the times $t = 1, \ldots, r$. Note that the latter measurements are obtained with random vector errors $\xi = \{\xi[1], \ldots, \xi[r]\}$, which are generally different at different time points.
Thus, after r measurements, the model and observations are described by the equations
$$\hat{y} = \Gamma(x^{(r)}, \theta), \qquad \hat{v} = \hat{y} + \xi,$$
where the vector function $\Gamma(x^{(r)}, \theta)$ has the components $\varphi(x[t], \theta)$, $t = 1, \ldots, r$; $\hat{v}$ denotes the observed output of the model, containing measurement noises of the object's output.
Let us introduce a series of assumptions necessary for further considerations.
  • The random parameters satisfy $\theta \in \Theta \subset R^n$, $\Theta = [\theta^-, \theta^+]$, where $[\cdot, \cdot]$ is a vectorial segment in the space $R^n$ [22].
  • The PDF $P(\theta)$ of the parameters is continuously differentiable on its support $\Theta$.
  • The random noise satisfies $\xi \in \Xi \subset R^r$, where
    $$\Xi = \prod_{t=1}^{r} \Xi_t, \qquad \Xi_t = [\xi_t^-, \xi_t^+].$$
  • The PDF $Q(\xi)$ of the measurement noises is continuously differentiable on the support $\Xi$ and has the multiplicative structure
    $$Q(\xi) = \prod_{t=1}^{r} Q_t(\xi[t]).$$
The estimation problem is stated as follows: find the estimates of the PDFs $P^*(\theta)$ and $Q^*(\xi)$ that maximize the generalized information entropy functional
$$H[P(\theta), Q(\xi)] = -\int_{\Theta} P(\theta) \ln P(\theta)\, d\theta - \sum_{t=1}^{r} \int_{\Xi_t} Q_t(\xi[t]) \ln Q_t(\xi[t])\, d\xi[t] \to \max$$
subject to
—the normalization conditions of the PDFs given by
$$\int_{\Theta} P(\theta)\, d\theta = 1; \qquad \int_{\Xi_t} Q_t(\xi[t])\, d\xi[t] = 1, \quad t = 1, \ldots, r;$$
and
—the empirical balance conditions
$$\Phi[P(\theta), Q(\xi)] = y^{(r)}, \qquad \Phi[P(\theta), Q(\xi)] = \{\Phi_1[P(\theta), Q(\xi)], \ldots, \Phi_r[P(\theta), Q(\xi)]\},$$
$$\Phi_t[P(\theta), Q(\xi)] = \int_{\Theta} \varphi(x[t], \theta) P(\theta)\, d\theta + \int_{\Xi_t} \xi[t]\, Q_t(\xi[t])\, d\xi[t], \quad t = 1, \ldots, r,$$
where $y^{(r)} = \{y[1], \ldots, y[r]\}$ are the measured data on the object's output. We will refer to problem (4)–(6) as the RME estimation problem.
Problem (4)–(6) is of the Lyapunov type [23,24], as it involves an integral functional together with integral constraints.

3. Optimality Conditions

The optimality conditions in optimization problems of the Lyapunov type are formulated in terms of Lagrange multipliers. In addition, the Gâteaux derivatives of the problem’s functionals are used [25].
The Lagrange functional is defined by
$$L[P(\theta), Q(\xi), \mu, \eta, \lambda] = H[P(\theta), Q(\xi)] + \mu \left(1 - \int_{\Theta} P(\theta)\, d\theta\right) + \sum_{t=1}^{r} \eta_t \left(1 - \int_{\Xi_t} Q_t(\xi[t])\, d\xi[t]\right) + \sum_{t=1}^{r} \lambda_t \left(y[t] - \int_{\Theta} P(\theta)\, \varphi(x[t], \theta)\, d\theta - \int_{\Xi_t} \xi[t]\, Q_t(\xi[t])\, d\xi[t]\right).$$
Let us recall the technique for obtaining optimality conditions in terms of the Gâteaux derivatives [26].
The PDFs $P(\theta)$ and $Q_t(\xi[t])$, $t = 1, \ldots, r$, are continuously differentiable, i.e., belong to the class $C^1$. Choosing arbitrary functions $h(\theta)$ and $w_t(\xi[t])$, $t = 1, \ldots, r$, from this class, we represent the PDFs as
$$P(\theta) = P^*(\theta) + \alpha h(\theta); \qquad Q_t(\xi[t]) = Q_t^*(\xi[t]) + \beta_t w_t(\xi[t]), \quad t = 1, \ldots, r,$$
where the PDFs $P^*(\theta)$ and $Q_t^*(\xi[t])$ are the solutions of problem (4)–(6), and $\alpha$ and $\beta_1, \ldots, \beta_r$ are parameters.
Next, we substitute these representations of the PDFs into (7). If all functions from $C^1$ are assumed to be fixed, the Lagrange functional depends on the parameters $\alpha$ and $\beta_1, \ldots, \beta_r$. Then, the first-order optimality conditions for the functional (7) in terms of the Gâteaux derivative take the form
$$\left.\frac{\partial L}{\partial \alpha}\right|_{(\alpha, \beta) = 0} = 0, \qquad \left.\frac{\partial L}{\partial \beta_t}\right|_{(\alpha, \beta) = 0} = 0, \quad t = 1, \ldots, r.$$
These conditions lead to the following system of integral equations:
$$\int_{\Theta} h(\theta)\, \Omega(\theta)\, d\theta = 0, \qquad \int_{\Xi_t} w_t(\xi[t])\, \Upsilon_t(\xi[t])\, d\xi[t] = 0, \quad t = 1, \ldots, r,$$
which are satisfied for any functions $h(\theta)$ and $w_1(\xi[1]), \ldots, w_r(\xi[r])$ from $C^1$ if and only if
$$\Omega(\theta) = 0, \qquad \Upsilon_t(\xi[t]) = 0, \quad t = 1, \ldots, r.$$
The optimality conditions for problems (4)–(6) are given by
$$\Omega(\theta) = \ln P^*(\theta) + 1 - \mu + \sum_{t=1}^{r} \lambda_t \varphi(x[t], \theta) = 0,$$
$$\Upsilon_t(\xi[t]) = \ln Q_t^*(\xi[t]) + 1 - \eta_t + \lambda_t \xi[t] = 0, \quad t = 1, \ldots, r.$$
Hence, the entropy-optimal PDFs of the model parameters and measurement noises have the form
$$P^*(\theta \mid y^{(r)}, x^{(r)}) = \frac{\exp\left(-\sum_{j=1}^{r} \lambda_j(y^{(r)}, x^{(r)})\, \varphi(x[j], \theta)\right)}{\mathcal{P}(\lambda(y^{(r)}, x^{(r)}))},$$
$$Q_t^*(\xi[t] \mid y^{(r)}, x^{(r)}) = \frac{\exp\left(-\lambda_t(y^{(r)}, x^{(r)})\, \xi[t]\right)}{\mathcal{Q}_t(\lambda_t(y^{(r)}, x^{(r)}))}, \quad t = 1, \ldots, r,$$
where
$$\mathcal{P}(\lambda(y^{(r)}, x^{(r)})) = \int_{\Theta} \exp\left(-\sum_{j=1}^{r} \lambda_j(y^{(r)}, x^{(r)})\, \varphi(x[j], \theta)\right) d\theta,$$
$$\mathcal{Q}_t(\lambda_t(y^{(r)}, x^{(r)})) = \int_{\Xi_t} \exp\left(-\lambda_t(y^{(r)}, x^{(r)})\, \xi[t]\right) d\xi[t], \quad t = 1, \ldots, r.$$
Due to equalities (10) and (11), the entropy-optimal PDFs are parametrized by the Lagrange multipliers $\lambda_1, \ldots, \lambda_r$, which represent the solutions of the empirical balance equations
$$\frac{G_t(\lambda(y^{(r)}, x^{(r)}))}{\mathcal{P}(\lambda(y^{(r)}, x^{(r)}))} + \frac{E_t(\lambda_t(y^{(r)}, x^{(r)}))}{\mathcal{Q}_t(\lambda_t(y^{(r)}, x^{(r)}))} = y[t], \quad t = 1, \ldots, r,$$
where
$$G_t(\lambda(y^{(r)}, x^{(r)})) = \int_{\Theta} \varphi(x[t], \theta) \exp\left(-\sum_{j=1}^{r} \lambda_j(y^{(r)}, x^{(r)})\, \varphi(x[j], \theta)\right) d\theta,$$
$$E_t(\lambda_t(y^{(r)}, x^{(r)})) = \int_{\Xi_t} \xi[t] \exp\left(-\lambda_t(y^{(r)}, x^{(r)})\, \xi[t]\right) d\xi[t], \quad t = 1, \ldots, r.$$
The solution $\lambda^*(y^{(r)}, x^{(r)})$ of these equations depends on the sample $(y^{(r)}, x^{(r)})$ used for constructing the RME estimates of the PDFs.
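In practice, the balance equations (12) and (13) are solved numerically. The following sketch (with an assumed scalar parameter, a hypothetical linear characteristic $\varphi(x[t], \theta) = x[t]\,\theta$, and illustrative data, none of which come from the paper) combines one-dimensional quadrature for the integrals $\mathcal{P}$, $G_t$, $\mathcal{Q}_t$, $E_t$ with a root finder for $\lambda$:

```python
# A hypothetical numerical sketch of the empirical balance equations
# (12)-(13): quadrature for the integrals, fsolve for the multipliers.
# The model phi, the supports, and the data x, y are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

x = np.array([0.5, 1.0, 1.5])          # inputs x[1..r]
y = np.array([0.9, 1.4, 2.1])          # observed outputs y[1..r]
phi = lambda xt, th: xt * th           # assumed model characteristic
th_lo, th_hi = 0.0, 2.0                # parameter support Theta
xi_lo, xi_hi = -0.5, 0.5               # noise support Xi_t

def balance(lam):
    res = np.empty_like(lam)
    w = lambda th: np.exp(-sum(l * phi(xj, th) for l, xj in zip(lam, x)))
    P = quad(w, th_lo, th_hi)[0]                         # normalizer P(lambda)
    for t in range(len(lam)):
        G = quad(lambda th: phi(x[t], th) * w(th), th_lo, th_hi)[0]
        Q = quad(lambda xi: np.exp(-lam[t] * xi), xi_lo, xi_hi)[0]
        E = quad(lambda xi: xi * np.exp(-lam[t] * xi), xi_lo, xi_hi)[0]
        res[t] = G / P + E / Q - y[t]                    # G_t/P + E_t/Q_t - y[t]
    return res

lam_star = fsolve(balance, np.zeros(len(x)))
print(lam_star, balance(lam_star))     # residuals should be near zero
```

For a vector parameter $\theta$, the one-dimensional quadratures would be replaced by multidimensional integration, which is the main computational burden noted in Section 7.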

4. Existence of an Implicit Function

The second term in the balance Equations (12) and (13) is the mean value of the noise in measurement $t$. The noises and their characteristics are often assumed to be identical over the measurements:
$$\xi^- \leq \xi[t] \leq \xi^+, \quad t = 1, \ldots, r.$$
Therefore, the mean value of the noise is given by
$$\bar{\xi} = \frac{E_t(\lambda_t(y^{(r)}, x^{(r)}))}{\mathcal{Q}_t(\lambda_t(y^{(r)}, x^{(r)}))}, \qquad \xi^- \leq \bar{\xi} \leq \xi^+.$$
The balance equations can be written as
$$W_t(\lambda \mid \tilde{y}[t], x^{(r)}) = \int_{\Theta} \left(\varphi(x[t], \theta) - \tilde{y}[t]\right) \exp\left(-\sum_{j=1}^{r} \lambda_j(\tilde{y}^{(r)}, x^{(r)})\, \varphi(x[j], \theta)\right) d\theta = 0, \quad t = 1, \ldots, r,$$
where
$$\tilde{y}[t] = y[t] - \bar{\xi}, \qquad \tilde{y}^{(r)} = \{\tilde{y}[1], \ldots, \tilde{y}[r]\}.$$
In vector form, Equation (16) becomes
$$W(\lambda \mid \tilde{y}^{(r)}, x^{(r)}) = 0.$$
Equation (21) defines an implicit function $\lambda(\tilde{y}^{(r)}, x^{(r)})$. The existence and properties of this implicit function depend on the properties of the Jacobian matrix
$$J_\lambda(\lambda \mid \tilde{y}^{(r)}, x^{(r)}) = \left[\frac{\partial W_t}{\partial \lambda_i}\right]_{(t, i) = 1, \ldots, r},$$
which has the elements
$$\frac{\partial W_t}{\partial \lambda_i} = -\int_{\Theta} \left(\varphi(x[t], \theta) - \tilde{y}[t]\right) \varphi(x[i], \theta) \exp\left(-\sum_{j=1}^{r} \lambda_j \varphi(x[j], \theta)\right) d\theta.$$
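The nonsingularity condition on this Jacobian, used in the theorem below, can also be probed numerically. A small sketch, reusing the hypothetical `balance` map from the Section 3 example as a stand-in for $W$ (its Jacobian has the same structure):

```python
# Finite-difference approximation of the Jacobian of a balance map;
# a determinant bounded away from zero is numerical evidence for the
# nonsingularity condition. `F` is any vector map, e.g. `balance` above.
import numpy as np

def jacobian_fd(F, lam0, eps=1e-6):
    F0 = np.asarray(F(lam0))
    J = np.empty((F0.size, lam0.size))
    for i in range(lam0.size):
        d = np.zeros_like(lam0)
        d[i] = eps
        J[:, i] = (np.asarray(F(lam0 + d)) - F0) / eps   # column i
    return J

# Usage with the earlier sketch:
# J = jacobian_fd(balance, np.zeros(3)); print(np.linalg.det(J))
```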
Theorem 1.
Let the following conditions hold:
  • The function $\varphi(x^{(r)}, \theta)$ is continuous in all variables.
  • For any $(x^{(r)}, \tilde{y}^{(r)}) \in R^r \times R^r$,
    $$\det J_\lambda(\lambda \mid \tilde{y}^{(r)}, x^{(r)}) \neq 0,$$
    $$\lim_{\|\lambda\| \to \infty} W(\lambda \mid \tilde{y}^{(r)}, x^{(r)}) = \pm\infty.$$
Then, there exists a unique implicit function $\lambda(\tilde{y}^{(r)}, x^{(r)})$ defined on $R^r \times R^r$.
Proof of Theorem 1. 
Due to the first assumption, the continuous function $W(\lambda \mid \tilde{y}^{(r)}, x^{(r)})$ induces the vector field $\Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) = W(\lambda \mid \tilde{y}^{(r)}, x^{(r)})$ in the space $R^r \times R^r$.
We choose an arbitrary vector $u$ in $R^r$ and define the vector field
$$\Pi_u(\lambda) = \Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) - u.$$
By condition (22), the field $\Pi_u(\lambda)$ with a fixed vector $u$ has no zeros on the spheres $\|\lambda\| = \varrho$ of a sufficiently large radius $\varrho$. Hence, a rotation of the field is well defined on these spheres; for details, see [27].
Consider the two vector fields
$$\Pi_{u^{(1)}}(\lambda) = \Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) - u^{(1)}, \qquad \Pi_{u^{(2)}}(\lambda) = \Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) - u^{(2)}.$$
These vector fields are homotopic on the spheres of a sufficiently large radius, i.e., the field
$$\Omega(\lambda) = \alpha \Pi_{u^{(1)}}(\lambda) + (1 - \alpha) \Pi_{u^{(2)}}(\lambda) = \Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) - [\alpha u^{(1)} + (1 - \alpha) u^{(2)}]$$
has no zeros on the spheres of a sufficiently large radius for any $\alpha \in [0, 1]$. Homotopic fields have identical rotations [27]:
$$\gamma(\Pi_{u^{(1)}}(\lambda)) = \gamma(\Pi_{u^{(2)}}(\lambda)).$$
The vector fields $\Pi_{u^{(1)}}(\lambda)$ and $\Pi_{u^{(2)}}(\lambda)$ are nondegenerate on the spheres of a sufficiently large radius; in the ball $\|\lambda\| \leq \varrho_1 < \varrho$, however, each of them may have a number of singular points. We denote by $\kappa(u^{(1)})$ and $\kappa(u^{(2)})$ the numbers of singular points of the vector fields $\Pi_{u^{(1)}}(\lambda)$ and $\Pi_{u^{(2)}}(\lambda)$, respectively. As the vector fields are homotopic,
$$\kappa(u^{(1)}) = \kappa(u^{(2)}) = \kappa.$$
In view of (21), these singular points are isolated.
Now, let us utilize the index of a singular point introduced in [27]:
$$\operatorname{ind}(\lambda^0) = (-1)^{\beta(\lambda^0)},$$
where $\beta(\lambda^0)$ is the number of eigenvalues of the matrix $\Pi_u'(\lambda^0) = J_\lambda(\lambda^0 \mid \tilde{y}^{(r)}, x^{(r)})$ with negative real parts. By definition, the value of this index depends not on the magnitude of $\beta(\lambda^0)$ but on its parity. Due to condition (21), all singular points have the same parity. Indeed, $\det J_\lambda(\lambda^0 \mid \tilde{y}^{(r)}, x^{(r)}) \neq 0$, and hence, for any $(\tilde{y}^{(r)}, x^{(r)}) \in R^r \times R^r$, the eigenvalues of the matrix $J_\lambda(\lambda^0 \mid \tilde{y}^{(r)}, x^{(r)})$ may move from the left half-plane to the right one in pairs only: real eigenvalues are transformed into pairs of complex-conjugate ones before crossing the imaginary axis.
In view of this fact, the rotation of the homotopic fields (20) is given by
$$\gamma(\Pi_u) = \kappa (-1)^{\beta},$$
where $\beta$ is the number of eigenvalues with negative real parts of the matrix $\Pi_u'(\lambda)$ at some singular point.
It remains to demonstrate that the vector field $\Pi_u(\lambda)$ has a unique singular point in the ball $\|\lambda\| \leq \varrho_1 < \varrho$. Consider the equation
$$\Pi_u(\lambda) = \Phi_{(\tilde{y}^{(r)}, x^{(r)})}(\lambda) - u = 0.$$
Assume that for each fixed pair $(\tilde{y}^{(r)}, x^{(r)})$, this equation has $\kappa$ singular points, i.e., the functions $\lambda^{(1)}(\tilde{y}^{(r)}, x^{(r)}), \ldots, \lambda^{(\kappa)}(\tilde{y}^{(r)}, x^{(r)})$. Therefore, it defines a multivalued function $\lambda(\tilde{y}^{(r)}, x^{(r)})$ whose $\kappa$ branches are isolated (the latter property follows from the isolation of the singular points). Due to condition (21), each of the branches $\lambda^{(i)}(\tilde{y}^{(r)}, x^{(r)})$ defines an open set in the space $R^r$, and
$$\bigcup_{i=1}^{\kappa} \lambda^{(i)}(\tilde{y}^{(r)}, x^{(r)}) = R^r.$$
This is possible if and only if $\kappa = 1$. Hence, for each pair $(\tilde{y}^{(r)}, x^{(r)})$ from $R^r \times R^r$, there exists a unique function $\lambda^*(\tilde{y}^{(r)}, x^{(r)})$ at which the function $W(\lambda \mid \tilde{y}^{(r)}, x^{(r)})$ vanishes. □
Theorem 2.
Under the assumptions of Theorem 1, the function $\lambda(\tilde{y}^{(r)}, x^{(r)})$ is real-analytic in all variables.
Proof of Theorem 2. 
From (15), it follows that the function $W(\lambda \mid \tilde{y}^{(r)}, x^{(r)})$ is analytic in all variables. Therefore, the left-hand side of Equation (15) can be expanded into a generalized Taylor series [26], and the solution can be constructed in the form of a generalized Taylor series as well. The power terms of this series are determined by a recursive procedure. □

5. Asymptotic Efficiency of RME Estimates

The RME estimate yields the entropy-optimal PDFs (10) for the arrays of input and output data, each of size $r$. For convenience, consider the PDFs parametrized by the exponential Lagrange multipliers $z = \exp(-\lambda)$. Then, equalities (10) take the form
$$P^*(\theta, z(y^{(r)}, x^{(r)})) = \frac{\prod_{j=1}^{r} [z_j(y^{(r)}, x^{(r)})]^{\varphi(x[j], \theta)}}{\int_{\Theta} \prod_{j=1}^{r} [z_j(y^{(r)}, x^{(r)})]^{\varphi(x[j], \theta)}\, d\theta},$$
$$Q_t^*(\xi[t], z_t(y^{(r)}, x^{(r)})) = \frac{[z_t(y^{(r)}, x^{(r)})]^{\xi[t]}}{\int_{\Xi_t} [z_t(y^{(r)}, x^{(r)})]^{\xi[t]}\, d\xi[t]}, \quad t = 1, \ldots, r.$$
Consequently, the structure of the PDF significantly depends on the values of the exponential Lagrange multipliers z , which, in turn, depend on the data arrays y ( r ) and x ( r ) .
Definition 1.
The estimates $P^*(\theta, z^*)$ and $Q_t^*(\xi[t], z_t^*)$ are said to be asymptotically efficient if
$$\lim_{r \to \infty} P^*(\theta, z(y^{(r)}, x^{(r)})) = P^*(\theta, z^*), \qquad \lim_{r \to \infty} Q_t^*(\xi[t], z_t(y^{(r)}, x^{(r)})) = Q_t^*(\xi[t], z_t^*), \quad t = 1, \ldots, r;$$
where
$$z^* = \lim_{r \to \infty} z(y^{(r)}, x^{(r)}).$$
Consider the empirical balance Equation (21), written in terms of the exponential Lagrange multipliers:
$$\Phi_t(z, \tilde{y}^{(r)}, x^{(r)}) = \int_{\Theta} \prod_{j=1}^{r} [z_j(\tilde{y}^{(r)}, x^{(r)})]^{\varphi(x[j], \theta)} \left(\varphi(x[t], \theta) - \tilde{y}[t]\right) d\theta = 0, \quad t = 1, \ldots, r.$$
As demonstrated above, Equation (26) defines an implicit analytic function $z = z(\tilde{y}^{(r)}, x^{(r)})$ for $(\tilde{y}^{(r)}, x^{(r)}) \in R^r \times R^r$.
Differentiating the left- and right-hand sides of these equations with respect to $\tilde{y}^{(r)}$ and $x^{(r)}$ yields
$$\frac{\partial z}{\partial \tilde{y}^{(r)}} = -\left[\frac{\partial \Phi}{\partial z}\right]^{-1} \frac{\partial \Phi}{\partial \tilde{y}^{(r)}}, \qquad \frac{\partial z}{\partial x^{(r)}} = -\left[\frac{\partial \Phi}{\partial z}\right]^{-1} \frac{\partial \Phi}{\partial x^{(r)}}.$$
Then, passing to the norms and using the inequality for the norm of a product of matrices [28], we obtain the inequalities
$$0 \leq \left\|\frac{\partial z}{\partial \tilde{y}^{(r)}}\right\| \leq \left\|\left[\frac{\partial \Phi}{\partial z}\right]^{-1}\right\| \left\|\frac{\partial \Phi}{\partial \tilde{y}^{(r)}}\right\|, \qquad 0 \leq \left\|\frac{\partial z}{\partial x^{(r)}}\right\| \leq \left\|\left[\frac{\partial \Phi}{\partial z}\right]^{-1}\right\| \left\|\frac{\partial \Phi}{\partial x^{(r)}}\right\|.$$
Both inequalities incorporate the norm of the inverse matrix $\left\|[\partial \Phi / \partial z]^{-1}\right\|$.
Lemma 1.
Let a square matrix $A$ be nonsingular, i.e., $\det A \neq 0$. Then, there exists a constant $\alpha > 1$ such that
$$\frac{1}{\|A\|} \leq \|A^{-1}\| \leq \frac{\alpha}{\|A\|}.$$
Proof of Lemma 1. 
Since the matrix $A$ is nondegenerate, the elements $a_{ik}^{(-1)}$ of the inverse matrix $A^{-1}$ can be expressed in terms of the algebraic complement (adjunct) $A_{ki}$ of the element $a_{ki}$ in the determinant of the matrix $A$ [28]:
$$a_{ik}^{(-1)} = \frac{A_{ki}}{\det A}, \quad (k, i) = 1, \ldots, r,$$
and they are bounded:
$$|a_{ik}^{(-1)}| \leq M < \infty, \qquad \|A^{-1}\| < \infty.$$
Hence, there exists a constant α > 1 for which inequality (29) is satisfied. □
Lemma 1 can be applied to the norm $\left\|[\partial \Phi / \partial z]^{-1}\right\|$ of the inverse matrix. As a result,
$$\frac{1}{\left\|\partial \Phi / \partial z\right\|} \leq \left\|\left[\frac{\partial \Phi}{\partial z}\right]^{-1}\right\| \leq \frac{\alpha}{\left\|\partial \Phi / \partial z\right\|},$$
where
$$\left\|\frac{\partial \Phi}{\partial z}\right\| = r \max_{t, j} \left|\frac{\partial \Phi_t}{\partial z_j}\right|.$$
Lemma 2.
Let
$$\left\|\frac{\partial \Phi}{\partial \tilde{y}^{(r)}}\right\| \leq \varrho < \infty, \qquad \left\|\frac{\partial \Phi}{\partial x^{(r)}}\right\| \leq \omega < \infty.$$
Then,
$$\lim_{r \to \infty} \left\|\frac{\partial z}{\partial \tilde{y}^{(r)}}\right\| = \lim_{r \to \infty} \left\|\frac{\partial z}{\partial x^{(r)}}\right\| = 0.$$
Proof of Lemma 2. 
According to (28), (31), and (32), we have
$$\left\|\frac{\partial z}{\partial \tilde{y}^{(r)}}\right\| \leq \frac{\alpha \varrho}{r b}, \qquad \left\|\frac{\partial z}{\partial x^{(r)}}\right\| \leq \frac{\alpha \omega}{r b},$$
where $b = \max_{t, j} \left|\frac{\partial \Phi_t}{\partial z_j}\right|$.
Whence it follows that, as the sample length $r \to \infty$, the norms of the relevant Jacobians tend to zero, and the function $z = z(\tilde{y}^{(r)}, x^{(r)})$ tends to the vector $z^*$ (25). □

6. Thermokarst Lake Area Evolution in Western Siberia: RME Estimation and Testing

Permafrost zones, which occupy a significant part of the Earth’s surface, are the locales of thermokarst lakes, which accumulate greenhouse gases (methane and carbon dioxide). These gases make a considerable contribution to global climate change.
The source data in studies of the evolution of thermokarst lake areas are acquired through remote sensing of the Earth’s surface and ground measurements of meteorological parameters [29,30].
The state of thermokarst lakes is characterized by their total area $S[t]$ in a given region, measured in hectares (ha), and by the factors influencing thermokarst formation: the average annual temperature $T[t]$, measured in degrees Celsius (°C), and the annual precipitation $R[t]$, measured in millimeters (mm), where $t$ denotes the calendar year.
We used the remote sensing data and ground measurements of the meteorological parameters for a region of Western Siberia between 65°N–70°N and 65°E–95°E that were presented in [31]. We divided the available time series into two groups, which formed the training collection $\mathcal{L}$ ($t = 0, \ldots, 24$) and the testing collection $\mathcal{T}$ ($t = 25, \ldots, 35$).

6.1. RME Estimation of Model Parameters and Measurement Noises

The temporal evolution of the lake area $S[t]$ is described by the following dynamic regression equation with two influencing factors, the average annual temperature $T[t]$ and the annual precipitation $R[t]$:
$$\hat{S}[t] = a_0 + \sum_{k=1}^{p} a_k \hat{S}[t - k] + a_{p+1} T[t] + a_{p+2} R[t], \qquad \hat{v}[t] = \hat{S}[t] + \xi[t].$$
The model parameters and measurement noises are assumed to be random and of the interval type:
$$a_k \in A_k = [a_k^-, a_k^+], \quad k = 0, \ldots, p + 2; \qquad a = \{a_0, \ldots, a_{p+2}\} \in A = \prod_{k=0}^{p+2} A_k.$$
The probabilistic properties of the parameters are characterized by a PDF $P(a)$.
The variable $\hat{v}[t]$ is the observed output of the model, and the values of the random measurement noise $\xi[t]$ at different time instants $t$ may belong to different ranges:
$$\xi[t] \in \Xi_t = [\xi^-[t], \xi^+[t]],$$
with a PDF $Q_t(\xi[t])$, $t = 0, \ldots, N$, where $N$ denotes the length of the observation interval. The order $p = 4$ and the parameter ranges for the dynamic randomized regression model (34) (see Table 1 below) were calculated from the real data using the empirical correlation functions and the least-squares estimates of the residual variances.
For the training collection $\mathcal{L}$, the model can be written in the vector–matrix form
$$\hat{S} = \mathcal{S} a + a_5 T + a_6 R, \qquad \hat{v} = \hat{S} + \xi,$$
with the matrix (denoted here by $\mathcal{S}$ to distinguish it from the vector $\hat{S}$; $a = \{a_0, \ldots, a_4\}$)
$$\mathcal{S} = \begin{bmatrix} 1 & \hat{S}[3] & \cdots & \hat{S}[0] \\ 1 & \hat{S}[4] & \cdots & \hat{S}[1] \\ \vdots & \vdots & & \vdots \\ 1 & \hat{S}[23] & \cdots & \hat{S}[20] \end{bmatrix}$$
and the vectors $\hat{S} = [\hat{S}[4], \ldots, \hat{S}[24]]$, $T = [T[4], \ldots, T[24]]$, $R = [R[4], \ldots, R[24]]$, $\hat{v} = [\hat{v}[4], \ldots, \hat{v}[24]]$, and $\xi = [\xi[4], \ldots, \xi[24]]$.
The RME estimation procedure yielded the following entropy-optimal PDFs of the model parameters (36) and measurement noises:
$$P^*(a, \lambda) = \prod_{k=0}^{6} \frac{\exp(-q_k a_k)}{\mathcal{P}_k(\lambda)}, \qquad \mathcal{P}_k(\lambda) = \int_{A_k} \exp(-q_k a_k)\, da_k,$$
$$q_0 = \sum_{t=4}^{24} \lambda_t, \qquad q_k = \sum_{t=4}^{24} \lambda_t S[t - k], \quad k = 1, \ldots, 4, \qquad q_5 = \sum_{t=4}^{24} \lambda_t T[t], \qquad q_6 = \sum_{t=4}^{24} \lambda_t R[t],$$
$$Q^*(\xi, \bar{\lambda}) = \frac{\exp(-\bar{\lambda} \xi)}{\mathcal{Q}}, \qquad \mathcal{Q} = \int_{\Xi} \exp(-\bar{\lambda} \xi)\, d\xi, \qquad \bar{\lambda} = \frac{q_0}{20}.$$
Note that $S[t - k]$, $T[t]$, and $R[t]$ are the data from the collection $\mathcal{L}$. The two-dimensional sections of the function $P^*(a)$ and the function $Q^*(\xi)$ are shown in Figure 1.
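Each factor $\exp(-q_k a_k)/\mathcal{P}_k(\lambda)$ is a truncated exponential density on the segment $A_k$, so it can be sampled exactly by inverting its CDF. A minimal sketch (the value of $q$ and the bounds are illustrative, not values computed in the paper):

```python
# A minimal inverse-CDF sampler for a truncated-exponential density
# p(a) proportional to exp(-q*a) on [a_lo, a_hi]; q and the bounds
# below are illustrative assumptions.
import numpy as np

def sample_trunc_exp(q, a_lo, a_hi, size, rng):
    u = rng.random(size)
    if abs(q) < 1e-12:                  # q ~ 0: the density degenerates to uniform
        return a_lo + u * (a_hi - a_lo)
    # CDF: F(a) = (exp(-q*a_lo) - exp(-q*a)) / (exp(-q*a_lo) - exp(-q*a_hi))
    e_lo, e_hi = np.exp(-q * a_lo), np.exp(-q * a_hi)
    return -np.log(e_lo - u * (e_lo - e_hi)) / q

rng = np.random.default_rng(1)
draws = sample_trunc_exp(q=2.0, a_lo=-0.5, a_hi=0.5, size=10_000, rng=rng)
print(draws.mean())                     # empirical mean of the sampled parameter
```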

6.2. Testing

Testing was performed using the data from the collection $\mathcal{T}$, which included the lake area $S[t]$, the average annual temperature $T[t]$, and the annual precipitation $R[t]$, $t = 25, \ldots, 35$. An ensemble of trajectories of the model's observed output $v[t]$ was generated via Monte Carlo simulation by sampling the entropy-optimal PDFs $P^*(a)$ and $Q^*(\xi)$ on the testing interval. In addition, the trajectory of the empirical means $\bar{v}[t]$ and the boundaries of the empirical standard deviation area were calculated.
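A hypothetical sketch of this testing step follows. For brevity, the parameters are drawn uniformly from the ranges in Table 1 and the noises uniformly from a symmetric interval; in the actual procedure, the entropy-optimal PDFs $P^*(a)$ and $Q^*(\xi)$ would be sampled instead (e.g., with the inverse-CDF sampler sketched above). The series $S$, $T$, $R$ are placeholders:

```python
# A hypothetical Monte Carlo testing sketch for the model (34) with p = 4.
# Placeholder data; uniform draws stand in for sampling P*(a) and Q*(xi).
import numpy as np

rng = np.random.default_rng(2)
p = 4
S_train = np.linspace(100.0, 110.0, 25)   # placeholder training lake areas
T_test = rng.normal(-5.0, 1.0, 11)        # placeholder testing temperatures
R_test = rng.normal(400.0, 30.0, 11)      # placeholder testing precipitation

def simulate(a, xi):
    S = list(S_train[-p:])                # last p training values seed the lags
    out = []
    for t in range(len(T_test)):
        s = a[0] + sum(a[k] * S[-k] for k in range(1, p + 1)) \
            + a[p + 1] * T_test[t] + a[p + 2] * R_test[t]
        S.append(s)
        out.append(s + xi[t])             # observed output v = S_hat + xi
    return np.array(out)

a_lo = np.array([-0.50, -0.14, -0.49, -0.53, -0.44, 0.46, 0.19])  # Table 1
a_hi = np.array([0.07, 0.52, 0.20, 0.19, 0.19, 1.14, 0.88])       # Table 1
ens = np.array([simulate(a_lo + rng.random(7) * (a_hi - a_lo),
                         rng.uniform(-0.5, 0.5, len(T_test)))
                for _ in range(500)])
v_bar = ens.mean(axis=0)                  # empirical mean trajectory
v_std = ens.std(axis=0)                   # standard deviation band
```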
The quality of RME estimation was characterized by the absolute and relative errors:
$$AbsErr = \sum_{t=26}^{35} \left(S[t] - \bar{v}[t]\right)^2 = 0.3446,$$
$$RelErr = \frac{\sum_{t=26}^{35} \left(S[t] - \bar{v}[t]\right)^2}{\sum_{t=26}^{35} S^2[t] + \sum_{t=26}^{35} \bar{v}^2[t]} = 0.0089.$$
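Continuing the sketch above (with a placeholder array `S_test` standing in for the measured lake areas on the testing interval), the two errors are computed directly from their definitions:

```python
# Error metrics from the definitions above; S_test is a placeholder for
# the measured testing data, v_bar is the ensemble mean from the sketch.
S_test = 105.0 + rng.normal(0.0, 0.5, len(T_test))  # placeholder measurements
abs_err = np.sum((S_test - v_bar) ** 2)
rel_err = abs_err / (np.sum(S_test ** 2) + np.sum(v_bar ** 2))
print(abs_err, rel_err)
```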
The generated ensemble of the trajectories is shown in Figure 2.

7. Discussion

Given an available data collection, the RME procedure estimates the PDFs of a model's random parameters and measurement noises corresponding to maximum uncertainty (maximum entropy). In addition, this procedure needs no assumptions about the structure of the estimated PDFs or the statistical properties of the data and measurement noises.
An entropy-optimal model can be simulated by sampling the PDFs to generate an empirical ensemble of a model’s output trajectories and to calculate its empirical characteristics (the mean and median trajectories, the standard deviation area, interquartile sets, and others).
The RME procedure was illustrated with an example of the estimation of the parameters of a linear regression model for the evolution of the thermokarst lake area in Western Siberia. In this example, the procedure demonstrated a good estimation accuracy.
However, these positive features of the procedure come at a computational cost. Despite their analytical structure, the RME estimates of the PDFs depend on Lagrange multipliers, which are determined by solving the balance equations with so-called integral components (the mathematical expectations of the random parameters and measurement noises). Calculating the values of these multidimensional integrals may require considerable computing resources.

8. Conclusions

The problem of randomized maximum entropy estimation of a probability density function based on real available data has been formulated and solved. The developed estimation algorithm (the RME algorithm) finds the conditional maximum of an information entropy functional on a set of admissible probability density functions characterized by the empirical balance equations for the Lagrange multipliers. These equations define an implicit dependence of the Lagrange multipliers on the data collection. The existence of such an implicit function for any values in a data collection has been established. The function's behavior as the data collection grows has been studied, and the asymptotic efficiency of the RME estimates has been proved.
The positive features of RME estimates have been illustrated with an example of estimating and testing a linear dynamic regression model of the evolution of the thermokarst lake area in Western Siberia on real data.

Funding

This research was funded by the Ministry of Science and Higher Education of the Russian Federation, project no. 075-15-2020-799.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Vapnik, V.N. Statistical Learning Theory; Wiley: New York, NY, USA, 1998.
  2. Witten, I.H.; Eibe, F. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Heidelberg, Germany, 2005.
  3. Bishop, C.M. Pattern Recognition and Machine Learning; Information Science and Statistics; Springer: New York, NY, USA, 2006.
  4. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2001.
  5. Vorontsov, K.V. Mathematical Methods of Learning by Precedents: A Course of Lectures; Moscow Institute of Physics and Technology: Moscow, Russia, 2013.
  6. Goldberger, A.S. A Course in Econometrics; Harvard University Press: Cambridge, MA, USA, 1991.
  7. Aivazyan, S.A.; Enyukov, I.S.; Meshalkin, L.D. Prikladnaya Statistika: Issledovanie Zavisimostei (Applied Statistics: Study of Dependencies); Finansy i Statistika: Moscow, Russia, 1985.
  8. Lagutin, M.B. Naglyadnaya Matematicheskaya Statistika (Visual Mathematical Statistics); BINOM, Laboratoriya Znanii: Moscow, Russia, 2013.
  9. Roussas, G. A Course in Mathematical Statistics; Academic Press: San Diego, CA, USA, 2015.
  10. Malouf, R. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th Conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan, 31 August–1 September 2002; Volume 20, pp. 1–7.
  11. Borwein, J.; Choksi, R.; Maréchal, P. Probability distributions of assets inferred from option prices via the principle of maximum entropy. SIAM J. Optim. 2003, 14, 464–478.
  12. Golan, A.; Judge, G.; Miller, D. Maximum Entropy Econometrics: Robust Estimation with Limited Data; John Wiley & Sons: New York, NY, USA, 1997.
  13. Golan, A. Information and entropy econometrics—A review and synthesis. Found. Trends Econom. 2008, 2, 1–145.
  14. Csiszár, I.; Matúš, F. On minimization of entropy functionals under moment constraints. In Proceedings of the IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008.
  15. Loubes, J.-M. Approximate maximum entropy on the mean for instrumental variable regression. Stat. Probab. Lett. 2012, 82, 972–978.
  16. Borwein, J.M.; Lewis, A.S. Partially-finite programming in L1 and the existence of maximum entropy estimates. SIAM J. Optim. 1993, 3, 248–267.
  17. Burg, J.P. The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics 1972, 37, 375–376.
  18. Christakos, G. A Bayesian/maximum entropy view to the spatial estimation problem. Math. Geol. 1990, 22, 763–777.
  19. Singh, V.P.; Guo, H. Parameter estimation for 3-parameter generalized Pareto distribution by the principle of maximum entropy. Hydrol. Sci. J. 1995, 40, 165–181.
  20. Popkov, Y.S.; Dubnov, Y.A.; Popkov, A.Y. Randomized machine learning: Statement, solution, applications. In Proceedings of the 2016 IEEE 8th International Conference on Intelligent Systems (IS), Sofia, Bulgaria, 4–6 September 2016.
  21. Popkov, A.Y.; Popkov, Y.S. New methods of entropy-robust estimation for randomized models under limited data. Entropy 2014, 16, 675–698.
  22. Krasnosel'skii, M.A.; Vainikko, G.M.; Zabreyko, R.P.; Ruticki, Y.B.; Stet'senko, V.V. Approximate Solutions of Operator Equations; Wolters-Noordhoff Publishing: Groningen, The Netherlands, 1972.
  23. Ioffe, A.D.; Tikhomirov, V.M. Theory of Extremal Problems; Elsevier: New York, NY, USA, 1974.
  24. Alekseev, V.M.; Tikhomirov, V.M.; Fomin, S.V. Optimal Control; Springer: Boston, MA, USA, 1987.
  25. Kaashoek, M.A.; van der Mee, C. Recent Advances in Operator Theory and Its Applications; Birkhäuser: Basel, Switzerland, 2006.
  26. Kolmogorov, A.N.; Fomin, S.V. Elements of the Theory of Functions and Functional Analysis; Dover Publications: New York, NY, USA, 1999.
  27. Krasnoselskii, M.A.; Zabreiko, P.P. Geometrical Methods of Nonlinear Analysis; Springer: Berlin, Germany; New York, NY, USA, 1984.
  28. Gantmacher, F.R.; Brenner, J.L. Applications of the Theory of Matrices; Dover: New York, NY, USA, 2005.
  29. Riordan, B.; Verbyla, D.; McGuire, A.D. Shrinking ponds in subarctic Alaska based on 1950–2002 remotely sensed images. J. Geophys. Res. 2006, 111, G04002.
  30. Kirpotin, S.; Polishchuk, Y.; Bryksina, N. Abrupt changes of thermokarst lakes in Western Siberia: Impacts of climatic warming on permafrost melting. Int. J. Environ. Stud. 2009, 66, 423–431.
  31. Western Siberia Thermokarsk Lakes Dataset. Available online: https://cloud.uriit.ru/index.php/s/0DOrxL9RmGqXsV0 (accessed on 20 February 2021).
Figure 1. Two-dimensional sections of the function P* and the function Q*.
Figure 2. Ensemble of the trajectories (gray domain), the standard deviation area (dark gray domain), the empirical mean trajectory, and the lake area data.
Table 1. Parameter ranges for the model.

        a_0     a_1     a_2     a_3     a_4     a_5     a_6
a^-   −0.50   −0.14   −0.49   −0.53   −0.44    0.46    0.19
a^+    0.07    0.52    0.20    0.19    0.19    1.14    0.88

