Article

Weyl Prior and Bayesian Statistics

1
Department of Mathematics, The University of British Columbia Okanagan, Kelowna, BC V1V 1V7, Canada
2
School of Mathematics and Statistics, Carleton University, Ottawa, ON K1S 5B6, Canada
*
Author to whom correspondence should be addressed.
Submission received: 16 February 2020 / Revised: 12 April 2020 / Accepted: 17 April 2020 / Published: 20 April 2020
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

When using Bayesian inference, one needs to choose a prior distribution for parameters. The well-known Jeffreys prior is based on the Riemannian metric tensor on a statistical manifold. Takeuchi and Amari defined the $\alpha$-parallel prior, which generalized the Jeffreys prior by exploiting a higher-order geometric object, known as a Chentsov–Amari tensor. In this paper, we propose a new prior based on the Weyl structure on a statistical manifold. It turns out that our prior is a special case of the $\alpha$-parallel prior with the parameter $\alpha$ equaling $-n$, where $n$ is the dimension of the underlying statistical manifold and the minus sign is a result of conventions used in the definition of $\alpha$-connections. This makes the choice for the parameter $\alpha$ more canonical. We calculated the Weyl prior for the univariate and multivariate Gaussian distributions. The Weyl prior of the univariate Gaussian turns out to be the uniform prior.

1. Introduction

In Bayesian inference, a parameter is regarded as a random variable $\Theta$. A density of $\Theta$ is called, by abuse of terminology, a prior distribution $p(\theta)$. After collecting some data, one obtains a conditional density $p(x \mid \theta)$, referred to as the likelihood function. Bayes’ theorem then yields the posterior distribution $p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$, which is interpreted as an update of the information about the unknown parameter $\Theta$. One choice of the prior distribution $p(\theta)$ is the Jeffreys prior ${}^{J}\omega$, which can be regarded as the appropriate notion of a uniform distribution on the parameter space. Here, the word “uniform” means uninformative, that is, not favoring any particular value of the parameter.
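As a minimal sketch of the update described above, the snippet below applies Bayes’ theorem on a discrete grid of parameter values. The Gaussian likelihood with known unit variance, the flat prior on the grid, the sample values, and the names `gaussian_likelihood` and `grid_posterior` are illustrative assumptions and do not come from the paper.

```python
import math

def gaussian_likelihood(x, theta, sigma=1.0):
    """Density p(x | theta) of a Gaussian with mean theta and std sigma."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def grid_posterior(thetas, prior, data):
    """Posterior weights on a grid: p(theta | x) is proportional to
    p(theta) times the product of p(x_i | theta) over the observations."""
    unnormalized = []
    for theta, p in zip(thetas, prior):
        likelihood = 1.0
        for x in data:
            likelihood *= gaussian_likelihood(x, theta)
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

thetas = [i / 10.0 for i in range(-50, 51)]    # grid on [-5, 5]
prior = [1.0 / len(thetas)] * len(thetas)       # flat prior on the grid (assumption)
data = [0.8, 1.1, 1.4]                           # hypothetical observations
posterior = grid_posterior(thetas, prior, data)

# Grid point with the largest posterior weight.
map_theta = thetas[max(range(len(thetas)), key=lambda i: posterior[i])]
```

With a flat prior the posterior mode sits at the grid point closest to the sample mean; replacing `prior` with any other weights on the same grid changes only the first factor in the product.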
Information geometry, in its narrowest sense, is an attempt to use differential geometry to study statistical inference. It has found applications in statistical inference, signal processing, and machine learning [1]. References [2,3] are two elementary introductions. In information geometry, geometric structures, for example metric tensors $g$ and affine connections $\nabla$, can be put on the set of prior distributions $P(\theta)$. These geometric structures help to single out some particular prior distributions, for example, the Jeffreys prior ${}^{J}\omega$, which, by the fundamental theorem of Riemannian geometry, is the unique volume form parallel with respect to the Levi–Civita connection ${}^{LC}\nabla$. Since the Jeffreys prior ${}^{J}\omega$ is provided by geometry, it is automatically invariant under reparametrization. This reflects the view that information can at best be preserved, never gained, under a transformation of parameters, which is encoded in the notion of sufficient statistics. Similarly, if one can find a unique prior distribution satisfying some specified geometric conditions, then that prior distribution is said to be canonically chosen. Matsuzoe, Takeuchi, and Amari used information geometry to define the $\alpha$-parallel prior ${}^{\alpha}\omega$, which reduces to the Jeffreys prior when $\alpha = 0$ [4].
Historically, Weyl proposed a generalization of general relativity to unify gravity and electromagnetism. Einstein soon pointed out that Weyl’s theory predicted a substantial spread in the characteristic length of atoms, which contradicts the sharp atomic spectra that are observed. Even though Weyl geometry failed to unify gravity and electromagnetism, which is still an open problem, it has found applications in possible generalizations of general relativity [5] and in the differential-geometric study of defects in continuum mechanics [6]. Weyl geometry also survives as a subject in mathematics. The relation between affine differential geometry, Weyl geometry, and Riemannian geometry is as follows. Let $\pi : E \to B$ be a fibre bundle with base $B$ and let each fibre $\pi^{-1}(x)$, $x \in B$, be a Lie group $G$, called the structure group of the fibre bundle. For different $G$, we obtain different geometries:
  • $GL(n)$, the general linear group: affine differential geometry;
  • $C(n) := \{ kA \mid k \in \mathbb{R}^{+},\ A \in O(n) \}$, the conformal group: Weyl geometry;
  • $O(n)$, the orthogonal group: Riemannian geometry.
The structure groups are nested, $O(n) \subset C(n) \subset GL(n)$, and the smaller the structure group, the more geometric properties are expected. In our case, the reduction of the structure group gives rise to a canonical choice for the parameter $\alpha$ of the $\alpha$-parallel prior. For more about bundle-theoretic differential geometry, see [7].
In this paper, we will use Weyl geometry to define a prior distribution for Bayesian inference, which we call the Weyl prior. We will elucidate the relation between the dimension of a statistical manifold and the parameter $\alpha$ in Takeuchi and Amari’s $\alpha$-parallel prior.
The organization of the paper is as follows: In Section 2, we review information geometry and the $\alpha$-parallel prior. We discuss Weyl geometry in Section 3. We define the Weyl prior, and elucidate the relation between the Weyl prior and the $\alpha$-parallel prior in Section 4. We calculate the Weyl prior for the univariate Gaussian distribution as an example in Section 5 and the multivariate Gaussian distribution in Section 6. All functions in the paper are real-valued and smooth, all connections are torsion-free, and the Einstein summation rule is used.

2. Information Geometry and α-Priors

In this section, we review some basics of information geometry. For more details, see [1].
Let us consider a statistical model $\mathcal{P}$, which is a set of parametric densities $\mathcal{P} = \{ p(x \mid \theta) \}$. $\mathcal{P}$ can be geometrized as follows. First, we introduce the Fisher metric tensor, which is a 2nd-order tensor.
Definition 1.
The Fisher metric tensor is defined by
$$ g_{ij} = E_{\theta}\left[ \partial_i l \, \partial_j l \right], $$
where $E_{\theta}$ denotes the expectation with respect to the density $p(x \mid \theta)$, viewed as a transition kernel $X \times \Theta \to [0, \infty)$, $l$ is the log-likelihood function, and $\partial_i$ is the partial derivative with respect to the $i$-th coordinate.
Then, we introduce the Amari–Chentsov tensor, which is a 3rd-order tensor.
Definition 2
(Amari–Chentsov Tensor). The Amari–Chentsov tensor $C$ is defined by
$$ C_{ijk} = E_{\theta}\left[ \partial_i l \, \partial_j l \, \partial_k l \right]. $$
Remark 1.
The Amari–Chentsov tensor $C$ defined above satisfies
$$ C = \nabla g. $$
In other words, $C$ is the covariant derivative of the metric tensor $g$.
In Riemannian geometry, $C$ vanishes everywhere, which is required if the length of a tangent vector is to be preserved under parallel transport. In information geometry, this requirement is dropped and thus a duality theory arises. Let $\nabla$ be an arbitrary torsion-free affine connection on a Riemannian manifold $(M, g)$. The dual connection $\nabla^{*}$ of $\nabla$ plays an important role in information geometry.
Definition 3
(Dual Connection). The dual connection $\nabla^{*}$ on a Riemannian manifold $(M, g)$ with affine connection $\nabla$ is defined as the unique affine connection satisfying the following equation:
$$ X\, g_p(Y, Z) = g_p(\nabla_X Y, Z) + g_p(Y, \nabla^{*}_X Z), $$
where $p \in M$ and $X, Y, Z \in T_p M$.
Remark 2.
The dual connection $\nabla^{*}$ preserves the metric tensor $g$ together with $\nabla$:
$$ g_p(X, Y) = g_q(\Pi X, \Pi^{*} Y), $$
where $X, Y \in T_p M$, and $\Pi$ and $\Pi^{*}$ are the parallel transports induced by $\nabla$ and $\nabla^{*}$, respectively, along some curve from $p$ to $q$. In general, $g_p(X, Y) \neq g_q(\Pi X, \Pi Y)$ and $g_p(X, Y) \neq g_q(\Pi^{*} X, \Pi^{*} Y)$, unless $\nabla = \nabla^{*} = {}^{LC}\nabla$. See [1].
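Now, we introduce $\alpha$-connections.
Definition 4.
The $\alpha$-connections ${}^{\alpha}\nabla$ are defined in terms of Christoffel symbols by
$$ {}^{\alpha}\Gamma^{i}_{jk} = {}^{LC}\Gamma^{i}_{jk} - \frac{\alpha}{2}\, g^{il} C_{ljk}, $$
where $\alpha \in \mathbb{R}$ and LC stands for Levi–Civita.
Remark 3.
The dual connection of ${}^{\alpha}\nabla$ is then given by
$$ {}^{\alpha}\nabla^{*} = {}^{-\alpha}\nabla. $$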
Remark 4.
The $\alpha$-parallel prior ${}^{\alpha}\omega$ is the volume form parallel with respect to ${}^{\alpha}\nabla$. Unlike the Jeffreys prior, which always exists, the $\alpha$-parallel prior does not necessarily exist. An $\alpha$-parallel prior exists if and only if the Ricci curvature tensor is symmetric [4]. However, if ${}^{\alpha}\omega$ exists for one $\alpha \in \mathbb{R}$, then it exists for all $\alpha$ [8].
The following characterization will be used in Section 4 to obtain the relation between the $\alpha$-parallel prior and the Weyl prior defined therein.
Proposition 1.
[4] Let $(M, g, {}^{\alpha}\nabla)$ be a statistical manifold. If there exists an exact 1-form $T = d\Omega$ for some function $\Omega$ determined by $\nabla$ and $g$, then the $\alpha$-parallel prior is ${}^{\alpha}\omega = \exp\left\{ -\frac{\alpha}{2}\Omega \right\} \sqrt{\det g}$.
Remark 5.
$d\Omega$ is known as the Chebyshev 1-form. A differential form $\phi$ is called closed if its exterior derivative vanishes, i.e., $d\phi = 0$, and is called exact if there exists a differential form $\varphi$ such that $\phi = d\varphi$. By definition, every exact form is closed. By Poincaré’s lemma, every closed form is locally exact. Because the statistical manifolds considered here are taken to be simply connected, closedness implies exactness.

3. Weyl Geometry

In this section, we review some concepts of Weyl geometry which are needed in the next section. For more details, see [9].
Two Riemannian metrics $g$ and $\tilde{g}$ on a manifold $M$ are said to be conformally equivalent if $\tilde{g} = e^{\lambda} g$ for some smooth function $\lambda$ on $M$.
A conformal structure $\mathcal{C}$ on $M$ is an equivalence class of conformally equivalent Riemannian metrics, i.e., $\mathcal{C} := \{ \tilde{g} \mid \tilde{g} = e^{\lambda} g \}$.
A Weyl structure is a map $F : \mathcal{C} \to \Lambda^{1}(M)$ from the conformal structure $\mathcal{C}$ to the set of 1-forms on $M$, satisfying
$$ F(e^{\lambda} g) = F(g) - d\lambda. $$
The image of $g$ under $F$ is called the Weyl 1-form, $\varphi := F(g)$.
A Weyl structure enables us to translate a scalar product $(\ ,\ )_p$ at $p$ to $(\ ,\ )_q$ at $q$ along a curve $c : [0, 1] \to M$:
$$ (\ ,\ )_q = \exp\left\{ \int_0^1 c^{*}\varphi \right\} g_q, $$
where $c^{*}\varphi$ is the pullback of the Weyl 1-form $\varphi$ along the curve $c$. A Weyl manifold is a manifold with a Weyl structure.
Remark 6.
The meaning of this equation is as follows. If we start with a scalar product $(\ ,\ )_p$ at a point $p$ arising from the conformal class $\mathcal{C}$, then there exists a metric tensor $g \in \mathcal{C}$ extending $(\ ,\ )_p$, i.e., $g_p = (\ ,\ )_p$. The value of this particular choice of $g$ at another point $q$ is $g_q$. However, different choices of $g$ give rise to different $g_q$. The scalar product $(\ ,\ )_q$ determined by Weyl translation is proven to be independent of $g$ [9]. Hence, by Weyl translation, we can compare lengths of vectors at different points on a Weyl manifold, whereas, with only the conformal structure $\mathcal{C}$, we can only compare ratios of lengths.
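The independence of the choice of representative metric can be illustrated numerically in the simplest possible setting. The sketch below is a toy example on the real line, where a metric is a positive function $g(x)$ and the Weyl 1-form is $a(x)\,dx$; the particular functions, the endpoints, and the helper names `simpson` and `weyl_translate` are assumptions made for illustration only. The conformal factor is chosen with $\lambda(p) = 0$ so that both representatives extend the same scalar product at $p$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0

def weyl_translate(g, a, p, q):
    """Coefficient of the translated scalar product at q:
    exp(integral of a(x) dx from p to q) * g(q)."""
    return math.exp(simpson(a, p, q)) * g(q)

p, q = 0.0, 1.5

def g(x):            # representative metric coefficient, g(x) > 0
    return 1.0 + x * x

def a(x):            # Weyl 1-form phi = a(x) dx attached to g
    return math.cos(x)

def lam(x):          # conformal exponent with lam(p) = 0
    return x * x

def dlam(x):         # derivative of lam
    return 2.0 * x

def g_tilde(x):      # conformally equivalent metric e^lam * g
    return math.exp(lam(x)) * g(x)

def a_tilde(x):      # transformed 1-form: F(e^lam g) = F(g) - d lam
    return a(x) - dlam(x)

via_g = weyl_translate(g, a, p, q)
via_g_tilde = weyl_translate(g_tilde, a_tilde, p, q)
```

Both computations return the same number up to quadrature error, even though `g(q)` and `g_tilde(q)` differ by the factor $e^{\lambda(q)}$.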
An affine connection ∇ is said to be a Weyl connection if the parallel transport of a scalar product under ∇ coincides with the Weyl translation.
The Weyl connection is characterized by the following propositions.
Proposition 2
([9]). An affine connection $\nabla$ is a Weyl connection if and only if $\nabla g + \varphi \otimes g = 0$ for all $g \in \mathcal{C}$.
Proposition 3
(Fundamental Theorem of Weyl Geometry [9]). There exists a unique torsion-free Weyl connection ${}^{W}\nabla$ on a Weyl manifold $M$. The Christoffel symbols of ${}^{W}\nabla$ are given by
$$ {}^{W}\Gamma^{i}_{jk} = {}^{LC}\Gamma^{i}_{jk} + \frac{1}{2}\left( \delta^{i}_{j}\varphi_k + \delta^{i}_{k}\varphi_j - g^{im} g_{jk}\varphi_m \right), \tag{7} $$
where $\delta^{i}_{j}$ is the Kronecker delta.

4. Weyl Prior

In this section, we define the Weyl prior and show its relation to the $\alpha$-parallel prior.
First, we define the Weyl prior as follows.
Definition 5
(Weyl Prior). Let $(M, g)$ be an $n$-dimensional Riemannian manifold with the conformal structure $\mathcal{C} = [g]$ and the Weyl structure $F$. Let ${}^{W}\nabla$ be the Weyl connection. The Weyl prior ${}^{W}\omega$ is defined as the unique volume form parallel with respect to ${}^{W}\nabla$.
Remark 7.
The uniqueness of the Weyl prior is the result of the uniqueness of the Weyl connection.
Now, we prove the main result of this paper.
Theorem 1.
Let $(M, g)$ be a Riemannian manifold. Let ${}^{W}\nabla$ and ${}^{-n}\nabla$ be the Weyl connection and the $(-n)$-connection, i.e., the $\alpha$-connection with $\alpha = -n$, where $n$ is the dimension of $M$. Suppose that the $(-n)$-parallel prior ${}^{-n}\omega$ exists. Then
$$ {}^{W}\omega = {}^{-n}\omega. $$
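Proof.
Consider an arbitrary volume form $f\sqrt{\det g}$, where $f$ is a positive function on $M$. For $f\sqrt{\det g}$ to be parallel with respect to ${}^{W}\nabla$, it is necessary and sufficient that
$$ {}^{W}\nabla\left( f\sqrt{\det g} \right) = f\, {}^{W}\nabla\sqrt{\det g} + \left( {}^{W}\nabla f \right)\sqrt{\det g} = 0. \tag{8} $$
Componentwise, Equation (8) becomes
$$ f\, {}^{W}\nabla_j\sqrt{\det g} + \left( {}^{W}\nabla_j f \right)\sqrt{\det g} = f\, {}^{W}\nabla_j\sqrt{\det g} + \left( \partial_j f \right)\sqrt{\det g} = 0, \tag{9} $$
since the covariant derivative coincides with the partial derivative for functions.
Since $\sqrt{\det g}$ is a scalar density of weight 1, its covariant derivative is given by
$$ {}^{W}\nabla_j\sqrt{\det g} = \partial_j\sqrt{\det g} - {}^{W}\Gamma_j\sqrt{\det g}, $$
where ${}^{W}\Gamma_j$ is obtained by the contraction of Equation (7) over $i$ and $k$:
$$ \begin{aligned} {}^{W}\Gamma_j = {}^{W}\Gamma^{i}_{ji} &= {}^{LC}\Gamma^{i}_{ji} + \frac{1}{2}\left( \delta^{i}_{j}\varphi_i + \delta^{i}_{i}\varphi_j - g^{im} g_{ji}\varphi_m \right) \\ &= \partial_j \ln\sqrt{\det g} + \frac{1}{2}\left( \varphi_j + n\varphi_j - \delta^{m}_{j}\varphi_m \right) \\ &= \partial_j \ln\sqrt{\det g} + \frac{n}{2}\varphi_j. \end{aligned} \tag{10} $$
Substituting Equation (10) into Equation (9), we obtain
$$ \partial_j f = \frac{n}{2}\varphi_j f. \tag{11} $$
Since the covariant derivative coincides with the exterior derivative for functions, Equation (11) can be written in index-free form as
$$ \varphi = \frac{2}{n}\, d\ln f. \tag{12} $$
Assume for now that the Weyl 1-form $\varphi$ is exact, that is, $\varphi = d\Omega$ for some function $\Omega$ on $M$. Then, from Equation (12), $f = \exp\left\{ \frac{n}{2}\Omega \right\}$ up to a multiplicative constant, and the Weyl prior is given by
$$ {}^{W}\omega = \exp\left\{ \frac{n}{2}\Omega \right\}\sqrt{\det g}. \tag{13} $$
By comparison of Equation (13) with Proposition 1 for $\alpha = -n$, the theorem is proved under the assumption of the exactness of the Weyl 1-form.
Since we have shown that the Weyl prior ${}^{W}\omega$ is the $\alpha$-parallel prior with $\alpha = -n$, and we required the existence of ${}^{-n}\omega$, our assumption of the exactness of the Weyl 1-form $\varphi$ indeed holds by Remark 4. □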
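The contraction step in Equation (10) can be spot-checked numerically at a single point. The sketch below builds the correction term added to the Levi–Civita symbols in Equation (7) for a sample metric and 1-form, contracts it, and compares the result with $\frac{n}{2}\varphi_j$. The sample matrix, the sample 1-form, and the helper names `inverse` and `weyl_correction` are illustrative assumptions.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(m)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(size)]
           for r, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def weyl_correction(g, phi):
    """A[i][j][k] = (1/2)(delta^i_j phi_k + delta^i_k phi_j - g^{im} g_{jk} phi_m)."""
    size = len(g)
    g_inv = inverse(g)
    # raised[i] = g^{im} phi_m
    raised = [sum(g_inv[i][m] * phi[m] for m in range(size)) for i in range(size)]

    def delta(a, b):
        return 1.0 if a == b else 0.0

    return [[[0.5 * (delta(i, j) * phi[k] + delta(i, k) * phi[j] - raised[i] * g[j][k])
              for k in range(size)] for j in range(size)] for i in range(size)]

g = [[2.0, 0.3, 0.1],
     [0.3, 1.5, 0.2],
     [0.1, 0.2, 1.0]]          # sample symmetric positive-definite metric at one point
phi = [0.7, -1.2, 0.4]         # sample components of the Weyl 1-form at that point
n = len(g)

A = weyl_correction(g, phi)
contracted = [sum(A[i][j][i] for i in range(n)) for j in range(n)]   # A^i_{ji}
expected = [0.5 * n * phi_j for phi_j in phi]                        # (n / 2) phi_j
```

The two lists agree to rounding error, which reflects the cancellation $\varphi_j - \delta^{m}_{j}\varphi_m = 0$ in the second line of Equation (10).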
Remark 8.
The minus sign in $\alpha = -n$ is a result of the definition of the $\alpha$-connection. By Remark 3, the dual connection of ${}^{\alpha}\nabla$ is ${}^{-\alpha}\nabla$, so we would have $\alpha = n$ here, had we defined the $\alpha$-connection to be its dual connection in Definition 4. It would seem more natural to consider the dual prior of the Weyl prior.

5. Weyl Prior for Gaussian Family

In this section, we calculate the Weyl prior of the Gaussian family as an example.
Example 1
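(Gaussian Family). Consider the Gaussian family
$$ \mathcal{P} = \left\{ p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} \;\middle|\; (\mu, \sigma) \in M \right\}. $$
Choosing $(\mu, \sigma^2)$ as a coordinate system, we have
$$ \partial_{\mu} l = \frac{x - \mu}{\sigma^2}, $$
$$ \partial_{\sigma^2} l = \frac{(x - \mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}, $$
where $l$ is the log-likelihood function.
The first element of the Fisher metric tensor $g$ in the $(\mu, \sigma^2)$-coordinates is given by
$$ g_{\mu\mu} = E_{\theta}\left[ \partial_{\mu} l \, \partial_{\mu} l \right] = \int \frac{(x - \mu)^2}{\sigma^4} \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} dx = \frac{1}{\sigma^2}, $$
where $E_{\theta}$ is the conditional expectation of $X$ given $\Theta$. The other elements of the Fisher metric tensor are
$$ g_{\mu\sigma^2} = g_{\sigma^2\mu} = 0, $$
and
$$ g_{\sigma^2\sigma^2} = \frac{1}{2\sigma^4}. $$
Hence,
$$ \sqrt{\det g} = \frac{1}{\sqrt{2}\,\sigma^3}. $$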
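The Fisher metric entries above can be checked by numerical integration. The following sketch evaluates $E_{\theta}[\partial_i l\, \partial_j l]$ with a trapezoidal rule; the test point, the integration window of twelve standard deviations, and the function names are assumptions chosen for illustration, and `var` stands for $\sigma^2$.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of the univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def score(x, mu, var):
    """Partial derivatives of the log likelihood with respect to (mu, sigma^2)."""
    d_mu = (x - mu) / var
    d_var = (x - mu) ** 2 / (2.0 * var ** 2) - 1.0 / (2.0 * var)
    return (d_mu, d_var)

def expectation(h, mu, var, n=4000, width=12.0):
    """E_theta[h(X)] by the trapezoidal rule on mu +/- width standard deviations."""
    sd = math.sqrt(var)
    lo, hi = mu - width * sd, mu + width * sd
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * h(x) * gaussian_pdf(x, mu, var)
    return total * step

def fisher_metric(mu, var):
    """2x2 Fisher metric g_ij = E[d_i l * d_j l] in the (mu, sigma^2) coordinates."""
    return [[expectation(lambda x: score(x, mu, var)[i] * score(x, mu, var)[j], mu, var)
             for j in range(2)] for i in range(2)]

mu, var = 0.3, 1.7                      # arbitrary test point
g = fisher_metric(mu, var)
sqrt_det_g = math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])
```

At this test point the computed entries match $1/\sigma^2$, $0$, and $1/(2\sigma^4)$, and `sqrt_det_g` matches $1/(\sqrt{2}\,\sigma^3)$ to within quadrature error.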
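To calculate the Weyl 1-form, we first calculate the Amari–Chentsov tensor $C$:
$$ C_{\mu\mu\mu} = E_{\theta}\left[ \partial_{\mu} l \, \partial_{\mu} l \, \partial_{\mu} l \right] = \int \frac{(x - \mu)^3}{\sigma^6} \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} dx = 0. $$
Similarly,
$$ C_{\sigma^2\mu\mu} = C_{\mu\sigma^2\mu} = C_{\mu\mu\sigma^2} = \frac{1}{\sigma^4}, $$
$$ C_{\sigma^2\sigma^2\mu} = C_{\sigma^2\mu\sigma^2} = C_{\mu\sigma^2\sigma^2} = 0, $$
and
$$ C_{\sigma^2\sigma^2\sigma^2} = \frac{1}{\sigma^6}. $$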
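These components follow from the moments of a standard normal variable. As a sketch, writing $z = (x - \mu)/\sigma$ gives $\partial_{\mu} l = z/\sigma$ and $\partial_{\sigma^2} l = (z^2 - 1)/(2\sigma^2)$, so each component is a polynomial moment in $z$ times a power of $\sigma$. The code below computes the $z$-dependent factors exactly; the polynomial representation and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction

def normal_moment(k):
    """E[Z^k] for a standard normal Z: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return Fraction(0)
    result = Fraction(1)
    for m in range(k - 1, 0, -2):
        result *= m
    return result

def expect_poly(coeffs):
    """E[sum_k coeffs[k] * Z^k] for a polynomial given by its coefficient list."""
    return sum(Fraction(c) * normal_moment(k) for k, c in enumerate(coeffs))

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out

# z-dependence of the two score components; the powers of sigma are tracked in comments.
d_mu = [0, 1]                                   # z            (times 1 / sigma)
d_var = [Fraction(-1, 2), 0, Fraction(1, 2)]    # (z^2 - 1)/2  (times 1 / sigma^2)

c_mmm = expect_poly(poly_mul(poly_mul(d_mu, d_mu), d_mu))      # times sigma^-3
c_vmm = expect_poly(poly_mul(poly_mul(d_var, d_mu), d_mu))     # times sigma^-4
c_vvm = expect_poly(poly_mul(poly_mul(d_var, d_var), d_mu))    # times sigma^-5
c_vvv = expect_poly(poly_mul(poly_mul(d_var, d_var), d_var))   # times sigma^-6
```

The nonzero factors are both exactly 1, which reproduces $C_{\sigma^2\mu\mu} = 1/\sigma^4$ and $C_{\sigma^2\sigma^2\sigma^2} = 1/\sigma^6$; the odd moments vanish.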
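Hence, the Weyl 1-form is given by
$$ \varphi = \frac{1}{2} C_{ijk}\, g^{jk}\, d\theta^{i} = \frac{3}{2\sigma^2}\, d\sigma^2. $$
Now, it is easy to check that $\varphi = d\left( \frac{3}{2}\ln\sigma^2 \right)$ is an exact form. Hence, for the Gaussian family $\mathcal{P}$, a Weyl prior exists and, by Equation (13) with $n = 2$, is given by
$$ {}^{W}\omega = \exp\left\{ \frac{2}{2} \cdot \frac{3}{2}\ln\sigma^2 \right\} \frac{1}{\sqrt{2}\,\sigma^3} = \frac{1}{\sqrt{2}}. $$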
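The last two formulas can be reproduced from the closed-form tensors quoted in the text. The sketch below assembles $\varphi_i = \frac{1}{2} C_{ijk} g^{jk}$ and the density $\exp\{\frac{n}{2}\Omega\}\sqrt{\det g}$ with $\Omega = \frac{3}{2}\ln\sigma^2$, then evaluates the density at several variances. The function names and the test values are illustrative assumptions; index 0 denotes $\mu$ and index 1 denotes $\sigma^2$, written `var`.

```python
import math

def fisher_inverse(var):
    """Inverse Fisher metric g^{jk} in the (mu, sigma^2) coordinates."""
    return [[var, 0.0], [0.0, 2.0 * var ** 2]]

def amari_chentsov(var):
    """Components C_ijk quoted in the text (0 = mu, 1 = sigma^2)."""
    C = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    C[1][0][0] = C[0][1][0] = C[0][0][1] = 1.0 / var ** 2
    C[1][1][1] = 1.0 / var ** 3
    return C

def weyl_one_form(var):
    """Components phi_i = (1/2) C_ijk g^{jk}."""
    g_inv, C = fisher_inverse(var), amari_chentsov(var)
    return [0.5 * sum(C[i][j][k] * g_inv[j][k] for j in range(2) for k in range(2))
            for i in range(2)]

def weyl_prior_density(var, n=2):
    """exp{(n/2) Omega} * sqrt(det g) with Omega = (3/2) ln sigma^2."""
    omega = 1.5 * math.log(var)
    sqrt_det_g = 1.0 / (math.sqrt(2.0) * var ** 1.5)
    return math.exp(0.5 * n * omega) * sqrt_det_g

phi = weyl_one_form(2.0)                                   # components at sigma^2 = 2
densities = [weyl_prior_density(v) for v in (0.25, 1.0, 4.0, 9.0)]
```

The density takes the same value $1/\sqrt{2}$ at every variance tried, which is the uniformity stated above, and `phi` has a vanishing $\mu$-component and $\sigma^2$-component $3/(2\sigma^2)$.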
Remark 9.
Based on our calculation, we find that the Weyl prior for the univariate Gaussian distribution with unknown mean and unknown variance is just the uniform prior. This shows that, in this setting, the uniform prior is in fact an uninformative prior. This counter-intuitive result is related to the fact that every two-dimensional manifold is conformally flat, which can be proved using the existence of isothermal coordinates in two dimensions [10].

6. Multivariate Gaussian

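The above example can be extended to the multivariate case. Consider the multivariate Gaussian distribution
$$ f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}} \exp\left\{ -\frac{1}{2}(x - \mu)^{\top}\Sigma^{-1}(x - \mu) \right\}, $$
where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix.
Using matrix calculus, we have
$$ \partial_{\mu} l = \Sigma^{-1}(x - \mu) $$
and
$$ \begin{aligned} \partial_{\Sigma} l &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}(x - \mu)(x - \mu)^{\top}\Sigma^{-1} \\ &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\left[ \Sigma^{-1}(x - \mu) \right] \otimes \left[ \Sigma^{-1}(x - \mu) \right] \\ &= -\frac{1}{2}\Sigma^{-1} + \frac{1}{2}\left( \Sigma^{-1} \otimes \Sigma^{-1} \right)\left[ (x - \mu) \otimes (x - \mu) \right]. \end{aligned} $$
We can now compute the Fisher information matrix:
$$ \begin{aligned} g_{\mu\mu} &= E_{\theta}\left[ \partial_{\mu} l \otimes \partial_{\mu} l \right] \\ &= \left( \Sigma^{-1} \otimes \Sigma^{-1} \right) E_{\theta}\left[ (x - \mu) \otimes (x - \mu) \right] \\ &= \left( \Sigma^{-1} \otimes \Sigma^{-1} \right)\Sigma \\ &= \Sigma^{-1}\Sigma\Sigma^{-1} \\ &= \Sigma^{-1}, \end{aligned} $$
where the third equality uses the definition of the covariance matrix and the fourth uses the action of the matrix tensor product.
Similarly,
$$ g_{\mu\Sigma} = g_{\Sigma\mu} = 0, $$
and
$$ g_{\Sigma\Sigma} = \frac{1}{2}\Sigma^{-1} \otimes \Sigma^{-1}. $$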
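The Amari–Chentsov tensor can be computed in the same way:
$$ C_{\mu\mu\mu} = E_{\theta}\left[ \partial_{\mu} l \otimes \partial_{\mu} l \otimes \partial_{\mu} l \right] = 0, $$
$$ C_{\Sigma\mu\mu} = C_{\mu\Sigma\mu} = C_{\mu\mu\Sigma} = \Sigma^{-1} \otimes \Sigma^{-1}, $$
$$ C_{\mu\Sigma\Sigma} = C_{\Sigma\mu\Sigma} = C_{\Sigma\Sigma\mu} = 0, $$
$$ C_{\Sigma\Sigma\Sigma} = \Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1}. \tag{26} $$
The detailed computation of Equation (26) is as follows. Writing $S := \Sigma^{-1}$ and $B := (S \otimes S)\left[ (x - \mu) \otimes (x - \mu) \right]$ for brevity, so that $\partial_{\Sigma} l = -\frac{1}{2}S + \frac{1}{2}B$, we have
$$ \begin{aligned} C_{\Sigma\Sigma\Sigma} &= E_{\theta}\left[ \partial_{\Sigma} l \otimes \partial_{\Sigma} l \otimes \partial_{\Sigma} l \right] \\ &= E_{\theta}\Big\{ -\tfrac{1}{8} S \otimes S \otimes S + \tfrac{1}{8} S \otimes B \otimes S + \tfrac{1}{8} B \otimes S \otimes S - \tfrac{1}{8} B \otimes B \otimes S \\ &\qquad\quad + \tfrac{1}{8} S \otimes S \otimes B - \tfrac{1}{8} S \otimes B \otimes B - \tfrac{1}{8} B \otimes S \otimes B + \tfrac{1}{8} B \otimes B \otimes B \Big\} \\ &= \Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1}. \end{aligned} $$
The above expression can be evaluated using the 4th and 6th moments of the multivariate Gaussian.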
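The Weyl 1-form is then given by:
$$ \begin{aligned} \varphi &= \frac{1}{2} C_{ijk}\, g^{jk}\, d\theta^{i} \\ &= \frac{1}{2} C_{\Sigma\mu\mu}\, g^{\mu\mu}\, d\Sigma + \frac{1}{2} C_{\Sigma\Sigma\Sigma}\, g^{\Sigma\Sigma}\, d\Sigma \\ &= \frac{1}{2}\left( \Sigma^{-1} \otimes \Sigma^{-1} \right)\Sigma\, d\Sigma + \frac{1}{2}\left( \Sigma^{-1} \otimes \Sigma^{-1} \otimes \Sigma^{-1} \right) 2\left( \Sigma \otimes \Sigma \right) d\Sigma \\ &= \frac{3}{2}\Sigma^{-1}\, d\Sigma = d\left( \frac{3}{2}\ln\det\Sigma \right). \end{aligned} $$
The Weyl prior is thus given by:
$$ \begin{aligned} {}^{W}\omega &= \exp\left\{ \frac{n + (n+1)n/2}{2} \cdot \frac{3}{2}\ln\det\Sigma \right\} \sqrt{\det\Sigma^{-1}\, \det\left( \frac{1}{2}\Sigma^{-1} \otimes \Sigma^{-1} \right)} \\ &= (\det\Sigma)^{(3n^2 + 9n)/8}\, (\det\Sigma)^{(-2n - 1)/2}\, 2^{-n/2} \\ &= (\det\Sigma)^{(n-1)(3n+4)/8}\, 2^{-n/2}, \end{aligned} \tag{29} $$
where, in the first line, $n + (n+1)n/2$ is the dimension of the statistical manifold for the multivariate Gaussian distribution.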
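The exponent bookkeeping in the last display can be verified with exact rational arithmetic. The sketch below combines the two powers of $\det\Sigma$ as they appear in the text and compares the sum with the factored form; the function names are assumptions for illustration, and only the exponent of $\det\Sigma$ is checked, not the constant factor.

```python
from fractions import Fraction

def manifold_dimension(n):
    """Number of free parameters: n for the mean plus n(n+1)/2 for the covariance."""
    return n + n * (n + 1) // 2

def weyl_exponent(n):
    """Power of det(Sigma) in the Weyl prior, assembled from the two factors in the text."""
    from_exponential = Fraction(3, 4) * manifold_dimension(n)   # (dim / 2) * (3 / 2)
    from_volume = Fraction(-(2 * n + 1), 2)                     # sqrt(det g) contribution
    return from_exponential + from_volume

def closed_form(n):
    """Factored exponent (n - 1)(3n + 4) / 8."""
    return Fraction((n - 1) * (3 * n + 4), 8)

exponents = {n: weyl_exponent(n) for n in range(1, 6)}
```

For $n = 1$ the exponent is zero, so the prior does not depend on $\Sigma$; for $n = 2$ it equals $5/4$.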
Remark 10.
Our calculation shows that the Weyl prior of the multivariate Gaussian distribution is generally not a uniform prior. However, Equation (29) shows that, when $n = 1$, that is, in the univariate case, the Weyl prior is indeed the uniform prior. This is in accordance with our direct calculation for the univariate case.

7. Discussion and Conclusions

We discussed Weyl geometry and the Weyl prior in this paper. We also calculated the Weyl prior for the Gaussian family as an example.
The underlying principle of the Jeffreys prior, the $\alpha$-parallel prior, and the Weyl prior is the concept of invariance in statistics. The Jeffreys prior is invariant under a change of the coordinates of the parameters. The Weyl prior and the $\alpha$-parallel prior, as generalizations of the Jeffreys prior, automatically satisfy this invariance. Moreover, the Weyl prior, as a volume form defined on a Weyl manifold, is also invariant under a gauge transformation [11]. The generalized conjugate connection is likewise invariant under the gauge transformation [11].
One possible use of the Weyl prior is to justify the uniform prior for distributions with two parameters, because any two-dimensional manifold is conformally flat.

Author Contributions

R.J. contributed significantly to the paper. Throughout the process of this research, both J.T. and Y.Z. provided detailed advice, discussions, and suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Discovery Grants of the Natural Sciences and Engineering Research Council of Canada (NSERC) under No. 256233 and No. 163407. The APC was funded by NSERC No. 163407.

Acknowledgments

We thank the referees for their constructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Amari, S. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
  2. Calin, O.; Udriste, C. Geometric Modeling in Probability and Statistics; Springer: Basel, Switzerland, 2014.
  3. Nielsen, F. An elementary introduction to information geometry. arXiv 2018, arXiv:1808.08271.
  4. Matsuzoe, H.; Takeuchi, J.; Amari, S. Equiaffine structures on statistical manifolds and Bayesian statistics. Differ. Geom. Its Appl. 2006, 24, 567–578.
  5. Ciambelli, L.; Leigh, R.G. Weyl Connections and their Role in Holography. arXiv 2019, arXiv:1905.04339.
  6. Yavari, A.; Goriely, A. Weyl geometry and the nonlinear mechanics of distributed point defects. Proc. R. Soc. A 2012, 468, 3902–3922.
  7. Kobayashi, S.; Nomizu, K. Foundations of Differential Geometry; Wiley: New York, NY, USA, 1963; Volume 1.
  8. Takeuchi, J.; Amari, S. α-parallel prior and its properties. IEEE Trans. Inf. Theory 2005, 51, 1011–1023.
  9. Folland, G.B. Weyl manifolds. J. Differ. Geom. 1970, 4, 145–153.
  10. Kulkarni, R. Conformally flat manifolds. Proc. Natl. Acad. Sci. USA 1972, 69, 2675.
  11. Calin, O.; Matsuzoe, H.; Zhang, J. Generalizations of conjugate connections. In Trends in Differential Geometry, Complex Analysis and Mathematical Physics; World Scientific: Singapore, 2009; pp. 26–34.

Share and Cite

MDPI and ACS Style

Jiang, R.; Tavakoli, J.; Zhao, Y. Weyl Prior and Bayesian Statistics. Entropy 2020, 22, 467. https://0-doi-org.brum.beds.ac.uk/10.3390/e22040467
