Article

Deriving Proper Uniform Priors for Regression Coefficients, Parts I, II, and III

by H.R. Noel van Erp 1,*,†, Ronald O. Linger 1,† and Pieter H.A.J.M. van Gelder 1,2,†
1 Safety and Security Science Group, TU Delft, Delft 2628 BX, The Netherlands
2 Safety and Security Institute, TU Delft, Delft 2628 BX, The Netherlands
* Author to whom correspondence should be addressed.
† This is an extended version of the original MaxEnt 2016 conference paper: Deriving Proper Uniform Priors for Regression Coefficients, Part II, in which the main result of the first part of this research has been integrated and to which new theoretical insights and more extensive Monte Carlo study outputs have been added.
Submission received: 24 February 2017 / Revised: 7 April 2017 / Accepted: 27 April 2017 / Published: 30 May 2017
(This article belongs to the Special Issue Selected Papers from MaxEnt 2016)

Abstract:
It is a relatively well-known fact that in problems of Bayesian model selection, improper priors should, in general, be avoided. In this paper we will derive and discuss a collection of four proper uniform priors which lie on an ascending scale of informativeness. It will turn out that these priors lead us to evidences that are closely associated with the implied evidences of the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC). All the discussed evidences are then used in two small Monte Carlo studies, wherein for different sample sizes and noise levels the evidences are used to select between competing C-spline regression models. Also, for illustrative purposes, an outline is given of how to construct simple trivariate C-spline regression models. As regards the length of this paper, only one half consists of theory and derivations; the other half consists of graphs and outputs of the two Monte Carlo studies.

1. Introduction

Using informational consistency requirements, Jaynes [1] derived the form of maximal non-informative priors for location parameters, that is, regression coefficients, to be uniform. However, this does not tell us what the limits of these uniform probability distributions should be, that is, what particular uniform distribution to use. If we are faced with a parameter estimation problem, then these limits of the uniform prior are irrelevant, as we may scale the product of the improper uniform prior and the likelihood to one, which gives us a properly normalized posterior for our regression coefficients. However, if we are faced with a problem of model selection, then the volume covered by the uniform prior is an integral part of the evidence which is used to rank the various competing regression models.
In this paper we will give the four proper uniform priors originally derived in [2]. These priors lie on an ascending scale of informativeness. It will turn out, as we discuss the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and the results of a small Monte Carlo study, that these priors lead us to evidences that are closely associated with the implied evidences of the BIC and the AIC, as these evidences fill in the space between and around the BIC and AIC on a continuum of conservativeness, in terms of the number of parameters of the chosen regression analysis models.
This paper is structured as follows. First we give an introduction to the evidence construct, that "too-often-ignored half of Bayesian inference" [3], as we give an outline on how to use these evidences in Bayesian model selection. Then we describe the normal multiple regression models for both known and unknown $\sigma$, after which we specify the conditions under which improper priors become problematic for model selection. This specification brings us naturally to a continuum of informativeness on which priors of regression coefficients may be located. After these preliminaries, we proceed to give the derivations of the four proper uniform priors, originally derived in [2], by way of the results in [4], which are neither grossly ignorant nor grossly knowledgeable. Having checked the coverage of these priors, we address the question of what constitutes data and what constitutes prior information. We then discuss the evidences that are associated with our proper priors, as we connect these evidences to the BIC and AIC reference procedures and give the posterior probability distribution of the unknown regression coefficients and the consequent predictive probability distribution that is associated with these proper priors. In Appendix A we report on two small Monte Carlo studies with the C-spline regression models, in order to give the reader a sense for all the discussed evidences. Also, a collection of three simple trivariate C-spline regression models will be discussed in Appendix B, in order to provide the reader with a low-level, hands-on introduction to C-splines [5].

2. The Evidence and Bayesian Model Selection

Bayesian probability theory has four fundamental constructs, namely, the prior, the likelihood, the posterior, and the evidence. These constructs are related in the following way:
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}. \tag{1}$$
Most of us will be familiar with the prior, likelihood, and posterior. However, the evidence concept is less universally known, as most people come to Bayes by way of the more compact relationship [6]:
$$\text{posterior} \propto \text{prior} \times \text{likelihood}, \tag{2}$$
which does not make any explicit mention of the evidence construct. In what follows, we will employ the correct, though notationally more cumbersome, relation (1), and forgo the more compact, but incomplete, Bayesian shorthand (2). This is done so the reader may develop some feeling for the evidence construct, and how this construct relates to the other three Bayesian constructs (i.e., the prior, likelihood, and posterior).
Let $p(\theta \mid I)$ be the prior of some parameter $\theta$, where $I$ is the prior information model of the unknown $\theta$. Let $p(D \mid \theta, M)$ be the probability of the data $D$ conditional on the value of the parameter $\theta$ and the likelihood model $M$ which is used; the probability of the data is also known as the likelihood of the parameter $\theta$. Let $p(\theta \mid D, M, I)$ be the posterior distribution of the parameter $\theta$, conditional on the data $D$, the likelihood model $M$, and the prior information model $I$. Then
$$p(\theta \mid D, M, I) = \frac{p(\theta \mid I)\, p(D \mid \theta, M)}{\int p(\theta \mid I)\, p(D \mid \theta, M)\, d\theta} = \frac{p(\theta \mid I)\, p(D \mid \theta, M)}{p(D \mid M, I)}, \tag{3}$$
where
$$p(D \mid M, I) = \int p(\theta, D \mid M, I)\, d\theta = \int p(\theta \mid I)\, p(D \mid \theta, M)\, d\theta \tag{4}$$
is the evidence, that is, the marginalized likelihood of both the likelihood model M and the prior information model I.
Now, if we have a set of likelihood models $M_i$ (e.g., a collection of regression models) we wish to choose from, and just the one prior information model $I$ (e.g., an ignorance model), then we may do so by computing the evidence values $p(D \mid M_i, I)$.
Let $p(M_i)$ and $p(M_i \mid D, I)$ be, respectively, the prior and posterior probability of the likelihood model $M_i$. Then the posterior probability distribution of these likelihood models is given as
$$p(M_i \mid D, I) = \frac{p(M_i)\, p(D \mid M_i, I)}{\sum_i p(M_i)\, p(D \mid M_i, I)}. \tag{5}$$
For $p(M_i) = p(M_j)$, $i \neq j$, the posterior probabilities (5) will reduce to the normalized evidence values:
$$p(M_i \mid D, I) = \frac{p(D \mid M_i, I)}{\sum_i p(D \mid M_i, I)}. \tag{6}$$
So, if we assign equal prior probabilities to our likelihood models $M_i$, then we may rank these models by way of their respective evidence values, where the model with the highest evidence value is the model which has the highest posterior probability of all the models that were taken into consideration [7,8].
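In practice, the ranking of (6) is carried out on log-evidences. The following minimal sketch (our own illustration, not code from the paper; function name and example values are hypothetical) normalizes a set of log-evidence values $\log p(D \mid M_i, I)$ into posterior model probabilities under equal prior model probabilities:

```python
import numpy as np

def posterior_model_probabilities(log_evidences):
    """Normalize log-evidences into posterior model probabilities, Eq. (6)."""
    log_evidences = np.asarray(log_evidences, dtype=float)
    shifted = log_evidences - log_evidences.max()   # stabilize the exponentiation
    weights = np.exp(shifted)
    return weights / weights.sum()

# Hypothetical example: three competing models.
print(posterior_model_probabilities([-105.2, -103.7, -110.1]))
```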

3. The Normal Multiple Regression Model (Known Sigma)

Let the model $M$ for the response vector $\mathbf{y}$ be
$$\mathbf{y} = X \boldsymbol{\beta} + \mathbf{e}, \tag{7}$$
where $X$ is some $N \times m$ predictor matrix, $\boldsymbol{\beta}$ is the $m \times 1$ vector with regression coefficients, and $\mathbf{e}$ is the $N \times 1$ error vector to which we assign a multivariate normal distribution, that is,
$$p(\mathbf{e} \mid \sigma) = \frac{1}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left(-\frac{\mathbf{e}^T \mathbf{e}}{2\sigma^2}\right), \tag{8}$$
or, equivalently, $\mathbf{e} \sim \mathrm{MN}\!\left(\mathbf{0}, \sigma^2 I\right)$, where $I$ is the $N \times N$ identity matrix and $\sigma$ is some known standard deviation. By way of a simple Jacobian transformation from $\mathbf{e}$ to $\mathbf{y}$ in (8), we then may obtain the likelihood function of the $\beta$s:
$$p(\mathbf{y} \mid \sigma, X, \boldsymbol{\beta}, M) = \frac{1}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]. \tag{9}$$
If we assign a uniform prior to the unknown regression coefficients $\boldsymbol{\beta}$ [6]
$$p(\boldsymbol{\beta} \mid I) = C, \qquad \boldsymbol{\beta} \in D_{\beta}, \tag{10}$$
where $C$ is a yet unspecified normalizing constant, $I$ is the prior information regarding the unknown $\beta$s which we have at our disposal, and $D_{\beta}$ is the prior domain of the $\beta$s, then the probability distribution of both $\boldsymbol{\beta}$ and $\mathbf{y}$ is derived as
$$p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I) = p(\boldsymbol{\beta} \mid I)\, p(\mathbf{y} \mid \sigma, X, \boldsymbol{\beta}, M) = \frac{C}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]. \tag{11}$$
By integrating the unknown $\beta$s out of (11) over the prior domain $D_{\beta}$, we obtain the evidence of model $M$:
$$p(\mathbf{y} \mid \sigma, X, M, I) = \int_{D_{\beta}} p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I)\, d\boldsymbol{\beta}. \tag{12}$$
The evidence (12) is used both to normalize (11) into a posterior distribution, (1), as well as to choose between competing regression models, (5) and (6). In order to evaluate the evidence (12), we rewrite (11) as [6]
$$p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I) = \frac{C}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left\{-\frac{1}{2\sigma^2} \left[(\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}}) + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]\right\}, \tag{13}$$
where
$$\hat{\boldsymbol{\beta}} = \left(X^T X\right)^{-1} X^T \mathbf{y} \qquad \text{and} \qquad \hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}}. \tag{14}$$
We then factor (13) as
$$p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I) = \frac{C}{\left|X^T X\right|^{1/2} \left(2\pi\sigma^2\right)^{(N-m)/2}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right] \times \frac{\left|X^T X\right|^{1/2}}{\left(2\pi\sigma^2\right)^{m/2}} \exp\left[-\frac{1}{2\sigma^2} (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]. \tag{15}$$
The last term in (15) is in the multivariate normal form [6], so it should evaluate to 1 when integrated over the $\beta$s. Stated differently, for a prior domain $D_{\beta}$ which is centered correctly and "wide enough", we have, by way of the factorization (15), that the evidence (12) tends to the equality
$$p(\mathbf{y} \mid \sigma, X, M, I) = \frac{C}{\left|X^T X\right|^{1/2} \left(2\pi\sigma^2\right)^{(N-m)/2}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right]. \tag{16}$$
By way of (13), (16) and the product rule (1), we obtain the posterior of the unknown $\beta$s [6]:
$$p(\boldsymbol{\beta} \mid \sigma, \mathbf{y}, X, M, I) = \frac{p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I)}{p(\mathbf{y} \mid \sigma, X, M, I)} = \frac{\left|X^T X\right|^{1/2}}{\left(2\pi\sigma^2\right)^{m/2}} \exp\left[-\frac{1}{2\sigma^2} (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]. \tag{17}$$
This posterior of the unknown $\beta$s has a mean of $\hat{\boldsymbol{\beta}} = \left(X^T X\right)^{-1} X^T \mathbf{y}$, (14), and a covariance matrix of $\left(X^T X / \sigma^2\right)^{-1}$.
In the parameter estimation problem, that is, the derivation of the posterior distribution (17), any reference to the normalizing constant $C$ of the uniform prior (10) has fallen away. In contrast, in the model selection problem, that is, the derivation of the evidence (16), $C$ is still present.
In closing, note that different $N \times m_i$ predictor matrices $X_i$ correspond with different likelihood models $M_i$ in (5) and (6). It is to be understood that in what follows we will construct proper uniform priors for a generic likelihood model $M$ which has a generic $N \times m$ predictor matrix $X$, as we drop the sub-index $i$ in both $X$ and $M$ in order to remove some of the notational clutter in our equations.
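To make the role of the constant $C$ concrete, the following minimal sketch (an illustrative helper of our own, not the authors' code) evaluates the log of the known-$\sigma$ evidence (16) for a given log prior height $\log C$; the later sections supply specific values for $C$:

```python
import numpy as np

def log_evidence_known_sigma(y, X, sigma, log_C):
    """log p(y | sigma, X, M, I) of Eq. (16) for a proper uniform prior of height C."""
    N, m = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # Eq. (14)
    resid = y - X @ beta_hat                           # y - y_hat
    _, logdet = np.linalg.slogdet(X.T @ X)             # log |X^T X|
    return (log_C
            - 0.5 * logdet
            - 0.5 * (N - m) * np.log(2 * np.pi * sigma**2)
            - (resid @ resid) / (2 * sigma**2))
```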

4. The Normal Multiple Regression Model (Unknown Sigma)

In the case of unknown $\sigma$, we may assign the Jeffreys prior for scaling parameters [6]:
$$p(\sigma \mid I) = \frac{A}{\sigma}, \tag{18}$$
where $A$ is some normalizing constant, to the unknown $\sigma$ in (11), in order to lose this unknown nuisance parameter by way of integration:
$$p(\boldsymbol{\beta}, \mathbf{y} \mid X, M, I) = \int_0^{\infty} p(\sigma, \boldsymbol{\beta}, \mathbf{y} \mid X, M, I)\, d\sigma = \int_0^{\infty} p(\sigma \mid I)\, p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I)\, d\sigma, \tag{19}$$
where, (11) and (18),
$$p(\sigma \mid I)\, p(\boldsymbol{\beta}, \mathbf{y} \mid \sigma, X, M, I) = \frac{A\, C}{\left(2\pi\right)^{N/2} \sigma^{N+1}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]. \tag{20}$$
We may conveniently factorize (20) as
$$p(\sigma, \boldsymbol{\beta}, \mathbf{y} \mid X, M, I) = \frac{A\, \Gamma(N/2)}{2\pi^{N/2}} \frac{C}{\left[(\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]^{N/2}} \times \frac{2}{\Gamma(N/2)} \left[\frac{(\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})}{2}\right]^{N/2} \frac{1}{\sigma^{N+1}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]. \tag{21}$$
The last term in (21) evaluates to 1 when integrated over $\sigma$, as it has the form of an inverted gamma distribution [6], from which it follows that
$$p(\boldsymbol{\beta}, \mathbf{y} \mid X, M, I) = \frac{A\, \Gamma(N/2)}{2\pi^{N/2}} \frac{C}{\left[(\mathbf{y} - X\boldsymbol{\beta})^T (\mathbf{y} - X\boldsymbol{\beta})\right]^{N/2}}. \tag{22}$$
By integrating the unknown $\beta$s out of (22) over the prior domain $D_{\beta}$, we obtain the evidence of model $M$:
$$p(\mathbf{y} \mid X, M, I) = \int_{D_{\beta}} p(\boldsymbol{\beta}, \mathbf{y} \mid X, M, I)\, d\boldsymbol{\beta}. \tag{23}$$
In order to evaluate the evidence (23), we rewrite (22) as [6]
$$p(\boldsymbol{\beta}, \mathbf{y} \mid X, M, I) = \frac{A\, \Gamma(N/2)}{2\pi^{N/2}} \frac{C}{\left[(\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}}) + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]^{N/2}}. \tag{24}$$
We then factor (24) as
$$p(\boldsymbol{\beta}, \mathbf{y} \mid X, M, I) = \frac{1}{\left|X^T X\right|^{1/2}} \frac{C}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N-m}} \frac{A\, \Gamma\!\left[(N-m)/2\right]}{2\pi^{(N-m)/2}} \times \frac{\Gamma(N/2)}{\Gamma\!\left[(N-m)/2\right]} \frac{\left|X^T X\right|^{1/2}}{\pi^{m/2}} \frac{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N-m}}{\left[\|\mathbf{y} - \hat{\mathbf{y}}\|^2 + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]^{N/2}}, \tag{25}$$
where
$$\|\mathbf{y} - \hat{\mathbf{y}}\|^2 = (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}}), \tag{26}$$
and where the last term in (25) is in the multivariate Student-t form [6]. So, for a prior domain $D_{\beta}$ which is centered correctly and "wide enough", we have, by way of the factorization (25), that the evidence (23) tends to the equality
$$p(\mathbf{y} \mid X, M, I) = \frac{1}{\left|X^T X\right|^{1/2}} \frac{C}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N-m}} \frac{A\, \Gamma\!\left[(N-m)/2\right]}{2\pi^{(N-m)/2}}. \tag{27}$$
If we divide (24) by the evidence (27), we obtain, by way of the product rule (1), the posterior of the unknown $\beta$s [6]:
$$p(\boldsymbol{\beta} \mid \mathbf{y}, X, M, I) = \frac{v^{v/2}\, \Gamma(N/2)\, \left|\frac{1}{s^2} X^T X\right|^{1/2}}{\Gamma\!\left[(N-m)/2\right]\, \pi^{m/2}} \left[v + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T \frac{1}{s^2} X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]^{-N/2}, \tag{28}$$
where
$$s^2 = \frac{1}{v} \|\mathbf{y} - \hat{\mathbf{y}}\|^2 \qquad \text{and} \qquad v = N - m. \tag{29}$$
This posterior of the unknown $\beta$s has a mean of $\hat{\boldsymbol{\beta}} = \left(X^T X\right)^{-1} X^T \mathbf{y}$, (14), and a covariance matrix of $\left(X^T X / s^2\right)^{-1}$, (29).
Again, in the parameter estimation problem, that is, the derivation of the posterior distribution (28), any reference to the normalizing constant C of the uniform prior (10) has, seemingly, fallen away. In contrast, in the model selection problem, that is, the derivation of the evidence (27), C is still present.
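For completeness, here is a minimal numerical sketch (our own, with an assumed helper name) of the unknown-$\sigma$ evidence (27); the constants $A$ and $C$ are left as inputs, since they are supplied by the priors derived below:

```python
import numpy as np
from scipy.special import gammaln

def log_evidence_unknown_sigma(y, X, log_C, log_A=0.0):
    """log p(y | X, M, I) of Eq. (27), up to the supplied constants A and C."""
    N, m = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_norm = np.linalg.norm(y - X @ beta_hat)      # ||y - y_hat||
    _, logdet = np.linalg.slogdet(X.T @ X)             # log |X^T X|
    return (log_A + log_C
            - 0.5 * logdet
            - (N - m) * np.log(resid_norm)
            + gammaln((N - m) / 2)
            - np.log(2)
            - 0.5 * (N - m) * np.log(np.pi))
```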

5. The Problem with Improper Priors

In problems of model comparison between competing (regression) models one generally must take care not to use improper priors, be they uniform or not, since improper priors may introduce inverse infinities in the evidence factors which do not cancel out when one proceeds to compute the posterior probabilities of the respective models [9]. We will demonstrate this fact and its consequences with a simple example in which we assign improper uniform priors to the respective regression coefficients.
Suppose that we want to compare two regression models:
$$M_1{:}\;\; \mathbf{y} = X_1 \boldsymbol{\beta}_1 + \mathbf{e}_1 \qquad \text{and} \qquad M_2{:}\;\; \mathbf{y} = X_2 \boldsymbol{\beta}_2 + \mathbf{e}_2, \tag{30}$$
where $X_1$ is an $N \times m_1$ predictor matrix and $X_2$ an $N \times m_2$, with $m_2 > m_1$, and where both $\mathbf{e}_1$ and $\mathbf{e}_2$ are multivariate normally distributed $\mathrm{MN}\!\left(\mathbf{0}, \sigma^2 I\right)$, where $I$ is the $N \times N$ identity matrix and $\sigma$ is some known standard deviation, (8). Let the uniform prior of a regression coefficient be given as
$$p(\beta_j \mid I) = \frac{1}{2B}, \qquad \text{for } -B \leq \beta_j \leq B, \tag{31}$$
for $j = 1, \ldots, m$. If $B \to \infty$, then (31) will tend to the improper Jeffreys prior for location parameters [6]:
$$p(\beta_j \mid I)\, d\beta_j \propto d\beta_j, \qquad \text{for } -\infty < \beta_j < \infty, \tag{32}$$
where "$\propto$" is the proportionality sign that absorbs the normalizing constant $1/(2B)$. Let the uniform prior of $m$ regression coefficients be given as, (31),
$$p(\boldsymbol{\beta} \mid I) = \prod_{j=1}^{m} p(\beta_j \mid I) = \left(\frac{1}{2B}\right)^{m}, \qquad \text{for } \boldsymbol{\beta} \in D_{\beta}, \tag{33}$$
where $D_{\beta}$ is an $m$-dimensional cube which is centered at the origin. Substituting (33) into (10), we find the evidences
$$p(\mathbf{y} \mid \sigma, X_i, M_i, I) = A \left(\frac{1}{2B}\right)^{m_i} L_i, \tag{34}$$
for $i = 1, 2$, where, (27),
$$L_i = \frac{1}{\left|X_i^T X_i\right|^{1/2}} \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}_i\|^{N-m_i}} \frac{\Gamma\!\left[(N-m_i)/2\right]}{2\pi^{(N-m_i)/2}}, \tag{35}$$
and $m_i$ is the number of columns in the $N \times m_i$ predictor matrix $X_i$, and $\hat{\mathbf{y}}_i$ is the regression model estimate, (14),
$$\hat{\boldsymbol{\beta}}_i = \left(X_i^T X_i\right)^{-1} X_i^T \mathbf{y} \qquad \text{and} \qquad \hat{\mathbf{y}}_i = X_i \hat{\boldsymbol{\beta}}_i. \tag{36}$$
If we assign equal prior probabilities to $M_1$ and $M_2$, then we find posterior model probabilities, (6) and (34):
$$p(M_1 \mid \sigma, X_1, \mathbf{y}, I) = \frac{L_1}{L_1 + \left(\frac{1}{2B}\right)^{m_2 - m_1} L_2} \qquad \text{and} \qquad p(M_2 \mid \sigma, X_2, \mathbf{y}, I) = \frac{\left(\frac{1}{2B}\right)^{m_2 - m_1} L_2}{L_1 + \left(\frac{1}{2B}\right)^{m_2 - m_1} L_2}, \tag{37}$$
as $m_2 > m_1$, (30). So, if in (31) we let $B \to \infty$, then the posterior model probabilities (37) will tend to
$$p(M_1 \mid \sigma, X_1, \mathbf{y}, I) \to \frac{L_1}{L_1} = 1 \qquad \text{and} \qquad p(M_2 \mid \sigma, X_2, \mathbf{y}, I) \to \frac{0}{L_1 + 0} = 0. \tag{38}$$
It can be seen that assigning the improper Jeffreys prior for location parameters (32) ensures that the regression model with the smallest number of regression coefficients, or, equivalently, predictors, is automatically chosen over any model which has more regression coefficients.
Improper priors can introduce inverse infinities in the evidence factors, such as $(2B)^{-(m_2 - m_1)}$ in (37), which do not cancel out when one proceeds to compute the posterior probabilities of the respective models. However, if the parameter in question is shared by all the competing models, like, for example, the parameter $\sigma$ in (1), then the inverse infinities will cancel out, like $A$ cancels out in (37). This is why care must be taken to let the prior for the regression coefficients $\boldsymbol{\beta}$, (10), be proper, while, at the same time, as both a mathematical and a modeling convenience, one may let the prior of $\sigma$, (18), be improper.
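The mechanism of (37) and (38) is easy to verify numerically. A minimal illustration (the values of $L_1$ and $L_2$ below are placeholders, not results from the paper) shows how growing the half-width $B$ drives the posterior probability of the larger model to zero:

```python
import numpy as np

L1, L2 = 1.0, 5.0      # placeholder values for the factors L_i of Eq. (35)
m1, m2 = 2, 4          # numbers of regression coefficients, with m2 > m1
for B in [1, 10, 100, 1000]:
    w2 = (2 * B) ** -(m2 - m1) * L2
    p_M1 = L1 / (L1 + w2)                      # Eq. (37)
    print(f"B = {B:5d}:  p(M1 | ...) = {p_M1:.6f}")
```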

6. A Continuum of Informativeness

The Jeffreys prior for location parameters (32),
$$p(\beta_j \mid I)\, d\beta_j \propto d\beta_j, \qquad \text{for } -\infty < \beta_j < \infty,$$
represents a limit of gross ignorance, as we are even ignorant about the possible limits of the parameters $\beta_j$. This gross ignorance leads to evidences that are extremely conservative, in that they will always choose the regression model with the least number of regression coefficients, (38).
An opposite limit of gross knowledgeableness is the empirical "sure thing" prior [3]:
$$p(\boldsymbol{\beta} \mid \hat{\boldsymbol{\beta}}, \text{sure thing}) = \delta(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}), \tag{39}$$
where $\delta$ is the multivariate Dirac delta function for which we have
$$\int \delta(\mathbf{x} - \mathbf{c})\, f(\mathbf{x})\, d\mathbf{x} = f(\mathbf{c}). \tag{40}$$
The evidence that corresponds with the "sure thing" prior may be derived as, (9), (14), (18), (26), (39), and (40):
$$p(\mathbf{y} \mid X, \hat{\boldsymbol{\beta}}, \text{sure thing}) = \int_0^{\infty}\!\!\int p(\sigma, \boldsymbol{\beta}, \mathbf{y} \mid X, \hat{\boldsymbol{\beta}}, \text{sure thing})\, d\boldsymbol{\beta}\, d\sigma = \int_0^{\infty}\!\!\int p(\sigma \mid I)\, p(\boldsymbol{\beta} \mid \hat{\boldsymbol{\beta}}, \text{sure thing})\, p(\mathbf{y} \mid \sigma, X, \boldsymbol{\beta}, M)\, d\boldsymbol{\beta}\, d\sigma \propto \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{41}$$
where the "$\propto$" symbol is used to absorb the factor $A\,\Gamma(N/2)/\left(2\pi^{N/2}\right)$.
Since an increase in the number of predictors $m$ tends to decrease the length of the error vector $\mathbf{y} - \hat{\mathbf{y}}$, with a limit length of zero as the number of predictors $m$ tends to the sample size $N$:
$$\|\mathbf{y} - \hat{\mathbf{y}}\| \to 0, \qquad \text{as } m \to N, \tag{42}$$
we have that, (41) and (42),
$$p(\mathbf{y} \mid X, \hat{\boldsymbol{\beta}}, \text{sure thing}) \to \infty, \qquad \text{as } m \to N. \tag{43}$$
So, the gross knowledgeableness of the “sure thing” prior leads to evidences that are extremely liberal in that they will tend to choose regression models which have the largest number of regression coefficients.
In what follows we will derive a suite of priors on the continuum of informativeness that are more informed than the improper Jeffreys prior for location parameters (32) and less knowledgeable than the "sure thing" prior (39). It will be shown that the corresponding evidences, as a consequence, will be less conservative than the evidence (34) in its limit of $B \to \infty$, and less liberal than the maximum likelihood evidence (41).

7. A Proper Ignorance Prior

We now proceed to construct a more informed, proper (i.e., non-zero) inverse normalizing "constant" $C$ for the prior (10). By way of (7) and (14), we have for an $N \times m$ predictor matrix $X$ of rank $m$ that
$$\boldsymbol{\beta} = \left(X^T X\right)^{-1} X^T (\mathbf{y} - \mathbf{e}) = \left(X^T X\right)^{-1} X^T \mathbf{z}, \tag{44}$$
where
$$\mathbf{z} = \mathbf{y} - \mathbf{e}, \tag{45}$$
and $\mathbf{e} \sim \mathrm{MN}\!\left(\mathbf{0}, \sigma^2 I\right)$, (8). Closer inspection of (44) shows us that the parameter space of $\boldsymbol{\beta}$ is constrained by the difference vector $\mathbf{z}$.
For the special case where the predictor matrix $X$ is an $N \times 1$ vector $\mathbf{x}$ we have that
$$\beta = \frac{\mathbf{x}^T \mathbf{z}}{\mathbf{x}^T \mathbf{x}} = \frac{\|\mathbf{x}\|\, \|\mathbf{z}\|}{\|\mathbf{x}\|^2} \cos\phi, \tag{46}$$
where $\phi$ is the angle between the predictor vector $\mathbf{x}$ and the difference vector $\mathbf{z}$. Given that $-1 \leq \cos\phi \leq 1$, we may by way of (46) put definite bounds on $\beta$:
$$-\frac{\max\|\mathbf{z}\|}{\|\mathbf{x}\|} \leq \beta \leq \frac{\max\|\mathbf{z}\|}{\|\mathbf{x}\|}. \tag{47}$$
So, if we assign a uniform distribution to the regression coefficient $\beta$, then this uniform distribution is defined on a line segment of length $2\max\|\mathbf{z}\| / \|\mathbf{x}\|$. It follows that for the case of just the one regression coefficient, the prior (10) is
$$p(\beta \mid \mathbf{x}, \max\|\mathbf{z}\|, I) = \frac{\|\mathbf{x}\|}{2\max\|\mathbf{z}\|}, \tag{48}$$
where (48) is understood to be defined on the interval (47) which is centered at the origin.
In order to generalize (48) to the general multivariate case, we first must generalize (47) to its multivariate case. This may be done as follows [4]. Let $X$ be an $N \times m$ predictor matrix consisting of $m$ independent vectors $\mathbf{x}_j$. The vectors $\mathbf{x}_j$, because of their independence, will span an $m$-dimensional subspace $S_m$. It follows, trivially, that we may decompose $\mathbf{z}$ into a part that lies inside of this subspace and a part that lies outside, say,
$$\mathbf{z} = \hat{\mathbf{z}} + \mathbf{n}, \tag{49}$$
where $\hat{\mathbf{z}}$ is the part of $\mathbf{z}$ that is projected on $S_m$ and $\mathbf{n}$ is the part of $\mathbf{z}$ that is orthogonal to $S_m$. The orthogonality of $\mathbf{n}$ to $S_m$ implies that
$$\mathbf{x}_j^T \mathbf{n} = 0, \tag{50}$$
for $j = 1, \ldots, m$, whereas the fact that $\hat{\mathbf{z}}$ is a projection on $S_m$ implies that
$$\hat{\mathbf{z}} = \sum_{j=1}^{m} \mathbf{x}_j \beta_j, \tag{51}$$
where, by construction, (49), (50), and the assumed independence of the $\mathbf{x}_j$,
$$\beta_j = \frac{\mathbf{x}_j^T \mathbf{z}}{\mathbf{x}_j^T \mathbf{x}_j} = \frac{\mathbf{x}_j^T (\hat{\mathbf{z}} + \mathbf{n})}{\mathbf{x}_j^T \mathbf{x}_j} = \frac{\mathbf{x}_j^T \hat{\mathbf{z}}}{\mathbf{x}_j^T \mathbf{x}_j} = \frac{\|\hat{\mathbf{z}}\|}{\|\mathbf{x}_j\|} \cos\phi_j. \tag{52}$$
Now, because of the independence of the $\mathbf{x}_j$ we have that
$$\mathbf{x}_i^T \mathbf{x}_j = 0, \tag{53}$$
for $i \neq j$. So, if we take the norm of (51) we find
$$\|\hat{\mathbf{z}}\|^2 = \left\|\sum_{j=1}^{m} \mathbf{x}_j \beta_j\right\|^2 = \|\hat{\mathbf{z}}\|^2 \sum_{j=1}^{m} \cos^2\phi_j. \tag{54}$$
It follows from (54) that the angles $\phi_j$ in (52) must obey the constraint
$$\sum_{j=1}^{m} \cos^2\phi_j = 1. \tag{55}$$
Combining (52) and (55), we see that the regression coefficients $\beta_j$ must lie on the surface of an $m$-variate ellipsoid centered at the origin and with axes which have respective lengths of
$$r_j = \frac{\|\hat{\mathbf{z}}\|}{\|\mathbf{x}_j\|}. \tag{56}$$
Since
$$\|\hat{\mathbf{z}}\| \leq \|\mathbf{z}\| \leq \max\|\mathbf{z}\|, \tag{57}$$
the axes (56) may be maximized through our prior knowledge of the maximal length of the outcome variable $\mathbf{z}$:
$$\max r_j = \frac{\max\|\mathbf{z}\|}{\|\mathbf{x}_j\|}. \tag{58}$$
It follows that the regression coefficients $\beta_j$ are constrained to lie in the $m$-variate ellipsoid that is centered at the origin and has axes of length (58). If we substitute (58) into the identity for the volume of an $m$-variate ellipsoid
$$V = \frac{\pi^{m/2}}{\Gamma\!\left[(m+2)/2\right]} \prod_{j=1}^{m} r_j, \tag{59}$$
we find that the parameter space of $\boldsymbol{\beta}$ has a maximal prior volume of
$$V = \frac{\pi^{m/2}}{\Gamma\!\left[(m+2)/2\right]} \frac{\left(\max\|\mathbf{z}\|\right)^m}{\prod_{j=1}^{m} \|\mathbf{x}_j\|}. \tag{60}$$
Now, let $X \equiv \left(\mathbf{x}_1 \;\cdots\; \mathbf{x}_m\right)$. Then for orthogonal predictors $\mathbf{x}_j$ the product of the norms is equivalent to the square root of the determinant of $X^T X$, that is,
$$\prod_{j=1}^{m} \|\mathbf{x}_j\| = \left|X^T X\right|^{1/2}, \tag{61}$$
which is also the volume of the parallelepiped defined by the vectors $\mathbf{x}_j$. If the predictor matrix $X$ is non-orthogonal, then we may use a Gram–Schmidt process to transform $X$ to the orthogonal matrix $\tilde{X}$, say, where, because of invariance of the volume of a parallelepiped under orthogonalization,
$$\left|\tilde{X}^T \tilde{X}\right|^{1/2} = \left|X^T X\right|^{1/2}. \tag{62}$$
So, by way of (60), (61), and (62), it follows that (47) generalizes to the statement that for general (i.e., non-orthogonal) $N \times m$ predictor matrices $X$ the regression coefficient vectors $\boldsymbol{\beta}$ are constrained to lie in an $m$-dimensional ellipsoid which is centered on the origin and has a volume of
$$V = \frac{\pi^{m/2}}{\Gamma\!\left[(m+2)/2\right]} \frac{\left(\max\|\mathbf{z}\|\right)^m}{\left|X^T X\right|^{1/2}}. \tag{63}$$
And the inverse of this volume gives us the corresponding multivariate generalization of the uniform prior (48):
$$p(\boldsymbol{\beta} \mid X, \max\|\mathbf{z}\|, I) = \frac{\Gamma\!\left[(m+2)/2\right]}{\pi^{m/2}} \frac{\left|X^T X\right|^{1/2}}{\left(\max\|\mathbf{z}\|\right)^m}, \tag{64}$$
where (64) is understood to be defined on some ellipsoid having volume (63) and a centroid located at the origin.
Because of the triangle inequality [10], we have that
$$\|\mathbf{y} - \mathbf{e}\| \leq \|\mathbf{y}\| + \|\mathbf{e}\|. \tag{65}$$
From (45) and (65), it follows trivially that
$$\max\|\mathbf{z}\| \leq \max\|\mathbf{y}\| + \max\|\mathbf{e}\|. \tag{66}$$
As to the first term in the right-hand side of (66), let $\max|y|$ be a prior assessment of the maximum absolute value of the dependent variable $y$. Then we may assign the following simple bound on the length of the vector $\mathbf{y}$:
$$\max\|\mathbf{y}\| = \sqrt{N} \max|y|. \tag{67}$$
As to the second term in the right-hand side of (66), the error vector $\mathbf{e}$ has the known multivariate probability distribution (8). If we rewrite the elements in $\mathbf{e}$ as a function of its norm $\|\mathbf{e}\|$ and the angles $\alpha_1, \ldots, \alpha_{N-1}$ [6]
$$\begin{aligned}
e_1 &= \|\mathbf{e}\| \cos\alpha_1 \cos\alpha_2 \cos\alpha_3 \cdots \cos\alpha_{N-q} \cos\alpha_{N-q+1} \cdots \cos\alpha_{N-3} \cos\alpha_{N-2} \cos\alpha_{N-1} \\
e_2 &= \|\mathbf{e}\| \cos\alpha_1 \cos\alpha_2 \cos\alpha_3 \cdots \cos\alpha_{N-q} \cos\alpha_{N-q+1} \cdots \cos\alpha_{N-3} \cos\alpha_{N-2} \sin\alpha_{N-1} \\
e_3 &= \|\mathbf{e}\| \cos\alpha_1 \cos\alpha_2 \cos\alpha_3 \cdots \cos\alpha_{N-q} \cos\alpha_{N-q+1} \cdots \cos\alpha_{N-3} \sin\alpha_{N-2} \\
&\;\,\vdots \\
e_q &= \|\mathbf{e}\| \cos\alpha_1 \cos\alpha_2 \cos\alpha_3 \cdots \cos\alpha_{N-q} \sin\alpha_{N-q+1} \\
&\;\,\vdots \\
e_{N-2} &= \|\mathbf{e}\| \cos\alpha_1 \cos\alpha_2 \sin\alpha_3 \\
e_{N-1} &= \|\mathbf{e}\| \cos\alpha_1 \sin\alpha_2 \\
e_N &= \|\mathbf{e}\| \sin\alpha_1
\end{aligned} \tag{68}$$
where $0 < \|\mathbf{e}\| < \infty$, $-\pi/2 < \alpha_i < \pi/2$, for $i = 1, 2, \ldots, N-2$, and $0 < \alpha_{N-1} < 2\pi$, and which has as its Jacobian
$$J = \|\mathbf{e}\|^{N-1} \cos^{N-2}\alpha_1 \cos^{N-3}\alpha_2 \cdots \cos\alpha_{N-2}, \tag{69}$$
then it may be checked that the polar transformation (68) gives, as it should,
$$\mathbf{e}^T \mathbf{e} = e_1^2 + e_2^2 + \cdots + e_N^2 = \|\mathbf{e}\|^2. \tag{70}$$
So, by way of (69) and (70), we may map (8) from a Cartesian to a polar coordinate system. This gives the transformed probability distribution
$$p(\|\mathbf{e}\|, \alpha_1, \alpha_2, \ldots, \alpha_{N-1} \mid \sigma) = \frac{\|\mathbf{e}\|^{N-1}}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left(-\frac{\|\mathbf{e}\|^2}{2\sigma^2}\right) \cos^{N-2}\alpha_1 \cos^{N-3}\alpha_2 \cdots \cos\alpha_{N-2}. \tag{71}$$
Using the identities
$$\int_{-\pi/2}^{\pi/2} \cos^{N-i-1}\alpha_i\, d\alpha_i = \frac{\sqrt{\pi}\,\Gamma\!\left[(N-i)/2\right]}{\Gamma\!\left[(N-i-1)/2 + 1\right]}, \tag{72}$$
for $i = 1, \ldots, N-2$, and
$$\int_0^{2\pi} d\alpha_{N-1} = 2\pi, \tag{73}$$
we may integrate (71) over the $N-1$ nuisance variables $\alpha_i$ and, so, obtain the univariate probability distribution of the norm $\|\mathbf{e}\|$,
$$p(\|\mathbf{e}\| \mid \sigma, I) = \frac{2\, \|\mathbf{e}\|^{N-1}}{\left(2\sigma^2\right)^{N/2} \Gamma(N/2)} \exp\left(-\frac{\|\mathbf{e}\|^2}{2\sigma^2}\right), \tag{74}$$
which has a mean
$$E\!\left(\|\mathbf{e}\| \mid \sigma, I\right) = \frac{\sqrt{2}\, \Gamma\!\left[(N+1)/2\right]}{\Gamma(N/2)}\, \sigma \approx \sqrt{N-1}\, \sigma \tag{75}$$
and a standard deviation
$$\mathrm{std}\!\left(\|\mathbf{e}\| \mid \sigma, I\right) = \sqrt{N - \frac{2\, \Gamma^2\!\left[(N+1)/2\right]}{\Gamma^2(N/2)}}\; \sigma \approx \frac{\sigma}{\sqrt{2}}. \tag{76}$$
By way of (75) and (76), we may set a probabilistic bound on $\max\|\mathbf{e}\|$ in (66); that is, we may let $\max\|\mathbf{e}\|$ be the $k$-sigma upper bound
$$\max\|\mathbf{e}\| = \mathrm{UB}_{\|\mathbf{e}\|} = E\!\left(\|\mathbf{e}\| \mid \sigma, I\right) + k\, \mathrm{std}\!\left(\|\mathbf{e}\| \mid \sigma, I\right) \approx \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right) \sigma. \tag{77}$$
In what follows, we will assume sample sizes $N \gg 1$ and, consequently, treat the right-hand approximation in (77) as an equality.
By way of (64), (66), (67), and (77), we then obtain the proper ignorance prior [2]
$$p(\boldsymbol{\beta} \mid X, \max|y|, k, \sigma, I) = \frac{\Gamma\!\left[(m+2)/2\right] \left|X^T X\right|^{1/2}}{\pi^{m/2} \left[\sqrt{N}\max|y| + \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma\right]^{m}}, \tag{78}$$
where, as in (64), it is understood that (78) is defined on some ellipsoid which has the origin as its centroid. The proper ignorance prior simplifies to
$$p(\boldsymbol{\beta} \mid X, \max|y|, k, \sigma, I) \approx \left(\frac{\sigma}{\max|y| + \sigma}\right)^{m} \left(\frac{1}{N}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}, \tag{79}$$
for $k \ll \sqrt{2N}$, where $k$ is some sigma-level for the upper bound (77).
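As a concrete illustration, the following minimal sketch (our own helper, under the assumptions stated above) evaluates the log of the proper ignorance prior height (78) from a predictor matrix, a prior assessment $\max|y|$, and a sigma-level $k$:

```python
import numpy as np
from scipy.special import gammaln

def log_ignorance_prior(X, max_abs_y, sigma, k=6.0):
    """log of the uniform prior height of Eq. (78) on its ellipsoidal support."""
    N, m = X.shape
    max_z = np.sqrt(N) * max_abs_y + (np.sqrt(N - 1) + k / np.sqrt(2)) * sigma
    _, logdet = np.linalg.slogdet(X.T @ X)             # log |X^T X|
    return (gammaln((m + 2) / 2)
            + 0.5 * logdet
            - 0.5 * m * np.log(np.pi)
            - m * np.log(max_z))
```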

8. A More Informed Manor’s Prior

If apart from the maximum absolute value $\max|y|$ we also have prior knowledge about the minimum and maximum values of $y$, then we may rewrite (7) as
$$\frac{\min y + \max y}{2}\,\mathbf{1} + \left(\mathbf{y} - \frac{\min y + \max y}{2}\,\mathbf{1}\right) = X\boldsymbol{\beta} + \mathbf{e}, \tag{80}$$
where $\mathbf{1}$ is a vector of ones and $\left(\min y + \max y\right)/2$ is the center of the interval $\left[\min y, \max y\right]$. Let
$$c = \frac{\min y + \max y}{2} \frac{\mathbf{x}^T \mathbf{1}}{\mathbf{x}^T \mathbf{x}} \qquad \text{and} \qquad \mathbf{w} = \mathbf{y} - \frac{\min y + \max y}{2}\,\mathbf{1} - \mathbf{e}. \tag{81}$$
Then (47) becomes
$$c - \frac{\max\|\mathbf{w}\|}{\|\mathbf{x}\|} \leq \beta \leq c + \frac{\max\|\mathbf{w}\|}{\|\mathbf{x}\|}. \tag{82}$$
It follows that for the case of just one regression coefficient, the prior (10) is given as
$$p(\beta \mid \mathbf{x}, \max\|\mathbf{w}\|, I) = \frac{\|\mathbf{x}\|}{2\max\|\mathbf{w}\|}, \tag{83}$$
where (83) is understood to be defined on the interval (82) which is centered at $c$, (81). Let
$$\mathbf{c} = \frac{\min y + \max y}{2} \left(X^T X\right)^{-1} X^T \mathbf{1}. \tag{84}$$
Then, for the case where $X$ is an $N \times m$ predictor matrix, (82) generalizes to the statement that $\boldsymbol{\beta}$ is constrained to lie in an $m$-dimensional ellipsoid which has a centroid $\mathbf{c}$ and a volume [4]
$$V = \frac{\pi^{m/2}}{\Gamma\!\left[(m+2)/2\right]} \frac{\left(\max\|\mathbf{w}\|\right)^m}{\left|X^T X\right|^{1/2}}. \tag{85}$$
The inverse of this volume gives us the corresponding multivariate generalization of the uniform prior (83):
$$p(\boldsymbol{\beta} \mid X, \max\|\mathbf{w}\|, I) = \frac{\Gamma\!\left[(m+2)/2\right]}{\pi^{m/2}} \frac{\left|X^T X\right|^{1/2}}{\left(\max\|\mathbf{w}\|\right)^m}. \tag{86}$$
Since $\left(\min y + \max y\right)/2$ is the center of the interval $\left[\min y, \max y\right]$, which has a range of $\max y - \min y$, we have that
$$\max\left\|\mathbf{y} - \frac{\min y + \max y}{2}\,\mathbf{1}\right\| = \sqrt{N}\, \frac{\max y - \min y}{2}. \tag{87}$$
So it follows, (45), (65), (66), (77), (81), and (87), that
$$\max\|\mathbf{w}\| = \sqrt{N}\, \frac{\max y - \min y}{2} + \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma. \tag{88}$$
Substituting (88) into (86), we obtain the more informed Manor's prior [2]
$$p(\boldsymbol{\beta} \mid X, \min y, \max y, k, \sigma, I) = \frac{\Gamma\!\left[(m+2)/2\right] \left|X^T X\right|^{1/2}}{\pi^{m/2} \left[\sqrt{N}\, \frac{\max y - \min y}{2} + \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma\right]^{m}}, \tag{89}$$
where it is understood that (89) is defined on some ellipsoid which has as its centroid $\mathbf{c}$, (84). Manor's prior simplifies to
$$p(\boldsymbol{\beta} \mid X, \min y, \max y, k, \sigma, I) \approx \left(\frac{\sigma}{\frac{\max y - \min y}{2} + \sigma}\right)^{m} \left(\frac{1}{N}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}, \tag{90}$$
for $k \ll \sqrt{2N}$, where $k$ is some sigma-level for the upper bound (77).

9. An Even More Informed Neeley’s Prior

Alternatively, if we have prior knowledge about the mean $\nu$ and the variance $\varphi^2$ of the dependent variable $y$, then, based on that information alone, by way of a maximum entropy argument [11], which also lets us assign (8) to the error vector $\mathbf{e}$ in (7), we may assign a normal distribution as an informative prior to this dependent variable; that is,
$$\mathbf{y} \sim \mathrm{MN}\!\left(\nu\mathbf{1}, \varphi^2 I\right). \tag{91}$$
Let
$$\mathbf{u} = \mathbf{y} - \nu\mathbf{1} - \mathbf{e}. \tag{92}$$
By way of (8), (91), (92), and the fact that the mean and variance of a sum of stochastics are the sum of, respectively, the means and variances of those stochastics [12], we then have
$$\mathbf{u} \sim \mathrm{MN}\!\left(\mathbf{0}, \left(\varphi^2 + \sigma^2\right) I\right). \tag{93}$$
Since $\mathbf{e}$ and $\mathbf{u}$ both have a zero mean vector and a diagonal covariance matrix, (8) and (93), it follows from (77) that
$$\max\|\mathbf{u}\| = \mathrm{UB}_{\|\mathbf{u}\|} \approx \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sqrt{\varphi^2 + \sigma^2}. \tag{94}$$
In what follows, we will assume sample sizes $N \gg 1$ and, consequently, treat the right-hand approximation in (94) as an equality. Substituting (94) into (86), we obtain the even more informed Neeley's prior [2]
$$p(\boldsymbol{\beta} \mid X, \varphi, k, \sigma, I) = \frac{\Gamma\!\left[(m+2)/2\right] \left|X^T X\right|^{1/2}}{\pi^{m/2} \left[\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sqrt{\varphi^2 + \sigma^2}\right]^{m}}, \tag{95}$$
where it is understood, as in (89), that (95) is defined on some ellipsoid, which, however, now has a centroid located at
$$\mathbf{c} = \left(X^T X\right)^{-1} X^T \nu\mathbf{1}. \tag{96}$$
Neeley's prior simplifies to
$$p(\boldsymbol{\beta} \mid X, \varphi, k, \sigma, I) \approx \left(\frac{\sigma}{\sqrt{\varphi^2 + \sigma^2}}\right)^{m} \left(\frac{1}{N}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}, \tag{97}$$
for $k \ll \sqrt{2N}$, where $k$ is some sigma-level for the upper bound (77).

10. The Parsimonious Constantineau’s Prior

By way of (7) and (14), we may, in principle, come to the equality
$$\boldsymbol{\beta} = \left(X^T X\right)^{-1} X^T (\mathbf{y} - \mathbf{e}) = \hat{\boldsymbol{\beta}} - \left(X^T X\right)^{-1} X^T \mathbf{e}, \tag{98}$$
where $\mathbf{e} \sim \mathrm{MN}\!\left(\mathbf{0}, \sigma^2 I\right)$, (8). So for the special case of an $N \times 1$ predictor vector $\mathbf{x}$, we have that
$$\beta = \hat{\beta} - \frac{\mathbf{x}^T \mathbf{e}}{\mathbf{x}^T \mathbf{x}} = \hat{\beta} - \cos\phi\, \frac{\|\mathbf{x}\|\,\|\mathbf{e}\|}{\|\mathbf{x}\|^2}, \tag{99}$$
where $\phi$ is the angle between the predictor vector $\mathbf{x}$ and the error vector $\mathbf{e}$. Given that $-1 \leq \cos\phi \leq 1$, we may by way of (77) and (99) put the following bounds on $\beta$:
$$\hat{\beta} - \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\frac{\sigma}{\|\mathbf{x}\|} \leq \beta \leq \hat{\beta} + \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\frac{\sigma}{\|\mathbf{x}\|}. \tag{100}$$
For the case where $X$ is an $N \times m$ predictor matrix, (100) generalizes to the statement that $\boldsymbol{\beta}$ is constrained to lie in an $m$-dimensional ellipsoid which is centered on $\hat{\boldsymbol{\beta}}$ and has a volume of
$$V = \frac{\pi^{m/2}}{\Gamma\!\left[(m+2)/2\right]} \frac{\left[\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma\right]^{m}}{\left|X^T X\right|^{1/2}}. \tag{101}$$
The inverse of this volume gives us the parsimonious Constantineau's prior [2]
$$p(\boldsymbol{\beta} \mid X, k, \sigma, I, S) = \frac{\Gamma\!\left[(m+2)/2\right] \left|X^T X\right|^{1/2}}{\pi^{m/2} \left[\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma\right]^{m}}, \tag{102}$$
where $S$ is the stipulation
$$S \equiv \text{the centroid of the prior is located at } \hat{\boldsymbol{\beta}}. \tag{103}$$
This prior simplifies to
$$p(\boldsymbol{\beta} \mid X, k, \sigma, I, S) \approx \left(\frac{1}{N}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}, \tag{104}$$
for $k \ll \sqrt{2N}$, where $k$ is some sigma-level for the upper bound (77).
Constantineau's prior (102) is the most parsimonious of the proposed priors, as it has the smallest $k$-sigma parameter space volume $V$. But it will turn out later on that there is an even more parsimonious "stipulation prior" already out there, albeit only by implication.

11. The Coverage of the Proposed Priors

In order to demonstrate that (16) tends to hold as an equality for the proposed proper uniform priors, we only need to show that (16) does so for Constantineau's prior (102), as this prior is the most parsimonious of the proposed priors. That is, we will need to show that the second right-hand term of (15), for all intents and purposes, evaluates to 1 when integrated over $D_{\beta}$, the domain implied by (101):
$$\int_{D_{\beta}} \frac{\left|X^T X\right|^{1/2}}{\left(2\pi\sigma^2\right)^{m/2}} \exp\left[-\frac{1}{2\sigma^2} (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right] d\boldsymbol{\beta} \approx 1. \tag{105}$$
Let $XK = \tilde{X}$ be a transformation of the predictor matrix $X$ such that the columns in $\tilde{X}$ are orthogonal, or, equivalently, $\tilde{X}^T \tilde{X}$ is diagonal. Then (105) may be evaluated by way of the transformation
$$\boldsymbol{\beta} = K\left(\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}\right) + \hat{\boldsymbol{\beta}}, \tag{106}$$
which has a Jacobian of $|K|$. Because of the fact that [6]
$$|K|\left|X^T X\right|^{1/2} = \left|K^T X^T X K\right|^{1/2} = \left|\tilde{X}^T \tilde{X}\right|^{1/2} \tag{107}$$
and the orthogonality of $\tilde{X}$ together with (61), we may rewrite the integrand in (105) for the transformation (106) as
$$\frac{|K|\left|X^T X\right|^{1/2}}{\left(2\pi\sigma^2\right)^{m/2}} \exp\left[-\frac{1}{2\sigma^2} (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}})^T \tilde{X}^T \tilde{X} (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}})\right] = \prod_{j=1}^{m} \frac{\|\tilde{\mathbf{x}}_j\|}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\|\tilde{\mathbf{x}}_j\|^2}{2\sigma^2} \left(\gamma_j - \hat{\gamma}_j\right)^2\right]. \tag{108}$$
Also, if we go from $X$ to the orthogonal $\tilde{X}$ in (108), then the prior (102) undergoes (by construction) a corresponding transformation, (61),
$$p(\boldsymbol{\beta} \mid \tilde{X}, \sigma, I, \tilde{S}) = \frac{\Gamma\!\left[(m+2)/2\right] \left|\tilde{X}^T \tilde{X}\right|^{1/2}}{\pi^{m/2} \left[\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma\right]^{m}} = \frac{\Gamma\!\left[(m+2)/2\right]}{\pi^{m/2}} \prod_{j=1}^{m} \frac{\|\tilde{\mathbf{x}}_j\|}{\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma}, \tag{109}$$
where $k$ is the sigma-level of the upper bound of the length of the error vector, (77), and $\tilde{S}$ is the transformed stipulation
$$\tilde{S} \equiv \text{the centroid of the prior is located at } \hat{\boldsymbol{\gamma}}. \tag{110}$$
Because of the orthogonality of the $\tilde{\mathbf{x}}_j$, the fact that (109) is the inverse of the volume of the prior accessible parameter space, and the fact that this volume is in the form of an ellipsoid with axes of length, (59),
$$r_j = \frac{\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)\sigma}{\|\tilde{\mathbf{x}}_j\|}, \tag{111}$$
it follows that the rotated parameter space (106) is defined by the ellipsoid
$$\frac{\left(\gamma_1 - \hat{\gamma}_1\right)^2}{\sigma^2/\|\tilde{\mathbf{x}}_1\|^2} + \frac{\left(\gamma_2 - \hat{\gamma}_2\right)^2}{\sigma^2/\|\tilde{\mathbf{x}}_2\|^2} + \cdots + \frac{\left(\gamma_m - \hat{\gamma}_m\right)^2}{\sigma^2/\|\tilde{\mathbf{x}}_m\|^2} = \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)^2. \tag{112}$$
The transformation
$$\gamma_j = \eta_j\, \frac{\sigma}{\|\tilde{\mathbf{x}}_j\|} + \hat{\gamma}_j, \tag{113}$$
for $j = 1, 2, \ldots, m$, has a Jacobian of
$$J = \prod_{j=1}^{m} \frac{\sigma}{\|\tilde{\mathbf{x}}_j\|}. \tag{114}$$
By way of (106), (108), (113), and (114), we find for the integral in (105) that
$$\int_{D_{\beta}} \frac{\left|X^T X\right|^{1/2}}{\left(2\pi\sigma^2\right)^{m/2}} \exp\left[-\frac{1}{2\sigma^2} (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right] d\boldsymbol{\beta} = \int_{D_{\eta}} \frac{1}{\left(2\pi\right)^{m/2}} \exp\left(-\frac{\boldsymbol{\eta}^T\boldsymbol{\eta}}{2}\right) d\boldsymbol{\eta}, \tag{115}$$
where the parameter space $D_{\eta}$ is defined as a sphere which has a radius $\sqrt{N-1} + k/\sqrt{2}$ and is centered at the origin, (112) and (113):
$$\eta_1^2 + \eta_2^2 + \cdots + \eta_m^2 = \left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)^2. \tag{116}$$
By way of the polar transformation (68) and steps (69) through (73), we find that the right-hand side of (115) evaluates as
$$\int_0^{\sqrt{N-1}+k/\sqrt{2}} \frac{2\,\eta^{m-1}}{2^{m/2}\,\Gamma(m/2)} \exp\left(-\frac{\eta^2}{2}\right) d\eta = 1 - \frac{\Gamma\!\left[\frac{m}{2}, \frac{1}{2}\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)^{2}\right]}{\Gamma\!\left(\frac{m}{2}\right)}, \tag{117}$$
where $\Gamma(a, b)$ and $\Gamma(a)$ are the incomplete and the ordinary (Euler) gamma functions, respectively:
$$\Gamma(a, b) = \int_b^{\infty} t^{a-1} \exp(-t)\, dt \qquad \text{and} \qquad \Gamma(a) = \Gamma(a, 0). \tag{118}$$
Substituting (117) into (115), we find that requirement (105) translates to the equivalent requirement
$$1 - \frac{\Gamma\!\left[\frac{m}{2}, \frac{1}{2}\left(\sqrt{N-1} + \frac{k}{\sqrt{2}}\right)^{2}\right]}{\Gamma\!\left(\frac{m}{2}\right)} \approx 1. \tag{119}$$
And it may be checked (numerically) that this requirement holds for $k = 6$, (77), even in the (extreme) limit case where the number of predictors $m$ tends to the sample size $N$. Moreover, it may be checked, by setting $k = 0$, that it is the $k/\sqrt{2}$ term in Constantineau's prior (102) which ensures that requirement (119) holds for the limit case where $m$ tends to $N$.
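This numerical check is easily reproduced with SciPy's regularized lower incomplete gamma function, which equals the left-hand side of (119); the sketch below (our own, under the stated $k = 6$ assumption) evaluates it in the extreme case $m = N$:

```python
import numpy as np
from scipy.special import gammainc   # regularized lower incomplete gamma P(a, x)

k = 6.0
for N in [10, 100, 1000, 10000]:
    m = N                                              # extreme limit case m -> N
    b = 0.5 * (np.sqrt(N - 1) + k / np.sqrt(2)) ** 2   # second argument in (119)
    print(f"N = m = {N:6d}:  coverage = {gammainc(m / 2, b):.6f}")
```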

12. What is the Data?

Before we go on, we now will discuss two questions that need addressing. The first question is whether or not the predictor matrix X is part of the data. The second question is whether or not the stipulation (103) makes the proposed parsimonious Constantineau’s prior empirical or not.
In answer to the first question, in Bayesian regression analysis the predictor variables in $X$ are assumed to be [6]: "fixed non-stochastic variables," or, alternatively, "random variables distributed independently of the $\mathbf{e}$, with a pdf not [italics by Zellner himself] involving the parameters $\beta_j$ and $\sigma$." Stated differently, the likelihood $p(\mathbf{y} \mid \sigma, X, \boldsymbol{\beta}, M)$ is a probability of the response vector $\mathbf{y}$, and not of the predictor matrix $X$. Following this line of reasoning, the predictor matrix $X$ should not be considered to be part of the data. Rather, $X$ is part of the prior problem structure, in that for a given predictor matrix $X$ a corresponding response vector $\mathbf{y}$ is obtained in the data gathering phase. So, whereas in [4] (i.e., Part I of this research) it was proposed that, in order to construct a parsimonious prior for regression coefficients, one needed to assign a minimal value to the determinant of $X^T X$ based on the prior information at hand, a non-trivial task, it was argued in [2] (i.e., Part II) that the predictor matrix $X$ is not a part of the data and, consequently, may be used for the construction of proper priors.
In answer to the second question, if “we adopt the posture of the scrupulous fair judge who insists that fairness in comparing models requires that each is delivering the best performance of which it is capable, by giving each the best possible prior probability for its parameters” [11], then we may defend the use of the cheap and cheerful prior (102), with its stipulation (103), as being the prior that represents some limit of parsimony, which is not influenced by our state of ignorance regarding the dependent variable y. However, if we “consider it necessary to be cruel realists and judge each model taking into account the prior information we actually have pertaining to it, that is, we penalize a model if we do not have the best possible prior information about the dependent variable y, although that is not really a fault of the model itself” [11], then we will be forced to revert to the more solemn priors (78), (89), and (95).

13. The Corresponding Evidences

By way of (10), we may substitute (78) into (16), and so obtain the evidence value of the likelihood model $M$ and prior information $I$, conditional on $\sigma$:
$$p(\mathbf{y} \mid k, \sigma, X, \max|y|, M, I) \approx \frac{2^{m/2}\, \Gamma\!\left(\frac{m+2}{2}\right)}{\left[\sqrt{N}\left(\frac{\max|y|}{\sigma} + 1 + \frac{k}{\sqrt{2N}}\right)\right]^{m}} \frac{1}{\left(2\pi\sigma^2\right)^{N/2}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right]. \tag{120}$$
If $\sigma$ is unknown, then, as both a mathematical and a modeling convenience (see the discussion of Section 5), we may assign the improper Jeffreys prior for scaling parameters (18):
$$p(\sigma \mid I) = \frac{A}{\sigma}, \tag{121}$$
where $A$ is some normalizing constant, to the unknown $\sigma$ in the evidence (120), in order to integrate with respect to this unknown parameter:
$$p(\mathbf{y} \mid k, X, M, I) = \int_0^{\infty} p(\sigma, \mathbf{y} \mid k, X, M, I)\, d\sigma = \int_0^{\infty} p(\sigma \mid I)\, p(\mathbf{y} \mid k, \sigma, X, M, I)\, d\sigma, \tag{122}$$
where, (120) and (121),
$$p(\sigma, \mathbf{y} \mid k, X, \max|y|, M, I) \approx \frac{2^{m/2}\, \Gamma\!\left(\frac{m+2}{2}\right)}{\left[\sqrt{N}\left(\frac{\max|y|}{\sigma} + 1 + \frac{k}{\sqrt{2N}}\right)\right]^{m}} \frac{A}{\left(2\pi\right)^{N/2} \sigma^{N+1}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right]. \tag{123}$$
We may conveniently factorize (123) as
$$p(\sigma, \mathbf{y} \mid k, X, \max|y|, M, I) \approx \frac{2^{m/2}\, \Gamma\!\left(\frac{m+2}{2}\right)}{\left[\sqrt{N}\left(\frac{\max|y|}{\sigma} + 1 + \frac{k}{\sqrt{2N}}\right)\right]^{m}} \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}} \frac{A\,\Gamma(N/2)}{2\pi^{N/2}} \times \frac{2}{\Gamma(N/2)} \left(\frac{\|\mathbf{y} - \hat{\mathbf{y}}\|^2}{2}\right)^{N/2} \frac{1}{\sigma^{N+1}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right]. \tag{124}$$
The last term in (124) evaluates to 1 when integrated over $\sigma$, as it has the form of an inverted gamma distribution [6]. Also, the last term in (124) will tend to a Dirac delta distribution as $N \to \infty$ [9]; that is,
$$\frac{2}{\Gamma(N/2)} \left(\frac{\|\mathbf{y} - \hat{\mathbf{y}}\|^2}{2}\right)^{N/2} \frac{1}{\sigma^{N+1}} \exp\left[-\frac{1}{2\sigma^2} (\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})\right] \to \delta\!\left(\sigma - \frac{\|\mathbf{y} - \hat{\mathbf{y}}\|}{\sqrt{N}}\right). \tag{125}$$
So, by way of (125), the property (40), and the factorization (124), we have that the evidence (122) evaluates as
$$p(\mathbf{y} \mid k, X, \max|y|, M, I) \propto \left(\frac{1 + \frac{k}{\sqrt{2N}}}{\frac{\sqrt{N}\max|y|}{\|\mathbf{y} - \hat{\mathbf{y}}\|} + 1 + \frac{k}{\sqrt{2N}}}\right)^{m} \left(\frac{2}{\sqrt{2N} + k}\right)^{m} \Gamma\!\left(\frac{m+2}{2}\right) \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{126}$$
where $k$ is the upper-bound sigma level of the maximum length of the error vector $\max\|\mathbf{e}\|$, (77). If we assume that $k \ll \sqrt{2N}$, then the evidence (126) simplifies to
$$p(\mathbf{y} \mid k, X, \max|y|, M, I) \propto \left(\frac{\sqrt{N}\max|y|}{\|\mathbf{y} - \hat{\mathbf{y}}\|} + 1\right)^{-m} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right) \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}. \tag{127}$$
Likewise, if we substitute (89), (95), and (102) into (16), integrate over $\sigma$, and assume $k \ll \sqrt{2N}$, we obtain the respective approximate evidence values:
$$p(\mathbf{y} \mid k, X, \min y, \max y, M, I) \propto \left(\frac{\sqrt{N}\left(\max y - \min y\right)}{2\,\|\mathbf{y} - \hat{\mathbf{y}}\|} + 1\right)^{-m} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right) \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{128}$$
and
$$p(\mathbf{y} \mid k, X, \varphi, M, I) \propto \left(\frac{N\varphi^2}{\|\mathbf{y} - \hat{\mathbf{y}}\|^2} + 1\right)^{-m/2} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right) \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{129}$$
and
$$p(\mathbf{y} \mid k, X, M, I, S) \propto \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right) \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{130}$$
where the "$\propto$" symbol is used to absorb the common factors $A\,\Gamma(N/2)/\left(2\pi^{N/2}\right)$, which are shared by all the competing regression models and which cancel out as the posterior probabilities of these models are computed.
The above evidences can be deconstructed into a goodness-of-fit factor, which is also the implied evidence (41) of the "sure thing" prior (39):
$$\text{Goodness of Fit} = \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{131}$$
and an Occam factor which penalizes the shrinkage of the posterior accessible parameter space of $\boldsymbol{\beta}$ relative to the prior accessible space. Now, all Occam factors are monotonically decreasing functions of the number of predictors $m$. But only the Occam factors of the "cruelly realistic" evidences (127)–(129) have terms which are dependent upon our state of prior knowledge regarding the dependent variable $y$.
If in the construction of the priors (79), (90), or (97) we make prior value assignments that grossly overestimate the maximum absolute value, range, and standard deviation, respectively, of the dependent variable y, then the Occam factors in the corresponding evidences, (127)–(129), stand ready to punish us for making consequent prior parameter space assignments that are too voluminous. Whereas, if we make prior value assignments that grossly underestimate these aspects of the dependent variable y, then the Occam factors of the cruelly realistic evidences (127)–(129) will tend to the Occam factor of the “scrupulously fair” evidence (130), as the cruelly realistic evidences, as a consequence, tend to the scrupulously fair evidence.
For prior value assignments that approximate the underlying "true" values of the maximum absolute value, range, and standard deviation, respectively, of the dependent variable $y$, the Occam factors of the evidences (127)–(129) tend to the inequality
$$\text{Occam Factor} \leq 2^{-m/2} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right), \tag{132}$$
seeing that for accurate prior value assignments we have that, (125),
$$\frac{\sqrt{N}\max|y|}{\|\mathbf{y} - \hat{\mathbf{y}}\|} \geq \frac{\sqrt{N}\left(\max y - \min y\right)/2}{\|\mathbf{y} - \hat{\mathbf{y}}\|} > \frac{\sqrt{N}\,\varphi}{\|\mathbf{y} - \hat{\mathbf{y}}\|} \geq \frac{\sqrt{N}\,\sigma}{\|\mathbf{y} - \hat{\mathbf{y}}\|} \approx 1, \tag{133}$$
where $\varphi$ is the prior standard deviation of $y$, which is estimated by the root mean square error of a simple intercept-only regression model, and $\sigma$ is the prior model error, which is estimated by the root mean square error of the full regression model.
Note that equality will hold in (132) only for the evidence (129) of an intercept-only regression model in combination with an accurate prior value assignment for $\varphi$, because only then do we have that $\varphi$ is approximated by $\|\mathbf{y} - \hat{\mathbf{y}}\|/\sqrt{N}$.
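The approximate evidences (127)–(130) are straightforward to evaluate in practice. The sketch below (our own illustration; the prior assessments max_abs_y, y_min, y_max, and phi are user-supplied) returns the log-evidences on the common proportionality scale of (131), together with the implied BIC (135), AIC (143), and "sure thing" (41) log-evidences that serve as reference points in the next section and in Appendix A:

```python
import numpy as np
from scipy.special import gammaln

def log_evidences(y, X, max_abs_y, y_min, y_max, phi):
    """Log-evidences (127)-(130), plus the implied BIC (135), AIC (143), and
    'sure thing' (41) log-evidences, all up to the common factor of (131)."""
    N, m = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = np.linalg.norm(y - X @ beta_hat)                     # ||y - y_hat||
    base = 0.5 * m * np.log(2 / N) + gammaln((m + 2) / 2) - N * np.log(r)
    return {
        "ignorance":     base - m * np.log(np.sqrt(N) * max_abs_y / r + 1),             # (127)
        "manor":         base - m * np.log(np.sqrt(N) * (y_max - y_min) / (2 * r) + 1), # (128)
        "neeley":        base - 0.5 * m * np.log(N * phi**2 / r**2 + 1),                # (129)
        "constantineau": base,                                                          # (130)
        "bic":           -0.5 * m * np.log(N) - N * np.log(r),                          # (135)
        "aic":           -m - N * np.log(r),                                            # (143)
        "sure_thing":    -N * np.log(r),                                                # (41)
    }
```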

14. Connecting the Derived Evidences with the BIC and the AIC

In order to get our bearings for the proposed priors and their consequent evidences, we will connect the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) to these evidences.
The BIC is given as [13]
$$\mathrm{BIC} = m \log N + 2N \log \|\mathbf{y} - \hat{\mathbf{y}}\|, \tag{134}$$
where, given any two estimated models, the model with the lower value of BIC is the one to be preferred. The BIC has an implied evidence of
$$p(\mathbf{y} \mid X, \mathrm{BIC}, S) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{BIC}\right) = \left(\frac{1}{N}\right)^{m/2} \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{135}$$
where $S$ is the stipulation (103)
$$S \equiv \text{the centroid of the prior is located at } \hat{\boldsymbol{\beta}},$$
and where we assume that the factor $A\,\Gamma(N/2)/\left(2\pi^{N/2}\right)$ has been absorbed in the proportionality sign. For $k \ll \sqrt{2N}$, the BIC evidence (135) differs from Constantineau's evidence (130) by an approximate factor
$$\frac{p(\mathbf{y} \mid k, X, M, I, S)}{p(\mathbf{y} \mid X, \mathrm{BIC}, S)} \approx 2^{m/2}\, \Gamma\!\left(\frac{m+2}{2}\right). \tag{136}$$
Let $c_{\mathrm{BIC}}$ be the factor by which the lengths of the axes of the parameter space of the implied BIC prior differ from the lengths of the axes of the parameter space of Constantineau's prior (102). Then we have that
$$p(\mathbf{y} \mid X, \mathrm{BIC}, S) = \frac{1}{c_{\mathrm{BIC}}^{m}}\, p(\mathbf{y} \mid k, X, M, I, S), \tag{137}$$
as the lengths of the prior ellipsoid parameter spaces factor inversely into their corresponding evidences. Combining (136) and (137), and making use of the Stirling approximation
$$\log \Gamma\!\left(\frac{m+2}{2}\right) = \frac{m}{2}\log\frac{m}{2} - \frac{m}{2} + O\!\left(\sqrt{m}\right), \tag{138}$$
we find that the axes of the implied BIC prior tend to be longer by a factor
$$c_{\mathrm{BIC}} \approx \left[2^{m/2}\, \Gamma\!\left(\frac{m+2}{2}\right)\right]^{1/m} \approx \left(\frac{m}{e}\right)^{1/2} \tag{139}$$
than the axes of Constantineau's prior. It follows that the implied BIC prior is approximately given as, (104) and (139),
$$p(\boldsymbol{\beta} \mid X, \sigma, \mathrm{BIC}, S) \approx \left(\frac{e}{m}\right)^{m/2} \left(\frac{1}{N}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}. \tag{140}$$
And it may be checked that the requirement (105) holds for this implied prior, as we have that the equivalent requirement (119),
$$1 - \frac{\Gamma\!\left(\frac{m}{2}, \frac{e^{-1} m N}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)} \approx 1, \tag{141}$$
holds for $N \geq m \geq 3$, where it is understood that in a regression analysis the number of parameters $m$ may never exceed the sample size $N$.
The AIC is given as [13]
$$\mathrm{AIC} = 2m + 2N \log \|\mathbf{y} - \hat{\mathbf{y}}\|, \tag{142}$$
where, given any two estimated models, the model with the lower value of AIC is the one to be preferred. The AIC has an implied evidence of
$$p(\mathbf{y} \mid X, \mathrm{AIC}, S) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{AIC}\right) = e^{-m}\, \frac{1}{\|\mathbf{y} - \hat{\mathbf{y}}\|^{N}}, \tag{143}$$
where $S$ is the stipulation (103). For $k \ll \sqrt{2N}$, the AIC evidence (143) differs from Constantineau's evidence (130) by an approximate factor
$$\frac{p(\mathbf{y} \mid k, X, M, I, S)}{p(\mathbf{y} \mid X, \mathrm{AIC}, S)} \approx e^{m} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right). \tag{144}$$
Let $c_{\mathrm{AIC}}$ be the factor by which the lengths of the axes of the parameter space of the implied AIC prior differ from the lengths of the axes of the parameter space of Constantineau's prior (102). Then we have that, (137),
$$p(\mathbf{y} \mid X, \mathrm{AIC}, S) = \frac{1}{c_{\mathrm{AIC}}^{m}}\, p(\mathbf{y} \mid k, X, M, I, S). \tag{145}$$
Combining (144) and (145), and making use of the Stirling approximation (138), we find that the axes of the implied AIC prior tend to be shorter by a factor
$$c_{\mathrm{AIC}} \approx \left[e^{m} \left(\frac{2}{N}\right)^{m/2} \Gamma\!\left(\frac{m+2}{2}\right)\right]^{1/m} \approx \left(\frac{e\, m}{N}\right)^{1/2} \tag{146}$$
than the axes of Constantineau's prior. It follows that the implied AIC prior is approximately given as, (104) and (146),
$$p(\boldsymbol{\beta} \mid X, \sigma, \mathrm{AIC}, S) \approx \left(\frac{1}{e\, m}\right)^{m/2} \frac{\Gamma\!\left(\frac{m+2}{2}\right) \left|X^T X\right|^{1/2}}{\left(\pi\sigma^2\right)^{m/2}}. \tag{147}$$
Now, if we look at the coverage of the AIC prior (147), then we find that, (119),
$$1 - \frac{\Gamma\!\left(\frac{m}{2}, \frac{e\, m}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)} \approx 1, \tag{148}$$
even as $m \to 1$. Moreover, it would seem that the second argument of the incomplete gamma function in (148) is the threshold level below which, for a given first argument of $m/2$, the requirement (147) no longer holds for general $m$, as we have for $m \to \infty$ that, on the one hand,
$$1 - \frac{\Gamma\!\left(\frac{m}{2}, \frac{m}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)} \to 0.5 \tag{149}$$
and, on the other hand,
$$1 - \frac{\Gamma\!\left(\frac{m}{2}, \frac{e\, m}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)} \to 1. \tag{150}$$
Stated differently, it would seem that it is the implied AIC prior (147) that is optimally parsimonious, rather than Constantineau's prior (102), as this AIC prior may very well be the uniform proper prior which has the smallest possible parameter space for which requirement (105) will always hold.
To summarize: of the three "stipulation priors", (102), (140), and (147), the BIC prior is the most conservative in that it has an evidence that penalizes most severely for the number of parameters $m$, followed by Constantineau's prior, which, though parsimonious, is not the optimally parsimonious prior, as was initially thought in Part II of this research [2]. This honor may very well go to the AIC prior, should it turn out that the value of $e\,m/2$ in the second argument of (148) is indeed the exact threshold point above which (119) will always hold.
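The limiting behavior in (149) and (150) can be checked directly with SciPy's regularized lower incomplete gamma function, which equals the left-hand sides of both expressions; a minimal sketch (our own):

```python
import numpy as np
from scipy.special import gammainc   # regularized lower incomplete gamma P(a, x)

for m in [1, 10, 100, 1000, 10000]:
    lhs_149 = gammainc(m / 2, m / 2)            # left-hand side of (149)
    lhs_150 = gammainc(m / 2, np.e * m / 2)     # left-hand side of (150)
    print(f"m = {m:6d}:  (149) = {lhs_149:.4f},  (150) = {lhs_150:.6f}")
```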

15. The Corresponding Regression Model

If we combine the prior (18) of the unknown $\sigma$ and the respective priors of the regression coefficients $\boldsymbol{\beta}$, (78), (89), (95), (102), (140), and (147), with the likelihood model (9), and integrate with respect to the unknown $\sigma$, we obtain the posterior of the unknown $\beta$s, (21) through (28):
$$p(\boldsymbol{\beta} \mid \mathbf{y}, X, M, I) = \frac{\Gamma\!\left[(N+m)/2\right]}{\Gamma(N/2)\, \pi^{m/2}} \left|\frac{1}{N s^2} X^T X\right|^{1/2} \left[1 + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^T \frac{1}{N s^2} X^T X (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})\right]^{-(N+m)/2}, \tag{151}$$
where
$$s^2 = \frac{1}{N} \|\mathbf{y} - \hat{\mathbf{y}}\|^2. \tag{152}$$
Stated differently, as the normalizing constant $C$ of (10) in the priors (78), (89), (95), (102), (140), and (147) is not so much a constant as it is a function of $\sigma$:
$$C(\sigma) \propto \frac{1}{\sigma^{m}}, \tag{153}$$
we have that the degrees of freedom of the multivariate Student-t distribution (151) and, consequently, the sample error variance (152), are always N, irrespective of the number of predictors m, hence the “seemingly” interjection following (29).
The posterior (151) has a mean of $\hat{\boldsymbol{\beta}} = \left(X^T X\right)^{-1} X^T \mathbf{y}$, (14), a covariance matrix of $\left(X^T X / s^2\right)^{-1}$, (152), and a corresponding predictive probability distribution for $\hat{y}$, given an $m \times 1$ vector of predictor values $\mathbf{x}$, [6]:
$$p(\hat{y} \mid \mathbf{x}, \mathbf{y}, X, M, I) = \frac{\Gamma\!\left[(N+1)/2\right]}{\Gamma(N/2)} \left(\frac{h}{\pi N s^2}\right)^{1/2} \left[1 + \frac{h}{N s^2}\left(\hat{y} - \mathbf{x}^T \hat{\boldsymbol{\beta}}\right)^2\right]^{-(N+1)/2}, \tag{154}$$
where
$$h = 1 - \mathbf{x}^T \left(X^T X + \mathbf{x}\mathbf{x}^T\right)^{-1} \mathbf{x}, \tag{155}$$
which is in the univariate Student-t form and has expected value, (14), and standard deviation, (152),
$$E(\hat{y}) = \mathbf{x}^T \hat{\boldsymbol{\beta}} \qquad \text{and} \qquad \mathrm{std}(\hat{y}) = s\,\sqrt{\frac{N}{N-2}}\,\sqrt{1 + \mathbf{x}^T \left(X^T X\right)^{-1} \mathbf{x}}. \tag{156}$$
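A minimal sketch (our own helper names; x_new is a hypothetical new $m \times 1$ predictor vector) of the predictive mean and standard deviation (156):

```python
import numpy as np

def predictive_mean_std(y, X, x_new):
    """E(y_hat) and std(y_hat) of Eq. (156) under the posterior (151)."""
    N, m = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # Eq. (14)
    s2 = np.sum((y - X @ beta_hat) ** 2) / N             # Eq. (152)
    g = x_new @ np.linalg.solve(X.T @ X, x_new)          # x^T (X^T X)^{-1} x
    mean = x_new @ beta_hat
    std = np.sqrt(s2 * N / (N - 2) * (1 + g))
    return mean, std
```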

16. Discussion

This research into proper uniform priors was inspired by our research into spline models [5,14]. Spline models may have hundreds of regression coefficients. So, in using these models in an actual data-analysis, one is forced to think about the most suitable bounds of the proper non-informative priors of the unknown regression parameters. Not because this will give us better parameter estimates, but simply because taking a proper prior with overly large bounds will severely punish the larger regression models.
Grappling with the problem of defining a parsimonious proper prior for regression coefficients, it was quickly realized that the proposed priors should include the square root of the determinant of $X^T X$, so that this term could cancel out in the evidence derivations, since this term is not invariant for otherwise equivalent B- and C-spline regression analysis formulations (in which pairs of triangles in the B-spline analysis were forced to connect with continuity orders equal to the polynomial orders, in order to merge these paired triangles into squares). Moreover, it was found that dropping the square root of $\left|X^T X\right|$ in an ad hoc fashion from the regression analysis evidences proposed in [6,7] gives satisfactory results, in terms of (spline) regression model selections that commit neither gross under- nor gross over-fitting. So, the first impetus of this research was the desire to find a principled argument by which we would be allowed to drop the square root of $\left|X^T X\right|$ from the evidence, a term which was problematic in that it is non-invariant under certain transformations of the predictor variables and which seemed to be not that essential for a successful model selection.
Apart from the need to include the square root of $\left|X^T X\right|$ in the proper priors, or, equivalently, the need to drop this term from the evidences, it was also realized that regression coefficients are bounded by certain aspects of the predictor matrix $X$ and the dependent variable vector $\mathbf{y}$. This second realization led to the finding that the prior accessible space of regression coefficients is ellipsoid in form, which then provided us in the first part of this research [4] with the sought-for rationale for the inclusion of the square root of $\left|X^T X\right|$ in the proper priors.
Now, in the first part of this research it was implicitly assumed that the predictor matrix $X$ is part of the data, which forced us to make a prior estimate of the (scalar) value of the square root of $\left|X^T X\right|$. This estimated value would then be weighted by the actually observed value of the square root of $\left|X^T X\right|$. But as this prior estimation is a non-trivial task [4], we were forced to think on how to justify the use of the actually observed values of the square root of $\left|X^T X\right|$, rather than the prior estimates of these values. This then led us to the second part of this research [2], in which it was observed that $X$ may very well, in practice, be obtained during the data-gathering phase, but that $X$ formally is not part of the data, as it admits no likelihood function in ordinary regression analysis. Also, in the second part of this research a suite of proper uniform priors was presented for the regression coefficients $\boldsymbol{\beta}$ proper, rather than, as was realized in hindsight, the single proper uniform prior for the estimated regression coefficients $\hat{\boldsymbol{\beta}}$ given in [4].
It was found in the second part of this research that if the actual observed value of the square root of X T X is used in the construction of the proper prior for regression coefficients, then the user only needs to assign prior values to either the maximum absolute value, or the minimum and maximum, or the standard deviation of the dependent variable y, in order to construct his cruelly realistic priors. Alternatively, if the user is willing to accept empirical overtones in his prior, by way of the stipulation that the proper uniform prior is to be centered at the to be estimated regression coefficients β ^ , the need for prior value assignments to the characteristics of the dependent variable y may be circumvented, as we construct Constantineau’s scrupulously fair stipulation prior.
In the third part of this research it has now been checked analytically that the accessible parameter space of the priors proposed in [2] covers the true values of $\boldsymbol{\beta}$ with a probability that tends to one. It has also been found that the implied AIC prior is a viable stipulation prior, as its accessible parameter space covers the true values of $\boldsymbol{\beta}$ with a probability of one. Moreover, it may very well be that the AIC stipulation prior is optimally parsimonious, as it may represent the inverse of the smallest prior volume which covers the true value of $\boldsymbol{\beta}$ with a probability of one, when centered at $\hat{\boldsymbol{\beta}}$. It follows that Constantineau's stipulation prior takes the middle position in terms of conservativeness, as the implied BIC stipulation prior is more conservative in terms of the penalizing for the number of parameters $m$, whereas the implied AIC stipulation prior is more liberal.
Also, Appendix A below gives two Monte Carlo studies on the performance of the discussed priors, in terms of their implied evidences, in C-spline regression model selection problems. It is found in these studies that, depending on the accuracy of the prior assessments of the characteristics of the dependent variable $y$, the priors that were proposed in the second part of this research fill in the space between the BIC and AIC on a continuum of conservativeness, in terms of the number of parameters chosen.

Acknowledgments

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 723254. This paper reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Author Contributions

H.R. Noel van Erp and Ronald O. Linger derived the proper uniform priors discussed in this paper and designed the spline regression Monte Carlo experiments; Pieter H.A.J.M. van Gelder provided feedback and supervision. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Two Monte Carlo Studies

We now will use the proposed evidences (127)–(130), together with the implied BIC and AIC evidences, (135) and (143), respectively, and the “sure thing” evidence (41), for two Monte Carlo studies which involve two-dimensional C-spline regression models. But before we do so, we first will give a short introduction to spline models.
In ordinary polynomial regression we have that the more non-linear the target function $f(x, y)$ is, the higher the order of the polynomial basis $d$ needs to be, in order to adequately capture that non-linearity [15]:
$$f(x, y) = \sum_{i=0}^{d} \sum_{j=0}^{d} \hat{\beta}_{ij}\, x^{i} y^{j} + e, \tag{A1}$$
where $e \sim N\!\left(0, \sigma^2\right)$.
The polynomial model (A1) has $m = (d+1)^2$ free parameters. There is a limit, however, on the order $d$ that can be used in a polynomial regression analysis, as the solution will tend to degenerate from some polynomial order $d_{\mathrm{crit}}$ onward, as the inverse of $\tilde{B}_P^T \tilde{B}_P$, where $\tilde{B}_P$ is the $N \times m$ polynomial predictor matrix, becomes ever more ill-conditioned with increasing polynomial order $d$. This limit on the polynomial order $d$ translates directly to a limit on the number of parameters $m$ at our disposal for capturing the non-linearity in the target function.
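As an illustration of this degeneration (our own sketch, not the paper's code; the sample values are arbitrary), one can build the bivariate polynomial predictor matrix of (A1) and watch its condition number explode with growing $d$:

```python
import numpy as np

def polynomial_design_matrix(x, y, d):
    """Columns x^i * y^j for i, j = 0, ..., d; shape (N, (d + 1)**2)."""
    return np.column_stack([x**i * y**j for i in range(d + 1) for j in range(d + 1)])

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 5000), rng.uniform(0, 1, 5000)
for d in [2, 4, 6, 8, 10]:
    B = polynomial_design_matrix(x, y, d)
    print(f"d = {d:2d}:  m = {B.shape[1]:4d},  cond(B^T B) = {np.linalg.cond(B.T @ B):.2e}")
```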
One way to circumvent the problem of the bounded number of free parameters m is to use a spline model. In spline models one partitions the original domain in sub-domains and on these sub-domains piecewise polynomials of order d like, for example, (A1) are fitted under the constraint that they should connect with rth order continuity on their sub-domain boundaries. The power of spline models lies in the fact that even the most non-linear of functions f x , y will tend to become linear on its sub-domains as the size of the sub-domains tends to zero. In B-spline models the sub-domains are taken to be triangles/tetrahedra [14,16], whereas in C-spline models the sub-domains are taken to be squares/cubes [5]; see Appendix B for a discussion of C-splines.
Since in a spline regression analysis piecewise polynomials are fitted to each of the sub-domains of the given partition, we have that spline models, like neural networks [17], allow for highly flexible models with large $m$. This is why, whenever there is the potential for measurement errors in the data, Bayesian model selection is needed to protect against the problem of over-fitting.
In closing, note that the results of the following Monte Carlo studies are presented in terms of evidences, rather than in terms of the priors from which they were derived. This is because the choice for a particular proper uniform prior in regression analysis problems translates directly to a choice for a particular evidence that is to be used in the model selection phase, (5) or (6).

Appendix A.1. Monte Carlo Study 1

In the first Monte Carlo study we sample from the target function
$$f(x, y) = \sin\!\left[\pi\left(x^2 + 2y^2\right)\right], \qquad \text{for } 0 \leq x, y \leq 1, \tag{A2}$$
which is shown in Figure A1. The sampling in this first study is done with sample sizes N = 5000 and N = 10,000, and with Gaussian noise levels of σ n = 0 , 1 / 2 , 1 , and 2. The evidences must choose for each of these conditions amongst 42 models with 4 m 484 parameters.
Figure A1. Target function (A2).
In Figure A2 some representative examples of large size data sets are shown for the different noise levels σ n .
Figure A2. Noisy data sampled from target function (A2). Columns correspond with noise levels $\sigma_n = 0, 1/2, 1$, and 2, respectively. Rows correspond with noisy data, noisy data minus true target value, and cross sections of the target function and noisy data.
For $N = 5000$ it is found, Table A1, that the Ignorance, Manor, and BIC evidences are the most conservative of all the viable evidences in terms of the number of parameters $m$ of the respective spline models. The Neeley and Constantineau evidences are slightly less conservative, as they choose for $\sigma_n = 2$ a model that is one order less conservative in terms of the number of parameters $m$, relative to the Ignorance, Manor, and BIC evidences. The AIC evidence takes the high ground in that it is consistently less conservative in terms of the number of parameters $m$, relative to the Ignorance, Manor, Neeley, Constantineau, and BIC evidences. Finally, the "sure thing" evidence just chooses the largest model available, thus consistently (grossly) over-fitting the data. Also, it may be noted that in the absence of noise (i.e., $\sigma_n = 0$) all the evidences are in agreement in taking the model with the largest possible number of parameters, i.e., the model with a 7-by-7 partitioning, a polynomial order of $d = 3$, and a continuity order of $r = 0$.
Table A1. C-spline models (geometry g, polynomial order d, continuity order r) and number of parameters m that were chosen by the discussed evidences, for N = 5000 and under Gaussian noise levels σ n = 0 , 1 / 2 , 1 , and 2.
Evidences                                  σ_n = 0            σ_n = 1/2          σ_n = 1            σ_n = 2
                                           Model ¹     m      Model ²     m      Model ³     m      Model ⁴     m
“Sure thing” (41)                          (7, 3, 0)   484    (7, 3, 0)   484    (7, 3, 0)   484    (7, 3, 0)   484
AIC (143)                                  (7, 3, 0)   484    (5, 2, 1)   49     (2, 3, 1)   36     (2, 3, 1)   36
Neeley (127), Constantineau (130)          (7, 3, 0)   484    (2, 3, 1)   36     (2, 3, 2)   25     (3, 2, 1)   25
Ignorance (127), Manor (127), BIC (135)    (7, 3, 0)   484    (2, 3, 1)   36     (2, 3, 2)   25     (3, 1, 0)   16
1 Data estimates: max y = 1.00 , min y = 1.00 , max y = 1.00 , and φ = 0.67 ; 2 Data estimates: max y = 2.71 , min y = 2.71 , max y = 2.38 , and φ = 0.85 ; 3 Data estimates: max y = 4.32 , min y = 4.32 , max y = 4.13 , and φ = 1.21 ; 4 Data estimates: max y = 7.28 , min y = 6.72 , max y = 7.28 , and φ = 2.11 .
In Figure A3, Figure A4, Figure A5 and Figure A6, the fitted C-spline models are given per evidence (group), starting with the “sure thing” evidence and in descending order of liberalness in terms of the number of parameters m. In Figure A6 there is a possible instance of under-fitting for a noise level of σ n = 2 (i.e., fourth column) by the model which is picked by the Ignorance, Manor, and BIC evidences.
Figure A3. Sample size N = 5000 and C-spline models of target function (A2) are picked by the “sure thing” evidence (41) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A4. Sample size N = 5000 and C-spline models of target function (A2) are picked by the AIC evidence (143) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A5. Sample size N = 5000 and C-spline models of target function (A2) are picked by the Neeley and the Constantineau evidences, (129) and (130), for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A6. Sample size N = 5000 and C-spline models of target function (A2) are picked by the Ignorance, Manor, and BIC evidences, (135), (127), and (128), for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
In order to give the reader a more concrete sense of the discussed evidences, we give for the Gaussian noise level of σ_n = 1 the full output of the Bayesian model selection analysis in Table A2. It may be noted in this table that the highest “sure thing” evidence must necessarily correspond with the lowest sample error standard deviation s, or, equivalently, the smallest sample error variance s², since we have that this sample error variance, (152),
$$ s^{2} = \frac{1}{N}\left\| \mathbf{y} - \hat{\mathbf{y}} \right\|^{2} = \frac{1}{N}\sum_{i=1}^{N}\left( y_{i} - \hat{y}_{i}\right)^{2} $$
is an inverse root of the “sure thing” evidence (41). Likewise, let the sample variance be given as
$$ s_{0}^{2} = \frac{1}{N}\left\| \mathbf{y} - \mathbf{1}\,\bar{y} \right\|^{2} = \frac{1}{N}\sum_{i=1}^{N}\left( y_{i} - \bar{y}\right)^{2}, $$
where ȳ is the sample mean
$$ \bar{y} = \frac{1}{N}\,\mathbf{1}^{T}\mathbf{y} = \frac{1}{N}\sum_{i=1}^{N} y_{i}, $$
then we have that the highest “sure thing” evidence must necessarily correspond with the highest R-square value, since we have that,
$$ R^{2} = 1 - \frac{s^{2}}{s_{0}^{2}}. $$
Stated differently, model selection based on R-square values is equivalent to model selection based on “sure thing” evidences (41).
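This equivalence is easily verified numerically; the sketch below assumes, purely for illustration, that the log “sure thing” evidence equals −(N/2) log s² up to model-independent constants (an assumed surrogate, not the exact form of (41)), so that both quantities depend on the fitted model through s² only:

```python
import numpy as np

def sample_error_variance(y, y_hat):
    """s^2: the mean squared residual of a fitted model."""
    return np.mean((y - y_hat) ** 2)

def r_squared(y, y_hat):
    """R^2 = 1 - s^2 / s0^2, with s0^2 the sample variance of y."""
    s02 = np.mean((y - np.mean(y)) ** 2)
    return 1.0 - sample_error_variance(y, y_hat) / s02

def log_sure_thing_surrogate(y, y_hat):
    """Assumed surrogate: -(N/2) * log s^2, up to model-independent constants."""
    return -0.5 * len(y) * np.log(sample_error_variance(y, y_hat))

# s0^2 is the same for every model fitted to the same data, so ranking models by
# r_squared or by log_sure_thing_surrogate yields one and the same ordering.
```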
Table A2. Output model selection analysis for data sampled from target function (A2), sample size N = 5000 , and Gaussian error of σ e = 1 ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
1−21,3831−21,3831−21,3531−21,3411−21,37113−21,29030−21,265300.99300.3225232
2−21,3852−21,3852−21,3553−21,3442−21,37317−21,29231−21,267310.99310.3225321
3−21,4033−21,4033−21,3592−21,3433−21,3921−21,27525−21,239250.99250.3336231
4−21,4084−21,4074−21,3644−21,3484−21,3962−21,27927−21,243270.99270.3336421
5−21,4105−21,4095−21,3665−21,3505−21,3984−21,28128−21,245280.99280.3336332
6−21,4116−21,4117−21,3819−21,3696−21,39923−21,31832−21,293321.00320.3225220
7−21,4127−21,4116−21,3686−21,3517−21,4006−21,28329−21,247290.99290.3336510
8−21,4158−21,4158−21,38512−21,3738−21,40324−21,32233−21,297331.00330.3225410
9−21,4499−21,4489−21,3897−21,3669−21,4403−21,28021−21,231210.99210.3349230
10−21,45110−21,45010−21,3918−21,36910−21,4425−21,28222−21,233220.99220.3349432
11−21,45211−21,45211−21,39310−21,37011−21,4438−21,28423−21,235230.99230.3349521
12−21,45312−21,45212−21,39311−21,37012−21,4449−21,28424−21,235240.99240.3349320
13−21,45913−21,45813−21,39913−21,37713−21,45016−21,29026−21,241260.99260.3349610
14−21,49614−21,49514−21,41714−21,38814−21,4927−21,28317−21,219170.99170.3464621
15−21,49815−21,49615−21,41915−21,39015−21,49410−21,28518−21,221180.99180.3464331
16−21,50216−21,50116−21,42316−21,39416−21,49811−21,28919−21,225190.99190.3464532
17−21,50217−21,50117−21,42417−21,39417−21,49812−21,28920−21,225200.99200.3464710
18−21,50820−21,50823−21,48927−21,48220−21,49834−21,44636−21,430341.03360.2816130
19−21,50819−21,50822−21,48926−21,48219−21,49833−21,44635-21,430351.03350.2816131
20−21,50818−21,50821−21,48925−21,48218−21,49832−21,44634−21,430361.03340.2816132
21−21,50921−21,50824−21,48928−21,48221−21,49835−21,44637−21,430371.03370.2816221
22−21,52322−21,52328−21,50429−21,49722−21,51336−21,46138−21,445381.03380.2716310
23−21,55023−21,54918−21,45118−21,41423−21,55414−21,29013−21,209130.98130.3481721
24−21,55024−21,54919−21,45119−21,41424−21,55415−21,29014−21,209140.98140.3481420
25−21,55925−21,55820−21,46020−21,42325−21,56318−21,29916−21,218160.99160.3481632
26−21,61326−21,61125−21,49121−21,44526−21,62819−21,30211−21,202110.98110.34100330
27−21,62027−21,61826−21,49722−21,45127−21,63421−21,30812−21,208120.98120.34100431
28−21,62228−21,62127−21,50023−21,45428−21,63722−21,31115−21,211150.98150.34100732
29−21,66329−21,66233−21,65235−21,64829−21,65539−21,62639−21,617391.07390.229210
30−21,67430−21,67129−21,52524−21,46930−21,70220−21,3089−21,18790.9890.35121520
31−21,71432−21,71437−21,70439−21,70032−21,70741−21,67841−21,669401.08410.219120
32−21,71431−21,71436−21,70438−21,70031−21,70740−21,67840−21,669411.08400.219121
33−21,75633−21,75330−21,58030−21,51333−21,80226−21,33310−21,189100.98100.34144531
34−21,81634−21,81331−21,60931−21,53034−21,88225−21,3325−21,16350.9750.35169620
35−21,82935−21,82632−21,62232−21,54335−21,89527−21,3447−21,17570.9870.35169430
36−21,91936−21,91634−21,67934−21,58836−22,01029−21,3728−21,17680.9880.35196631
37−21,96337−21,95935−21,68633−21,58038−22,08028−21,3473−21,12230.9730.36225720
38−22,06638−22,06641−22,06142−22,06037−22,06242−22,04942−22,045421.16420.084110
39−22,08539−22,08138−21,77136−21,65139−22,23630−21,4024−21,14640.9740.36256530
40−22,10540−22,10139−21,79237−21,67240−22,25731−21,4236−21,16760.9860.35256731
41−22,37841−22,37140−21,93440−21,76441−22,65037−21,4732−21,11220.9620.36361630
42−22,71242−22,70442−22,11741−21,88742−23,14538−21,5671−21,08310.9610.37484730
For N = 10,000 the same pattern can be discerned as for N = 5000 (Table A3). The Ignorance, Manor, and BIC evidences are the most conservative of all the viable evidences in terms of the number of parameters m of the respective spline models. The Neeley and Constantineau evidences are slightly less conservative, as they choose for σ_n = 1/2 a model that is one order less conservative in terms of the number of parameters m, relative to the Ignorance, Manor, and BIC evidences. The AIC evidence takes the high ground in that it is consistently less conservative in terms of the number of parameters m, relative to the Ignorance, Manor, Neeley, Constantineau, and BIC evidences. Finally, the “sure thing” evidence just chooses the largest model available, thus consistently (grossly) over-fitting the data. It may again be noted that in the absence of noise (i.e., σ_n = 0) all the evidences agree in taking the model with the largest possible number of parameters.
Table A3. C-spline models (geometry g, polynomial order d, continuity order r) and number of parameters m that were chosen by the discussed evidences, for N = 10 , 000 and under Gaussian noise levels σ n = 0 , 1 / 2 , 1 , and 2.
Evidences                                  σ_n = 0            σ_n = 1/2          σ_n = 1            σ_n = 2
                                           Model ¹     m      Model ²     m      Model ³     m      Model ⁴     m
“Sure thing” (41)                          (7, 3, 0)   484    (7, 3, 0)   484    (7, 3, 0)   484    (7, 3, 0)   484
AIC (143)                                  (7, 3, 0)   484    (3, 3, 1)   64     (2, 3, 0)   49     (3, 3, 2)   36
Neeley (127), Constantineau (130)          (7, 3, 0)   484    (4, 3, 2)   49     (2, 3, 1)   36     (2, 3, 2)   25
Ignorance (127), Manor (127), BIC (135)    (7, 3, 0)   484    (3, 3, 2)   36     (2, 3, 1)   36     (2, 3, 2)   25
1 Data estimates: max y = 1.00 , min y = 1.00 , max y = 1.00 , and φ = 0.67 ; 2 Data estimates: max y = 2.89 , min y = 2.89 , max y = 2.65 , and φ = 0.84 ; 3 Data estimates: max y = 4.33 , min y = 4.39 , max y = 4.43 , and φ = 1.20 ; 4 Data estimates: max y = 10.14 , min y = 10.14 , max y = 7.76 , and φ = 2.11 .
In Figure A7, Figure A8, Figure A9 and Figure A10, the fitted C-spline models are given per evidence (group), starting with the “sure thing” evidence and in descending order of liberalness in terms of the number of parameters m. In Figure A8 there is a possible instance of over-fitting for a noise level of σ_n = 1 (i.e., column 3) by the model which is picked by the AIC evidence. Again, in order to give the reader a more concrete sense of the discussed evidences, we give for the Gaussian noise level of σ_n = 1 the full output of the Bayesian model selection analysis in Table A4.
Figure A7. Sample size N = 10,000 and C-spline models of target function (A2) are picked by the “sure thing” evidence (41) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A8. Sample size N = 10,000 and C-spline models of target function (A2) are picked by the AIC evidence (143) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A9. Sample size N = 10,000 and C-spline models of target function (A2) are picked by the Neeley and the Constantineau evidences, (129) and (130), for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A10. Sample size N = 10,000 and C-spline models of target function (A2) are picked by the Ignorance, Manor, and Bayesian Information Criterion (BIC) evidences, (135), (127), and (128), for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Table A4. Output model selection analysis for data sampled from target function (A2), sample size N = 10,000, and Gaussian error of σ e = 1 ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
1−46,1771−46,1761−46,1321−46,1151−46,1644−46,03526−45,999260.99260.3236231
2−46,1792−46,1794−46,1485−46,1372−46,16721−46,07630−46,051301.00300.3125232
3−46,1813−46,1815−46,1508−46,1383−46,16822−46,07831−46,053311.00310.3125321
4−46,1824−46,1822−46,1372−46,1214−46,16910−46,04027−46,004271.00270.3236421
5−46,1865−46,1863−46,1413−46,1255−46,17412−46,04428−46,008281.00280.3236332
6−46,2036−46,2036−46,15810−46,1426−46,19116−46,06129−46,025291.00290.3136510
7−46,2207−46,2197−46,1594−46,1377−46,2101−46,03321−45,984210.99210.3249230
8−46,2208−46,2208−46,1596−46,1378−46,2102−46,03422−45,985220.99220.3249432
9−46,2219−46,2219−46,1607−46,1389−46,2113−46,03423−45,985230.99230.3249521
10−46,22310−46,22310−46,1629−46,14010−46,2145−46,03724−45,988240.99240.3249320
11−46,22611−46,22511−46,16411−46,14211−46,2168−46,03925−45,990250.99250.3249610
12−46,23712−46,23716−46,20616−46,19512−46,22527−46,13532−46,110321.01320.3025220
13−46,23913−46,23917−46,20817−46,19713−46,22628−46,13633−46,111331.01330.3025410
14−46,27414−46,27412−46,19412−46,16514−46,2696−46,03817−45,974170.99170.3264532
15−46,27415−46,27413−46,19413−46,16515−46,2697−46,03818−45,974180.99180.3264621
16−46,27516−46,27514−46,19514−46,16616−46,2709−46,03919−45,975190.99190.3264331
17−46,27917−46,27815−46,19915−46,17017−46,27411−46,04320−45,979200.99200.3264710
18−46,33518−46,33518−46,23418−46,19718−46,33813−46,04613−45,965130.99130.3281721
19−46,33719−46,33719−46,23719−46,20019−46,34014−46,04815−45,967150.99150.3281420
20−46,34020−46,33920−46,23920−46,20220−46,34215−46,05016−45,969160.99160.3281632
21−46,40921−46,40921−46,28521−46,23921−46,42217−46,06211−45,962110.99110.32100330
22−46,41022−46,40922−46,28522−46,24022−46,42318−46,06212−45,962120.99120.32100431
23−46,41323−46,41223−46,28823−46,24323−46,42619−46,06614−45,966140.99140.32100732
24−46,47926−46,47930−46,45932−46,45226−46,46836−46,41036−46,394341.03360.2616130
25−46,47925−46,47929−46,45931−46,45225−46,46835−46,41035−46,394351.03350.2616131
26−46,47924−46,47928−46,45930−46,45224−46,46834−46,41034−46,394361.03340.2616132
27−46,47927−46,47924−46,32824−46,27429−46,50620−46,07010−45,949100.99100.32121520
28−46,48228−46,48231−46,46233−46,45627−46,47237−46,41437−46,398371.04370.2616221
29−46,50029−46,49932−46,48034−46,47328−46,48938−46,43138−46,415381.04380.2616310
30−46,56630−46,56525−46,38625−46,32130−46,61023−46,0909−45,94690.9990.32144531
31−46,63631−46,63526−46,42526−46,34831−46,70024−46,0915−45,92250.9950.33169620
32−46,64732−46,64727−46,43727−46,36032−46,71225−46,1037−45,93470.9970.33169430
33−46,75633−46,75533−46,51228−46,42336−46,84529−46,1398−45,94380.9980.32196631
34−46,75834−46,75837−46,74738−46,74433−46,75139−46,71839−46,709391.07390.219210
35−46,81735−46,81734−46,53729−46,43437−46,93426−46,1233−45,89830.9830.33225720
36−46,84637−46,84640−46,83541−46,83135−46,83941−46,80641−46,797401.08410.209120
37−46,84636−46,84639−46,83540−46,83134−46,83940−46,80640−46,797411.08400.209121
38−46,93938−46,93835−46,62035−46,50338−47,08830−46,1654−45,90940.9940.33256530
39−46,95539−46,95436−46,63636−46,52039−47,10531−46,1826−45,92660.9960.33256731
40−47,26140−47,25938−46,81037−46,64441−47,53032−46,2292−45,86820.9820.33361630
41−47,52141−47,52142−47,51642−47,51540−47,51742−47,50242−47,498421.16420.084110
42−47,64842−47,64641−47,04439−46,82142−48,07933−46,3341−45,85010.9810.34484730
In closing, note that for both N = 5000 and N = 10,000 the cruelly realistic evidences (127)–(129) have been helped by estimating max y, min y, max y, and φ directly from the observed dependent variable y. So, if we penalize the computed evidences with a multiplication factor of $(2/3)^{m}$, in order to compensate (see Section 5 of [3]) for the non-conservativeness of these data estimates, then it is found for N = 10,000 and σ_n = 1/2 that the Neeley evidence becomes one order more conservative, as it picks the same model as the Ignorance, Manor, and BIC evidences, while at the same time we have that for N = 10,000 and σ_n = 1 the Ignorance and Manor evidences become one order more conservative, thus leaving the BIC evidence behind, as they choose the C-spline model (2, 3, 2), which has m = 25 parameters.

Appendix A.2. Monte Carlo Study 2

In the second Monte Carlo study we sample from the target function
$$ f\left(x, y\right) = \sin\left[\pi\left(x^{2} + 2 y^{2}\right)\right], \quad \text{for } 0 \le x, y \le 1.5, $$
which is shown in Figure A11. The sampling in this second study is done with sample size N = 15,000, with Gaussian noise levels of σ_n = 0, 1/2, 1, and 2, and with multiplication factors of 1 and 10, respectively, for the data estimates of max y, min y, max y, and φ. The evidences must now choose amongst 78 models with 4 ≤ m ≤ 1600 parameters.
Figure A11. Target function (A7).
In Figure A12 some representative examples of large size data sets are shown for the different noise levels σ n .
Figure A12. Noisy data sampled from target function (A7). Columns correspond with noise levels σ_n = 0, 1/2, 1, and 2, respectively. Rows correspond with noisy data, noisy data minus true target value, and cross sections of the target function and noisy data.
For a multiplication factor of 1, or, equivalently, a straightforward data estimate of the characteristics of the dependent variable y, it is found (Table A5) that the Ignorance, Manor, Neeley, and BIC evidences become conservative in the absence of measurement error (i.e., σ_n = 0), as they choose a model with m = 625 parameters, rather than the model with the maximum number of parameters m = 1600, which is preferred by the Constantineau, AIC, and “sure thing” evidences. Stated differently, the penalization of an increase of Δm = 975 parameters by the Occam factors of the Ignorance, Manor, Neeley, and BIC evidences outweighs the gains in goodness of fit of said Δm = 975 parameters.
Table A5. C-spline models (geometry g, polynomial order d, continuity order r) and number of parameters m that were chosen by the discussed evidences, for N = 15 , 000 under Gaussian noise levels σ n = 0 , 1 / 2 , 1 , and 2, and a multiplication factor of 1 for the estimates of the characteristics of the dependent variable y.
Evidences                                  σ_n = 0             σ_n = 1/2           σ_n = 1             σ_n = 2
                                           Model ¹     m       Model ²     m       Model ³     m       Model ⁴     m
“Sure thing” (41)                          (13, 3, 0)  1600    (13, 3, 0)  1600    (13, 3, 0)  1600    (13, 3, 0)  1600
AIC (143)                                  (13, 3, 0)  1600    (6, 3, 1)   196     (5, 3, 1)   144     (9, 2, 1)   121
Constantineau (130)                        (13, 3, 0)  1600    (5, 3, 1)   144     (8, 3, 2)   121     (4, 3, 1)   100
Neeley (127)                               (8, 3, 0)   625     (5, 3, 1)   144     (8, 3, 2)   121     (4, 3, 1)   100
Ignorance (127), Manor (127), BIC (135)    (8, 3, 0)   625     (5, 3, 1)   144     (4, 3, 1)   100     (3, 3, 1)   64
1 Data estimates times 1: max y = 1.00 , min y = 1.00 , max y = 1.00 , and φ = 0.70 ; 2 Data estimates times 1: max y = 2.75 , min y = 2.72 , max y = 2.75 , and φ = 0.86 ; 3 Data estimates times 1: max y = 5.01 , min y = 5.01 , max y = 4.73 , and φ = 1.22 ; 4 Data estimates times 1: max y = 8.15 , min y = 7.76 , max y = 8.15 , and φ = 2.12 .
Apart from this deviation, we have that the pattern of model choices is roughly the same as observed in the first Monte Carlo study. The Ignorance, Manor, and BIC evidences are the most conservative of all the viable evidences in terms of the number of parameters m of the respective spline models. The Neeley and Constantineau evidences are slightly less conservative, as they choose both for σ_n = 1 and σ_n = 2 a model that is one order less conservative in terms of the number of parameters m, relative to the Ignorance, Manor, and BIC evidences. The AIC evidence takes the high ground in that it is consistently less conservative in terms of the number of parameters m, relative to the Ignorance, Manor, Neeley, Constantineau, and BIC evidences. Finally, the “sure thing” evidence just chooses the largest model available, thus consistently (grossly) over-fitting the data.
In Figure A13, Figure A14, Figure A15, Figure A16 and Figure A17, the fitted C-spline models are given per evidence (group), starting with the “sure thing” evidence and in descending order of liberalness in terms of the number of parameters m. In Figure A17 there is a possible instance of under-fitting for a noise level σ n = 2 (i.e., column 4) by the model which is picked by the Ignorance, Manor, and BIC evidences.
Figure A13. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the “sure thing” evidence (41) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A14. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the AIC evidence (143) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A15. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the Constantineau evidence (130) for different noise levels. Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A16. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the Neeley evidence (129) for different noise levels and for a straightforward data estimate of φ . Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A17. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the Ignorance, Manor, and BIC evidences, (135), (127), and (128), for different noise levels and for straightforward data estimates of max y , min y , and max y . Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
It is found (Table A6) that for a multiplication factor of 10 for the data estimates of max y, min y, max y, and φ, and a Gaussian noise level of σ_n = 1/2, the Ignorance and Manor evidences become more conservative than the BIC. Also, for Gaussian noise levels of σ_n = 1 and σ_n = 2 the Neeley evidence becomes just as conservative as the BIC.
Table A6. C-spline models (geometry g, polynomial order d, continuity order r) and number of parameters m that were chosen by the discussed evidences, for N = 15 , 000 , under Gaussian noise levels σ n = 0 , 1 / 2 , 1 , and 2, and a multiplication factor of 10 for the estimates of the characteristics of the dependent variable y.
Evidences                                  σ_n = 0             σ_n = 1/2           σ_n = 1             σ_n = 2
                                           Model ¹     m       Model ²     m       Model ³     m       Model ⁴     m
“Sure thing” (41)                          (13, 3, 0)  1600    (13, 3, 0)  1600    (13, 3, 0)  1600    (13, 3, 0)  1600
AIC (143)                                  (13, 3, 0)  1600    (6, 3, 1)   196     (5, 3, 1)   144     (9, 2, 1)   121
Constantineau (130)                        (13, 3, 0)  1600    (5, 3, 1)   144     (8, 3, 2)   121     (4, 3, 1)   100
Neeley (127)                               (8, 3, 0)   625     (5, 3, 1)   144     (4, 3, 1)   100     (3, 3, 1)   64
BIC (135)                                  (8, 3, 0)   625     (5, 3, 1)   144     (4, 3, 1)   100     (3, 3, 1)   64
Ignorance (127), Manor (127)               (8, 3, 0)   625     (8, 3, 2)   121     (4, 3, 1)   100     (3, 3, 1)   64
1 Data estimates times 10: max y = 10.0 , min y = 10.0 , max y = 10.0 , and φ = 7.0 ; 2 Data estimates times 10: max y = 27.5 , min y = 27.2 , max y = 27.5 , and φ = 8.6 ; 3 Data estimates times 10: max y = 50.1 , min y = 50.1 , max y = 47.3 , and φ = 12.2 ; 4 Data estimates times 10: max y = 81.5 , min y = 77.6 , max y = 81.5 , and φ = 21.2 .
In Figure A18 and Figure A19, the fitted C-spline models are given for the Neeley evidence and the Ignorance and Manor evidences, respectively. In Figure A19 there is a possible instance of slight under-fitting for a noise level σ n = 1 / 2 (i.e., column 2) by the model which is picked by the Ignorance and Manor evidences.
Figure A18. Sample size N = 15 , 000 and C-spline models of target function (A7) are picked by the Neeley evidence (129), for different noise levels and for a multiplication by a factor 10 of the data estimate of φ . Columns correspond with noise levels σ n = 0 , 1 / 2 , 1 , and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
Figure A19. Sample size N = 15,000 and C-spline models of target function (A7) are picked by the Ignorance and Manor evidences, (127) and (128), for different noise levels and for a multiplication by a factor 10 of max y, max y, and min y. Columns correspond with noise levels σ_n = 0, 1/2, 1, and 2, respectively. Rows correspond with spline model, residual of spline model relative to target function, and cross sections of spline model (blue) and target function (black).
The full outputs of the Bayesian model selection analyses of Table A5 and Table A6 for the Gaussian noise level of σ = 1 are given in Table A7, Table A8, Table A9 and Table A10, respectively.
Table A7. First half output of the model selection analysis for data sampled from target function (A7), sample size N = 15,000, Gaussian error of σ e = 1 , and straightforward data estimates of max y , min y , max y , and φ ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
1−72,6451−72,6423−72,5115−72,4651−72,64923−72,26844−72,168441.00440.32100431
2−72,6562−72,6531−72,4931−72,4383−72,6717−72,21039−72,089391.00390.33121832
3−72,6603−72,6572−72,4982−72,4434−72,6759−72,21541−72,094411.00410.33121921
4−72,6654−72,6637−72,5329−72,4862−72,67027−72,28945−72,189451.00450.32100732
5−72,6925−72,6896−72,5306−72,4756−72,70820−72,24742−72,126421.00420.32121520
6−72,6926−72,69010−72,55813−72,5135−72,69630−72,31646−72,216461.01460.32100821
7−72,7067−72,7034−72,5133−72,44710−72,7361−72,18833−72,044331.00330.33144531
8−72,7118−72,7085−72,5184−72,45212−72,7413−72,19334−72,049341.00340.331441021
9−72,7189−72,7169−72,55612−72,5029−72,73424−72,27343−72,152431.00430.321211010
10−72,72210−72,72019−72,61325−72,5777−72,71737−72,40949−72,328491.01490.3181420
11−72,72911−72,72821−72,62126−72,5858−72,72538−72,41750−72,336501.01500.3181810
12−72,73412−72,73115−72,60020−72,55511−72,73832−72,35747−72,257471.01470.31100910
13−72,73613−72,7338−72,5437−72,47815−72,76711−72,21838−72,074381.00380.33144932
14−72,73814−72,73517−72,60421−72,55913−72,74233−72,36248−72,262481.01480.31100330
15−72,75215−72,74911−72,55910−72,49416−72,78314−72,23440−72,090401.00400.331441110
16−72,76816−72,76725−72,66028−72,62414−72,76540−72,45651−72,375511.02510.3081721
17−72,78717−72,78312−72,5618−72,48317−72,8352−72,19227−72,023270.99270.331691121
18−72,79818−72,79413−72,57211−72,49420−72,8475−72,20331−72,034310.99310.33169620
19−72,82119−72,81714−72,59414−72,51722−72,86913−72,22635−72,057351.00350.331691032
20−72,83020−72,82616−72,60316−72,52623−72,87815−72,23436−72,065361.00360.331691210
21−72,83221−72,82818−72,60517−72,52824−72,88017−72,23737−72,068371.00370.33169430
22−72,84722−72,84529−72,73933−72,70318−72,84344−72,53552−72,454521.02520.2981632
23−72,85323−72,85232−72,76834−72,74019−72,84446−72,60053−72,536531.03530.2964331
24−72,85924−72,85833−72,77435−72,74621−72,84947−72,60654−72,542541.03540.2964710
25−72,87625−72,87220−72,61315−72,52326−72,9464−72,20025−72,004250.99250.341961221
26−72,89026−72,88622−72,62818−72,53827−72,9608−72,21426−72,018260.99260.33196631
27−72,90227−72,89723−72,63919−72,54928−72,97212−72,22529−72,029290.99290.331961310
28−72,91329−72,91236−72,82839−72,80025−72,90449−72,66055−72,596551.03550.2864621
29−72,91328−72,90924−72,65022−72,56129−72,98318−72,23732−72,041320.99320.331961132
30−72,96930−72,96326−72,66723−72,56330−73,0636−72,20720−71,982200.99200.34225720
31−72,97731−72,97127−72,67524−72,57131−73,07210−72,21523−71,990230.99230.342251321
32−73,01932−73,01428−72,71727−72,61433−73,11422−72,25730−72,032300.99300.332251232
33−73,09133−73,08530−72,74729−72,63034−73,21519−72,24021−71,984210.99210.34256530
34−73,09934−73,09840−73,01442−72,98732−73,09053−72,84756−72,783561.05560.2664532
35−73,10635−73,10131−72,76330−72,64635−73,23121−72,25624−72,000240.99240.34256731
36−73,13336−73,12734−72,79032−72,67238−73,25825−72,28328−72,027280.99280.332561332
37−73,17937−73,17335−72,79131−72,65840−73,33616−72,23514−71,946140.99140.34289820
38−73,25038−73,24943−73,18547−73,16536−73,23855−73,05157−73,002571.06570.2449610
39−73,26339−73,26244−73,19848−73,17837−73,25156−73,06458−73,015581.06580.2449521
Table A8. Second half output of the model selection analysis for data sampled from target function (A7), sample size N = 15,000, Gaussian error of σ e = 1 , and straightforward data estimates of max y , min y , max y , and φ ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
40−73,27440−73,27345−73,20950−73,18939−73,26257−73,07659−73,027591.06590.2449320
41−73,35141−73,34437−72,91736−72,76843−73,54629−72,31222−71,988220.99220.34324831
42−73,38942−73,38749−73,32453−73,30341−73,37759−73,19060−73,141601.07600.2349230
43−73,39943−73,39850−73,33554−73,31442−73,38860−73,20161−73,152611.07610.2349432
44−73,42544−73,41638−72,94037−72,77344−73,65926−72,28412−71,923120.99120.34361920
45−73,44945−73,44039−72,96438−72,79845−73,68428−72,30915−71,948150.99150.34361630
46−73,62146−73,61241−73,08541−72,90047−73,90234−72,37919−71,979190.99190.34400931
47−73,68047−73,67042−73,08840−72,88352−74,00931−72,3309−71,88990.9890.354411020
48−73,78748−73,78656−73,73959−73,72546−73,77462−73,63762−73,601621.10620.1836510
49−73,86649−73,85546−73,21643−72,99256−74,25036−72,40711−71,923110.99110.34484730
50−73,91250−73,90148−73,26345−73,04057−74,29839−72,45418−71,970180.99180.344841031
51−73,95952−73,95859−73,91161−73,89748−73,94663−73,80963−73,773631.12630.1636231
52−73,95953−73,95860−73,92662−73,91649−73,94664−73,85165−73,826651.12650.1525220
53−73,96751−73,95547−73,25744−73,01158−74,40935−72,3947−71,86570.9870.355291120
54−73,98054−73,98061−73,94763−73,93850−73,96866−73,87266−73,847661.12660.1525410
55−74,00155−74,00062−73,95464−73,94051−73,98965−73,85264−73,816641.12640.1536421
56−74,10556−74,10563−74,05865−74,04453−74,09367−73,95667−73,920671.13670.1436332
57−74,19357−74,19265−74,16067−74,15154−74,18068−74,08568−74,060681.14680.1325321
58−74,22158−74,20852−73,44849−73,18364−74,72945−72,53617−71,960170.99170.345761131
59−74,24759−74,24766−74,21568−74,20555−74,23569−74,14069−74,115691.14690.1225232
60−74,27960−74,26551−73,44046−73,14968−74,85241−72,4726−718,4760.9860.356251220
61−74,32061−74,30653−73,48151−73,19169−74,89442−72,51410−71,889100.98100.35625830
62−74,43062−74,43067−74,41070−74,40459−74,42070−74,35970−74,343701.16700.0916221
63−74,46265−74,46271−74,44173−74,43562−74,45273−74,39173−74,375711.16730.0916130
64−74,46264−74,46270−74,44172−74,43561−74,45272−74,39172−74,375721.16720.0916131
65−74,46263−74,46269−74,44171−74,43560−74,45271−74,39171−74,375731.16710.0916132
66−74,47666−74,47572−74,45574−74,44963−74,46574−74,40474−74,388741.16740.0916310
67−74,55167−74,53555−73,64455−73,33271−75,20048−72,62616−71,950160.99160.346761231
68−74,57868−74,56154−73,59852−73,25772−75,29943−72,5233−71,79430.9830.357291320
69−74,78169−74,78174−74,77075−74,76665−74,77375−74,73975−74,730751.19750.049210
70−74,84572−74,84576−74,83477−74,83067−74,83777−74,80377−74,794761.20770.049120
71−74,84571−74,84575−74,83476−74,83066−74,83776−74,80376−74,794771.20760.049121
72−74,84970−74,83157−73,79656−73,43373−75,65750−72,6728−71,88880.9880.35784930
73−74,90373−74,88558−73,85157−73,48974−75,71451−72,72813−71,944130.99130.347841331
74−74,93274−74,93277−74,92778−74,92670−74,92878−74,91378−74,909781.20780.024110
75−75,37775−75,35564−74,08658−73,63875−76,46252−72,8025−71,84150.9850.359611030
76−75,96576−75,93868−74,41260−73,87276−77,37554−72,9734−71,81740.9840.3511561130
77−76,59177−76,56073−74,75166−74,11177−78,37458−73,1612−71,79220.9820.3513691230
78−77,23878−77,20178−75,08669−74,33578−79,44161−73,3491−71,74910.9810.3616001330
Table A9. First half output of the model selection analysis for data sampled from target function (A7), sample size N = 15,000, Gaussian error of σ e = 1 , and times 10 data estimates of max y , min y , max y , and φ ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
1−72,8591−72,8561−72,7155−72,4651−72,64923−72,26844−72,168441.00440.32100431
2−72,8792−72,8772−72,7369−72,4862−72,67027−72,28945−72,189451.00450.32100732
3−72,8953−72,8937−72,77925−72,5777−72,71737−72,40949−72,328491.01490.3181420
4−72,9034−72,9008−72,78726−72,5858−72,72538−72,41750−72,336501.01500.3181810
5−72,9065−72,9035−72,76313−72,5135−72,69630−72,31646−72,216461.01460.32100821
6−72,9156−72,9113−72,7411−72,4383−72,6717−72,21039−72,089391.00390.33121832
7−72,9197−72,9164−72,7462−72,4434−72,6759−72,21541−72,094411.00410.33121921
8−72,9428−72,93914−72,82628−72,62414−72,76540−72,45651−-72,375511.02510.3081721
9−72,9489−72,94510−72,80420−72,55511−72,73832−72,35747−72,257471.01470.31100910
10−72,95110−72,9486−72,7786−72,4756−72,70820−72,24742−72,126421.00420.32121520
11−72,95211−72,94911−72,80921−72,55913−72,74233−72,36248−72,262481.01480.31100330
12−72,97712−72,9749−72,80412−72,5029−72,73424−72,27343−72,152431.00430.321211010
13−72,99013−72,98817−72,89834−72,74019−72,84446−72,60053−72,536531.03530.2964331
14−72,99614−72,99419−72,90435−72,74621−72,84947−72,60654−72,542541.03540.2964710
15−73,01515−73,01112−72,8093−72,44710−72,7361−72,18833−72,044331.00330.33144531
16−73,01916−73,01513−72,8134−72,45212−72,7413−72,19334−72,049341.00340.331441021
17−73,02017−73,01818−72,90433−72,70318−72,84344−72,53552−72,454521.02520.2981632
18−73,04518−73,04115−72,8397−72,47815−72,76711−72,21838−72,074381.00380.33144932
19−73,05019−73,04825−72,95839−72,80025−72,90449−72,66055−72,596551.03550.2864621
20−73,06020−73,05716−72,85410−72,49416−72,78314−72,23440−72,090401.00400.331441110
21−73,14921−73,14520−72,9078−72,48317−72,8352−72,19227−72,023270.99270.331691121
22−73,16022−73,15621−72,91811−72,49420−72,8475−72,20331−72,034310.99310.33169620
23−73,18323−73,17822−72,94114−72,51722−72,86913−72,22635−72,057351.00350.331691032
24−73,19124−73,18723−72,94916−72,52623−72,87815−72,23436−72,065361.00360.331691210
25−73,19325−73,18924−72,95117−72,52824−72,88017−72,23737−72,068371.00370.33169430
26−73,23626−73,23432−73,14442−72,98732−73,09053−72,84756−72,783561.05560.2664532
27−73,29627−73,29126−73,01615−72,52326−72,9464−72,20025−72,004250.99250.341961221
28−73,31028−73,30527−73,03018−72,53827−72,9608−72,21426−72,018260.99260.33196631
29−73,32229−73,31628−73,04119−72,54928−72,97212−72,22529−72,029290.99290.331961310
30−73,33330−73,32829−73,05222−72,56129−72,98318−72,23732−72,041320.99320.331961132
31−73,35431−73,35335−73,28447−73,16536−73,23855−73,05157−73,002571.06570.2449610
32−73,36732−73,36637−73,29748−73,17837−73,25156−73,06458−73,015581.06580.2449521
33−73,37933−73,37738−73,30850−73,18939−73,26257−73,07659−73,027591.06590.2449320
34−73,45034−73,44430−73,12823−72,56330−73,0636−72,20720−71,982200.99200.34225720
35−73,45935−73,45231−73,13724−72,57131−73,07210−72,21523−71,990230.99230.342251321
36−73,49336−73,49241−73,42353−73,30341−73,37759−73,19060−73,141601.07600.2349230
37−73,50137−73,49533−73,17927−72,61433−73,11422−72,25730−72,032300.99300.332251232
38−73,50438−73,50242−73,43354−73,31442−73,38860−73,20161−73,152611.07610.2349432
39−73,63939−73,63234−73,27329−72,63034−73,21519−72,24021−71,984210.99210.34256530
Table A10. Second half output of the model selection analysis for data sampled from target function (A7), sample size N = 15,000, Gaussian error of σ e = 1 , and times 10 data estimates of max y , min y , max y , and φ ; given are (internally) ranked logarithms of the discussed evidences, ranked sample error standard deviations (from low to high) and R-square values, number of parameters m, and spline model specifications (geometry, polynomial-order, and continuity-order).
IgnoranceManorNeeleyConstantineauBICAIC“Sure Thing”Error StdR-SquaremModel Specs
40−73,65540−73,64836−73,28830−72,64635−73,23121−72,25624−72,000240.99240.34256731
41−73,68141−73,67439−73,31532−72,67238−73,25825−72,28328−72,027280.99280.332561332
42−73,79842−73,79040-73,38531−72,65840−73,33616−72,23514−71,946140.99140.34289820
43−73,86343−73,86246−73,81159−73,72546−73,77462−73,63762−73,601621.10620.1836510
44−74,01244−74,01148−73,97662−73,91649−73,94664−73,85164−73,826641.12640.1525220
45−74,03345−74,03351−73,99763−73,93850−73,96865−73,87265−73,847651.12650.1525410
46−74,03546−74,03449−73,98361−73,89748−73,94663−73,80963−73,773631.12630.1636231
47−74,04547−74,03643−73,58236−72,76843−73,54629−72,31222−71,988220.99220.34324831
48−74,15448−74,15352−74,10364−74,01752−74,06666−73,92966−73,893661.13660.1536421
49−74,19849−74,18844−73,68137−72,77344−73,65926−72,28412−71,923120.99120.34361920
50−74,22250−74,21245−73,70538−72,79845−73,68428−72,30915−71,948150.99150.34361630
51−74,24651−74,24554−74,21067−74,15154−74,18068−74,08568−74,060681.14680.1325321
52−74,25752−74,25653−74,20666−74,12053−74,16967−74,03267−73,996671.13670.1336332
53−74,30053−74,29957−74,26468−74,20555−74,23569−74,14069−74,115691.14690.1225232
54−74,46454−74,46459−74,44170−74,40459−74,42070−74,35970−74,343701.16700.0916221
55−74,47855−74,46747−73,90541−72,90047−73,90234−72,37919−71,979190.99190.34400931
56−74,49658−74,49662−74,47373−74,43562−74,45273−74,39173−74,375711.16730.0916130
57−74,49657−74,49661−74,47372−74,43561−74,45272−74,39172−74,375721.16720.0916131
58−74,49656−74,49660−74,47371−74,43560−74,45271−74,39171−74,375731.16710.0916132
59−74,50959−74,50963−74,48774−74,44963−74,46574−74,40474−74,388741.16740.0916310
60−74,62560−74,61350−73,99440−72,88351−74,00931−72,3309−71,88990.9890.354411020
61−74,80061−74,80067−74,78775−74,76665−74,77375−74,73975−74,730751.19750.049210
62−74,86463−74,86469−74,85177−74,83067−74,83777−74,80377−74,794761.20770.049120
63−74,86462−74,86468−74,85176−74,83066−74,83776−74,80376−74,794771.20760.049121
64−74,90364−74,89055−74,21043−72,99256−74,25036−72,40711−71,923110.99110.34484730
65−74,94166−74,94170−74,93578−74,92670−74,92878−74,91378−74,909781.20780.024110
66−74,94965−74,93556−74,25645−73,04057−74,29839−72,45418−71,970180.99180.344841031
67−75,10167−75,08658−74,34444−73,01158−74,40935−72,3947−71,86570.9870.355291120
68−75,45568−75,43964−74,63049−73,18364−74,72945−72,53617−71,960170.99170.345761131
69−75,61969−75,60265−74,72446−73,14968−74,85241−72,4726−71,84760.9860.356251220
70−75,65970−75,64266−74,76551−73,19169−74,89442−72,51410−71,889100.98100.35625830
71−75,99971−75,98071−75,03155−73,33271−75,20048−72,62616−71,950160.99160.346761231
72−76,14172−76,12172−75,09752−73,25772−75,29943−72,5233−71,79430.9830.357291320
73−76,52973−76,50873−75,40756−73,43373−75,65750−72,6728−71,88880.9880.35784930
74−76,58374−76,56174−75,46157−73,48974−75,71451−72,72813−71,944130.99130.347841331
75−77,43675−77,41075−76,06158−73,63875−76,46252−72,8025−71,84150.9850.359611030
76−78,44376−78,41276−76,78960−73,87276−77,37554−72,9734−71,81740.9840.3511561130
77−79,52677−79,48977−77,56765−74,11177−78,37458−73,1612−71,79220.9820.3513691230
78−80,66878−80,62578−78,37969−74,33578−79,44161−73,3491−71,74910.9810.3616001330

Appendix B. Introducing C-Splines

Appendix B.1. A Simple Trivariate C-Spline Model

If we have predictors from a three-dimensional domain (x, y, z), where 0 ≤ x, y, z ≤ 1, and a corresponding dependent variable v, then the simplest non-trivial spline model is the model which partitions the cube of the three-dimensional domain into 2 × 2 × 2 = 8 sub-cubes, has polynomial order 1 with no interactions, that is,
$$ f\left(x, y, z\right) = 1 + x + y + z, $$
and continuity of order 0 (i.e., the piecewise polynomials themselves need to connect, but not their derivatives). It is found that this particular spline model corresponds with the C-spline basis B_C [5]:
$$
B_{C}^{(u)}\left(x, y, z\right) =
\begin{cases}
\left(1,\; x - 0.5,\; x - 0.5,\; y - 0.5,\; y - 0.5,\; z - 0.5,\; z - 0.5\right), & u = 1,\\
\left(1,\; 0,\; x - 0.5,\; y - 0.5,\; y - 0.5,\; z - 0.5,\; z - 0.5\right), & u = 2,\\
\left(1,\; x - 0.5,\; x - 0.5,\; 0,\; y - 0.5,\; z - 0.5,\; z - 0.5\right), & u = 3,\\
\left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; z - 0.5,\; z - 0.5\right), & u = 4,\\
\left(1,\; x - 0.5,\; x - 0.5,\; y - 0.5,\; y - 0.5,\; 0,\; z - 0.5\right), & u = 5,\\
\left(1,\; 0,\; x - 0.5,\; y - 0.5,\; y - 0.5,\; 0,\; z - 0.5\right), & u = 6,\\
\left(1,\; x - 0.5,\; x - 0.5,\; 0,\; y - 0.5,\; 0,\; z - 0.5\right), & u = 7,\\
\left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; 0,\; z - 0.5\right), & u = 8,
\end{cases}
$$
where each of the rows u corresponds with a particular sub-domain of 0 ≤ x, y, z ≤ 1.
Let i , j , and k be the x-, y-, and z-axis sub-domain coordinates, respectively. Then we have that the row number u of B C is the following function of the sub-domain coordinates
$$ u\left(i, j, k\right) = i + \left(j - 1\right) 2 + \left(k - 1\right) 4, $$
where the coordinates (i, j, k) for a given sub-domain can be found as
$$ i\left(x\right) = \begin{cases} 1, & x \le 0.5,\\ 2, & x > 0.5, \end{cases} \qquad j\left(y\right) = \begin{cases} 1, & y \le 0.5,\\ 2, & y > 0.5, \end{cases} \qquad k\left(z\right) = \begin{cases} 1, & z \le 0.5,\\ 2, & z > 0.5. \end{cases} $$
Now, if we have a data set with sample size N, then we may go iteratively through this data set, as we determine for each entry in the predictor matrix (x_n, y_n, z_n) the corresponding coordinates (i_n, j_n, k_n), by way of (A11):
$$ \left(i_{n}, j_{n}, k_{n}\right) = \left(i\left(x_{n}\right), j\left(y_{n}\right), k\left(z_{n}\right)\right). $$
These coordinates then map to the row number u_n, by way of (A10):
$$ u_{n} = u\left(i_{n}, j_{n}, k_{n}\right). $$
We then substitute the values (x_n, y_n, z_n) into the vector B_C^{(u_n)}(x, y, z), which gives us B_C^{(u_n)}(x_n, y_n, z_n). Finally, we set the nth row of the spline predictor matrix B̃_C to
$$ \left(\tilde{B}_{C}\right)_{n} = B_{C}^{\left(u_{n}\right)}\left(x_{n}, y_{n}, z_{n}\right). $$
As we follow this procedure for n = 1, …, N, we end up with an N × 7 spline predictor matrix B̃_C. If we then regress the dependent variable vector v on this spline predictor matrix, we obtain the spline regression coefficients
$$ \hat{\beta} = \left(\tilde{B}_{C}^{T}\,\tilde{B}_{C}\right)^{-1}\tilde{B}_{C}^{T}\,\mathbf{v}. $$
If we combine the functions (A10) and (A11), so as to obtain the sub-domain number directly as a function of x, y, and z,
$$ q\left(x, y, z\right) = u\left(i\left(x\right), j\left(y\right), k\left(z\right)\right), $$
then the C-spline model on the domain 0 ≤ x, y, z ≤ 1 for the expected value (156), with a 2-by-2-by-2 geometry, a polynomial order 1 with no interactions, and continuity order 0, may be written down as the inner product, (A9) and (A12),
$$ f\left(x, y, z\right) = B_{C}^{\left(q\left(x, y, z\right)\right)}\left(x, y, z\right) \cdot \hat{\beta}. $$
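The construction (A9)–(A14) is easily mechanized; the following is a minimal Python/NumPy sketch, with hypothetical function names, of the steps just described: the sub-domain coordinates (A11), the row number (A10), the local basis row of (A9), the stacked spline predictor matrix, the least-squares fit of β̂, and the evaluation of the model (A14).

```python
import numpy as np

def coord(t):
    """Sub-domain coordinate (A11): 1 for t <= 0.5, 2 for t > 0.5."""
    return 1 if t <= 0.5 else 2

def subdomain(x, y, z):
    """Sub-domain number (A10): u = i + (j - 1) * 2 + (k - 1) * 4 (shown for reference;
    basis_row below folds the same case distinction in directly)."""
    return coord(x) + (coord(y) - 1) * 2 + (coord(z) - 1) * 4

def basis_row(x, y, z):
    """Row of the C-spline basis (A9) for the sub-domain containing (x, y, z).

    Per axis there are two columns: the first equals t - 0.5 on the lower sub-domain and 0
    on the upper one, the second equals t - 0.5 everywhere; both vanish at the boundary
    t = 0.5, which is what enforces the connectivity of the piecewise polynomials.
    """
    pair = lambda t: [t - 0.5 if t <= 0.5 else 0.0, t - 0.5]
    return np.array([1.0] + pair(x) + pair(y) + pair(z))

def fit_cspline(X, v):
    """Stack the N basis rows into the spline predictor matrix and compute beta-hat by least squares."""
    B = np.vstack([basis_row(x, y, z) for x, y, z in X])
    beta, *_ = np.linalg.lstsq(B, v, rcond=None)
    return beta

def predict(beta, x, y, z):
    """C-spline model (A14): inner product of the local basis row with beta-hat."""
    return float(basis_row(x, y, z) @ beta)
```

A call like `beta = fit_cspline(np.column_stack([x, y, z]), v)` followed by `predict(beta, 0.3, 0.7, 0.5)` then evaluates the connected, piecewise-linear surface at a single point.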
In Figure A20 we give a demonstration of the spline equivalent (A14) of the polynomial (A8), by way of the spline basis (A9), where we (arbitrarily and as a reference for the reader) let
$$ \hat{\beta} = \left(3.22574,\; 6.50497,\; 0.378211,\; 4.29487,\; 3.68232,\; 3.41941,\; 3.1923\right). $$
Figure A20. Example of the trivariate C-spline model for (A8), for z = 0.5, y = 0.5, and x = 0.5, respectively.
Note that 8 trivariate piecewise polynomials of order 1 with no interactions would ordinarily make for m = 8 × 4 = 32 parameters, whereas just the one trivariate polynomial (A8) over the total unpartitioned domain makes for m = 4 parameters. Seeing that (A9) consists of m = 7 parameters, it follows that the constraint of connectedness of the polynomials has incurred a cost of
$$ 32 - 7 = 25 $$
free parameters relative to the unconstrained case, or, alternatively, a gain of
$$ 7 - 4 = 3 $$
free parameters relative to the case where one polynomial is defined over the whole of the domain.

Appendix B.2. Enforcing Connectivity

The sub-domains
$$ D_{1}: \; 0.5 < x, y \le 1 \;\text{ and }\; 0 \le z \le 0.5, \qquad D_{2}: \; 0.5 < x, y \le 1 \;\text{ and }\; 0.5 < z \le 1, $$
connect at z = 0.5. The sub-domains D₁ and D₂ are associated with the sub-domain numbers q(x, y, z) = 4 and q(x, y, z) = 8, respectively, (A13). It follows that D₁ and D₂ have the corresponding C-spline basis vectors (A9)
$$ B_{C}^{\left(4\right)}\left(x, y, z\right) = \left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; z - 0.5,\; z - 0.5\right) $$
and
$$ B_{C}^{\left(8\right)}\left(x, y, z\right) = \left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; 0,\; z - 0.5\right). $$
If we approach the z-boundary of the 4th and the 8th sub-domain, or, equivalently, if we let z → 0.5 in the domains D₁ and D₂, from below and above, respectively, then it may be checked that the above C-spline basis vectors converge to the same vector:
$$ \lim_{z \to 0.5^{-}} B_{C}^{\left(4\right)}\left(x, y, z\right) = \left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; 0,\; 0\right) $$
and
$$ \lim_{z \to 0.5^{+}} B_{C}^{\left(8\right)}\left(x, y, z\right) = \left(1,\; 0,\; x - 0.5,\; 0,\; y - 0.5,\; 0,\; 0^{+}\right). $$
It follows that the C-spline model (A14) will connect at the z-boundary of the sub-domains D₁ and D₂, for any regression coefficient vector β̂, as the z-boundary is crossed from below and the inner product goes from
$$ B_{C}^{\left(4\right)}\left(x, y, 0.5\right) \cdot \hat{\beta} $$
to
$$ B_{C}^{\left(8\right)}\left(x, y, 0.5^{+}\right) \cdot \hat{\beta}, $$
and vice versa. It may be checked that this holds for all possible boundary crossings in the domain 0 ≤ x, y, z ≤ 1. Stated differently, the C-spline model (A14) enforces the piecewise polynomials to connect at their domain boundaries by way of its C-spline basis (A9).
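This boundary behaviour is also easy to verify numerically; a small self-contained sketch, using the same hypothetical basis_row as in the earlier code fragment and an arbitrary coefficient vector:

```python
import numpy as np

def basis_row(x, y, z):
    """Local row of the C-spline basis (A9); see the sketch in Appendix B.1."""
    pair = lambda t: [t - 0.5 if t <= 0.5 else 0.0, t - 0.5]
    return np.array([1.0] + pair(x) + pair(y) + pair(z))

beta = np.random.default_rng(1).normal(size=7)   # an arbitrary coefficient vector

x, y = 0.8, 0.9                                  # a point with x, y > 0.5
below = basis_row(x, y, 0.5) @ beta              # boundary value, approached from D1 (z <= 0.5)
above = basis_row(x, y, 0.5 + 1e-9) @ beta       # value just inside D2 (z > 0.5)

print(np.isclose(below, above))                  # True: the model connects across the z-boundary
```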

Appendix B.3. Adding Polynomial Interaction and Power Terms

Now, the C-spline model (A14) enforces the piecewise polynomials to connect at their domain boundaries by way of its C-spline basis (A9). It follows that any product of the columns in (A9) must also enforce this connectedness. To be more specific, if we want to introduce an interaction between $x$ and $y$, then we just need to multiply the two $x$-columns with the two $y$-columns of (A9) in the following manner:
$$
\mathbf{x}\mathbf{y} =
\begin{bmatrix}
(x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) \\
0\cdot(y-0.5) & 0\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) \\
(x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5) \\
0\cdot 0 & 0\cdot(y-0.5) & (x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5) \\
(x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) \\
0\cdot(y-0.5) & 0\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot(y-0.5) \\
(x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5) & (x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5) \\
0\cdot 0 & 0\cdot(y-0.5) & (x-0.5)\cdot 0 & (x-0.5)\cdot(y-0.5)
\end{bmatrix}
=
\begin{bmatrix}
(x-0.5)(y-0.5) & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) \\
0 & 0 & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) \\
0 & (x-0.5)(y-0.5) & 0 & (x-0.5)(y-0.5) \\
0 & 0 & 0 & (x-0.5)(y-0.5) \\
(x-0.5)(y-0.5) & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) \\
0 & 0 & (x-0.5)(y-0.5) & (x-0.5)(y-0.5) \\
0 & (x-0.5)(y-0.5) & 0 & (x-0.5)(y-0.5) \\
0 & 0 & 0 & (x-0.5)(y-0.5)
\end{bmatrix}
$$
or, equivalently, as any linear combination of columns also will adhere to the constraint of connectivity,
$$
\mathbf{x}\mathbf{y} =
\begin{bmatrix}
(x-0.5)(y-0.5) & 0 & 0 & 0 \\
0 & (x-0.5)(y-0.5) & 0 & 0 \\
0 & 0 & (x-0.5)(y-0.5) & 0 \\
0 & 0 & 0 & (x-0.5)(y-0.5) \\
(x-0.5)(y-0.5) & 0 & 0 & 0 \\
0 & (x-0.5)(y-0.5) & 0 & 0 \\
0 & 0 & (x-0.5)(y-0.5) & 0 \\
0 & 0 & 0 & (x-0.5)(y-0.5)
\end{bmatrix}.
$$
It may be checked that the addition of these columns to the spline basis (A9) still enforces the constraint of connectedness. (Similarly, one may also row reduce the spline basis (A9), should one wish to do so.)
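A possible way to append such interaction columns to the basis row of the earlier sketch, following the first construction above (the pairwise products of the two $x$-columns and the two $y$-columns), is given below; again a sketch under the same assumed conventions, not the authors' implementation, and `basis_row` is the hypothetical helper sketched after (A14).

```python
import numpy as np

def coordinate_cols(t):
    """The two basis columns of a single coordinate t: (t-0.5, t-0.5) on
    the lower half and (0, t-0.5) on the upper half."""
    return [0.0, t - 0.5] if t > 0.5 else [t - 0.5, t - 0.5]

def xy_interaction_cols(x, y):
    """Four xy-interaction entries: the pairwise products of the two
    x-columns and the two y-columns."""
    return np.array([cx * cy
                     for cx in coordinate_cols(x)
                     for cy in coordinate_cols(y)])

def basis_row_with_xy(x, y, z):
    """Order-1 basis row (7 columns) extended with the four xy columns."""
    return np.concatenate([basis_row(x, y, z), xy_interaction_cols(x, y)])
```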
By way of induction, it follows that the introduction of the spline interaction terms $xy$, $xz$, $yz$, and $xyz$ into (A8),
$$f(x, y, z) = 1 + x + y + z + xy + xz + yz + xyz,$$
will result in a spline basis which has
$$m = 7 + 2^2 + 2^2 + 2^2 + 2^3 = 27$$
free parameters. In Figure A21 we give a demonstration of the added flexibility of the spline equivalent (A14) of the polynomial (A16) for a random $\hat{\beta}$.
Figure A21. Example of the trivariate C-spline model for (A16), for z = 0.5, y = 0.5, and x = 0.5, respectively.
Also, the term $x^k$ may be simply constructed by taking the $k$th power of the two $x$-columns of (A9):
$$
\mathbf{x}^{2} =
\begin{bmatrix}
(x-0.5)^2 & (x-0.5)^2 \\
0 & (x-0.5)^2 \\
(x-0.5)^2 & (x-0.5)^2 \\
0 & (x-0.5)^2 \\
(x-0.5)^2 & (x-0.5)^2 \\
0 & (x-0.5)^2 \\
(x-0.5)^2 & (x-0.5)^2 \\
0 & (x-0.5)^2
\end{bmatrix}.
$$
It follows that the addition of the terms $x^2$, $y^2$, and $z^2$ to, and the removal of the term $xyz$ from, (A16),
$$f(x, y, z) = 1 + x + x^2 + y + y^2 + z + z^2 + xy + xz + yz,$$
will result in a spline basis which has
$$m = 27 + 2 + 2 + 2 - 8 = 25$$
free parameters. In Figure A22 we give a demonstration of the added flexibility of the spline equivalent (A14) of the polynomial (A17) for a random $\hat{\beta}$.
Figure A22. Example of the trivariate C-spline model for (A17), for z = 0.5, y = 0.5, and x = 0.5, respectively.
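Analogously, the power columns discussed above might be appended by taking the $k$th power of the two $x$-columns; a minimal sketch under the same assumptions as the earlier snippets, not the authors' implementation:

```python
import numpy as np

def x_power_cols(x, k):
    """Two x^k entries for the basis row: the kth power of the two
    x-columns, (x-0.5, x-0.5) on the lower half and (0, x-0.5) on the
    upper half; both pairs vanish at the knot x = 0.5, so connectivity
    is preserved."""
    cols = [0.0, x - 0.5] if x > 0.5 else [x - 0.5, x - 0.5]
    return np.array([c ** k for c in cols])
```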

References

1. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241.
2. Van Erp, H.R.N.; Linger, R.O.; van Gelder, P.H.A.J.M. Deriving Proper Uniform Priors for Regression Coefficients, Part II. AIP Conf. Proc. 2011, 1305, 101.
3. Skilling, J. This Physicist’s View of Gelman’s Bayes. Bayesian Anal. 2008. Available online: http://www.stat.columbia.edu/gelman/stuff_for_blog/rant2.pdf (accessed on 27 April 2017).
4. Van Erp, H.R.N.; van Gelder, P.H.A.J.M. Deriving Proper Uniform Priors for Regression Coefficients. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Mohammad-Djafari, A., Bercher, J., Bessiere, P., Eds.; American Institute of Physics: College Park, MD, USA, 2012; pp. 101–106.
5. Van Erp, H.R.N.; Linger, R.O.; van Gelder, P.H.A.J.M. Constructing Cartesian Splines. arXiv 2014, arXiv:1409.5955v1.
6. Zellner, A. An Introduction to Bayesian Inference in Econometrics; J. Wiley & Sons, Inc.: New York, NY, USA, 1971.
7. MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
8. Knuth, K.H.; Habeck, M.; Malakar, N.K.; Mubeen, A.M.; Placek, B. Bayesian Evidence and Model Selection. arXiv 2014, arXiv:1411.3013v1.
9. Bretthorst, L.G. Bayesian Spectrum Analysis and Parameter Estimation; Springer: New York, NY, USA, 1988; Volume 48.
10. Lay, D.C. Linear Algebra and Its Applications; Addison-Wesley Publishing Company: Boston, MA, USA, 2000.
11. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
12. Lindgren, B.W. Statistical Theory; Chapman & Hall, Inc.: New York, NY, USA, 1993.
13. Kass, R.; Raftery, A. Bayes Factors. J. Am. Stat. Assoc. 1995, 90, 773–795.
14. Linger, R.O.; van Erp, H.R.N.; van Gelder, P.H.A.J.M. Constructing Explicit B-Spline Bases. arXiv 2014, arXiv:1409.3824v1.
15. Ivakhenko, A.G. Group Method of Data Handling—A Rival of the Method of Stochastic Approximation. Sov. Autom. Control 1966, 13, 43–71.
16. Awanou, G. Energy Methods in 3D Spline Approximations of Navier-Stokes Equations. Ph.D. Thesis, University of Georgia, Athens, GA, USA, 2003.
17. MacKay, D.J.C. Bayesian Non-Linear Modelling with Neural Networks. University of Cambridge Programme for Industry, Modelling Phase Transformations in Steels, 1995. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1325 (accessed on 27 April 2017).
