Article

On the Folded Normal Distribution

Michail Tsagris, Christina Beneki and Hossein Hassani

1 School of Mathematical Sciences, University of Nottingham, NG7 2RD, UK
2 School of Business and Economics, TEI of Ionian Islands, 31100 Lefkada, Greece
3 Statistical Research Centre, Executive Business Centre, Bournemouth University, BH8 8EB, UK
* Author to whom correspondence should be addressed.
Submission received: 10 October 2013 / Revised: 26 January 2014 / Accepted: 26 January 2014 / Published: 14 February 2014

Abstract

The characteristic function of the folded normal distribution and its moment generating function are derived. The entropy of the folded normal distribution and its Kullback–Leibler divergence from the normal and half normal distributions are approximated using Taylor series. The accuracy of the results is also assessed using different criteria. The maximum likelihood estimates and confidence intervals for the parameters are obtained using asymptotic theory and the bootstrap method. The coverage of the confidence intervals is also examined.

1. Introduction

Studied mainly in the 1960s, the folded normal distribution is the distribution of the absolute value of a Gaussian random variable and arises whenever only the magnitude, and not the sign, of a normally distributed quantity is recorded. In 1961, a method of estimating the parameters based upon the moment estimating equations was discussed in [1], where some examples of its applications in the industrial sector were also given. The folded normal distribution was used to study the magnitude of deviation of an automobile strut alignment [2]. The properties of the multivariate folded normal distribution, with its possible applications, were studied in [3]. In addition, tables with probabilities for a range of values of the vector of parameters were provided, and an application of the model with real data was illustrated. An alternative estimation method using the second and fourth moments of the distribution was proposed in [4], whilst [5] performed maximum likelihood estimation and calculated the asymptotic information matrix. Thereafter, the sequential probability ratio test for the null hypothesis of the location parameter being zero against a specific alternative was evaluated in [6], with the idea of illustrating the use of cumulative sum control charts for multiple observations.
In [7], the author dealt with the hypothesis testing of a zero location parameter, regardless of whether the variance is known or not. The distribution formed by the ratio of two folded normal variables was studied and illustrated with a few applications in [8]. The folded normal distribution has been applied to many practical problems. For instance, [9] introduced an economic model to determine the process specification limits for folded normally distributed data.
In this paper, we examine the folded normal distribution from a different perspective. We study some of its properties, namely the characteristic and moment generating functions, the Laplace and Fourier transformations and the mean residual life of this distribution. The entropy of this distribution and its Kullback–Leibler divergence from the normal and half normal distributions are approximated via Taylor series. The accuracy of the approximations is assessed using numerical examples.
Also reviewed here are the maximum likelihood estimates (for an introduction, see [1]), with examples from simulated data given for illustration purposes. Simulation studies are performed to assess the validity of the estimates, with and without bootstrap calibration, in small sample cases. Numerical optimization of the log-likelihood is carried out using the simplex method [10].

2. The Folded Normal

The folded normal distribution with parameters μ and σ² stems from taking the absolute value of a normal random variable with the same parameters. The density of Y, with Y ∼ N(μ, σ²), is given by:
$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y-\mu)^2} \tag{1}$$
Thus, X = |Y|, denoted by X ∼ FN(μ, σ²), has the following density:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\left[e^{-\frac{1}{2\sigma^2}(x-\mu)^2} + e^{-\frac{1}{2\sigma^2}(x+\mu)^2}\right], \quad x \geq 0 \tag{2}$$
The density can be written in a more attractive form [5]:
$$f(x) = \sqrt{\frac{2}{\pi\sigma^2}}\, e^{-\frac{x^2+\mu^2}{2\sigma^2}}\cosh\left(\frac{\mu x}{\sigma^2}\right) \tag{3}$$
and by expanding the cosh via its Taylor series, we can also write the density as:
$$f(x) = \sqrt{\frac{2}{\pi\sigma^2}}\, e^{-\frac{x^2+\mu^2}{2\sigma^2}}\sum_{n=0}^{\infty}\frac{1}{(2n)!}\left(\frac{\mu x}{\sigma^2}\right)^{2n} \tag{4}$$
We can see that the folded normal distribution is not a member of the exponential family. The cumulative distribution function can be written as:
$$F(x) = \frac{1}{2}\left[\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right) + \mathrm{erf}\left(\frac{x+\mu}{\sqrt{2\sigma^2}}\right)\right] \tag{5}$$
where erf is the error function:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\, dt \tag{6}$$
The mean and the variance of Equation (2) are calculated by direct integration as follows [1]:
$$\mu_f = \sqrt{\frac{2}{\pi}}\,\sigma\, e^{-\frac{\mu^2}{2\sigma^2}} + \mu\left[1 - 2\Phi\left(-\frac{\mu}{\sigma}\right)\right] \tag{7}$$
$$\sigma_f^2 = \mu^2 + \sigma^2 - \mu_f^2 \tag{8}$$
where Φ(·) is the cumulative distribution function of the standard normal distribution. The third and fourth moments about the origin are calculated in [4]. We develop the calculation further by providing the characteristic function and the moment generating function of Equation (2). Figure 1 shows the densities of the folded normal for some parameter values.
Figure 1. The black line is the density of the N(μ, σ²) and the red line that of the FN(μ, σ²). The parameters in the left figure (a) are μ = 2 and σ² = 3 and in the right figure (b) μ = 2 and σ² = 4.
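As a quick numerical check of the formulas above, the following R sketch evaluates the density of Equation (2) and the CDF of Equation (5), and verifies the mean and variance of Equations (7) and (8) by numerical integration. The function names dfnorm() and pfnorm() are ad hoc, not an established API.

    # Folded normal density, Equation (2): the two reflected normal branches
    dfnorm <- function(x, mu, sigma2) {
      s <- sqrt(sigma2)
      dnorm(x, mu, s) + dnorm(x, -mu, s)
    }
    # Folded normal CDF, Equation (5): P(|Y| <= q) = Phi((q-mu)/s) - Phi((-q-mu)/s)
    pfnorm <- function(q, mu, sigma2) {
      s <- sqrt(sigma2)
      pnorm(q, mu, s) - pnorm(-q, mu, s)
    }
    mu <- 2; sigma2 <- 3
    # Mean of Equation (7) versus the numerically integrated first moment
    mf <- sqrt(2 * sigma2 / pi) * exp(-mu^2 / (2 * sigma2)) +
          mu * (1 - 2 * pnorm(-mu / sqrt(sigma2)))
    mf
    integrate(function(x) x * dfnorm(x, mu, sigma2), 0, Inf)$value  # agrees with mf
    mu^2 + sigma2 - mf^2  # variance, Equation (8)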

2.1. Relations to Other Distributions

The distribution of Z = X/σ is a non-central χ distribution with one degree of freedom and non-centrality parameter equal to (μ/σ)² [11]. It is clear that when μ = 0, a central χ₁ distribution is obtained. The half normal distribution is a special case of Equation (2) with μ = 0, for which [12] showed that it is the limiting form of the folded (central) t distribution as the degrees of freedom of the latter go to infinity. Both distributions are further developed in the bivariate case in [13].
The folded normal distribution can also be seen as the limit of the folded non-standardized t distribution as its degrees of freedom go to infinity. The folded non-standardized t distribution is the distribution of the absolute value of a non-standardized t random variable with v degrees of freedom:
$$g(x) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^2}}\left\{\left[1 + \frac{1}{v}\frac{(x-\mu)^2}{\sigma^2}\right]^{-\frac{v+1}{2}} + \left[1 + \frac{1}{v}\frac{(x+\mu)^2}{\sigma^2}\right]^{-\frac{v+1}{2}}\right\} \tag{9}$$

2.2. Mode of the Folded Normal Distribution

The mode of the distribution is the value of x for which the density is maximised. In order to find this value, we take the first derivative of the density with respect to x and set it equal to zero. Unfortunately, there is no closed form solution. We can, however, rearrange the derivative and end up with a non-linear equation:
$$\frac{df(x)}{dx} = 0 \;\Rightarrow\; -\frac{x-\mu}{\sigma^2}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} - \frac{x+\mu}{\sigma^2}\, e^{-\frac{(x+\mu)^2}{2\sigma^2}} = 0 \tag{10}$$
$$x\left[e^{-\frac{(x-\mu)^2}{2\sigma^2}} + e^{-\frac{(x+\mu)^2}{2\sigma^2}}\right] - \mu\left[e^{-\frac{(x-\mu)^2}{2\sigma^2}} - e^{-\frac{(x+\mu)^2}{2\sigma^2}}\right] = 0 \tag{11}$$
$$x\left(1 + e^{-\frac{2\mu x}{\sigma^2}}\right) - \mu\left(1 - e^{-\frac{2\mu x}{\sigma^2}}\right) = 0 \tag{12}$$
$$(\mu + x)\, e^{-\frac{2\mu x}{\sigma^2}} = \mu - x \tag{13}$$
$$x = -\frac{\sigma^2}{2\mu}\log\frac{\mu - x}{\mu + x} \tag{14}$$
We saw from numerical investigation that when μ < σ, the maximum is attained at x = 0. When μ ≥ σ, the maximum is attained at some x > 0, and when μ becomes greater than 3σ, the maximum approaches μ. This is, of course, to be expected, since in this case the folded normal converges to the normal distribution.
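This behaviour is easy to verify numerically. The sketch below solves Equation (12) with uniroot(); fn_mode() is an ad hoc helper that simply encodes the numerical observation that the mode is zero whenever μ < σ.

    # Mode of FN(mu, sigma^2): root of Equation (12) in (0, mu); x = 0 is always
    # a trivial root, so we search strictly away from zero when mu >= sigma.
    fn_mode <- function(mu, sigma2) {
      if (mu < sqrt(sigma2)) return(0)   # numerically observed: mode at zero
      g <- function(x) x * (1 + exp(-2 * mu * x / sigma2)) -
                       mu * (1 - exp(-2 * mu * x / sigma2))
      uniroot(g, lower = 1e-8, upper = mu)$root
    }
    fn_mode(2, 9)   # mu < sigma: returns 0
    fn_mode(5, 1)   # mu > 3 * sigma: very close to mu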

2.3. Characteristic Function and Other Related Functions of the Folded Normal Distribution

Forms for the higher moments of the distribution, when the order of the moment is an odd or an even number, are provided in [4]. Here, we derive its characteristic function and, thus, its moment generating function.
$$\varphi_x(t) = E\left(e^{itX}\right) = \int_0^{\infty} e^{itx} f(x)\,dx = \int_0^{\infty}\frac{e^{itx - \frac{(x-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx + \int_0^{\infty}\frac{e^{itx - \frac{(x+\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx = \int_0^{\infty}\frac{e^{A}}{\sqrt{2\pi\sigma^2}}\,dx + \int_0^{\infty}\frac{e^{B}}{\sqrt{2\pi\sigma^2}}\,dx \tag{15}$$
We will now work with the two exponents, A and B.
$$A = itx - \frac{(x-\mu)^2}{2\sigma^2} = \frac{2i\sigma^2 t x - x^2 + 2\mu x - \mu^2}{2\sigma^2} = -\frac{x^2 - 2x\left(i\sigma^2 t + \mu\right) + \mu^2}{2\sigma^2} \tag{16}$$
$$= -\frac{(x-a)^2}{2\sigma^2} - \frac{\sigma^2 t^2}{2} + i\mu t \tag{17}$$
where a = iσ²t + μ and, formally, X ∼ N(a, σ²). Thus, the first part of Equation (15) becomes:
$$\int_0^{\infty}\frac{e^{A}}{\sqrt{2\pi\sigma^2}}\,dx = e^{-\frac{\sigma^2 t^2}{2} + i\mu t}\int_0^{\infty}\frac{e^{-\frac{(x-a)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx = e^{-\frac{\sigma^2 t^2}{2} + i\mu t}\left[1 - P\left(X \leq 0\right)\right] \tag{18}$$
$$= e^{-\frac{\sigma^2 t^2}{2} + i\mu t}\left[1 - \Phi\left(-\frac{a}{\sigma}\right)\right] = e^{-\frac{\sigma^2 t^2}{2} + i\mu t}\left[1 - \Phi\left(-\frac{\mu}{\sigma} - i\sigma t\right)\right] \tag{19}$$
Using similar calculations, the second exponent, B, becomes:
$$B = itx - \frac{(x+\mu)^2}{2\sigma^2} = -\frac{\left(x - \left(i\sigma^2 t - \mu\right)\right)^2}{2\sigma^2} - \frac{\sigma^2 t^2}{2} - i\mu t \tag{20}$$
and, thus, the second part of Equation (15) becomes:
$$\int_0^{\infty}\frac{e^{B}}{\sqrt{2\pi\sigma^2}}\,dx = e^{-\frac{\sigma^2 t^2}{2} - i\mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} - i\sigma t\right)\right] \tag{21}$$
Finally, the characteristic function becomes:
$$\varphi_x(t) = e^{-\frac{\sigma^2 t^2}{2} + i\mu t}\left[1 - \Phi\left(-\frac{\mu}{\sigma} - i\sigma t\right)\right] + e^{-\frac{\sigma^2 t^2}{2} - i\mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} - i\sigma t\right)\right] \tag{22}$$
Below, we list some more functions that include expectations.
  • The moment generating function of Equation (2) exists and is equal to:
    $$M_x(t) = \varphi_x(-it) = e^{\frac{\sigma^2 t^2}{2} + \mu t}\left[1 - \Phi\left(-\frac{\mu}{\sigma} - \sigma t\right)\right] + e^{\frac{\sigma^2 t^2}{2} - \mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} - \sigma t\right)\right] \tag{23}$$
    We can see that the moment generating function can be differentiated infinitely many times, since its derivatives always involve exponential terms and the normal density. The folded normal distribution is not a stable distribution; that is, the sum of folded normal random variables does not follow a folded normal distribution. This can be seen from the characteristic function or the moment generating function, Equation (22) or Equation (23). (A numerical check of Equation (23) is given right after this list.)
  • The cumulant generating function is simply the logarithm of the moment generating function:
    $$K_x(t) = \log M_x(t) = \frac{\sigma^2 t^2}{2} + \mu t + \log\left\{\left[1 - \Phi\left(-\frac{\mu}{\sigma} - \sigma t\right)\right] + e^{-2\mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} - \sigma t\right)\right]\right\} \tag{24}$$
  • The Laplace transformation can easily be derived from the moment generating function and is equal to:
    $$E\left(e^{-tX}\right) = M_x(-t) = e^{\frac{\sigma^2 t^2}{2} - \mu t}\left[1 - \Phi\left(-\frac{\mu}{\sigma} + \sigma t\right)\right] + e^{\frac{\sigma^2 t^2}{2} + \mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} + \sigma t\right)\right] \tag{25}$$
  • The Fourier transformation is:
    $$\hat{f}(t) = \int_0^{\infty} e^{-2\pi i x t} f(x)\,dx = E\left(e^{-2\pi i X t}\right) \tag{26}$$
    However, this is closely related to the characteristic function, since E(e^{−2πiXt}) = φ_x(−2πt). Thus, Equation (26) becomes:
    $$\hat{f}(t) = \varphi_x(-2\pi t) = e^{-\frac{4\pi^2\sigma^2 t^2}{2} - i2\pi\mu t}\left[1 - \Phi\left(-\frac{\mu}{\sigma} + i2\pi\sigma t\right)\right] + e^{-\frac{4\pi^2\sigma^2 t^2}{2} + i2\pi\mu t}\left[1 - \Phi\left(\frac{\mu}{\sigma} + i2\pi\sigma t\right)\right] \tag{27}$$
  • The mean residual life is given by:
    $$E\left(X - t \mid X > t\right) = E\left(X \mid X > t\right) - t \tag{28}$$
    where t ∈ ℝ⁺. The above conditional expectation is given by:
    $$E\left(X \mid X > t\right) = \int_t^{\infty}\frac{x f(x)}{P\left(X > t\right)}\,dx = \int_t^{\infty}\frac{x f(x)}{1 - F(t)}\,dx \tag{30}$$
    The denominator in Equation (30) is written as $1 - \frac{1}{2}\left[\mathrm{erf}\left(\frac{t-\mu}{\sqrt{2\sigma^2}}\right) + \mathrm{erf}\left(\frac{t+\mu}{\sqrt{2\sigma^2}}\right)\right]$, i.e. Equation (5) evaluated at t. The numerator of Equation (30) could also be expressed through the survival function 1 − F, but we will not do so here. The calculation of the numerator proceeds in the same way as the calculation of the mean. Thus:
    $$\int_t^{\infty} x f(x)\,dx = \int_t^{\infty}\frac{x\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx + \int_t^{\infty}\frac{x\, e^{-\frac{(x+\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx \tag{31}$$
    $$= \frac{\sigma}{\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}} + \mu\left[1 - \Phi\left(\frac{t-\mu}{\sigma}\right)\right] + \frac{\sigma}{\sqrt{2\pi}}\, e^{-\frac{(t+\mu)^2}{2\sigma^2}} - \mu\,\Phi\left(-\frac{t+\mu}{\sigma}\right) \tag{32}$$
    $$= \frac{\sigma}{\sqrt{2\pi}}\left[e^{-\frac{(t-\mu)^2}{2\sigma^2}} + e^{-\frac{(t+\mu)^2}{2\sigma^2}}\right] + \mu\left[\Phi\left(\frac{\mu-t}{\sigma}\right) - \Phi\left(-\frac{\mu+t}{\sigma}\right)\right] \tag{33}$$
    For t = 0, Equation (33) reduces to the mean in Equation (7), as it should.
    Finally, Equation (30) can be written as:
    $$E\left(X - t \mid X > t\right) = \frac{\frac{\sigma}{\sqrt{2\pi}}\left[e^{-\frac{(t-\mu)^2}{2\sigma^2}} + e^{-\frac{(t+\mu)^2}{2\sigma^2}}\right] + \mu\left[\Phi\left(\frac{\mu-t}{\sigma}\right) - \Phi\left(-\frac{\mu+t}{\sigma}\right)\right]}{1 - \frac{1}{2}\left[\mathrm{erf}\left(\frac{t-\mu}{\sqrt{2\sigma^2}}\right) + \mathrm{erf}\left(\frac{t+\mu}{\sqrt{2\sigma^2}}\right)\right]} - t \tag{34}$$
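As a sanity check of the moment generating function in Equation (23), the following sketch compares the closed form with a direct numerical evaluation of E(e^{tX}); mgf_fn() is an ad hoc name and the parameter values are arbitrary.

    mgf_fn <- function(t, mu, sigma2) {   # closed form, Equation (23)
      s <- sqrt(sigma2)
      exp(sigma2 * t^2 / 2 + mu * t) * (1 - pnorm(-mu / s - s * t)) +
      exp(sigma2 * t^2 / 2 - mu * t) * (1 - pnorm( mu / s - s * t))
    }
    mu <- 2; sigma2 <- 3; t <- 0.5
    mgf_fn(t, mu, sigma2)
    # E(exp(tX)) by numerical integration over the folded density
    integrate(function(x) exp(t * x) * (dnorm(x, mu, sqrt(sigma2)) +
              dnorm(x, -mu, sqrt(sigma2))), 0, Inf)$value  # agrees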

3. Entropy and Kullback–Leibler Divergence

When studying a distribution, the entropy and the Kullback–Leibler divergence from some other distributions are two measures worth calculating. In our case, we approximate both of these quantities using Taylor series. Numerical examples are displayed to show the performance of the approximations.

3.1. Entropy

The entropy is defined as the negative expectation of log f(x):
$$E = -E\left[\log f(x)\right] = -\int_0^{\infty} f(x)\log f(x)\,dx = -\int_0^{\infty} f(x)\,\log\left\{\frac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\left(1 + e^{-\frac{2\mu x}{\sigma^2}}\right)\right\}dx$$
$$= \log\sqrt{2\pi\sigma^2} + \int_0^{\infty}\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2}\, f(x)\,dx - \int_0^{\infty} f(x)\,\log\left(1 + e^{-\frac{2\mu x}{\sigma^2}}\right)dx \tag{35}$$
Let us now take the second term of Equation (35) and see what it is equal to:
$$\frac{1}{2\sigma^2}\int_0^{\infty} x^2 f(x)\,dx = \frac{\mu^2 + \sigma^2}{2\sigma^2}, \quad\text{by exploiting the knowledge of the variance in Equation (8),} \tag{36}$$
$$-\frac{2\mu}{2\sigma^2}\int_0^{\infty} x f(x)\,dx = -\frac{\mu\mu_f}{\sigma^2}, \quad\text{since the first moment is given in Equation (7), and} \tag{37}$$
$$\frac{\mu^2}{2\sigma^2}\int_0^{\infty} f(x)\,dx = \frac{\mu^2}{2\sigma^2} \tag{38}$$
Finally, the third term of Equation (35) is equal to:
$$A = \int_0^{\infty} f(x)\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{-\frac{2n\mu x}{\sigma^2}}\,dx \tag{39}$$
by making use of the Taylor expansion of log(1 + y) around zero, with e^{−2μx/σ²} in place of y. Thus, we have managed to “break” the second integral of the entropy in Equation (35) down into smaller pieces:
$$A = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\int_0^{\infty} e^{-a_n x}\,\frac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\int_0^{\infty} e^{-a_n x}\,\frac{e^{-\frac{(x+\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,dx$$
$$= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{\left(\mu - a_n\sigma^2\right)^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{a_n\sigma^2 - \mu}{\sigma}\right)\right] + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{\left(\mu + a_n\sigma^2\right)^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{a_n\sigma^2 + \mu}{\sigma}\right)\right]$$
by interchanging the order of summation and integration and completing the square in the same way as for the characteristic function, where a_n = 2nμ/σ². The final form of the entropy is given in Equation (40):
$$E \approx \log\sqrt{2\pi\sigma^2} + \frac{1}{2} + \frac{\mu^2 - \mu\mu_f}{\sigma^2} - \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{(1-2n)^2\mu^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{(2n-1)\mu}{\sigma}\right)\right] - \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{(1+2n)^2\mu^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{(2n+1)\mu}{\sigma}\right)\right] \tag{40}$$
Figure 2 shows the true value of Equation (40) when σ = 5 and μ ranges from zero to 25, thus for values of θ = μ/σ from zero to five. The true value was calculated using numerical integration; R provides this option with the command integrate. The second and third order approximations (using the first two and three terms of the infinite sums in Equation (40)) are also displayed for comparison.
Figure 2. Entropy values for a range of values of θ = μ/σ with σ = 1 (a) and σ = 5 (b).
We can see that the second order approximation is not as good as the third order one, especially for small values of θ. The Taylor approximation in Equation (40) relies on the expansion of log(1 + y) around y = 0, so the further e^{−2μx/σ²} is from zero, the more accuracy the approximation loses. When the values of θ are small, the value of e^{−2μx/σ²} is far from zero; as θ increases and the exponential term decreases, the Taylor series approximates the true value better. This is why we see a small discrepancy of the approximations on the left of Figure 2, which becomes negligible later on. The comparison can be reproduced with the sketch below.
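In this sketch the entropy is computed by numerical integration (with R's integrate command) and by the series of Equation (40) truncated at m terms. The helper names are ad hoc, and the exponential and the normal tail are combined on the log scale to avoid overflow for larger n or θ.

    dfnorm <- function(x, mu, s) dnorm(x, mu, s) + dnorm(x, -mu, s)
    entropy_num <- function(mu, s)          # "true" value via integration
      integrate(function(x) {
        fx <- dfnorm(x, mu, s)
        ifelse(fx > 0, -fx * log(fx), 0)    # guard against 0 * log(0)
      }, 0, Inf)$value
    entropy_approx <- function(mu, s, m = 3) {   # Equation (40), m terms
      mf <- sqrt(2 / pi) * s * exp(-mu^2 / (2 * s^2)) +
            mu * (1 - 2 * pnorm(-mu / s))
      n <- 1:m
      t1 <- exp(((1 - 2 * n)^2 * mu^2 - mu^2) / (2 * s^2) +
                pnorm((2 * n - 1) * mu / s, lower.tail = FALSE, log.p = TRUE))
      t2 <- exp(((1 + 2 * n)^2 * mu^2 - mu^2) / (2 * s^2) +
                pnorm((2 * n + 1) * mu / s, lower.tail = FALSE, log.p = TRUE))
      log(sqrt(2 * pi * s^2)) + 0.5 + (mu^2 - mu * mf) / s^2 -
        sum((-1)^(n + 1) / n * (t1 + t2))
    }
    entropy_num(10, 5)              # theta = 2
    entropy_approx(10, 5, m = 3)    # third order: close to the true value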

3.2. Kullback–Leibler Divergence from the Normal Distribution

The Kullback–Leibler divergence [14] of one distribution from another in general is defined as the expectation of the logarithm of the ratio of the two distributions with respect to the first one:
$$KL(f\,\|\,g) = E_f\left(\log\frac{f}{g}\right) = \int f(x)\,\log\frac{f(x)}{g(x)}\,dx$$
The divergence of the folded normal distribution from the normal distribution is equal to:
$$KL(FN\,\|\,N) = \int_0^{\infty} f(x)\,\log\frac{f(x)}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}}\,dx = \int_0^{\infty} f(x)\,\log\left(1 + e^{-\frac{2\mu x}{\sigma^2}}\right)dx$$
which is the same as the second integral of Equation (35). Thus, we can approximate this divergence by the same Taylor series:
$$KL(FN\,\|\,N) \approx \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{(1-2n)^2\mu^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{(2n-1)\mu}{\sigma}\right)\right] + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\, e^{\frac{(1+2n)^2\mu^2 - \mu^2}{2\sigma^2}}\left[1 - \Phi\left(\frac{(2n+1)\mu}{\sigma}\right)\right]$$
Figure 3. Kullback–Leibler divergence from the normal for a range of values of θ = μ/σ with σ = 1 (a) and σ = 5 (b).
Figure 3 presents two cases of the Kullback–Leibler divergence, for illustration purposes, when the first two and three terms of the infinite sum have been used. In the first graph, the standard deviation is equal to one, and in the second case, it is equal to five. The divergence appears independent of the variance; the change occurs as a result of the value of θ alone. It becomes clear that as the ratio of the mean to the standard deviation increases, the folded normal converges to the normal distribution. A numerical check follows.
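In the sketch below, kl_fn_n() integrates the divergence directly and kl_series() evaluates the truncated series; both names are ad hoc.

    dfnorm <- function(x, mu, s) dnorm(x, mu, s) + dnorm(x, -mu, s)
    kl_fn_n <- function(mu, s)       # direct numerical integration
      integrate(function(x) dfnorm(x, mu, s) *
                log1p(exp(-2 * mu * x / s^2)), 0, Inf)$value
    kl_series <- function(mu, s, m = 3) {   # truncated series above
      n <- 1:m
      t1 <- exp(((1 - 2 * n)^2 * mu^2 - mu^2) / (2 * s^2) +
                pnorm((2 * n - 1) * mu / s, lower.tail = FALSE, log.p = TRUE))
      t2 <- exp(((1 + 2 * n)^2 * mu^2 - mu^2) / (2 * s^2) +
                pnorm((2 * n + 1) * mu / s, lower.tail = FALSE, log.p = TRUE))
      sum((-1)^(n + 1) / n * (t1 + t2))
    }
    kl_fn_n(10, 5); kl_series(10, 5)   # close for theta = 2
    kl_fn_n(2, 1)                      # same theta = 2: same value, any sigma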

3.3. Kullback–Leibler Divergence from the Half Normal Distribution

As mentioned in Section 2.1, the half normal distribution is a special case of the folded normal distribution with μ = 0. The Kullback–Leibler divergence of the folded normal from the half normal distribution is equal to:
$$KL\left(FN(\mu,\sigma^2)\,\|\,FN(0,\sigma^2)\right) = \int_0^{\infty} f(x;\mu,\sigma^2)\,\log\frac{f(x;\mu,\sigma^2)}{\frac{2}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}}\,dx$$
$$= -\log 2\int_0^{\infty} f(x;\mu,\sigma^2)\,dx + \int_0^{\infty} f(x;\mu,\sigma^2)\,\log\left(e^{-\frac{\mu^2}{2\sigma^2} + \frac{\mu x}{\sigma^2}} + e^{-\frac{\mu^2}{2\sigma^2} - \frac{\mu x}{\sigma^2}}\right)dx$$
$$= -\log 2 + \int_0^{\infty}\left(\frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2}\right) f(x;\mu,\sigma^2)\,dx + \int_0^{\infty} f(x;\mu,\sigma^2)\,\log\left(1 + e^{-\frac{2\mu x}{\sigma^2}}\right)dx = -\log 2 + \frac{2\mu\mu_f - \mu^2}{2\sigma^2} + KL(FN\,\|\,N)$$
where f(x; μ, σ²) stands for the folded normal density of Equation (2) and μ_f is the expected value given in Equation (7). Figure 4 shows the approximations to the true value when σ = 1 and σ = 5. This time, we used the third and fifth order approximations, but even then, for small values of θ, the approximations were not satisfactory.
Figure 4. Kullback–Leibler divergence from the half normal for a range of values of θ = μ/σ with σ = 1 (a) and σ = 5 (b).
The previous result does not lead to a strict inequality between the Kullback–Leibler divergences from the two other distributions. When μ > σ, the divergence from the half normal will usually be greater than the divergence from the normal, and when μ < σ, the opposite will usually hold. However, these relationships are not strict, since either can fail; they should be treated as a rule of thumb rather than a guarantee. The decomposition itself is easy to verify numerically, as in the sketch below.
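A short numerical check of the decomposition, with ad hoc helper names: the divergence from the half normal is integrated directly and compared with −log 2 + (2μμ_f − μ²)/(2σ²) + KL(FN‖N).

    mu <- 4; s <- 5
    dfn <- function(x) dnorm(x, mu, s) + dnorm(x, -mu, s)
    mf  <- sqrt(2 / pi) * s * exp(-mu^2 / (2 * s^2)) +
           mu * (1 - 2 * pnorm(-mu / s))
    kl_fn_n <- integrate(function(x) dfn(x) *
                         log1p(exp(-2 * mu * x / s^2)), 0, Inf)$value
    kl_fn_hn <- integrate(function(x) {
      fx <- dfn(x)   # log ratio against the half normal 2 * dnorm(x, 0, s)
      ifelse(fx > 0, fx * (log(fx) - log(2) - dnorm(x, 0, s, log = TRUE)), 0)
    }, 0, Inf)$value
    kl_fn_hn                                              # direct value
    -log(2) + (2 * mu * mf - mu^2) / (2 * s^2) + kl_fn_n  # decomposition agrees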

4. Parameter Estimation

We will show two ways of estimating the parameters. The first one can be found in [1], but we review it and add some more details. Both are essentially maximum likelihood estimation procedures; in the first case, we perform a maximization, whereas in the second case, we seek the root of an equation.
The log-likelihood of Equation (2) can be written in the following way:
$$\ell = -\frac{n}{2}\log\left(2\pi\sigma^2\right) + \sum_{i=1}^{n}\log\left[e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} + e^{-\frac{(x_i+\mu)^2}{2\sigma^2}}\right] = -\frac{n}{2}\log\left(2\pi\sigma^2\right) + \sum_{i=1}^{n}\log\left[e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\left(1 + e^{-\frac{2\mu x_i}{\sigma^2}}\right)\right]$$
$$= -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2} + \sum_{i=1}^{n}\log\left(1 + e^{-\frac{2\mu x_i}{\sigma^2}}\right) \tag{41}$$
where n is the sample size of the x_i values. The partial derivatives of Equation (41) are:
$$\frac{\partial\ell}{\partial\mu} = \sum_{i=1}^{n}\frac{x_i-\mu}{\sigma^2} - \frac{2}{\sigma^2}\sum_{i=1}^{n}\frac{x_i\, e^{-\frac{2\mu x_i}{\sigma^2}}}{1 + e^{-\frac{2\mu x_i}{\sigma^2}}} = \sum_{i=1}^{n}\frac{x_i-\mu}{\sigma^2} - \frac{2}{\sigma^2}\sum_{i=1}^{n}\frac{x_i}{1 + e^{\frac{2\mu x_i}{\sigma^2}}}, \quad\text{and}$$
$$\frac{\partial\ell}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^4} + \frac{2\mu}{\sigma^4}\sum_{i=1}^{n}\frac{x_i\, e^{-\frac{2\mu x_i}{\sigma^2}}}{1 + e^{-\frac{2\mu x_i}{\sigma^2}}} = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^4} + \frac{2\mu}{\sigma^4}\sum_{i=1}^{n}\frac{x_i}{1 + e^{\frac{2\mu x_i}{\sigma^2}}}$$
By equating the derivative of the log-likelihood with respect to μ to zero, we obtain a nice relationship:
$$\sum_{i=1}^{n}\frac{x_i}{1 + e^{\frac{2\mu x_i}{\sigma^2}}} = \sum_{i=1}^{n}\frac{x_i - \mu}{2} \tag{42}$$
Note that Equation (42) has three solutions: one at zero and two others of equal magnitude and opposite sign. The example in Section 4.1 shows the three solutions graphically. By substituting Equation (42) into the derivative of the log-likelihood with respect to σ² and equating to zero, we get the following expression for the variance:
$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n} + 2\mu\frac{\sum_{i=1}^{n}(x_i-\mu)}{n} = \frac{\sum_{i=1}^{n}\left(x_i^2 - \mu^2\right)}{n} = \frac{\sum_{i=1}^{n}x_i^2}{n} - \mu^2 \tag{43}$$
The relationships in Equations (42) and (43) can be used to obtain the maximum likelihood estimates in an efficient, recursive way, as in the sketch below. We start with an initial value for σ² and find the positive root of Equation (42). Then, we insert this value of μ into Equation (43) and get an updated value of σ². The procedure is repeated until the change in the log-likelihood value is negligible.
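A minimal sketch of this recursion, with ad hoc names; the positive root of Equation (42) lies in (0, max(x)), which uniroot() exploits.

    fn_loglik <- function(mu, s2, x)   # Equation (41)
      -length(x) / 2 * log(2 * pi * s2) - sum((x - mu)^2) / (2 * s2) +
        sum(log1p(exp(-2 * mu * x / s2)))
    fn_mle <- function(x, tol = 1e-8) {
      s2 <- var(x)                     # initial value for sigma^2
      mu <- mean(x)
      repeat {
        ll_old <- fn_loglik(mu, s2, x)
        g  <- function(m)              # Equation (42): positive root sought
          sum(x / (1 + exp(2 * m * x / s2))) - sum(x - m) / 2
        mu <- uniroot(g, lower = 1e-8, upper = max(x))$root
        s2 <- sum(x^2) / length(x) - mu^2    # Equation (43)
        if (abs(fn_loglik(mu, s2, x) - ll_old) < tol) break
      }
      c(mu = mu, sigma2 = s2)
    }
    x <- abs(rnorm(100, 2, 3))   # a sample from FN(2, 9)
    fn_mle(x)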
Another, easier and more efficient, way is to perform a root search. Let us write Equation (42) in a more elegant way:
$$2\sum_{i=1}^{n}\frac{x_i}{1 + e^{\frac{2\mu x_i}{\sigma^2}}} - \sum_{i=1}^{n}\frac{x_i\left(1 + e^{\frac{2\mu x_i}{\sigma^2}}\right)}{1 + e^{\frac{2\mu x_i}{\sigma^2}}} + n\mu = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i\,\frac{1 - e^{\frac{2\mu x_i}{\sigma^2}}}{1 + e^{\frac{2\mu x_i}{\sigma^2}}} + n\mu = 0 \tag{44}$$
where σ² is given in Equation (43). It becomes clear that the optimization of the log-likelihood in Equation (41) with respect to the two parameters has been turned into a root search of a function of one parameter only; a sketch follows. We also tried to perform the maximization via the E-M algorithm, treating the sign of the underlying normal variable as the missing information, but it did not prove very good in this case.
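A sketch of the root search, again with ad hoc names. Since (1 − e^z)/(1 + e^z) = −tanh(z/2), Equation (44) can be evaluated stably with tanh(); the lower bound of the search must be kept away from the spurious root at zero.

    fn_mle2 <- function(x) {
      n <- length(x)
      h <- function(m) {               # Equation (44), via tanh
        s2 <- sum(x^2) / n - m^2       # Equation (43), profiled out
        n * m - sum(x * tanh(m * x / s2))
      }
      # the positive root lies below mean(x); start just above zero
      mu <- uniroot(h, lower = 0.01 * sd(x), upper = mean(x))$root
      c(mu = mu, sigma2 = sum(x^2) / n - mu^2)
    }
    fn_mle2(abs(rnorm(200, 2, 3)))   # close to mu = 2, sigma^2 = 9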

4.1. An Example with Simulated Data

We generated 100 random values from the FN(2, 9) distribution in order to illustrate the maximum likelihood estimation procedure. The estimated parameter values were μ̂ = 2.183 and σ̂² = 8.065. The corresponding 95% confidence intervals for μ and σ² were (0.782, 3.585) and (2.022, 14.108), respectively. Figure 5 shows graphically the existence of the three extrema of the log-likelihood in Equation (41): one minimum (always at zero) and two maxima at the maximum likelihood estimates of μ, equal in magnitude and opposite in sign. A sketch reproducing an experiment of this kind follows the figure.
Figure 5. The left graph (a) shows the three solutions of the log-likelihood. The right three-dimensional figure (b) shows the values of the log-likelihood for a range of mean and variance values.
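The following sketch runs an experiment of this kind with optim() (the Nelder–Mead simplex) and forms 95% intervals from the observed information; nll() is an ad hoc name, and the resulting numbers will differ from those above under a different random seed.

    set.seed(1)
    x <- abs(rnorm(100, mean = 2, sd = 3))   # FN(2, 9) sample
    nll <- function(p, x) {                  # minus Equation (41); p = (mu, sigma^2)
      if (p[2] <= 0) return(Inf)
      s <- sqrt(p[2])
      -sum(log(dnorm(x, p[1], s) + dnorm(x, -p[1], s)))
    }
    fit <- optim(c(mean(x), var(x)), nll, x = x, hessian = TRUE)
    se <- sqrt(diag(solve(fit$hessian)))     # inverse observed information
    rbind(estimate = fit$par,
          lower = fit$par - 1.96 * se,
          upper = fit$par + 1.96 * se)       # columns: mu, sigma^2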

4.2. Simulation Studies

Simulation studies were implemented to examine the accuracy of the estimates using numerical optimization based on the simplex method [10]. Numerical optimization was performed in R [15], using the optim function. The term accuracy refers to interval estimation rather than point estimation, since the interest was in constructing confidence intervals for the parameters. The number of simulations was set equal to R = 1000. The sample sizes ranged from 20 to 100 for a range of values of the parameter vector. The R package VGAM [16] offers algorithms for obtaining maximum likelihood estimates of the folded normal, but we have not used it here.
For every simulation, we calculated 95% confidence intervals using the normal approximation, where the variance was estimated from the inverse of the observed information matrix. The maximum likelihood estimates are asymptotically normal, with variance equal to the inverse of the Fisher information. The sample estimate of this information is given by the negative of the second derivative (Hessian matrix) of the log-likelihood with respect to the parameters. This yields an asymptotic confidence interval.
Bootstrap confidence intervals were also calculated using the percentile method [17]. For every simulated dataset, we produced the bootstrap distribution of the estimates with B = 1000 bootstrap resamples and calculated the 2.5% lower and upper quantiles for each of the parameters; a sketch is given below. In addition, we calculated the correlations for every pair of the parameters.
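A sketch of the percentile intervals, reusing the ad hoc fitting function fn_mle2() from above; boot_ci() is likewise an ad hoc name.

    boot_ci <- function(x, B = 1000, fit = fn_mle2) {
      est <- replicate(B, fit(sample(x, replace = TRUE)))  # refit on resamples
      apply(est, 1, quantile, probs = c(0.025, 0.975))     # percentile method
    }
    x <- abs(rnorm(50, 2, 5))   # a small FN(2, 25) sample, theta = 0.4
    boot_ci(x)                  # columns: mu, sigma2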
Table 1, Table 2, Table 3 and Table 4 present the coverage of the 95% confidence intervals for the two parameters for different combinations of sample size and mean. The rows correspond to the sample size, whereas the columns correspond to the ratio θ = μ/σ, with σ = 5 fixed.
Table 1. Estimated coverage probability of the 95% confidence intervals for the mean parameter, μ, using the observed information matrix.
Values of θ
Sample size    0.5      1     1.5      2     2.5      3     3.5      4
20           0.689  0.930  0.955  0.931  0.926  0.940  0.930  0.948
30           0.679  0.921  0.949  0.943  0.925  0.926  0.941  0.915
40           0.690  0.916  0.936  0.933  0.941  0.948  0.944  0.928
50           0.718  0.944  0.955  0.938  0.933  0.948  0.946  0.946
60           0.699  0.950  0.968  0.948  0.949  0.941  0.942  0.946
70           0.721  0.931  0.956  0.939  0.939  0.939  0.949  0.945
80           0.691  0.930  0.950  0.940  0.946  0.936  0.945  0.939
90           0.720  0.932  0.960  0.949  0.949  0.939  0.954  0.944
100          0.738  0.945  0.949  0.938  0.943  0.926  0.946  0.952
What can be seen from Table 1 and Table 2 is that whilst the sample size is important, the value of θ, the mean to standard deviation ratio, is more important. As this ratio increases, the coverage probability increases as well and reaches the desired nominal 95%. This is also true for the bootstrap confidence intervals, but their coverage is in general higher and increases faster with the sample size than that of the asymptotic confidence intervals. What is more, when the value of θ is less than one, the bootstrap confidence interval is to be preferred. When the value of θ becomes equal to or greater than one, both the bootstrap and the asymptotic confidence intervals produce similar coverages.
The results regarding the variance are presented in Table 3 and Table 4. When the value of θ is small, both ways of obtaining confidence intervals for this parameter fall clearly short of the nominal coverage. The bootstrap intervals tend to perform better, but not up to expectations. Even when the value of θ is large, the nominal coverage of 95% is not attained unless the sample size is large enough.
Table 2. Estimated coverage probability of the bootstrap 95% confidence intervals for the mean parameter, μ, using the percentile method.
Values of θ
Sample size    0.5      1     1.5      2     2.5      3     3.5      4
20           0.890  0.925  0.939  0.921  0.918  0.940  0.929  0.942
30           0.894  0.931  0.933  0.943  0.926  0.922  0.942  0.910
40           0.910  0.925  0.927  0.933  0.941  0.947  0.946  0.928
50           0.914  0.943  0.942  0.934  0.934  0.945  0.946  0.943
60           0.904  0.949  0.953  0.950  0.941  0.938  0.943  0.944
70           0.893  0.934  0.943  0.936  0.937  0.938  0.949  0.939
80           0.918  0.940  0.939  0.939  0.944  0.935  0.946  0.938
90           0.920  0.934  0.952  0.948  0.946  0.939  0.951  0.947
100          0.918  0.940  0.936  0.932  0.946  0.925  0.945  0.949
Table 3. Estimated coverage probability of the 95% confidence intervals for the variance parameter, σ², using the observed information matrix.
Values of θ
Sample size    0.5      1     1.5      2     2.5      3     3.5      4
20           0.649  0.765  0.854  0.853  0.876  0.870  0.862  0.885
30           0.697  0.794  0.870  0.898  0.892  0.898  0.894  0.896
40           0.723  0.849  0.893  0.914  0.919  0.913  0.909  0.902
50           0.751  0.867  0.916  0.907  0.911  0.924  0.899  0.912
60           0.745  0.865  0.911  0.913  0.916  0.906  0.920  0.933
70           0.769  0.874  0.928  0.928  0.912  0.930  0.926  0.935
80           0.776  0.883  0.927  0.919  0.934  0.936  0.916  0.924
90           0.795  0.901  0.931  0.932  0.925  0.930  0.940  0.941
100          0.824  0.904  0.927  0.933  0.925  0.936  0.932  0.942
The correlation between the two parameters was also estimated for every simulation from the observed information matrix. The results are displayed in Table 5. The correlation between the two parameters is always negative irrespective of the sample size or the value of θ, except for the case when θ = 4, where the correlation becomes zero, as expected. As the value of θ grows larger, the probability mass of the normal distribution lying on the negative axis becomes smaller until it is negligible. In this case, the distribution equals the classical normal distribution, for which the two parameters are known to be orthogonal.
Table 4. Estimated coverage probability of the bootstrap 95% confidence intervals for the variance parameter, σ², using the percentile method.
Values of θ
Sample size    0.5      1     1.5      2     2.5      3     3.5      4
20           0.657  0.814  0.862  0.842  0.840  0.832  0.818  0.824
30           0.701  0.850  0.885  0.891  0.882  0.867  0.869  0.866
40           0.743  0.881  0.896  0.913  0.912  0.886  0.881  0.878
50           0.772  0.895  0.921  0.916  0.897  0.901  0.885  0.892
60           0.797  0.907  0.912  0.910  0.906  0.897  0.907  0.916
70           0.807  0.904  0.925  0.915  0.909  0.918  0.908  0.924
80           0.822  0.895  0.925  0.914  0.925  0.917  0.909  0.909
90           0.869  0.916  0.932  0.922  0.919  0.915  0.934  0.929
100          0.873  0.915  0.918  0.925  0.906  0.931  0.920  0.939
Table 5. Estimated correlations between the two parameters obtained from the observed information matrix.
Values of θ
Sample size     0.5       1      1.5       2      2.5       3      3.5       4
20           −0.600  −0.495  −0.272  −0.086  −0.025  −0.006  −0.001   0.000
30           −0.638  −0.537  −0.262  −0.089  −0.022  −0.005  −0.001   0.000
40           −0.695  −0.548  −0.251  −0.081  −0.021  −0.005  −0.001   0.000
50           −0.723  −0.580  −0.259  −0.076  −0.020  −0.005  −0.001   0.000
60           −0.750  −0.597  −0.251  −0.075  −0.019  −0.004  −0.001   0.000
70           −0.771  −0.588  −0.256  −0.073  −0.019  −0.004  −0.001   0.000
80           −0.774  −0.604  −0.253  −0.074  −0.019  −0.004  −0.001   0.000
90           −0.796  −0.599  −0.245  −0.073  −0.018  −0.004  −0.001   0.000
100          −0.804  −0.611  −0.252  −0.072  −0.019  −0.004  −0.001   0.000
Table 6 shows the probability of a normal random variable being less than zero when σ = 5, for the same values of θ as in the simulation studies.
Table 6. Probability of a normal variable having negative values.
Values of θ
θ         0.5      1     1.5      2     2.5      3     3.5      4
P(Y < 0)  0.309  0.159  0.067  0.023  0.006  0.001  0.000  0.000
When the ratio of the mean to the standard deviation is small, the area of the normal distribution on the negative side is large, and as the value of this ratio increases, the probability decreases until it becomes zero. In that case, the folded normal coincides with the normal distribution, since there are no negative values to fold over to the positive side. This is, of course, in accordance with all the previous observations and results.

5. Application to Body Mass Index Data

We fitted the folded normal distribution to real data: observations of the body mass index of 700 New Zealand adults, accessible via the R package VGAM [16]. These measurements are a random sample from the Fletcher Challenge/Auckland Heart and Health survey conducted in the early 1990s [18]. Figure 6 contains a histogram of the data along with the parametric (folded normal) and the non-parametric (kernel) density estimates. It should be noted that the fitted folded normal here is essentially indistinguishable from the normal distribution.
Figure 6. The histogram on the left shows the body mass indices of 700 New Zealand adults. The green line is the fitted folded normal and the blue line is the kernel density. The perspective plot on the right shows the log-likelihood of the body mass index data as a function of the mean and the variance.
The estimated parameters (using the optim command in R) were μ̂ = 26.685 (0.175) and σ̂² = 21.324 (1.140), with the standard errors appearing inside the parentheses. Since the sample size is very large, there was little need to estimate standard errors and, consequently, 95% confidence intervals, even though the ratio of the two estimates is only 1.251. The estimated correlation coefficient of the two parameters was very close to zero (2 × 10⁻⁴), and the estimated probability of the underlying normal distribution with these parameters lying below zero is essentially zero.
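The fit can be reproduced with the sketch below, assuming the data are available as the bmi.nz data frame (with a BMI column) in the VGAM package, as in that package's documentation; nll() is the ad hoc minus log-likelihood helper from Section 4.1.

    data(bmi.nz, package = "VGAM")     # 700 New Zealand adults (assumed name)
    bmi <- bmi.nz$BMI
    nll <- function(p, x) {            # minus Equation (41); p = (mu, sigma^2)
      if (p[2] <= 0) return(Inf)
      s <- sqrt(p[2])
      -sum(log(dnorm(x, p[1], s) + dnorm(x, -p[1], s)))
    }
    fit <- optim(c(mean(bmi), var(bmi)), nll, x = bmi, hessian = TRUE)
    fit$par                            # approximately (26.685, 21.324)
    sqrt(diag(solve(fit$hessian)))     # standard errors, roughly (0.175, 1.140)
    hist(bmi, freq = FALSE, breaks = 30)
    curve(dnorm(x, fit$par[1], sqrt(fit$par[2])) +
          dnorm(x, -fit$par[1], sqrt(fit$par[2])), add = TRUE, col = "green")
    lines(density(bmi), col = "blue")  # kernel density estimate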

6. Discussion

We derived the characteristic function of this distribution and, thus, its moment generating function. The cumulant generating function is simply the logarithm of the moment generating function and is therefore easy to calculate. The importance of these two functions is that they allow us to calculate all the moments of the distribution. In addition, we calculated the Laplace and Fourier transformations and the mean residual life.
The entropy of the folded normal distribution and the Kullback–Leibler divergence of this distribution from the normal and half normal distributions were approximated using the Taylor series. The results were numerically evaluated against the true values and were as expected.
We reviewed the maximum likelihood estimates, simplified their calculation and examined some of their properties. Confidence intervals for the parameters were obtained using asymptotic theory and the bootstrap methodology under the umbrella of simulation studies.
The coverage of the confidence intervals for the two parameters was lower than the desired nominal level in the small sample cases and when the mean to standard deviation ratio was lower than one. An alternative way to correct the under-coverage for the mean parameter is to use an alternative parametrization; the parameters θ = μ/σ and σ are used in [5]. Under this parametrization, the coverage of the interval estimation of μ is corrected, but the corresponding coverage of the confidence interval for σ² is still low.
The correlation between the two parameters was always negative, and its magnitude decreased as the value of θ increased, as expected, until the two parameters became orthogonal.
An application of the folded normal distribution to real data was exhibited, providing evidence that it can be used to model non-negative data adequately.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Leone, F.C.; Nelson, L.S.; Nottingham, R.B. The folded normal distribution. Technometrics 1961, 3, 543–550. [Google Scholar] [CrossRef]
  2. Lin, H.C. The measurement of a process capability for folded normal process data. Int. J. Adv. Manuf. Technol. 2004, 24, 223–228. [Google Scholar] [CrossRef]
  3. Chakraborty, A.K.; Chatterjee, M. On multivariate folded normal distribution. Sankhya 2013, 75, 1–15. [Google Scholar] [CrossRef]
  4. Elandt, R.C. The folded normal distribution: Two methods of estimating parameters from moments. Technometrics 1961, 3, 551–562. [Google Scholar] [CrossRef]
  5. Johnson, N.L. The folded normal distribution: Accuracy of estimation by maximum likelihood. Technometrics 1962, 4, 249–256. [Google Scholar] [CrossRef]
  6. Johnson, N.L. Cumulative sum control charts for the folded normal distribution. Technometrics 1963, 5, 451–458. [Google Scholar] [CrossRef]
  7. Sundberg, R. On estimation and testing for the folded normal distribution. Commun. Stat.-Theory Methods 1974, 3, 55–72. [Google Scholar] [CrossRef]
  8. Kim, H.J. On the ratio of two folded normal distributions. Commun. Stat.-Theory Methods 2006, 35, 965–977. [Google Scholar] [CrossRef]
  9. Liao, M.Y. Economic tolerance design for folded normal data. Int. J. Prod. Res. 2010, 48, 4123–4137. [Google Scholar] [CrossRef]
  10. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313. [Google Scholar] [CrossRef]
  11. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions; John Wiley & Sons, Inc.: New York, NY, USA, 1994. [Google Scholar]
  12. Psarakis, S.; Panaretos, J. The folded t distribution. Commun. Stat.-Theory Methods 1990, 19, 2717–2734. [Google Scholar] [CrossRef]
  13. Psarakis, S.; Panaretos, J. On some bivariate extensions of the folded normal and the folded t distributions. J. Appl. Stat. Sci. 2000, 10, 119–136. [Google Scholar]
  14. Kullback, S. Information Theory and Statistics; Dover Publications: New York, NY, USA, 1977. [Google Scholar]
  15. R Development Core Team. R: A Language and Environment for Statistical Computing. 2012. Available online: http://www.R-project.org/ (accessed on 1 December 2013).
  16. Yee, T.W. The VGAM package for categorical data analysis. J. Stat. Softw. 2010, 32, 1–34. [Google Scholar]
  17. Efron, B.; Tibshirani, R. An Introduction to the Bootstrap; Chapman and Hall/CRC: New York, NY, USA, 1993. [Google Scholar]
  18. MacMahon, S.; Norton, R.; Jackson, R.; Mackie, M.J.; Cheng, A.; Vander Hoorn, S.; Milne, A.; McCulloch, A. Fletcher Challenge-University of Auckland Heart and Health Study: Design and baseline findings. N. Zeal. Med. J. 1995, 108, 499–502. [Google Scholar]
