Article

Inflated Unit-Birnbaum-Saunders Distribution

by Guillermo Martínez-Flórez 1,†, Roger Tovar-Falón 1,† and Carlos Barrera-Causil 2,*,†
1 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 230002, Colombia
2 Grupo de Investigación Davinci, Facultad de Ciencias Exactas y Aplicadas, Instituto Tecnológico Metropolitano, Medellín 050034, Colombia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Submission received: 27 January 2022 / Revised: 16 February 2022 / Accepted: 17 February 2022 / Published: 21 February 2022
(This article belongs to the Section Probability and Statistics)

Abstract

Modeling data such as the human development index as a function of life expectancy, the water capacity of a reservoir relative to a given threshold, or the infant mortality rate (deaths before the first birthday) is a task that researchers commonly face. These problems often share one feature: data on the unit interval with an excess of zeros and ones. Flexible and accurate models are therefore needed to fit data with these characteristics. In this paper, a mixture of discrete and continuous distributions is proposed for modeling such data. The unit-Birnbaum-Saunders distribution describes the continuous component of the model, and a Bernoulli process describes the point masses at zero and one. Parameter estimation is based on the maximum likelihood method. The observed and expected information matrices are derived, illustrating interesting aspects of the likelihood approach. Finally, applications to real data show the advantage of the proposal over the inflated beta model.

1. Introduction

One of the most widely used distributions for fitting fatigue and lifetime data is the Birnbaum-Saunders (BS) distribution, introduced in [1]. The BS distribution has probability density function (PDF)
$$ f_T(t) = \frac{t^{-3/2}(t+\beta)}{2\alpha\sqrt{\beta}}\,\phi\!\left(\frac{1}{\alpha}\left[\sqrt{\frac{t}{\beta}} - \sqrt{\frac{\beta}{t}}\right]\right), \quad t > 0, \tag{1} $$
where $\phi(\cdot)$ is the PDF of the standard normal distribution, $\alpha > 0$ is a shape parameter and $\beta > 0$ is a scale parameter. If T is a random variable with BS distribution, this is denoted by $T \sim \mathrm{BS}(\alpha, \beta)$.
The BS distribution has been applied in many areas, such as biology, medicine, forestry and environmental science, and it has been extended to a large number of families of distributions. For example, Díaz-García and Leiva-Sánchez [2] introduced an extension of the BS model based on elliptically symmetric distributions, while the asymmetric (skew-elliptical) case was studied by Vilca-Labra and Leiva-Sánchez [3]. Another extension was considered by Martínez-Flórez et al. [4], who proposed the exponentiated BS family of distributions (commonly known as the alpha-power model). In addition, Moreno-Arenas et al. [5] studied the proportional hazard BS model, and Lemonte [6] proposed the multivariate Birnbaum-Saunders model.
Recently, Mazucheli et al. [7] presented a type of BS distribution that is useful for fitting data with support on the interval $(0, 1)$, providing a new alternative to the beta and Kumaraswamy [8] distributions. This new model, named the unit-Birnbaum-Saunders (UBS) model, is obtained by applying the transformation $X = \exp(-T)$, where $T \sim \mathrm{BS}(\alpha, \beta)$. This type of transformation has also been used by other authors, such as Mazucheli et al. [9], Mazucheli et al. [10] and Menezes et al. [11]. It follows that $T = -\log(X)$, and the Jacobian of the transformation is obtained from the first derivative of T with respect to X, $\frac{dt}{dx} = -\frac{1}{x}$, whose absolute value is $\frac{1}{x}$. Then, by the transformation theorem for random variables, see Casella and Berger [12] (p. 51), the PDF of the UBS model is given by
$$ f_{\mathrm{UBS}}(x; \alpha, \beta) = \frac{1}{2x\alpha\beta\sqrt{2\pi}} \left[\left(\frac{\beta}{-\log x}\right)^{1/2} + \left(\frac{\beta}{-\log x}\right)^{3/2}\right] \exp\!\left[\frac{1}{2\alpha^2}\left(\frac{\log x}{\beta} + \frac{\beta}{\log x} + 2\right)\right], \quad x \in (0, 1), \tag{2} $$
where $\alpha > 0$ is a shape parameter and $\beta > 0$ is a scale parameter. The model is denoted by $X \sim \mathrm{UBS}(\alpha, \beta)$.
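The UBS density and the transformation $X = \exp(-T)$ can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names are hypothetical, and the sampler relies on the well-known normal representation $T = \beta\,[\alpha Z/2 + \sqrt{(\alpha Z/2)^2 + 1}]^2$ with $Z \sim N(0,1)$, which is an assumption not stated in the text above.

```python
import math
import random


def ubs_pdf(x, alpha, beta):
    """UBS(alpha, beta) density at x, evaluated on the log scale for stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    t = -math.log(x)  # value on the original BS scale, t > 0
    ratio = beta / t
    log_pdf = (
        t  # equals -log(x), the Jacobian factor 1/x
        - math.log(2.0 * alpha * beta * math.sqrt(2.0 * math.pi))
        + math.log(math.sqrt(ratio) + ratio ** 1.5)
        - (t / beta + ratio - 2.0) / (2.0 * alpha ** 2)
    )
    return math.exp(log_pdf)


def ubs_sample(alpha, beta, rng):
    """Draw X = exp(-T), with T ~ BS(alpha, beta) built from a standard normal."""
    z = rng.gauss(0.0, 1.0)
    half = 0.5 * alpha * z
    t = beta * (half + math.sqrt(half * half + 1.0)) ** 2
    return math.exp(-t)


def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule; it never evaluates f at the end points."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


rng = random.Random(0)
draws = [ubs_sample(0.5, 1.0, rng) for _ in range(5)]
total_mass = midpoint_integral(lambda x: ubs_pdf(x, 0.5, 1.0), 0.0, 1.0)
print(draws, total_mass)
```

The numerical integral of the density over $(0, 1)$ should be close to one, which is a quick sanity check of the reconstruction of the PDF.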
Statistical models to explain variables on the unit interval, such as proportions, rates or indexes have been studied by different authors, see [13,14,15,16,17,18,19,20]. Extensions of these models to situations where response variables are on the intervals [ 0 , 1 ] , [ 0 , 1 ) or ( 0 , 1 ] have been considered by [21,22,23,24]. Real data applications include the proportion of deaths caused by smoking, problems related to the estimation of the gross domestic product (GDP) and so on. Note that this area has been studied by [9,10,11].
This paper proposes an alternative approach for modeling response variables with zero and/or one inflation. A Bernoulli model is considered that links the excess of zeros and/or ones with a group of covariates that may influence the probability of their occurrence. One of the main contributions of the proposed model is that it is useful for the analysis of material fatigue data on the interval $[0, 1]$ with an excess of zeros and/or ones. The literature contains other models for fitting data on $[0, 1]$ ([22,23,24]); however, these models have not been explicitly applied to material fatigue data. We emphasize that the Birnbaum-Saunders distribution is widely used in the analysis of this type of data.
The rest of the paper is organized as follows. In Section 2, the doubly-censored random variable is defined, and the two-part model studied by [25] for analyzing this type of variable is presented. Section 3 extends the UBS model to censored data, and the excess of zeros and/or ones is handled with a mixture of a Bernoulli and a doubly-censored UBS model. The parameters are estimated by maximum likelihood, and the non-singular observed and expected (Fisher) information matrices are derived. Finally, Section 4 presents three analyses of real data sets that compare the proposed model with the inflated beta model.

2. Censoring

A variable is said to be censored when one or more of its values are only known to lie beyond a lower or an upper bound. In practice, censoring often arises from limitations of the measuring device or of the experimental design. Examples are a scale whose needle gives no reading above 200 kg, so that every heavier object is recorded at that limit, and the measurement of viral load in people with HIV. When data are censored, their probability distribution is a mixture of a continuous and a discrete distribution, and the two-part model of Cragg [25] is one way to analyze such data. The two-part model [25] is given by
$$ g(y_i) = p_i (1 - I_i) + (1 - p_i) f(y_i) I_i, \tag{3} $$
where $p_i$ is the probability that determines the relative contribution of the point mass, $f$ is the density of the continuous part, and $I_i$ is an indicator equal to zero when $y_i$ lies at the point mass and equal to one otherwise.
The random variable Y is left-censored at the value c if, for a random sample $y_1, \ldots, y_n$ of size n, only the values of Y greater than the constant c can be observed, whereas for values of Y less than or equal to c only the value c is recorded. The observed values can therefore be written as
$$ y_i = \begin{cases} y_i, & \text{if } y_i > c, \\ c, & \text{if } y_i \le c, \end{cases} $$
where c is the censoring point, and the random variable Y has PDF given by
$$ g(y_i) = \begin{cases} f(y_i), & \text{if } y_i > c, \\ p, & \text{if } y_i \le c, \end{cases} $$
where $p = P(Y \le c)$. When $f(\cdot) = \phi(\cdot)$, this is the Tobit model.
A model is said to be inflated (with an excess of zeros and/or ones) if the probability mass at some of its points exceeds that allowed by the proposed model. In this case, it is usual to assume that the data distribution is a mixture of a standard distribution and a degenerate distribution concentrated at a point. Cases studied in the statistical literature include the zero-inflated binomial (ZIB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models. For further details, see [23].
Definition 1.
A random variable Y is said to be doubly censored if it is both left-censored and right-censored. For a random sample $y_1, y_2, \ldots, y_n$ from a given distribution, the observed values are defined as
$$ y_i = \begin{cases} c_0, & \text{if } y_i \le c_0, \\ y_i, & \text{if } c_0 < y_i < c_1, \\ c_1, & \text{if } y_i \ge c_1. \end{cases} $$
An extension of model (3) to the doubly-censored case is given by
$$ g(y_i) = p_{0i}\, I_{0i}(y_i) + p_{1i}\, I_{1i}(y_i) + (1 - p_{0i} - p_{1i})\, f_{+}(y_i)\,\bigl(1 - I_{0i}(y_i) - I_{1i}(y_i)\bigr), \tag{4} $$
where $I_{0i}(y_i) = 1$ if $y_i \le c_0$ and zero otherwise, and $I_{1i}(y_i) = 1$ if $y_i \ge c_1$ and zero otherwise. Here $p_{0i}$ is the proportion of observations at or below the constant $c_0$ (the lower detection limit) and $p_{1i}$ is the proportion of observations at or above the constant $c_1$ (the upper detection limit).
If $y_i \sim N(\mu, \sigma^2)$, it follows from Definition 1 that the PDF of a doubly-censored random variable Y with normal distribution (DCN), which is an extension of the Tobit model, is given by
$$ f(y_i) = \begin{cases} \Phi\!\left(\dfrac{c_0 - \mu}{\sigma}\right), & \text{if } y_i \le c_0, \\[2mm] \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - \mu}{\sigma}\right), & \text{if } c_0 < y_i < c_1, \\[2mm] 1 - \Phi\!\left(\dfrac{c_1 - \mu}{\sigma}\right), & \text{if } y_i \ge c_1. \end{cases} $$
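The three branches of the doubly-censored normal (DCN) density can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming known censoring limits `c0` and `c1`; it only shows how each observation contributes to the likelihood.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides pdf() and cdf()


def dcn_density(y, mu, sigma, c0, c1):
    """Likelihood contribution of one observation under the DCN model."""
    if y <= c0:  # point mass collected at the lower limit
        return STD_NORMAL.cdf((c0 - mu) / sigma)
    if y >= c1:  # point mass collected at the upper limit
        return 1.0 - STD_NORMAL.cdf((c1 - mu) / sigma)
    return STD_NORMAL.pdf((y - mu) / sigma) / sigma  # uncensored part


def dcn_log_likelihood(ys, mu, sigma, c0, c1):
    """Log-likelihood of a sample, summing the log contributions."""
    return sum(math.log(dcn_density(y, mu, sigma, c0, c1)) for y in ys)


# Hypothetical values, for illustration only
example = [0.0, 0.2, 0.55, 1.0, 0.8]
loglik = dcn_log_likelihood(example, mu=0.4, sigma=0.3, c0=0.0, c1=1.0)
print(loglik)
```

The two point masses and the integral of the continuous branch over $(c_0, c_1)$ add up to one, so the three pieces define a proper mixed distribution.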

3. The Bernoulli/Doubly Censored Birnbaum-Saunders Mixture Model

In this section, a new doubly-censored model based on the unit-Birnbaum-Saunders distribution is introduced.

3.1. Mixture Model

As an alternative to the existing models for doubly-censored data on the interval $[0, 1]$, namely the doubly-censored Tobit model, the inflated beta distribution and the doubly-censored power-normal model, a new model based on a mixture of a Bernoulli random variable and the asymmetric $\mathrm{UBS}(\alpha, \beta)$ model is introduced. The continuous part, ranging over $(0, 1)$, is modeled by a random variable following a $\mathrm{UBS}(\alpha, \beta)$ distribution, while the point masses at zero and one are modeled by a Bernoulli random variable with parameter $\gamma$, namely $\mathrm{Ber}(\gamma)$.
Definition 2.
A random variable X that takes values on the closed interval $[0, 1]$ is said to have a zero-and-one-inflated Bernoulli unit-Birnbaum-Saunders distribution (BUBSZOI) with parameters $\alpha$, $\beta$, $\gamma$ and p, if X has PDF given by
$$ f(x) = \begin{cases} p\,(1 - \gamma), & \text{if } x = 0, \\ (1 - p)\, f_{\mathrm{UBS}}(x; \alpha, \beta), & \text{if } 0 < x < 1, \\ p\,\gamma, & \text{if } x = 1, \end{cases} $$
with $0 < p, \gamma < 1$ and $\alpha, \beta > 0$, where $f_{\mathrm{UBS}}(x; \alpha, \beta)$ is the UBS density (2). We write $X \sim \mathrm{BUBSZOI}(\alpha, \beta, p, \gamma)$. It follows that, if $X \sim \mathrm{BUBSZOI}(\alpha, \beta, p, \gamma)$, then $P(X = 0) = p(1 - \gamma)$ and $P(X = 1) = p\gamma$.
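The two-stage structure of Definition 2 can be made concrete with a small simulation sketch. The function name is hypothetical, and the continuous part is drawn through the normal representation of the BS distribution, which is an assumption not stated in the text; the parameter values are illustrative only.

```python
import math
import random


def bubszoi_sample(alpha, beta, p, gamma, rng):
    """Draw one value from BUBSZOI(alpha, beta, p, gamma)."""
    if rng.random() < p:  # discrete part, chosen with probability p
        return 1.0 if rng.random() < gamma else 0.0  # Ber(gamma) picks 1 or 0
    # continuous part: X = exp(-T), T ~ BS(alpha, beta)
    z = rng.gauss(0.0, 1.0)
    half = 0.5 * alpha * z
    t = beta * (half + math.sqrt(half * half + 1.0)) ** 2
    return math.exp(-t)


rng = random.Random(2022)
sample = [bubszoi_sample(0.5, 1.0, 0.3, 0.4, rng) for _ in range(20000)]
share_zeros = sample.count(0.0) / len(sample)  # should be near p(1 - gamma) = 0.18
share_ones = sample.count(1.0) / len(sample)   # should be near p * gamma = 0.12
print(share_zeros, share_ones)
```

With these illustrative values, the empirical shares of zeros and ones should be close to $p(1-\gamma)$ and $p\gamma$, up to sampling error.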
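Let $X \sim \mathrm{BUBSZOI}(\alpha, \beta, p, \gamma)$. Then the cumulative distribution function (CDF) of X is given by
$$ F_X(x) = \begin{cases} 0, & \text{if } x < 0, \\ p\,(1 - \gamma), & \text{if } x = 0, \\ p\,(1 - \gamma) + (1 - p)\,\Phi\bigl(w(x)\bigr), & \text{if } 0 < x < 1, \\ 1, & \text{if } x \ge 1, \end{cases} $$
where $\Phi(\cdot)$ is the CDF of the standard normal distribution and
$$ w(x) = -\frac{1}{\alpha}\left[\sqrt{\frac{-\log x}{\beta}} - \sqrt{\frac{\beta}{-\log x}}\right], $$
so that $\Phi(w(x)) = P(T \ge -\log x)$ is the CDF of the $\mathrm{UBS}(\alpha, \beta)$ distribution. After some algebraic manipulation, the k-th moment of $X \sim \mathrm{BUBSZOI}(\alpha, \beta, p, \gamma)$ can be obtained from
$$ E(X^k) = p\gamma + (1 - p)\,\frac{2k\alpha^2\beta + \sqrt{2k\alpha^2\beta + 1} + 1}{2\,(2k\alpha^2\beta + 1)}\,\exp\!\left(-\frac{\sqrt{2k\alpha^2\beta + 1} - 1}{\alpha^2}\right), $$
for $k = 1, 2, \ldots$. In particular,
$$ \mu_1 = E(X) = p\gamma + (1 - p)\,\frac{2\alpha^2\beta + \sqrt{2\alpha^2\beta + 1} + 1}{2\,(2\alpha^2\beta + 1)}\,\exp\!\left(-\frac{\sqrt{2\alpha^2\beta + 1} - 1}{\alpha^2}\right), $$
and
$$ \mu_2 = E(X^2) = p\gamma + (1 - p)\,\frac{4\alpha^2\beta + \sqrt{4\alpha^2\beta + 1} + 1}{2\,(4\alpha^2\beta + 1)}\,\exp\!\left(-\frac{\sqrt{4\alpha^2\beta + 1} - 1}{\alpha^2}\right), $$
so
$$ \mathrm{Var}(X) = (1 - p)\,\sigma^2_{\mathrm{UBS}} + p\,(1 - p)\,\mu^2_{\mathrm{UBS}} + p\gamma\,\bigl[1 - p\gamma - 2(1 - p)\,\mu_{\mathrm{UBS}}\bigr], $$
where $\mu_{\mathrm{UBS}}$ and $\sigma^2_{\mathrm{UBS}}$ are the mean and the variance of a random variable following a $\mathrm{UBS}(\alpha, \beta)$ distribution.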
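The moment expressions can be checked numerically. The sketch below, with hypothetical function names, evaluates the closed form for the UBS part as $\frac{A + \sqrt{A}}{2A}\exp\{-(\sqrt{A} - 1)/\alpha^2\}$ with $A = 2k\alpha^2\beta + 1$, and compares it with a direct numerical integration of $E[\exp(-kT)]$ over the normal representation of $T \sim \mathrm{BS}(\alpha, \beta)$ (an assumption not stated in the text).

```python
import math


def ubs_raw_moment(k, alpha, beta):
    """Closed-form E(X^k) for X ~ UBS(alpha, beta)."""
    a = 2.0 * k * alpha ** 2 * beta + 1.0
    root = math.sqrt(a)
    return (a + root) / (2.0 * a) * math.exp(-(root - 1.0) / alpha ** 2)


def ubs_raw_moment_numeric(k, alpha, beta, n=20000, zmax=10.0):
    """E[exp(-k T)] by the midpoint rule over z, where T is a function of Z ~ N(0, 1)."""
    h = 2.0 * zmax / n
    total = 0.0
    for i in range(n):
        z = -zmax + (i + 0.5) * h
        half = 0.5 * alpha * z
        t = beta * (half + math.sqrt(half * half + 1.0)) ** 2
        total += math.exp(-k * t - 0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def bubszoi_mean_var(alpha, beta, p, gamma):
    """Mean and variance of BUBSZOI from the first two raw moments."""
    m1 = p * gamma + (1.0 - p) * ubs_raw_moment(1, alpha, beta)
    m2 = p * gamma + (1.0 - p) * ubs_raw_moment(2, alpha, beta)
    return m1, m2 - m1 ** 2


mean, var = bubszoi_mean_var(0.5, 1.0, 0.3, 0.4)  # illustrative parameter values
print(mean, var)
```

For $k = 0$ the closed form reduces to one, as it must, and the variance obtained from the raw moments agrees with the decomposition in terms of $\mu_{\mathrm{UBS}}$ and $\sigma^2_{\mathrm{UBS}}$.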

3.1.1. Maximum Likelihood Estimation

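Let $X = (x_1, \ldots, x_n)$ be a random sample from a BUBSZOI distribution. Let $\sum_1$ denote the sum over the observations with $0 < x_i < 1$, and define $n_0 = \sum_{i=1}^{n} I_{\{0\}}(x_i)$, $n_1 = \sum_{i=1}^{n} I_{\{1\}}(x_i)$ and $n_{01} = \sum_{i=1}^{n} I_{\{0,1\}}(x_i) = n_0 + n_1$, where $I_A(x)$ is the indicator function of the set A. Then the log-likelihood function for the parameter vector $\theta = (\alpha, \beta, p, \gamma)$ given the sample X can be written, up to an additive constant, as
$$ \ell(\theta; X) = n_{01}\log(p) + (n - n_{01})\log(1 - p) + n_1\log(\gamma) + n_0\log(1 - \gamma) + \sum\nolimits_1 \left\{ -\log(x_i) - \log(\alpha) - \frac{1}{2}\log(\beta) + \log(-\log x_i + \beta) - \frac{3}{2}\log(-\log x_i) + \frac{1}{2\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right) \right\}. $$
The elements of the score function, defined as the first partial derivatives of the log-likelihood function with respect to the parameters, are given by
$$ \frac{\partial \ell(\theta; X)}{\partial \alpha} = -\frac{1}{\alpha}\sum\nolimits_1\left[1 + \frac{1}{\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right)\right], $$
$$ \frac{\partial \ell(\theta; X)}{\partial \beta} = -\frac{n - n_{01}}{2\beta} + \sum\nolimits_1 \frac{1}{-\log x_i + \beta} + \frac{1}{2\alpha^2}\sum\nolimits_1\left(\frac{1}{\log x_i} - \frac{\log x_i}{\beta^2}\right), $$
$$ \frac{\partial \ell(\theta; X)}{\partial p} = \frac{n_{01} - np}{p\,(1 - p)}, \qquad \frac{\partial \ell(\theta; X)}{\partial \gamma} = \frac{n_1 - n_{01}\gamma}{\gamma\,(1 - \gamma)}. $$
The maximum likelihood estimator (MLE) of $\theta = (\alpha, \beta, p, \gamma)$ is obtained by solving the system of equations that results from equating these derivatives to zero. This gives the closed-form solutions $\hat{p} = n_{01}/n$, the sample proportion of observations equal to zero or one, and $\hat{\gamma} = n_1/n_{01}$, the proportion of ones among them. It can be shown that the estimator $\hat{p}$ is unbiased for p. The system of equations for $(\alpha, \beta)$ has no analytical solution and must be solved by numerical methods such as Newton-Raphson or quasi-Newton. The estimator $\hat{\alpha}$ is obtained as a function of $\hat{\beta}$ by
$$ \hat{\alpha} = \left[\frac{s}{\hat{\beta}} + \frac{\hat{\beta}}{r} - 2\right]^{1/2}, $$
where $s = \overline{(-\log x)}$ and $r = \left[\,\overline{(-\log x)^{-1}}\,\right]^{-1}$ are the arithmetic and harmonic means of the values $-\log x_i$ over the observations in $(0, 1)$, while $\hat{\beta}$ is the solution of the equation
$$ \beta^2 - \beta\,\bigl(2r + K(\beta)\bigr) + r\,\bigl(s + K(\beta)\bigr) = 0, $$
where $K(\beta) = \left[\,\overline{(-\log x + \beta)^{-1}}\,\right]^{-1}$.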
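The estimation steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it uses bisection on the equation for $\beta$ instead of Newton-Raphson, relying on the fact that the left-hand side is positive at $\beta = r$ and negative at $\beta = s$. The function names, the simulated data and the parameter values are all hypothetical, and the sketch assumes that the sample contains boundary values and at least two distinct values in $(0, 1)$.

```python
import math
import random


def fit_bubszoi(xs):
    """MLEs (alpha, beta, p, gamma) for a sample on [0, 1]."""
    n = len(xs)
    n0 = sum(1 for x in xs if x == 0.0)
    n1 = sum(1 for x in xs if x == 1.0)
    n01 = n0 + n1
    ts = [-math.log(x) for x in xs if 0.0 < x < 1.0]  # continuous part, BS scale
    m = len(ts)
    s = sum(ts) / m                      # arithmetic mean of -log x_i
    r = m / sum(1.0 / t for t in ts)     # harmonic mean of -log x_i

    def k_fun(b):
        return m / sum(1.0 / (b + t) for t in ts)

    def g(b):
        return b * b - b * (2.0 * r + k_fun(b)) + r * (s + k_fun(b))

    lo, hi = r, s                        # g(r) > 0 and g(s) < 0
    for _ in range(100):                 # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = math.sqrt(s / beta + beta / r - 2.0)
    return alpha, beta, n01 / n, n1 / n01


def draw(alpha, beta, p, gamma, rng):
    """Simulate one BUBSZOI value (normal representation of the BS part)."""
    if rng.random() < p:
        return 1.0 if rng.random() < gamma else 0.0
    half = 0.5 * alpha * rng.gauss(0.0, 1.0)
    return math.exp(-beta * (half + math.sqrt(half * half + 1.0)) ** 2)


rng = random.Random(1)
data = [draw(0.5, 1.0, 0.3, 0.4, rng) for _ in range(4000)]
alpha_hat, beta_hat, p_hat, gamma_hat = fit_bubszoi(data)
print(alpha_hat, beta_hat, p_hat, gamma_hat)
```

Bisection is slower than Newton-Raphson but needs no derivative of $K(\beta)$ and cannot leave the bracketing interval, which makes it a convenient choice for a short illustration.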

3.1.2. Observed Information Matrix

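The elements of the observed information matrix are minus the second partial derivatives of the log-likelihood function with respect to the parameters, i.e.,
$$ \kappa_{\theta_j \theta_{j'}} = -\frac{\partial^2 \ell(\theta; X)}{\partial \theta_j\, \partial \theta_{j'}}, $$
where $\theta_j, \theta_{j'} \in \{\alpha, \beta, p, \gamma\}$. Then, it follows that
$$ \kappa_{\alpha\alpha} = \sum\nolimits_1\left(-\frac{1}{\alpha^2} + \frac{3}{\alpha^2}\,a_{x_i}^2\right), \qquad \kappa_{\beta\alpha} = -\frac{1}{\alpha^3}\sum\nolimits_1\left(\frac{\log x_i}{\beta^2} - \frac{1}{\log x_i}\right), $$
$$ \kappa_{\beta\beta} = \sum\nolimits_1\left(-\frac{1}{2\beta^2} + \frac{1}{(-\log x_i + \beta)^2} + \frac{1}{\alpha^2\beta^3}\,(-\log x_i)\right), $$
$$ \kappa_{pp} = \frac{n_{01}(1 - 2p) + np^2}{p^2(1 - p)^2}, \qquad \kappa_{\gamma\gamma} = \frac{n_1(1 - 2\gamma) + n_{01}\gamma^2}{\gamma^2(1 - \gamma)^2}, $$
and
$$ \kappa_{\alpha p} = \kappa_{\beta p} = \kappa_{\alpha\gamma} = \kappa_{\beta\gamma} = \kappa_{\gamma p} = 0, $$
where
$$ a_x = \frac{1}{\alpha}\left[\sqrt{\frac{-\log x}{\beta}} - \sqrt{\frac{\beta}{-\log x}}\right]. $$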
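The elements of the Fisher information matrix are obtained by multiplying by $n^{-1}$ the expected values of the elements of the observed information matrix. Following [22], and using $E(a_X^2) = 1$ for the continuous part (so that the expected value of $\kappa_{\alpha\alpha}$ per observation is $2(1 - p)/\alpha^2$), the Fisher information matrix for $\theta = (\alpha, \beta, p, \gamma)$ is given by
$$ I(\theta) = (1 - p)\begin{pmatrix} \dfrac{2}{\alpha^2} & 0 & 0 & 0 \\ 0 & \dfrac{\sqrt{2\pi} + \alpha\, q(\alpha)}{\sqrt{2\pi}\,\alpha^2\beta^2} & 0 & 0 \\ 0 & 0 & \dfrac{1}{p\,(1 - p)^2} & 0 \\ 0 & 0 & 0 & \dfrac{p}{\gamma\,(1 - \gamma)(1 - p)} \end{pmatrix}, $$
where $q(\alpha) = \alpha\sqrt{\pi/2} - \pi \exp(2/\alpha^2)\,\mathrm{erfc}(\sqrt{2}/\alpha)/2$ and $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} \exp(-t^2)\,dt$ is the complementary error function, see [26]. Note that the upper-left $2 \times 2$ submatrix of $I(\theta)$ is $(1 - p)$ times the Fisher information matrix of the UBS distribution. The matrix also shows that the parameter vectors $(\alpha, \beta)$ and $(p, \gamma)$ are orthogonal, so that the information matrix is block diagonal and can be written as
$$ I(\theta) = \mathrm{Diag}\{I_{\alpha,\beta},\, I_{p,\gamma}\}, $$
where
$$ I_{\alpha,\beta} = \mathrm{Diag}\left\{\frac{2(1 - p)}{\alpha^2},\; (1 - p)\,\frac{\sqrt{2\pi} + \alpha\, q(\alpha)}{\sqrt{2\pi}\,\alpha^2\beta^2}\right\}, $$
and
$$ I_{p,\gamma} = \mathrm{Diag}\left\{\frac{1}{p\,(1 - p)},\; \frac{p}{\gamma\,(1 - \gamma)}\right\}. $$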
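For large samples, the MLE $\hat{\theta}$ of $\theta$ is asymptotically normally distributed, i.e.,
$$ \sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{A} N_4\bigl(0, I^{-1}(\theta)\bigr), $$
so that the asymptotic covariance matrix $\Sigma_{\hat{\theta}}$ of the MLE $\hat{\theta}$ is $n^{-1}$ times the inverse of $I(\theta)$ and is given by
$$ \Sigma_{\hat{\theta}} = \begin{pmatrix} \dfrac{\alpha^2}{2n(1 - p)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{n(1 - p)}\,\dfrac{\sqrt{2\pi}\,\alpha^2\beta^2}{\sqrt{2\pi} + \alpha\, q(\alpha)} & 0 & 0 \\ 0 & 0 & \dfrac{p\,(1 - p)}{n} & 0 \\ 0 & 0 & 0 & \dfrac{\gamma\,(1 - \gamma)}{np} \end{pmatrix}. $$
The approximation $N_4(\theta, \Sigma_{\hat{\theta}})$ can be used to construct confidence intervals for $\theta_r$, which are given by
$$ \hat{\theta}_r \mp z_{1 - \rho/2}\,\hat{\sigma}(\hat{\theta}_r), $$
where $\hat{\sigma}(\hat{\theta}_r)$ is the square root of the r-th diagonal element of the matrix $\Sigma_{\hat{\theta}}$ evaluated at $\hat{\theta}$, and $z_{1 - \rho/2}$ is the $100(1 - \rho/2)\%$ quantile of the standard normal distribution.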
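The confidence intervals can be computed with a short sketch such as the one below. It assumes the asymptotic variances $\alpha^2/\{2n(1-p)\}$, $\sqrt{2\pi}\,\alpha^2\beta^2/[n(1-p)\{\sqrt{2\pi} + \alpha q(\alpha)\}]$, $p(1-p)/n$ and $\gamma(1-\gamma)/(np)$, with $q(\alpha) = \alpha\sqrt{\pi/2} - (\pi/2)\exp(2/\alpha^2)\,\mathrm{erfc}(\sqrt{2}/\alpha)$ (the variance of $\hat{\alpha}$ follows from the per-observation information $2(1-p)/\alpha^2$ implied by the expectation of $\kappa_{\alpha\alpha}$). Function names and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def q(alpha):
    """q(alpha) = alpha*sqrt(pi/2) - (pi/2)*exp(2/alpha^2)*erfc(sqrt(2)/alpha)."""
    # Note: exp(2 / alpha**2) overflows for very small alpha.
    tail = math.exp(2.0 / alpha ** 2) * math.erfc(math.sqrt(2.0) / alpha)
    return alpha * math.sqrt(math.pi / 2.0) - 0.5 * math.pi * tail


def asymptotic_variances(alpha, beta, p, gamma, n):
    """Diagonal of the asymptotic covariance matrix of the MLE."""
    root = math.sqrt(2.0 * math.pi)
    return {
        "alpha": alpha ** 2 / (2.0 * n * (1.0 - p)),
        "beta": root * alpha ** 2 * beta ** 2 / (n * (1.0 - p) * (root + alpha * q(alpha))),
        "p": p * (1.0 - p) / n,
        "gamma": gamma * (1.0 - gamma) / (n * p),
    }


def wald_intervals(estimates, variances, level=0.95):
    """Wald intervals: estimate -/+ z * standard error, for each parameter."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return {
        name: (est - z * math.sqrt(variances[name]), est + z * math.sqrt(variances[name]))
        for name, est in estimates.items()
    }


# Hypothetical estimates and sample size, for illustration only
est = {"alpha": 1.2, "beta": 0.3, "p": 0.25, "gamma": 0.9}
variances = asymptotic_variances(est["alpha"], est["beta"], est["p"], est["gamma"], n=300)
intervals = wald_intervals(est, variances)
print(intervals)
```

Since the parameters are restricted ($\alpha, \beta > 0$ and $0 < p, \gamma < 1$), a Wald interval can extend beyond the parameter space in small samples; the sketch does not truncate it.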

3.2. Mixture Under Reparameterization

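The Bernoulli/UBS mixture model can also be represented in the form of the doubly-censored model given in (4), that is, in terms of the probabilities of the limit points $c_0 = 0$ and $c_1 = 1$, denoted by $\delta_0 > 0$ and $\delta_1 > 0$, respectively. Under the reparameterization used by [23], let $\delta_0 = p(1 - \gamma)$ and $\delta_1 = p\gamma$, so that $\delta_0 = P(X = 0)$ and $\delta_1 = P(X = 1)$ with $0 < \delta_0 + \delta_1 < 1$. This leads to the PDF given by
$$ f(x) = \begin{cases} \delta_0, & \text{if } x = 0, \\ (1 - \delta_0 - \delta_1)\, f_{\mathrm{UBS}}(x; \alpha, \beta), & \text{if } 0 < x < 1, \\ \delta_1, & \text{if } x = 1, \end{cases} $$
which is denoted by $\mathrm{BUBSZOI}_R(\alpha, \beta, \delta_0, \delta_1)$. If $X \sim \mathrm{BUBSZOI}_R(\alpha, \beta, \delta_0, \delta_1)$, then the CDF of X is given by
$$ F_R(x) = \begin{cases} 0, & \text{if } x < 0, \\ \delta_0, & \text{if } x = 0, \\ \delta_0 + (1 - \delta_0 - \delta_1)\,\Phi\bigl(w(x)\bigr), & \text{if } 0 < x < 1, \\ 1, & \text{if } x \ge 1, \end{cases} $$
where $w(x) = -\frac{1}{\alpha}\left[\sqrt{\frac{-\log x}{\beta}} - \sqrt{\frac{\beta}{-\log x}}\right]$.
Given a random sample of size n from $X \sim \mathrm{BUBSZOI}_R(\alpha, \beta, \delta_0, \delta_1)$, the log-likelihood function for the parameter vector $\theta = (\alpha, \beta, \delta_0, \delta_1)$ can be written, up to an additive constant, as
$$ \ell(\theta; X) = n_0\log(\delta_0) + n_1\log(\delta_1) + (n - n_{01})\log(1 - \delta_0 - \delta_1) + \sum\nolimits_1 \left\{ -\log(x_i) - \log(\alpha) - \frac{1}{2}\log(\beta) + \log(-\log x_i + \beta) - \frac{3}{2}\log(-\log x_i) + \frac{1}{2\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right) \right\}. $$
Equating the score functions for $\alpha$ and $\beta$ to zero leads to the equations
$$ -\frac{1}{\alpha}\sum\nolimits_1\left[1 + \frac{1}{\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right)\right] = 0, \qquad -\frac{n - n_{01}}{2\beta} + \sum\nolimits_1\frac{1}{-\log x_i + \beta} + \frac{1}{2\alpha^2}\sum\nolimits_1\left(\frac{1}{\log x_i} - \frac{\log x_i}{\beta^2}\right) = 0, $$
which can be solved numerically by using Newton-Raphson. From the equations $\frac{\partial \ell(\theta; X)}{\partial \delta_0} = 0$ and $\frac{\partial \ell(\theta; X)}{\partial \delta_1} = 0$ one obtains the estimators $\hat{\delta}_0 = n_0/n$, the proportion of zeros in the sample, and $\hat{\delta}_1 = n_1/n$, the proportion of ones in the sample. In this model, the Fisher information matrix can be written as $I(\theta) = \mathrm{Diag}\{I_{\alpha,\beta}, I_{\delta_0,\delta_1}\}$, where the elements of $I_{\delta_0,\delta_1}$ are given by $i_{\delta_0\delta_0} = \frac{1 - \delta_1}{\delta_0(1 - \delta_0 - \delta_1)}$, $i_{\delta_1\delta_0} = \frac{1}{1 - \delta_0 - \delta_1}$ and $i_{\delta_1\delta_1} = \frac{1 - \delta_0}{\delta_1(1 - \delta_0 - \delta_1)}$, and $I_{\alpha,\beta}$ is as computed for the model $\mathrm{BUBSZOI}(\alpha, \beta, p, \gamma)$ with $1 - p = 1 - \delta_0 - \delta_1$.
Under this parameterization, the parameters of the censored (discrete) and non-censored (continuous) parts of the model are orthogonal, so the corresponding MLEs are asymptotically independent and the two sets of parameters can be estimated separately.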
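The link between the two parameterizations, $\delta_0 = p(1 - \gamma)$ and $\delta_1 = p\gamma$, and the closed-form estimators $\hat{\delta}_0 = n_0/n$ and $\hat{\delta}_1 = n_1/n$ can be sketched as follows. The function names and the toy sample are hypothetical.

```python
def to_delta(p, gamma):
    """(p, gamma) -> (delta0, delta1) = (P(X = 0), P(X = 1))."""
    return p * (1.0 - gamma), p * gamma


def from_delta(delta0, delta1):
    """(delta0, delta1) -> (p, gamma); requires delta0 + delta1 > 0."""
    p = delta0 + delta1
    return p, delta1 / p


def delta_estimates(xs):
    """MLEs of delta0 and delta1: the sample shares of zeros and ones."""
    n = len(xs)
    return sum(1 for x in xs if x == 0.0) / n, sum(1 for x in xs if x == 1.0) / n


# Toy sample, for illustration only
toy = [0.0, 0.31, 1.0, 0.77, 1.0, 0.05, 0.0, 0.42, 0.9, 1.0]
d0_hat, d1_hat = delta_estimates(toy)
p_hat, gamma_hat = from_delta(d0_hat, d1_hat)
print(d0_hat, d1_hat, p_hat, gamma_hat)
```

By the invariance of maximum likelihood, mapping $(\hat{\delta}_0, \hat{\delta}_1)$ back gives $\hat{p} = n_{01}/n$ and $\hat{\gamma} = n_1/n_{01}$, the estimators of the original parameterization.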
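For large n,
$$ \hat{\theta} \overset{A}{\sim} N_4(\theta, \Sigma_{\hat{\theta}}), $$
meaning that $\hat{\theta}$ is consistent and asymptotically normally distributed with asymptotic covariance matrix
$$ \Sigma_{\hat{\theta}} = \begin{pmatrix} \dfrac{\alpha^2}{2n(1 - \delta_0 - \delta_1)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{n(1 - \delta_0 - \delta_1)}\,\dfrac{\sqrt{2\pi}\,\alpha^2\beta^2}{\sqrt{2\pi} + \alpha\, q(\alpha)} & 0 & 0 \\ 0 & 0 & \dfrac{\delta_0(1 - \delta_0)}{n} & -\dfrac{\delta_0\delta_1}{n} \\ 0 & 0 & -\dfrac{\delta_0\delta_1}{n} & \dfrac{\delta_1(1 - \delta_1)}{n} \end{pmatrix}. $$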

3.3. Censored Models for Zero or One Inflation

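Particular cases of the previous zero-and-one-inflated model are the zero-inflated and the one-inflated models. In the zero-inflated case, the density function is given by
$$ f(x) = \begin{cases} \delta_0, & \text{if } x = 0, \\ (1 - \delta_0)\, f_{\mathrm{UBS}}(x; \alpha, \beta), & \text{if } 0 < x < 1, \end{cases} $$
where $0 < \delta_0 = P(X = 0) < 1$. This model is denoted by $\mathrm{BUBSZI}_R(\alpha, \beta, \delta_0)$. The log-likelihood function of $\theta = (\alpha, \beta, \delta_0)$ can be written, up to an additive constant, as
$$ \ell(\theta; X) = n_0\log(\delta_0) + (n - n_0)\log(1 - \delta_0) + \sum\nolimits_1 \left\{ -\log(x_i) - \log(\alpha) - \frac{1}{2}\log(\beta) + \log(-\log x_i + \beta) - \frac{3}{2}\log(-\log x_i) + \frac{1}{2\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right) \right\}. $$
The MLEs of the parameters $\alpha$ and $\beta$ are obtained numerically from the equations $\frac{\partial \ell(\theta; X)}{\partial \alpha} = 0$ and $\frac{\partial \ell(\theta; X)}{\partial \beta} = 0$, as in the general case of the model $\mathrm{BUBSZOI}_R(\alpha, \beta, \delta_0, \delta_1)$. For the parameter $\delta_0$, the estimate is obtained from the equation $\frac{\partial \ell(\theta; X)}{\partial \delta_0} = 0$ and is given by $\hat{\delta}_0 = n_0/n$, the proportion of zeros in the sample.
The asymptotic covariance matrix of the MLE of $\theta = (\alpha, \beta, \delta_0)$ takes the form
$$ \Sigma_{\hat{\theta}} = \begin{pmatrix} \dfrac{\alpha^2}{2n(1 - \delta_0)} & 0 & 0 \\ 0 & \dfrac{1}{n(1 - \delta_0)}\,\dfrac{\sqrt{2\pi}\,\alpha^2\beta^2}{\sqrt{2\pi} + \alpha\, q(\alpha)} & 0 \\ 0 & 0 & \dfrac{\delta_0(1 - \delta_0)}{n} \end{pmatrix}. $$
In the one-inflated case, the density function is given by
$$ f(x) = \begin{cases} \delta_1, & \text{if } x = 1, \\ (1 - \delta_1)\, f_{\mathrm{UBS}}(x; \alpha, \beta), & \text{if } 0 < x < 1, \end{cases} $$
where $0 < \delta_1 = P(X = 1) < 1$. This model is denoted by $\mathrm{BUBSOI}_R(\alpha, \beta, \delta_1)$. The log-likelihood function of $\theta = (\alpha, \beta, \delta_1)$ can be written, up to an additive constant, as
$$ \ell(\theta; X) = n_1\log(\delta_1) + (n - n_1)\log(1 - \delta_1) + \sum\nolimits_1 \left\{ -\log(x_i) - \log(\alpha) - \frac{1}{2}\log(\beta) + \log(-\log x_i + \beta) - \frac{3}{2}\log(-\log x_i) + \frac{1}{2\alpha^2}\left(\frac{\log x_i}{\beta} + \frac{\beta}{\log x_i} + 2\right) \right\}. $$
As in the zero-inflated case, the MLEs of the parameters $\alpha$ and $\beta$ are obtained numerically from the equations $\frac{\partial \ell(\theta; X)}{\partial \alpha} = 0$ and $\frac{\partial \ell(\theta; X)}{\partial \beta} = 0$, while $\delta_1$ is estimated from the equation $\frac{\partial \ell(\theta; X)}{\partial \delta_1} = 0$, which gives the estimator $\hat{\delta}_1 = n_1/n$, the proportion of ones in the sample.
The asymptotic covariance matrix of the MLE of the parameter vector $\theta = (\alpha, \beta, \delta_1)$ is
$$ \Sigma_{\hat{\theta}} = \begin{pmatrix} \dfrac{\alpha^2}{2n(1 - \delta_1)} & 0 & 0 \\ 0 & \dfrac{1}{n(1 - \delta_1)}\,\dfrac{\sqrt{2\pi}\,\alpha^2\beta^2}{\sqrt{2\pi} + \alpha\, q(\alpha)} & 0 \\ 0 & 0 & \dfrac{\delta_1(1 - \delta_1)}{n} \end{pmatrix}. $$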

3.4. Testing Non-Nested Models

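Consider two models $F_\theta$ and $G_\beta$ with corresponding density functions $f(y_i \mid x_i, \theta)$ and $g(y_i \mid x_i, \beta)$, respectively. The likelihood ratio statistic for comparing the models is given by
$$ LR(\hat{\theta}, \hat{\beta}) \equiv \ell_f(\hat{\theta}) - \ell_g(\hat{\beta}) = \sum_{i=1}^{n} \log\frac{f(y_i \mid x_i, \hat{\theta})}{g(y_i \mid x_i, \hat{\beta})}. $$
When the models are non-nested, this likelihood ratio statistic does not follow a chi-square distribution. To overcome this problem, Vuong [27] proposed an alternative approach based on the Kullback-Leibler information criterion [28]. The statistic is given by
$$ T_{LR,NN} = \frac{1}{\sqrt{n}}\,\frac{LR(\hat{\theta}, \hat{\beta})}{\sqrt{\hat{\omega}^2}}, $$
where
$$ \hat{\omega}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[\log\frac{f(y_i \mid x_i, \hat{\theta})}{g(y_i \mid x_i, \hat{\beta})}\right]^2 - \left[\frac{1}{n}\sum_{i=1}^{n}\log\frac{f(y_i \mid x_i, \hat{\theta})}{g(y_i \mid x_i, \hat{\beta})}\right]^2 $$
is an estimator of the variance of $\frac{1}{\sqrt{n}}\,LR(\hat{\theta}, \hat{\beta})$.
Vuong [27] showed that, as $n \to \infty$,
$$ T_{LR,NN} \xrightarrow{d} N(0, 1) $$
under
$$ H_0: E\left[\log\frac{f(y_i \mid x_i, \theta)}{g(y_i \mid x_i, \beta)}\right] = 0, $$
that is, when the two models are equivalent. At the 5% level, with critical value $z_{0.025} = 1.96$, the hypothesis of equivalence is rejected if $|T_{LR,NN}| > z_{0.025}$: a value $T_{LR,NN} > z_{0.025}$ favors the model $F_\theta$, whereas a value $T_{LR,NN} < -z_{0.025}$ favors the model $G_\beta$.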
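The statistic $T_{LR,NN}$ only needs the pointwise log-densities of the two fitted models. The sketch below uses a hypothetical function name and made-up log-density values; it assumes the two lists are aligned observation by observation and that the pointwise differences are not all equal, so that $\hat{\omega}^2 > 0$.

```python
import math


def vuong_statistic(logf, logg):
    """T = LR / sqrt(n * omega2), from pointwise log-densities of two fitted models."""
    n = len(logf)
    diffs = [a - b for a, b in zip(logf, logg)]  # log f_i - log g_i
    lr = sum(diffs)
    mean = lr / n
    omega2 = sum(d * d for d in diffs) / n - mean * mean  # variance estimate
    return lr / math.sqrt(n * omega2)


# Made-up pointwise log-densities, for illustration only
log_f = [-0.9, -1.1, -0.4, -1.6, -0.7]
log_g = [-1.0, -1.3, -0.9, -1.5, -1.2]
t_stat = vuong_statistic(log_f, log_g)
print(t_stat, abs(t_stat) > 1.96)
```

Swapping the roles of the two models changes only the sign of the statistic, which matches the two-sided decision rule described above.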

4. Real Data Illustrations

In this section, the usefulness of the proposed models is illustrated by fitting the BUBSZOI and BUBSZI distributions to real data sets.

4.1. Illustration 1 of the BUBSZOI Model

For this illustration, we use the data set available at http://www.datasus.gov.br (accessed on 27 November 2021). This data set corresponds to the proportions of infant deaths in 5561 Brazilian counties. The histogram, which shows the behavior of the data, is given in Figure 1a.
Table 1 shows some descriptive statistics of the data set. The inflated beta (BEINF) distribution of [23] was fitted to these data. In addition, the MBLPN model proposed by [22] was fitted; it assumes a mixture of a Bernoulli random variable for the discrete part and a log-power-normal model for the continuous part (between zero and one), and it is denoted by $\mathrm{MBLPN}(\mu, \sigma, \alpha, \delta_0, \delta_1)$. The BUBSZOI model was also fitted. The MLEs (with standard errors in parentheses) of the parameters of the fitted models are given in Table 2. Figure 1b shows the empirical CDF together with the CDF of the fitted BUBSZOI model, indicating that the model fits the data set well.
To compare the BUBSZOI model against the MBLPN model of [22] and the BEINF model, a test for non-nested models is used. Taking $F_\theta$ as the BUBSZOI model and $G_\beta$ as the MBLPN model, Vuong's approach leads to the observed value $T_{LR,NN} = 0.0116$, which does not exceed the critical value $z_{0.025} = 1.96$; hence, the two models are not significantly different and, in particular, the MBLPN distribution is not better than the BUBSZOI model. Similarly, comparing the BUBSZOI and BEINF models gives $T_{LR,NN} = 55.3415$, which exceeds the critical value $z_{0.025} = 1.96$ and favors the BUBSZOI model. The BUBSZOI model is therefore the preferred model for these data.

4.2. Illustration of the BUBSZI Model

Our model can be useful for fitting data sets of percentages of people with some feature of interest (with a high or low frequency of occurrence). For example, this second illustration uses the database available and explained in detail at http://www.pnud.org.br (accessed on 14 September 2021), which corresponds to the percentage of people with certain poverty conditions. The frequency histogram of the data is presented in Figure 2a, where the total number of zeros in the sample is represented by a vertical bar at zero. The histogram has an inverted-J shape, a feature that can be modeled by the BEZI model, see [23], which assumes a mixture of a Bernoulli random variable for the discrete part and a beta regression for the continuous part (between zero and one), and is denoted by $\mathrm{BEZI}(\mu, \sigma, \delta_0)$. Additionally, we consider a left-censored Tobit model and the BUBSZI distribution. The MLEs (with standard errors in parentheses) of the parameters of the fitted models are presented in Table 3.
Figure 2 also shows the CDFs of the fitted BEZI and BUBSZI models, illustrating that the models fit the data set well.
Now, taking $F_\theta$ as the BUBSZI model and $G_\beta$ as the BEZI model, Vuong's approach leads to the observed value $T_{LR,NN} = 18.6431$. This value is greater than the critical value $z_{0.025} = 1.96$ and hence favors the BUBSZI distribution. Similarly, comparing the BUBSZI and Tobit models gives $T_{LR,NN} = 11.5110$, which also favors the BUBSZI model. The BUBSZI model is therefore the preferred model for these data.

4.3. Illustration 2 of the BUBSZOI Model

As mentioned in the introduction, medicine is an important area of application of our model. This illustration uses the data set studied by [29], which corresponds to a clinical marker of periodontal disease. The histogram of the response X (the proportion of diseased tooth sites among the incisors) is presented in Figure 3a. Note that the data set shows high inflation at $X = 1$, while $X = 0$ occurs for only a few units.
We fit the inflated beta and BUBSZOI models to the data set. The MLEs of $\mu$ and $\sigma$ in the beta model are $\hat{\mu} = 0.6774\ (0.0143)$ and $\hat{\sigma} = 0.4698\ (0.0158)$, while for the BUBSZOI model we obtained the MLEs $\hat{\alpha} = 1.2135\ (0.0573)$ and $\hat{\beta} = 0.2638\ (0.0178)$. The estimates of $\delta_0$ and $\delta_1$ are the same in both models: $\hat{\delta}_0 = 0.0034\ (0.0034)$ and $\hat{\delta}_1 = 0.2241\ (0.0244)$.
The value of Vuong's statistic for comparing the two models is $T_{LR,NN} = 26.99265$, which is greater than $z_{0.025} = 1.96$ and favors the BUBSZOI model, so the BUBSZOI model is the preferred model for these data. Figure 3b presents the CDF of the fitted BUBSZOI model, illustrating that the model fits the data set well.

5. Concluding Remarks

Modeling data with an excess of zeros or ones is a task required in areas such as economics, medicine and agriculture. Possible applications in these areas include the modeling of the infant mortality rate, the proportion of deaths caused by smoking, a clinical marker of periodontal disease, the estimation of the gross domestic product, and mortality in traffic accidents, among others. As shown in the previous sections, several alternatives for modeling this behavior can be found in the literature, such as the extensions of the inflated beta model, whose performance was compared with that of our proposal.
This paper discusses an alternative to the beta regression model for situations with an excess of zeros and/or ones. The approach is based on an extension of the Tobit model with excess of zeros considered in [30]. Estimation is based on the likelihood approach, and the Fisher information matrix is derived and shown to be orthogonal between the parameters of the discrete and continuous parts, which simplifies the large-sample properties of the maximum likelihood estimators. Three illustrations with real data show that the proposed models can fit better than the extensions of the inflated beta model considered in [23].

Author Contributions

Conceptualization, G.M.-F., R.T.-F. and C.B.-C.; data curation, R.T.-F. and C.B.-C.; formal analysis, G.M.-F. and R.T.-F.; funding acquisition, C.B.-C.; investigation, G.M.-F. and R.T.-F.; methodology, G.M.-F.; project administration, G.M.-F. and C.B.-C.; resources, G.M.-F., R.T.-F. and C.B.-C.; supervision, G.M.-F., R.T.-F. and C.B.-C.; visualization, C.B.-C.; writing—original draft, C.B.-C.; writing—review and editing, C.B.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Instituto Tecnológico Metropolitano (ITM) through the project (P21112) Fortalecimiento y consolidación del grupo didáctica y modelamiento en ciencias exactas y aplicadas DAVINCI para responder a las necesidades de las industrias 4.0.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Details about data available are given in Section 4.

Acknowledgments

G.M.-F. and R.T.-F. acknowledge the support given by Universidad de Córdoba, Montería, Colombia. C.B.-C. extends his sincere gratitude to the Instituto Tecnológico Metropolitano (ITM).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Birnbaum, Z.W.; Saunders, S.C. A new family of life distributions. J. Appl. Probab. 1969, 6, 319–327.
  2. Díaz-García, J.A.; Leiva-Sánchez, V. A new family of life distributions based on the elliptically contoured distributions. J. Stat. Plan. Inference 2005, 128, 445–457.
  3. Vilca-Labra, F.; Leiva-Sánchez, V. A new fatigue life model based on the family of skew-elliptical distributions. Commun. Stat. Theory Methods 2006, 35, 229–244.
  4. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. An alpha-power extension for the Birnbaum-Saunders distribution. Statistics 2014, 48, 896–912.
  5. Moreno-Arenas, G.; Martínez-Flórez, G.; Barrera-Causil, C. Proportional hazard Birnbaum-Saunders distribution with application to the survival data analysis. Rev. Colomb. Estadística 2016, 39, 129–147.
  6. Lemonte, A.J. Multivariate Birnbaum-Saunders regression model. J. Stat. Comput. Simul. 2013, 46, 2244–2257.
  7. Mazucheli, J.; Menezes, A.F.B.; Dey, S. The unit-Birnbaum-Saunders distribution with applications. Chil. J. Stat. 2018, 9, 47–57.
  8. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  9. Mazucheli, J.; Menezes, A.F.B.; Dey, S. Improved maximum-likelihood estimators for the parameters of the unit-gamma distribution. Commun. Stat. Theory Methods 2018, 47, 3767–3778.
  10. Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
  11. Menezes, A.F.B.; Mazucheli, J.; Dey, S. The unit-logistic distribution: Different methods of estimation. Pesqui. Oper. 2018, 38, 555–578.
  12. Casella, G.; Berger, R.L. Statistical Inference; Duxbury: Belmont, CA, USA, 2002.
  13. Bayes, C.L.; Bazan, J.L.; García, C. A new robust regression model for proportions. Bayesian Anal. 2012, 7, 841–866.
  14. Branscum, A.J.; Johnson, W.O.; Thurmond, M.C. Bayesian beta regression: Applications to household expenditure data and genetic distance between foot-and-mouth diseases viruses. Aust. N. Z. J. Stat. 2007, 49, 287–301.
  15. Cribari-Neto, F.; Vasconcellos, K.L.P. Nearly unbiased maximum likelihood estimation for the beta distribution. J. Stat. Comput. Simul. 2002, 72, 107–118.
  16. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  17. Kieschnick, R.; McCullough, B.D. Regression analysis of variates observed on (0, 1): Percentages, proportions and fractions. Stat. Model. 2003, 3, 193–213.
  18. Lemonte, A.J.; Cribari-Neto, F.; Vasconcellos, K.L.P. Improved statistical inference for the two-parameter Birnbaum-Saunders distribution. Comput. Stat. Data Anal. 2007, 51, 4656–4681.
  19. Paolino, P. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Anal. 2001, 9, 325–346.
  20. Vasconcellos, K.L.P.; Cribari-Neto, F. Improved maximum likelihood estimation in a new class of beta regression models. Braz. J. Probab. Stat. 2005, 19, 13–31.
  21. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Doubly censored power-normal regression models with inflation. Test 2014, 24, 265–286.
  22. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Power-models for proportions with zero/one excess. Appl. Math. Inf. Sci. 2018, 24, 293–303.
  23. Ospina, R.; Ferrari, S.L.P. Inflated beta distribution. Stat. Pap. 2010, 51, 111–126.
  24. Ospina, R.; Ferrari, S.L.P. A general class of zero-or-one inflated beta regression models. Comput. Stat. Data Anal. 2012, 56, 1609–1623.
  25. Cragg, J. Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 1971, 39, 829–844.
  26. Prudnikov, A.P.; Brychkov, Y.A.; Marichev, O.I. Integrals and Series: More Special Functions; Gordon and Breach Science Publishers: New York, NY, USA, 1990.
  27. Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 1989, 57, 307–333.
  28. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
  29. Galvis, D.M.; Lachos, V.H.; Bandyopadhyay, D. Augmented mixed models for clustered proportion data. Stat. Methods Med. Res. 2017, 26, 880–897.
  30. Moulton, L.H.; Halsey, N.A. A mixture model with detection limits for regression analyses of antibody response to vaccine. Biometrics 1995, 51, 1570–1578.
Figure 1. (a) Histogram of the death proportions. (b) Graphs: empirical (solid line), BUBSZOI model (dotted line).
Figure 2. (a) Histogram of the death proportions. (b) Graphs: empirical (solid line), BEZI (dashed line). (c) Graphs: empirical (solid line), BUBSZI (dotted line).
Figure 3. (a) Histogram of the variable X. (b) Graphs: empirical (solid line), BUBSZOI (dotted line).
Table 1. Statistical summary of the infant deaths data.
Data          n     Mean   SE     Bias   Kurtosis
Complete      5561  0.137  0.246  2.069  6.647
Non-censored  3541  0.292  0.216  0.811  2.94
Table 2. MLEs of the parameters of the mixtures of the Bernoulli distribution with the Beta, LPN and UBS models.
Est.  BEINF          MBLPN           BUBSZOI
μ̂     0.297 (0.004)  0.661 (0.0066)  –
σ̂     0.456 (0.005)  0.022 (0.0030)  –
α̂     –              0.001 (0.001)   0.783 (0.012)
β̂     –              –               1.198 (0.019)
δ̂₀    0.606 (0.007)  0.606 (0.007)   0.606 (0.007)
δ̂₁    0.031 (0.002)  0.031 (0.002)   0.031 (0.002)
Table 3. MLEs of the parameters of the BEZI, Tobit and BUBSZI models.
Est.  BEZI           Tobit          BUBSZI
μ̂     0.088 (0.004)  0.085 (0.004)  –
σ̂     6.290 (0.404)  0.104 (0.003)  –
α̂     –              –              0.575 (0.016)
β̂     –              –              2.993 (0.066)
δ̂₀    0.023 (0.006)  –              0.023 (0.006)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Martínez-Flórez, G.; Tovar-Falón, R.; Barrera-Causil, C. Inflated Unit-Birnbaum-Saunders Distribution. Mathematics 2022, 10, 667. https://0-doi-org.brum.beds.ac.uk/10.3390/math10040667