Article

A Noncentral Lindley Construction Illustrated in an INAR(1) Environment

by Johannes Ferreira 1,2,* and Ané van der Merwe 1

1 Department of Statistics, Faculty of Natural and Agricultural Sciences, University of Pretoria, Pretoria 0028, South Africa
2 Centre of Excellence in Mathematical and Statistical Sciences, Johannesburg 2050, South Africa
* Author to whom correspondence should be addressed.
Submission received: 3 December 2021 / Revised: 23 December 2021 / Accepted: 28 December 2021 / Published: 10 January 2022
(This article belongs to the Special Issue Modern Time Series Analysis)

Abstract

This paper proposes a previously unconsidered generalization of the Lindley distribution by allowing for a measure of noncentrality. Essential structural characteristics are investigated and derived in explicit and tractable forms, and the estimability of the model is illustrated via the fit of this developed model to real data. Subsequently, this model is used as a candidate for the parameter of a Poisson model, which allows for departure from the usual equidispersion restriction that the Poisson offers when modelling count data. This Poisson-noncentral Lindley is also systematically investigated and characteristics are derived. The value of this count model is illustrated and implemented as the count error distribution in an integer autoregressive environment, and juxtaposed against other popular models. The effect of the systematically-induced noncentrality parameter is illustrated and paves the way for future flexible modelling not only as a standalone contender in continuous Lindley-type scenarios but also in discrete and discrete time series scenarios when the often-encountered equidispersed assumption is not adhered to in practical data environments.

1. Introduction

The analysis of time series has enjoyed sustained interest, in particular the case when the data generating process emanates from an underlying counting model, resulting in the considered innovation process to follow some stochastic discrete distribution. Practical examples of such scenarios include the number of daily transactions on a stock exchange, number of days patients spend in a health care facility, or the daily counts of positive COVID-19 cases in a specific geographical area.
Under this discrete innovation assumption, the often considered assumption of normality leaves the practitioner in a less than ideal position, since it imposes a continuous model on a process which is inherently discrete in nature. In particular, Ref. [1] provides a seminal guideline on the use and implementation of a discrete-based time series analysis with discrete innovations, and various authors (such as [2,3,4,5] and others) have considered the first-order integer-valued (stationary) autoregressive process (INAR(1)) as a more viable contender compared to the usual (continuous) first-order autoregressive process (AR(1)), where the error term $\epsilon_t > 0$ is oftentimes characterised by a Poisson random variable. Suppose that $X_t$ represents the (non-negative) discrete time series count at time $t$ ($t = 0, 1, 2, \ldots$), then the INAR(1) process is defined by
$$X_t = p \circ X_{t-1} + \epsilon_t, \quad t \in \mathbb{Z} \tag{1}$$
where $p \in (0, 1)$ and $\circ$ denotes the binomial thinning operator, defined by $p \circ X_{t-1} := \sum_{i=1}^{X_{t-1}} W_i$ where $W_i$ is a Bernoulli random variable such that $P(W_i = 1) = 1 - P(W_i = 0) = p$. For the innovations $\epsilon_t$ in (1), the mean is indicated by $\mu_\epsilon$ and the finite variance by $\sigma_\epsilon^2$; and for the INAR(1) model itself in (1), the mean is given by $\mu_{X_t} = \frac{\mu_\epsilon}{1 - p}$ and the variance by $\sigma_{X_t}^2 = \frac{\sigma_\epsilon^2 + p \mu_\epsilon}{1 - p^2}$ (see also [1,6,7]). The choice of the Poisson model for $\epsilon_t$ was introduced by [8], but has been shown to be inefficient for accurate inference due to the equidispersion property of the Poisson distribution, where $\mu_\epsilon = \sigma_\epsilon^2$.
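The recursion in (1) can be illustrated with a minimal Python sketch, assuming Poisson innovations (the equidispersed baseline discussed above). The function names, the seed, and the parameter values are illustrative assumptions and are not part of the original analysis; the sketch only compares simulated sample moments with the stationary mean and variance stated above.

```python
import math
import random


def binomial_thinning(p, x, rng):
    """Return p ∘ x: the number of successes among x Bernoulli(p) trials."""
    return sum(1 for _ in range(x) if rng.random() < p)


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method (small mu)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_inar1(p, mu, n, seed=1):
    """Simulate n values of X_t = p ∘ X_{t-1} + eps_t with Poisson(mu) innovations."""
    rng = random.Random(seed)
    x = poisson_draw(mu / (1 - p), rng)  # start near the stationary mean
    path = []
    for _ in range(n):
        x = binomial_thinning(p, x, rng) + poisson_draw(mu, rng)
        path.append(x)
    return path


def inar1_marginal_moments(p, mu_eps, var_eps):
    """Stationary mean and variance of the INAR(1) process in (1)."""
    return mu_eps / (1 - p), (var_eps + p * mu_eps) / (1 - p ** 2)


path = simulate_inar1(p=0.5, mu=2.0, n=20000)
sample_mean = sum(path) / len(path)
sample_var = sum((v - sample_mean) ** 2 for v in path) / (len(path) - 1)
print(sample_mean, sample_var, inar1_marginal_moments(0.5, 2.0, 2.0))
```

With Poisson innovations the theoretical mean and variance coincide, which is exactly the restriction that motivates the alternative innovation distributions considered in this paper.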
To address this, Refs. [6,9,10,11] introduced alternative considerations for the choice of $\epsilon_t$, some of which emanate from the Lindley distribution or its generalisations incorporated into a discrete model for the INAR(1) innovation process. The Lindley distribution in itself has shown significant versatility as a continuous lifetime model, and a variety of generalisations of this model have been studied and reported on in the literature (see [12,13,14]). In addition to this, the incorporation of the Lindley distribution and its generalisations within a Poisson framework to obtain generalised discrete models has also experienced sustained interest, mainly due to the attractive feature of being able to model departures that data might exhibit from equidispersion. In this study, a meaningful noncentrality parameter is systematically induced into the Lindley distribution, from which a discrete counting model is also derived. This counting model is then illustrated as a candidate for $\epsilon_t$ within a discrete time series environment. In this way, this work enriches the existing literature with a previously unconsidered noncentral candidate with a parameter that has leverage and meaning for understanding and modelling dispersion in discrete models, apart from a standalone contribution to the ever-growing literature on statistical distribution theory.
Therefore the contribution of this paper is threefold:
  • A noncentral Lindley distribution is systematically constructed and some statistical characteristics are derived;
  • A discrete counting model based on this noncentral Lindley distribution is derived via compounding with the Poisson distribution, together with essential statistical characteristics;
  • This discrete counting model is implemented and illustrated as the innovation structure (i.e., $\epsilon_t$) within an INAR(1) time series environment, and juxtaposed against often-considered contenders for the innovation structure.
The particular value of interpretable and systematically induced noncentrality in these contributions is highlighted. These contributions pave the way for flexible statistical modelling in practice, not only for meaningful inference, but also for continuing the understanding, expansion, and development of the statistical theoretic universe. These models, as competitors, offer meaningful and tractable choices for the practitioner to employ, firstly within usual positive real-data modelling, secondly within discrete modelling, and thirdly within an INAR counting time series environment.
The paper is outlined as follows. In Section 2, the systematic construction of the noncentral Lindley (ncL) distribution is discussed and characteristics of this model are derived, followed by the construction and characteristics of the Poisson-noncentral Lindley (PncL) model. Section 3 contains the modelling results of applying the newly developed models to real data, including a fit of, and discussion around, the implementation of the PncL within the INAR(1) model (1). The final thoughts are contained in Section 4.

2. Building Blocks of New Model

The Lindley distribution (initially introduced by [15]) has seen a resurgence of attention over the span of the last 15 years, which could be attributed to its elegant construction as a weighted two-mixture distribution. This mixture consists of an exponential distribution with parameter $\beta > 0$ as the first component (with density function denoted by $f_{exp}(y)$), and a gamma distribution with scale and shape parameters $(\beta, 2)$ as the second component (with density function denoted by $f_{gam}(y)$). The weight function for this two-mixture distribution is given by $\omega = \frac{\beta}{1 + \beta}$. In this paper, we consider a generalised weight function with additional parameters $\alpha$ and $\theta$ for additional flexibility; that is, $\omega = \frac{\alpha \beta}{\alpha \beta + \theta}$ where $\theta > 0$ and $\alpha \beta + \theta > 0$ (see [3,16]). This leaves a random variable $Y$ with density function:
$$f(y) = \omega f_{exp}(y) + (1 - \omega) f_{gam}(y) = \frac{\beta^2}{\alpha \beta + \theta} \exp(-\beta y) (\alpha + y \theta), \quad y > 0. \tag{2}$$
Our point of departure for the contribution of this paper is to consider a noncentral gamma distribution with noncentral parameter $\lambda > 0$ instead of the usual gamma choice in (2). This leads to our proposed noncentral Lindley (ncL) construction, defined below.

2.1. A Noncentral Lindley Construction

We construct an ncL distribution by considering a random variable $Y$ which follows the noncentral gamma distribution with scale, shape, and noncentral parameters $(\beta, 2, \lambda)$ with density function (see [17]):
$$f_{ncgam}(y) = \sum_{i=0}^{\infty} \frac{\exp(-\frac{\lambda}{2}) (\frac{\lambda}{2})^i}{i!} \frac{\beta^{2+i}}{\Gamma(2+i)} y^{2+i-1} \exp(-\beta y), \quad y > 0 \tag{3}$$
where Γ ( · ) denotes the gamma function. Various authors (see for example, [17,18,19,20]) have considered and employed the noncentral gamma distribution as an essential generalisation of the gamma distribution, and also as a direct generalisation which emanates from the popular noncentral chi-square distribution. In particular, the hierarchical representation of (3) is theoretically attractive as a discrete mixture of the (usual) gamma distribution.
The consideration of this noncentral gamma distribution (3) as an enriched mixture component in (2), instead of the usual gamma distribution, is thus a systematic and organic parameter enrichment that results in a noncentral Lindley alternative. In this case, the density function of the resulting ncL distribution is given by:
$$f(y) = \omega f_{exp}(y) + (1 - \omega) f_{ncgam}(y) = \frac{\beta^2}{\alpha \beta + \theta} \exp(-\beta y) \left[ \alpha + \theta y \exp\left(-\frac{\lambda}{2}\right) \sum_{i=0}^{\infty} \frac{(\frac{\lambda}{2} \beta y)^i}{i!} \frac{1}{(2)_i} \right] = \frac{\beta^2}{\alpha \beta + \theta} \exp(-\beta y) \left[ \alpha + y \theta \exp\left(-\frac{\lambda}{2}\right) {}_0F_1\left(2; \frac{\lambda}{2} \beta y\right) \right] \tag{4}$$
where ${}_0F_1(\cdot)$ denotes the confluent hypergeometric limit function, and $(a)_j$ denotes the Pochhammer symbol (see [21]). In this case, we denote $Y \sim \text{ncL}(\beta, \alpha, \theta, \lambda)$. When $\lambda = 0$, (4) reduces to the considered generalised Lindley model of [3] with density function (2). Furthermore, when in addition $\alpha = \theta = 1$, (4) reflects the usual Lindley density function (see [15]).
The distribution function, moment generating function, and moments of $Y \sim \text{ncL}(\beta, \alpha, \theta, \lambda)$ with density (4) are presented in the following theorem. The proof is contained in Appendix A.
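As a rough numerical companion to (4), the following Python sketch evaluates the ncL density through a truncated series for ${}_0F_1(2; z)$ and checks, by a crude trapezoidal rule, that it integrates to approximately one. The helper names, the tolerance, the integration range, and the parameter values are assumptions made purely for illustration.

```python
import math


def hyp0f1_b2(z, tol=1e-16, max_terms=500):
    """Series for 0F1(; 2; z) = sum_i z^i / (i! (2)_i), with (2)_i = (i + 1)!."""
    term, total = 1.0, 1.0
    for i in range(max_terms):
        term *= z / ((i + 1) * (i + 2))  # ratio of consecutive series terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def ncl_pdf(y, beta, alpha, theta, lam):
    """Density (4) of the ncL(beta, alpha, theta, lam) distribution for y > 0."""
    const = beta ** 2 / (alpha * beta + theta)
    bracket = alpha + theta * y * math.exp(-lam / 2) * hyp0f1_b2(lam * beta * y / 2)
    return const * math.exp(-beta * y) * bracket


def trapezoid(func, lower, upper, steps):
    """Composite trapezoidal rule, used here only as a crude normalisation check."""
    h = (upper - lower) / steps
    inner = sum(func(lower + k * h) for k in range(1, steps))
    return h * (0.5 * func(lower) + inner + 0.5 * func(upper))


total_mass = trapezoid(lambda y: ncl_pdf(y, 1.0, 1.0, 1.0, 2.0), 0.0, 80.0, 8000)
print(total_mass)
```

Setting `lam=0` in `ncl_pdf` collapses the series to one, so the function returns the generalised Lindley density (2), mirroring the special case noted above.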
Theorem 1.
Suppose that the random variable $Y$ is distributed as ncL with density function (4). Then, the distribution function, moment generating function, and moments of $Y$ are given by:
1. $$F_Y(y) = \omega (1 - \exp(-\beta y)) + (1 - \omega) \exp\left(-\frac{\lambda}{2}\right) \sum_{i=0}^{\infty} \frac{(\frac{\lambda}{2})^i}{(2)_i \, i!} \gamma(i + 2, \beta y), \tag{5}$$
2. $$M_Y(t) = \frac{\beta^2}{(\alpha \beta + \theta)(\beta - t)^2} \left[ \alpha (\beta - t) + \theta \exp\left( \frac{\lambda}{2} \frac{t}{\beta - t} \right) \right], \tag{6}$$
3. $$E(Y^r) = \frac{\beta^2}{\alpha \beta + \theta} \frac{\alpha \Gamma(r + 1)}{\beta^{r+1}} + \frac{\theta \exp(-\frac{\lambda}{2}) \Gamma(r + 2)}{(\alpha \beta + \theta) \beta^r} {}_1F_1\left(r + 2; 2; \frac{\lambda}{2}\right) \tag{7}$$
where $r > 0$, $t < \beta$, $\gamma(\cdot, \cdot)$ denotes the lower incomplete gamma function (see [21]) and ${}_1F_1(\cdot)$ represents the hypergeometric function with one lower and one upper parameter (see [21]). Note that $\gamma(i + 2, \beta y)$ in (5) may be simplified further into a closed form as $(i + 1)! \left[ 1 - \exp(-\beta y) \sum_{m=0}^{i+1} \frac{(\beta y)^m}{m!} \right]$ using [21], equation 8.352.6.
These expressions exhibit the elegant incorporation of the noncentral parameter $\lambda$, and by setting $\lambda = 0$ in all three characteristics (5), (6), and (7), the corresponding characteristics of the usual generalised Lindley of [22] (see also [3]) may be obtained.
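To make the role of the moment generating function (6) concrete, the following Python sketch checks numerically that $M_Y(0) = 1$ and that a central-difference derivative of (6) at zero agrees with the mean obtained from the mixture representation. The mean formula used for comparison is derived here from the Poisson-mixture form of (3), under which the gamma shape is $2 + i$ with Poisson weights of mean $\lambda/2$; the function names and parameter values are illustrative assumptions.

```python
import math


def ncl_mgf(t, beta, alpha, theta, lam):
    """Moment generating function (6) of ncL(beta, alpha, theta, lam), valid for t < beta."""
    if t >= beta:
        raise ValueError("the MGF exists only for t < beta")
    scale = beta ** 2 / ((alpha * beta + theta) * (beta - t) ** 2)
    return scale * (alpha * (beta - t) + theta * math.exp(lam / 2 * t / (beta - t)))


def ncl_mean(beta, alpha, theta, lam):
    """Mean from the mixture view: omega / beta + (1 - omega) (2 + lam / 2) / beta."""
    omega = alpha * beta / (alpha * beta + theta)
    return (omega + (1 - omega) * (2 + lam / 2)) / beta


params = dict(beta=1.5, alpha=0.8, theta=1.2, lam=3.0)
h = 1e-5
# Central difference approximation of M'(0), which equals E(Y).
numeric_mean = (ncl_mgf(h, **params) - ncl_mgf(-h, **params)) / (2 * h)
print(ncl_mgf(0.0, **params), numeric_mean, ncl_mean(**params))
```

The same check can be repeated for other parameter choices, provided the evaluation points stay below $\beta$.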
In Figure 1, some particular shapes of the density function (4) are illustrated for various combinations of parameters. The following effects are observed for changes in the different parameters:
  • λ affects the location and spread of the distribution. As λ increases (comparing Figure 1a–c relative to Figure 1d–f), the distribution flattens out, shifts to the right and becomes less skewed;
  • β affects the scale of the distribution. As β increases, the scale of the distribution increases (comparing the densities in Figure 1a and Figure 1d, respectively);
  • α and θ affect the shape of the distribution:
    - As α increases positively (refer to Figure 1b and Figure 1e, respectively), the distribution tends to the shape of an exponential distribution;
    - Similarly, the shape of the distribution tends to that of an exponential distribution as θ tends to zero (refer to Figure 1c and Figure 1f, respectively);
    - The origin of the distribution is shifted for y > 0 when α > 0.

2.2. A Counting Model: PncL Distribution

For discrete (count data) modelling, the Poisson distribution is a well-known and popular choice due to its tractability and ease of implementation. The author of [23] illustrated the possibility of allowing the Poisson parameter (say $\mu$) to be distributed according to the Lindley distribution of [15], and investigated characteristics related to this. When considering the dispersion index $DI = \frac{\text{Var}(X)}{E(X)}$, it is noted that the equidispersion property (when $DI = 1$) of the Poisson distribution is often restrictive. In this sense, the contribution of the ncL distribution as a viable candidate for the distribution of the Poisson parameter $\mu$ is discussed in this section. This proves to be particularly valuable and directly insightful, as the practitioner may allow for leverage on the mean (or rather, noncentrality) of the model via the use of $\lambda$ and thus inherently be able to model departures from equidispersion which the data may exhibit.
Suppose a variable $X$ follows a Poisson distribution with parameter $\mu > 0$, and let $\mu$ be distributed as ncL with a density function as in (4). Then, using the compounding method with [21], the mass function for $X$ is given by:
$$\begin{aligned} p(x) &= \int_0^{\infty} \frac{\exp(-\mu) \mu^x}{x!} \frac{\beta^2}{\alpha \beta + \theta} \exp(-\beta \mu) \left[ \alpha + \theta \exp\left(-\frac{\lambda}{2}\right) \mu \, {}_0F_1\left(2; \frac{\lambda}{2} \beta \mu\right) \right] d\mu \\ &= \frac{\beta^2}{\alpha \beta + \theta} \frac{1}{x!} \left[ \alpha \int_0^{\infty} \mu^x \exp(-\mu(\beta + 1)) \, d\mu + \theta \exp\left(-\frac{\lambda}{2}\right) \int_0^{\infty} \exp(-\mu(\beta + 1)) \mu^{x+1} {}_0F_1\left(2; \frac{\lambda}{2} \beta \mu\right) d\mu \right] \\ &= \frac{\alpha \beta^2}{(\alpha \beta + \theta)(\beta + 1)^{x+1}} + \frac{\theta \beta^2 \exp(-\frac{\lambda}{2}) \Gamma(x + 2)}{(\alpha \beta + \theta)(\beta + 1)^{x+2} \, x!} {}_1F_1\left(x + 2, 2; \frac{\lambda}{2} \frac{\beta}{\beta + 1}\right) \end{aligned} \tag{8}$$
for $x = 0, 1, 2, \ldots$ representing counts. In this case, we define $X \sim \text{PncL}(\beta, \alpha, \theta, \lambda)$.
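The mass function (8) can be evaluated directly, as in the following Python sketch, which uses a truncated series for ${}_1F_1$ and checks that the probabilities sum to approximately one. The helper names, the truncation settings, and the summation range are assumptions for illustration; the parameter values are the downloads estimates quoted in Section 3.3.

```python
import math


def hyp1f1(a, b, z, tol=1e-16, max_terms=2000):
    """Series for Kummer's 1F1(a; b; z) = sum_i (a)_i / (b)_i * z^i / i!."""
    term, total = 1.0, 1.0
    for i in range(max_terms):
        term *= (a + i) * z / ((b + i) * (i + 1))  # ratio of consecutive terms
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def pncl_pmf(x, beta, alpha, theta, lam):
    """Mass function (8) of PncL(beta, alpha, theta, lam) at a count x = 0, 1, 2, ..."""
    denom = alpha * beta + theta
    first = alpha * beta ** 2 / (denom * (beta + 1) ** (x + 1))
    z = lam * beta / (2 * (beta + 1))
    # Gamma(x + 2) / x! simplifies to x + 1.
    second = (theta * beta ** 2 * math.exp(-lam / 2) * (x + 1)
              / (denom * (beta + 1) ** (x + 2))) * hyp1f1(x + 2, 2, z)
    return first + second


params = (1.023, 2.32, 1.29, 3.919)  # (beta, alpha, theta, lam), downloads estimates
probs = [pncl_pmf(x, *params) for x in range(300)]
total = sum(probs)
mean = sum(x * q for x, q in enumerate(probs))
print(total, mean)
```

With $\alpha = \theta = 1$ and $\lambda = 0$ the function reduces to $\beta^2 (x + \beta + 2) / (\beta + 1)^{x+3}$, which is the form obtained by adding the two terms of (8) in that special case.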
The probability generating function and factorial moments of the distribution in (8) are presented in the following theorem. The proof is contained in Appendix A.
Theorem 2.
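Suppose that the random variable $X$ is distributed as a PncL variable with mass function (8). Then, the probability generating function and factorial moments of $X$ are given by:
1. $$G_X(s) = \frac{\beta^2}{(\alpha \beta + \theta)(\beta + 1 - s)^2} \left[ \alpha (\beta + 1 - s) + \theta \exp\left( \frac{\lambda}{2} \frac{s - 1}{\beta + 1 - s} \right) \right], \tag{9}$$
2. $$E((X)_r) = \frac{\beta^2}{\alpha \beta + \theta} \frac{\Gamma(r + 1)}{(\beta + 1)^{r+1}} \left[ \alpha + \theta \exp\left(-\frac{\lambda}{2}\right) {}_1F_1\left(2, r + 1; \frac{\lambda}{2} \frac{\beta}{\beta + 1}\right) \right]$$
where $s < \beta + 1$ and $r > 0$. Note that by replacing $s$ with $\exp(s)$ in (9), the moment generating function of the distribution in (8) is obtained.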
Our systematically constructed discrete model includes the model of [23] as a special case, particularly when α = θ = 1 and λ = 0 and also reflects a similar model as that of [3]. It is, however, important to note that [3] relied on the survival discretisation method instead of compounding to obtain a discrete candidate based on their considered underlying generalised Lindley model.
Table 1 illustrates the behaviour that can be captured by the PncL model (8) by calculating the theoretical values of the mean, variance, $DI$, skewness, and kurtosis for different values of $\lambda$. It is valuable to note the correspondence between the mean and $\lambda$ in particular, as this provides a unique insight for the practitioner of the leverage related to the $DI$ via the incorporation of $\lambda$. In this case, the skewness and kurtosis of the variable $X$ are given by $\frac{E(X^3) - 3 E(X^2) E(X) + 2 (E(X))^3}{(\text{Var}(X))^{3/2}}$ and $\frac{E(X^4) - 4 E(X^3) E(X) + 6 E(X^2) (E(X))^2 - 3 (E(X))^4}{(\text{Var}(X))^2}$, respectively, and are calculated using the moments (A1)–(A4) as given in Appendix B.
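The summary measures described above can be computed from the raw moments (A1)–(A4) of Appendix B, as in the following Python sketch. It takes the dispersion index as the variance-to-mean ratio, which is an assumption consistent with overdispersion corresponding to $DI > 1$; the function names and the example parameters are likewise illustrative.

```python
def pncl_raw_moments(beta, alpha, theta, lam):
    """First four raw moments (A1)-(A4) of PncL(beta, alpha, theta, lam)."""
    d = alpha * beta + theta
    b = beta
    m1 = (alpha + theta * (lam / 2 + 2) / b) / d
    m2 = (alpha * (1 + 2 / b)
          + theta * ((lam / 2 + 2) / b + (lam ** 2 / 4 + 3 * lam + 6) / b ** 2)) / d
    m3 = (alpha * (1 + 6 / b + 6 / b ** 2)
          + theta * ((lam / 2 + 2) / b
                     + (3 * lam ** 2 / 4 + 9 * lam + 18) / b ** 2
                     + (lam ** 3 / 8 + 3 * lam ** 2 + 18 * lam + 24) / b ** 3)) / d
    m4 = (alpha * (1 + 14 / b + 36 / b ** 2 + 24 / b ** 3)
          + theta * ((2 + lam / 2) / b
                     + (42 + 21 * lam + 7 * lam ** 2 / 4) / b ** 2
                     + (144 + 108 * lam + 18 * lam ** 2 + 3 * lam ** 3 / 4) / b ** 3
                     + (120 + 120 * lam + 30 * lam ** 2 + 5 * lam ** 3 / 2
                        + lam ** 4 / 16) / b ** 4)) / d
    return m1, m2, m3, m4


def pncl_summary(beta, alpha, theta, lam):
    """Mean, variance, dispersion index, skewness and kurtosis from raw moments."""
    m1, m2, m3, m4 = pncl_raw_moments(beta, alpha, theta, lam)
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4) / var ** 2
    return {"mean": m1, "variance": var, "DI": var / m1,
            "skewness": skew, "kurtosis": kurt}


# Poisson-Lindley special case (alpha = theta = 1, lam = 0) with beta = 1.
summary = pncl_summary(1.0, 1.0, 1.0, 0.0)
print(summary)
```

Varying `lam` while holding the other arguments fixed gives a quick way to explore how the noncentrality parameter moves the mean and the dispersion index together.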
In addition to Table 1, Figure 2 illustrates some particular shapes of the mass function (8) for different values of λ for the following combinations of β, α, and θ:
  • β = 0.1, α = 0.02 and θ = 0.3, illustrated in Figure 2a–c;
  • β = 1, α = 0.02 and θ = 0.3, illustrated in Figure 2d–f;
  • β = 1, α = 0.8 and θ = 1.3, illustrated in Figure 2g–i.
The following is observed from Figure 2:
  • λ affects the location (evident from the shifts in the means illustrated in red) and spread of the distribution. As λ increases, the distribution flattens out, shifts to the right and becomes less skewed. This is also evident from Table 1;
  • β affects the scale of the distribution. As β increases, the scale of the distribution increases (comparing the mass functions in Figure 2a–c to Figure 2d–f, respectively);
  • α and θ jointly affect the shape of the distribution (comparing the mass functions in Figure 2d–f to Figure 2g–i, respectively).
Figure 2. Shapes of the mass function (8) for arbitrary parameter choices, with means indicated in red. (a–i) Some particular shapes of the mass function (8) for different values of λ for the combinations of β, α, and θ listed above.

3. Numerical Illustrations

In this section, data fitting applications are presented for the (continuous) ncL and the (discrete) PncL, as well as for the implementation of the PncL in an INAR(1) environment, to illustrate the position that the newly developed models in Section 2.1 and Section 2.2 take up in the literature. The models are fitted using maximum likelihood estimation for ncL and PncL, and conditional maximum likelihood estimation for the illustration and fit of the INAR(1) process. These fits are comparatively investigated with other popular contenders using criteria which include the negative maximum log-likelihood ($-\ell$) and Akaike’s information criterion (AIC). The datasets considered below can be found within the original published works, but for the convenience of the reader, the data have been uploaded to a Google Drive link available at the end of this paper. All computations are carried out using R 4.1.0 in a Win 64 environment with a 1.30 GHz/Intel(R) Core(TM) i7-1065G7 CPU Processor and 8.0 GB RAM.

3.1. Fitting of the ncL

The value of the ncL distribution (4) is illustrated here by fitting this distribution to two datasets:
  • Waiting times (in minutes) before service of 100 banking customers;
  • Survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli.
The fits are compared to the usual Lindley model of [22] (i.e., (4) when $\alpha = \theta = 1$ and $\lambda = 0$) and the three parameter Lindley (3PL) model of [16] (i.e., (4) when $\lambda = 0$). The maximum likelihood estimates for the parameters of these models are reported in Table 2 and Table 3, together with the goodness of fit statistics. In addition to the $-\ell$ and AIC statistics, the Kolmogorov–Smirnov ($KS$) nonparametric test is implemented to compare the data to the estimated models. The $KS$ test statistic is defined as:
$$KS = \max_y \left| F_n(y) - F(y) \right|$$
where $F_n(y)$ and $F(y)$ represent the empirical and estimated distribution functions, respectively [24]. Since the $KS$ test statistic measures the largest absolute distance between the empirical and estimated distribution functions, smaller $KS$ test statistics (with larger p-values) suggest a better fit.
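The $KS$ statistic can be computed as in the following Python sketch, which evaluates the gap between the empirical and fitted distribution functions on both sides of each jump of the empirical distribution function. The fitted distribution function used here is (5) with $\lambda = 0$, for which only the $i = 0$ term of the sum remains; the data values and parameter values are made up for illustration and are not the datasets analysed in this section.

```python
import math


def ks_statistic(data, cdf):
    """Largest absolute gap between the empirical CDF of data and a fitted CDF."""
    ordered = sorted(data)
    n = len(ordered)
    gap = 0.0
    for i, value in enumerate(ordered, start=1):
        fitted = cdf(value)
        # The empirical CDF jumps from (i - 1) / n to i / n at each observation.
        gap = max(gap, i / n - fitted, fitted - (i - 1) / n)
    return gap


def generalised_lindley_cdf(y, beta, alpha, theta):
    """Distribution function (5) with lam = 0 (the three parameter Lindley case)."""
    omega = alpha * beta / (alpha * beta + theta)
    # gamma(2, beta * y) = 1 - exp(-beta * y) * (1 + beta * y)
    gamma_part = 1 - math.exp(-beta * y) * (1 + beta * y)
    return omega * (1 - math.exp(-beta * y)) + (1 - omega) * gamma_part


toy_waits = [0.8, 1.3, 2.1, 2.9, 4.2, 5.5, 7.1, 9.6]  # made-up values
ks_value = ks_statistic(toy_waits, lambda y: generalised_lindley_cdf(y, 0.4, 1.0, 1.0))
print(ks_value)
```

For the noncentral case, the fitted distribution function would need the full series in (5); the statistic itself is computed in the same way.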
With similar $-\ell$ and AIC statistics, Table 2 shows that the proposed ncL model compares well to its Lindley and 3PL contenders for the banking waiting times dataset. With a small (close to zero) estimated $\lambda$, it can be concluded that the noncentrality is insignificant for this data, hence the ncL model performs similarly to the Lindley and 3PL models. These results are verified in Figure 3.
According to all goodness of fit statistics, Table 3 shows that the proposed ncL model outperforms both its Lindley and 3PL contenders for the guinea pigs dataset. This suggests that the noncentrality within this dataset is captured effectively, which is also evident from Figure 4.

3.2. Fitting of the PncL

Here, the discrete PncL (8), in comparison to the Poisson–Lindley (PL) and Poisson three parameter Lindley (P-3PL) of [3,23], respectively, is fitted to the following datasets:
  • The length of stay of 67 patients on a psychiatric ward (see [3]);
  • The survival times (in days) of 72 guinea pigs infected with virulent tubercle bacilli.
It is of interest to note that the ncL model was also fitted to the second dataset mentioned above. The values of this dataset are spread over a wider range and could be considered continuous (see previous section), but in essence, the integer nature of the data means they could also be considered as counts. In this spirit, we analyse the same dataset using the PncL as well. Table 4 and Table 5 report the results of the psychiatric data fitting and the estimated statistics of interest (the mean, variance, $DI$, skewness, and kurtosis), respectively. Comparing the goodness of fit statistics in Table 4, it is clear that the P-3PL and PncL perform similarly, both outperforming the PL. Considering the estimated statistics in Table 5, it is evident that the proposed PncL describes the data best.
Considering the discrete fit for the guinea pigs dataset, Table 6 and Table 7 report the estimation results and the estimated statistics of interest (the mean, variance, $DI$, skewness, and kurtosis), respectively. Comparing the goodness of fit statistics in Table 6, it is clear that the proposed PncL outperforms both the PL and P-3PL models, capturing the characteristics of the data well (evident from Table 7).

3.3. Fitting of the PncL within INAR(1)

Here, we model the INAR process as given in (1), where the one-step transition probability of the process is given by:
$$P(X_t = k \mid X_{t-1} = l) = \sum_{i=0}^{\min(k, l)} P(B_l^p = i) \times P_{PncL}(\epsilon_t = k - i) \tag{11}$$
where $B_l^p \sim \text{Binomial}(p, l)$ counts the number of successes in $l$ trials with success probability $p \in (0, 1)$, and in the case of this study, $P_{PncL}(\epsilon_t = k - i)$ is given by (8). This proposed process aims to allow the researcher a more interpretable approach to leverage information on the mean via the noncentrality parameter $\lambda$, and subsequently incorporate this as part of the estimation of the model when applied to time series data that exhibit departures from equidispersion.
Inspired by [7,25], the following remark is presented without proof, and captures meaningful properties of an INAR(1) process when equipped with PncL innovations with mass function in (8):
Remark 1.
If $X_t$ is the INAR(1) process as defined in (1) with transition probability given by (11), then:
1. $E(X_t \mid X_{t-1}) = p X_{t-1} + E(X)$;
2. $\text{Var}(X_t \mid X_{t-1}) = p (1 - p) X_{t-1} + E(X^2) - (E(X))^2$;
3. $\mu_{X_t} = \frac{E(X)}{1 - p}$;
4. $\sigma_{X_t}^2 = \frac{E(X^2) - (E(X))^2 + p E(X)}{1 - p^2}$;
5. $\gamma_k = p^k \, \text{Var}(X_{t-k})$, $\rho_k = p^k$;
for lag $k > 0$. Here, $E(X)$ is given by (A1), and $E(X^2)$ is given by (A2).
We consider the following time series datasets for the conditional maximum likelihood estimation of the PncL distribution within an INAR(1) context, in comparison to its well-known Poisson, negative binomial (NB), Bell (see [7,26]), and P-3PL (see [3]) contenders:
  • Daily number of downloads of certain software for the period June 2006 up to February 2007 (sample size T = 267 ) (see [27]);
  • Monthly number of strikes leading to at least 1000 workers being idle (published by the U.S. Bureau of Labor Statistics, http://www.bls.gov/wsp/ (accessed on 1 December 2021)). The time period January 1994 to December 2002 (sample size T = 108 ) is considered.
The conditional log-likelihood function of our INAR(1) process with innovations given by (8) is defined as:
$$\ell(\Theta) = \sum_{t=2}^{T} \ln P(X_t = k \mid X_{t-1} = l) = \sum_{t=2}^{T} \ln \left[ \sum_{i=0}^{\min(X_t, X_{t-1})} \binom{X_{t-1}}{i} p^i (1 - p)^{X_{t-1} - i} \, p_{PncL}(X_t - i) \right]$$
where $\Theta = (p, \alpha, \beta, \theta, \lambda)$ is the vector of parameters, estimated by maximising $\ell(\Theta)$, and $p_{PncL}(\cdot)$ denotes the mass function in (8).
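The conditional log-likelihood above can be sketched in Python as follows. The sketch sums over the number of thinning survivors starting from zero, and it uses a Poisson mass function purely as a stand-in for the innovation distribution so that the block stays short and self-contained; in the setting of this paper, an implementation of the PncL mass function (8) would be passed in instead. The function names, the toy series, and the parameter values are assumptions for illustration.

```python
import math


def inar1_transition(k, l, p, innovation_pmf):
    """P(X_t = k | X_{t-1} = l): binomial thinning convolved with the innovation pmf."""
    total = 0.0
    for i in range(min(k, l) + 1):  # i survivors out of l, so k - i new arrivals
        survive = math.comb(l, i) * p ** i * (1 - p) ** (l - i)
        total += survive * innovation_pmf(k - i)
    return total


def conditional_loglik(series, p, innovation_pmf):
    """Sum of log one-step transition probabilities over t = 2, ..., T."""
    return sum(math.log(inar1_transition(cur, prev, p, innovation_pmf))
               for prev, cur in zip(series, series[1:]))


def poisson_pmf(mu):
    """Stand-in innovation pmf; the PncL mass function (8) would replace it."""
    return lambda x: math.exp(-mu) * mu ** x / math.factorial(x)


toy_counts = [2, 3, 1, 0, 4, 2, 2, 5, 3, 1]  # made-up series for illustration only
loglik = conditional_loglik(toy_counts, 0.3, poisson_pmf(1.5))
print(loglik)
```

In practice, this function would be handed to a numerical optimiser over the parameter vector, with $p$ constrained to $(0, 1)$; the optimiser choice is left open here.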
Figure 5 and Figure 6 illustrate the time series plots of the downloads and strikes data, respectively, together with the observed marginal distributions, autocorrelation functions (ACFs), and partial autocorrelation functions (PACFs). The sample mean and variance for the downloads data are 2.401 and 7.534, respectively, whereas the sample mean and variance for the strikes data are 4.944 and 7.922, respectively. After evaluating the sample ACFs and PACFs, it is concluded that an AR(1)-like process is an appropriate choice for fitting a time series model to both these datasets (suggested by the significant sample partial autocorrelations at lag 1 only). Furthermore, since both these time series are discrete in nature, an INAR(1) process is suggested with an error term $\epsilon_t$ characterised by a discrete distribution allowing for overdispersion (due to a sample $DI > 1$ for both time series).
Table 8 shows the estimation results for the downloads data. With the PncL INAR(1) model having the lowest $-\ell$ value and an estimated mean and variance of 2.366 and 6.995, respectively (compared to the sample mean and variance of 2.4 and 7.5), we conclude that the proposed PncL INAR(1) model competes well in relation to its NB INAR(1) contender, having the smallest AIC value. This estimated PncL INAR(1) model is given by:
$$X_t = 0.157 \circ X_{t-1} + \epsilon_t$$
where the error term structure is estimated as $\epsilon_t \sim \text{PncL}(1.023, 2.32, 1.29, 3.919)$.
Considering the estimation results for the strikes data, Table 9 shows that the proposed PncL INAR(1) model competes well with its previously proposed competitors. With the lowest $-\ell$ value and an estimated mean and variance of 4.981 and 6.922, respectively (compared to the sample mean and variance of 4.9 and 7.9), we can conclude that the PncL INAR(1) model is an adequate fit for the strikes data, with the estimated model given by:
$$X_t = 0.558 \circ X_{t-1} + \epsilon_t$$
where the error term structure is estimated as $\epsilon_t \sim \text{PncL}(3.843, 0.29, 13.083, 14.191)$. Note that even though the Bell INAR(1) yields the smallest AIC value, it overestimates the variance within the time series. In this table, the mean and variance are rounded to three decimal places, and in this case the means are almost equal across all of the fitted models.
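The estimated marginal means and variances quoted above can be approximately recovered from the reported parameter estimates by combining the innovation moments (A1) and (A2) with items 3 and 4 of Remark 1, as in the following Python sketch. The function names are illustrative assumptions, and small differences from the reported values are to be expected because the published estimates are rounded.

```python
def pncl_mean_var(beta, alpha, theta, lam):
    """Innovation mean and variance from the moments (A1) and (A2)."""
    d = alpha * beta + theta
    m1 = (alpha + theta * (lam / 2 + 2) / beta) / d
    m2 = (alpha * (1 + 2 / beta)
          + theta * ((lam / 2 + 2) / beta
                     + (lam ** 2 / 4 + 3 * lam + 6) / beta ** 2)) / d
    return m1, m2 - m1 ** 2


def inar1_marginal(p, beta, alpha, theta, lam):
    """Marginal mean and variance of the PncL INAR(1) process (Remark 1, items 3 and 4)."""
    mu_eps, var_eps = pncl_mean_var(beta, alpha, theta, lam)
    return mu_eps / (1 - p), (var_eps + p * mu_eps) / (1 - p ** 2)


downloads = inar1_marginal(0.157, 1.023, 2.32, 1.29, 3.919)
strikes = inar1_marginal(0.558, 3.843, 0.29, 13.083, 14.191)
print(downloads, strikes)
```

Both fitted models give a marginal variance that exceeds the marginal mean, in line with the overdispersion observed in the two series.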
In order to examine the accuracy of the proposed candidate, it remains necessary to analyse residuals from the fitted model, considering the standardised Pearson residuals for $t = 2, \ldots, T$ defined by:
$$e_t = \frac{x_t - E(X_t \mid x_{t-1})}{\sqrt{\text{Var}(X_t \mid x_{t-1})}}$$
where $E(X_t \mid x_{t-1}) = p X_{t-1} + \mu_\epsilon$ and $\text{Var}(X_t \mid x_{t-1}) = p (1 - p) X_{t-1} + \sigma_\epsilon^2$ (see [1]), with $\mu_\epsilon$ and $\sigma_\epsilon^2$ denoting the mean and variance of the innovation $\epsilon_t$ distribution calculated using the moments (A1) and (A2) given in Appendix B. For the model to be an adequate fit, it is required that these residuals are uncorrelated with a mean close to zero and a variance close to one. Note that a variance deviating from one indicates that the model does not sufficiently capture all dispersion present in the data. The mean and variance of the Pearson residuals obtained from the PncL INAR(1) model for the downloads data are −0.001 and 0.631, respectively. Considering the Pearson residuals for the strikes data, we observe a mean and variance of −0.001 and 0.522, respectively. With these values being close to the desired values, and together with the respective ACFs of the Pearson residuals showing no significant autocorrelation in Figure 7 and Figure 8, it is concluded that the proposed PncL INAR(1) model fits both time series adequately.
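The residual computation described above can be sketched in Python as follows. The series and the parameter values are made up for illustration (they are not the downloads or strikes data), and the function names are assumptions; in an actual analysis the innovation mean and variance would come from (A1) and (A2) evaluated at the fitted parameters.

```python
import math


def pearson_residuals(series, p, mu_eps, var_eps):
    """Standardised Pearson residuals for t = 2, ..., T of a fitted INAR(1) model."""
    residuals = []
    for prev, cur in zip(series, series[1:]):
        cond_mean = p * prev + mu_eps
        cond_var = p * (1 - p) * prev + var_eps
        residuals.append((cur - cond_mean) / math.sqrt(cond_var))
    return residuals


def mean_and_variance(values):
    """Sample mean and (unbiased) sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


toy_counts = [2, 3, 1, 0, 4, 2, 2, 5, 3, 1]  # made-up series for illustration only
res = pearson_residuals(toy_counts, p=0.3, mu_eps=1.6, var_eps=2.4)
print(mean_and_variance(res))
```

The resulting residual mean and variance would then be compared with zero and one, respectively, and the residual ACF inspected for remaining autocorrelation.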

4. Conclusions

This study presents an enriched noncentral Lindley distribution (ncL) containing a (noncentral) parameter which has been systematically introduced via the noncentral gamma distribution. The characteristics of this newly developed model received attention, which includes the cumulative distribution function, moment generating function, and the moments of the distribution which are derived in closed form analytical expressions. This model retains mathematical elegance with closed forms for the characterisations, and is shown to contain existing models in the literature for specific choices of the parameters. In addition, a discrete counterpart was derived by using this ncL model via compounding with a Poisson variable. In this way, the interpretability of the noncentral parameter is inherited within a discrete environment as well, and paves the way for the practitioner to leverage information regarding the mean (or rather, noncentrality) of any suitable given data to provide insight on the dispersion index of the data in question. This model is incorporated within a time series context as innovation terms in an INAR(1) environment and juxtaposed against other popular models.
It is valuable to note that no one distribution will always outperform another, due to the data-driven nature of statistical model fitting. As such, the contributions in this paper do not exclusively outperform competing models in terms of the considered datasets, but the results indicate that the proposed model is as good as, if not occasionally slightly better than, the usual choices. The true value of this paper lies within the systematic construction of a previously unconsidered continuous (ncL) and discrete (PncL) distribution, and the inclusion of a noncentral parameter which is estimable and interpretable in a location context for data. Additionally, these models contain commonly considered models as special cases, and thus act as a unifying consideration for practitioners not only with continuous or discrete interests, but also with discrete time series interests. In future, further considerations and comparisons of other unconsidered discrete models (such as the refreshing recent contribution of [28] as well as [29]) within a discrete time series environment may be pursued, in conjunction with alternate thinning considerations (such as [30]).

Author Contributions

Conceptualization, J.F.; methodology, J.F. and A.v.d.M.; software, A.v.d.M.; formal analysis, J.F. and A.v.d.M.; writing—original draft preparation, J.F.; writing—review and editing, J.F. and A.v.d.M.; supervision, J.F.; funding acquisition, J.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work forms part of the grant RDP296/2021 based at the University of Pretoria, South Africa.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

This manuscript does not contain any studies with human participants, animals, or informed consent.

Data Availability Statement

Data in this study are available at https://drive.google.com/drive/folders/1pxymEy37SqJ8nqdBjj4MJ_-DxhiJ8TXC?usp=sharing (accessed on 1 December 2021).

Acknowledgments

The authors wish to thank the Department of Statistics at the University of Pretoria and the Centre of Excellence in Mathematical and Statistical Sciences based at the University of the Witwatersrand, Johannesburg, South Africa for funding. Finally, the authors wish to thank two anonymous reviewers whose comments helped to refine the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

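Proof of Theorem 1.
  • See that
    $$\begin{aligned}
    F_Y(y) &= \frac{\beta^2}{\alpha\beta+\theta}\int_0^y \exp(-\beta t)\left[\alpha + t\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta t\right)\right]dt \\
    &= \frac{\alpha\beta^2}{\alpha\beta+\theta}\int_0^y \exp(-\beta t)\,dt + \frac{\theta\beta^2}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\int_0^y t\exp(-\beta t)\,{}_0F_1\left(;2;\tfrac{\lambda}{2}\beta t\right)dt \\
    &= \frac{\alpha\beta}{\alpha\beta+\theta}\left[1-\exp(-\beta y)\right] + \frac{\theta}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\sum_{i=0}^{\infty}\frac{(\lambda/2)^i}{(2)_i\,i!}\,\gamma(i+2,\beta y) \\
    &= \omega\left[1-\exp(-\beta y)\right] + (1-\omega)\exp\left(-\tfrac{\lambda}{2}\right)\sum_{i=0}^{\infty}\frac{(\lambda/2)^i}{(2)_i\,i!}\,\gamma(i+2,\beta y).
    \end{aligned}$$
  • Using [21] p. 815, see that, for $t<\beta$,
    $$\begin{aligned}
    M_Y(t) &= \int_0^\infty \exp(ty)\,\frac{\beta^2}{\alpha\beta+\theta}\exp(-\beta y)\left[\alpha + y\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta y\right)\right]dy \\
    &= \frac{\alpha\beta^2}{\alpha\beta+\theta}\int_0^\infty \exp\left(-(\beta-t)y\right)dy + \frac{\theta\beta^2}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\int_0^\infty y\exp\left(-(\beta-t)y\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta y\right)dy \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\left[\frac{\alpha}{\beta-t} + \frac{\theta\exp\left(-\tfrac{\lambda}{2}\right)}{(\beta-t)^2}\exp\left(\frac{\lambda}{2}\,\frac{\beta}{\beta-t}\right)\right],
    \end{aligned}$$
    which leaves the final result.
  • Using [21] p. 815, see that
    $$\begin{aligned}
    E(Y^r) &= \int_0^\infty y^r\,\frac{\beta^2}{\alpha\beta+\theta}\exp(-\beta y)\left[\alpha + y\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta y\right)\right]dy \\
    &= \frac{\alpha\beta^2}{\alpha\beta+\theta}\int_0^\infty y^r\exp(-\beta y)\,dy + \frac{\theta\beta^2}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\int_0^\infty y^{r+1}\exp(-\beta y)\,{}_0F_1\left(;2;\tfrac{\lambda}{2}\beta y\right)dy \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\,\frac{\alpha}{\beta^{r+1}}\,\Gamma(r+1) + \frac{\beta^2}{\alpha\beta+\theta}\,\frac{\theta}{\beta^{r+2}}\exp\left(-\tfrac{\lambda}{2}\right)\Gamma(r+2)\,{}_1F_1\left(r+2,2;\tfrac{\lambda}{2}\right),
    \end{aligned}$$
    which leaves the final result. □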
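The normalisation of the ncL density and the closed-form moment in the last step above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' software: the function names (`hyp0f1`, `hyp1f1`, `ncl_pdf`, `ncl_moment`, `integrate`) are hypothetical, the hypergeometric functions are evaluated by truncated power series, and the integral over $(0,\infty)$ is approximated by Simpson's rule on a finite interval, which is assumed adequate for moderate parameter values such as those used in Table 1.

```python
import math


def hyp0f1(b, z, terms=120):
    """Truncated power series for 0F1(; b; z) (assumed adequate for moderate z)."""
    total, term = 0.0, 1.0
    for i in range(terms):
        total += term
        term *= z / ((b + i) * (i + 1))
    return total


def hyp1f1(a, b, z, terms=120):
    """Truncated power series for 1F1(a; b; z)."""
    total, term = 0.0, 1.0
    for i in range(terms):
        total += term
        term *= (a + i) * z / ((b + i) * (i + 1))
    return total


def ncl_pdf(y, alpha, beta, theta, lam):
    """ncL density as it appears under the integral sign in the proof."""
    norm = beta ** 2 / (alpha * beta + theta)
    bracket = alpha + y * theta * math.exp(-lam / 2) * hyp0f1(2.0, 0.5 * lam * beta * y)
    return norm * math.exp(-beta * y) * bracket


def ncl_moment(r, alpha, beta, theta, lam):
    """Closed-form E(Y^r) from the last line of the proof."""
    norm = beta ** 2 / (alpha * beta + theta)
    central = alpha * math.gamma(r + 1) / beta ** (r + 1)
    noncentral = (theta / beta ** (r + 2)) * math.exp(-lam / 2) \
        * math.gamma(r + 2) * hyp1f1(r + 2, 2.0, lam / 2)
    return norm * (central + noncentral)


def integrate(f, upper, n=4000):
    """Composite Simpson rule on [0, upper]; n must be even."""
    h = upper / n
    total = f(0.0) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3
```

For example, with the illustrative values $\alpha=0.01$, $\beta=0.9$, $\theta=0.5$, $\lambda=5$, `integrate(lambda y: ncl_pdf(y, 0.01, 0.9, 0.5, 5.0), 60.0)` should be close to one, and the numerically integrated first moment should agree with `ncl_moment(1, 0.01, 0.9, 0.5, 5.0)`.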
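Proof of Theorem 2.
  • See that
    $$\begin{aligned}
    G_X(s) &= \sum_{x=0}^{\infty} s^x \int_0^\infty \frac{\exp(-\mu)\mu^x}{x!}\,\frac{\beta^2}{\alpha\beta+\theta}\exp(-\beta\mu)\left[\alpha + \mu\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)\right]d\mu \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\int_0^\infty \exp(s\mu-\mu-\beta\mu)\left[\alpha + \mu\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)\right]d\mu \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\left[\frac{\alpha}{\beta+1-s} + \frac{\theta\exp\left(-\tfrac{\lambda}{2}\right)}{(\beta+1-s)^2}\,{}_1F_1\left(2,2;\frac{\lambda}{2}\,\frac{\beta}{\beta+1-s}\right)\right] \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\,\frac{1}{(\beta+1-s)^2}\left[\alpha(\beta+1-s) + \theta\exp\left(\frac{\lambda}{2}\,\frac{s-1}{\beta+1-s}\right)\right],
    \end{aligned}$$
    which leaves the final result.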
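  • See that
    $$\begin{aligned}
    E\left[X(X-1)\cdots(X-r+1)\right] &= \sum_{x=0}^{\infty} x(x-1)\cdots(x-r+1)\int_0^\infty \frac{\exp(-\mu)\mu^x}{x!}\,\frac{\beta^2}{\alpha\beta+\theta}\exp(-\beta\mu)\left[\alpha + \mu\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)\right]d\mu \\
    &= \int_0^\infty \left[\sum_{x=r}^{\infty}\frac{\exp(-\mu)\mu^{x-r}}{(x-r)!}\right]\mu^r\,\frac{\beta^2}{\alpha\beta+\theta}\exp(-\beta\mu)\left[\alpha + \mu\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)\right]d\mu \\
    &= \frac{\beta^2}{\alpha\beta+\theta}\int_0^\infty \mu^r\exp(-\beta\mu)\left[\alpha + \mu\,\theta\exp\left(-\tfrac{\lambda}{2}\right){}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)\right]d\mu \\
    &= \frac{\alpha\beta^2}{\alpha\beta+\theta}\int_0^\infty \mu^r\exp(-\beta\mu)\,d\mu + \frac{\beta^2\theta}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\int_0^\infty \mu^{r+1}\exp(-\beta\mu)\,{}_0F_1\left(;2;\tfrac{\lambda}{2}\beta\mu\right)d\mu \\
    &= \frac{\alpha\beta^2}{\alpha\beta+\theta}\,\frac{\Gamma(r+1)}{\beta^{r+1}} + \frac{\beta^2\theta}{\alpha\beta+\theta}\exp\left(-\tfrac{\lambda}{2}\right)\frac{\Gamma(r+2)}{\beta^{r+2}}\,{}_1F_1\left(r+2,2;\tfrac{\lambda}{2}\right),
    \end{aligned}$$
    where the inner sum in the second line equals one, which again, using [21] p. 815, leaves the final result. □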
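As a sanity check on the probability generating function derived in the first part of this proof, its value at $s=1$ should equal one, and its first two derivatives at $s=1$ give the first two factorial moments. The sketch below is only illustrative: the function names are hypothetical, the closed form coded is the last line of the PGF derivation as read here, and the derivatives are approximated by central finite differences rather than computed analytically.

```python
import math


def pncl_pgf(s, alpha, beta, theta, lam):
    """Closed-form PGF of the PncL distribution (assumed valid for s < beta + 1)."""
    u = beta + 1.0 - s
    if u <= 0.0:
        raise ValueError("the PGF is only evaluated here for s < beta + 1")
    scale = beta ** 2 / ((alpha * beta + theta) * u ** 2)
    return scale * (alpha * u + theta * math.exp(0.5 * lam * (s - 1.0) / u))


def factorial_moments_from_pgf(alpha, beta, theta, lam, h=1e-4):
    """Approximate G'(1) and G''(1) by central differences (step h is a tuning choice)."""
    g = lambda s: pncl_pgf(s, alpha, beta, theta, lam)
    d1 = (g(1.0 + h) - g(1.0 - h)) / (2.0 * h)
    d2 = (g(1.0 + h) - 2.0 * g(1.0) + g(1.0 - h)) / h ** 2
    return d1, d2


def mean_and_variance(alpha, beta, theta, lam):
    """Mean and variance from the factorial moments: Var = G''(1) + G'(1) - G'(1)^2."""
    d1, d2 = factorial_moments_from_pgf(alpha, beta, theta, lam)
    return d1, d2 + d1 - d1 ** 2
```

With $\alpha=0.01$, $\beta=0.9$, $\theta=0.5$ and $\lambda=5$, `mean_and_variance(0.01, 0.9, 0.5, 5.0)` should be close to the mean and variance reported for $\lambda=5$ in Table 1, up to rounding and finite-difference error.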

Appendix B. Moments

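The first four moments of the PncL distribution (8) are given below.
$$E(X) = \frac{\alpha + \frac{\theta}{\beta}\left(\frac{\lambda}{2}+2\right)}{\alpha\beta+\theta}.$$
$$E(X^2) = \frac{1}{\alpha\beta+\theta}\left[\theta\left(\frac{\lambda}{2\beta} + \frac{2}{\beta} + \frac{3\lambda}{\beta^2} + \frac{\lambda^2}{4\beta^2} + \frac{6}{\beta^2}\right) + \alpha + \frac{2\alpha}{\beta}\right].$$
$$E(X^3) = \frac{1}{\alpha\beta+\theta}\left[\alpha\left(1 + \frac{6}{\beta} + \frac{6}{\beta^2}\right) + \theta\left(\frac{\lambda^3}{8\beta^3} + \frac{3\lambda^2}{\beta^3} + \frac{3\lambda^2}{4\beta^2} + \frac{18\lambda}{\beta^3} + \frac{9\lambda}{\beta^2} + \frac{\lambda}{2\beta} + \frac{2}{\beta} + \frac{18}{\beta^2} + \frac{24}{\beta^3}\right)\right].$$
$$E(X^4) = \frac{1}{\alpha\beta+\theta}\left[\alpha\left(1 + \frac{14}{\beta} + \frac{36}{\beta^2} + \frac{24}{\beta^3}\right) + \theta\left(\frac{2+\frac{\lambda}{2}}{\beta} + \frac{42 + 21\lambda + \frac{7}{4}\lambda^2}{\beta^2} + \frac{144 + 108\lambda + 18\lambda^2 + \frac{3}{4}\lambda^3}{\beta^3} + \frac{120 + 120\lambda + 30\lambda^2 + \frac{5}{2}\lambda^3 + \frac{1}{16}\lambda^4}{\beta^4}\right)\right].$$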
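The four raw moments above are enough to reproduce the summary statistics reported in Table 1. The sketch below, with hypothetical function names, codes the four expressions directly and converts them to the variance, dispersion index (DI, taken here as variance divided by mean), skewness and kurtosis. The kurtosis is assumed to be the non-excess version, which is what appears to match Table 1.

```python
def pncl_raw_moments(alpha, beta, theta, lam):
    """First four raw moments of the PncL distribution, coded from Appendix B."""
    b, l = beta, lam
    d = alpha * b + theta
    m1 = (alpha + (theta / b) * (l / 2 + 2)) / d
    m2 = (theta * (l / (2 * b) + 2 / b + 3 * l / b**2 + l**2 / (4 * b**2) + 6 / b**2)
          + alpha + 2 * alpha / b) / d
    m3 = (alpha * (1 + 6 / b + 6 / b**2)
          + theta * (l**3 / (8 * b**3) + 3 * l**2 / b**3 + 3 * l**2 / (4 * b**2)
                     + 18 * l / b**3 + 9 * l / b**2 + l / (2 * b)
                     + 2 / b + 18 / b**2 + 24 / b**3)) / d
    m4 = (alpha * (1 + 14 / b + 36 / b**2 + 24 / b**3)
          + theta * ((2 + l / 2) / b
                     + (42 + 21 * l + 7 * l**2 / 4) / b**2
                     + (144 + 108 * l + 18 * l**2 + 3 * l**3 / 4) / b**3
                     + (120 + 120 * l + 30 * l**2 + 5 * l**3 / 2 + l**4 / 16) / b**4)) / d
    return m1, m2, m3, m4


def pncl_summary(alpha, beta, theta, lam):
    """Mean, variance, dispersion index, skewness and (non-excess) kurtosis."""
    m1, m2, m3, m4 = pncl_raw_moments(alpha, beta, theta, lam)
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                 # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4  # fourth central moment
    return {
        "mean": m1,
        "variance": var,
        "di": var / m1,
        "skewness": mu3 / var**1.5,
        "kurtosis": mu4 / var**2,
    }
```

Calling `pncl_summary(0.01, 0.9, 0.5, 0.0)` should agree, up to rounding, with the $\lambda = 0$ column of Table 1.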

References

  1. Weiß, C.H. An Introduction to Discrete-Valued Time Series; John Wiley & Sons: Hoboken, NJ, USA, 2018.
  2. Lívio, T.; Khan, N.M.; Bourguignon, M.; Bakouch, H.S. An INAR(1) model with Poisson–Lindley innovations. Econ. Bull. 2018, 38, 1505–1513.
  3. Eliwa, M.S.; Altun, E.; El-Dawoody, M.; El-Morshedy, M. A new three-parameter discrete distribution with associated INAR(1) process and applications. IEEE Access 2020, 8, 91150–91162.
  4. Eliwa, M.; El-Morshedy, M. A one-parameter discrete distribution for over-dispersed data: Statistical and reliability properties with applications. J. Appl. Stat. 2021, 1–21.
  5. Irshad, M.R.; Chesneau, C.; D’cruz, V.; Maya, R. Discrete Pseudo Lindley Distribution: Properties, Estimation and Application on INAR(1) Process. Math. Comput. Appl. 2021, 26, 76.
  6. Alzaid, A.; Al-Osh, M. First-order integer-valued autoregressive (INAR(1)) process: Distributional and regression properties. Stat. Neerl. 1988, 42, 53–61.
  7. Huang, J.; Zhu, F. A New First-Order Integer-Valued Autoregressive Model with Bell Innovations. Entropy 2021, 23, 713.
  8. McKenzie, E. Some simple models for discrete variate time series. Water Resour. Bull. 1985, 21, 645–650.
  9. Kim, H.; Lee, S. On first-order integer-valued autoregressive process with Katz family innovations. J. Stat. Comput. Simul. 2017, 87, 546–562.
  10. Al-Osh, M.A.; Alzaid, A.A. First-order integer-valued autoregressive (INAR(1)) process. J. Time Ser. Anal. 1987, 8, 261–275.
  11. Ristić, M.M.; Bakouch, H.S.; Nastić, A.S. A new geometric first-order integer-valued autoregressive (NGINAR(1)) process. J. Stat. Plan. Inference 2009, 139, 2218–2226.
  12. Ekhosuehi, N.; Opone, F. A three parameter generalized Lindley distribution: Properties and application. Statistica 2018, 78, 233–249.
  13. Abd El-Monsef, M. A new Lindley distribution with location parameter. Commun. Stat. Theory Methods 2016, 45, 5204–5219.
  14. Nedjar, S.; Zeghdoudi, H. On gamma Lindley distribution: Properties and simulations. J. Comput. Appl. Math. 2016, 298, 167–174.
  15. Lindley, D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. Methodol. 1958, 20, 102–107.
  16. Shanker, R.; Shukla, K.K.; Shanker, R.; Tekie, A. A three-parameter Lindley distribution. Am. J. Math. Stat. 2017, 7, 15–26.
  17. Knüsel, L.; Bablok, B. Computation of the noncentral gamma distribution. SIAM J. Sci. Comput. 1996, 17, 1224–1231.
  18. Bekker, A.; Ferreira, J. Bivariate gamma type distributions for modelling wireless performance metrics. Stat. Optim. Inf. Comput. 2018, 6, 335–353.
  19. Chen, Z.Y. The S-system computation of non-central gamma distribution. J. Stat. Comput. Simul. 2005, 75, 813–829.
  20. de Oliveira, I.R.C.; Ferreira, D.F. Computing the noncentral gamma distribution, its inverse and the noncentrality parameter. Comput. Stat. 2013, 28, 1663–1680.
  21. Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products; Academic Press: Cambridge, MA, USA, 2014.
  22. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its application. Math. Comput. Simul. 2008, 78, 493–506.
  23. Sankaran, M. The discrete Poisson-Lindley distribution. Biometrics 1970, 26, 145–149.
  24. Massey Jr, F.J. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Stat. Assoc. 1951, 46, 68–78.
  25. Qi, X.; Li, Q.; Zhu, F. Modeling time series of count with excess zeros and ones based on INAR(1) model with zero-and-one inflated Poisson innovations. J. Comput. Appl. Math. 2019, 346, 572–590.
  26. Castellares, F.; Ferrari, S.L.; Lemonte, A.J. On the Bell distribution and its associated regression model for count data. Appl. Math. Model. 2018, 56, 172–185.
  27. Weiß, C.H. Thinning operations for modelling time series of counts: A survey. AStA Adv. Stat. Anal. 2008, 92, 319–341.
  28. Hanandeh, A.; Al-Nasser, A.D. New Distribution for Fitting Discrete Data: The Poisson-Gold Distribution and Its Statistical Properties. Austrian J. Stat. 2021, 50, 19–35.
  29. Bhati, D.; Sastry, D.; Qadri, P.M. A new generalized Poisson-Lindley distribution: Applications and properties. Austrian J. Stat. 2015, 44, 35–51.
  30. Liu, Z.; Zhu, F. A new extension of thinning-based integer-valued autoregressive models for count data. Entropy 2021, 23, 62.
Figure 1. Shapes of the density function (4) for arbitrary parameter choices. (a–f) Some particular shapes of the density function for various combinations of parameters.
Figure 3. Fitted Lindley (black), 3PL (blue), and ncL (red) models with corresponding estimated distribution functions for the banking waiting times dataset.
Figure 4. Fitted Lindley (black), 3PL (blue), and ncL (red) models with corresponding estimated distribution functions for the guinea pigs dataset.
Figure 5. The time plot, observed marginal distribution, ACF, and PACF of the downloads dataset.
Figure 6. The time plot, observed marginal distribution, ACF, and PACF of the strikes dataset.
Figure 7. The ACF of the Pearson residuals (13) for the downloads dataset.
Figure 8. The ACF of the Pearson residuals (13) for the strikes dataset.
Table 1. Summary statistics for the PncL distribution for α = 0.01, β = 0.9, θ = 0.5, and various values of λ.

| λ | 0 | 5 | 10 | 15 | 20 | 25 | 30 |
|---|---|---|---|---|---|---|---|
| Mean | 2.203 | 4.931 | 7.660 | 10.389 | 13.117 | 15.846 | 18.575 |
| Variance | 4.671 | 13.705 | 23.007 | 32.576 | 42.414 | 52.520 | 62.894 |
| DI | 2.121 | 2.779 | 3.004 | 3.136 | 3.233 | 3.314 | 3.386 |
| Skewness | 1.495 | 1.132 | 0.900 | 0.746 | 0.627 | 0.528 | 0.440 |
| Kurtosis | 6.240 | 4.772 | 4.108 | 3.785 | 3.608 | 3.513 | 3.471 |
Table 2. Estimated parameters and goodness of fit statistics of considered models for the banking waiting times dataset.

| Model | β | α | θ | λ | −log L | AIC | KS | p-Value |
|---|---|---|---|---|---|---|---|---|
| Lindley | 0.187 | n/a | n/a | n/a | 319.04 | 640.07 | 0.068 | 0.749 |
| 3PL | 0.211 | −0.554 | 1.477 | n/a | 316.93 | 639.85 | 0.057 | 0.904 |
| ncL | 0.211 | −0.531 | 1.413 | 0.001 | 316.93 | 641.85 | 0.057 | 0.903 |
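The unlabelled column preceding the AIC in Table 2 appears to be the negative maximised log-likelihood. Under that assumption, and assuming one, three and four estimated parameters for the Lindley, 3PL and ncL models respectively, the reported AIC values can be recovered up to rounding from the usual definition AIC = 2k − 2 log L. A minimal sketch of this arithmetic (the function and variable names are illustrative only):

```python
def aic(neg_loglik, n_params):
    """Akaike information criterion from the negative maximised log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik


# (assumed negative log-likelihood, assumed parameter count, reported AIC) from Table 2
table2_fits = {
    "Lindley": (319.04, 1, 640.07),
    "3PL": (316.93, 3, 639.85),
    "ncL": (316.93, 4, 641.85),
}

# Difference between recomputed and reported AIC for each model
aic_gaps = {
    model: aic(nll, k) - reported
    for model, (nll, k, reported) in table2_fits.items()
}
```

The entries of `aic_gaps` are expected to be of the order of the rounding in the table, which also illustrates why the ncL model, with a near-zero estimate of λ, matches the 3PL likelihood but is penalised by the extra parameter.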
Table 3. Estimated parameters and goodness of fit statistics of considered models for the guinea pigs dataset.

| Model | β | α | θ | λ | −log L | AIC | KS | p-Value |
|---|---|---|---|---|---|---|---|---|
| Lindley | 0.011 | n/a | n/a | n/a | 429.28 | 860.56 | 0.17 | 0.031 |
| 3PL | 0.012 | −287.594 | 0.167 | n/a | 428.11 | 862.21 | 0.152 | 0.072 |
| ncL | 0.027 | −6.071 | 55.985 | 5.48 | 426.05 | 860.1 | 0.101 | 0.455 |
Table 4. Estimated parameters and goodness of fit statistics of considered discrete models for the psychiatric dataset.

| Model | β | α | θ | λ | −log L | AIC |
|---|---|---|---|---|---|---|
| PL | 0.616 | n/a | n/a | n/a | 138.37 | 278.74 |
| P-3PL | 0.726 | −1.145 | 4.031 | n/a | 129.39 | 264.78 |
| PncL | 1.665 | −0.802 | 4.58 | 2.711 | 129.36 | 266.73 |
Table 5. Mean, variance, DI, skewness, and kurtosis values of considered discrete models for the psychiatric dataset.

| Model | Mean | Variance | DI | Skewness | Kurtosis |
|---|---|---|---|---|---|
| PL | 2.629 | 7.52 | 2.861 | 1.647 | 6.879 |
| P-3PL | 2.597 | 3.829 | 1.474 | 1.38 | 5.94 |
| PncL | 2.597 | 3.684 | 1.419 | 1.285 | 5.481 |
| Data | 2.587 | 3.668 | 1.413 | 0.98 | 0.231 |
Table 6. Estimated parameters and goodness of fit statistics of considered discrete models for the guinea pigs dataset.

| Model | β | α | θ | λ | −log L | AIC |
|---|---|---|---|---|---|---|
| PL | 0.011 | n/a | n/a | n/a | 429.47 | 860.94 |
| P-3PL | 0.012 | −119.963 | 15.541 | n/a | 428.07 | 862.13 |
| PncL | 0.031 | 8.216 | 73.062 | 7.089 | 425.94 | 859.89 |
Table 7. Mean, variance, DI, skewness, and kurtosis values of considered discrete models for the guinea pigs dataset.

| Model | Mean | Variance | DI | Skewness | Kurtosis |
|---|---|---|---|---|---|
| PL | 176.803 | 15,980.8 | 90.388 | 1.414 | 6.001 |
| P-3PL | 176.821 | 14,178.9 | 80.188 | 1.428 | 6.03 |
| PncL | 176.809 | 9517.5 | 53.829 | 0.92 | 4.174 |
| Data | 176.819 | 10,702.9 | 60.53 | 1.371 | 2.225 |
Table 8. Estimated parameters, goodness of fit statistics, fitted mean, and fitted variance of considered models for the download counts time series dataset.

| Model | Parameter | Estimate | −log L | AIC | Mean ($\mu_{X_t}$) | Variance ($\sigma^2_{X_t}$) |
|---|---|---|---|---|---|---|
| Poisson INAR(1) | p | 0.172 | 634.1 | 1272 | 2.365 | 2.365 |
| | μ | 1.959 | | | | |
| NB INAR(1) | p | 0.154 | 537.9 | 1082 | 2.366 | 7.187 |
| | n | 0.850 | | | | |
| | π | 0.298 | | | | |
| Bell INAR(1) | p | 0.108 | 554.2 | 1112 | 2.367 | 4.240 |
| | θ | 0.877 | | | | |
| P-3PL INAR(1) | p | 0.138 | 538.3 | 1085 | 2.366 | 6.601 |
| | α | 16.180 | | | | |
| | β | 0.399 | | | | |
| | θ | 0.001 | | | | |
| PncL INAR(1) | p | 0.157 | 537.6 | 1085 | 2.366 | 6.995 |
| | α | 2.320 | | | | |
| | β | 1.023 | | | | |
| | θ | 1.290 | | | | |
| | λ | 3.919 | | | | |
Table 9. Estimated parameters, goodness of fit statistics, fitted mean, and fitted variance of considered models for the strikes counts time series dataset.

| Model | Parameter | Estimate | −log L | AIC | Mean ($\mu_{X_t}$) | Variance ($\sigma^2_{X_t}$) |
|---|---|---|---|---|---|---|
| Poisson INAR(1) | p | 0.506 | 234.5 | 473.1 | 4.981 | 4.981 |
| | μ | 2.460 | | | | |
| NB INAR(1) | p | 0.548 | 231.8 | 469.7 | 4.981 | 6.858 |
| | n | 3.858 | | | | |
| | π | 0.632 | | | | |
| Bell INAR(1) | p | 0.579 | 232.1 | 468.2 | 4.981 | 7.741 |
| | θ | 0.875 | | | | |
| P-3PL INAR(1) | p | 0.548 | 232.0 | 472.1 | 4.981 | 7.139 |
| | α | −0.322 | | | | |
| | β | 0.737 | | | | |
| | θ | 8.276 | | | | |
| PncL INAR(1) | p | 0.558 | 231.8 | 473.6 | 4.981 | 6.922 |
| | α | 0.290 | | | | |
| | β | 3.843 | | | | |
| | θ | 13.083 | | | | |
| | λ | 14.191 | | | | |
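The fitted means and variances in Tables 8 and 9 can be related to the thinning parameter p and the innovation moments. The sketch below assumes the standard binomial-thinning INAR(1) relations, $\mu_X = \mu_\varepsilon/(1-p)$ and $\sigma_X^2 = (p\,\mu_\varepsilon + \sigma_\varepsilon^2)/(1-p^2)$, and illustrates them for Poisson innovations only. The function names are hypothetical, and the simulator uses a simple multiplication-based Poisson sampler that is assumed adequate for small innovation means.

```python
import math
import random


def inar1_marginal_moments(p, innov_mean, innov_var):
    """Stationary mean and variance of a binomial-thinning INAR(1) process."""
    mean = innov_mean / (1.0 - p)
    var = (p * innov_mean + innov_var) / (1.0 - p * p)
    return mean, var


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate by multiplying uniforms (suitable for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_inar1(p, mu, n, seed=0):
    """Simulate X_t = p o X_{t-1} + e_t with Poisson(mu) innovations e_t."""
    rng = random.Random(seed)
    # Start from the stationary Poisson(mu / (1 - p)) marginal
    x = poisson_draw(rng, mu / (1.0 - p))
    path = []
    for _ in range(n):
        # Binomial thinning: each of the x previous counts survives with probability p
        survivors = sum(1 for _ in range(x) if rng.random() < p)
        x = survivors + poisson_draw(rng, mu)
        path.append(x)
    return path
```

For the Poisson INAR(1) fit to the strikes data (p = 0.506, μ = 2.460), `inar1_marginal_moments(0.506, 2.460, 2.460)` gives a mean and variance that should agree with the corresponding row of Table 9 up to the rounding of the reported estimates, and the sample mean of a long simulated path should be close to the same value.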
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ferreira, J.; van der Merwe, A. A Noncentral Lindley Construction Illustrated in an INAR(1) Environment. Stats 2022, 5, 70-88. https://0-doi-org.brum.beds.ac.uk/10.3390/stats5010005
