Proceeding Paper

Two-Component Unit Weibull Mixture Model to Analyze Vote Proportions †

by Renata Rojas Guerra 1,*, Fernando A. Peña-Ramírez 2, Charles P. Mafalda 3 and Gauss Moutinho Cordeiro 3

1 Department of Statistics, Centro de Ciências Naturais e Exatas, Universidade Federal de Santa Maria, Santa Maria 97105-900, Brazil
2 Department of Statistics, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá 111321, Colombia
3 Department of Statistics, Centro de Ciências Exatas e da Natureza, Universidade Federal de Pernambuco, Recife 50670-901, Brazil
* Author to whom correspondence should be addressed.
Presented at the 1st International Online Conference on Mathematics and Applications, 1–15 May 2023; Available online: https://iocma2023.sciforum.net/.
Comput. Sci. Math. Forum 2023, 7(1), 45; https://0-doi-org.brum.beds.ac.uk/10.3390/IOCMA2023-14550
Published: 5 May 2023

Abstract

In this paper, we present a two-component unit Weibull mixture model. An important property of this new model is that it accommodates bimodality, which can arise in data from heterogeneous populations. We provide statistical properties such as the quantile function and moments. The expectation-maximization (EM) algorithm is used to obtain maximum likelihood estimates of the model parameters, and a Monte Carlo study is carried out to evaluate the performance of the estimators in finite samples. The relevance of the new model is illustrated with an application to vote proportions in the 2018 Brazilian presidential runoff election. The proportion of votes is an important measure in the analysis of electoral data. Since this variable is restricted to the unit interval, unit distributions should be considered to describe its probabilistic behavior. The introduced model is suitable for capturing the features detected in these data, namely asymmetry, bimodality, and support on the unit interval. In the application, the proposed model fits better than the two-component beta mixture model.

1. Introduction

Finite mixture models date back to Pearson's study of asymmetry in frequency curves of non-homogeneous material [1], and they are useful in the presence of multimodality, heavy tails, and asymmetry [2]. Many works on finite mixtures have since appeared in the literature. For example, Jewell [3] studied mixtures of exponential distributions. For Weibull mixture models, we can cite [4] for characterizations of the failure rate function and [5] for reliability approximations. More recently, Huang and Dong [6] analyzed wave periods in combined sea states using parametric finite mixture models.
For data with bounded support, beta mixture models have been studied by several authors. Ji et al. [7] applied beta mixtures to problems involving correlations of gene expression levels, Bouguila et al. [8] studied Bayesian estimation of finite beta mixtures, and Grün et al. [9] considered beta mixtures in regression models. The Kumaraswamy mixture model is an alternative to beta mixture models; Khalid et al. [10] carried out a Bayesian analysis of the three-component Kumaraswamy mixture.
In this paper, a new two-component mixture model is proposed as an alternative for modeling population heterogeneity on the unit interval. We assume that each mixture component follows a unit Weibull (UW) distribution [11]. The contributions of this new distribution, the two-component unit Weibull mixture (UWUW) model, include: (i) all estimation routines, in both simulations and applications, are performed using the expectation-maximization (EM) algorithm; and (ii) its applicability to electoral data modeling. The EM algorithm is a computational method for calculating the maximum likelihood estimator (MLE) iteratively [12], and it is widely used to obtain maximum likelihood estimates in finite mixture models [13]. Regarding electoral data, the vote proportion is defined as the number of votes obtained in a district divided by the total number of valid votes cast in that district; proportions are useful because electoral districts can vary considerably in population size [14]. This measure can also be used to analyze other characteristics of the electoral process, such as electoral volatility [15] and the nationalization of electoral change [14]. The data set used here refers to the vote proportions in the 2018 Brazilian presidential runoff election.
The rest of the paper is organized as follows. Section 2 presents the new mixture model. Section 3 introduces the EM algorithm for maximum likelihood estimation of the UWUW model. Section 4 presents an application to electoral data. Concluding remarks are given in Section 5.

2. The Proposed Model

In this section, we introduce the two-component unit Weibull mixture distribution, denoted UWUW. Let $X$ be a random variable with UWUW distribution. Its cumulative distribution function (cdf) is
$$F_{\mathrm{UWUW}}(x;\Theta)=p\,F_{\mathrm{UW}}(x;\theta_1)+(1-p)\,F_{\mathrm{UW}}(x;\theta_2)=p\,\tau^{\left(\log x/\log\mu_1\right)^{\beta_1}}+(1-p)\,\tau^{\left(\log x/\log\mu_2\right)^{\beta_2}},\qquad 0<x<1,$$
where $\theta_1=(\mu_1,\beta_1)$, $\theta_2=(\mu_2,\beta_2)$, $\mu_1,\mu_2\in(0,1)$ are location parameters given by the $\tau$th quantiles of the mixture components, $\beta_1,\beta_2>0$ are shape parameters, $p\in(0,1)$ is the mixing proportion, and $\tau\in(0,1)$ is assumed to be known. Note that each mixture component is formulated through a quantile-based parameterization. The advantage of reparameterizing in terms of quantiles is its flexibility for modeling data with heterogeneous conditional distributions [16,17]. The UWUW probability density function (pdf) is given by
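$$f_{\mathrm{UWUW}}(x;\Theta)=p\,f_{\mathrm{UW}}(x;\theta_1)+(1-p)\,f_{\mathrm{UW}}(x;\theta_2)=p\,\frac{\beta_1\log\tau}{x\log\mu_1}\left(\frac{\log x}{\log\mu_1}\right)^{\beta_1-1}\tau^{\left(\log x/\log\mu_1\right)^{\beta_1}}+(1-p)\,\frac{\beta_2\log\tau}{x\log\mu_2}\left(\frac{\log x}{\log\mu_2}\right)^{\beta_2-1}\tau^{\left(\log x/\log\mu_2\right)^{\beta_2}}.$$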
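As a quick check of the quantile parameterization, the component cdf $F_{\mathrm{UW}}(x;\mu,\beta)=\tau^{(\log x/\log\mu)^{\beta}}$ evaluated at $x=\mu$ returns $\tau$, inverting it gives a closed-form component quantile function, and differentiating it recovers the component pdf used above. The sketch below uses only that cdf; the shorthand $r(x)$ and $Q_{\mathrm{UW}}$ is notation introduced here for illustration.

```latex
\begin{align*}
F_{\mathrm{UW}}(\mu;\mu,\beta) &= \tau^{(\log\mu/\log\mu)^{\beta}} = \tau,\\
u = \tau^{(\log x/\log\mu)^{\beta}}
  &\iff \Big(\tfrac{\log x}{\log\mu}\Big)^{\beta} = \tfrac{\log u}{\log\tau}
  \iff x = Q_{\mathrm{UW}}(u;\mu,\beta) = \mu^{(\log u/\log\tau)^{1/\beta}},\quad 0<u<1,\\
f_{\mathrm{UW}}(x;\mu,\beta) &= \frac{d}{dx}\,\tau^{r(x)^{\beta}}
  = \tau^{r(x)^{\beta}}\,(\log\tau)\,\beta\,r(x)^{\beta-1}\,\frac{1}{x\log\mu},
  \qquad r(x) = \frac{\log x}{\log\mu}.
\end{align*}
```

Since $\log\tau<0$ and $\log\mu<0$, the factor $\beta\log\tau/(x\log\mu)$ is positive, so the density is positive on $(0,1)$.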
Figure 1 displays plots of the UWUW pdf for several parameter combinations with $\tau=0.5$, revealing the high flexibility of the new distribution: it accommodates bimodal, unimodal, decreasing, and bathtub shapes with different degrees of asymmetry. A bimodal shape can also be obtained for different values of $p$. Hereafter, we write $X\sim\mathrm{UWUW}(\Theta)$ for a random variable following the UWUW distribution.
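The cdf and pdf above can be evaluated directly. The Python sketch below is illustrative only and is not the authors' R code: all function names are hypothetical, $\tau=0.5$ is assumed as in Figure 1, and the sampler draws the latent component label first and then inverts the chosen component's cdf with the closed-form UW quantile $\mu^{(\log u/\log\tau)^{1/\beta}}$.

```python
import numpy as np

TAU = 0.5  # assumed known quantile level (Figure 1 uses tau = 0.5)

def uw_cdf(x, mu, beta, tau=TAU):
    """Unit Weibull cdf, quantile parameterization: tau ** ((log x / log mu) ** beta)."""
    x = np.asarray(x, dtype=float)
    return tau ** ((np.log(x) / np.log(mu)) ** beta)

def uw_pdf(x, mu, beta, tau=TAU):
    """Unit Weibull pdf: beta*log(tau)/(x*log(mu)) * r**(beta-1) * tau**(r**beta), r = log x / log mu."""
    x = np.asarray(x, dtype=float)
    r = np.log(x) / np.log(mu)
    return beta * np.log(tau) / (x * np.log(mu)) * r ** (beta - 1) * tau ** (r ** beta)

def uw_quantile(u, mu, beta, tau=TAU):
    """Closed-form inverse of uw_cdf: mu ** ((log u / log tau) ** (1 / beta))."""
    u = np.asarray(u, dtype=float)
    return mu ** ((np.log(u) / np.log(tau)) ** (1.0 / beta))

def uwuw_cdf(x, p, mu1, beta1, mu2, beta2, tau=TAU):
    """Two-component mixture cdf."""
    return p * uw_cdf(x, mu1, beta1, tau) + (1 - p) * uw_cdf(x, mu2, beta2, tau)

def uwuw_pdf(x, p, mu1, beta1, mu2, beta2, tau=TAU):
    """Two-component mixture pdf."""
    return p * uw_pdf(x, mu1, beta1, tau) + (1 - p) * uw_pdf(x, mu2, beta2, tau)

def uwuw_sample(n, p, mu1, beta1, mu2, beta2, tau=TAU, rng=None):
    """Draw the latent label Z ~ Bernoulli(p), then X from the chosen component by inversion."""
    rng = np.random.default_rng(rng)
    z = rng.random(n) < p
    u = rng.random(n)
    x = np.where(z, uw_quantile(u, mu1, beta1, tau), uw_quantile(u, mu2, beta2, tau))
    return x, z.astype(int)
```

Drawing the label before the observation mirrors the latent-variable representation that the EM algorithm in Section 3 relies on.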

3. Parameter Estimation

The expectation-maximization (EM) algorithm is a well-known approach to the iterative computation of MLEs when the observations can be regarded as incomplete data. In the context of two-component mixture models, let $\mathbf{x}=\{x_1,\ldots,x_n\}$ be a random sample of size $n$ from a random variable $X$ with the UWUW pdf $f_{\mathrm{UWUW}}(x;\Theta)$ and unknown parameter vector $\Theta=(\theta_1,\theta_2,p)$, where $\theta_1=(\mu_1,\beta_1)$ and $\theta_2=(\mu_2,\beta_2)$. It is customary to call $\mathbf{x}$ the "incomplete data", since it is associated with a second component $\mathbf{z}=\{z_1,\ldots,z_n\}$ of unobserved values of a latent random variable $Z$. Each value $z_i$ indicates the mixture component to which the $i$th observation $x_i$ belongs, such that
$$z_i=\begin{cases}1, & \text{if } x_i \text{ has pdf } f_{\mathrm{UW}}(x;\theta_1),\\ 0, & \text{if } x_i \text{ has pdf } f_{\mathrm{UW}}(x;\theta_2),\end{cases}$$
where $P(Z=1)=p$ and $P(Z=0)=1-p$. The complete-data specification is determined by the joint density of $(X,Z)$,
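$$f_{X,Z}(x_i,z_i;\Theta)=\left[p\,\frac{\beta_1\log\tau}{x_i\log\mu_1}\left(\frac{\log x_i}{\log\mu_1}\right)^{\beta_1-1}\tau^{\left(\log x_i/\log\mu_1\right)^{\beta_1}}\right]^{z_i}\times\left[(1-p)\,\frac{\beta_2\log\tau}{x_i\log\mu_2}\left(\frac{\log x_i}{\log\mu_2}\right)^{\beta_2-1}\tau^{\left(\log x_i/\log\mu_2\right)^{\beta_2}}\right]^{1-z_i},$$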
and based on it, the complete log-likelihood function, for the sample of size n, is given by
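$$\ell_c(\Theta)=\sum_{i=1}^{n}\log f_{X,Z}(x_i,z_i;\Theta)=\sum_{i=1}^{n}z_i\log\left[p\,\frac{\beta_1\log\tau}{x_i\log\mu_1}\left(\frac{\log x_i}{\log\mu_1}\right)^{\beta_1-1}\tau^{\left(\log x_i/\log\mu_1\right)^{\beta_1}}\right]+\sum_{i=1}^{n}(1-z_i)\log\left[(1-p)\,\frac{\beta_2\log\tau}{x_i\log\mu_2}\left(\frac{\log x_i}{\log\mu_2}\right)^{\beta_2-1}\tau^{\left(\log x_i/\log\mu_2\right)^{\beta_2}}\right].$$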
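The weights that replace the unobserved $z_i$ in the E-step follow from Bayes' rule. Because the observations are independent, conditioning on $\mathbf{x}$ reduces to conditioning on $x_i$, and since $P(Z_i=1)=p$ and, given $Z_i=1$, $x_i$ has pdf $f_{\mathrm{UW}}(x;\theta_1)$, the conditional expectation of the indicator is a posterior probability. A short derivation, evaluated at the current estimate $\Theta^{(k)}$:

```latex
\begin{align*}
E_{\Theta^{(k)}}(Z_i \mid x_i) &= P_{\Theta^{(k)}}(Z_i = 1 \mid x_i)
  = \frac{P(Z_i=1)\, f_{\mathrm{UW}}(x_i;\theta_1^{(k)})}{f_{\mathrm{UWUW}}(x_i;\Theta^{(k)})}\\
&= \frac{p^{(k)} f_{\mathrm{UW}}(x_i;\theta_1^{(k)})}
        {p^{(k)} f_{\mathrm{UW}}(x_i;\theta_1^{(k)}) + (1-p^{(k)})\, f_{\mathrm{UW}}(x_i;\theta_2^{(k)})},\\
E_{\Theta^{(k)}}(1 - Z_i \mid x_i) &= 1 - E_{\Theta^{(k)}}(Z_i \mid x_i).
\end{align*}
```

Since $\ell_c(\Theta)$ is linear in each $z_i$, its conditional expectation is obtained simply by substituting these two quantities for $z_i$ and $1-z_i$.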
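The EM algorithm alternates between two steps to compute the MLEs of $\Theta$. In the expectation step (E-step), since the $z_i$ in $\ell_c(\Theta)$ are unobservable, the complete-data log-likelihood is replaced by its conditional expectation with respect to the conditional distribution of $Z$ given $\mathbf{x}$ and the current parameter estimates. Because $\ell_c(\Theta)$ is linear in the $z_i$, this amounts to replacing each $z_i$ by its conditional expectation. More specifically, at the $(k+1)$th iteration, the E-step computes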
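$$Q(\Theta,\Theta^{(k)})=E_{\Theta^{(k)}}\left[\ell_c(\Theta)\mid\mathbf{x}\right]=\sum_{i=1}^{n}\bar z_{i1}\log\left[p\,\frac{\beta_1\log\tau}{x_i\log\mu_1}\left(\frac{\log x_i}{\log\mu_1}\right)^{\beta_1-1}\tau^{\left(\log x_i/\log\mu_1\right)^{\beta_1}}\right]+\sum_{i=1}^{n}\bar z_{i2}\log\left[(1-p)\,\frac{\beta_2\log\tau}{x_i\log\mu_2}\left(\frac{\log x_i}{\log\mu_2}\right)^{\beta_2-1}\tau^{\left(\log x_i/\log\mu_2\right)^{\beta_2}}\right],$$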
where
$$\bar z_{i1}=\frac{p^{(k)}f_{\mathrm{UW}}(x_i;\theta_1^{(k)})}{p^{(k)}f_{\mathrm{UW}}(x_i;\theta_1^{(k)})+(1-p^{(k)})f_{\mathrm{UW}}(x_i;\theta_2^{(k)})},$$
$$\bar z_{i2}=\frac{(1-p^{(k)})f_{\mathrm{UW}}(x_i;\theta_2^{(k)})}{p^{(k)}f_{\mathrm{UW}}(x_i;\theta_1^{(k)})+(1-p^{(k)})f_{\mathrm{UW}}(x_i;\theta_2^{(k)})}=1-\bar z_{i1},$$
and $\Theta^{(k)}=(\theta_1^{(k)},\theta_2^{(k)},p^{(k)})$ denotes the estimate obtained at the $k$th iteration.
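A minimal numpy sketch of the E-step follows, assuming $\tau=0.5$ and using hypothetical function names. It computes $\bar z_{i1}$ on the log scale (via `np.logaddexp`) so that the ratio stays well defined when both weighted densities are tiny; this numerical-stability choice is ours, not something stated in the paper. The Table 1 UWUW estimates are used only as illustrative "current" values.

```python
import numpy as np

TAU = 0.5  # assumed known quantile level

def uw_logpdf(x, mu, beta, tau=TAU):
    """log f_UW(x; mu, beta) in the quantile parameterization."""
    r = np.log(x) / np.log(mu)  # positive for x, mu in (0, 1)
    return (np.log(beta) + np.log(np.log(tau) / np.log(mu)) - np.log(x)
            + (beta - 1) * np.log(r) + r ** beta * np.log(tau))

def e_step(x, p, theta1, theta2, tau=TAU):
    """Return (zbar1, zbar2), the posterior component probabilities of each x_i."""
    x = np.asarray(x, dtype=float)
    log_a = np.log(p) + uw_logpdf(x, *theta1, tau=tau)     # log[p f_UW(x_i; theta1)]
    log_b = np.log1p(-p) + uw_logpdf(x, *theta2, tau=tau)  # log[(1 - p) f_UW(x_i; theta2)]
    zbar1 = np.exp(log_a - np.logaddexp(log_a, log_b))     # a / (a + b), computed stably
    return zbar1, 1.0 - zbar1

# Illustrative "current" estimates (the UWUW values reported in Table 1)
x = np.array([0.1, 0.27, 0.5, 0.65, 0.9])
zbar1, zbar2 = e_step(x, 0.5368, (0.2677, 2.7011), (0.6491, 2.9611))
```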
The maximization step (M-step) maximizes $Q(\Theta,\Theta^{(k)})$ with respect to $\Theta$, that is,
$$\Theta^{(k+1)}=\arg\max_{\Theta}\,Q(\Theta,\Theta^{(k)}).$$
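Because $Q(\Theta,\Theta^{(k)})$ separates into a term in $p$ and one term for each component, the M-step splits into smaller problems: setting the derivative with respect to $p$ to zero gives a closed-form update, while each $\theta_j$ maximizes a weighted UW log-likelihood. This follows directly from the form of $Q$ above:

```latex
\begin{align*}
Q(\Theta,\Theta^{(k)}) &= \sum_{i=1}^n \big[\bar z_{i1}\log p + \bar z_{i2}\log(1-p)\big]
  + \sum_{i=1}^n \bar z_{i1}\log f_{\mathrm{UW}}(x_i;\theta_1)
  + \sum_{i=1}^n \bar z_{i2}\log f_{\mathrm{UW}}(x_i;\theta_2),\\
\frac{\partial Q}{\partial p} &= \frac{\sum_i \bar z_{i1}}{p} - \frac{\sum_i \bar z_{i2}}{1-p} = 0
  \;\Longrightarrow\; p^{(k+1)} = \frac{1}{n}\sum_{i=1}^n \bar z_{i1}
  \quad(\text{using } \bar z_{i1}+\bar z_{i2}=1),\\
\theta_j^{(k+1)} &= \arg\max_{\theta_j}\ \sum_{i=1}^n \bar z_{ij}\log f_{\mathrm{UW}}(x_i;\theta_j),\qquad j=1,2.
\end{align*}
```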
The vector $\Theta^{(k+1)}$ is then used in the next iteration. The EM algorithm is initialized with starting values $\Theta^{(0)}=(\theta_1^{(0)},\theta_2^{(0)},p^{(0)})$, and the MLE $\widehat\Theta$ of $\Theta$ is taken as $\widehat\Theta=\Theta^{(k+1)}$ once the convergence criterion $|\Theta^{(k+1)}-\Theta^{(k)}|<\varepsilon$ is met [12]. We set $\varepsilon=10^{-4}$. No closed-form solution is available for the component parameters $(\mu_j,\beta_j)$, so this maximization must be carried out with an iterative technique such as Newton–Raphson's method [18].
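The full iteration can be sketched in Python as follows. This is a hedged, numpy-only illustration rather than the authors' implementation: the $p$ update uses the closed form $n^{-1}\sum_i\bar z_{i1}$, and, instead of Newton–Raphson, each weighted component maximization uses the fact that $Y=-\log X$ is Weibull with shape $\beta$ and scale $\lambda=-\log\mu/(-\log\tau)^{1/\beta}$ when $X$ is UW, so $\beta$ solves a monotone one-dimensional equation (found here by bisection) and $\mu$ is recovered from $\lambda$. The tolerance, starting values, $\tau=0.5$, the max-norm stopping rule, the simulated "true" values, and all function names are assumptions for illustration.

```python
import numpy as np

TAU = 0.5  # assumed known quantile level

def uw_logpdf(x, mu, beta, tau=TAU):
    r = np.log(x) / np.log(mu)
    return (np.log(beta) + np.log(np.log(tau) / np.log(mu)) - np.log(x)
            + (beta - 1) * np.log(r) + r ** beta * np.log(tau))

def weighted_uw_mle(x, w, tau=TAU, lo=0.01, hi=100.0, iters=60):
    """Maximize sum_i w_i log f_UW(x_i; mu, beta) via Y = -log X ~ Weibull(beta, lam)."""
    y = -np.log(x)
    ly = np.log(y)
    w = w / w.sum()
    ymax = y.max()  # rescaling y avoids overflow in y ** beta

    def score(beta):  # weighted Weibull profile equation for beta; increasing in beta
        yb = (y / ymax) ** beta
        return np.sum(w * yb * ly) / np.sum(w * yb) - 1.0 / beta - np.sum(w * ly)

    for _ in range(iters):  # bisection on a log scale
        mid = np.sqrt(lo * hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = np.sqrt(lo * hi)
    lam = ymax * np.sum(w * (y / ymax) ** beta) ** (1.0 / beta)  # weighted Weibull scale MLE
    mu = np.exp(-lam * (-np.log(tau)) ** (1.0 / beta))           # back to the tau-quantile
    return mu, beta

def loglik(x, p, th1, th2, tau=TAU):
    la = np.log(p) + uw_logpdf(x, *th1, tau=tau)
    lb = np.log1p(-p) + uw_logpdf(x, *th2, tau=tau)
    return np.sum(np.logaddexp(la, lb))

def em_uwuw(x, p, th1, th2, tau=TAU, eps=1e-4, max_iter=1000):
    """EM for UWUW; stops when max |Theta^(k+1) - Theta^(k)| < eps."""
    theta = np.array([*th1, *th2, p], dtype=float)
    for _ in range(max_iter):
        la = np.log(p) + uw_logpdf(x, *th1, tau=tau)      # E-step on the log scale
        lb = np.log1p(-p) + uw_logpdf(x, *th2, tau=tau)
        z1 = np.exp(la - np.logaddexp(la, lb))            # zbar_i1; zbar_i2 = 1 - zbar_i1
        p = z1.mean()                                     # closed-form M-step for p
        th1 = weighted_uw_mle(x, z1, tau)                 # numerical M-step, component 1
        th2 = weighted_uw_mle(x, 1.0 - z1, tau)           # numerical M-step, component 2
        new = np.array([*th1, *th2, p])
        if np.max(np.abs(new - theta)) < eps:
            break
        theta = new
    return p, th1, th2

# Usage on simulated data (true values chosen arbitrarily for illustration)
rng = np.random.default_rng(2023)
n = 5000
z = rng.random(n) < 0.55
q = np.log(rng.random(n)) / np.log(TAU)  # inverse-cdf draws: mu ** (q ** (1 / beta))
x = np.where(z, 0.27 ** (q ** (1 / 2.7)), 0.65 ** (q ** (1 / 3.0)))
p_hat, th1_hat, th2_hat = em_uwuw(x, 0.5, (0.2, 2.0), (0.7, 2.0))
```

Starting with $\mu_1^{(0)}<\mu_2^{(0)}$ keeps the component labels in the same order as the simulated truth; the profile-equation route is just one way to carry out the numerical M-step.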

4. Application

In what follows, we present a case study illustrating the suitability of the UWUW distribution for modeling real data on the unit interval. The data set consists of the vote proportions obtained by the winning candidate in each municipality in the 2018 Brazilian presidential runoff election. Since these data have a bimodal shape (see Figure 2a), a unimodal distribution would not be appropriate, and the UWUW distribution is a suitable alternative. Its performance is compared with other mixtures of double-bounded components that have already been studied in the literature: the two-component beta mixture (BB) and the two-component Kumaraswamy mixture (KWKW) models. We adopt the parameterization proposed in [19] for the BB model, whose pdf is
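$$f_{\mathrm{BB}}(x;\Theta)=p\,\frac{\Gamma(\beta_1)}{\Gamma(\mu_1\beta_1)\,\Gamma((1-\mu_1)\beta_1)}\,x^{\mu_1\beta_1-1}(1-x)^{(1-\mu_1)\beta_1-1}+(1-p)\,\frac{\Gamma(\beta_2)}{\Gamma(\mu_2\beta_2)\,\Gamma((1-\mu_2)\beta_2)}\,x^{\mu_2\beta_2-1}(1-x)^{(1-\mu_2)\beta_2-1},\qquad 0<x<1,$$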
where $\Theta=(\mu_1,\mu_2,\beta_1,\beta_2,p)$, $\mu_1,\mu_2\in(0,1)$ are the means of the mixture components, $\beta_1,\beta_2>0$ are precision parameters, and $p\in(0,1)$ is the mixing proportion.
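Under the mean–precision parameterization of [19] described above, the $j$th beta component has shape parameters $a_j=\mu_j\beta_j$ and $b_j=(1-\mu_j)\beta_j$, so its mean is $\mu_j$. The sketch below (hypothetical names; Python rather than the R code used in the paper) evaluates the BB density with `math.lgamma` for the normalizing constant and numerically checks, at the BB estimates in Table 1, that it integrates to one and that the first component has mean $\hat\mu_1$.

```python
import numpy as np
from math import lgamma

def beta_mp_pdf(x, mu, phi):
    """Beta pdf with mean mu and precision phi (phi plays the role of beta_j in the text)."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_const = lgamma(phi) - lgamma(a) - lgamma(b)  # log Gamma(a+b)/(Gamma(a)Gamma(b)), a + b = phi
    x = np.asarray(x, dtype=float)
    return np.exp(log_const + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))

def bb_pdf(x, p, mu1, phi1, mu2, phi2):
    """Two-component beta mixture density."""
    return p * beta_mp_pdf(x, mu1, phi1) + (1 - p) * beta_mp_pdf(x, mu2, phi2)

def trapezoid(y, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# BB estimates from Table 1, ordered as (p, mu1, beta1, mu2, beta2)
est = (0.7268, 0.5816, 9.7510, 0.1985, 29.3260)
grid = np.linspace(1e-6, 1 - 1e-6, 100001)
area = trapezoid(bb_pdf(grid, *est), grid)
mean1 = trapezoid(grid * beta_mp_pdf(grid, est[1], est[2]), grid)
```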
For all competing mixture models, parameter estimation is carried out with the EM algorithm following the steps described in Section 3. The corrected Anderson–Darling ($A^*$) [20], Cramér–von Mises ($W^*$) [21], and Kolmogorov–Smirnov (KS) [22] statistics are computed to assess the goodness of fit of the three fitted models; lower values indicate a better fit. All analyses are performed in the R programming language, and the goodness-of-fit measures are computed with the AdequacyModel package [23].
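For readers who want to reproduce such criteria outside R, the sketch below computes the Kolmogorov–Smirnov distance together with corrected Cramér–von Mises and Anderson–Darling statistics for a fitted cdf. It follows our reading of the Chen–Balakrishnan procedure [20]: transform the fitted cdf values to normal scores, standardize them, map them back through the standard normal cdf, and apply small-sample correction factors. This is an assumption, not a re-implementation of AdequacyModel, and its output is not guaranteed to match that package exactly. Only numpy and the standard-library `statistics.NormalDist` are used; the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def gof_stats(x, cdf):
    """Return (W*, A*, KS) for data x and a fitted, vectorized cdf with values in (0, 1)."""
    nd = NormalDist()                             # standard normal
    v = np.sort(cdf(np.asarray(x, dtype=float)))  # fitted cdf at the ordered data
    n = v.size
    i = np.arange(1, n + 1)
    ks = max(np.max(i / n - v), np.max(v - (i - 1) / n))
    # Normal scores, standardized with the sample sd, mapped back through Phi
    y = np.array([nd.inv_cdf(t) for t in v])
    u = np.array([nd.cdf(t) for t in (y - y.mean()) / y.std(ddof=1)])
    w2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    a2 = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))
    w_star = w2 * (1 + 0.5 / n)                   # small-sample corrections
    a_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    return float(w_star), float(a_star), float(ks)

# Toy check: five points against the uniform cdf on (0, 1)
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
w_star, a_star, ks = gof_stats(x, lambda t: t)
```

In practice `cdf` would be a fitted mixture cdf evaluated at the estimates; values of exactly 0 or 1 must be avoided because `inv_cdf` requires arguments strictly inside (0, 1).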
Table 1 reports the parameter estimates, standard errors, and model comparison criteria for the fitted models. The UWUW distribution yields the lowest values of all goodness-of-fit statistics. The KWKW model performs worst and is not an adequate alternative for these data.
Figure 2a presents the histogram of the vote proportions overlaid with the estimated densities of the fitted models. The bimodality of the data is confirmed, and the UWUW model provides the closest fit to the histogram, whereas the KWKW model is clearly inadequate. Figure 2b compares the empirical and estimated cdfs; this visual inspection corroborates Figure 2a and Table 1, indicating that the proposed model is appropriate for these data. Thus, the UWUW model can be an effective alternative for analyzing vote proportions: it is competitive with the BB model and consistently outperforms the KWKW model. It therefore provides a useful tool for modeling bimodal data restricted to the unit interval. Moreover, the estimated mixing proportion $\hat p=0.5368$ indicates that more than 50% of the observations belong to the first mixture component. The estimated median of the first component is $\hat\mu_1=0.2677$, and that of the second component is $\hat\mu_2=0.6491$.
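Because the text interprets $\hat\mu_1$ and $\hat\mu_2$ as component medians, the fit corresponds to $\tau=0.5$; that value is an assumption in this sketch. The snippet below (hypothetical names) plugs the UWUW estimates from Table 1 into the mixture cdf, confirms that each $\hat\mu_j$ is the median of its own component, and inverts the mixture cdf by bisection to obtain the median of the fitted mixture as a whole. With $\mu_1<\mu_2$, that overall median must lie strictly between the two component medians, since $F(\mu_1)<0.5<F(\mu_2)$.

```python
import numpy as np

TAU = 0.5  # assumed: reading mu_1 and mu_2 as medians implies tau = 0.5

def uw_cdf(x, mu, beta, tau=TAU):
    """Unit Weibull cdf in the quantile parameterization."""
    return tau ** ((np.log(x) / np.log(mu)) ** beta)

# UWUW estimates reported in Table 1
p, mu1, b1, mu2, b2 = 0.5368, 0.2677, 2.7011, 0.6491, 2.9611

def mix_cdf(x):
    return p * uw_cdf(x, mu1, b1) + (1 - p) * uw_cdf(x, mu2, b2)

def mix_quantile(q, lo=1e-12, hi=1 - 1e-12, iters=100):
    """Invert the increasing mixture cdf by bisection (no closed form is assumed)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Each estimated mu_j is the tau-quantile (here the median) of its own component
component_medians = (float(uw_cdf(mu1, mu1, b1)), float(uw_cdf(mu2, mu2, b2)))
mixture_median = mix_quantile(0.5)
```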

5. Conclusions

A two-component mixture model was defined to describe heterogeneous populations on a bounded domain. The two-component unit Weibull mixture (UWUW) model is formulated by assuming that each mixture component follows the unit Weibull distribution. Some of the main properties of the UWUW model, such as the ordinary moments, have been presented. The EM algorithm was used to obtain maximum likelihood estimates of the model parameters, and Monte Carlo simulations were performed to evaluate its performance. An application to electoral data illustrates the importance and potential of the new model. The motivating data set consists of the vote proportions obtained by the winning candidate in the 2018 Brazilian presidential runoff election. The results indicate that our proposal is adequate for this data set, since it captures its asymmetric and bimodal behavior. From the mixing parameter estimate, an estimated 53.68% of the observations come from the first mixture component, with estimated median $\hat\mu_1=0.2677$. The estimated median for the municipalities in the second mixture component is $\hat\mu_2=0.6491$. This application provides empirical evidence that the UWUW model can outperform two-component mixture models based on other well-known unit distributions, such as the beta and Kumaraswamy distributions.

Author Contributions

Conceptualization, R.R.G.; methodology, R.R.G. and F.A.P.-R.; software, R.R.G.; validation, R.R.G. and F.A.P.-R.; formal analysis, R.R.G.; investigation, R.R.G., F.A.P.-R. and C.P.M.; resources, C.P.M.; data curation, R.R.G.; writing—original draft preparation, R.R.G., F.A.P.-R. and C.P.M.; writing—review and editing, G.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pearson, K. Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 1894, 185, 71–110.
2. Lachos, V.H.; Moreno, E.J.L.; Chen, K.; Cabral, C.R.B. Finite mixture modeling of censored data using the multivariate Student-t distribution. J. Multivar. Anal. 2017, 159, 151–167.
3. Jewell, N.P. Mixtures of exponential distributions. Ann. Stat. 1982, 10, 479–484.
4. Jiang, R.; Murthy, D. Mixture of Weibull distributions-parametric characterization of failure rate function. Appl. Stoch. Model. Data Anal. 1998, 14, 47–65.
5. Bučar, T.; Nagode, M.; Fajdiga, M. Reliability approximation using finite Weibull mixture distributions. Reliab. Eng. Syst. Saf. 2004, 84, 241–251.
6. Huang, W.; Dong, S. Probability distribution of wave periods in combined sea states with finite mixture models. Appl. Ocean Res. 2019, 92, 101938.
7. Ji, Y.; Wu, C.; Liu, P.; Wang, J.; Coombes, K.R. Applications of beta-mixture models in bioinformatics. Bioinformatics 2005, 21, 2118–2122.
8. Bouguila, N.; Ziou, D.; Monga, E. Practical Bayesian estimation of a finite beta mixture through Gibbs sampling and its applications. Stat. Comput. 2006, 16, 215–225.
9. Grün, B.; Kosmidis, I.; Zeileis, A. Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned; Technical Report, Working Papers in Economics and Statistics; University of Innsbruck, Research Platform Empirical and Experimental Economics (EEECON): Innsbruck, Austria, 2011.
10. Khalid, M.; Aslam, M.; Sindhu, T.N. Bayesian analysis of 3-components Kumaraswamy mixture model: Quadrature method vs. Importance sampling. Alex. Eng. J. 2020, 59, 2753–2763.
11. Mazucheli, J.; Menezes, A.; Ghitany, M. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
12. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
13. Redner, R.A.; Walker, H.F. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 1984, 26, 195–239.
14. Alemán, E.; Kellam, M. The nationalization of presidential elections in the Americas. Elect. Stud. 2017, 47, 125–135.
15. Powell, E.N.; Tucker, J.A. Revisiting electoral volatility in post-communist countries: New data, new results and new approaches. Br. J. Political Sci. 2013, 44, 123–147.
16. Bayes, C.L.; Bazán, J.L.; De Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Its Interface 2017, 10, 483–493.
17. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J. Appl. Stat. 2020, 47, 954–974.
18. Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical Recipes 3rd Edition: The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 2007.
19. Ferrari, S.L.P.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
20. Chen, G.; Balakrishnan, N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995, 27, 154–161.
21. Durbin, J.; Knott, M. Components of Cramér-Von Mises statistics. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 290–307.
22. Goodman, L.A. Kolmogorov-Smirnov tests for psychological research. Psychol. Bull. 1954, 51, 160–168.
23. Marinho, P.R.D.; Silva, R.B.; Bourguignon, M.; Cordeiro, G.M.; Nadarajah, S. AdequacyModel: An R package for probability distributions and general purpose optimization. PLoS ONE 2019, 14, e0221487.
Figure 1. Plots of the UWUW density for some parameter values: (a) $p=0.4$; (b) $p=0.6$ and $\beta_1=3.0$.
Figure 2. (a) Histogram of the vote proportions with the estimated densities and (b) empirical and estimated cdfs of the fitted BB, KWKW, and UWUW models.
Table 1. Parameter estimates and standard errors (given in parentheses) for the models fitted to Bolsonaro’s vote proportion in Brazilian presidential elections in 2018.
Model  μ̂1        μ̂2        β̂1        β̂2        p̂        W*       A*       KS
BB     0.5816    0.1985    9.7510    29.3260   0.7268   1.2937   7.4584   0.0477
       (0.0035)  (0.0026)  (0.3201)  (1.3521)  -
UWUW   0.2677    0.6491    2.7011    2.9611    0.5368   0.4119   3.6768   0.0153
       (0.0039)  (0.0027)  (0.0545)  (0.0567)  -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Guerra, R.R.; Peña-Ramírez, F.A.; Mafalda, C.P.; Cordeiro, G.M. Two-Component Unit Weibull Mixture Model to Analyze Vote Proportions. Comput. Sci. Math. Forum 2023, 7, 45. https://0-doi-org.brum.beds.ac.uk/10.3390/IOCMA2023-14550

AMA Style

Guerra RR, Peña-Ramírez FA, Mafalda CP, Cordeiro GM. Two-Component Unit Weibull Mixture Model to Analyze Vote Proportions. Computer Sciences & Mathematics Forum. 2023; 7(1):45. https://0-doi-org.brum.beds.ac.uk/10.3390/IOCMA2023-14550

Chicago/Turabian Style

Guerra, Renata Rojas, Fernando A. Peña-Ramírez, Charles P. Mafalda, and Gauss Moutinho Cordeiro. 2023. "Two-Component Unit Weibull Mixture Model to Analyze Vote Proportions" Computer Sciences & Mathematics Forum 7, no. 1: 45. https://0-doi-org.brum.beds.ac.uk/10.3390/IOCMA2023-14550
