Machine Learning and Statistical Modeling with Applications in Real-World Data and Artificial Intelligence

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Computational and Applied Mathematics".

Deadline for manuscript submissions: closed (3 July 2023) | Viewed by 76521

Special Issue Editor


Guest Editor
School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
Interests: advanced applied multivariate analysis; artificial intelligence, deep learning, and machine learning; big data, business intelligence, data mining, and data science; statistical learning and modeling

Special Issue Information

Dear Colleagues,

The focus of this Special Issue is machine learning and statistical modeling for facing real-world problems, such as COVID-19 pandemic-related data. We welcome contributions in artificial intelligence, classification, and supervised/unsupervised learning, as well as in the topics detailed below. We especially encourage interdisciplinary works.

This Special Issue seeks submissions including, but not limited to, those in applied data science with potential applications in COVID-19 and an emphasis on the following areas:

(i) Artificial intelligence.

(ii) Bayesian methods.

(iii) Big data, high dimensionality, and large-scale data analysis.

(iv) Deep and statistical learning.

(v) Machine learning.

(vi) Evolutionary-based, game-based, physics-based, and swarm-based algorithms, among others.

(vii) Multivariate analysis such as clustering, PCA, and PLS, among others.

(viii) Statistical modeling and its diagnostics.

Prof. Dr. Victor Leiva
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • artificial neural networks
  • big data, big data analytics, and big data science
  • bioinformatics, health informatics, and bio-computing
  • coronavirus disease, COVID-19, and SARS-CoV-2
  • data analytics, data mining, and expert systems
  • decision support systems and knowledge discovery in databases
  • deep learning, machine learning, and statistical learning
  • digital transformation and digitization
  • monitoring/recognizing/forecasting of emotions and sentiment analysis
  • multivariate analysis
  • optimization algorithms
  • predictive models and analytics using artificial intelligence
  • statistical analysis/modeling and its diagnostics

Published Papers (29 papers)


Research


22 pages, 637 KiB  
Article
Exploring Low-Risk Anomalies: A Dynamic CAPM Utilizing a Machine Learning Approach
by Jiawei Wang and Zhen Chen
Mathematics 2023, 11(14), 3220; https://0-doi-org.brum.beds.ac.uk/10.3390/math11143220 - 22 Jul 2023
Cited by 2 | Viewed by 1572
Abstract
Low-risk pricing anomalies, characterized by lower returns in higher-risk stocks, are prevalent in equity markets and challenge traditional asset pricing theory. Previous studies primarily relied on linear regression methods, which analyze a limited number of factors and overlook the advantages of machine learning in handling high-dimensional data. This study aims to address these anomalies in the Chinese market by employing machine learning techniques to measure systematic risk. A large dataset consisting of 770 variables, encompassing macroeconomic, micro-firm, and cross-effect factors, was constructed to develop a machine learning-based dynamic capital asset pricing model. Additionally, we investigated the differences in factors influencing time-varying beta between state-owned enterprises (SOEs) and non-SOEs, providing economic explanations for the black-box issues. Our findings demonstrated the effectiveness of random forest and neural networks, with the four-layer neural network performing best and leading to a substantial rise in the excess return of the long–short portfolio, up to 0.36%. Notably, liquidity indicators emerged as the primary drivers influencing beta, followed by momentum. Moreover, our analysis revealed a shift in variable importance during the transition from SOEs to non-SOEs, as liquidity and momentum gradually replaced fundamentals and valuation as key determinants. This research contributes to both theoretical and practical domains by bridging the research gap in incorporating machine learning methods into asset pricing research. Full article

21 pages, 3541 KiB  
Article
Predicting Depression during the COVID-19 Pandemic Using Interpretable TabNet: A Case Study in South Korea
by Hung Viet Nguyen and Haewon Byeon
Mathematics 2023, 11(14), 3145; https://0-doi-org.brum.beds.ac.uk/10.3390/math11143145 - 17 Jul 2023
Cited by 1 | Viewed by 1367
Abstract
COVID-19 has further aggravated problems by compelling people to stay indoors and limit social interactions, leading to a worsening of the depression situation. This study aimed to construct a TabNet model combined with SHapley Additive exPlanations (SHAP) to predict depression in South Korean society during the COVID-19 pandemic. We used a tabular dataset extracted from the Seoul Welfare Survey with a total of 3027 samples. The TabNet model was trained on this dataset, and its performance was compared to that of several other machine learning models, including Random Forest, eXtreme Gradient Boosting, Light Gradient Boosting, and CatBoost. According to the results, the TabNet model achieved an Area under the receiver operating characteristic curve value (AUC) of 0.9957 on the training set and an AUC of 0.9937 on the test set. Additionally, the study investigated the TabNet model’s local interpretability using SHapley Additive exPlanations (SHAP) to provide post hoc global and local explanations for the proposed model. By combining the TabNet model with SHAP, our proposed model might offer a valuable tool for professionals in social fields, and psychologists without expert knowledge in the field of data analysis can easily comprehend the decision-making process of this AI model. Full article

21 pages, 31655 KiB  
Article
Carrier Phase Residual Modeling and Fault Monitoring Using Short-Baseline Double Difference and Machine Learning
by Dong-Kyeong Lee, Yebin Lee and Byungwoon Park
Mathematics 2023, 11(12), 2696; https://0-doi-org.brum.beds.ac.uk/10.3390/math11122696 - 14 Jun 2023
Cited by 5 | Viewed by 1148
Abstract
Global Navigation Satellite Systems (GNSS) are used to provide accurate position, navigation, and time (PNT) information to users in various sectors of our society including transportation. Augmentation systems such as differential GNSS (DGNSS), real-time kinematics (RTK), and Precise Point Positioning (PPP) improve the GNSS performance, and providing reliable measurements from their reference stations is crucial. To ensure safe and accurate PNT solutions, code and carrier measurements must be monitored for potential faults or performance degradation. Although there exist numerous methods to model and monitor the measurements, research on the carrier phase measurements is not as extensive as that on the code measurements. This paper introduces a split of residuals into receiver noise and multipath components to customize their estimation according to their respective statistical properties. This study also proposes a method to use machine learning-based non-linear regression to effectively model and monitor potential faults in the GNSS measurements including the carrier phase. A training dataset is used to model the nominal quantities of GNSS measurement residuals, and inflation factors are applied to over-bound the fault-free residuals. These inflated residuals are coupled with uncertainty factors to compute thresholds for monitoring carrier phase residuals, and the effectiveness of the thresholds is validated with a test dataset by achieving a false alarm rate of 6.61 × 10⁻⁶, slightly lower than the desired level of 10⁻⁵. Full article

16 pages, 4260 KiB  
Article
COVID-19 Genome Sequence Analysis for New Variant Prediction and Generation
by Amin Ullah, Khalid Mahmood Malik, Abdul Khader Jilani Saudagar, Muhammad Badruddin Khan, Mozaherul Hoque Abul Hasanat, Abdullah AlTameem, Mohammed AlKhathami and Muhammad Sajjad
Mathematics 2022, 10(22), 4267; https://0-doi-org.brum.beds.ac.uk/10.3390/math10224267 - 15 Nov 2022
Cited by 7 | Viewed by 2326
Abstract
The new COVID-19 variants of concern are causing more infections and spreading much faster than their predecessors. Recent cases show that even vaccinated people are highly affected by these new variants. The proactive nucleotide sequence prediction of possible new variants of COVID-19 and developing better healthcare plans to address their spread require a unified framework for variant classification and early prediction. This paper attempts to answer the following research questions: can a convolutional neural network with self-attention by extracting discriminative features from nucleotide sequences be used to classify COVID-19 variants? Second, is it possible to employ uncertainty calculation in the predicted probability distribution to predict new variants? Finally, can synthetic approaches such as variational autoencoder-decoder networks be employed to generate a synthetic new variant from random noise? Experimental results show that the generated sequence is significantly similar to the original coronavirus and its variants, proving that our neural network can learn the mutation patterns from the old variants. Moreover, to our knowledge, we are the first to collect data for all COVID-19 variants for computational analysis. The proposed framework is extensively evaluated for classification, new variant prediction, and new variant generation tasks and achieves better performance for all tasks. Our code, data, and trained models are available on GitHub (https://github.com/Aminullah6264/COVID19, accessed on 16 September 2022). Full article

17 pages, 4153 KiB  
Article
It’s Your Turn, Are You Ready to Get Vaccinated? Towards an Exploration of Vaccine Hesitancy Using Sentiment Analysis of Instagram Posts
by Mohammed Talha Alam, Shahab Saquib Sohail, Syed Ubaid, Shakil, Zafar Ali, Mohammad Hijji, Abdul Khader Jilani Saudagar and Khan Muhammad
Mathematics 2022, 10(22), 4165; https://0-doi-org.brum.beds.ac.uk/10.3390/math10224165 - 08 Nov 2022
Cited by 6 | Viewed by 2619
Abstract
The deadly threat caused by the rapid spread of COVID-19 has been restricted by virtue of vaccines. However, there is misinformation regarding the certainty and positive outcomes of getting vaccinated; hence, many people are reluctant to opt for it. Therefore, in this paper, we identified public sentiments and hesitancy toward the COVID-19 vaccines based on Instagram posts as part of intelligent surveillance. We first retrieved more than 10k publicly available comments and captions posted under different vaccine hashtags (namely, covaxin, covishield, and sputnik). Next, we translated the extracted comments into a common language (English), followed by the calculation of the polarity score of each comment, which helped identify the vaccine sentiments and opinions in the comments (positive, negative, and neutral) with an accuracy of more than 80%. Moreover, upon analysing the sentiments, we found that covaxin received 71.4% positive, 18.5% neutral, and 10.1% negative comments; covishield obtained 64.2% positive, 24.5% neutral, and 11.3% negative posts; and sputnik received 55.8% positive, 15.5% neutral, and 28.7% negative sentiments. Understanding vaccination perceptions and views through Instagram comments, captions, and posts is helpful for public health officials seeking to enhance vaccine uptake by promoting positive marketing and reducing negative marketing. In addition to this, some interesting future directions are also suggested considering the investigated problem. Full article

23 pages, 5038 KiB  
Article
A Comprehensive Analysis of Chinese, Japanese, Korean, US-PIMA Indian, and Trinidadian Screening Scores for Diabetes Risk Assessment and Prediction
by Norma Latif Fitriyani, Muhammad Syafrudin, Siti Maghfirotul Ulyah, Ganjar Alfian, Syifa Latif Qolbiyani and Muhammad Anshari
Mathematics 2022, 10(21), 4027; https://0-doi-org.brum.beds.ac.uk/10.3390/math10214027 - 30 Oct 2022
Cited by 2 | Viewed by 1753
Abstract
Risk assessment and developing predictive models for diabetes prevention is considered an important task. Therefore, we proposed to analyze and provide a comprehensive analysis of the performance of diabetes screening scores for risk assessment and prediction in five populations: the Chinese, Japanese, Korean, US-PIMA Indian, and Trinidadian populations, utilizing statistical and machine learning (ML) methods. Additionally, due to the present COVID-19 epidemic, it is necessary to investigate how diabetes and COVID-19 are related to one another. Thus, by using a sample of the Korean population, the interrelationship between diabetes and COVID-19 was further investigated. The results revealed that by using a statistical method, the optimal cut points among Chinese, Japanese, Korean, US-PIMA Indian, and Trinidadian populations were 6.205 mmol/L (FPG), 5.523 mmol/L (FPG), and 5.375% (HbA1c), 150.50–106.50 mg/dL (FBS), 123.50 mg/dL (2hPG), and 107.50 mg/dL (FBG), respectively, with AUC scores of 0.97, 0.80, 0.78, 0.85, 0.79, and 0.905. The results also confirmed that diabetes has a significant relationship with COVID-19 in the Korean population (p-value 0.001), with an adjusted OR of 1.21. Finally, the overall best ML models were performed by Naïve Bayes with AUC scores of 0.736, 0.75, and 0.83 in the Japanese, Korean, and Trinidadian populations, respectively. Full article

22 pages, 4142 KiB  
Article
Hybrid Deep Learning Applied on Saudi Smart Grids for Short-Term Load Forecasting
by Abdullah Alrasheedi and Abdulaziz Almalaq
Mathematics 2022, 10(15), 2666; https://0-doi-org.brum.beds.ac.uk/10.3390/math10152666 - 28 Jul 2022
Cited by 10 | Viewed by 2220
Abstract
Despite advancements in smart grid (SG) technology, effective load forecasting utilizing big data or large-scale datasets remains a complex task for energy management, planning, and control. The Saudi SGs, in alignment with the Saudi Vision 2030, have been envisioned as future electrical grids with a bidirectional flow of power and data. To that end, data analysis and predictive models can enhance Saudi SG planning and control via artificial intelligence (AI). Recently, many AI methods including deep learning (DL) algorithms for SG applications have been published in the literature and have shown superior time series predictions compared with conventional prediction models. Current load-prediction research for the Saudi grid focuses on identifying anticipated loads and consumptions, on utilizing limited historical data and the behavior of the load’s consumption, and on conducting shallow forecasting models. However, little scientific proof on complex DL models or real-life application has been conducted by researchers; few articles have studied sophisticated large-scale prediction models for Saudi grids. This paper proposes hybrid DL methods to enhance the outcomes in Saudi SG load forecasting, to improve problem-relevant features, and to accurately predict complicated power consumption, with the goal of developing reliable forecasting models and of obtaining knowledge of the relationships between the various features and attributes in the Saudi SGs. The model in this paper utilizes a real dataset from the Jeddah and Medinah grids in Saudi Arabia for a full year, 2021, with a one-hour time resolution. A benchmark strategy using different conventional DL methods including artificial neural network, recurrent neural network (RNN), convolutional neural networks (CNN), long short-term memory (LSTM), gated recurrent unit (GRU), and different real datasets is used to verify the proposed models. The prediction results demonstrate the effectiveness of the proposed hybrid DL models, with CNN–GRU and CNN–RNN with NRMSE obtaining 1.4673% and 1.222% improvements, respectively, in load forecasting accuracy. Full article

16 pages, 2447 KiB  
Article
Ensemble Voting Regression Based on Machine Learning for Predicting Medical Waste: A Case from Turkey
by Babek Erdebilli and Burcu Devrim-İçtenbaş
Mathematics 2022, 10(14), 2466; https://0-doi-org.brum.beds.ac.uk/10.3390/math10142466 - 15 Jul 2022
Cited by 17 | Viewed by 3648
Abstract
Predicting medical waste (MW) properly is vital for an effective waste management system (WMS), but it is difficult because of inadequate data and various factors that impact MW. This study’s primary objective was to develop an ensemble voting regression algorithm based on machine learning (ML) algorithms such as random forests (RFs), gradient boosting machines (GBMs), and adaptive boosting (AdaBoost) to predict the MW for Istanbul, the largest city in Turkey. This was the first study to use ML algorithms to predict MW, to our knowledge. First, three ML algorithms were developed based on official data. To compare their performances, performance measures such as mean absolute deviation (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R-squared) were calculated. Among the standalone ML models, RF achieved the best performance. Then, these base models were used to construct the proposed ensemble voting regression (VR) model utilizing weighted averages according to the base models’ performances. The proposed model outperformed three baseline models, with the lowest RMSE (843.70). This study gives an effective tool to practitioners and decision-makers for planning and constructing medical waste management systems by predicting the MW quantity. Full article
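As a rough illustration of the weighted-average voting idea mentioned in this abstract, the sketch below combines the predictions of several base regressors using weights derived from their errors. The abstract does not state the weighting rule, so the inverse-RMSE weights, the function names, and the toy numbers used here are assumptions for illustration only, not the authors' method.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def inverse_rmse_weights(errors):
    """Assumed weighting rule: weight each base model by 1/RMSE, normalized to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_vote(predictions, weights):
    """Weighted average of base-model predictions.

    predictions: one list of predicted values per base model.
    weights: one weight per base model, in the same order.
    """
    n_points = len(predictions[0])
    return [
        sum(w * model_preds[i] for w, model_preds in zip(weights, predictions))
        for i in range(n_points)
    ]


if __name__ == "__main__":
    # Toy validation data and predictions from three hypothetical base models.
    y_val = [10.0, 12.0, 14.0]
    base_preds = [
        [10.5, 11.5, 14.5],
        [9.0, 13.0, 15.0],
        [11.0, 12.0, 13.0],
    ]
    weights = inverse_rmse_weights([rmse(y_val, p) for p in base_preds])
    print(weights, weighted_vote(base_preds, weights))
```

In practice, a library implementation such as scikit-learn's `VotingRegressor`, which accepts a `weights` argument, would likely be a more convenient starting point than hand-written averaging.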

16 pages, 1949 KiB  
Article
Bayesian Information Criterion for Fitting the Optimum Order of Markov Chain Models: Methodology and Application to Air Pollution Data
by Yousif Alyousifi, Kamarulzaman Ibrahim, Mahmod Othamn, Wan Zawiah Wan Zin, Nicolas Vergne and Abdullah Al-Yaari
Mathematics 2022, 10(13), 2280; https://0-doi-org.brum.beds.ac.uk/10.3390/math10132280 - 29 Jun 2022
Cited by 2 | Viewed by 1319
Abstract
The analysis of air pollution behavior is becoming crucial, where information on air pollution behavior is vital for managing air quality events. Many studies have described the stochastic behavior of air pollution based on the Markov chain (MC) models. Fitting the optimum order of MC models is essential for describing the stochastic process. However, uncertainty remains concerning the optimum order of such models for representing and characterizing air pollution index (API) data. In this study, the optimum order of the MC models for hourly and daily API sequences from seven stations in the central region of Peninsular Malaysia is identified, based on the Bayesian information criteria (BIC), contributing to exploring an adequate explanation of the probabilistic dependence of air pollution. A summary of the statistics for the API was calculated prior to the analysis. The Markov property and the divergence for the empirically estimated transition matrix of an MC sequence are also investigated. It is found from the analysis that the optimum order varies from one station to another. At most stations, for both observed and simulated API data, the second and third orders of the MC models are found to be optimum for hourly API occurrences, while the first-order MC is found to be most fitting for describing the dynamics of the daily API. Overall, fitting the optimum order of the MC model for the API data sequence captured the delay effect of air pollution. Accordingly, we concluded that the air quality standard lies within controllable limits, except for some infrequent occurrences of API values exceeding the unhealthy level. Full article
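To make the order-selection idea in this abstract concrete, the sketch below scores candidate Markov chain orders with the usual form of the Bayesian information criterion, BIC = -2 log L + p log n, where p = m^k (m - 1) is the number of free transition probabilities for m states and order k. The function names and the toy sequence are hypothetical, and the sketch uses a slightly different number of observations for each order, a simplification that the paper may handle differently.

```python
import math
from collections import Counter


def markov_bic(seq, order, n_states):
    """BIC of a Markov chain of the given order fitted to seq by maximum likelihood."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])  # the previous `order` states
        context_counts[context] += 1
        transition_counts[(context, seq[i])] += 1
    # Maximized log-likelihood: sum of count * log(empirical transition probability).
    loglik = sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in transition_counts.items()
    )
    n_obs = len(seq) - order
    n_params = (n_states ** order) * (n_states - 1)
    return -2.0 * loglik + n_params * math.log(n_obs)


def best_order(seq, max_order, n_states):
    """Return the candidate order in 0..max_order with the smallest BIC."""
    return min(range(max_order + 1), key=lambda k: markov_bic(seq, k, n_states))


if __name__ == "__main__":
    # Toy two-state sequence that alternates strictly, so one step of memory suffices.
    toy = [0, 1] * 50
    print(best_order(toy, max_order=2, n_states=2))
```

For a real air pollution index series, the states would be the discretized API categories, and the candidate orders would be compared separately for each station.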

18 pages, 1146 KiB  
Article
Analysis of Machine Learning Approaches’ Performance in Prediction Problems with Human Activity Patterns
by Ricardo Torres-López, David Casillas-Pérez, Jorge Pérez-Aracil, Laura Cornejo-Bueno, Enrique Alexandre and Sancho Salcedo-Sanz
Mathematics 2022, 10(13), 2187; https://0-doi-org.brum.beds.ac.uk/10.3390/math10132187 - 23 Jun 2022
Cited by 4 | Viewed by 1178
Abstract
Prediction problems in timed datasets related to human activities are especially difficult to solve, because of the specific characteristics and the scarce number of predictive (input) variables available to tackle these problems. In this paper, we try to find out whether Machine Learning (ML) approaches can be successfully applied to these problems. We deal with timed datasets with human activity patterns, in which the input variables are exclusively related to the day or type of day when the prediction is carried out and, usually, to the meteorology of those days. These problems with a marked human activity pattern frequently appear in mobility and traffic-related problems, delivery prediction (packets, food), and many other activities, usually in cities. We evaluate the performance in these problems of different ML methods such as artificial neural networks (multi-layer perceptrons, extreme learning machines) and support vector regression algorithms, together with an Analogue-type (KNN) approach, which serves as a baseline algorithm and provides information about when it is expected that ML approaches will fail, by looking for similar situations in the past. The considered ML algorithms are evaluated in four real prediction problems with human activity patterns, such as school absences, bike-sharing demand, parking occupation, and packets delivered in a post office. The results obtained show the good performance of the ML algorithms, revealing that they can deal with scarce information in all the problems considered. The results obtained have also revealed the importance of including meteorology as the input variables, showing that meteorology is frequently behind demand peaks or valleys in this kind of problem. Finally, we show that having a number of similar situations in the past (training set) prevents ML algorithms from making important mistakes in the prediction obtained. Full article

16 pages, 2603 KiB  
Article
Application of Data Science for Cluster Analysis of COVID-19 Mortality According to Sociodemographic Factors at Municipal Level in Mexico
by Joaquín Pérez-Ortega, Nelva Nely Almanza-Ortega, Kirvis Torres-Poveda, Gerardo Martínez-González, José Crispín Zavala-Díaz and Rodolfo Pazos-Rangel
Mathematics 2022, 10(13), 2167; https://0-doi-org.brum.beds.ac.uk/10.3390/math10132167 - 22 Jun 2022
Cited by 4 | Viewed by 1900
Abstract
Mexico is among the five countries with the largest number of reported deaths from COVID-19 disease, and the mortality rates associated to infections are heterogeneous in the country due to structural factors concerning population. This study aims at the analysis of clusters related to mortality rate from COVID-19 at the municipal level in Mexico from the perspective of Data Science. In this sense, a new application is presented that uses a machine learning hybrid algorithm for generating clusters of municipalities with similar values of sociodemographic indicators and mortality rates. To provide a systematic framework, we applied an extension of the International Business Machines Corporation (IBM) methodology called Batch Foundation Methodology for Data Science (FMDS). For the study, 1,086,743 death certificates corresponding to the year 2020 were used, among other official data. As a result of the analysis, two key indicators related to mortality from COVID-19 at the municipal level were identified: one is population density and the other is percentage of population in poverty. Based on these indicators, 16 municipality clusters were determined. Among the main results of this research, it was found that clusters with high values of mortality rate had high values of population density and low poverty levels. In contrast, clusters with low density values and high poverty levels had low mortality rates. Finally, we think that the patterns found, expressed as municipality clusters with similar characteristics, can be useful for decision making by health authorities regarding disease prevention and control for reinforcing public health measures and optimizing resource distribution for reducing hospitalizations and mortality. Full article

13 pages, 8243 KiB  
Article
A Masked Self-Supervised Pretraining Method for Face Parsing
by Zhuang Li, Leilei Cao, Hongbin Wang and Lihong Xu
Mathematics 2022, 10(12), 2002; https://0-doi-org.brum.beds.ac.uk/10.3390/math10122002 - 10 Jun 2022
Cited by 1 | Viewed by 1798
Abstract
Face Parsing aims to partition the face into different semantic parts, which can be applied into many downstream tasks, e.g., face mask up, face swapping, and face animation. With the popularity of cameras, it is easier to acquire facial images. However, pixel-wise manually labeling is time-consuming and labor-intensive, which motivates us to explore the unlabeled data. In this paper, we present a self-supervised learning method attempting to make full use of the unlabeled facial images for face parsing. In particular, we randomly mask some patches in the central area of facial images, and the model is required to reconstruct the masked patches. This self-supervised pretraining is capable of making the model capture facial feature representations through these unlabeled data. After self-supervised pretraining, the model is fine-tuned on a few labeled data for the face parsing task. Experimental results show that the model achieves better performance for face parsing assisted by the self-supervised pretraining, which greatly decreases the labeling cost. Our approach achieves 74.41 mIoU on the LaPa test set fine-tuned on only 0.2% of the labeled data of the whole training data, surpassing the model that is directly trained by a large margin of +5.02 mIoU. In addition, our approach achieves a new state-of-the-art on the LaPa and CelebAMask-HQ test set. Full article

24 pages, 502 KiB  
Article
An Equity-Based Optimization Model to Solve the Location Problem for Healthcare Centers Applied to Hospital Beds and COVID-19 Vaccination
by Erwin J. Delgado, Xavier Cabezas, Carlos Martin-Barreiro, Víctor Leiva and Fernando Rojas
Mathematics 2022, 10(11), 1825; https://0-doi-org.brum.beds.ac.uk/10.3390/math10111825 - 26 May 2022
Cited by 8 | Viewed by 2491
Abstract
Governments must consider different issues when deciding on the location of healthcare centers. In addition to the costs of opening such centers, three further elements should be addressed: accessibility, demand, and equity. Such locations must be chosen to meet the corresponding demand, so that they guarantee a socially equitable distribution, and to ensure that they are accessible to a sufficient degree. The location of the centers must be chosen from a set of possible facilities to guarantee certain minimum standards for the operational viability of the centers. Since the set of potential locations does not necessarily cover the demand of all geographical zones, the efficiency criterion must be maximized. However, the efficient distribution of resources does not necessarily meet the equity criterion. Thus, decision-makers must consider the trade-off between these two criteria: efficiency and equity. The described problem corresponds to the challenge that governments face in seeking to minimize the impact of the pandemic on citizens, where healthcare centers may be either public hospitals that care for COVID-19 patients or vaccination points. In this paper, we focus on the problem of a zone-divided region requiring the localization of healthcare centers. We propose a non-linear programming model to solve this problem based on a coverage formula using the Gini index to measure equity and accessibility. Then, we consider an approach using epsilon constraints that makes this problem solvable with mixed integer linear computations at each iteration. A simulation algorithm is also considered to generate problem instances, while computational experiments are carried out to show the potential use of the proposed mathematical programming model. The results show that the spatial distribution influences the coverage level of the healthcare system. 
Nevertheless, this distribution does not reduce inequity at accessible healthcare centers, as the distribution of the supply of health centers must be incorporated into the decision-making process. Full article
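The abstract above relies on the Gini index to measure equity. As a minimal illustration only, and not the authors' coverage formula (which the abstract does not state), the standard Gini index of a set of per-zone coverage values could be computed as sketched below. The function name `gini_index` and the sample values are hypothetical.

```python
def gini_index(values):
    """Standard Gini index of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted in ascending order and ranks i run from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical coverage levels for four zones.
equal_coverage = gini_index([1, 1, 1, 1])
concentrated_coverage = gini_index([0, 0, 0, 10])
```

Under this definition, equal coverage across zones gives an index of zero, and concentrating all coverage in one of n zones gives (n - 1)/n.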

23 pages, 984 KiB  
Article
Vasicek Quantile and Mean Regression Models for Bounded Data: New Formulation, Mathematical Derivations, and Numerical Applications
by Josmar Mazucheli, Bruna Alves, Mustafa Ç. Korkmaz and Víctor Leiva
Mathematics 2022, 10(9), 1389; https://0-doi-org.brum.beds.ac.uk/10.3390/math10091389 - 21 Apr 2022
Cited by 12 | Viewed by 2144
Abstract
The Vasicek distribution is a two-parameter probability model with bounded support on the open unit interval. This distribution allows for different and flexible shapes and plays an important role in many statistical applications, especially for modeling default rates in the field of finance. Although its probability density function resembles some well-known distributions, such as the beta and Kumaraswamy models, the Vasicek distribution has not been considered to analyze data on the unit interval, especially when we have, in addition to a response variable, one or more covariates. In this paper, we propose to estimate quantiles or means, conditional on covariates, assuming that the response variable is Vasicek distributed. Through appropriate link functions, two Vasicek regression models for data on the unit interval are formulated: one considers a quantile parameterization and another one its original parameterization. Monte Carlo simulations are provided to assess the statistical properties of the maximum likelihood estimators, as well as the coverage probability. An R package developed by the authors, named vasicekreg, makes available the results of the present investigation. Applications with two real data sets are conducted for illustrative purposes: in one of them, the unit Vasicek quantile regression outperforms the models based on the Johnson-SB, Kumaraswamy, unit-logistic, and unit-Weibull distributions, whereas in the second one, the unit Vasicek mean regression outperforms the fits obtained by the beta and simplex distributions. Our investigation suggests that unit Vasicek quantile and mean regressions can be of practical usage as alternatives to some well-known models for analyzing data on the unit interval. Full article

37 pages, 15951 KiB  
Article
Abnormality Detection and Failure Prediction Using Explainable Bayesian Deep Learning: Methodology and Case Study with Industrial Data
by Ahmad Kamal Mohd Nor, Srinivasa Rao Pedapati, Masdi Muhammad and Víctor Leiva
Mathematics 2022, 10(4), 554; https://0-doi-org.brum.beds.ac.uk/10.3390/math10040554 - 11 Feb 2022
Cited by 24 | Viewed by 3498
Abstract
Mistrust, amplified by numerous artificial intelligence (AI) related incidents, is an issue that has caused the energy and industrial sectors to be amongst the slowest adopters of AI methods. Central to this issue is the black-box problem of AI, which impedes investments and is fast becoming a legal hazard for users. Explainable AI (XAI) is a recent paradigm to tackle such an issue. Being the backbone of the industry, the prognostic and health management (PHM) domain has recently been introduced into XAI. However, many deficiencies, particularly the lack of explanation assessment methods and uncertainty quantification, plague this young domain. In the present paper, we develop a framework for explainable anomaly detection and failure prognostics employing a Bayesian deep learning model and Shapley additive explanations (SHAP) to generate local and global explanations from the PHM tasks. An uncertainty measure of the Bayesian model is utilized as a marker for anomalies and expands the prognostic explanation scope to include the model’s confidence. In addition, the global explanation is used to improve prognostic performance, an aspect neglected by the handful of studies on PHM-XAI. The quality of the explanation is examined employing local accuracy and consistency properties. The developed framework is tested on real-world gas turbine anomalies and synthetic turbofan failure prediction data. Seven out of eight of the tested anomalies were successfully identified. Additionally, the prognostic outcome showed a 19% improvement in statistical terms and achieved the highest prognostic score amongst the best published results on the topic. Full article

15 pages, 4181 KiB  
Article
Nonlinear Regression-Based GNSS Multipath Modelling in Deep Urban Area
by Yongjun Lee and Byungwoon Park
Mathematics 2022, 10(3), 412; https://0-doi-org.brum.beds.ac.uk/10.3390/math10030412 - 27 Jan 2022
Cited by 17 | Viewed by 2405
Abstract
As the necessity of location information closely related to everyday life has increased, the use of global navigation satellite systems (GNSS) has gradually increased in populated urban areas. Contrary to the high necessity and expectation of GNSS in urban areas, GNSS performance is easily degraded by multipath errors due to high-rise buildings and is very difficult to guarantee. Errors in the signals reflected by the buildings, i.e., multipath and non-line-of-sight (NLOS) errors, are the major cause of the poor accuracy in urban areas. Unlike other GNSS major error sources, the reflected signal error, which is a user-dependent error, is difficult to differentiate or model. This paper suggests training a multipath prediction model based on support vector regression to obtain a function of the elevation and azimuth angle of each satellite. To extract an unbiased multipath from the GNSS measurements, the clock error of high-elevation QZSS was estimated, and the clock offset with other constellations was also calculated. A nonlinear multipath map was generated, as a result of training with the extracted multipaths, by a Support Vector Machine, which appropriately reflected the geometry of the building near the user. The model was effective at improving the urban area positioning accuracy by 58.4% horizontally and 77.7% vertically, allowing us to achieve a 20 m accuracy level in a deep urban area, Teheran-ro, Seoul, Korea. Full article

17 pages, 3549 KiB  
Article
Machine Learning Models to Predict Critical Episodes of Environmental Pollution for PM2.5 and PM10 in Talca, Chile
by Gonzálo Carreño, Xaviera A. López-Cortés and Carolina Marchant
Mathematics 2022, 10(3), 373; https://0-doi-org.brum.beds.ac.uk/10.3390/math10030373 - 26 Jan 2022
Cited by 7 | Viewed by 2892
Abstract
One of the main environmental problems that affect people’s health and quality of life is air pollution by particulate matter. Chile has nine of the ten most polluted cities in South America according to a report presented in 2019 by Greenpeace and AirVisual that measured the air quality index based on the levels of fine particles. Most Chilean cities are highly contaminated by particulate matter, especially during the months of April to August (the critical episode management period). The objective of this study is to predict particulate matter levels based on meteorological and climatic features, such as temperature, wind speed, wind direction, precipitation and relative air humidity in Talca, Chile, during the critical episode management periods between 2014 and 2018. Predictive models based on machine learning techniques were used, considering training datasets with meteorological and climatic data, and particulate matter levels from the three air quality monitoring stations in Talca, Chile. We trained 24 models to predict particulate matter levels considering the 24-h average and the average between 05:00 and 11:00 p.m. For the model testing, data from the year 2018 during the critical episode management period were used. The obtained results indicate that our models are able to effectively predict levels of particulate matter, enabling correct management of critical episodes, especially for alert, pre-emergency and emergency conditions. We used the cross-platform and open-source programming language Python for the development and implementation of the proposed models and R-project for some visualizations. Full article

30 pages, 8408 KiB  
Article
Estimation of the Instantaneous Reproduction Number and Its Confidence Interval for Modeling the COVID-19 Pandemic
by Publio Darío Cortés-Carvajal, Mitzi Cubilla-Montilla and David Ricardo González-Cortés
Mathematics 2022, 10(2), 287; https://0-doi-org.brum.beds.ac.uk/10.3390/math10020287 - 17 Jan 2022
Cited by 5 | Viewed by 2429
Abstract
In this paper, we derive an optimal model for calculating the instantaneous reproduction number, which is an important metric to help in controlling the evolution of epidemics. Our approach, within a frequentist framework, allows us to calculate a more realistic confidence interval, a fundamental tool for a safe interpretation of the instantaneous reproduction number value, so that health and government officials pay more attention to it. Our reasoning begins by decoupling the incidence data into mean and Gaussian noise by using practical series analysis techniques; then, we continue with a likely relationship between the present and past incidence data. Monte Carlo simulations and numerical integrations were conducted to complement the analytical proofs, and illustrations are provided for each stage of analysis to validate the analytical results. Finally, a real case study is discussed with the incidence data of the Republic of Panama regarding the COVID-19 pandemic. We have shown that, for the calculation of the confidence interval of the instantaneous reproduction number, it is essential to include all sources of variability, not only the Poissonian processes of the incidences. This proposal is delivered with analysis tools developed with Microsoft Excel. Full article

20 pages, 829 KiB  
Article
Bayesian Constitutionalization: Twitter Sentiment Analysis of the Chilean Constitutional Process through Bayesian Network Classifiers
by Gonzalo A. Ruz, Pablo A. Henríquez and Aldo Mascareño
Mathematics 2022, 10(2), 166; https://0-doi-org.brum.beds.ac.uk/10.3390/math10020166 - 06 Jan 2022
Cited by 5 | Viewed by 1777
Abstract
Constitutional processes are a cornerstone of modern democracies. Whether revolutionary or institutionally organized, they establish the core values of social order and determine the institutional architecture that governs social life. Constitutional processes are themselves evolutionary practices of mutual learning in which actors, regardless of their initial political positions, continuously interact with each other, demonstrating differences and making alliances regarding different topics. In this article, we develop Tree Augmented Naive Bayes (TAN) classifiers to model the behavior of constituent agents. According to the nature of the constituent dynamics, weights are learned by the model from the data using an evolution strategy to obtain a good classification performance. For our analysis, we used the constituent agents’ communications on Twitter during the installation period of the Constitutional Convention (July–October 2021). In order to differentiate political positions (left, center, right), we applied the developed algorithm to obtain the scores of 882 ballots cast in the first stage of the convention (4 July to 29 September 2021). Then, we used k-means to identify three clusters containing right-wing, center, and left-wing positions. Experimental results obtained using the three constructed datasets showed that using alternative weight values in the TAN construction procedure, inferred by an evolution strategy, yielded improvements in the classification accuracy measured in the test sets compared to the results of the TAN constructed with conditional mutual information, as well as other Bayesian network classifier construction approaches. Additionally, our results may help us to better understand political behavior in constitutional processes and to improve the accuracy of TAN classifiers applied to social, real-world data. Full article

19 pages, 828 KiB  
Article
An Algebraic Approach to Clustering and Classification with Support Vector Machines
by Güvenç Arslan, Uğur Madran and Duygu Soyoğlu
Mathematics 2022, 10(1), 128; https://0-doi-org.brum.beds.ac.uk/10.3390/math10010128 - 01 Jan 2022
Cited by 5 | Viewed by 1799
Abstract
In this note, we propose a novel classification approach by introducing a new clustering method, which is used as an intermediate step to discover the structure of a data set. The proposed clustering algorithm uses similarities and the concept of a clique to obtain clusters, which can be used with different strategies for classification. This approach also reduces the size of the training data set. In this study, we apply support vector machines (SVMs) after obtaining clusters with the proposed clustering algorithm. The proposed clustering algorithm is applied with different strategies for applying SVMs. The results for several real data sets show that the performance is comparable with the standard SVM while reducing the size of the training data set and also the number of support vectors. Full article

19 pages, 707 KiB  
Article
A Novel Maximum Mean Discrepancy-Based Semi-Supervised Learning Algorithm
by Qihang Huang, Yulin He and Zhexue Huang
Mathematics 2022, 10(1), 39; https://0-doi-org.brum.beds.ac.uk/10.3390/math10010039 - 23 Dec 2021
Viewed by 2302
Abstract
To provide more external knowledge for training semi-supervised learning (SSL) algorithms, this paper proposes a maximum mean discrepancy-based SSL (MMD-SSL) algorithm, which trains a well-performing classifier by iteratively refining the classifier using highly confident unlabeled samples. The MMD-SSL algorithm performs three main steps. First, a multilayer perceptron (MLP) is trained based on the labeled samples and is then used to assign labels to unlabeled samples. Second, the unlabeled samples are divided into multiple groups with the k-means clustering algorithm. Third, the maximum mean discrepancy (MMD) criterion is used to measure the distribution consistency between k-means-clustered samples and MLP-classified samples. The samples having a consistent distribution are labeled as highly confident samples and used to retrain the MLP. The MMD-SSL algorithm trains iteratively until all unlabeled samples are consistently labeled. We conducted extensive experiments on 29 benchmark data sets to validate the rationality and effectiveness of the MMD-SSL algorithm. Experimental results show that the generalization capability of the MLP algorithm gradually improves as the number of labeled samples increases, and the statistical analysis demonstrates that the MMD-SSL algorithm can provide better testing accuracy and kappa values than 10 other self-training and co-training SSL algorithms. Full article
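The third step above compares two groups of samples with the maximum mean discrepancy. A minimal sketch of the usual biased estimator of the squared MMD is given below, assuming a Gaussian (RBF) kernel. The kernel choice, the `gamma` value, and the function names are assumptions for illustration, since the abstract does not specify which estimator or kernel the authors use.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between two points."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""

    def mean_kernel(us, vs):
        # Average kernel value over all pairs (u, v).
        pair_sum = sum(rbf_kernel(u, v, gamma) for u in us for v in vs)
        return pair_sum / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical one-dimensional groups: identical groups versus shifted groups.
same = mmd_squared([(0.0,), (1.0,)], [(0.0,), (1.0,)])
shifted = mmd_squared([(0.0,)], [(3.0,)])
```

A value near zero suggests the two groups follow a similar distribution, which is the sense in which a cluster and a predicted class could be called consistent.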

22 pages, 3991 KiB  
Article
Bayesian Framework for Multi-Wave COVID-19 Epidemic Analysis Using Empirical Vaccination Data
by Jiawei Xu and Yincai Tang
Mathematics 2022, 10(1), 21; https://0-doi-org.brum.beds.ac.uk/10.3390/math10010021 - 21 Dec 2021
Cited by 8 | Viewed by 3344
Abstract
The COVID-19 pandemic has highlighted the necessity of advanced modeling inference using the limited data of daily cases. Tracking a long-term epidemic trajectory requires explanatory modeling with more complexities than the one with short-time forecasts, especially for the highly vaccinated scenario in the latest phase. With this work, we propose a novel modeling framework that combines an epidemiological model with Bayesian inference to perform an explanatory analysis on the spreading of COVID-19 in Israel. The Bayesian inference is implemented on a modified SEIR compartmental model supplemented by real-time vaccination data and piecewise transmission and infectious rates determined by change points. We illustrate the fitted multi-wave trajectory in Israel with the checkpoints of major changes in publicly announced interventions or critical social events. The result of our modeling framework partly reflects the impact of different stages of mitigation strategies as well as the vaccination effectiveness, and provides forecasts of near future scenarios. Full article
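For readers unfamiliar with compartmental models, the sketch below integrates a basic SEIR model with a forward-Euler scheme. It is a simplification under stated assumptions: it omits the vaccination data, the piecewise rates, and the Bayesian inference described above, and the function name and all parameter values are hypothetical.

```python
def simulate_seir(s0, e0, i0, r0, beta, sigma, gamma, days, dt=0.1):
    """Forward-Euler integration of a basic SEIR model.

    Returns a list of daily (S, E, I, R) tuples, starting with the initial state.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population, constant in this closed model
    steps_per_day = round(1.0 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt  # S -> E
            new_infectious = sigma * e * dt      # E -> I
            new_recovered = gamma * i * dt       # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Hypothetical population of 10,000 with 10 initially infectious individuals.
trajectory = simulate_seir(9990, 0, 10, 0, beta=0.5, sigma=0.2, gamma=0.1, days=60)
```

A piecewise transmission rate, as mentioned in the abstract, could be approximated by calling such a routine on consecutive intervals with a different `beta` for each.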

18 pages, 692 KiB  
Article
Event Study: Advanced Machine Learning and Statistical Technique for Analyzing Sustainability in Banking Stocks
by Varun Dogra, Aman Singh, Sahil Verma, Abdullah Alharbi and Wael Alosaimi
Mathematics 2021, 9(24), 3319; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243319 - 20 Dec 2021
Cited by 7 | Viewed by 3371
Abstract
Machine learning has grown in popularity in recent years as a method for evaluating financial text data, with promising results in stock price projection from financial news. Various research has looked at the relationship between news events and stock prices, but there is little evidence on how different sentiments (negative, neutral, and positive) of such events impact the performance of stocks or indices in comparison to benchmark indices. The goal of this paper is to analyze how a specific banking news event (such as a fraud or a bank merger) and other co-related news events (such as government policies or national elections), as well as the framing of both the news event and news-event sentiment, impair the formation of the respective bank’s stock and the banking index, i.e., Bank Nifty, in Indian stock markets over time. The task is achieved through three phases. In the first phase, we extract the banking and other co-related news events from the pool of financial news. The news events are further categorized into negative, positive, and neutral sentiments in the second phase. This study covers the third phase of our research work, where we analyze the impact of news events concerning sentiments or linguistics in the price movement of the respective bank’s stock, identified or recognized from these news events, against benchmark index Bank Nifty and the banking index against benchmark index Nifty50 for the short to long term. For the short term, we analyzed the movement of banking stock or index to benchmark index in terms of CARs (cumulative abnormal returns) surrounding the publication day (termed as D) of the news event in the event windows of (−1,D), (D,1), (−1,1), (D,5), (−5,−1), and (−5,5). For the long term, we analyzed the movement of banking stock or index to benchmark index in the event windows of (D,30), (−30,−1), (−30,30), (D,60), (−60,−1), and (−60,60). 
We employ the deep learning model bidirectional encoder representations from transformers (BERT) and the statistical capital asset pricing model (CAPM) for this research. Full article
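The cumulative abnormal returns mentioned above can be illustrated with a short sketch. It assumes the textbook CAPM expected return, with a known `beta` and a constant per-period risk-free rate; how the authors estimate these quantities is not stated in the abstract, and the function name and sample returns are hypothetical.

```python
def cumulative_abnormal_return(stock_returns, benchmark_returns, beta, risk_free=0.0):
    """Sum of abnormal returns over an event window.

    Abnormal return = actual return - CAPM expected return, where the
    expected return is risk_free + beta * (benchmark return - risk_free).
    """
    if len(stock_returns) != len(benchmark_returns):
        raise ValueError("return series must cover the same event window")
    car = 0.0
    for r_stock, r_bench in zip(stock_returns, benchmark_returns):
        expected = risk_free + beta * (r_bench - risk_free)
        car += r_stock - expected
    return car


# Hypothetical daily returns for the window (-1, 1) around publication day D.
car = cumulative_abnormal_return(
    stock_returns=[0.02, -0.01, 0.03],
    benchmark_returns=[0.01, 0.00, 0.01],
    beta=1.0,
)
```

With these made-up numbers the stock beats its expected return by 0.01, trails it by 0.01, and beats it by 0.02, so the cumulative abnormal return over the window is 0.02.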

26 pages, 1669 KiB  
Article
A Hybrid Model with Spherical Fuzzy-AHP, PLS-SEM and ANN to Predict Vaccination Intention against COVID-19
by Phi-Hung Nguyen, Jung-Fa Tsai, Ming-Hua Lin and Yi-Chung Hu
Mathematics 2021, 9(23), 3075; https://0-doi-org.brum.beds.ac.uk/10.3390/math9233075 - 29 Nov 2021
Cited by 23 | Viewed by 3456
Abstract
This study aims to identify the key factors affecting individuals’ behavioral vaccination intention against COVID-19 in Vietnam through an online questionnaire survey. Differing from previous studies, a novel three-staged approach combining Spherical Fuzzy Analytic Hierarchy Process (SF-AHP), Partial Least Squares-Structural Equation Model (PLS-SEM), and Artificial Neural Network (ANN) is proposed. Five factors associated with individuals’ behavioral vaccination intention (INT) based on 15 experts’ opinions are considered in SF-AHP analysis, including Perceived Severity of COVID-19 (PSC), Perceived COVID-19 vaccines (PVC), Trust in government intervention strategies (TRS), Social Influence (SOI), and Social media (SOM). First, the results of SF-AHP indicated that all proposed factors correlate with INT. Second, the data of 474 valid respondents were collected and analyzed using PLS-SEM. The PLS-SEM results reported that INT was directly influenced by PVC and TRS. In contrast, SOI had no direct effect on INT. Further, PSC and SOM moderated the relationship between PVC, TRS and INT, respectively. The ANN was deployed to validate the previous stages and found that the best predictors of COVID-19 vaccination intention were PVC, TRS, and SOM. These results were consistent with the SF-AHP and PLS-SEM models. This research provides an innovative new approach employing quantitative and qualitative techniques to understand individuals’ vaccination intention during the global pandemic. Furthermore, the proposed method can be used and expanded to assess the perceived efficacy of COVID-19 measures in other nations currently battling the COVID-19 outbreak. Full article

27 pages, 9351 KiB  
Article
Real-World Data-Driven Machine-Learning-Based Optimal Sensor Selection Approach for Equipment Fault Detection in a Thermal Power Plant
by Salman Khalid, Hyunho Hwang and Heung Soo Kim
Mathematics 2021, 9(21), 2814; https://0-doi-org.brum.beds.ac.uk/10.3390/math9212814 - 05 Nov 2021
Cited by 14 | Viewed by 3156
Abstract
Due to growing electricity demand, developing an efficient fault-detection system in thermal power plants (TPPs) has become a demanding issue. The most probable reason for failure in TPPs is equipment (boiler and turbine) fault. Advance detection of equipment fault can help secure maintenance shutdowns and enhance the capacity utilization rates of the equipment. Recently, intelligent fault diagnosis based on multivariate algorithms has been introduced in TPPs. In TPPs, a huge number of sensors are used for process maintenance. However, not all of these sensors are sensitive to fault detection. Previous studies relied solely on expert-provided data for equipment fault detection in TPPs. However, the performance of multivariate algorithms for fault detection is heavily dependent on the number of input sensors. Redundant and irrelevant sensors may reduce the performance of these algorithms, thus creating a need to determine the optimal sensor arrangement for efficient fault detection in TPPs. Therefore, this study proposes a novel machine-learning-based optimal sensor selection approach to analyze the boiler and turbine faults. Finally, real-world power plant equipment fault scenarios (boiler water wall tube leakage and turbine electric motor failure) are employed to verify the performance of the proposed model. The computational results indicate that the proposed approach enhanced the computational efficiency of machine-learning models by reducing the number of sensors by up to 44% in the water wall tube leakage case scenario and 55% in the turbine motor fault case scenario. Further, the machine-learning performance is improved up to 97.6% and 92.6% in the water wall tube leakage and turbine motor fault case scenarios, respectively. Full article

19 pages, 526 KiB  
Article
A New Birnbaum–Saunders Distribution and Its Mathematical Features Applied to Bimodal Real-World Data from Environment and Medicine
by Jimmy Reyes, Jaime Arrué, Víctor Leiva and Carlos Martin-Barreiro
Mathematics 2021, 9(16), 1891; https://0-doi-org.brum.beds.ac.uk/10.3390/math9161891 - 09 Aug 2021
Cited by 4 | Viewed by 1985
Abstract
In this paper, we propose and derive a Birnbaum–Saunders distribution to model bimodal data. This new distribution is obtained using the product of the standard Birnbaum–Saunders distribution and a polynomial function of the fourth degree. We study the mathematical and statistical properties of the bimodal Birnbaum–Saunders distribution, including probabilistic features and moments. Inference on its parameters is conducted using the estimation methods of moments and maximum likelihood. Based on the acceptance–rejection criterion, an algorithm is proposed to generate values of a random variable that follows the new bimodal Birnbaum–Saunders distribution. We carry out a simulation study using the Monte Carlo method to assess the statistical performance of the parameter estimators. Illustrations with real-world data sets from environmental and medical sciences are provided to show applications that can be of potential use in real problems. Full article
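The acceptance-rejection idea mentioned above can be sketched generically. The example below is not the authors' algorithm: it uses a uniform proposal on a bounded interval and a Beta(2, 2) density as a stand-in target, whereas a density with positive unbounded support, such as a Birnbaum–Saunders type, would need a different proposal. All names are hypothetical.

```python
import random


def acceptance_rejection(target_pdf, low, high, bound, n, rng):
    """Draw n samples from target_pdf on [low, high] with a uniform proposal.

    bound must satisfy target_pdf(x) <= bound for every x in [low, high].
    """
    samples = []
    while len(samples) < n:
        x = rng.uniform(low, high)    # candidate from the proposal
        u = rng.uniform(0.0, bound)   # uniform height under the envelope
        if u <= target_pdf(x):        # accept if the point falls under the density
            samples.append(x)
    return samples


def beta22_pdf(x):
    """Density of the Beta(2, 2) distribution, whose maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)


draws = acceptance_rejection(beta22_pdf, 0.0, 1.0, 1.5, 5000, random.Random(0))
sample_mean = sum(draws) / len(draws)
```

The tighter the bound is to the maximum of the density, the fewer candidates are rejected. Here the expected acceptance rate is 1/1.5, about two thirds.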

13 pages, 1434 KiB  
Article
Modeling COVID-19 Cases Statistically and Evaluating Their Effect on the Economy of Countries
by Hanns de la Fuente-Mella, Rolando Rubilar, Karime Chahuán-Jiménez and Víctor Leiva
Mathematics 2021, 9(13), 1558; https://0-doi-org.brum.beds.ac.uk/10.3390/math9131558 - 02 Jul 2021
Cited by 31 | Viewed by 4269
Abstract
COVID-19 infections have plagued the world and led to deaths with a heavy pneumonia manifestation. The main objective of this investigation is to evaluate the performance of certain economies during the crisis derived from the COVID-19 pandemic. The gross domestic product (GDP) and global health security index (GHSI) of the countries belonging–or not–to the Organization for Economic Cooperation and Development (OECD) are considered. In this paper, statistical models are formulated to study this performance. The models’ specifications include, as the response variable, the GDP variation/growth percentage in 2020, and as the covariates: the COVID-19 disease rate from its start in March 2020 until 31 December 2020; the GHSI of 2019; the countries’ risk by default spreads from July 2019 to May 2020; belongingness or not to the OECD; and the GDP per capita in 2020. We test the heteroscedasticity phenomenon present in the modeling. The variable “COVID-19 cases per million inhabitants” is statistically significant, showing its impact on each country’s economy through the GDP variation. Therefore, we report that COVID-19 cases affect domestic economies, but that OECD membership and other risk factors are also relevant. Full article

15 pages, 799 KiB  
Article
Modeling the Risk of Infectious Diseases Transmitted by Aedes aegypti Using Survival and Aging Statistical Analysis with a Case Study in Colombia
by Henry Velasco, Henry Laniado, Mauricio Toro, Alexandra Catano-López, Víctor Leiva and Yuhlong Lio
Mathematics 2021, 9(13), 1488; https://0-doi-org.brum.beds.ac.uk/10.3390/math9131488 - 24 Jun 2021
Cited by 5 | Viewed by 2110
Abstract
Many infectious diseases are deadly to humans. The Aedes aegypti mosquito is the principal vector of infectious diseases that include chikungunya, dengue, yellow fever, and Zika. Some factors such as survival time and aging are vital in its development and capacity to transmit the pathogens, which in turn are affected by environmental factors such as temperature. In this paper, we consider aging as the biological wear and tear presented in some mosquito populations over time, whereas survival is considered as the maximum time that a mosquito lives. We propose statistical methods that are commonly used in engineering for reliability analysis to compare transmission risk among different mosquitoes. We conducted a case study in three Colombian cities: Bello, Riohacha, and Villavicencio. In this study, we detected that the Aedes aegypti female mosquitoes in Bello live longer than those in Riohacha and Villavicencio, and the females in Riohacha live longer than those in Villavicencio. Regarding aging, the females from Riohacha age slower than those in Villavicencio, and the latter age slower than those in Bello. Mosquito populations that age slower are considered young and the other ones are old. In addition, we detected that the females from Bello in the temperature range of 27 °C–28 °C age slower than those in Bello at higher temperatures. In general, a young female has a higher risk of transmitting a disease to humans than an old female, regardless of its survival time. These findings have not been previously reported in studies of this type of infectious disease and contribute new knowledge in biomedicine.

Review


18 pages, 1156 KiB  
Review
An Overview of Forecast Analysis with ARIMA Models during the COVID-19 Pandemic: Methodology and Case Study in Brazil
by Raydonal Ospina, João A. M. Gondim, Víctor Leiva and Cecilia Castro
Mathematics 2023, 11(14), 3069; https://0-doi-org.brum.beds.ac.uk/10.3390/math11143069 - 12 Jul 2023
Cited by 14 | Viewed by 4598
Abstract
This comprehensive overview focuses on the issues presented by the pandemic due to COVID-19, understanding its spread and the wide-ranging effects of government-imposed restrictions. The overview examines the utility of autoregressive integrated moving average (ARIMA) models, which are often overlooked in pandemic forecasting due to perceived limitations in handling complex and dynamic scenarios. Our work applies ARIMA models to a case study using data from Recife, the capital of Pernambuco, Brazil, collected between March and September 2020. The research provides insights into the implications and adaptability of predictive methods in the context of a global pandemic. The findings highlight the ARIMA models’ strength in generating accurate short-term forecasts, crucial for an immediate response to slow down the disease’s rapid spread. Accurate and timely predictions serve as the basis for evidence-based public health strategies and interventions, greatly assisting in pandemic management. Our model selection involves an automated process optimizing parameters by using autocorrelation and partial autocorrelation plots, as well as various precise measures. The performance of the chosen ARIMA model is confirmed when comparing its forecasts with real data reported after the forecast period. The study successfully forecasts both confirmed and recovered COVID-19 cases across the preventive plan phases in Recife. However, limitations in the model’s performance are observed as forecasts extend into the future. By the end of the study period, the model’s error substantially increased, and it failed to detect the stabilization and deceleration of cases. The research highlights challenges associated with COVID-19 data in Brazil, such as under-reporting and data recording delays. Despite these limitations, the study emphasizes the potential of ARIMA models for short-term pandemic forecasting while emphasizing the need for further research to enhance long-term predictions. 
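To make the short-term versus long-term behavior described in this abstract concrete, the following is a minimal sketch of forecasting with a single fixed ARIMA(1,1,0) model with drift, fitted by least squares on the first differences. It is a deliberately simplified stand-in and not the authors' automated selection procedure; the function names are hypothetical, and the series must contain at least three observations.

```python
def fit_arima_110(series):
    """Least-squares fit of d_t - mu = phi * (d_{t-1} - mu) + e_t on first differences d_t."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    mu = sum(diffs) / len(diffs)                # drift: mean of the first differences
    centered = [d - mu for d in diffs]
    num = sum(x * y for x, y in zip(centered[1:], centered[:-1]))
    den = sum(x * x for x in centered[:-1])
    phi = num / den if den > 0 else 0.0         # no variation in the differences: use phi = 0
    return mu, phi


def forecast_arima_110(series, steps):
    """Iterate the fitted recursion forward to obtain point forecasts of the level."""
    mu, phi = fit_arima_110(series)
    level = series[-1]
    dev = (series[-1] - series[-2]) - mu        # last observed deviation from the drift
    forecasts = []
    for _ in range(steps):
        dev = phi * dev                         # deviation decays geometrically when |phi| < 1
        level = level + mu + dev
        forecasts.append(level)
    return forecasts
```

Because each forecast is built on the previous forecast rather than on observed data, the influence of the last observed change fades with the horizon and the path reverts to the fitted drift, so a recursion of this kind cannot anticipate later changes in trend.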
