Mathematical Data Science with Applications in Business, Industry, and Medicine

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 30 June 2024 | Viewed by 8422

Special Issue Editors


Dr. Arne Johannssen
Guest Editor
Faculty of Business Administration, University of Hamburg, 20146 Hamburg, Germany
Interests: actuarial sciences; artificial intelligence; biostatistics; business analytics; computational statistics; data science; quantitative risk management; soft computing; statistical inference; statistical quality control

Dr. Nataliya Chukhrova
Guest Editor
Faculty of Business Administration, University of Hamburg, 20146 Hamburg, Germany
Interests: artificial intelligence; biostatistics; business analytics; computational statistics; data science; fuzzy statistics; quantitative risk management; soft computing; statistical inference; statistical quality control

Special Issue Information

Dear Colleagues,

Mathematical data science is a field that combines mathematical techniques with data science methods to extract insights and knowledge from data. It encompasses work at all stages of the data life cycle, from collection and storage, through cleansing, processing, analysis, and visualization, to the communication of results and findings. Data scientists use a variety of tools and techniques to analyze data, including mathematical concepts and models, artificial intelligence techniques, machine learning algorithms, statistical analysis, and data visualization. Data science can be used to make predictions, identify patterns, and draw conclusions from data, and it is applied in a variety of areas, including business, industry, and medicine. It is a rapidly evolving field, and data scientists are expected to stay up to date with new tools, techniques, and technologies.

We welcome submissions on the latest developments in mathematical data science and its applications. This Special Issue is intended to highlight the importance of mathematical data science, particularly in the application areas of business, industry, and medicine.

Dr. Arne Johannssen
Dr. Nataliya Chukhrova
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • big data analytics
  • computational statistics
  • data science
  • deep learning
  • hypothesis testing
  • machine learning
  • probability distributions
  • reinforcement learning
  • statistical data analysis

Published Papers (9 papers)


Research

15 pages, 8288 KiB  
Article
Addressing Concerns about Single Path Analysis in Business Cycle Turning Points: The Case of Learning Vector Quantization
by David Enck, Mario Beruvides, Víctor G. Tercero-Gómez and Alvaro E. Cordero-Franco
Mathematics 2024, 12(5), 678; https://0-doi-org.brum.beds.ac.uk/10.3390/math12050678 - 26 Feb 2024
Cited by 1 | Viewed by 524
Abstract
Data-driven approaches in machine learning are increasingly applied in economic analysis, particularly for identifying business cycle (BC) turning points. However, temporal dependence in BCs is often overlooked, leading to what we term single path analysis (SPA). SPA neglects the diverse potential routes of a temporal data structure. It hinders the evaluation and calibration of algorithms. This study emphasizes the significance of acknowledging temporal dependence in BC analysis and illustrates the problem of SPA using learning vector quantization (LVQ) as a case study. LVQ was previously adapted to use economic indicators to determine the current BC phase, exhibiting flexibility in adapting to evolving patterns. To address temporal complexities, we employed a multivariate Monte Carlo simulation incorporating a specified number of change-points, autocorrelation, and cross-correlations, from a second-order vector autoregressive model. Calibrated with varying levels of observed economic leading indicators, our approach offers a deeper understanding of LVQ’s uncertainties. Our results demonstrate the inadequacy of SPA, unveiling diverse risks and worst-case protection strategies. By encouraging researchers to consider temporal dependence, this study contributes to enhancing the robustness of data-driven approaches in financial and economic analyses, offering a comprehensive framework for addressing SPA concerns. Full article
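
To illustrate the single path analysis issue described above, here is a minimal sketch (not the authors' code; the VAR(2) coefficients, change-point, and toy moving-average detector are illustrative assumptions). It simulates many paths of a bivariate VAR(2) process with one downward mean shift and shows that the detection delay of a simple turning-point rule is a distribution across paths rather than a single number.

```python
import numpy as np

rng = np.random.default_rng(0)

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])      # lag-1 coefficients (assumed)
A2 = np.array([[0.2, 0.0], [0.1, 0.2]])      # lag-2 coefficients (assumed)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])   # innovation covariance (assumed)

def simulate_var2(T=200, tau=120, shift=np.array([-1.5, -1.0])):
    """One VAR(2) path with a downward mean shift (turning point) at time tau."""
    y = np.zeros((T, 2))
    eps = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
    mean = np.where(np.arange(T)[:, None] >= tau, shift, 0.0)
    for t in range(2, T):
        y[t] = mean[t] + A1 @ (y[t-1] - mean[t-1]) + A2 @ (y[t-2] - mean[t-2]) + eps[t]
    return y

def detection_delay(y, tau=120, window=5, threshold=-0.8):
    """Delay until the windowed mean of the first indicator crosses the threshold."""
    for t in range(tau, len(y)):
        if y[t - window + 1:t + 1, 0].mean() < threshold:
            return t - tau
    return np.nan

# Single path analysis would report one delay; many paths reveal a distribution.
delays = np.array([detection_delay(simulate_var2()) for _ in range(500)])
print("mean delay:", np.nanmean(delays).round(2),
      "| 95th percentile:", np.nanpercentile(delays, 95))
```

Replacing the toy moving-average rule with a trained classifier such as LVQ would follow the same resampling logic: evaluate it over many simulated paths, not one.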

23 pages, 1072 KiB  
Article
Composite and Mixture Distributions for Heavy-Tailed Data—An Application to Insurance Claims
by Walena Anesu Marambakuyana and Sandile Charles Shongwe
Mathematics 2024, 12(2), 335; https://0-doi-org.brum.beds.ac.uk/10.3390/math12020335 - 19 Jan 2024
Viewed by 609
Abstract
This research provides a comprehensive analysis of two-component non-Gaussian composite models and mixture models for insurance claims data. These models have gained traction in the actuarial literature because they provide flexible methods for curve-fitting. We consider 256 composite models and 256 mixture models derived from 16 popular parametric distributions. The composite models are developed by piecing together two distributions at a threshold value, while the mixture models are developed as convex combinations of two distributions on the same domain. Two real insurance datasets from different industries are considered. Model selection criteria and risk metrics of the top 20 models in each category (composite/mixture) are provided by using the ‘single-best model’ approach. Finally, for each of the datasets, composite models seem to provide better risk estimates. Full article
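
As a hedged, minimal sketch of the mixture-model side of such an analysis (the lognormal–Pareto pair, starting values, and synthetic claim data are illustrative assumptions, not the paper's 256-model study), the following fits a two-component mixture to right-skewed claims by maximum likelihood and reports the AIC used for model selection.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
# Synthetic "claims": 80% moderate lognormal losses, 20% heavy Pareto losses.
claims = np.concatenate([
    stats.lognorm(s=0.8, scale=np.exp(7)).rvs(800, random_state=rng),
    stats.pareto(b=1.5, scale=2000).rvs(200, random_state=rng),
])

def neg_loglik(params):
    """Negative log-likelihood of a lognormal-Pareto mixture with weight w."""
    w, mu, sigma, alpha, xm = params
    if not (0 < w < 1 and sigma > 0 and alpha > 0 and xm > 0):
        return np.inf
    f1 = stats.lognorm.pdf(claims, s=sigma, scale=np.exp(mu))
    f2 = stats.pareto.pdf(claims, b=alpha, scale=xm)
    mix = w * f1 + (1 - w) * f2
    return -np.sum(np.log(np.clip(mix, 1e-300, None)))

res = optimize.minimize(neg_loglik, x0=[0.8, 7.0, 1.0, 1.5, 1500.0],
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
aic = 2 * len(res.x) + 2 * res.fun          # model selection criterion
print("fitted parameters:", np.round(res.x, 3), "| AIC:", round(aic, 1))
```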

14 pages, 5476 KiB  
Article
Archimedean Copulas-Based Estimation under One-Parameter Distributions in Coherent Systems
by Ioannis S. Triantafyllou
Mathematics 2024, 12(2), 334; https://0-doi-org.brum.beds.ac.uk/10.3390/math12020334 - 19 Jan 2024
Viewed by 516
Abstract
In the present work we provide a signature-based framework for delivering the estimated mean lifetime along with the variance of the continuous distribution of a coherent system consisting of exchangeable components. The dependency of the components is modelled with the aid of well-known Archimedean multivariate copulas. The estimated results are calculated under two different copulas, namely the so-called Frank copula and the Joe copula. A numerical experiment is carried out to illustrate the proposed procedure under all possible coherent systems with three components. Full article
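
The following is a minimal Monte Carlo sketch for one such setting (a 2-out-of-3 system, exponential component lifetimes, and a Frank copula parameter chosen purely for illustration; none of these values come from the paper): dependent uniforms are drawn with the Marshall–Olkin frailty algorithm and the mean and variance of the system lifetime are estimated from the second order statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta, lam, n = 3.0, 0.5, 100_000            # copula parameter, exponential rate (assumed)

def frank_psi(t, theta):
    """Generator (Laplace transform of the logarithmic frailty) of the Frank copula."""
    return -np.log1p(-(1 - np.exp(-theta)) * np.exp(-t)) / theta

# Marshall-Olkin sampling: V ~ Logarithmic(1 - e^{-theta}), E_ij iid Exp(1).
V = stats.logser.rvs(1 - np.exp(-theta), size=(n, 1), random_state=rng)
E = rng.exponential(size=(n, 3))
U = frank_psi(E / V, theta)                  # dependent uniforms (exchangeable)
T = -np.log1p(-U) / lam                      # exponential component lifetimes
system_life = np.sort(T, axis=1)[:, 1]       # 2-out-of-3 system fails at the 2nd failure
print("estimated mean system lifetime:", system_life.mean().round(3))
print("estimated variance            :", system_life.var(ddof=1).round(3))
```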

16 pages, 1769 KiB  
Article
A Combined Runs Rules Scheme for Monitoring General Inflated Poisson Processes
by Eftychia Mamzeridou and Athanasios C. Rakitzis
Mathematics 2023, 11(22), 4671; https://0-doi-org.brum.beds.ac.uk/10.3390/math11224671 - 16 Nov 2023
Viewed by 501
Abstract
In this work, a control chart with multiple runs rules is proposed and studied in the case of monitoring inflated processes. Usually, Shewhart-type control charts for attributes do not have a lower control limit, especially when the in-control process mean level is very low, such as in the case of processes with a low number of defects per inspected unit. Therefore, it is not possible to detect a decrease in the process mean level. A common solution to this problem is to apply a runs rule on the lower side of the chart. Motivated by this approach, we suggest a Shewhart-type chart, supplemented with two runs rules; one is used for detecting decreases in process mean level, and the other is used for improving the chart’s sensitivity in the detection of small and moderate increasing shifts in the process mean level. Using the Markov chain method, we examine the performance of various schemes in terms of the average run length and the expected average run length. Two illustrative examples for the use of the proposed schemes in practice are also discussed. The numerical results show that the considered schemes can detect efficiently various shifts in process parameters in either direction. Full article
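
As a minimal sketch of the Markov chain method mentioned above (the zero-inflated Poisson parameters, the upper limit, and the lower-side "4 consecutive zeros" rule are illustrative assumptions rather than the schemes studied in the paper), the following computes the zero-state ARL of a combined chart: a 1-of-1 rule above the UCL plus a runs rule of consecutive zeros for detecting downward shifts.

```python
import numpy as np
from scipy import stats

def arl_combined(lam, phi, ucl=8, k=4):
    """Zero-state ARL of a chart that signals on X > UCL or on k consecutive zeros,
    for zero-inflated Poisson counts with rate lam and inflation probability phi."""
    p_zero = phi + (1 - phi) * stats.poisson.pmf(0, lam)   # feeds the lower runs rule
    p_upper = (1 - phi) * stats.poisson.sf(ucl, lam)       # immediate upper signal
    p_reset = 1 - p_zero - p_upper                         # run of zeros is broken
    Q = np.zeros((k, k))                                   # transient states: run length 0..k-1
    for i in range(k):
        Q[i, 0] = p_reset
        if i < k - 1:
            Q[i, i + 1] = p_zero                           # another zero extends the run
    # From state k-1 a further zero triggers a signal, so no transition is added there.
    arl = np.linalg.solve(np.eye(k) - Q, np.ones(k))
    return arl[0]

print("in-control ARL (lam=4.0):", round(arl_combined(4.0, phi=0.1), 1))
print("downward shift (lam=1.5):", round(arl_combined(1.5, phi=0.1), 1))
print("upward shift   (lam=7.0):", round(arl_combined(7.0, phi=0.1), 1))
```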

23 pages, 2941 KiB  
Article
Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme
by Unarine Netshiozwi, Ali Yeganeh, Sandile Charles Shongwe and Ahmad Hakimi
Mathematics 2023, 11(17), 3650; https://0-doi-org.brum.beds.ac.uk/10.3390/math11173650 - 23 Aug 2023
Cited by 1 | Viewed by 811
Abstract
Control charts, which are one of the major tools in the Statistical Process Control (SPC) domain, are used to monitor a process over time and improve the final quality of a product through variation reduction and defect prevention. In a novel development of control charts, referred to as profile monitoring, the study variable is not defined as a single quality characteristic; instead, a functional relationship between some explanatory and response variables is monitored, with the major aim of checking the stability of this model (profile) over time. Most of the previous works in the area of profile monitoring have focused on the development of different theories and assumptions, but very little attention has been paid to practical application in real-life scenarios in this field of study. To address this knowledge gap, this paper proposes a monitoring framework based on the idea of profile monitoring as a data-driven method to monitor the internet usage of a telecom company. By defining a polynomial model between the hours of each day and the internet usage within each hour, we propose a framework with three monitoring goals: (i) detection of unnatural patterns, (ii) identification of the impact of policies such as providing discounts, and (iii) investigation of general social behaviour variations in internet usage. The results show that shifts of different magnitudes can occur in each goal. With the aid of different charting statistics such as Hotelling's T2 and MEWMA, the proposed framework can be properly implemented as a monitoring scheme under different shift magnitudes. The results indicate that the MEWMA scheme performs well for small shifts and has faster detection ability than the Hotelling T2 scheme. Full article
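
A minimal sketch of the profile-monitoring idea (synthetic hourly data, an assumed cubic profile, and a plain Hotelling T^2 statistic on the fitted coefficients; the paper's actual polynomial degree, data, and MEWMA variant are not reproduced here): each day's 24 hourly usage values are reduced to a coefficient vector, which is then monitored against Phase I estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
hours = np.arange(24)
X = np.vander(hours / 23.0, N=4, increasing=True)     # 1, h, h^2, h^3 design matrix

def daily_coeffs(usage):
    """Least-squares coefficients of the cubic hourly profile for one day."""
    beta, *_ = np.linalg.lstsq(X, usage, rcond=None)
    return beta

true_beta = np.array([5.0, 2.0, -6.0, 4.0])           # assumed in-control profile
phase1 = np.array([daily_coeffs(X @ true_beta + rng.normal(0, 0.3, 24))
                   for _ in range(200)])
mu, S = phase1.mean(axis=0), np.cov(phase1, rowvar=False)
S_inv = np.linalg.inv(S)

def t2(usage):
    """Hotelling T^2 distance of a new day's coefficient vector from Phase I."""
    d = daily_coeffs(usage) - mu
    return float(d @ S_inv @ d)

# A day with a shifted evening pattern (e.g., a discount policy) yields a larger T^2.
normal_day = X @ true_beta + rng.normal(0, 0.3, 24)
shifted_day = X @ (true_beta + np.array([0, 0, 0, 1.0])) + rng.normal(0, 0.3, 24)
print("T^2, in-control day:", round(t2(normal_day), 2))
print("T^2, shifted day   :", round(t2(shifted_day), 2))
```

A MEWMA version would smooth the coefficient vectors over successive days before applying the quadratic form, improving sensitivity to small shifts.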

31 pages, 372 KiB  
Article
New Machine-Learning Control Charts for Simultaneous Monitoring of Multivariate Normal Process Parameters with Detection and Identification
by Hamed Sabahno and Seyed Taghi Akhavan Niaki
Mathematics 2023, 11(16), 3566; https://0-doi-org.brum.beds.ac.uk/10.3390/math11163566 - 17 Aug 2023
Cited by 1 | Viewed by 1116
Abstract
Simultaneous monitoring of the process parameters in a multivariate normal process has caught researchers’ attention during the last two decades. However, only statistical control charts have been developed so far for this purpose. On the other hand, machine-learning (ML) techniques have rarely been developed to be used in control charts. In this paper, three ML control charts are proposed using the concepts of artificial neural networks, support vector machines, and random forests techniques. These ML techniques are trained to obtain linear outputs, and then based on the concepts of memory-less control charts, the process is classified into in-control or out-of-control states. Two different input scenarios and two different training methods are used for the proposed ML structures. In addition, two different process control scenarios are utilized. In one, the goal is only the detection of the out-of-control situation. In the other one, the identification of the responsible variable(s)/process parameter(s) for the out-of-control signal is also an aim (detection–identification). After developing the ML control charts for each scenario, we compare them to one another, as well as to the most recently developed statistical control charts. The results show significantly better performance of the proposed ML control charts against the traditional memory-less statistical control charts in most compared cases. Finally, an illustrative example is presented to show how the proposed scheme can be implemented in a healthcare process. Full article
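
As a hedged illustration of the general idea (a scikit-learn random forest on subgroup summary statistics of a bivariate normal process; the features, shift sizes, and training setup are assumptions, not the authors' structures), a classifier can be trained to flag out-of-control subgroups for both mean and variance shifts:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
p, n_sub = 2, 5                                   # variables, subgroup size

def subgroup_features(mean_shift=0.0, sd_scale=1.0, size=2000):
    """Sample means and variances of subgroups from a shifted/scaled process."""
    data = rng.normal(mean_shift, sd_scale, size=(size, n_sub, p))
    return np.hstack([data.mean(axis=1), data.var(axis=1, ddof=1)])

X_ic  = subgroup_features()                                   # in control
X_ooc = np.vstack([subgroup_features(mean_shift=1.0),         # mean shift
                   subgroup_features(sd_scale=1.5)])          # variance shift
X = np.vstack([X_ic, X_ooc])
y = np.r_[np.zeros(len(X_ic)), np.ones(len(X_ooc))]

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Classify a new subgroup drawn from a process with a shifted mean.
new_subgroup = rng.normal(1.0, 1.0, size=(1, n_sub, p))
feat = np.hstack([new_subgroup.mean(axis=1), new_subgroup.var(axis=1, ddof=1)])
print("out-of-control probability:", clf.predict_proba(feat)[0, 1].round(3))
```
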
22 pages, 1420 KiB  
Article
Process Capability and Performance Indices for Discrete Data
by Vasileios Alevizakos
Mathematics 2023, 11(16), 3457; https://0-doi-org.brum.beds.ac.uk/10.3390/math11163457 - 09 Aug 2023
Viewed by 953
Abstract
Process capability and performance indices (PCIs and PPIs) are used in industry to provide numerical measures for the capability and performance of several processes. The majority of the literature refers to PCIs and PPIs for continuous data. The aim of this paper is to compute the classical indices for discrete data following a Poisson, binomial, or negative binomial distribution using various transformation techniques. A simulation study under different situations of a process and comparisons with other existing PCIs for discrete data are also presented. The methodology of computing the indices is easy to use, and as a result, one can assess process capability and performance without difficulty. Three examples are further provided to illustrate the application of the transformation techniques. Full article
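
A minimal sketch of the transformation idea for Poisson counts (the Anscombe square-root transformation and the specification limits below are illustrative choices, not necessarily the transformations used in the paper): the counts and the limits are variance-stabilized, and the classical indices are computed on the transformed scale.

```python
import numpy as np

rng = np.random.default_rng(5)
counts = rng.poisson(lam=6.0, size=500)           # observed defect counts
LSL, USL = 1, 14                                   # assumed specification limits

def anscombe(x):
    """Variance-stabilizing transformation for Poisson counts."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

z = anscombe(counts)
lsl_t, usl_t = anscombe(LSL), anscombe(USL)        # transform the limits the same way
mu, sigma = z.mean(), z.std(ddof=1)

Cp  = (usl_t - lsl_t) / (6 * sigma)                # classical indices on the
Cpk = min(usl_t - mu, mu - lsl_t) / (3 * sigma)    # approximately normal scale
print(f"Cp = {Cp:.3f}, Cpk = {Cpk:.3f}")
```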

17 pages, 1377 KiB  
Article
A Three-Stage Nonparametric Kernel-Based Time Series Model Based on Fuzzy Data
by Gholamreza Hesamian, Arne Johannssen and Nataliya Chukhrova
Mathematics 2023, 11(13), 2800; https://0-doi-org.brum.beds.ac.uk/10.3390/math11132800 - 21 Jun 2023
Cited by 2 | Viewed by 764
Abstract
In this paper, a nonlinear time series model is developed for the case when the underlying time series data are reported by LR fuzzy numbers. To this end, we present a three-stage nonparametric kernel-based estimation procedure for the center as well as the left and right spreads of the unknown nonlinear fuzzy smooth function. In each stage, the nonparametric Nadaraya–Watson estimator is used to evaluate the center and the spreads of the fuzzy smooth function. A hybrid algorithm is proposed to estimate the unknown optimal bandwidths and autoregressive order simultaneously. Various goodness-of-fit measures are utilized for performance assessment of the fuzzy nonlinear kernel-based time series model and for comparative analysis. The practical applicability and superiority of the novel approach in comparison with further fuzzy time series models are demonstrated via a simulation study and some real-life applications. Full article
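
The following minimal sketch illustrates the component-wise kernel idea on synthetic triangular (LR) fuzzy data with a fixed bandwidth and autoregressive order 1 (the paper's hybrid bandwidth/order selection algorithm is not reproduced): a Nadaraya–Watson estimate is computed separately for the center and for the left and right spreads as functions of the lagged center.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 300
center = np.zeros(T)
for t in range(1, T):                                   # nonlinear AR(1) for the centers
    center[t] = np.sin(center[t - 1]) + 0.8 * center[t - 1] + rng.normal(0, 0.2)
left_spread  = 0.3 + 0.1 * np.abs(center) + rng.uniform(0, 0.05, T)
right_spread = 0.4 + 0.1 * np.abs(center) + rng.uniform(0, 0.05, T)

def nadaraya_watson(x_train, y_train, x_new, h):
    """Gaussian-kernel Nadaraya-Watson regression estimate at the points x_new."""
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

x_lag, h = center[:-1], 0.4                             # assumed bandwidth
grid = np.linspace(x_lag.min(), x_lag.max(), 5)
print("fitted centers      :", nadaraya_watson(x_lag, center[1:], grid, h).round(2))
print("fitted left spreads :", nadaraya_watson(x_lag, left_spread[1:], grid, h).round(2))
print("fitted right spreads:", nadaraya_watson(x_lag, right_spread[1:], grid, h).round(2))
```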

23 pages, 734 KiB  
Article
Investigating the Relationship between Processor and Memory Reliability in Data Science: A Bivariate Model Approach
by Hanan Haj Ahmad, Ehab M. Almetwally and Dina A. Ramadan
Mathematics 2023, 11(9), 2142; https://0-doi-org.brum.beds.ac.uk/10.3390/math11092142 - 03 May 2023
Cited by 3 | Viewed by 1092
Abstract
Modeling the failure times of processors and memories in computers is crucial for ensuring the reliability and robustness of data science workflows. By understanding the failure characteristics of the hardware components, data scientists can develop strategies to mitigate the impact of failures on their computations, and design systems that are more fault-tolerant and resilient. In particular, failure time modeling allows data scientists to predict the likelihood and frequency of hardware failures, which can help inform decisions about system design and resource allocation. In this paper, we aimed to model the failure times of processors and memories of computers; this was performed by formulating a new type of bivariate model using the copula function. The modified extended exponential distribution is the suggested lifetime of the experimental units. It was shown that the new bivariate model has many important properties, which are presented in this work. The inferential statistics for the distribution parameters were obtained under the assumption of a Type-II censored sampling scheme. Therefore, point and interval estimation were performed using the maximum likelihood and the Bayesian estimation methods. Additionally, bootstrap confidence intervals were calculated. Numerical analysis via the Markov Chain Monte Carlo method was performed. Finally, a real data example of processor and memory failure times was examined, and the efficiency of the new bivariate distribution in fitting the data sample was assessed by comparing it with other bivariate models. Full article
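
As a hedged sketch of the copula-based construction (a Clayton copula with exponential margins is used here purely for illustration; the paper's copula, the modified extended exponential margins, Type-II censoring, and Bayesian estimation are not reproduced), dependent processor/memory failure times can be simulated and the dependence parameter recovered from Kendall's tau:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta, lam_cpu, lam_mem, n = 2.0, 0.01, 0.02, 5000    # assumed parameters

# Sample the Clayton copula via its Gamma frailty representation.
V = rng.gamma(shape=1 / theta, scale=1.0, size=(n, 1))
E = rng.exponential(size=(n, 2))
U = (1.0 + E / V) ** (-1.0 / theta)                   # dependent uniforms
t_cpu = -np.log1p(-U[:, 0]) / lam_cpu                 # exponential processor lifetimes
t_mem = -np.log1p(-U[:, 1]) / lam_mem                 # exponential memory lifetimes

tau, _ = stats.kendalltau(t_cpu, t_mem)
theta_hat = 2 * tau / (1 - tau)                       # Clayton: tau = theta / (theta + 2)
print(f"Kendall tau = {tau:.3f}, implied theta = {theta_hat:.3f} (true value {theta})")
```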
