Machine Learning Approach for Care Improvement of Children and Youth with Type 1 Diabetes Treated with Hybrid Closed-Loop System

Campanella, Sara; Sabbatini, Luisiana; Cherubini, Valentino; Tiberi, Valentina; Marino, Monica; Pierleoni, Paola; Belli, Alberto; Boccolini, Giada; Palma, Lorenzo

doi:10.3390/electronics11142227

¹ Department of Information Engineering (DII), Università Politecnica delle Marche, 60131 Ancona, Italy

² Department of Women’s and Children’s Health, “G. Salesi” Hospital, 60123 Ancona, Italy

* Author to whom correspondence should be addressed.

Electronics 2022, 11(14), 2227; https://0-doi-org.brum.beds.ac.uk/10.3390/electronics11142227

Submission received: 22 June 2022 / Revised: 12 July 2022 / Accepted: 14 July 2022 / Published: 16 July 2022

(This article belongs to the Special Issue Machine Learning in Electronic and Biomedical Engineering, Volume II)


Abstract

Type 1 diabetes is a disease that affects the beta cells of the pancreas and reduces insulin secretion, leading to increased blood glucose levels. The traditional treatment is based on finger-stick measurement of the blood glucose concentration followed by manual insulin injection. Insulin pumps and continuous glucose monitoring systems are now replacing this approach, being simpler and automated. This paper analyzes which Machine Learning algorithms work best with glycaemic data and investigates the relation between insulin pump settings and glycaemic control. The dataset is composed of 90 days of recordings taken from 16 children and adolescents. Three Machine Learning approaches were used: two for classification, Logistic Regression (LR) and Random Forest (RF), and one for regression, Multivariate Linear Regression (MLR). The pump settings analysis was based on computing the Time In Range (TIR) and comparing it across pump setting changes. RF and MLR showed the best results, while, for the settings’ analysis, the data show a fair correlation between changes and TIRs. This study provides a closer look at the data recorded by the insulin pump and a suitable starting point for a thorough and complete analysis of them.

Keywords:

Machine Learning; T1DM; insulin pump; continuous glucose monitoring; hybrid closed-loop system

1. Introduction

Autoimmune diseases are conditions in which the immune system mistakenly attacks the body. Usually, the immune system can distinguish foreign cells from self cells. With autoimmune diseases, the immune system can’t work properly and mistakes a part of the organism, such as organs or skin, for a foreign body. It releases proteins called auto-antibodies that attack healthy cells. Diabetes is one of the most widespread autoimmune diseases and it is defined as a metabolic problem related to impaired glucose regulation, caused by the damaged cells of the pancreas [1]. It can be classified as:

Type 1 diabetes (T1D), or juvenile diabetes, since it mainly affects children and adolescents, characterized by insulin deficiency due to pancreatic beta-cell loss [2].
Type 2 diabetes (T2D), caused by pancreatic beta-cell dysfunction and insulin resistance in target organs [3].

The population affected by diabetes is projected to increase by 25% in 2030 and 51% in 2045 [4]. The incidence of this pathology among children is also relevant: in a study by Divers et al. [5], the frequency of T1D and T2D increased by 1.4% and 7.1%, respectively, among U.S. youths in the last two decades.

The etiological factors that lead to T1D are mainly autoantibodies, genetic factors and environmental factors [6], but other drivers can be added to the list, such as gender, ethnicity, and geographical origin. Interestingly, unlike most widespread autoimmune diseases, which disproportionately affect females, girls and boys are, on average, equally affected by T1D in the young population [7]. Moreover, urbanization has strongly affected trends in the incidence of diabetes (279.2 million in urban centers compared with 145.7 million in rural settings) [8].

For those affected by T1D, different therapies exist: daily injections of fast-acting insulin with meals combined with daily basal insulin, or continuous subcutaneous insulin infusion by means of an insulin pump [9]. Nevertheless, complete metabolic normalization is not yet possible, and diabetes that is not well treated can lead to heterogeneous complications such as ketoacidosis, hypoglycemia, and micro- and macrovascular diseases [10,11,12]. Among them, peripheral neuropathy is increasing in incidence among children and adolescents [13] within 5 years from diagnosis.

Therefore, researchers are focused on the development of an Artificial Pancreas (AP) that can imitate the real one in all its functions. The challenges of the AP are gradually being addressed through the development of advanced algorithmic strategies [14] that improve the devices’ performance, as well as through the customization of the algorithms, in order to make these devices more accessible and easier to use [15].

Traditionally, the primary method of blood glucose monitoring in diabetic patients has been self-monitoring of blood glucose (SMBG) [16], but diabetes technology [17] is now becoming predominant in monitoring and treatment. Continuous Glucose Monitoring (CGM) devices and insulin pumps represent the new frontier to be investigated and enhanced. CGM is an innovative diabetes technology that can simplify daily glucose monitoring, avoid situations of mild and severe hypoglycemia, and identify people with diabetes [18]. Combining the CGM sensor with an insulin pump has led to the development of algorithmically controlled pumps that suspend insulin delivery if hypoglycemia is predicted within the next 30 min, or deliver an additional insulin bolus to counter hyperglycemia predicted in the near future. These systems are called hybrid closed-loop systems, and treating T1D with these devices is fast becoming the standard of care [19].

Several hybrid closed-loop systems have been commercialized, gradually transforming the treatment of T1D in children and adults, and reflecting the rapid transition of this evolving technology from research to clinical practice [20]. The t:slim X2™ insulin pump system, launched in 2016 by Tandem™ Diabetes Care, is one of them. It is an innovative insulin pump technology, able to pair with the Dexcom G6^® CGM sensor [21]. The t:slim X2™ pump embeds the Control-IQ™ technology, a model predictive control (MPC) algorithm that predicts upcoming glucose levels (30 min in the future) based on CGM data and automatically adjusts insulin doses accordingly, keeping blood glucose levels inside the normal ranges [21]. These ranges were defined by an international panel of physicians in 2019; they express the limits of desired blood glucose levels and are the basis of the time in ranges (TIR) metric [22]. The metric includes three key CGM measurements, each expressed as the percentage of readings and time per day spent in these ranges 15: within the target glucose range (TIR), 70–180 mg/dL; below the target glucose range (TBR), below 70 mg/dL; and above the target glucose range (TAR), over 180 mg/dL. Several studies have demonstrated the efficacy of the AP in improving the TIR in adolescents and children [23,24] but, in order to obtain a more customized analysis and treatment, all these algorithms should be improved and new approaches have to be tested. In fact, sensors embedded with algorithms capable of merging information coming from the CGM and the insulin pump, and subsequently adjusting the treatment minute-by-minute, do not exist.

Children and adolescents with type 1 diabetes who use non-automated insulin delivery strategies often fail to achieve target blood glucose levels in the real world. The introduction of an automated closed-loop insulin delivery system with algorithms that help minimize hypoglycemia and control hyperglycemia can motivate patients to strive to maximize blood sugar control [25].

Recently, Machine Learning (ML) has become popular with its growing applications, especially in diabetes research. The application of artificial intelligence and, more specifically, of ML in diabetes is feasible and desirable for the development of effective data processing and management tools and devices. These technologies can impact and improve the lives of diabetic patients, the work done by healthcare professionals, and the overall healthcare system [26]. ML adds a new dimension of self-care for people with diabetes, introduces fast and reliable decision-making and flexible follow-up for healthcare providers, and optimizes the resource utilization of the healthcare system. The uses of ML in diabetes care are wide-ranging, from the prediction and analysis of glucose levels to the handling of hypo- and hyperglycemia situations [27] and the improvement of insulin pumps [28]. For example, Seo et al. [29] proposed a Machine Learning algorithm for predicting postprandial hypoglycemia, which is still a challenge due to the extreme glucose fluctuations that occur around mealtimes. The authors tested four machine learning models with a unique data-driven feature set: a random forest, a support vector machine, a K-nearest neighbor, and a logistic regression. Noaro et al. [30] presented a machine learning model, based on multiple linear regression and the least absolute shrinkage and selection operator, to improve the calculation of mealtime insulin boluses in T1D therapy using CGM data in the UVa/Padova T1D simulator environment. In 2020, Askari and colleagues [31] proposed an adaptive and predictive control framework to incorporate disturbance prediction and pattern learning based on the subject’s historical data and subsequent forecasting. In the same year, Colmegna et al. [32] tested, in silico, a linear parameter-varying control law whose ultimate objective is reducing user intervention to the minimum, with a specific focus on moderate-intensity exercise. Adams and colleagues [33] conducted a comparative case study using quadratic discriminant analysis and support vector machine algorithms to classify blood sugar levels using data collected from wearable sensors in patients with type 1 diabetes.

Despite this, the majority of the research is focused on the prediction of glucose values or on the detection of risk factors and complications associated with diabetes. None of them try to figure out patterns in the relationship between the glucose oscillations and the insulin basal rate and, specifically, how continuous monitoring and insulin injection can affect it.

Therefore, the scope of this study is to further analyze and improve the knowledge about which ML algorithms can advance diabetes management and, subsequently, the time in range, in young populations, using data provided by the Tandem™’s t:slim X2™ coupled with the DexCom G6^® sensor.

2. Materials and Methods

This work’s primary objective is to optimize the choice of the ML algorithm for diabetes control, with the ultimate goal of improving the subjects’ TIR. Data coming from the hybrid closed-loop system was used in ML approaches to gain insightful information on the relationship between the measurements and the settings to make the pump’s optimization feasible.

Patients were required by the clinicians at Ancona’s pediatric Hospital “G. Salesi” to upload their DexCom G6^® sensor and Tandem™’s t:slim X2™ insulin pump data to the Diasend platform. The hospital provided us with data from the website after removing any personal information belonging to the subjects, except for their year of birth and sex. For all of them, the main interest was in blood glucose levels, insulin injections, carbohydrate intake, and pump settings. The blood glucose level was recorded by the CGM device, while the carbohydrates, measured in grams, were entered by the patients themselves. The pump settings, instead, are decided by the clinicians and modified at each medical examination.

The dataset is composed of 16 subjects, 10 females and 6 males, all born in the 2000s. Since this is a preliminary study, only the patients with the most complete records were considered in order to test the quality of the proposed approach. For this reason, the sample size is limited, even though a larger dataset is desirable. The working environment is Google’s Colaboratory with Python, and data covering 3 months (90 days) were used for the analyses.

This section is divided into three subsections, namely Pre-processing, Machine Learning Algorithms and Pump settings’ analysis, each dedicated to a specific portion of the work carried out.

2.1. Pre-Processing

To prepare the data for the analyses, glycemia and insulin information had to be made comparable, since blood glucose levels and insulin rates are measured at different frequencies: the CGM measures glycemia once every five minutes, while the injection rate is recorded less frequently. Moreover, insulin and carbohydrate data were not only scarcer, but were also often recorded at completely different times compared to the glycaemic data. Therefore, to solve this issue and align all the data, the function merge_asof, from the Python library Pandas, was used. This function works on the column specified in the “on” parameter and compares values: if there is no exact match, it takes the previous one and merges them. It is also possible to choose a latency to be tolerated: since blood glucose levels are measured every 5 min by the DexCom G6^® sensor, the time delta for the “tolerance” parameter was set at 240 s. The empty cells of the merged data were filled with zeros. There is a technical explanation behind this procedure: in general, the absence of values recorded by the system means that no injections were programmed by the algorithm, in order to avoid hypoglycemia. Thus, since these new empty cells are created after the merge, and at those time instants the algorithm had not planned any injection, they were filled with zeros.
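The alignment step described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`time`, `glucose`, `basal_rate`) and the toy values are hypothetical and do not come from the study's data; only `merge_asof`, its `on` and `tolerance` parameters, the 240 s tolerance and the zero filling are taken from the text.

```python
import pandas as pd

# Hypothetical CGM readings every 5 minutes (column names are assumptions).
cgm = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 12:00:00", "2022-01-01 12:05:00",
        "2022-01-01 12:10:00", "2022-01-01 12:15:00",
    ]),
    "glucose": [110, 118, 131, 140],
})

# Hypothetical, sparser basal-rate records logged at unrelated instants.
basal = pd.DataFrame({
    "time": pd.to_datetime(["2022-01-01 12:03:00", "2022-01-01 12:14:30"]),
    "basal_rate": [0.8, 1.1],
})

# Both frames must be sorted on the key. For each CGM row, merge_asof takes
# the most recent basal record at or before it (direction="backward" is the
# default), but only if that record is at most 240 s old.
merged = pd.merge_asof(
    cgm.sort_values("time"),
    basal.sort_values("time"),
    on="time",
    tolerance=pd.Timedelta(seconds=240),
)

# Rows with no basal record inside the tolerance window hold NaN;
# following the text, they are filled with zeros.
merged["basal_rate"] = merged["basal_rate"].fillna(0)
print(merged)
```

In this toy case the 12:10 reading gets a basal rate of 0 because the closest earlier record (12:03) is older than the tolerance.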

The final input dataset is a matrix composed of about 50,000 samples of glycaemic and insulin data and several features. Regarding feature extraction, at first only glycaemic data and carbohydrates were used, creating an input matrix with only 2 features. A second attempt was made by adding glycaemic values shifted by 5 and 10 min, thus increasing the number of features to 4. For the Logistic Regression, matrices with both 2 and 4 features were used, while for the Random Forest, after obtaining poor results with 4 features, glycaemic data shifted within 30 min were considered, increasing the total to 8 features. For the Multivariate Linear Regression, attempts with both 4 and 8 features were made, but only the former produced acceptable outcomes.
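One possible way to build the 4-feature and 8-feature matrices is sketched below in plain Python. The helper name `add_lagged_glucose` and the sample values are hypothetical, and the sketch assumes equally spaced 5-min samples, so that a lag of k samples corresponds to 5k minutes.

```python
def add_lagged_glucose(glucose, carbs, n_lags):
    """Build rows [glucose_t, carbs_t, glucose_{t-1}, ..., glucose_{t-n_lags}].

    Samples are assumed to be equally spaced 5 min apart, so lag k means
    5*k minutes earlier. The first n_lags samples are dropped because they
    have no complete history.
    """
    rows = []
    for t in range(n_lags, len(glucose)):
        lags = [glucose[t - k] for k in range(1, n_lags + 1)]
        rows.append([glucose[t], carbs[t]] + lags)
    return rows


# Hypothetical glucose (mg/dL) and carbohydrate (g) series.
glucose = [100, 105, 112, 120, 126, 130, 128, 125]
carbs = [0, 0, 30, 0, 0, 0, 0, 0]

X4 = add_lagged_glucose(glucose, carbs, n_lags=2)  # lags of 5 and 10 min
X8 = add_lagged_glucose(glucose, carbs, n_lags=6)  # lags of 5 to 30 min
print(X4[0])
print(X8[0])
```

With two lags each row has 4 features; with six lags (5 to 30 min) each row has 8, matching the feature counts given in the text.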

Feature normalization between 0 and 1 was tested for all the approaches but gave unsatisfactory outcomes, hence it was discarded.
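For reference, normalization between 0 and 1 is commonly done with min-max scaling. The sketch below assumes that this is the kind of scaling that was tested; the text does not specify the exact method, and the function name and values are illustrative.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


glucose = [70, 125, 180, 235]  # hypothetical values in mg/dL
scaled = min_max_scale(glucose)
print(scaled)
```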

In Figure 1, the flow chart of the pre-processing steps is reported.

2.2. Machine Learning Algorithms

Logistic Regression (LR), Random Forest (RF), and Multivariate Linear Regression (MLR) were chosen as ML algorithms for our purpose. For all of them, basal insulin and glycaemic information were used, but the data manipulation differed according to the purpose of the algorithm, namely classification for the first two models and regression for the last one.

The aforementioned glycemia-insulin rate relationship was studied through subsequent classification and regression steps. The reason behind the choice of this approach lies in the nature of the dataset: since the insulin rate often goes to zero to avoid hypoglycemic situations, the data is strongly unbalanced on zero.

For the classification problem, a model able to compute if Control-IQ™ decided to change the insulin rate or not, based on info coming from glycaemic values and carbohydrate intake, was required. Thus, LR and RF models were implemented as solutions to this task.

The LR was used for classification to predict a binary outcome, in this case 0 or 1, meaning no change in basal rate and change in basal rate, respectively. The independent variables were the glycaemic values and carbohydrate intakes, while the dependent one was the basal rate. Due to low performance, lagged glycaemic values were added (lags of 5 and 10 min) in an attempt to obtain better results. Despite this, no significant improvement was observed.
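The text does not state how the 0/1 target was derived from the basal rate. One plausible construction, given purely as an assumption, labels a sample 1 when its basal rate differs from the previous sample. The function name and the sample series are hypothetical.

```python
def basal_change_labels(basal_rate):
    """Return 1 where the basal rate differs from the previous sample, else 0.

    The first sample has no predecessor and is labelled 0 by convention
    (an assumption of this sketch).
    """
    if not basal_rate:
        return []
    labels = [0]
    for prev, curr in zip(basal_rate, basal_rate[1:]):
        labels.append(1 if curr != prev else 0)
    return labels


# Hypothetical basal-rate series (U/h) aligned with the CGM samples.
basal_rate = [0.8, 0.8, 0.0, 0.0, 0.0, 1.1, 1.1, 1.1]
y = basal_change_labels(basal_rate)
positive_fraction = sum(y) / len(y)
print(y, positive_fraction)
```

Even in this short series the positive class is the minority, which illustrates the kind of imbalance the text attributes to the dataset.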

The RF classification was used to improve on the performance of the LR algorithm. This algorithm picks N random records from the dataset and builds a decision tree based on them, repeating the process for the chosen number of trees (in this case, 200). Each tree predicts the category the new record belongs to, and the category that wins the majority vote is the predicted class given in output. The RF algorithm has some interesting advantages: it is unbiased because there are multiple trees and each tree is trained on a subset of the data, and it is stable even if the data has missing values or is not well scaled.
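The majority-vote step can be illustrated with a few lines of plain Python. This sketch only shows how the votes of already trained trees are combined; it does not train any tree, and the vote counts are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by most trees for one record.

    Ties are broken in favour of the class that was seen first, which is
    how Counter.most_common orders elements with equal counts.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes of 200 trees for one record: 1 = change, 0 = no change.
votes = [1] * 120 + [0] * 80
predicted = majority_vote(votes)
print(predicted)
```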

Here, glycaemic values shifted within 30 min were considered together with the carbohydrate intake as predictors, since the 4-feature matrix did not give acceptable results.

Since the results of this new approach were satisfactory, its outcomes were used as input to the MLR. This choice was made to verify whether there is a relationship between multiple variables, how strong it is, how accurately we can estimate the effect of each variable, how accurately we can predict the target, and whether the relationship is linear. In particular, the connection between the change in insulin rate and blood glucose levels was investigated. Attempts with both 4 and 8 predictors were made but, due to the low performance of the latter, only the results obtained from the former are reported here.

Figure 2 shows the data analysis approach followed in this study.

The MLR equation is:

Y = β_{0} + β_{1} Glycemia + β_{2} Carbs + β_{3} Glyc_{t-1} + β_{4} Glyc_{t-2}    (1)

where Glycemia indicates the recorded glycaemic values, Carbs indicates the recorded carbohydrate intake, Glyc_{t-1} indicates the previous recorded values (5 min before), Glyc_{t-2} indicates the second previous recorded values (10 min before), and β_{i} are the regression coefficients to be fitted.
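To make the fitting of the coefficients in Equation (1) concrete, the following dependency-free sketch solves the least-squares problem through the normal equations. It is not the implementation used in the study (a library routine would normally be preferred); the function names, the toy feature rows and the "true" coefficients are all hypothetical and are not patient data or values in mg/dL.

```python
def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.

    Raises ZeroDivisionError if the design matrix is rank deficient.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Build the augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(beta, row):
    """Evaluate Equation (1) for one feature row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Toy rows: [Glycemia, Carbs, Glyc_{t-1}, Glyc_{t-2}] in arbitrary units.
X = [
    [1, 0, 2, 1], [2, 1, 0, 3], [3, 0, 1, 0], [4, 2, 3, 1],
    [5, 1, 1, 2], [6, 0, 4, 0], [7, 3, 2, 2], [8, 1, 0, 1],
]
true_beta = [0.5, 0.01, 0.02, -0.03, 0.04]  # hypothetical coefficients
y = [predict(true_beta, row) for row in X]  # noise-free response

beta = fit_ols(X, y)
print([round(b, 6) for b in beta])
```

Because the toy response is noise-free, the fitted coefficients recover the hypothetical ones up to rounding error; with real data the fit would only approximate the response.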

The metrics used for the evaluation of the two classification models are Accuracy, Recall (Rec), Precision (Prec) and F1-value (F1), while for the regression model R^{2}, F-stat and p-value are used. The accuracy score is how often we can expect the model to correctly predict an outcome out of the total number of predictions it made; it is defined as the ratio of true positives and true negatives to all positive and negative observations. The Rec value measures how many positive cases are correctly predicted by the classifier among all the positive cases; the Prec value measures how many of the positive predictions made are correct, while F1 combines precision and recall and is generally described as the harmonic mean of the two. The R^{2} value is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by the independent variables considered in a regression model, while the F-stat is a ratio of variances, calculated from a sample of data and used to provide information about a whole dataset. It must be used in combination with the p-value when deciding whether the overall results are significant. The p-value for each coefficient tests the null hypothesis that the regression coefficient is equal to zero: a low p-value indicates that the null hypothesis can be rejected, meaning that the predictor is a significant addition to the model because changes in the predictor’s value are related to changes in the response variable.

The formulas of the metrics used are reported here, where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (2)

Prec = \frac{TP}{TP + FP}    (3)

Rec = \frac{TP}{TP + FN}    (4)

F1 = \frac{2 \cdot Prec \cdot Rec}{Prec + Rec} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}    (5)

R^{2} = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}} = 1 - \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \bar{y})^{2}}    (6)

p\text{-}value = 1 - \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0} (1 - p_{0})}{n}}}    (7)

F\text{-}stat = \frac{\text{variance of dataset 1}}{\text{variance of dataset 2}} = \frac{σ_{1}^{2}}{σ_{2}^{2}}    (8)
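Equations (2) to (6) translate directly into code. The sketch below is a plain-Python illustration with hypothetical counts and values; setting Prec, Rec and F1 to zero when their denominator vanishes is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Precision, recall and F1 are set to 0.0 when their denominator is zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "prec": prec, "rec": rec, "f1": f1}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical confusion-matrix counts and regression outputs.
m = classification_metrics(tp=30, tn=50, fp=10, fn=10)
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(m, round(r2, 4))
```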

2.3. Insulin Pump’s Analysis

The data considered for the pump settings’ analysis, which is the second task addressed by this work, are the basal rate (BR) and the Insulin-to-Carbohydrate Ratio (ICR). The Insulin Sensitivity Factor could not be taken into consideration due to the lack of data. Eight subjects satisfied the requirements for this analysis, based on the consistency of their basal data, the content of their settings’ changes, whether or not they were using Control-IQ, and the availability of data. The overall changes of the parameters created different date segments regarding BR and/or ICR changes for each subject. Based on both BR and ICR changes, the subjects’ meal TIRs (lunch: 11:30–15:00; dinner: 18:30–23:00) were computed for each of their date intervals. This was done to analyze how the combinations of parameter changes perform, meaning how well they control blood glucose levels, keeping them inside the guidelines’ thresholds [22]. The choice of these mealtime intervals was based on the metabolic regulation of food intake in both healthy and diabetic individuals [34,35,36].
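A mealtime TIR of this kind can be computed as the share of CGM readings that fall inside 70–180 mg/dL within the daily window. The sketch below is a minimal illustration: the function name and the readings are hypothetical, the window bounds are treated as inclusive (an assumption), and TBR and TAR are added using the thresholds given in the Introduction.

```python
from datetime import datetime, time


def time_in_ranges(readings, start, end, low=70, high=180):
    """Percentages of CGM readings in, below and above the target range.

    readings: list of (datetime, glucose in mg/dL) pairs.
    start, end: daily window as datetime.time, both ends inclusive.
    """
    values = [g for ts, g in readings if start <= ts.time() <= end]
    if not values:
        return None  # no readings fall inside the window
    n = len(values)
    tir = 100 * sum(low <= g <= high for g in values) / n
    tbr = 100 * sum(g < low for g in values) / n
    tar = 100 * sum(g > high for g in values) / n
    return {"TIR": tir, "TBR": tbr, "TAR": tar}


# Hypothetical readings for one day.
readings = [
    (datetime(2022, 1, 1, 11, 0), 95),   # before the lunch window, ignored
    (datetime(2022, 1, 1, 12, 0), 150),
    (datetime(2022, 1, 1, 13, 0), 190),
    (datetime(2022, 1, 1, 14, 0), 175),
    (datetime(2022, 1, 1, 15, 0), 65),
    (datetime(2022, 1, 1, 19, 0), 120),  # dinner window
]
lunch = time_in_ranges(readings, time(11, 30), time(15, 0))
dinner = time_in_ranges(readings, time(18, 30), time(23, 0))
print(lunch, dinner)
```

Running the same function separately on each date segment between two settings changes would give the per-period meal TIRs that the analysis compares.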

3. Results

For each patient, the previously mentioned models were evaluated. The LR did not perform well enough, even after normalizing the data and including glucose values from the recent past, up to 10 min before. Therefore, an RF model was implemented to look for better results. The performance metrics computed to rate the quality of this approach are reported in Table 1, which shows the Rec, Prec, and F1 computed for each label alongside the Accuracy score for both classification algorithms. The outcomes demonstrate the advantage of using RF as a classification method compared to LR. For the RF algorithm, only the results obtained with glucose values lagged from 5 to 30 min are reported, since the outcomes obtained from the 4-feature matrix were not adequate, while for the LR only the results from the 4-feature matrix are listed.

Then, the outcome of the RF, meaning the array of predicted labels, was used to preprocess the data given as input to the MLR. Table 2 reports the MLR metrics for each patient: the R^{2}, the p-value, and the F-stat. Since the main goal of this approach was to understand the connection between the change in insulin rate and glycemia, this can also be done through the analysis of the R^{2} values and the coefficients’ p-values for each subject. In general, the MLR proves to be a reliable method for the purpose of our work, since the R^{2} and the p-value indicate, respectively, a good fit of the regression model and a statistically significant relationship between the independent variables and the response variable.

Additionally, the residual plots (Figure 3) of those subjects that show a particular and meaningful pattern are reported, displaying a cloud of points around 0 with constant variance and no trend.

For the pump settings’ analysis, results are reported in Figure 4. The top panel shows the TIR computed for each patient over all the periods under study, representing the overall glycaemic control and allowing general considerations. The middle and bottom panels show the TIRs computed for each patient in the specific periods related to the insulin pump’s changes, during lunch and dinner time, respectively. Some subjects only had informative data changes for lunchtime, so their dinner TIRs are not reported. In general, all the subjects demonstrate strict adherence to treatment and spent the majority of the time within the suggested range (TIR >70%). Moreover, changes of TIRs during mealtime indicate that pump settings’ changes are connected with glucose oscillations and, thus, with the time in range.

4. Discussion

The first step of this study was to solve the classification problem: both LR and RF were employed to identify which one was able to determine whether or not Control-IQ™ decided to modify the insulin rate. The results obtained by the two approaches are very different: LR did not provide acceptable results and did not improve when the feature extraction was changed. The main issue can probably be traced to the nature of the dataset: while the glycaemic values are consistent, basal insulin data are scarcer. This leads to a very imbalanced dataset, which is also rather inhomogeneous. As already mentioned, the RF approach was proposed as a second classification method to investigate whether the results would be more satisfactory. The RF algorithm improved the results when using the glycaemic data with blood glucose levels time-shifted within 30 min. The two approaches can be compared through the results in Table 1, analysing Rec, Prec and F1. For some patients, such as M2711 and F4503, the LR Prec and Rec are both 0; therefore, F1 cannot be calculated and is set to zero. This means that the classifier cannot predict any correct positive result, making it useless for the original purpose. On the contrary, RF’s Prec and Rec for the same two subjects are higher, indicating that the classifier is capable of correctly classifying the dependent variable. In some other cases, like F1408 and M4003, the LR columns show average values for all three parameters, demonstrating a fair prediction behavior. The same trend is notable in the Rec, Prec, and F1 computed for the RF, but they show a better outcome, indicating the higher reliability of the model with respect to the LR. Of course, similar considerations can be made for all the subjects that show a similar trend in the results. The advantage of using RF compared to LR is also visible in the Accuracy scores reported in Table 1. If we analyze them together with the other metrics, it can be pointed out that for the LR the accuracy score is not so trustworthy: for example, for subject M3309, the accuracy score is quite high even though the Rec, Prec and F1 demonstrate that the model is not sufficiently capable of predicting 1s. This phenomenon is due to the highly imbalanced data: the probability of having to predict a 1 is so low that, even if the algorithm only predicts 0, it is still accurate. This means that the Accuracy score, in this case, is not a reliable metric. On the contrary, the accuracy scores for the RF are generally more trustworthy and more homogeneous, given the reliable results obtained for Rec, Prec, and F1.
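The accuracy pitfall described above is easy to reproduce with a toy example. The class proportions below are hypothetical and are not those of any subject in the study; the point is only that a classifier which never predicts a change still scores a high accuracy.

```python
# Hypothetical label set: 5 basal-rate changes (1) among 100 samples.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that always predicts "no change".
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions

print(accuracy, recall, precision)
```

Here the accuracy is 0.95 while recall and precision are both 0, which is why Rec, Prec and F1 are read alongside the Accuracy score.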

Our results are comparable to the ones found in the literature: for example, Seo et al. [29] used the RF model, which showed the best predictive performance for postprandial hypoglycemic events compared to other models such as LR, Support Vector Machine, and K-nearest neighbor, obtaining great outcomes in sensitivity, specificity, and F1 score. Moreover, in a work by Dave et al. [37], where LR and RF were used for prediction purposes, the superiority of the RF approach is stated. Both studies confirm the stability and strength of the RF method when working with glycaemic and insulin data.

An additional step proposed by [37] is to identify in advance which parameter can mostly influence the classification algorithms to obtain more accurate results. This is a necessary procedure to consider and implement in the future: knowing which parameters directly impact the basal rate allows us to develop a more sensitive algorithm that can positively affect the glycaemic control, even though such approaches cannot grasp the synergistic influences of parameters on the dependent variable.

Since the main goal of the MLR approach was to understand the connection between the change in insulin rate and glycemia, the results can be analyzed through the R^{2} values and the coefficients’ p-values, visible in Table 2. The p-values were computed for all the variables used in the model, namely glycaemic data, carbohydrate intake and glycaemic values shifted by 5 and 10 min, identifying five coefficients (β_{0}, β_{1}, β_{2}, β_{3} and β_{4}). Only p-values different from zero were reported. If the coefficient is specified, it was meaningful for the data interpretation. Otherwise, no differences in p-values were detected.

The R^{2} is used to explain the strength of the relationship between the model and the dependent variable. In general, a higher R^{2} value means that the model fits the data better compared to lower values. But when R^{2} and p-values are considered together, the interpretation of the model changes. In fact, p-values are used to explain the mathematical relationship between each independent variable and the dependent variable. If low R^{2} and low p-values are obtained, the model does not explain much of the variation of the data, but the independent variables are, at least, significant; on the contrary, if both values are high, the model explains a lot of the variation of the data but the independent variables are not significant. Two other situations can occur: low R^{2} with high p-values (>0.05), or high R^{2} with low p-values. The former means that the model does not describe a large part of the variation in the data and the independent features are not significant; the latter implies that the model explains a lot of the variation within the data and the variables are significant, and this is the desirable outcome when modeling. In this case, the R^{2} values for subjects F0110, F1208, F4503, and M3208 are the only ones that can be considered moderate (not so high), while the others are all lower. The latter also have very high F-stat values: probably, since their data is very imbalanced, this regression cannot efficiently find a relationship between their independent and dependent variables.

Nevertheless, as shown in Table 2, the overall p-values are 0.000 or <0.050. The only exceptions can be seen for subjects F4609, M2008, and M3309, even if these last p-values are lower. The p-value for each coefficient tests the null hypothesis that the coefficient is equal to zero, that is, that the predictor has no effect on the relationship. A low p-value means that this hypothesis can be rejected and that the predictor linked to that coefficient is likely to be meaningful in the model, as its changes are related to changes in the response variable. On the contrary, a larger p-value suggests that the predictor is statistically insignificant. For the aforementioned subjects, this means that the β_{2} and β_{3} coefficients, associated with the previous glycaemic values, are probably not significant for the model.

Regarding the residual plots, ideally, residual values should be equally and randomly spaced around the horizontal axis. The analysis of the plots is done retrospectively and, when it has a positive outcome, it indicates that the model is a good fit for the data. The residual plots for the majority of the subjects, like F0110 and M4409, visible in Figure 3, show residuals that are randomly distributed, supporting the quality of the algorithm. Conversely, subjects F0207 and F2810 show residuals that are not independent and randomly distributed. In fact, subject F0207 shows a decreasing trend, while subject F2810 shows an increasing trend.
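One simple way to quantify this visual check is to fit a straight line through the residuals and inspect its slope: a slope near zero is consistent with randomly scattered residuals, whereas a clearly negative or positive slope corresponds to a decreasing or increasing trend. The sketch below is an assumption about how such a check could be done, not the procedure used in this study; the function name, the choice of horizontal axis and the example values are hypothetical.

```python
def residual_trend_slope(x, residuals):
    """Least-squares slope of residuals plotted against x (e.g., sample index)."""
    n = len(x)
    if n != len(residuals) or n < 2:
        raise ValueError("x and residuals must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_r = sum(residuals) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, residuals))
    return sxy / sxx


# Hypothetical residuals: the first set scatters around zero, the second drifts down.
index = list(range(6))
scattered = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
drifting = [5.0, 3.0, 1.0, -1.0, -3.0, -5.0]
print(residual_trend_slope(index, scattered))
print(residual_trend_slope(index, drifting))
```

Note that, for a least-squares fit with an intercept, the residuals are uncorrelated with the fitted values by construction, so a slope of this kind is informative mainly when the residuals are plotted against time or against the observed values.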

All these results show that the MLR is a good regression model, capable of working well with glycaemic and insulin data. Of course, some improvements should be taken into consideration like increasing the number of features and using a solid and broader dataset.

Regarding the pump setting analysis, the main improvement in TIRs during meals was obtained by changes in BR and ICR, significantly increasing them. In fact, a general improvement of TIRs has been seen, underlining that there is a close correlation between settings adjustment and TIR improvements. In Figure 4, TIR changes are reported in three separate panels to visualize the influence of the pump settings more clearly.

First, a general consideration about glycaemic control has to be made. In Figure 4a, the time in range is higher than the target (over 70%), except for two patients who report values slightly lower than the others. From a closer look at the mealtime data, an increase in TIR can be seen for almost all the subjects.
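For reference, TIR is the percentage of glucose readings that fall inside a target interval. The sketch below assumes the commonly used 70–180 mg/dL interval and equally spaced CGM samples; the function name and the sample readings are hypothetical, and the 70% goal is the target mentioned above.

```python
def time_in_range(readings_mg_dl, low=70.0, high=180.0):
    """Percentage of readings with low <= value <= high.

    Assumes equally spaced samples, so the share of readings equals the share of time.
    The default bounds are an assumption (commonly used target interval).
    """
    if not readings_mg_dl:
        raise ValueError("at least one reading is required")
    in_range = sum(1 for g in readings_mg_dl if low <= g <= high)
    return 100.0 * in_range / len(readings_mg_dl)


# Hypothetical readings (mg/dL).
readings = [65, 90, 110, 150, 175, 190, 210, 140, 120, 100]
tir = time_in_range(readings)
print(f"TIR = {tir:.0f}%, above the 70% target: {tir > 70.0}")
```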

For patient F4609, dinner TIRs continued to improve throughout the 91-day span of analysis, while the ones computed for lunchtime improved at first but then worsened. That probably indicates that the changes made around the second half of the day were more effective than the ones made in the middle of the day, possibly due to changes in time slots.

Other significant examples are F0207 and M4003: for the first patient, lunch and dinner TIRs continued to improve in the analyzed period. In this case, ICR never changes, meaning that this subject is strongly and positively influenced by the BR. For subject M4003, lunch TIR slightly improves while the dinner one worsens. In this case, ICR does not change, so the results come from BR changes made through increases.
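The per-period comparison discussed above can be sketched as follows: readings are restricted to a mealtime window, split at the dates on which the pump settings were changed, and TIR is computed for each resulting period. This is only an illustration under stated assumptions: the 70–180 mg/dL interval, the lunch window of 12:00–15:00, the function names and the sample data are all hypothetical and do not come from the study.

```python
from bisect import bisect_right
from datetime import datetime, time


def tir_percent(values, low=70.0, high=180.0):
    """Percentage of values inside [low, high]; None when there are no values."""
    if not values:
        return None
    return 100.0 * sum(low <= v <= high for v in values) / len(values)


def mealtime_tir_by_period(samples, change_times, start, end):
    """TIR per settings period, using only samples inside the daily [start, end) window.

    samples: list of (datetime, glucose in mg/dL)
    change_times: sorted datetimes at which pump settings were changed
    """
    periods = [[] for _ in range(len(change_times) + 1)]
    for stamp, glucose in samples:
        if start <= stamp.time() < end:
            # Period 0 precedes the first change, period 1 follows it, and so on.
            periods[bisect_right(change_times, stamp)].append(glucose)
    return [tir_percent(p) for p in periods]


# Hypothetical data: one settings change on 10 January.
samples = [
    (datetime(2022, 1, 5, 12, 30), 200),
    (datetime(2022, 1, 5, 13, 0), 150),
    (datetime(2022, 1, 5, 20, 0), 250),  # outside the lunch window, ignored
    (datetime(2022, 1, 12, 12, 30), 120),
    (datetime(2022, 1, 12, 13, 0), 160),
]
changes = [datetime(2022, 1, 10)]
result = mealtime_tir_by_period(samples, changes, time(12, 0), time(15, 0))
print(result)  # [50.0, 100.0]
```

Comparing consecutive entries of the returned list shows whether TIR in that meal window rose or fell after each settings change.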

It is important to highlight that many external factors, such as seasonality, health condition, and psychological factors, should be considered when drawing conclusions about oscillations in the achieved performance. Hence, it is highly recommended for patients to keep a personal diary to annotate all this valuable information, which can help clinicians adequately modify treatment.

5. Conclusions

The main objective of this work was to mine data that could inform the decision on which ML algorithm best fits glycaemic data and is, thus, a useful tool to improve the TIR.

As could be expected, the ML approaches showed that algorithms oriented to imbalanced data work better on this kind of dataset. They also highlighted that changing the predictors does not always have the same outcome: for example, while the addition of past glycaemic values was good for LR and MLR, it did not have the same results on the RF. So, it would be interesting to test the overall variability of closely recorded blood glucose levels for the subjects and compare that to these results. Regarding the pump settings' analysis, it is essential to work on more data belonging to more subjects. The analysis of the relationship between the pump changes and TIR values showed a strong connection between them, so the first steps toward understanding this new system have been taken. Thus, it is crucial to use more data belonging to a longer time period and to cluster the dataset to analyze whether there are some characteristics that can affect the overall glycaemic control.

In future work, we will evaluate the inclusion of some additional parameters to create a more complex and more specific feature matrix, with the aim of improving the performance achieved and leading to the determination of new metrics for the assessment of glycaemic control, suitable for being used in the embedded sensor as setting parameters. Moreover, patients and their families must be educated on the upload of their data and assured of how their effort could lead to a sharpened and straightforward glycaemic control.

Despite the above open issues, this work presents a first attempt to create a fully performative algorithm that could be embedded into a single sensor capable of automatically modifying and adjusting pump settings, based on the glycaemic values, and, thus, delivering the right insulin dose.

Research should be further developed, trying to obtain a fully closed-loop system capable of realistically mimicking the healthy glucose metabolism. The development of such a system would be life-changing for all individuals affected by T1D, but especially for children and adolescents.

Author Contributions

Conceptualization, L.P. and V.C.; methodology, L.S., L.P. and V.T.; software, S.C.; validation, P.P., A.B., M.M. and G.B.; formal analysis, M.M. and G.B.; investigation, L.S., S.C. and L.P.; data curation, V.T. and S.C.; writing—original draft preparation, S.C.; writing—review and editing, L.P., L.S. and V.C.; visualization, A.B., P.P. and L.S.; supervision, L.P. and V.C.; project administration, L.P., V.C. and P.P.; funding acquisition, P.P. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethics review and approval were waived for this study because the retrospective analysis of the recorded data was conducted using completely anonymous data. The experimental study did not involve any invasive or medical procedures and introduced no lifestyle changes. All subjects gave their informed consent prior to the collection and acquisition of the data, which was carried out in compliance with the ethical principles of the Helsinki Declaration.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work is supported by Marche Region in the implementation of the financial program POR MARCHE FESR 2014-2020, project “Miracle” (Marche Innovation and Research fAcilities for Connected and sustainable Living Environments), CUP B28I19000330007.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Pre-processing steps for the Machine Learning algorithms.

Figure 2. Flowchart of the modelling approach.

Figure 3. (a) Plot of the residuals resulting from the multivariate linear regression for subject F0110. (b) Plot of the residuals resulting from the multivariate linear regression for subject F0207. (c) Plot of the residuals resulting from the multivariate linear regression for subject F2810. (d) Plot of the residuals resulting from the multivariate linear regression for subject F4609.

Figure 4. Trends of TIR for each change in the insulin pump and the full period under study for each patient. (a) TIR computed for all the 91 days. (b) TIR computed in different periods during lunch time. (c) TIR computed in different periods during dinner time.

Table 1. Logistic Regression and Random Forest Performance metrics.

Patient	Label	LR Accuracy (%)	LR $Prec$	LR $Rec$	LR $F1$	RF Accuracy (%)	RF $Prec$	RF $Rec$	RF $F1$
F0110	0	59	0.59	0.62	0.61	70	0.70	0.76	0.72
F0110	1	59	0.58	0.55	0.57	70	0.72	0.66	0.69
F0207	0	54	0.55	0.96	0.70	67	0.68	0.77	0.72
F0207	1	54	0.21	0.01	0.02	67	0.66	0.56	0.61
F1208	0	55	0.56	0.78	0.65	71	0.72	0.74	0.73
F1208	1	55	0.54	0.29	0.38	71	0.69	0.68	0.69
F1408	0	65	0.66	0.57	0.61	74	0.71	0.79	0.75
F1408	1	65	0.64	0.72	0.68	74	0.78	0.70	0.74
F2810	0	63	0.64	0.93	0.76	74	0.78	0.82	0.80
F2810	1	63	0.59	0.16	0.26	74	0.69	0.64	0.66
F2910	0	53	0.56	0.76	0.64	70	0.71	0.77	0.74
F2910	1	53	0.47	0.26	0.33	70	0.69	0.63	0.65
F4301	0	57	0.58	1.00	0.73	70	0.72	0.79	0.75
F4301	1	57	0.00	0.00	0.00	70	0.67	0.58	0.62
F4503	0	64	0.63	1.00	0.78	81	0.84	0.88	0.86
F4503	1	64	0.00	0.00	0.00	81	0.78	0.70	0.74
F4609	0	55	0.62	0.06	0.11	68	0.66	0.62	0.64
F4609	1	55	0.55	0.97	0.71	68	0.70	0.73	0.72
F4714	0	67	0.69	1.00	0.81	72	0.77	0.85	0.81
F4714	1	67	0.00	0.00	0.00	72	0.58	0.45	0.51
M2008	0	52	0.53	0.91	0.67	62	0.64	0.65	0.64
M2008	1	52	0.51	0.11	0.18	62	0.61	0.59	0.60
M2711	0	59	0.59	1.00	0.75	71	0.74	0.80	0.77
M2711	1	59	0.00	0.00	0.00	71	0.67	0.60	0.63
M3208	0	61	0.65	0.58	0.61	71	0.72	0.74	0.73
M3208	1	61	0.59	0.65	0.62	71	0.70	0.68	0.69
M3309	0	67	0.68	1.00	0.81	71	0.65	0.64	0.64
M3309	1	67	0.00	0.00	0.00	71	0.70	0.71	0.71
M4003	0	54	0.51	0.27	0.36	67	0.51	0.27	0.36
M4003	1	54	0.57	0.78	0.66	67	0.57	0.78	0.66
M4409	0	68	0.69	1.00	0.81	77	0.81	0.89	0.85
M4409	1	68	0.00	0.00	0.00	77	0.68	0.53	0.60
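
Several Logistic Regression rows in Table 1 pair a recall of 1.00 for label 0 with a precision, recall and F1 of 0.00 for label 1, which is the pattern produced when a classifier assigns every sample to label 0. The sketch below reproduces this pattern on synthetic labels; the label counts and the function name are hypothetical, and precision is taken as 0 when a class is never predicted (an assumption about the convention used).

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one label; each is 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic, mildly imbalanced labels and a classifier that always predicts 0.
y_true = [0] * 58 + [1] * 42
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
for label in (0, 1):
    prec, rec, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"label {label}: prec = {prec:.2f}, rec = {rec:.2f}, F1 = {f1:.2f}")
```

Accuracy alone hides the fact that label 1 is never detected in such a case, which is why the per-label precision, recall and F1 are worth reading alongside it.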

Table 2. $R^{2}$, F-stat and p-values of the multivariate linear regression for each subject.

Patient	$R^{2}$	F-Stat	p-Values
F0110	0.515	3867	0.000
F0207	0.462	2942	0.000
F1208	0.515	3711	0.000
F1408	0.477	3543	0.000
F2810	0.463	2515	0.000
F2910	0.342	1786	0.000
F4301	0.436	4271	$β_{2}$ = 0.002
F4503	0.568	3604	0.000
F4609	0.449	6655	$β_{3}$ = 0.188
F4714	0.343	1276	0.000
M2008	0.333	2978	$β_{3}$ = 0.288
M2711	0.407	3367	$β_{2}$ = 0.025
M3208	0.515	6030	0.000
M3309	0.296	1651	$β_{3}$ = 0.053
M4003	0.448	3238	0.000
M4409	0.304	1045	0.000

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

