1. Introduction
Ultrasound medicine comprises ultrasound diagnostics, ultrasound therapy, and biomedical ultrasound engineering, and it therefore combines medicine, science, and engineering. It covers a wide range of topics and is of high value in the prevention, diagnosis, and treatment of diseases. Moreover, for doctors and patients, ultrasound technology is easy to operate, non-invasive, repeatable, and relatively inexpensive, and it can clearly image the chest, abdomen, blood vessels, muscles, etc. It is therefore widely used in the diagnosis of major diseases and the analysis of human movement functions [1].
For the human body, the muscles act like a second heart, and the state of a muscle can reflect whether the body has a disease [2,3]. For example, a number of studies have shown that three changes occur in muscles after a stroke: reduced muscle mass, reduced fiber length, and reduced pennation angle [4]. The signal describing muscle structure changes detected by ultrasound is called sonomyography (SMG) [5]. It uses an ultrasound instrument to record images during muscle contraction so that changes in muscle structure parameters can be obtained dynamically in real time. This method greatly reduces the measurement interference caused by adjacent muscles and allows a more comprehensive exploration of the deep muscles of the human body. Ultrasound images help extract muscle parameters and quantify pathological conditions such as muscle atrophy, aging, and sclerosis [6,7,8,9]. Therefore, obtaining muscle parameters accurately from ultrasound images is an important basis for judging whether the muscles, and the body as a whole, are healthy. However, ultrasound images have high background noise, which makes this information difficult to extract. Previously, muscle parameters were labeled and extracted manually, which is time consuming and labor intensive and requires trained operators. With the rapid development of computer software and hardware in recent years, many methods for automatically measuring the pennation angle and other parameters from ultrasound images have been proposed.
In 2008, the Hough transform was improved with a revoting strategy to detect straight lines in ultrasound images and to estimate their orientation [10]. Building on this method, an automatic line extraction method combining the local Radon transform with a revoting strategy was proposed [11] for detecting and tracking muscle fibers in ultrasound images. Based on multi-resolution analysis and line feature extraction, a Gabor wavelet convolution kernel was designed for the linear structure of the gastrocnemius muscle, and its automatic detection results were close to those of manual annotation [12]. Another algorithm for automatically estimating the direction of muscle fibers in musculoskeletal images is based on an adaptive fading Bayesian Kalman filter (AF-BKF) and automatic region of interest (ROI) extraction, and it is intended for patient diagnosis and rehabilitation evaluation [13]. However, because of the high background noise of ultrasound images, it is difficult to distinguish muscle fascicles from the background, which limits fascicle extraction based on conventional image processing.
With the rapid development of deep learning, approaches other than judging the direction from the structural characteristics of muscle fibers have become available, and deep learning has been widely used in the analysis of medical images [14,15]. U-Net [16] was proposed in 2015. Its U-shaped structure combines low-level and high-level information to achieve semantic segmentation. U-Net has attracted many researchers to U-shaped semantic segmentation networks and has contributed greatly to the field of semantic segmentation. Many improved U-Nets are used in medical image processing and medical image segmentation. For example, Li et al. [17] proposed a hybrid densely connected U-Net (H-DenseUNet) for segmenting the liver and tumors in CT images and reported tumor segmentation results superior to those of other state-of-the-art methods. In [18], researchers proposed UNet 3+, which uses full-scale skip connections and deep supervision. UNet 3+ not only further improves accuracy but also reduces the number of network parameters to improve computational efficiency, and the authors verified its effectiveness on two datasets.
In 2017, researchers at Google proposed the transformer model for natural language processing [19]. The attention mechanism used by this model is called self-attention, which improves the parallelism of the model. In earlier computer vision work, however, attention was either used in combination with convolutional networks or used to replace certain components of them, while the overall structure remained largely unchanged. Dosovitskiy et al. [20] showed that a pure transformer applied directly to a sequence of image patches, without relying on convolutional neural networks (CNNs), can perform image classification well. When pre-trained on large amounts of data, the vision transformer (ViT) obtains excellent results while requiring fewer computing resources to train. The self-attention mechanism of the transformer has since been widely used in computer vision, including object detection and semantic segmentation, and researchers have also applied it to the semantic segmentation of medical images.
Chen et al. [21] proposed TransUNet in 2021, which combines the U-Net and transformer structures. U-Net compensates for the transformer's insufficient extraction of low-level detail, which limits its localization ability, while the transformer contributes global attention; together they provide an effective method for medical image semantic segmentation. Deep learning methods have achieved very good results in medical image semantic segmentation, for example in tumor segmentation.
For muscle ultrasound images, Cunningham et al. [22] used automatic segmentation together with common CNN and deep residual network (ResNet) architectures to predict the direction of muscle fibers in low-resolution B-mode ultrasound images. In [23], the authors trained a U-Net-based network to identify the deep and superficial aponeuroses and the muscle fibers, and then they computed the muscle parameters from the output of the network model.
We believe that the direction of muscle fibers can be judged directly by a trained model, which avoids the unnecessary interference introduced by semantic segmentation. The advantage is that this task is simpler to train than semantic segmentation and removes the complicated post-processing of the segmented image. In the previous work [24], a model was trained to judge the direction of muscle fibers, but the value measured from a single image is subject to error and interference. Most muscle ultrasound images are collected as a video, that is, as a sequence. Therefore, building on the previous work and following the method in [26], we propose a method that combines the Kalman filter algorithm [25] with a neural network to correct the direction of muscle fibers in a sequence of ultrasound images. We improve the Kalman filter algorithm to make it suitable for tracking the direction of muscle fibers, fitting the noise, and preventing the inaccuracy caused by initializing the filter with inappropriate parameters. First, a straight reference line is introduced into the muscle fiber image and is continuously corrected by a deep learning method to obtain the measured direction of the muscle fibers. Then, based on prior knowledge, the direction of the muscle fibers is predicted. Finally, the improved Kalman filter combines the measured value and the predicted value to produce an optimal estimate of the direction of the muscle fibers.
The rest of the paper is organized as follows. Section 2 introduces the materials and methods. Section 3 presents the experiment on extracting and tracking the muscle fiber direction in sequences of ultrasound images. Section 4 discusses the experimental results, and the last section summarizes the paper.
2. Materials and Methods
Deep learning is applied to many image-related fields. In the previous work reported in [24], a reference line is introduced into the image of the muscle fibers, and the ResNet-50 deep learning network [27] judges the relationship between the muscle fibers and the line from sub-images. The direction of the line is then adjusted repeatedly until it is judged to be consistent with the direction of the fibers. However, the measured value obtained in this manner is affected by the initial state of the reference line and the random position of the sub-images, which introduces noise. A Kalman filter processes noisy input and observation signals using a linear state-space representation to recover the system state, that is, the underlying signal. The measurement above is based on a single image, whereas in an ultrasound video consecutive frames are connected: the state variables of the previous frame are closely related to the angle in the current frame. Therefore, when estimating the muscle fiber angle of the current frame, we use not only the measured value but also the state of the muscle fibers in the previous frame, namely the angle between the muscle fibers and the horizontal direction and the rate of change of that angle. In this manner, the fluctuation and noise caused by inaccurate measurement are smoothed, which enables dynamic measurement of the muscle fiber direction in a continuous ultrasound image sequence. The Kalman filter is given by Formulas (1)–(5).
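First, the Kalman filter predicts the next state of the system using the process model of the system. The prediction step consists of Formulas (1) and (2).
$$\hat{x}_{k|k-1} = A\,\hat{x}_{k-1|k-1} + B\,u_k \tag{1}$$
$$P_{k|k-1} = A\,P_{k-1|k-1}\,A^{T} + Q \tag{2}$$
In Formula (1), $k$ denotes the current time step, and the current state is predicted from the previous state according to the system model: $\hat{x}_{k|k-1}$ is the prediction obtained from the previous state; $\hat{x}_{k-1|k-1}$ is the optimal estimate of the previous state; and $B\,u_k$ is the control term of the current state, which can be 0. Formula (2) updates the error covariance matrix between the predicted value and the true value: $P_{k|k-1}$ is the covariance of $\hat{x}_{k|k-1}$; $P_{k-1|k-1}$ is the covariance of $\hat{x}_{k-1|k-1}$; $A^{T}$ is the transpose of the state transition matrix $A$; and $Q$ is the covariance of the system process noise. Formulas (1) and (2) complete the prediction of the system. Then, combining the predicted value $\hat{x}_{k|k-1}$ with the measured value $z_k$, the optimal estimate $\hat{x}_{k|k}$ of the current state is obtained by using Formula (3). The Kalman gain $K_k$ is updated according to Formula (4), and Formula (5) is used to update the covariance of $\hat{x}_{k|k}$.
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - H\,\hat{x}_{k|k-1}\right) \tag{3}$$
$$K_k = P_{k|k-1}\,H^{T}\left(H\,P_{k|k-1}\,H^{T} + R\right)^{-1} \tag{4}$$
$$P_{k|k} = \left(I - K_k\,H\right)P_{k|k-1} \tag{5}$$
Here, $H$ is the measurement matrix, $R$ is the covariance of the measurement noise, and $I$ is the identity matrix.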
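The prediction and update steps described above can be sketched in a few lines of NumPy. This is a minimal illustration of a standard linear Kalman filter, not the authors' implementation; the function names `kalman_predict` and `kalman_update` and the argument layout are assumptions made for this example.

```python
import numpy as np


def kalman_predict(x_prev, P_prev, A, Q, B=None, u=None):
    """Prediction step (Formulas (1) and (2)).

    x_prev, P_prev: previous optimal state estimate and its covariance.
    B and u are optional because the control term can be zero.
    """
    x_pred = A @ x_prev
    if B is not None and u is not None:
        x_pred = x_pred + B @ u
    P_pred = A @ P_prev @ A.T + Q
    return x_pred, P_pred


def kalman_update(x_pred, P_pred, z, H, R):
    """Update step (Formulas (3) to (5)): fuse the prediction with measurement z."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_est = x_pred + K @ (z - H @ x_pred)         # optimal estimate
    P_est = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_est, P_est, K
```

Calling `kalman_predict` followed by `kalman_update` once per frame gives the classical filter; the measurement `z` would be the angle produced by the reference-line procedure.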
The changes from frame to frame are small and regular; thus, we model the change in the angle of the muscle fibers with a Kalman filter. First, the state variable $x$ has two components: one is the angle between the muscle fibers and the horizontal direction, and the other is the change in the muscle fiber angle between the previous frame and the current frame. Second, $z$ is the measured value obtained by using deep learning and reference line adjustment. The measurement matrix $H$ and the state transition matrix $A$ are set as follows.
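For a two-component state made of an angle and its frame-to-frame change, with only the angle being measured, one common choice is a constant-rate model with a time step of one frame. The matrices below are an assumption for illustration, as are the symbols $\theta_k$ and $\Delta\theta_k$; the values used by the authors are not recoverable from the text.

```latex
x_k = \begin{bmatrix} \theta_k \\ \Delta\theta_k \end{bmatrix}, \qquad
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad
H = \begin{bmatrix} 1 & 0 \end{bmatrix}
```

With this choice, $A\,x_{k-1}$ advances the angle by one frame's worth of change and keeps the rate of change constant, while $H\,x_k$ selects the angle so that it can be compared with the measured value $z_k$.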
The Kalman filter is designed for linear systems and assumes Gaussian white noise, but real-world systems are generally not linear and their noise is generally not Gaussian white noise. Therefore, we consider adding nonlinear factors to improve the Kalman filter. In [28], the authors discuss the implementation of artificial neural networks such as nonlinear autoregressive models with exogenous input (NARX models) [29,30] on a field-programmable gate array. In [26], two NARX models are combined with the Kalman filter and good results are obtained, but the running time of the Kalman filter algorithm also increases. When detecting the direction of muscle fibers, the angle and its rate of change do not vary as strongly as in other nonlinear systems, so we simplify this approach to improve the Kalman algorithm. To account for the time series, so that the state at one or more previous moments affects the state at the current moment, our method adds one NARX model for regression prediction.
The NARX formula is shown below. When
, the structure of the NARX model is shown in Figure 1. The NARX model relates the current value of a time series to both past values of the same series and current and past values of the exogenous series. In other words, given two time series, a NARX model can learn to predict the current state of the system.
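The relation described above is usually written in the following general form. The notation is an assumption made for this illustration: $y$ is the series being predicted, $u$ is the exogenous series, $n_y$ and $n_u$ are the numbers of lagged values used, $f$ is the nonlinear function learned by the network, and $\varepsilon_t$ is an error term.

```latex
y_t = f\left(y_{t-1}, y_{t-2}, \dots, y_{t-n_y},\; u_t, u_{t-1}, \dots, u_{t-n_u}\right) + \varepsilon_t
```

When $f$ is a neural network, the lagged values are simply concatenated into one input vector, which is how a NARX model can be attached to the quantities produced by a filter at previous frames.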
The block diagram of the combination of the NARX model and the Kalman filter algorithm is shown below. We use the NARX model to estimate the value at the current moment from the predicted values, Kalman gains, measured values, and NARX outputs of the previous frames. The formula is as follows.
In Formula (11), $i$ denotes the $i$-th image in the image sequence; the predicted value is the one calculated by the Kalman filter; $K$ is the Kalman gain; $z$ is the measured value calculated from the output of the ResNet model; $y$ is the output of the NARX model; and $m$ is the number of previous frames whose data are used as the input of the model.
We embed the above model into the traditional Kalman filter, as shown in the block diagram in Figure 2. First, the Kalman filter predicts the direction of the muscle fibers from the set parameters and the system model; it then enters the update step and calculates the Kalman gain according to Formula (4). The Kalman gain is then used in two branches. In the first branch, the traditional Kalman update continues and yields the Kalman estimate. In the second branch, the gain enters the NARX model together with a set of predicted values, a set of measured values, and a set of NARX output values, which yields the NARX estimate. The two estimates are superimposed to obtain the final estimated value. The estimated value in the Kalman filter is then updated with this result, and the loop continues.
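The two-branch update can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' implementation: `narx` is a hypothetical callable standing in for the trained NARX network, "superimposed" is interpreted as adding the NARX output to the angle component of the Kalman estimate, the NARX branch is skipped until `m` frames of history exist, and the `Y` history stores the final angle of each frame.

```python
import numpy as np


def hybrid_step(x_prev, P_prev, z, A, H, Q, R, narx, history, m=3):
    """Process one frame with a Kalman filter plus a NARX branch.

    narx:    hypothetical callable, 1-D feature vector -> scalar angle term.
    history: dict with lists 'P', 'K', 'Z', 'Y' of past per-frame values.
    """
    # Prediction (Formulas (1) and (2), no control term).
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + Q

    # Kalman gain and classical update (Formulas (3) to (5)).
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_kf = x_pred + K @ (z - H @ x_pred)
    P_est = (np.eye(len(x_prev)) - K @ H) @ P_pred

    # NARX branch: uses the previous m frames once they are available.
    y = 0.0
    if len(history["Y"]) >= m:
        features = np.concatenate(
            [np.asarray(history[key][-m:], dtype=float) for key in ("P", "K", "Z", "Y")]
        )
        y = float(narx(features))

    # Assumption: the two estimates are combined additively on the angle.
    x_est = x_kf.copy()
    x_est[0] += y

    # Store this frame's quantities for later frames.
    history["P"].append(float(x_pred[0]))
    history["K"].append(float(K[0, 0]))
    history["Z"].append(float(z[0]))
    history["Y"].append(float(x_est[0]))
    return x_est, P_est
```

Passing the history as an explicit dictionary keeps the per-frame function free of hidden state, so the same routine can be used both to generate NARX training data and to run the trained filter.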
3. Experiment
Algorithm 1 describes our proposed method. The input is a sequence of ultrasound images of muscle fibers, and the output is the array $R$ of muscle fiber angles corresponding to the image sequence. Before calculating the angles, the Kalman filter is initialized according to Formulas (8) and (9). The algorithm then enters a loop to process the image sequence. For each image in the sequence, a reference line is randomly generated on the image to assist in judging the direction of the fibers, and the image with the line is divided into sub-images. ResNet-50 is used to determine whether the muscle fibers are parallel to the line, and the algorithm adjusts the line repeatedly until the direction of the muscle fibers is found. In this way, the measured value $z$ of the direction of the muscle fibers is obtained.
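Algorithm 1 Obtain the estimated value using the improved Kalman filter.
Require: a sequence of images of muscle fibers with reference lines; an initialized Kalman filter; the arrays P, K, Z, Y
Ensure: the array R containing the direction angles of the muscle fibers
1: while images remain in the sequence do
2:   obtain the measured value z of the current image with getZ
3:   predict the muscle fiber direction with Formulas (1) and (2)
4:   calculate the Kalman gain K with Formula (4)
5:   obtain the Kalman estimate with Formula (3) and update its covariance with Formula (5)
6:   feed P, K, Z, Y of the previous m frames into the NARX model to obtain the NARX estimate
7:   combine the two estimates into the final estimate and append it to R
8:   update arrays P, K, Z, Y
9: end while
10: return R
11: function getZ(image)
12:   randomly generate a reference line l on the image
13:   while l is not parallel to the direction of the muscle fibers do
14:     adjust l
15:     split the image into sub-images
16:     put the sub-images into ResNet
17:     get the output of ResNet
18:   end while
19:   set z to the direction of l
20:   return z
21: end function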
The improved Kalman filter is used to optimally estimate the direction of the muscle fibers. First, the Kalman filter obtains the predicted value of the muscle fiber direction by using Formulas (1) and (2). Then, based on the measured value obtained previously, the Kalman estimate is obtained. Next, the internal quantities of the Kalman filter for the previous $m$ frames (Kalman gain $K$, predicted value $P$, measured value $Z$, and estimated value $Y$) are used as the input of the network to estimate the direction of the muscle fibers in the $i$-th frame, which gives the estimate of the NARX module. The algorithm combines the Kalman estimate and the NARX estimate to obtain the optimal estimate. Finally, we append the latest Kalman filter quantities and the final output to the $P$, $K$, $Z$, and $Y$ arrays and then calculate the direction of the muscle fibers in the next image. One iteration of the loop is shown in Figure 3. With our proposed algorithm, the result not only combines the predicted value and the measured value through the Kalman algorithm but also incorporates information about the angles of previous frames.
Before the improved Kalman filter can be used, the NARX model has to be trained. The training platform is Windows 10, Python 3.6, Torch 1.18, and an NVIDIA 3070 GPU. The dataset used to train ResNet comes from videos and images of the gastrocnemius muscle of young and old healthy adults during slight dorsiflexion. The dataset used to train the NARX model consists of time-series data: four groups of 160 frames, each frame with the angle of the muscle fiber direction and the corresponding parameters of the Kalman filter. The first 80% of the data form the training set, and the remaining 20% form the test set. During training, the P, K, Z, and Y arrays described above are used as the inputs of the NARX model, and the manually labeled angle of the corresponding frame is used as the ground truth. The loss function used during training is the squared loss, and its formula is as follows.
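The squared loss for a single frame has the standard form below. The symbols are assumptions made for this illustration: $\theta_i$ is the manually labeled angle of frame $i$ and $\hat{\theta}_i$ is the corresponding model output. Whether the per-frame losses are summed or averaged over a batch is not stated in the text.

```latex
L\left(\theta_i, \hat{\theta}_i\right) = \left(\theta_i - \hat{\theta}_i\right)^{2}
```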
For the same image, Figure 4a shows the result of manual annotation, Figure 4b shows the result of the ResNet output alone, and Figure 4c shows the result of the improved Kalman filter combined with ResNet. To make the difference visible, we overlay the three results on one image, as shown in Figure 4d. The red line represents the result of manual labeling, the green line represents the direct output of ResNet-50, and the blue line represents the result of our proposed method. The filtered result is closer to the manual measurement, which gives an intuitive indication of the effectiveness of our proposed method. More results are shown in Figure 5.
4. Discussion
To verify the effectiveness of the method proposed in this paper, we compare it with the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). The prediction and update steps of the Kalman filter are linear, but real-world systems are often nonlinear. The EKF expands the nonlinear system in a Taylor series around a reference point and takes the first-order part as an approximation of the nonlinear model, which gives a linearized description of the nonlinear system at the current moment. In this manner, the Kalman filter algorithm can be applied to nonlinear systems. The difference between the EKF and the Kalman filter is that the state prediction equation and the observation equation of the EKF may be nonlinear. The UKF combines the unscented transform (UT) with the standard Kalman filter framework. The UKF has been shown to be a powerful technique for estimating the state of chaotic neurons [31]. Therefore, in this section, we also make a comparison with the UKF.
The root mean square error (RMSE) is used to reflect accuracy. It is the square root of the mean of the squared deviations between the predicted values $\hat{y}_i$ and the true values $y_i$ over the $n$ observations. The calculation of the RMSE is shown in Formula (13).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}} \tag{13}$$
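The RMSE defined above can be computed directly from two sequences of angles. The following sketch assumes the predicted and manually labeled angles are given as equal-length sequences; the function name `rmse` is chosen for this example.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean square error between predicted and true angles."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Mean of squared deviations over the n observations, then square root.
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

Because the deviations are squared before averaging, a few frames with large errors raise the RMSE more than many frames with small errors.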
Table 1 shows the results obtained by the different methods on different continuous muscle image sequences. It includes the result of using ResNet alone to judge and adjust the reference line, the result of combining the measured value with the Kalman filter, the results of combining the measured value with the EKF and the UKF, and the result of the method proposed in this paper. In addition to the above methods, we also compare with the particle filter (PF). The PF is based on Monte Carlo methods [32,33]; it uses a set of particles to represent a probability distribution and can be applied to any form of state-space model. Subject 1 and Subject 2 contain 40 frames of continuous muscle images, and Subject 3 contains 150 frames of continuous muscle images. The datasets have received ethical approval from the relevant committees of Shenzhen Institutes of Advanced Technology, CAS. It can be observed that the RMSE of our method is lower.
To evaluate the agreement of the above methods with manual annotation, we use the Bland–Altman plot [34,35], an intuitive way to compare the agreement between two methods. In a Bland–Altman plot, the horizontal axis is the mean of the two measurements, and the vertical axis is the difference between them. The mean difference is given by the ordinate of the blue line in the figure. The upper and lower dashed lines lie at the mean difference ± 1.96 times the standard deviation of the difference, and the interval between them is called the 95% limits of agreement. If the width of the limits of agreement is within the acceptable range, the agreement is considered good; otherwise, it is considered poor. Figure 6a–d show the Bland–Altman plots comparing the manually labeled results with the results obtained by ResNet, ResNet-KF, ResNet-EKF, and ResNet-UKF, respectively. Figure 6e,f show the Bland–Altman plots comparing the manually annotated results with the results obtained by ResNet-LSTM and our proposed method, respectively. To compare the parameters more directly, the parameters of the Bland–Altman plots are listed in Table 2. The comparison shows that the absolute values of the upper and lower limits of our proposed method are smaller than those of the other methods, whereas the arithmetic mean of the differences obtained with the LSTM method is the smallest. However, whether the upper and lower limits fall within the acceptable error range is an important indicator of agreement.
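The quantities plotted in a Bland–Altman diagram can be computed as in the following sketch. It assumes the two inputs are equal-length sequences of angles from an automatic method and from manual annotation, and it uses the sample standard deviation; the function name `bland_altman_stats` is chosen for this example.

```python
import numpy as np


def bland_altman_stats(method, reference):
    """Return per-frame means and differences, the bias, and the 95% limits of agreement."""
    method = np.asarray(method, dtype=float)
    reference = np.asarray(reference, dtype=float)
    means = (method + reference) / 2.0   # horizontal axis of the plot
    diffs = method - reference           # vertical axis of the plot
    bias = float(diffs.mean())           # mean difference (solid line)
    sd = float(diffs.std(ddof=1))        # sample standard deviation of the differences
    lower = bias - 1.96 * sd             # lower limit of agreement
    upper = bias + 1.96 * sd             # upper limit of agreement
    return means, diffs, bias, lower, upper
```

A method can have a bias close to zero and still show wide limits of agreement, which is why both the mean difference and the limits are reported when methods are compared.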
The results obtained by the different methods are shown in Figure 7. Together with Table 1, they show that the results of our proposed method, which builds on the measured values, are closer to those obtained by manual measurement. The dark blue line represents the result of manual measurement, and the green line represents the result obtained using LSTM. The LSTM method, whose mean difference is closest to 0 in the Bland–Altman plot, captures the trend of the angle change well, but its results deviate considerably in some frames. Therefore, considering these different aspects, we conclude that the method proposed in this paper performs better in tracking the angle of muscle fibers in consecutive frames.