Article

The Extraction of Foreground Regions of the Moving Objects Based on Spatio-Temporal Information under a Static Camera

1 Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
2 School of Computing and Data Engineering, NingboTech University, Ningbo 315100, China
3 Beijing Academy of Artificial Intelligence, Beijing 100084, China
4 Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Submission received: 6 June 2023 / Revised: 24 July 2023 / Accepted: 29 July 2023 / Published: 4 August 2023
(This article belongs to the Section Artificial Intelligence)

Abstract:
The rapid development of computer vision technology underpins public security systems that rely on video surveillance. In current video surveillance based on static cameras, accurate and fast extraction of the foreground regions of moving objects enables quicker analysis of the behavior of meaningful objects and thus raises the level of intelligent analysis in video surveillance. However, false detections frequently occur during foreground extraction, caused by the shaking of tree branches and leaves in the scene and by the "ghosting" regions that result from delayed updating of the background model. To solve this problem, this paper proposes a method for extracting foreground regions using spatio-temporal information. The method exploits the difference and complementarity between spatial-domain and temporal-domain methods, combined with image processing techniques, to accurately extract the foreground regions of moving objects. Specifically, the foreground regions are obtained by morphologically processing the combination of the spatial information in the video with its morphologically processed temporal information. The experimental results show that the proposed method reduces the false detections caused by the shaking of tree branches and leaves and thus effectively extracts the foreground regions of moving objects.

1. Introduction

Video surveillance systems are a basic element of security protection systems and form a comprehensive system with a strong defensive capability. Video surveillance is widely used in many settings because of its intuitiveness, convenience, and rich information content [1,2,3,4,5,6,7,8]. In recent years, with the rapid development of computing, networking, image processing, and transmission technology, video surveillance technology has advanced significantly [1,2,3,4,5]. A surveillance scene contains a variety of objects in either a stationary or a moving state, such as pedestrians, vehicles, animals, flowers, and trees. Stationary objects lack the movement information, such as speed, angle, and direction, carried by moving objects, making their study less urgent; for moving objects, missed and false detections frequently occur because of the shaking of tree branches and leaves and the viewing angle and distance of the cameras in the monitoring environment [9,10,11,12,13]. Therefore, this paper focuses on the accurate extraction of the foreground regions of moving objects in video sequences.
In outdoor video surveillance, the shaking of tree branches, leaves, flowers, grasses, and the like results in numerous false detections when the foreground regions of moving objects are extracted by existing methods based on the temporal or spatial information of video sequences. In addition, existing methods based on temporal information produce "holes" [14,15,16,17,18,19,20,21], while methods based on spatial information produce "ghosting" [22,23,24,25,26,27]. Therefore, in order to effectively reduce both the false detections in areas of shaking branches and leaves and the "ghosting" caused by spatial information, this paper combines the temporal and spatial information of the video sequence, exploiting the difference and complementarity of their physical locations, and further applies image processing techniques.
To better illustrate the existing mainstream methods, Figure 1 shows their processing effects in areas of shaking tree branches and leaves, using the frame difference methods based on temporal information [14,15,16] and the visual background extractor (VIBE) method based on spatial information [25].
The frame difference method based on temporal information is widely used for extracting the foreground regions of moving objects from static-camera video sequences, owing to its simple, fast, and easy-to-implement computation. However, this method is sensitive to movements such as the shaking of tree branches and leaves, and because adjacent frames of an object often show similar or consistent brightness distributions, it easily produces "holes" and false detections in the frame difference results [17,21], as shown in Figure 1b. Although researchers subsequently made improvements to the frame difference method, such as the adjacent three-frame difference method [14] and the adaptive-threshold frame difference method [19,20], these improved methods still cannot effectively reduce the "holes" and false detections.
Barnich and Van Droogenbroeck [25] proposed the VIBE method for extracting the foreground regions of moving objects from a background template set built using spatial information. This method can cope with rapid illumination changes by discarding the existing background model and constructing a new one. However, it easily produces "ghosting" when there are periodic motions, such as the stop-and-go motion of moving objects and the wandering of tree branches and leaves, or when moving objects are present in the background model frame, as shown in Figure 1c. As a result, false detections persist in the extraction of the foreground regions of moving objects.
From the above analysis, the frame difference results based on temporal information have "holes" but no "ghosting", while the VIBE results based on spatial information yield complete foreground regions of moving objects but suffer from "ghosting". Therefore, this paper proposes a method of extracting the foreground regions of moving objects based on the combination of temporal and spatial information, which we name the foreground area extraction method with complementary spatio-temporal information (CSTI). The method combines the "ghosting"-free temporal frame difference method with the "hole"-free spatial VIBE method, and further applies morphological techniques, thereby improving the accuracy of foreground region extraction. The main research ideas of the proposed method are as follows:
(1) Use the VIBE method to extract the foreground region of the current frame; this result contains the "ghosting" region and the false detection regions caused by the shaking of tree branches and leaves.
(2) Use the frame difference method to obtain the foreground region of adjacent frames, and apply morphological processing to eliminate "holes", thereby obtaining a relatively complete object.
(3) Perform an AND operation on the VIBE result and the processed frame difference result, and apply morphological processing to the outcome, so that the "ghosting" region and the false detection regions caused by the shaking of tree branches and leaves in the VIBE result are eliminated, yielding the final foreground region extraction result.
The method proposed in this paper can effectively extract the foreground regions of moving objects without using scene or weather information [3,28,29,30], whereas current background subtraction methods based on deep learning require labeled data for different scenes and weather conditions to train the model and improve its generalization ability [3,28,29,30,31,32]. Given this fundamental difference, we do not compare the proposed method against deep-learning-based background subtraction; instead, we evaluate it against traditional methods with superior performance.
This paper is organized as follows: Section 1 introduces the problem and the shortcomings of current methods, Section 2 reviews related work, Section 3 presents the proposed method, Section 4 reports the experimental results, and Section 5 concludes and discusses future work.

2. Related Work

Under static cameras, existing temporal frame difference methods [14,15,16] and spatial background modeling methods [25] extract the foreground regions of moving objects in the scene with a low missed detection rate but a high false detection rate.

2.1. Frame Difference Method

The frame difference method is widely used to extract foreground regions of moving objects from stationary video sequences. In this method, the foreground regions of moving objects can be obtained by using the pixel difference between the current frame image and the next adjacent frame image [14,15,16]. Specifically, the calculation process is as follows:
$$D_k(x, y) = \left| I_{k+1}(x, y) - I_k(x, y) \right| \quad (1)$$
$$F_k(x, y) = \begin{cases} 1, & D_k(x, y) \ge T \\ 0, & D_k(x, y) < T \end{cases} \quad (2)$$
where, in Formula (1), $I_k(x, y)$ and $I_{k+1}(x, y)$ are the pixel values of the kth frame and the (k + 1)th frame at pixel coordinate (x, y) in the current video sequence, respectively, and $D_k(x, y)$ is the absolute value of the difference between them at pixel (x, y); in Formula (2), T is the threshold value, and $F_k(x, y)$ is the binary frame difference obtained by thresholding $D_k(x, y)$.
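To make Formulas (1) and (2) concrete, the following Python sketch (our illustration, not code from the paper) computes a binary frame difference mask with OpenCV; the threshold T = 30 is an assumed example value, since the paper does not fix one.

```python
import cv2
import numpy as np

def frame_difference(frame_k, frame_k1, T=30):
    """Binary frame difference per Formulas (1) and (2).

    frame_k, frame_k1: consecutive grayscale frames as uint8 arrays.
    T: difference threshold (assumed example value; not specified
       in the paper).
    """
    # D_k(x, y) = |I_{k+1}(x, y) - I_k(x, y)|
    d_k = cv2.absdiff(frame_k1, frame_k)
    # F_k(x, y) = 1 if D_k(x, y) >= T, else 0
    return (d_k >= T).astype(np.uint8)
```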

2.2. Background Modeling Method

The background modeling method is one of the most commonly used approaches for detecting the foreground regions of moving objects. It mainly includes three key steps: background initialization, background update, and extraction of the foreground regions of moving objects [22,23,24]. Specifically, the background model is first initialized from N frames to obtain a background image without moving objects; second, the foreground regions of moving objects are obtained by comparing the background image with the current frame and classifying pixels into foreground and background according to how they change; third, the background image is updated over time, and foreground extraction and background update are performed repeatedly.
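The three-step loop just described can be summarized in the short Python sketch below; `init_background`, `segment`, and `update_background` are hypothetical placeholders for whichever concrete background model is used.

```python
def background_subtraction(frames, n_init, init_background, segment, update_background):
    """Generic background-modeling loop: initialize from the first
    n_init frames, then alternate foreground extraction and
    background update (all three callables are placeholders)."""
    model = init_background(frames[:n_init])        # step 1: initialization
    masks = []
    for frame in frames[n_init:]:
        masks.append(segment(model, frame))         # step 2: classify pixels
        model = update_background(model, frame)     # step 3: update in time
    return masks
```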
In background initialization, clean background frames are generally obtained from N frames of the video sequence. However, actual surveillance videos often contain moving objects, so the obtained background model includes foreground moving objects and a clean background frame cannot be obtained [22], which harms subsequent foreground extraction. In practical applications, judged by overall performance and effect, the most widely used background modeling method is the VIBE algorithm [25], which can effectively extract the foreground regions of moving objects. It builds its background model from the first frame and can immediately discard the existing model and quickly initialize a new one when the scene lighting changes suddenly. Compared with [33], which requires dozens of frames to initialize its model, this shortens the time needed to build the background model. From a statistical point of view, using multiple frames to build a background model makes sense, since it collects more data to estimate the temporal distribution of background pixels and can yield a cleaner model; however, a model constructed over multiple frames cannot cope with sudden lighting changes. Moreover, in some applications the user may want to discover moving objects in the scene earlier and to initialize the background model from fewer frames; initializing from the first frame allows reliable foreground extraction from the second frame onward, which is a distinct advantage when monitoring short video sequences or when running foreground extraction on embedded devices.
The VIBE algorithm adopts the same assumption as Jodoin et al. [26]: adjacent pixels share similar temporal distributions. It does not require temporal information from the video sequence; it builds the background model from a single frame, filling each pixel's model with values found in its spatial neighborhood. More precisely, the method fills the background model with values sampled randomly around each pixel in the first frame, choosing a neighborhood large enough to contain a sufficient number of distinct samples, while noting that as the neighborhood grows, the statistical correlation between values at different locations decreases. Experiments show that randomly selecting samples from the 8-connected neighborhood of each pixel is sufficient, and this scheme of building the background model from the first frame proves effective. Its disadvantage is that if a moving object is present in the first frame, a "ghosting" region is produced in the subsequent foreground extraction. Hofmann et al. [34] presented a highly efficient background modeling method named the Pixel-Based Adaptive Segmenter (PBAS). The basic idea of this method is to use two feedback-loop controllers, one for the decision threshold and one for the learning parameter; PBAS outperforms most state-of-the-art methods. Rodriguez and Wohlberg [35] proposed a fully incremental principal component pursuit (incPCP) algorithm for video background modeling. In incPCP, a low-rank and sparse matrix decomposition simultaneously yields the background and foreground of the observed image sequence: the low-rank part is taken as the background and the sparse part as the foreground. The incPCP method effectively reduces ghosting and is more sensitive to moving objects. According to [27], a "ghosting" region is a set of connected points that are detected as a moving object but do not correspond to any real moving object. Here, the "ghosting" region is caused by the initialization of the background model, i.e., the model contains moving objects; in subsequent frames, the region vacated by a moving object is the real background. Existing methods gradually update the "ghosting" region through the background update strategy, so that it fades away over time.
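As an illustration of this spatial initialization strategy, the sketch below fills a ViBe-style sample model from the 8-connected neighborhood of each pixel in the first frame. It is a minimal reconstruction of the idea in [25], not the authors' implementation; the choice of 20 samples per pixel follows the default of the original ViBe paper.

```python
import numpy as np

def init_vibe_model(first_frame, n_samples=20, rng=None):
    """Initialize a ViBe-style background model from a single frame.

    Each pixel's sample set is filled with values drawn at random
    from its 8-connected spatial neighborhood, following the
    initialization idea of [25]. n_samples=20 matches the default
    reported in the original paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = first_frame.shape
    # Pad by replication so border pixels also have 8 neighbors.
    padded = np.pad(first_frame, 1, mode="edge")
    model = np.empty((n_samples, h, w), dtype=first_frame.dtype)
    ys, xs = np.mgrid[0:h, 0:w]
    for i in range(n_samples):
        # Random offsets in {-1, 0, 1}^2 pick a neighbor (or the
        # pixel itself) independently at each position.
        dy = rng.integers(-1, 2, size=(h, w))
        dx = rng.integers(-1, 2, size=(h, w))
        model[i] = padded[ys + 1 + dy, xs + 1 + dx]
    return model
```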

3. The Proposed Method

In actual video surveillance, two main factors cause false detections in the extraction of the foreground regions of moving objects: the shaking of branches and leaves in the scene, and the "ghosting" regions produced when the background model is not updated in time for stop-and-go moving objects present in the background frame [13,22]. To solve this problem, this paper effectively exploits spatio-temporal information to reduce the false detections caused by shaking branches and leaves and thus achieve accurate extraction of the foreground regions of moving objects, raising the level of intelligent analysis in video surveillance.
The specific steps of the proposed method, combining spatio-temporal complementarity with morphological processing operations, are shown in the processing flow chart of Figure 2.
Figure 2 shows the processing flow of the proposed method. The VIBE method is used to extract the foreground region of the current frame, which would result in the “ghosting” region and the false detection region caused by the shaking of tree branches and leaves. The frame difference method is used to obtain the foreground region of adjacent frames, which is then subject to morphological processing to eliminate “holes”, thereby obtaining a relatively complete object. An AND operation is performed on the VIBE result and the processed frame difference result, which is then subject to morphological processing, so that the “ghosting” region and the false detection region could be eliminated, thereby obtaining the final extraction result of foreground region.
From Figure 1, it can be seen that the frame difference result has "holes" but no "ghosting", while the VIBE method exhibits "ghosting" but obtains the complete foreground region of the moving object. In view of this, the VIBE foreground extraction results are corrected by the frame difference results to eliminate the "ghosting", and morphological dilation is used to fill the "holes". The morphological dilation is defined in Formula (3) [36]:
$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \ne \varnothing \,\} \quad (3)$$
where $\hat{B}$ is the reflection of B about its origin and $(\hat{B})_z$ denotes its translation by z. The dilation of A by B is the set of all displacements z for which $(\hat{B})_z$ and A overlap in at least one element. The morphological dilation used in this paper employs a 3 × 3 structuring element. The "holes" in the frame difference result can be effectively filled by the morphological dilation operation, as shown in Figure 3.
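In OpenCV terms, the 3 × 3 dilation used to fill the "holes" can be sketched as follows; a single iteration is an assumption, as the paper specifies only the operator size.

```python
import cv2
import numpy as np

# 3 x 3 structuring element, as used in the paper.
KERNEL = np.ones((3, 3), np.uint8)

def fill_holes(binary_mask):
    """Dilate a binary foreground mask (0/1 or 0/255) with a 3 x 3
    operator to fill the small "holes" left by frame differencing.
    One iteration is assumed; the paper does not state the count."""
    return cv2.dilate(binary_mask, KERNEL, iterations=1)
```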
Figure 3 shows a video sequence from the DML dataset [37]. Visually, the morphological dilation fills the "holes" in the frame difference results to a certain extent, although it also enlarges the background noise. The AND operation is then performed between the VIBE result and the morphologically dilated frame difference result to obtain the final foreground region, as shown in Formula (4):
$$\mathrm{result}(x, y) = f(x, y) \wedge g(x, y) \quad (4)$$
where $f(x, y)$ is the foreground result obtained by the VIBE method at position (x, y), $g(x, y)$ is the foreground result extracted by the frame difference method at the same coordinates, $\wedge$ denotes the pixel-wise AND operation, and $\mathrm{result}(x, y)$ is the pixel-wise AND of $f(x, y)$ and $g(x, y)$. To eliminate "ghosting", the morphologically dilated frame difference result and the VIBE result are combined pixel by pixel in this way, and the combined result is then subjected to a further morphological dilation to obtain the final extraction result of the foreground region. The specific process is shown in Figure 4.
As shown in Figure 4, the first row is the video sequence from the DML dataset [37], the second row is the VIBE result [25], the third row is the morphologically dilated frame difference result, and the fourth row is the pixel-wise AND of the VIBE result and the dilated frame difference result; the final result is obtained after a further morphological dilation. It can be seen that the method proposed in this paper eliminates the false detections caused by "ghosting" and by the shaking of branches and leaves.
In the method proposed in this paper, when the camera is static, the VIBE result of the foreground extraction and the morphological-dilated frame difference result are successively subjected to the pixel-by-pixel AND operation and the morphological dilation to obtain the final extraction result of the foreground region. Therefore, the method proposed in this paper achieves the accurate extraction of the foreground region of the moving objects by effectively utilizing the spatio-temporal information in the video sequence.
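A minimal per-frame sketch of this complete combination step is given below, assuming a `vibe_segment` callable (a hypothetical stand-in for any ViBe implementation, such as one built on the initialization sketch above) that returns a 0/1 foreground mask; the frame difference threshold is again an assumed example value.

```python
import cv2
import numpy as np

KERNEL = np.ones((3, 3), np.uint8)  # 3 x 3 dilation operator

def csti_foreground(prev_frame, cur_frame, vibe_segment, T=30):
    """One CSTI step: AND the ViBe mask with the dilated frame
    difference mask, then dilate the result (Figure 2).

    vibe_segment: callable returning a 0/1 ViBe foreground mask for
    cur_frame (assumed to be provided elsewhere). T is an assumed
    example threshold for the frame difference.
    """
    # Spatial branch: ViBe mask, possibly containing "ghosting".
    f = vibe_segment(cur_frame)
    # Temporal branch: frame difference, possibly containing "holes" ...
    d = cv2.absdiff(cur_frame, prev_frame)
    g = (d >= T).astype(np.uint8)
    # ... which the 3 x 3 dilation fills (Formula (3)).
    g = cv2.dilate(g, KERNEL)
    # Pixel-wise AND removes ghosting and leaf-shake false alarms
    # (Formula (4)); a final dilation restores the object regions.
    return cv2.dilate(f & g, KERNEL)
```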

4. Experimental Evaluation

4.1. Evaluation Metrics

In order to evaluate the accuracy of foreground region extraction for moving objects, the following evaluation metrics were used in this paper [34,38,39,40]: precision rate (PR), recall rate (RE), false positive rate (FPR), and F-measure. The corresponding formulas are shown in (5)–(8):
$$PR = \frac{TP}{TP + FP} \quad (5)$$
$$RE = \frac{TP}{TP + FN} \quad (6)$$
$$FPR = \frac{FP}{FP + TN} \quad (7)$$
$$F\text{-}measure = \frac{2 \times PR \times RE}{PR + RE} \quad (8)$$
where TP (true positive) is the number of true foreground pixels, that is, foreground in the foreground extraction result, and foreground at the corresponding position of ground truth; TN (true negative) is the number of true background pixels, that is, the background in the foreground extraction result, and the background in the corresponding position of ground truth; FP (false positive) is the number of falsely detected pixels in the foreground result, that is, the foreground in the foreground extraction result, and the background in the corresponding position of ground truth; FN (false negative) is the number of missed pixels in the foreground result, that is, the background in the foreground extraction result, and the foreground in the corresponding position of ground truth.
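Given a binary extraction result and the corresponding ground-truth mask, the counts and Formulas (5)–(8) can be computed as in the sketch below; the small epsilon guarding empty denominators is our addition.

```python
import numpy as np

def evaluate(pred, gt, eps=1e-9):
    """Compute PR, RE, FPR, and F-measure from 0/1 masks.

    pred: binary foreground extraction result.
    gt:   binary ground truth of the same shape.
    eps:  guard against empty denominators (our addition).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)              # foreground in both
    fp = np.sum(pred & ~gt)             # false foreground
    tn = np.sum(~pred & ~gt)            # background in both
    fn = np.sum(~pred & gt)             # missed foreground
    pr = tp / (tp + fp + eps)           # Formula (5)
    re = tp / (tp + fn + eps)           # Formula (6)
    fpr = fp / (fp + tn + eps)          # Formula (7)
    f = 2 * pr * re / (pr + re + eps)   # Formula (8)
    return pr, re, fpr, f
```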

4.2. DML Dataset

To evaluate the effectiveness of the proposed method, we use the DML dataset of Wang et al. [37], captured by static outdoor cameras. It has 28 video sequences, each shot continuously, at two resolutions, 1920 × 1080 and 1280 × 720, with no dropped or skipped frames and a frame rate of 25 frames per second. The dataset covers three seasons (summer, autumn, and winter) and five weather and lighting conditions: normal light, wind, rain, cloudy, and strong light. In order to evaluate the effectiveness of the method proposed in this paper, we selected dynamic-background data with violent shaking of branches and leaves. The specific experimental visual effects and performance results are shown in Figure 5 and Table 1.
Specifically, the scene_b_0030003 and scene_a_0020000 video sequences [37], which contain violent shaking of trees and leaves, were used, because background modeling methods already obtain good foreground extraction results on sequences without leaf shaking, and such sequences cannot demonstrate the effectiveness of the proposed method. In the following, the proposed method is compared with the VIBE method [25], the frame difference method [14], the Gaussian Mixture Model (GMM) method [41], the PBAS method [34], and the incPCP method [35] in terms of both visual effects and performance metrics. The visual results are shown in Figure 5.
From Figure 5, it can be seen that the proposed method improves the extraction accuracy of the foreground regions of moving objects compared with the other methods. In the figure, the first row shows the original images, the second row the ground truths, the third row the VIBE results, the fourth row the frame difference results, the fifth row the GMM results, the sixth row the PBAS results, the seventh row the incPCP results, and the last row the results of the proposed method. Specifically, the first row shows the 60th, 989th, and 1740th frames of the scene_b_0030003 video sequence, in which large, irregular shaking of branches and leaves occurs in the area marked by the red box. As a result, foreground extraction cannot effectively distinguish the movement of real objects from the meaningless movement of the shaking region, which lowers the extraction accuracy. From the last row, it can be seen that the proposed spatio-temporal combination eliminates the "ghosting" and some false detections seen in the VIBE results, the "holes" and false detections of the frame difference method, and the false detections and "holes" of the GMM and incPCP methods, and produces fewer false detections than the PBAS method. Among the baselines, PBAS showed higher detection accuracy and a lower false detection rate than the VIBE, frame difference, GMM, and incPCP methods. Therefore, the proposed method significantly improves the accuracy of foreground region extraction. However, false detections remain in the red box of the first column of the last row because of the violent, irregular shaking of branches and leaves in that area; tracking techniques will be used in follow-up research to further exclude them.
Table 1 reports the effectiveness of the proposed method in terms of the performance metrics: precision rate (PR), recall rate (RE), false positive rate (FPR), and F-measure [34,38,39,40].
It can be seen from Table 1 that, on the scene_b_0030003 video sequence, the proposed method achieves relatively high accuracy and a lower false detection rate than the VIBE, frame difference, GMM, PBAS, and incPCP methods; higher precision, recall, and F-measure values correspond to a lower false detection rate [31,32]. The VIBE method loses precision because of the "ghosting" caused by delayed background updates and the false detections caused by the violent shaking of branches and leaves. The frame difference method is adaptable but sensitive to motion; because brightness values inside an object are similar between frames, its results retain only the contour of the object, producing "holes" and reducing extraction accuracy. The GMM method confirms foreground and background by a weighted combination of multiple Gaussian components, but it suppresses the movement of foreground objects along with the shaking of branches and leaves, which easily produces missed detections and "holes". The PBAS method derives the foreground probability of each pixel by statistically analyzing inter-frame pixel differences; although it does not update the background in time under violent shaking of branches and leaves, producing some "ghosts" and "holes", it still performs better overall than the VIBE method. The incPCP method uses low-rank and sparse matrix decomposition for background modeling, obtaining the foreground and background of the image sequence simultaneously, and effectively reduces the "hole" phenomenon; although it is sensitive to moving objects and has a relatively high false detection rate, it attains a higher recall rate than the VIBE, frame difference, GMM, and PBAS methods. The proposed method combines the VIBE results with the processed frame difference results, effectively extracting the foreground regions of moving objects while reducing the false detections caused by shaking branches and leaves and the "ghosting" of the VIBE method.
The effectiveness of the proposed method was also evaluated on another DML video sequence, scene_a_0020000, which contains shaking of branches and leaves. As before, the VIBE method [25], frame difference method [14], GMM method [41], PBAS method [34], and incPCP method [35] were used for comparison in terms of both visual effects and performance metrics. The visual effects are shown in Figure 6.
From Figure 6, it can be seen that the proposed method reduces the false detections caused by the shaking of branches and leaves compared with the other methods. The rows are arranged as in Figure 5: original images, ground truths, VIBE results, frame difference results, GMM results, PBAS results, incPCP results, and the results of the proposed method. Specifically, in the 79th, 1016th, and 2140th frames of the scene_a_0020000 sequence shown in the first row, there is obvious shaking of branches and leaves in the area marked by the red box, which degrades foreground region extraction for the moving objects. As can be seen from the regions marked by the red ellipses, obvious "ghosting" appears in the VIBE and incPCP results. Compared with the VIBE, frame difference, GMM, PBAS, and incPCP methods, the proposed method reduces the false detections caused by the swaying branches and the "ghosting"; moreover, it effectively extracts the occluded region and improves the accuracy of foreground region extraction.
The effectiveness of the proposed method is further evaluated in terms of the performance metrics PR, RE, FPR, and F-measure [34,38,39,40], as shown in Table 2.
From Table 2, it can be seen that in the scene_a_0020000 sequence, where the degree of branch and leaf shaking is lower than in scene_b_0030003, the proposed method again achieves relatively high accuracy and a lower false detection rate than the VIBE, frame difference, GMM, PBAS, and incPCP methods: it reduces the false detections caused by shaking branches and leaves and accurately extracts the foreground regions of moving objects. In contrast, the VIBE method shows lower extraction accuracy because of "ghosting" and the false detections caused by the shaking; the frame difference method is prone to "holes"; the GMM method, using a weighted combination of multiple Gaussian components to confirm foreground and background, suppresses the shaking to a certain extent but still produces missed detections and "holes"; the PBAS method performs better than the VIBE method, with a lower false detection rate and a higher recall rate; and the incPCP method is more sensitive to moving objects and thus has a relatively high false detection rate under violent shaking of branches and leaves. In view of the above, the proposed method reduces the false detection rate and achieves accurate extraction of the foreground regions of moving objects.

4.3. CDnet 2014 Dataset

The 2014 CDnet dataset provides a realistic, camera-captured, diverse set of indoor and outdoor videos [39]. These videos have been recorded by using cameras including low-resolution IP cameras, higher resolution cameras, commercial PTZ cameras, and near-infrared cameras. As a consequence, spatial resolutions of the videos in the 2014 CDnet dataset vary from 320 × 240 to 720 × 486. Due to the diverse lighting conditions and compression parameters, the level of noise and compression artifacts significantly varies from one video to another. The duration of the videos is from 900 to 7000 frames. Videos acquired by low-resolution IP cameras suffer from noticeable radial distortion. Different cameras have different hue bias due to different white balancing algorithms employed. Some cameras apply automatic exposure adjustment resulting in global brightness fluctuations in time. The frame rate also varies from one video to another, often as a result of limited bandwidth.
Video sequences with dynamic backgrounds were selected from the CDnet 2014 dataset to evaluate the effectiveness of the proposed method. Specifically, we chose the fall video sequence from the dynamic-background category; it has 4000 frames and contains violent shaking of branches and leaves as well as moving objects such as large trucks and pedestrians. The specific visual effects are shown in Figure 7.
As can be seen from Figure 7, the proposed method effectively eliminates the false detections caused by irregular shaking of branches and leaves (the area marked by the red box) and obtains a more complete foreground region of the moving objects than the other methods. The rows are arranged as in Figure 5: original images, ground truths, VIBE results, frame difference results, GMM results, PBAS results, incPCP results, and the results of the proposed method. Specifically, the first row shows the 1485th, 2413th, and 3166th frames of the fall video sequence. The severe shaking of branches and leaves produces a large number of scattered, invalid false detections in the extraction results, so the movement of real objects cannot be effectively distinguished from the meaningless movement of the shaking region, reducing extraction accuracy. As the last row shows, the proposed method eliminates the false detections that the VIBE, frame difference, GMM, PBAS, and incPCP methods produce in the swaying area by effectively combining the spatio-temporal information of the moving objects with image post-processing, significantly improving the accuracy of foreground region extraction. The visual results also show that the proposed method need not treat the shadows of vehicles and pedestrians separately: exploiting the discontinuity and "hollowing" of pedestrians and vehicles in shadowed areas, the foreground responses in those shadowed areas are eliminated.
The effectiveness of the proposed method is further demonstrated by the performance metrics PR, RE, FPR, and F-measure; the specific evaluation results are shown in Table 3.
It can be seen from Table 3 that, on the fall sequence with its violent shaking of trees and leaves, the proposed method achieves relatively high accuracy and a lower false detection rate than the VIBE, frame difference, and GMM methods, by exploiting the difference and complementarity of spatio-temporal information to reduce the false detections caused by the shaking. In contrast, the VIBE method cannot update the background promptly under continuous violent shaking, producing "ghosting" and lowering its detection accuracy. The frame difference method is simple and adaptable; its results for large trucks of uniform appearance are prone to "holes", but relatively complete contour information is retained, and a relatively complete foreground region can be recovered after post-processing. The GMM method, using a weighted combination of multiple Gaussian kernels to confirm foreground and background, has a certain inhibitory effect on the shaking of branches and leaves, but it performs poorly under violent, continuous shaking, and missed detections and "holes" still occur. The PBAS method again performs better than the VIBE method, with a lower false detection rate and a higher recall rate on this sequence. The incPCP method is sensitive to moving objects and thus has relatively high false detection and recall rates when there is violent shaking of branches and leaves. Therefore, the proposed method reduces the false detection rate and achieves accurate extraction of the foreground regions of moving objects.

4.4. The Collected Data

The self-built dataset was collected in real time from the actual operating system of a cooperating customer. In the collected data, branches and leaves shake in the areas on both sides of the road; the resolution is 1920 × 1080, the frame rate is 25 frames/s, and the data contain vehicle and pedestrian objects whose scale changes as they move toward and away from the camera.
The selected video sequence, with shaking branches and leaves on both sides of the road, has 4600 frames, of which about 1500 contain no moving objects. The specific visual effects are shown in Figure 8.
As can be seen from Figure 8, compared with the VIBE, frame difference, and GMM methods, the proposed method effectively removes the falsely detected objects in the swaying areas of branches and leaves on both sides of the road. The rows are arranged as in Figure 5: original images, ground truths, VIBE results, frame difference results, GMM results, PBAS results, incPCP results, and the results of the proposed method. Specifically, the first row shows the 200th, 3160th, and 4452nd frames of the sequence; branches and leaves shake on both sides of the road, and the objects change scale as they approach and recede. The third row shows the VIBE results, which contain "ghost" objects and false detections because of VIBE's delayed background updates. The fourth row shows the frame difference results, which contain a large number of false detections among the slightly shaking branches and leaves on both sides of the road. The fifth row shows the GMM results; this method uses multiple Gaussian kernels to judge whether each pixel belongs to the foreground or background and thus suppresses the shaking to a certain extent. The PBAS method performs better than the VIBE method, with a lower false detection rate and a higher recall rate. The incPCP method is sensitive to moving objects and thus has a relatively high false detection rate in areas of violent shaking, although at frame 4452 its false detections from shaking branches and leaves are greatly reduced. The last row shows the proposed method, which, by exploiting the difference and complementarity of the temporal-domain and spatial-domain methods at each spatio-temporal position, effectively reduces both the "ghosting" introduced by the VIBE method and the false detections caused by slightly shaking branches and leaves, improving the accuracy of foreground region extraction for moving objects.
The effectiveness of the proposed method is further demonstrated by the performance metrics PR, RE, FPR, and F-measure; the specific results are shown in Table 4.
As shown in Table 4, on the self-built sequences containing shaking branches and leaves, the proposed method achieves a higher recall rate and a lower false detection rate than the VIBE, frame difference, and GMM methods. Its recall rate is higher by 8.9% than the GMM method, by 3.2% than the mainstream VIBE method, by 1.9% than the PBAS method, and by 1.4% than the incPCP method, while its false detection rate is lower by 16.6% than the frame difference method and by 2.9% than the VIBE method. The proposed method therefore improves the ability to analyze and understand the content of subsequent video sequences: it fully combines the complementarity and difference, at each physical position, of the foreground regions extracted from the temporal and spatial information of the video sequence, and uses image post-processing to accurately extract the foreground regions of moving objects while excluding falsely detected regions, in particular swaying branches and leaves, which lack motion continuity and regular movement patterns.

4.5. Algorithm Runtime Statistics

We measured the running times of the VIBE, frame difference, GMM, PBAS, and incPCP methods and the proposed CSTI method on the same computer, configured with an Intel(R) Core(TM) i7-9700K CPU @ 3.6 GHz and 32.0 GB of memory. Running times were measured per image at a resolution of 1920 × 1080. The specific running time of each algorithm is shown in Table 5.
The running time of the proposed CSTI method is higher than those of the VIBE, frame difference, and GMM methods, but lower than those of the PBAS and incPCP methods.
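Per-frame runtimes of the kind reported in Table 5 can be measured with a simple harness such as the following sketch; `method` is a hypothetical placeholder for any of the segmentation callables, and the warm-up count is our choice.

```python
import time

def time_per_frame(method, frames, warmup=5):
    """Average wall-clock time per frame for `method` (a callable
    taking one frame), skipping a few warm-up frames so that
    one-time setup costs are excluded."""
    for f in frames[:warmup]:
        method(f)  # warm-up iterations, not timed
    start = time.perf_counter()
    for f in frames[warmup:]:
        method(f)
    elapsed = time.perf_counter() - start
    return elapsed / max(len(frames) - warmup, 1)
```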

5. Conclusions and Future Work

This research analyzed the causes of false detections in the extraction of the foreground regions of moving objects under a static camera and proposed a foreground region extraction method based on spatio-temporal information. In this method, the VIBE extraction result and the morphologically processed frame difference result are combined with an AND operation and then morphologically processed to obtain the final foreground region of the moving object. Experiments show that the proposed method achieves accurate foreground extraction when the camera is static and has a good inhibitory effect on violent shaking of branches and leaves. However, the method still has certain limitations; for example, a small number of false detections occur at the edges of the shaking areas. Follow-up work will aim to further reduce false detections by combining the temporal regularity of branch and leaf shaking with target tracking technology. In addition, when the camera moves, the proposed method cannot completely eliminate the false detections caused by wide-area shaking of branches and leaves; further research will address these problems.

Author Contributions

Conceptualization, Y.Z. and L.Y.; Methodology, Y.Z. and S.L.; Software, Y.Z. and G.W.; Validation, Y.Z., X.J., G.W. and S.L.; Formal analysis, Y.Z.; Investigation, Y.Z. and L.Y.; Resources, W.L. and L.Y.; Data curation, G.W. and X.J.; Writing—original draft preparation, Y.Z.; Writing—review and editing, L.Y. and G.W.; Visualization, G.W. and S.L.; Supervision, W.L.; Project administration, W.L. and Y.Z.; Funding acquisition, G.W. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (No. 61972040), the Zhejiang Provincial Natural Science Foundation of China under Grant LQ20F020006, the Zhejiang Provincial Philosophy and Social Science Foundation of China under Grant 22NDQN291YB, the Ningbo Natural Science Foundation under Grant 2023J280 and the Ningbo Key R&D Program (Digital Twin Project).

Data Availability Statement

The dataset for foreground region extraction is available online: http://changedetection.net/.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 61972040), the Zhejiang Provincial Natural Science Foundation of China under Grant LQ20F020006, the Zhejiang Provincial Philosophy and Social Science Foundation of China under Grant 22NDQN291YB, the Ningbo Natural Science Foundation under Grant 2023J280 and the Ningbo Key R&D Program (Digital Twin Project). The authors would like to express their heartfelt gratitude to those people who have helped with this manuscript and to the reviewers for their comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, D.W.; Xu, L.H.; Goodman, E.D. Illumination-robust foreground detection in a video surveillance system. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 1637–1650.
  2. Hu, Y.; Sirlantzis, K.; Howells, G.; Ragot, N.; Rodríguez, P. An online background subtraction algorithm deployed on a NAO humanoid robot based monitoring system. Robot. Auton. Syst. 2016, 85, 37–47.
  3. Kalsotra, R.; Arora, S. Background subtraction for moving object detection: Explorations of recent developments and challenges. Vis. Comput. 2022, 38, 4151–4178.
  4. Sun, Z.; Hua, Z.; Li, H. Small Moving Object Detection Algorithm Based on Motion Information. arXiv 2023, arXiv:2301.01917.
  5. Li, X.; Nabati, R.; Singh, K.; Corona, E.; Metsis, V.; Parchami, A. EMOD: Efficient Moving Object Detection via Image Eccentricity Analysis and Sparse Neural Networks. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 3–7 January 2023; pp. 51–59.
  6. Liu, H.; Yu, Y.; Liu, S.; Wang, W. A Military Object Detection Model of UAV Reconnaissance Image and Feature Visualization. Appl. Sci. 2022, 12, 12236.
  7. Yin, Q.; Hu, Q.; Liu, H.; Zhang, F.; Wang, Y.; Lin, Z.; An, W.; Guo, Y. Detecting and Tracking Small and Dense Moving Objects in Satellite Videos: A Benchmark. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–18.
  8. Sultana, M.; Mahmood, A.; Jung, S.K. Unsupervised moving object detection in complex scenes using adversarial regularizations. IEEE Trans. Multimed. 2021, 23, 2005–2018.
  9. Hu, Y.; Sirlantzis, K.; Howells, G.; Ragot, N.; Rodriguez, P. An online background subtraction algorithm using a contiguously weighted linear regression model. In Proceedings of the European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 1845–1849.
  10. Tamulionis, M.; Sledevič, T.; Abromavičius, V.; Kurpytė-Lipnickė, D.; Navakauskas, D.; Serackis, A.; Matuzevičius, D. Finding the Least Motion-Blurred Image by Reusing Early Features of Object Detection Network. Appl. Sci. 2023, 13, 1264.
  11. Li, J.; Liu, P.; Huang, X.; Cui, W.; Zhang, T. Learning Motion Constraint-Based Spatio-Temporal Networks for Infrared Dim Target Detections. Appl. Sci. 2022, 12, 11519.
  12. Antonio Velázquez, J.A.; Romero Huertas, M.; Alejo Eleuterio, R.; Gutiérrez, E.E.G.; López, F.D.R.; Lara, E.R. Pedestrian Localization in a Video Sequence Using Motion Detection and Active Shape Models. Appl. Sci. 2022, 12, 5371.
  13. Chapel, M.N.; Bouwmans, T. Moving objects detection with a moving camera: A comprehensive review. Comput. Sci. Rev. 2020, 38, 100310.
  14. Lipton, A.J.; Fujiyoshi, H.; Patil, R.S. Moving target classification and tracking from real-time video. In Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, WACV'98 (Cat. No. 98EX201), Princeton, NJ, USA, 19–21 October 1998; pp. 8–14.
  15. Singla, N.S. Motion detection based on frame difference method. Int. J. Inf. Comput. Technol. 2014, 4, 1559–1565.
  16. Liu, H.Y.; Meng, W.T.; Liu, Z. Key frame extraction of online video based on optimized frame difference. In Proceedings of the 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, Chongqing, China, 29–31 May 2012; pp. 1238–1242.
  17. Han, X.W.; Gao, Y.; Zheng, L.; Niu, D. Research on moving object detection algorithm based on improved three frame difference method and optical flow. In Proceedings of the 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), Qinhuangdao, China, 18–20 September 2015; pp. 580–584.
  18. Lei, M.Y.; Geng, J.P. Fusion of Three-frame Difference Method and Background Difference Method to Achieve Infrared Human Target Detection. In Proceedings of the 2019 IEEE 1st International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Kunming, China, 17–19 October 2019; pp. 381–384.
  19. Zang, X.H.; Li, G.; Yang, J.; Wang, W. Adaptive difference modelling for background subtraction. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
  20. Zhang, Y.; Liu, Q.L. Moving target detection method based on adaptive threshold. Comput. Eng. Appl. 2014, 50, 166–168.
  21. Zhang, F.; Zhu, J. Research and Application of Moving Target Detection. In Proceedings of the International Conference on Robots & Intelligent System, Vancouver, BC, Canada, 24–28 September 2017; pp. 239–241.
  22. Sobral, A.; Vacavant, A. A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 2014, 122, 4–21.
  23. Bouwmans, T. Traditional and recent approaches in background modeling for foreground detection: An overview. Comput. Sci. Rev. 2014, 11, 31–66.
  24. Elgammal, A. Wide Area Surveillance; Springer: Berlin/Heidelberg, Germany, 2014.
  25. Barnich, O.; Van Droogenbroeck, M. ViBe: A universal background subtraction algorithm for video sequences. IEEE Trans. Image Process. 2010, 20, 1709–1724.
  26. Jodoin, P.M.; Mignotte, M.; Konrad, J. Statistical background subtraction using spatial cues. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 1758–1763.
  27. Shoushtarian, B.; Bez, H.E. A practical adaptive approach for dynamic background subtraction using an invariant colour model and object tracking. Pattern Recognit. Lett. 2005, 26, 5–26.
  28. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep neural network concepts for background subtraction: A systematic review and comparative evaluation. Neural Netw. 2019, 117, 8–66.
  29. Kalsotra, R.; Arora, S. A comprehensive survey of video datasets for background subtraction. IEEE Access 2019, 7, 59143–59171.
  30. Zheng, W.; Wang, K.; Wang, F.Y. A novel background subtraction algorithm based on parallel vision and Bayesian GANs. Neurocomputing 2020, 394, 178–200.
  31. Ru, C.; Wen, W.; Zhong, Y. Raman spectroscopy for on-line monitoring of botanical extraction process using convolutional neural network with background subtraction. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 284, 121494.
  32. Zhao, C.; Hu, K.; Basu, A. Universal background subtraction based on arithmetic distribution neural network. IEEE Trans. Image Process. 2022, 31, 2934–2949.
  33. Elgammal, A.; Harwood, D.; Davis, L. Non-parametric model for background subtraction. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2000; pp. 751–767.
  34. Hofmann, M.; Tiefenbacher, P.; Rigoll, G. Background segmentation with feedback: The pixel-based adaptive segmenter. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 38–43.
  35. Rodriguez, P.; Wohlberg, B. Incremental principal component pursuit for video background modeling. J. Math. Imaging Vis. 2016, 55, 1–18.
  36. Gonzalez, R.C.; Wintz, P. Digital Image Processing; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1987.
  37. Wang, X.Y.; Hu, H.M.; Zhang, Y.G. Pedestrian Detection Based on Spatial Attention Module for Outdoor Video Surveillance. In Proceedings of the 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), Singapore, 11–13 September 2019; pp. 247–251.
  38. Goyette, N.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Ishwar, P. Changedetection.net: A new change detection benchmark dataset. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 1–8.
  39. Wang, Y.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An expanded change detection benchmark dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 387–394.
  40. Wang, Y.; Luo, Z.M.; Jodoin, P.M. Interactive deep learning method for segmenting moving objects. Pattern Recognit. Lett. 2017, 96, 66–75.
  41. Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23–25 June 1999; pp. 246–252.
Figure 1. The processing effects of the current mainstream methods in the extraction of foreground regions of moving objects. (a) The original image; (b) the false detections and the "holes" of objects based on temporal information [14,15,16]; (c) the false detections and the "ghosting" of objects based on spatial information [25].
Figure 2. Processing flow of the proposed method.
Figure 3. Morphological dilation fills the "holes". (a) a2 sequence dilation result; (b) scene_a_0000000 sequence dilation result; (c) scene_a_0020000 sequence dilation result.
Figure 4. The process of extracting the foreground regions of moving objects by combining spatio-temporal information. (a) Spatio-temporal combination results of the a2 sequence; (b) scene_a_0000000 sequence spatio-temporal combination results; (c) scene_a_0020000 sequence spatio-temporal combination results.
Figure 5. Visual results of the scene_b_0030003 sequence.
Figure 6. scene_a_0020000 extraction results of the foreground region.
Figure 7. Visual effects of the fall video sequence.
Figure 8. Visual effects of the self-built data.
Table 1. Performance evaluation metrics of the scene_b_0030003 video sequence.

Method                          PR      RE      FPR     F-Measure
VIBE method [25]                0.568   0.927   0.194   0.705
Frame difference method [14]    0.602   0.794   0.173   0.685
GMM method [41]                 0.654   0.693   0.158   0.672
PBAS method [34]                0.706   0.931   0.137   0.803
incPCP method [35]              0.539   0.935   0.241   0.684
CSTI method                     0.781   0.946   0.117   0.856
Table 2. Performance evaluation metrics of the scene_a_0020000 video sequence.

Method                          PR      RE      FPR     F-Measure
VIBE method [25]                0.654   0.923   0.183   0.767
Frame difference method [14]    0.693   0.782   0.159   0.714
GMM method [41]                 0.739   0.753   0.142   0.730
PBAS method [34]                0.751   0.912   0.154   0.824
incPCP method [35]              0.679   0.931   0.217   0.785
CSTI method                     0.804   0.958   0.108   0.874
Table 3. Performance evaluation metrics of the fall video sequence.

Method                          PR      RE      FPR     F-Measure
VIBE method [25]                0.584   0.916   0.237   0.713
Frame difference method [14]    0.633   0.845   0.195   0.724
GMM method [41]                 0.541   0.904   0.261   0.670
PBAS method [34]                0.722   0.931   0.184   0.813
incPCP method [35]              0.559   0.927   0.251   0.697
CSTI method                     0.769   0.952   0.124   0.851
Table 4. Performance evaluation metrics of the self-built dataset video sequence.

Method                          PR      RE      FPR     F-Measure
VIBE method [25]                0.755   0.921   0.161   0.830
Frame difference method [14]    0.524   0.878   0.241   0.656
GMM method [41]                 0.709   0.864   0.193   0.779
PBAS method [34]                0.772   0.934   0.143   0.845
incPCP method [35]              0.714   0.939   0.182   0.811
CSTI method                     0.847   0.953   0.075   0.897
Table 5. The running time of each algorithm in the paper.

Method                          Running Time
VIBE method                     54 ms
Frame difference method         1 ms
GMM method                      18 ms
PBAS method                     863 ms
incPCP method                   2.1 s
CSTI method                     62 ms