1. Introduction
Optical glass is an important yet tiny signal transducer in optical devices. Most defects on optical glass are at the sub-millimeter level, so quality inspection of optical glass usually relies on high-resolution microscope images. To the best of our knowledge, most advanced object detection methods are applied to images with relatively low resolution, such as VOC2007/2012 [1,2] (about 500 × 400) and MS COCO [3] (about 600 × 400), which offers a good trade-off between detection accuracy and computational cost. However, for the high-resolution images used in industrial quality inspection, a detector requires substantial processing time and places heavy demands on GPU memory. Moreover, defects may appear on any surface of the optical glass, so accurate results are difficult to obtain from a single image taken from one perspective. In this case, a method that fuses detection results from multi-perspective images is needed.
Figure 1 shows images of normal and defective optical glass from different perspectives. Specifically, (a) shows the front and side views of a defective sample, and (b) shows a qualified sample from the same perspectives.
In this paper, we propose a video-based two-stage optical glass defect detection network to solve the problems above. Since high-resolution images incur a large computational cost while down-sampled low-resolution images may render small defect areas invisible, the detection process is carried out in a coarse-to-fine manner: the optical glass area is first located on a down-sampled version of the image, and defects are then detected within that area at the higher resolution. Such a framework keeps the computational cost under control and significantly reduces false alarms coming from the background. Since defects may appear at various positions on the optical glass, we capture videos to obtain multiple images of the glass surface and carefully design a video-based defect detection framework to further improve detection recall. Moreover, considering that low-quality video frames are useless or even harmful for defect detection, we propose a clustering-based image quality evaluation method to pick out high-quality video frames. Extensive experiments demonstrate the superiority of our method in terms of both quantitative and qualitative evaluation.
The main contributions of this paper are as follows:
- A coarse-to-fine two-stage detection network is proposed to detect tiny defects on high-resolution images of optical glass.
- A video-based detection framework is suggested to support multi-perspective defect detection.
- A video-frame image quality evaluation method based on a clustering algorithm is proposed to pick useful images for detection.
- We contribute a new dataset named “OGD-DET” that includes 3415 images collected in real industrial settings.
2. Related Works
In recent decades, there has been considerable academic research on defect detection. One research direction builds on traditional image processing techniques, such as statistical analysis [4,5] and spectrum-based approaches [6,7]. The other builds on deep-learning-based object detection methods.
The pipeline of traditional methods usually includes image pre-processing, hand-crafted feature extraction, classification, and post-processing [8,9,10,11]. To improve defect detection accuracy, template matching methods compare a pre-set template image with the inspected image at the pixel level; owing to their efficiency and stability, they have become popular in industrial visual defect detection [12]. These traditional methods can achieve promising results. However, they depend strongly on expert experience and on tightly controlled image acquisition settings; otherwise, detection performance degrades severely.
The development of deep learning has led to a series of breakthroughs in object detection. Girshick et al. [13] proposed the two-stage detector R-CNN, which first uses a selective search method to extract about 2000 proposal regions and then feeds them to a convolutional neural network (CNN) for object detection. R-CNN improved mean average precision by more than 30% over the previous best results at that time on VOC2010 [14]. He et al. [15] applied spatial pyramid pooling to the feature map produced by the CNN layers over the whole image in SPP-net, raising computational efficiency and allowing images of any size and aspect ratio to be processed. Building on R-CNN and SPP-net, Fast R-CNN [16] and Faster R-CNN [17] were proposed in succession to significantly reduce the computational cost; they utilize ROI Pooling and a Region Proposal Network to achieve an end-to-end object detection structure and reach 0.5 fps and 7 fps, respectively, on a K40 GPU. Different from R-CNN and its variants, YOLO [18,19,20,21,22] is a one-stage detector that is extremely fast but usually underperforms R-CNN-based methods. Moreover, SSD [23] offers a good trade-off between detection accuracy and efficiency; it is a one-stage detector similar to YOLO but uses multiple anchors like Faster R-CNN.
Most existing deep detectors take a down-sampled, low-resolution image as input for computational efficiency. However, tiny defects may be lost in this low-resolution image. One strategy for tackling this problem is to divide the whole image into sub-images and run detection on each sub-image. Adam [24] first proposed this approach for object detection on large images: a super-high-resolution image is cropped into multiple sub-images with a sliding window, object detection is performed on each sub-image, and the detections are finally spliced together and filtered by NMS to obtain results for the full image. Another trend for detection on high-resolution images is the cascade detection framework. Szegedy et al. [25] adopted a multi-stage architecture with different Intersection over Union (IoU) thresholds to resolve the mismatch between training and inference proposals in R-CNN and its variants, further improving detection accuracy. Gao et al. [26] used a coarse-to-fine approach, designing a Markov model to dynamically select regions and thereby improve detection speed on large images. Inspired by these works, we propose a video-based coarse-to-fine defect detection network to detect tiny defects on high-resolution images of optical glass, aiming to meet the requirements of effectiveness and efficiency at the same time.
3. Methods
In this section, we introduce our methods in detail. First, we illustrate the overall architecture of our two-stage coarse-to-fine detection network. Then we introduce the implementation of the video-based detection framework via the video frame fusing module. Finally, an image quality evaluation (IQE) method is introduced to pick high-quality images for better detection.
3.1. Coarse-to-Fine Defect Detection Network
Tiny defects on high-resolution images are hard to detect in real time with existing deep detectors, since high-resolution images incur a high computational cost while down-sampled low-resolution images may render small defect areas invisible. To solve this problem, we propose a two-stage coarse-to-fine detection network for tiny defect detection on high-resolution images. As shown in Figure 2, the coarse detection stage locates the glass area on the down-sampled image. The detected glass area is then restored to its original high resolution and fed to the fine detection stage for defect detection. The detection networks differ slightly between the two stages; network details are described below.
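The coarse-to-fine inference flow can be sketched as follows. This is a minimal illustration, not the paper's implementation: `coarse_detector` and `fine_detector` are hypothetical callables standing in for the two YOLOv4-based stages, and the `(x1, y1, x2, y2, score)` box format is an assumption.

```python
import numpy as np

def resize_nearest(img, size):
    """Minimal nearest-neighbour resize, used only to keep the sketch
    self-contained (a real system would use the camera SDK or OpenCV)."""
    h, w = img.shape[:2]
    ys = np.arange(size[0]) * h // size[0]
    xs = np.arange(size[1]) * w // size[1]
    return img[ys][:, xs]

def coarse_to_fine_detect(image, coarse_detector, fine_detector,
                          coarse_size=(416, 416)):
    """Locate the glass on a down-sampled copy, then detect defects on the
    full-resolution crop. Detectors return (x1, y1, x2, y2, score) boxes
    in the coordinates of their own input."""
    h, w = image.shape[:2]
    sy, sx = h / coarse_size[0], w / coarse_size[1]

    # Stage 1: find the glass region on a low-resolution version.
    small = resize_nearest(image, coarse_size)
    defects = []
    for (x1, y1, x2, y2, _) in coarse_detector(small):
        # Map the coarse box back to full-resolution coordinates.
        fx1, fy1 = int(x1 * sx), int(y1 * sy)
        fx2, fy2 = int(x2 * sx), int(y2 * sy)
        crop = image[fy1:fy2, fx1:fx2]
        # Stage 2: detect tiny defects inside the high-resolution crop.
        for (dx1, dy1, dx2, dy2, s) in fine_detector(crop):
            defects.append((dx1 + fx1, dy1 + fy1, dx2 + fx1, dy2 + fy1, s))
    return defects
```

Only the small image passes through the coarse network, so the full-resolution pixels are touched just once, inside the detected glass crop.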
3.1.1. The Coarse Detection Stage
Considering the real-time requirement of industrial visual inspection, we design the detection networks of both stages based on YOLOv4 [21]. YOLOv4 is a typical model that balances speed and accuracy well; it consists of three sub-nets: the backbone, neck, and head networks. In the coarse detection stage, glass regions are first located on the down-sampled input image. To further improve efficiency and achieve real-time detection, a more lightweight backbone with shallower neck and head networks is chosen. As shown in Figure 3a, the Darknet-tiny network [20] is selected as the backbone instead of the default CSPDarkNet53 of YOLOv4. Specifically, the proposed network uses only two feature-map scales (13 × 13 and 26 × 26) for the neck and head networks, compared with the three scales (13 × 13, 26 × 26, and 52 × 52) in the original YOLOv4. As a result, the Darknet-tiny backbone costs only 5.56 billion FLOPs, compared with the 9 billion FLOPs of the CSPDarkNet53 backbone of YOLOv4.
3.1.2. The Fine Detection Stage
In the fine detection stage, the detected glass area is first restored to its original high-resolution size and then searched for defects. Since defect areas on glass are still quite tiny, the smallest covering only 2 × 7 pixels, extracting rich feature information from the input RGB images is essential. More importantly, the different color channels of RGB images carry distinctive features. With this in mind, we propose a Color Channel Separation (CCS) convolution to improve the model’s feature extraction capability in the fine detection stage. As illustrated in Figure 3b, the CCS convolution works in the backbone network, replacing the traditional convolution. Specifically, the proposed CCS convolution places a CBL and an MCBL(4) operation after every color channel to help learn more semantic information. The CBL operation consists of a convolution layer, a batch normalization layer, and a Leaky ReLU layer, as shown in Figure 3c, and MCBL(n) represents a max-pooling layer followed by a CBL operation, with this structure repeated n times. With the CCS convolution, richer semantic information from the input image is leveraged and delivered to later layers, leading to better detection results.
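The per-channel structure can be sketched in NumPy as follows. This is a shape-level illustration only, under several stated simplifications: batch normalization is omitted (it is trivial on a single image), each CBL is reduced to one single-channel 3 × 3 convolution, and the kernel values are placeholders for learned weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_bn_leaky(x, kernel, slope=0.1):
    """'CBL' stand-in: 3x3 valid convolution + Leaky ReLU on one channel
    (batch normalization omitted in this single-image sketch)."""
    windows = sliding_window_view(x, kernel.shape)   # (H-2, W-2, 3, 3)
    y = np.einsum('hwij,ij->hw', windows, kernel)
    return np.where(y > 0, y, slope * y)

def max_pool2(x):
    """2x2 max pooling with stride 2."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def ccs_forward(rgb, kernels, n_mcbl=4):
    """Color Channel Separation sketch: each of the R, G, B channels gets
    its own CBL followed by MCBL(n) (max-pool + CBL, repeated n times);
    the per-channel feature maps are then stacked for later layers."""
    outs = []
    for c in range(3):
        f = conv_bn_leaky(rgb[..., c], kernels[c][0])   # CBL
        for i in range(n_mcbl):                         # MCBL(n) chain
            f = conv_bn_leaky(max_pool2(f), kernels[c][i + 1])
        outs.append(f)
    return np.stack(outs, axis=-1)
```

The point of the structure is that each color channel keeps its own learned filters through the CBL and MCBL(4) chain instead of being mixed immediately by a multi-channel convolution.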
Compared to the single-stage detection method, the proposed coarse-to-fine two-stage framework not only effectively reduces false positives from the background but also meets the requirements of real-time processing.
3.2. Video-Based Detection Framework
Since defects may exist anywhere on the optical glass, we design a video-based detection framework that fuses the results of multiple frames captured from various perspectives. The video frame fusing module provides a video-level prediction as the final output.
Firstly, the videos are captured by an industrial camera under a controllable illumination environment. Specifically, the manipulator picks up the optical glass on the operating table and rotates it under the camera to collect multi-perspective video. The video-based detection framework takes the multiple frames as input and a video frame fusing module is proposed to fuse the results of multi-perspective video frames to give the final prediction result.
To specify the fusion method in detail, our proposed video frame fusing module is presented in Figure 4. Suppose a video has N sampled frames, of which M are detected to be defective. The probability of the video being defective can then be estimated by the proportion of defective frames; in other words, the sample is considered defective when

M / N > T,

where T is the threshold on the ratio of abnormal images to all images. However, some defects are visible in only a few frames and would be missed under this rule alone. Therefore, the confidence score is introduced to further improve the fusion strategy. Let i be the index of a sampled frame and s_i the confidence score of the defects detected in the i-th frame, with confidence threshold T_s. When s_i > T_s for any frame, the sample is also considered defective. In our experiments, the confidence threshold T_s is set to 0.8, and the ratio threshold T is fixed in advance.
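The fusion rule can be sketched as a small function. The representation of per-frame results and the default threshold value are our assumptions based on the description above.

```python
def video_is_defective(frame_results, ratio_thresh, conf_thresh=0.8):
    """Video-level fusion sketch: `frame_results` maps each sampled frame
    to the list of confidence scores of defects detected in it. A video is
    flagged if the fraction of defective frames exceeds the ratio threshold
    T, or if any single detection is confident enough (s_i > T_s), so that
    defects visible in only a few frames are not missed."""
    n = len(frame_results)
    m = sum(1 for scores in frame_results if scores)  # frames with defects
    if n and m / n > ratio_thresh:
        return True
    return any(s > conf_thresh for scores in frame_results for s in scores)
```

For example, a single very confident detection in one frame of a long video still flags the sample, even though the defective-frame ratio stays below T.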
3.3. Image Quality Evaluation Module
Images captured by industrial microscope cameras may be blurry during the glass rotation process due to motion blur and defocus. Blurry images are not appropriate for defect detection and may hurt the final detection accuracy. Thus, we propose a clustering-based image quality evaluation (IQE) module to pick out high-quality images for the final prediction.
As illustrated in Figure 5, the proposed clustering-based IQE framework has two main phases: training (left) and testing (right). In the training phase, we first extract HOG features from all images, both clear and blurry, and group them with the K-means clustering algorithm. The images are thus classified into multiple classes of different image clarity, and we manually choose one or more classes whose images are of high quality. Finally, the clustering centers of all classes are recorded for later use in clarity testing. In the testing phase, HOG features are first extracted from the input image. Then, the mean squared error (MSE) between the input image’s HOG features and each clustering center computed in the training phase is calculated. If the smallest MSE corresponds to one of the chosen high-quality classes, the image is considered a high-quality sample and is used for defect detection. Otherwise, the image is deemed not to meet the quality requirement and is discarded.
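The two phases can be sketched as follows. To keep the sketch self-contained, a gradient-magnitude histogram stands in for the HOG descriptor (blurry frames concentrate mass in the low-gradient bins), and a minimal K-means replaces a library implementation; both substitutions are ours, not the paper's.

```python
import numpy as np

def grad_hist_features(img, bins=16):
    """Stand-in for HOG: a normalized histogram of gradient magnitudes."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(mag, bins=bins, range=(0, mag.max() + 1e-6))
    return hist / hist.sum()

def kmeans(feats, k, iters=100, seed=0):
    """Minimal k-means (training phase); a real pipeline would use a
    library implementation."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return centers, labels

def is_high_quality(feat, centers, good_classes):
    """Testing phase: assign the frame to its nearest cluster center by MSE
    and accept it only if that center belongs to a manually chosen
    high-quality class."""
    mse = ((centers - feat) ** 2).mean(axis=1)
    return int(np.argmin(mse)) in good_classes
```

After training, only the cluster centers and the indices of the manually chosen high-quality classes need to be stored for inference.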
4. Experiments
In this section, we conduct extensive experiments to verify the effectiveness of our approach. First of all, we introduce datasets and experimental settings. Then we present ablation studies and finally compare our method with the state-of-the-art methods.
4.1. Datasets
To evaluate our method, we build an optical glass dataset called OGD-DET, captured in a real industrial production environment. All optical glass videos are taken by an Omega industrial camera at a resolution of 1536 × 1024 pixels. The dataset has 40 videos, of which 25 are used for training and 15 for testing. The OGD-DET dataset is available on GitHub (https://pntehan.github.io/OGD-DET/, accessed on 3 December 2021). To evaluate our method at the image level, all videos are sampled into frames at 8 images per second, yielding 3415 images in total, of which 511 are used for testing. In our experiments, all images are resized to 416 × 416 pixels. The proposed IQE module selects high-quality glass images from the first detection stage for the final prediction. In the K-means clustering process, the initial number of clustering centers is set to 20, from which we choose 9 classes with high image quality; the number of K-means iterations is 100.
4.2. Performance Evaluation Metrics
We adopt recall, precision, average precision (AP), and frames-per-second (FPS) as performance metrics in our experiments. AP represents the average detection precision for each defect category, and AP@x denotes the AP at IoU = x; for example, AP@0.25 is the AP at Intersection over Union (IoU) = 0.25. Recall and precision are computed at IoU = 0.25 with a confidence score threshold of 0.5. FPS is the number of images processed per second, indicating model inference speed.
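For reference, IoU and a recall/precision computation at these thresholds can be sketched as below. The greedy highest-score-first matching is a common convention, not a detail taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def recall_precision(preds, gts, iou_thresh=0.25, conf_thresh=0.5):
    """Match predictions (x1, y1, x2, y2, score) to ground-truth boxes,
    greedily by descending score, at the given IoU and confidence
    thresholds; each ground truth may be matched at most once."""
    kept = sorted((p for p in preds if p[4] >= conf_thresh),
                  key=lambda p: -p[4])
    matched, tp = set(), 0
    for p in kept:
        for i, g in enumerate(gts):
            if i not in matched and iou(p[:4], g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    recall = tp / len(gts) if gts else 0.0
    precision = tp / len(kept) if kept else 0.0
    return recall, precision
```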
4.3. Ablation Studies
To verify the effectiveness of each module in our approach, we conduct ablation studies. To assess the role of the coarse-to-fine framework, we compare a one-stage defect detection network with our two-stage network. The one-stage network, based on YOLOv4, detects defects directly on the down-sampled input images. As shown in Table 1, at IoU = 0.25 the two-stage network improves Recall by up to 13.60% and Precision by 9.80% over the one-stage network. The results demonstrate that the two-stage detection network both detects tiny defects effectively and reduces false positives from the background.
Our Color Channel Separation (CCS) convolution achieves 100.00% Recall and 90.50% Precision at IoU = 0.25, whereas the conventional convolution achieves 80.10% Recall and 97.40% Precision; that is, the CCS convolution trades some Precision for a substantial gain in Recall. This indicates that the CCS module helps learn more detailed information from each color channel, which benefits tiny defect detection. Moreover, our IQE module further improves performance to 100.00% Recall and 99.48% Precision at IoU = 0.25 by selecting high-quality images for the final prediction.
4.4. Comparison with the State-of-the-Art Methods
We compare our method with Faster RCNN, SSD, and Yolov5 [22] for defect detection. All methods are trained on the same training dataset and evaluated under both image-level and video-level test settings.
The experimental results under the image-level setting are shown in Table 2. Our method significantly outperforms Faster RCNN and SSD in terms of AP. Benefiting from the coarse-to-fine two-stage detection framework, our method achieves much better detection results, especially at stricter IoU thresholds, demonstrating that the proposed framework locates the defect area more precisely. Moreover, our method runs in real time at 21 FPS on an RTX 3080 Ti GPU server, much faster than Faster RCNN and comparable to SSD512 and Yolov5. Although our method is slower than SSD300, it outperforms SSD300 in detection accuracy by a large margin. The FLOPs and number of parameters of each method are listed in Table 3.
Table 4 shows the results under the video-level setting. Fifteen videos are tested, including 5 normal and 10 defective samples. Our method achieves perfect results of 100% AP and 100% Recall, much better than Faster RCNN and SSD300. The gain comes from the video-based detection framework, which fuses the results of multiple frames captured from various perspectives.
Visualization results of Faster RCNN, SSD, and our method are shown in Figure 6. Images under different illumination conditions are presented, with the defect area enlarged in the upper right corner of each image. For easy cases, all methods achieve good results, as shown in the first two columns of Figure 6, whereas for the hard cases in the last four columns, our method is more robust than both Faster RCNN and SSD when the illumination is poor.
5. Conclusions
We propose a video-based two-stage defect detection network for sub-millimeter defect detection on optical glass. Specifically, we propose a coarse-to-fine two-stage detection framework that promotes detection performance on tiny defects and effectively reduces the false alarm rate from the background. A video-based detection framework is designed to detect defects from multiple perspectives, which addresses the problem that defects may exist on multiple surfaces of the optical glass. Moreover, the CCS convolution and IQE module further improve defect detection results by enhancing feature representations and picking out high-quality samples for detection. In the future, we will conduct more experiments on large real-world datasets to further verify the effectiveness of our method.
Author Contributions
Conceptualization, H.Z. and J.Z.; methodology, H.Z. and X.Y.; software, H.Z. and J.C.; validation, H.Z., X.Y. and Z.W.; formal analysis, H.Z. and Z.W.; investigation, H.Z. and Z.W.; resources, J.Z., Y.D., J.C. and X.Z.; data curation, X.Y.; writing—original draft preparation, H.Z. and X.Y.; writing—review and editing, J.Z.; visualization, X.Y. and Z.W.; supervision, J.Z., Y.D., J.C. and X.Z.; project administration, J.Z. and Y.D.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.
Funding
The APC was funded by National Natural Science Foundation of China (No. 62176251).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/htmldoc/index.html (accessed on 5 May 2007).
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html (accessed on 5 May 2012).
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft Coco: Common Objects in Context. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Goldstein, M.; Dengel, A. Histogram-Based Outlier Score (HBOS): A Fast Unsupervised Anomaly Detection Algorithm. KI-2012: Poster and Demo Track. 2012, pp. 59–63. Available online: https://www.dfki.de/fileadmin/user_upload/import/6431_HBOS-poster.pdf (accessed on 5 May 2012).
- Pittino, F.; Puggl, M.; Moldaschl, T.; Hirschl, C. Automatic Anomaly Detection on In-Production Manufacturing Machines Using Statistical Learning Methods. Sensors 2020, 20, 2344. [Google Scholar] [CrossRef] [PubMed]
- Hou, X.D.; Zhang, L.Q. Saliency Detection: A Spectral Residual Approach. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar]
- Bai, X.L.; Fang, Y.M.; Lin, W.S.; Wang, L.; Ju, B.-F. Saliency-Based Defect Detection in Industrial Images by Using Phase Spectrum. IEEE Trans. Ind. Inform. 2014, 10, 2135–2145. [Google Scholar] [CrossRef]
- Fang, X.X.; Luo, Q.W.; Zhou, B.X.; Li, C.; Tian, L. Research progress of automated visual surface defect detection for industrial metal planar materials. Sensors 2020, 20, 5136. [Google Scholar] [CrossRef] [PubMed]
- Chu, M.X.; Gong, R.F.; Gao, S.; Zhao, J. Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine. Chemom. Intell. Lab. Syst. 2017, 171, 140–150. [Google Scholar] [CrossRef]
- Kwon, B.K.; Won, J.S.; Kang, D.J. Fast defect detection for various types of surfaces using random forest with VOV features. Int. J. Precis. Eng. Manuf. 2015, 16, 965–970. [Google Scholar] [CrossRef]
- Jian, C.X.; Gao, J.; Ao, Y.H. Automatic surface defect detection for mobile phone screen glass based on machine vision. Appl. Soft Comput. 2017, 52, 348–358. [Google Scholar] [CrossRef]
- Zhou, X.; Wang, Y.; Xiao, C.; Zhu, Q.; Zhao, H. Automated visual inspection of glass bottle bottom with saliency detection and template matching. IEEE Trans. Instrum. Meas. 2019, 68, 4253–4267. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2010/htmldoc/index.html (accessed on 5 May 2010).
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. Available online: https://arxiv.org/abs/1504.08083v (accessed on 7 April 2022).
- Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An Incremental Improvement. Available online: https://arxiv.org/abs/1804.02767v (accessed on 7 April 2022).
- Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H. Yolov4: Optimal Speed and Accuracy of Object Detection. Available online: https://arxiv.org/abs/2004.10934v1 (accessed on 7 April 2022).
- Glenn, J. Yolov5. Available online: https://github.com/glenn-jocher/yolov5 (accessed on 7 April 2022).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Available online: https://arxiv.org/abs/1512.02325 (accessed on 7 April 2022).
- Adam, V.E. You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery. Computer Vision and Pattern Recognition. Available online: https://arxiv.org/abs/1805.09512 (accessed on 7 April 2022).
- Szegedy, C.; Toshev, A.; Erhan, D. Deep Neural Networks for Object Detection. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 5–10 December 2013. [Google Scholar]
- Gao, M.; Yu, R.; Li, A.; Morariu, V.I.; Davis, L.S. Dynamic zoom-in network for fast object detection in large images. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; Available online: https://arxiv.org/abs/1711.05187 (accessed on 7 April 2022).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).