Quartz crucibles are a critical material for single-crystal silicon production and are widely used in the manufacture of solar cells and integrated circuits, where they directly affect product quality [1]. Existing technology produces quartz crucibles with a two-layer structure, a transparent inner layer and an opaque outer layer, whose difference in transparency is determined by the number and size of the bubbles they contain. The outer wall contains many densely packed bubbles, which give it a flocculent, opaque appearance, improve thermal insulation, and make it a uniformly radiating heat source. In contrast, the inner wall contains sparse, tiny bubbles. These grow during the 50-h exposure to temperatures of 1400 °C and can easily rupture, releasing the gas and quartz impurities they contain into the silicon melt and destroying the crystal structure [2]. Therefore, it is crucial to check the size and number of bubbles in the transparent layer of the inner wall before a crucible is used.
Bubble measurement techniques are widely used in industry. In Ref. [3], the size distribution of highly clustered 2D bubbles was measured with an image processing technique: the diameters and size distribution of the bubbles were computed statistically after binarization, edge extraction, and hole filling of the captured image. In Ref. [4], a machine vision method combining pixel-based edge detection with target-region locking based on a calibrated connected domain was proposed to detect bubbles in crystals in place of the human eye. The method significantly improved the detection rate and the accuracy of identifying and locating bubbles in sapphire. However, traditional algorithms are limited by the lighting environment at the time of imaging and by small bubble targets that carry little feature information, have low resolution, and have incomplete boundary contours. As a result, they generalize poorly, miss targets, and cannot meet industrial detection needs [5]. Current deep learning-based target detection algorithms are mainly divided into two-stage and single-stage algorithms, of which the single-stage algorithms have a simpler structure and higher computational efficiency [6]. Compared with traditional image processing algorithms, deep learning-based target detection algorithms exploit the powerful feature extraction ability of convolutional neural networks, trained on a large number of data samples, to obtain feature maps rich in target information, which effectively addresses the difficulties of traditional algorithms. The models are highly modular, and they can be applied to visual measurement and defect detection tasks in industry [7, 8], medicine [9], and other fields by improving different structures [10, 11]. Specifically, in Ref. [12], the authors detected bubble defects in tire crown speckle interference based on the Faster R-CNN framework and redesigned the feature pyramid structure to improve small-target detection precision. However, the large amount of inference computation reduced the detection speed, and the method did not involve spatial bubble tracking. To improve the accuracy of short-term vehicle tracking in autonomous driving, Ref. [13] proposed a method combining YOLOv3 and Kalman filtering that issues real-time warnings for objects that are completely occluded, which makes it better suited to autonomous driving applications. To address the slowness of existing counting methods based on mosaic images, Ref. [14] used the YOLOv3 model and the SORT algorithm to count spruce trees in UAV-captured aerial video of complete spruce plots. The method could quickly and accurately calculate the number of spruce in a complete plot. However, both methods used YOLOv3 as the detector, which limited the performance of the tracking algorithm.
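To make the classical image processing route discussed above concrete, the following is a minimal sketch of binarization, hole filling, and size statistics on a grayscale image. It is not the implementation of Ref. [3]: the function names `flood` and `bubble_diameters`, the fixed `threshold`, the assumption that bubbles appear brighter than the background, and the use of an area-equivalent diameter are all illustrative assumptions, and the edge extraction step is omitted.

```python
import math
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def flood(grid, starts, target):
    """Return the set of cells equal to `target` that are reachable from `starts`."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    queue = deque()
    for y, x in starts:
        if grid[y][x] == target and (y, x) not in seen:
            seen.add((y, x))
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < rows and 0 <= nx < cols
            if inside and (ny, nx) not in seen and grid[ny][nx] == target:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen


def bubble_diameters(image, threshold):
    """Sorted area-equivalent diameters (in pixels) of bright regions.

    Assumes bubbles are brighter than `threshold` (hypothetical parameter).
    """
    rows, cols = len(image), len(image[0])
    # 1. Binarization
    mask = [[pixel > threshold for pixel in row] for row in image]
    # 2. Hole filling: background that cannot be reached from the image
    #    border is enclosed by foreground and is treated as part of a bubble
    border = [(y, x) for y in range(rows) for x in range(cols)
              if y in (0, rows - 1) or x in (0, cols - 1)]
    outside = flood(mask, border, False)
    filled = [[mask[y][x] or (y, x) not in outside for x in range(cols)]
              for y in range(rows)]
    # 3. Label each connected region and convert its area to a diameter
    labelled = set()
    diameters = []
    for y in range(rows):
        for x in range(cols):
            if filled[y][x] and (y, x) not in labelled:
                region = flood(filled, [(y, x)], True)
                labelled |= region
                diameters.append(2.0 * math.sqrt(len(region) / math.pi))
    return sorted(diameters)
```

A fixed global threshold is exactly the kind of step that breaks down under uneven lighting or for faint, small bubbles, which is the limitation of traditional algorithms noted above.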
The detector’s quality significantly impacts tracking performance [15]. In this paper, to improve the accuracy of the final bubble count, the network structure is first improved on the basis of YOLOv5, which currently offers better all-around performance: dilated convolution is used to compensate for missing deep semantic features, and an efficient channel attention network is used to strengthen the weights of critical channel features. The improved network is then trained and validated on the constructed crucible bubble dataset to improve its accuracy and speed in detecting small bubbles. Finally, the Kalman filter and the Hungarian algorithm are used to associate the detections in consecutive frames, so that the number of bubbles in the space of the crucible’s transparent layer can be counted, providing reference data and technical support for quartz crucible quality inspection.
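The frame-to-frame association step can be illustrated with a minimal sketch. It is not the method of this paper: the names `iou`, `match`, and `count_bubbles`, the `(x1, y1, x2, y2)` box format, the `1 - IoU` cost, and the `min_iou` gate are assumptions made for illustration. The Kalman prediction step is omitted, so each track is represented simply by its last detected box, and a brute-force search over permutations stands in for the Hungarian algorithm. It minimizes the same total assignment cost but is practical only for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(tracks, detections, min_iou=0.3):
    """Minimum-cost one-to-one assignment of track boxes to detection boxes.

    Brute-force stand-in for the Hungarian algorithm (small inputs only).
    Returns (track_index, detection_index) pairs whose IoU is at least min_iou.
    """
    n, m = len(tracks), len(detections)
    if n == 0 or m == 0:
        return []
    if n <= m:
        candidates = (list(zip(range(n), p)) for p in permutations(range(m), n))
    else:
        candidates = (list(zip(p, range(m))) for p in permutations(range(n), m))
    best_pairs, best_cost = [], float("inf")
    for pairs in candidates:
        cost = sum(1.0 - iou(tracks[t], detections[d]) for t, d in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    # Discard assigned pairs that barely overlap
    return [(t, d) for t, d in best_pairs
            if iou(tracks[t], detections[d]) >= min_iou]


def count_bubbles(frames, min_iou=0.3):
    """Count distinct objects over a sequence of per-frame detection lists."""
    tracks = {}   # track id -> last matched box
    next_id = 0
    for detections in frames:
        ids = list(tracks)
        pairs = match([tracks[i] for i in ids], detections, min_iou)
        updated = {ids[t]: detections[d] for t, d in pairs}
        matched = {d for _, d in pairs}
        for d, box in enumerate(detections):
            if d not in matched:      # unmatched detection starts a new track
                updated[next_id] = box
                next_id += 1
        tracks = updated              # simplification: unmatched tracks are dropped
    return next_id
```

Because every unmatched detection opens a new track, missed or spurious detections translate directly into counting errors, which is why the quality of the detector matters for the final count.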