1. Introduction
Industry 4.0 is driving a new revolution in manufacturing; thus, many factories are looking for new solutions to ensure their productivity and competitiveness. Unfortunately, for small and medium-sized factories (SMSFs), it is very difficult to introduce new machines that meet immediate requirements, given the capital costs involved and the compatibility constraints of existing equipment. Replacing manual status monitoring with automated technology is an effective way for SMSFs to maintain product quality and maximize output. Capturing real-time data from sensors can help SMSFs make decisions by providing insights and suggesting specific actions. Sensors and the corresponding systems can further relieve operators and automate tasks that previous industrial revolutions could not handle.
The Internet of Things (IoT) is a new technology that has emerged from the rapid development of the internet in recent years [1]. In IoT, objects, things, and cloud services are connected by means of networks and work together to achieve high efficiency [2]. Each smart object in IoT is well organized, properly managed, and safely controlled for many intelligent applications in our daily lives [3]. Since IoT has been successfully applied in health care [4], agriculture [5], smart cities [6,7], and other applications [8,9], IoT technology can also help SMSFs move toward Industry 4.0. Today, IoT refers more specifically to interconnected devices that combine sensors, software, and advanced technologies to transmit and receive data from a target. Edge computing is a distributed computing paradigm that integrates intelligence into the edge devices of an IoT system. These devices are often referred to as edge nodes because they can process and analyze data in real time near the source of data collection. In edge computing, data do not need to be uploaded directly to the cloud or to a centralized data processing system [10].
M. Iliyas et al. [11] introduced current manufacturing industries and computer numerical control machine tools for production in factories. They also provided a brief review of machine monitoring and a summary of investments in machine monitoring in different countries, which shows the importance of research on machine monitoring. Quasi-arithmetic means were proposed by B. Hou et al. for machine monitoring; they investigated several measuring tools based on quasi-arithmetic means to assess machine conditions [12]. However, the machine condition data must be collected through the standard interface of the machine. For machines of different ages, it is almost impossible to retrofit such a standard interface under cost considerations.
Various artificial intelligence algorithms are designed to imitate human thinking and simulate recognition systems. Using appropriate mathematical models and sufficient training data, they can learn, recall, and perform inductive deduction. They also produce good results in image recognition, classification, normalization, optimization, and other problems. Currently, computer vision technology is very useful for recognizing objects with camera sensors in IoT [13,14,15]. A Convolutional Neural Network (CNN) is a feedforward neural network whose artificial neurons respond to units within a partial coverage area, producing better results in computer vision and image processing. The network architecture of a CNN usually includes multiple layers, such as the input layer, convolutional layers, and pooling layers. Convolutional and pooling layers can be repeated to achieve deep learning. Finally, a fully connected layer is trained to produce the prediction at the output layer. With a large amount of matching training data, identification results are usually better. Deep learning makes computer vision more robust for solving real problems, which in turn makes IoT more intelligent [16,17]. However, high accuracy in computer vision requires computing power that edge nodes cannot easily afford. Moreover, there are many types of machine control panels, and the displayed data are mixed and diversified, which often causes difficulties for computer vision techniques. H. Yun et al. [18] developed a CNN model from spectrograms of sound to establish a machine monitoring technique. Two stethoscope sensors were used to capture sound information about the machines' environment, and the Short-Time Fourier Transform was applied to analyze the spectrograms and mitigate factory noise.
Currently, deep learning models are usually deployed on cloud servers because of their computing power. However, in this manner, the collected data must be transmitted to the cloud to perform event detection tasks, which often leads to delays, communication costs, and privacy issues, so the task cannot be completed in time. A machine-to-machine service platform has been proposed to enable communication among end devices; through the cooperation of end nodes, this mechanism can effectively reduce network traffic and speed up overall performance [19]. Edge computing attempts to deploy the decision-making model on the terminal device. Due to the limited computing power and storage of end devices, only simple machine learning can be performed, and high accuracy may not be obtained in the end.
In preliminary research, we observed that supervisors of production machine control departments generally believe that integrating digital platforms through an IoT system can help supervisors and operators keep an eye on the production process, detect abnormal events, improve the efficiency of production machines, and optimize the production process. Most of these factories have more than ten kinds of production machines. The machines may have been purchased over a long period, and their brands and functions differ, so there are considerable differences in control methods and status display panels. To complete the digital upgrade of such heterogeneous equipment, additional sensors and IoT technology are normally used to collect real-time production status data. The integrated system can analyze the data, predict when a machine may fail, and detect problems in real time. This includes preventing abnormal events such as machine crashes or blockage of production materials, which increase unnecessary costs and reduce the normal production capacity of the factory. Most existing approaches to machine monitoring focus on building a total IoT solution to fit the needs of Industry 4.0, which is not acceptable under cost considerations. To the best of our knowledge, many control panels of machines in small- and medium-sized factories are standalone and have no communication interface for transmitting data directly. Installing new communication interfaces on the control panel in a brute-force manner might violate the warranty or damage the production machine. Therefore, we propose vision-based IoT technology based on federated learning to solve the immediate machine monitoring problem.
This paper focuses on a general solution based on edge computing and cloud computing in IoT for machine monitoring in small- and medium-sized manufacturing factories. For real-time operation, the edge computing and cloud computing models cooperate seamlessly to perform information capture, event detection, and adaptive learning. The proposed IoT system processes regional low-level features for detection and recognition in edge nodes, while the cloud, including fog computing, is responsible for mid- and high-level features through a federated learning network. The system fully utilizes all resources in the integrated deep learning network to achieve high-performance operation. The edge node was implemented by a simple camera embedded on a Terasic DE2-115 board to monitor machines and process data locally. Learning-based features were generated by cloud computing from the data sent by the edge, and identification results were obtained by combining mid- and high-level features with a nonlinear classifier. Therefore, each factory can monitor the real-time condition of its machines without operators while retaining data privacy. Experimental results show the efficiency of the proposed method compared with other methods.
The background of the current developments in related technologies is addressed in Section 2. Section 3 provides a description of the designed system. Section 4 shows the experimental results. Some conclusions are drawn in Section 5.
3. System Design for Edge Computing in IoT
3.1. The Architecture of the Proposed System
The Industry 4.0 trend is bringing many new applications that require low latency and network independence, which can only be provided through edge processing in IoT. These cases include computer vision for machine monitoring, which can be further accelerated and improved by incorporating machine learning inference on embedded systems. In this paper, we propose a three-layer IoT system to perform computer vision for machine monitoring, consisting of an edge layer, a fog layer, and a cloud layer. The architecture of the proposed system is shown in Figure 4.
The first layer captures images for data generation and training/testing of low-level features. Low-level features in an image include color, texture, lines, shapes, etc. Although these features provide the most direct panel information, images captured in a factory environment are highly variable, and the degree of variation affects subsequent identification. The large amount of generated data also places a huge burden on the cloud server, e.g., causing latency problems. Therefore, we added a fog node to the IoT as a middle layer between the cloud and the edge devices to assist the convergence of training and accelerate the transmission process. The fog node can be installed on the factory's own server. The method divides computing tasks and pushes them down to edge devices; it intelligently decomposes the computation between edge and fog nodes, which not only conforms to the distributed setting but also protects the user's data privacy. The mid-level features in the second layer are logical features that generate specific types of panel information objects from the identification results of the low-level features. These objects can be combined into multiple logical features, including knobs, pointers, bar or pie charts, and other objects; the relationship between scale and length can be used for further logical reasoning during identification. The strategy in this architecture is to reduce training cost by applying local learning on edge devices and performing federated learning on fog nodes. This method reduces the data samples and the communication costs of training models while protecting the user's privacy. The final layer of the system is the cloud, which has the highest computing power and is responsible for high-level features in images. It generates semantic conceptual descriptions of the behavior of objects, including digit recognition, text message recognition, and real-time production status flow charts. These can be used to detect abnormal events, integrate multiple sources of information, and provide operators and managers with a basis for decision making.
Based on the architecture of the proposed system, Figure 5 shows the IoT system implemented in a plastic factory. The end device performs real-time monitoring by scene change detection and extracts low-level features. The fog node gathers low-level features from end devices to form mid-level features and transmits them to the cloud; in the proposed experiment, the fog node was set up inside the factory. Finally, decision making is performed with high-level features in the cloud server, which has high computing power. Only abstract data are stored in the cloud database for further analysis, which avoids revealing the factory's confidential data. Each component in Figure 5 is described in the following sections.
3.2. Scene Change Detection in End Devices
In traditional vision technology, feature extraction and matching methods are used for object recognition. Deep learning greatly improves the accuracy of image analysis, but it requires relatively large storage space and computing time on the end device, which makes real applications there almost impossible. Sending raw data back to the cloud server for processing provides powerful computing but introduces privacy issues. The various components in the IoT therefore work separately and cooperatively to solve real problems. The end device in the proposed IoT was designed using common products on the market: a Terasic DE2-115 platform equipped with an array camera serves as the end device, and each camera is responsible for capturing images of a part of the PLC screen. Figure 6 shows the array camera of the end device and its corresponding captured image region of the PLC screen.
Consecutive frames are first compared using the mean squared error (MSE):

MSE = (1/(mn)) ∑ᵢ₌₁ᵐ ∑ⱼ₌₁ⁿ [I(i,j) − K(i,j)]²,  (1)

where m and n represent the width and height of the image, respectively, and I and K are the corresponding images. MSE calculates the average difference between corresponding pixels. The calculated value indicates the similarity between two images: the smaller the value, the more similar the two images are. It is simple and fast but has a serious problem: a large difference in pixel values does not necessarily mean a large difference in image content. For example, images of exactly the same scene under different lighting conditions may produce large MSE values instead of 0. Therefore, the region of interest (ROI) for calculating MSE is a small sliding window of size s × s, and a threshold T1 is used as the first test. If the MSE is smaller than T1, the end device does nothing and keeps watching. Otherwise, it performs a second test using Equation (2), the Structural Similarity Index Measurement (SSIM) [
28]. SSIM is an indicator that measures the similarity between two digital images. There are strong correlations between adjacent pixels in natural images, and such correlations reveal the structural information of objects in the scene. SSIM is therefore more suitable for judging image quality and detecting scene changes, much as the human eye does in the monitoring process:

SSIM(x, y) = [(2μₓμᵧ + c1)(2σₓᵧ + c2)] / [(μₓ² + μᵧ² + c1)(σₓ² + σᵧ² + c2)],  (2)

where x and y are the two images under test; μₓ and μᵧ are their means; σₓ² and σᵧ² are their variances; σₓᵧ is their covariance; and c1 and c2 are stabilizing constants.
The SSIM ranges from 0 to 1: the larger the value, the higher the similarity between the two images. When the two images are exactly the same, the SSIM is 1. The limitation of SSIM is that it cannot work effectively under image displacement, scaling, and rotation, which are non-structural distortions. These situations are outside the scope of this paper, because each end device is fixed in the right position in the factory.
3.3. Feature Extraction in End Devices
TensorFlow Lite [29] is a set of tools that help developers run TensorFlow models on mobile, embedded, and IoT devices. It supports on-device machine learning and recognition with low latency and small binary size. The first of TensorFlow Lite's main components is the TensorFlow Lite interpreter, which runs specially optimized models on many different types of hardware, including mobile phones, embedded Linux devices, and microcontrollers. The second is the TensorFlow Lite converter, which converts TensorFlow models into an efficient form for the interpreter and introduces optimizations to reduce binary size and improve performance. The converter generally runs on computers with sufficient computing power, while the interpreter mainly runs on embedded devices.
First, the real-time image data of the PLC panel are captured, and edge computing is performed on the IoT end device. TensorFlow Lite is applied to generate low-level features of individual images that pass scene change detection. Low-level features in an image include color, texture, lines, shapes, and so on. The resulting feature vector is sent to the intermediate nodes, where it is merged with the results of other terminal nodes into higher-order features.
Figure 7 presents a diagram explaining the concept of the feature extraction algorithm on TensorFlow Lite with a logistic regression structure. The weight W and bias b form the linear classifier model; the softmax generates class probabilities, and the cross entropy compares them with the one-hot encoded label. The input image size was 28 × 28, which is flattened into a vector of size 784.
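As a concrete illustration of this structure, the forward pass of Figure 7 can be written in a few lines of NumPy. The weights below are placeholders; in the actual system, W and b come from the model trained and deployed via TensorFlow Lite.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(image, W, b):
    """Flatten a 28x28 image into a 784-vector and apply the
    linear classifier W x + b followed by softmax."""
    x = image.reshape(784)
    return softmax(W @ x + b)    # class probabilities

def cross_entropy(probs, label):
    """Cross entropy of the predicted probabilities against the
    one-hot encoded label (given here as a class index)."""
    return -np.log(probs[label])
```

With all-zero weights, every class receives probability 1/10 and the loss equals log 10; training drives W and b so that the probability mass concentrates on the correct label.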
3.4. Fog and Cloud Computing
The purpose of high-level deep learning is to build a cloud architecture that can train machine learning models while protecting user privacy. In the training procedure, each fog node uses the same model definition and initialization parameters. Let {F1, F2, …, FN} be the set of fog nodes, where N is the number of fog nodes, and let {D1, D2, …, DN} be the set of corresponding data on the fog nodes. Fog Fi trains the model with local data Di, calculates the gradient, and then encrypts and uploads it to the cloud server. The cloud server aggregates the gradients from all fogs and updates the model. The server then returns the updated parameters of the new model to each fog, and each fog updates its local model accordingly. These steps are iterated until the model reaches the convergence criteria.
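The iteration above can be sketched as a simplified federated averaging loop on a toy linear model. Encryption is omitted here, and the data, learning rate, and round count are illustrative only; the real system trains the TensorFlow Lite model rather than this least-squares toy.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of the mean squared error for a linear model,
    computed by one fog node F_i on its local data D_i."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def federated_round(w, fog_data, lr):
    """One round: each fog computes its gradient locally, the cloud
    averages the (notionally encrypted) gradients and updates the model."""
    grads = [local_gradient(w, X, y) for X, y in fog_data]
    return w - lr * np.mean(grads, axis=0)

def federated_train(fog_data, dim, rounds=200, lr=0.1):
    """Iterate rounds until the shared parameters stabilize."""
    w = np.zeros(dim)
    for _ in range(rounds):
        w = federated_round(w, fog_data, lr)
    return w
```

Each fog ships only gradients, never its raw data Di, which is what lets every factory keep its production data private while still contributing to the shared model.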
Figure 8 shows the mapping of the proposed models in IoT to a real production process. Starting from the upper left corner, low-level features of machine temperature are extracted by the model on edge devices. They are integrated into mid-level features, which correspond to the machine status and are subject to confidentiality considerations. The right part of Figure 8 shows the result of fog and cloud computing, which indicates whether an abnormal event will happen in the near future. At the same time, adjustment suggestions are returned to the operator and manager of the factory.
In analyzing abnormal events in this research, we found that the state characteristics of key components, such as the peak injection pressure of the machine, can be obtained by comparing the time series of the relevant parameter settings with the actual values in the production process. This provides an opportunity to diagnose abnormal events of the machine equipment and even its aging index. We summarized these as static training data, and the IoT continues to collect dynamic data while the system is running.
Table 1 shows the relationship between PLC panel features and the corresponding events. The symbol I/3 (R/1) denotes increasing (reducing) the parameter setting with priority 3 (1), respectively. For example, the intersection of row 1 and column 1 in Table 1 represents the fact that the injection pressure has to be increased with priority 2 when underfilling events are detected. These rules are stored in the local database of the factory. When the local system receives a notification from the cloud, the corresponding adjustment suggestion is sent to the operator and manager so that they can make a quick decision about the production process. Only features and parameters are communicated in the IoT outside of the factory, so each factory keeps its own confidential production information.
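The notation of Table 1 lends itself to a straightforward local lookup. The sketch below is hypothetical: only the underfilling/injection-pressure cell is taken from the text, and the symbol grammar follows the I/p (R/p) convention defined above.

```python
def parse_action(symbol):
    """Parse a Table 1 cell such as 'I/3' or 'R/1' into
    a (direction, priority) pair."""
    direction, priority = symbol.split("/")
    return {"I": "increase", "R": "reduce"}[direction], int(priority)

# Hypothetical excerpt of the local adjustment table: event -> {parameter: cell}.
ADJUSTMENTS = {
    "underfilling": {"injection pressure": "I/2"},
}

def suggest(event):
    """Translate a detected abnormal event into readable adjustment
    suggestions for the operator and manager."""
    return [(param,) + parse_action(cell)
            for param, cell in ADJUSTMENTS.get(event, {}).items()]
```

Because the table lives in the factory's local database, the lookup runs entirely on-site; only the abstract event label ever leaves the factory.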
4. Experimental Results
This section describes the experimentation with the proposed method, which was carried out in a plastic injection factory. The edge node was implemented by a simple camera, a Pcam 5C, embedded on a Terasic DE2-115 board to monitor machines and process only low-level data and part of the fog computing. The video streaming format of the Pcam 5C was set to 1080p at 30 frames per second to fit the real-time monitoring requirement. TensorFlow Lite was initially trained with a database of 58,688 images generated in the environment of the plastic injection factory. A normalization process resized each image to 32 × 32 pixels for training.
The real-time image capturing provides input images for scene change detection.
Figure 9 illustrates the result of scene change detection in six continuous frames, Figure 9A–F. For Figure 9A,B, there is no change between them, so the SSIM value is almost equal to 1. However, the SSIM value between Figure 9C,D (Figure 9E,F) is 0.51 (0.2), respectively. This indicates that a scene change happens in those frames, and the TensorFlow Lite model is then applied to extract low-level features.
The monitoring process of the actual operation of the plastic injection machines was captured for 18 s and is presented in Figure 10. The horizontal axis in the figure represents time, with a unit scale of 1 s. The vertical axis indicates the complement of the SSIM index (1 − SSIM), which varies from 0 (no change) to 1 (scene change) with a scale of 0.1. The figure contains four machines presented as lines in different colors. Most values are less than 0.2, so 0.2 is set as the threshold T for determining scene change. There is a dramatic change between the third and fourth seconds, which represents the temperature change of all machines caused by unstable power at that time. After a product was completed, the residual temperature of the mold opening caused temperature changes, reflected in simultaneous changes at the seventh, eleventh, and seventeenth seconds. There are 21 image pairs exceeding the threshold T in the image sequence, which contains 540 images. Fog computing integrated the end devices for further verification, and 17 data changes were excluded. Cloud computing then clarifies the cause of the remaining four signals to determine the possibility of an abnormal event.
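The fog-side screening just described can be approximated as follows. The series values and the rule that changes appearing on every machine at the same instant are environmental (e.g., unstable power or residual mold heat) are illustrative assumptions, not the exact verification logic of the deployed fog node.

```python
def change_events(series, t=0.2):
    """Seconds at which one machine's dissimilarity (1 - SSIM) exceeds T."""
    return {i for i, v in enumerate(series) if v > t}

def screen_events(machines, t=0.2):
    """Hypothetical fog-node filter: changes occurring on every machine at
    the same instant are attributed to a common environmental cause and
    excluded; only the remaining per-machine events go to the cloud."""
    events = [change_events(s, t) for s in machines]
    common = set.intersection(*events) if events else set()
    return [e - common for e in events]
```

With per-second series for the four machines, such a filter would discard the simultaneous spikes at the fourth, seventh, eleventh, and seventeenth seconds and forward only machine-specific changes for abnormal-event analysis.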
When the dissimilarity (1 − SSIM) exceeds the threshold T, edge computing performed on the Terasic DE2-115 board extracts image features using the image pre-analysis model and transmits them to fog computing. The status of the edge node records the scene change result and is stored in the database: when a scene change is detected, the status changes from 0 to 1; when the image is the same as the previous one under the SSIM test, the edge computing environment does not change, and the status remains 0. The fog node receives features from the end nodes and performs multiple-feature analysis according to the machine state information; otherwise, it does nothing and waits for the next state change. Because duplicate images are not processed, computing performance and visual accuracy are improved. In Figure 11, the status is 1 for events 1 to 12 and events 14 to 18, but it is 0 for event 13.
In this scenario, when the digits of the temperature information on the PLC panel screen change, SSIM detects the scene change on the edge device. The edge node also extracts low-level features from images whose dissimilarity exceeded T and transmits the features to the fog computation. The average accuracy of image capturing based on the trained TensorFlow Lite model approached 99.8% in an online test in a plastic injection factory over one week. Although this is not a perfect result, the errors can be eliminated in the subsequent fog and cloud computations. Combining the SSIM and classification processes on the edge device, the frame rate of the system for capturing images reached 45 frames per second on average. The fog node integrates multiple features from end devices and extracts mid-level features with the TensorFlow Lite model. In this study, a fog node is a mini-cloud containing several end devices. After receiving data from the end devices, the fog node recognizes digits and transmits the result to the cloud server. The cloud server is responsible for detecting abnormal temperature change events and for suggesting adjustments to bring the temperature back to the normal range. The experimentation was performed for 50 days on two machines in a real plastic injection factory in Taiwan to verify the performance of the proposed method. For the performance analysis, the delayed time of each machine was accumulated, with the starting value set to infinity; as time goes on, the average delayed time decreases.
Figure 12 shows the average delayed time for manufacturing with and without the proposed method. Experimental results showed that the IoT system with the proposed method delayed production by 1.1 h on average over 50 days of continuous operation, whereas the conventional system without the proposed method delayed production by almost 2.5 h in the same experimentation.