Article

A Graphical Model to Diagnose Product Defects with Partially Shuffled Equipment Data

Gilseung Ahn, Sun Hur, Dongmin Shin and You-Jin Park

1 Department of Industrial and Management Engineering, Hanyang University, Ansan 15588, Korea
2 Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106, Taiwan
* Author to whom correspondence should be addressed.
Submission received: 26 October 2019 / Revised: 2 December 2019 / Accepted: 5 December 2019 / Published: 8 December 2019

Abstract

The diagnosis of product defects is an important task in manufacturing, and machine learning-based approaches have attracted interest from both industry and academia. A high-quality dataset is necessary to develop a machine learning model, but the manufacturing industry faces several data-collection issues, including partially shuffled data, which arises when product IDs cannot be perfectly inferred and which yields an unstable machine learning model. This paper introduces latent variables to formulate a supervised learning model that addresses the problem of partially shuffled data. The experimental results show that our graphical model deals with the shuffling of product order and can detect a defective product far more effectively than a model that ignores shuffling.

1. Introduction

The diagnosis of product defects is essential for improving productivity in manufacturing. The task is to decide whether a product is defective, given product-related information (e.g., images of completed products and equipment data). Thanks to the large volumes of data collected from factories and the advent of machine learning algorithms, data-driven defect diagnosis is receiving increasing attention from both industry and academia. Data-driven defect diagnosis develops a classifier f from a training dataset consisting of pairs (x^(i), y^(i)) for i = 1, 2, …, n, where x^(i) is the feature vector and y^(i) is the label of product i, and then uses the classifier to diagnose product defects.
Data-driven defect diagnosis uses as a feature vector (i.e., input data) either completed product data (e.g., product images and specifications) or equipment data (e.g., commands and sensor readings). Equipment data, which capture the manufacturing process of each product, can be used to diagnose defects before a product is completed. For example, Manco et al. [1] applied outlier detection to predict door failures on metro trains, and Khalastchi et al. [2] proposed a hybrid approach to detect faults in robotic systems based on sensor readings, commands, and feedback from actuators. Song and Shi [3] proposed a quality-supervised double-layer method to detect quality-related faults, in which the first layer uses principal component analysis (PCA) to detect the fault and the second layer finds the key variable orthogonal weight of the fault from the PCA. Fakhfakh et al. [4] developed a mathematical model for on-line fault diagnosis in a flexible manufacturing system that minimizes the total schedule delays caused by faults, together with a constraint programming technique to solve it.
Multi-source data fusion, which combines or merges data from multiple sources (e.g., multiple sensors or two or more machines), is necessary for realistic defect diagnosis with equipment data [5]. It can be categorized as signal-level, feature-level, or decision-level fusion [6]. Among these, feature-level fusion extracts feature vectors from each data source and merges them based on product ID. It is used primarily to combine a dataset to train a defect diagnosis model [7,8].
However, multi-source data fusion cannot always be conducted effectively due to imperfections, correlations, inconsistency, disparateness, etc. [9]. In many cases (e.g., in a specific subprocess such as milling), a product ID cannot be attached until the production process is finished. In such cases, the product ID must first be inferred. However, it cannot be perfectly inferred when there are one or more workstations such that (1) two or more independent parallel machines reside in the workstation, (2) the processing time of each machine is variable, and (3) the workstation cannot be observed from the outside due to safety issues. When equipment data are partially shuffled, there is no guarantee that a record in the data merged by feature-level fusion corresponds to a single product. We propose a graphical model to diagnose product defects with partially shuffled equipment data. The proposed graphical model calculates the probabilities that products are shuffled and the likelihood that a product is defective, taking the shuffling probability into account.
To the authors’ best knowledge, this is the first study to address the real and common manufacturing situation in which each product cannot be perfectly traced because equipment data merged on product IDs may be partially shuffled. This study proposes a simulation method to calculate the probability that product order is shuffled at each workstation, based on a production-line analysis. The method includes the calculation of the maximum shuffling range R at each workstation j and of the probability Q_{j,r} that two products are shuffled with shuffling range r ≤ R. It also proposes a graphical model to diagnose product defects with partially shuffled equipment data, taking product shuffling into account through Q_{j,r}.
Section 2 of this paper describes the problem in detail, focusing on the situation in which partially shuffled equipment data are merged from multiple data sources. Section 3 develops a graphical model to diagnose defects with partially shuffled equipment data through three phases: (1) production-line analysis with simulation, (2) graphical model design, and (3) graphical model inference. Section 4 provides an example of a numerical application of the proposed graphical model, and Section 5 suggests future research directions.

2. Problem Description

Suppose there are serial workstations and that a product must be processed by a machine at every workstation to be completed. More formally, let product i be processed by machine k at workstation j from time t1 to t2. The feature for product i at workstation j, x_j^(i), is (h_{j,k}(t1), h_{j,k}(t1+1), …, h_{j,k}(t2)), where h_{j,k}(t) is the feature value collected from machine k at workstation j at time t, and t1+1 denotes one unit of time (e.g., a second or millisecond) after t1. Expanding the problem to every product i = 1, 2, …, n and every workstation j = 1, 2, …, m, a defect diagnosis dataset {(x^(1), y^(1)), …, (x^(n), y^(n))} can be generated, in which x^(i) = (x_1^(i), x_2^(i), …, x_m^(i)) and y^(i) are the feature vector and label of product i, respectively.
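To make this data layout concrete, the following minimal Python sketch assembles x_j^(i) by slicing the time series h_{j,k}(t) between the start and end of processing; `sensor_log`, `build_feature`, and `build_record` are hypothetical names for illustration, not the authors' implementation:

```python
from typing import Dict, List, Tuple

# sensor_log[(j, k)][t] holds h_{j,k}(t), the value collected from machine k
# at workstation j at discrete time t (hypothetical layout).
SensorLog = Dict[Tuple[int, int], List[float]]

def build_feature(sensor_log: SensorLog, j: int, k: int, t1: int, t2: int) -> List[float]:
    """x_j^(i) for a product processed on machine k of workstation j from t1 to t2."""
    return sensor_log[(j, k)][t1:t2 + 1]            # (h_{j,k}(t1), ..., h_{j,k}(t2))

def build_record(route: List[Tuple[int, int, int, int]], sensor_log: SensorLog) -> List[List[float]]:
    """x^(i) = (x_1^(i), ..., x_m^(i)) for a route given as (workstation j, machine k, t1, t2) tuples."""
    return [build_feature(sensor_log, j, k, t1, t2) for (j, k, t1, t2) in route]
```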
The problem is to train a classifier f(·) with the training records {(x^(i), y^(i))} for i = 1, …, n and to diagnose whether a given product is defective by using the trained classifier. This is the standard machine learning approach to defect diagnosis. However, it cannot be employed when some product-processing orders are partially shuffled and the shuffled orders are not known.
Consider a production line that consists of three workstations (W1, W2, W3), where product orders may be shuffled and the shuffled orders cannot be perfectly inferred in W2, because that workstation includes two parallel machines whose processing times are not constant and which cannot be observed from the outside, as illustrated in Figure 1. The circle indicates a machine, and the dotted rectangle indicates a workstation. Defect diagnosis for each product is conducted after the product is processed by W3.
Suppose there are two products, i−1 and i, following the sequences M_{1,1} → M_{2,1} → M_{3,1} and M_{1,1} → M_{2,2} → M_{3,1}, respectively. Suppose also that product i−1 enters M_{1,1} ahead of product i but enters M_{3,1} after product i. This situation is illustrated in a Gantt chart (Figure 2), in which the arrow, rectangle, and colored rectangle denote the time to extract data, observed data, and shuffled observed data, respectively.
As seen in Figure 2, the equipment data for product i−1 are a concatenation of M_{1,1}'s data during [1,3], M_{2,1}'s data during [6,8], and M_{3,1}'s data during [10], and the equipment data for product i are a concatenation of M_{1,1}'s data during [4,5], M_{2,2}'s data during [6,7], and M_{3,1}'s data during [8,9]. One cannot know which product is processed by machines M_{2,1} and M_{2,2} in W2, because this workstation is not observed; one can only know whether each machine is busy or idle. It is therefore impossible to identify which product is released at time 9 from W3. In this situation, the feature vector observed for product i is not x^(i) = (x_1^(i), x_2^(i), x_3^(i)) but x̂^(i) = (x_1^(i), x_2^(i−1), x_3^(i−1)), which affects the training of the classifier f(·). In this paper, we call x̂^(i) the observed feature vector for product i, and we call the equipment data collected for each product partially shuffled data.

3. Proposed Graphical Model

In this section, we construct a graphical model to diagnose defects with partially shuffled equipment data through three phases: (1) production-line analysis with simulation, (2) graphical model design, and (3) graphical model inference. In the first phase, we characterize the partially shuffled data by means of the shuffling probabilities saved in matrix Q . In the second phase, latent variables are introduced to design a graphical model structure. In the third phase, probabilistic supervised models for defect detection are trained, and an expectation of probability that a product is defective given the observed feature vector is calculated using the graphical model.

3.1. Production-Line Analysis with Simulation (Characteristics of Shuffled Data)

Let ã_j(i) and c̃_j(i) denote the arrival order and completion order of product i at workstation j. In the absence of shuffling, ã_j(i) = c̃_j(i); if shuffling exists, ã_j(i) ≠ c̃_j(i) for some i. Define the shuffling range of product i as r = |ã_j(i) − c̃_j(i)|. The actual i-th completed product may therefore be the (i−r)-th or the (i+r)-th arriving product, and r is regarded as a random variable.
Let Q_{j,r} denote the probability of a shuffle with range r at workstation j, which is formally expressed as:

Q_{j,r} = Pr( c̃_j(i2) − c̃_j(i1) ∈ {γ + r, γ − r} | ã_j(i2) − ã_j(i1) = γ )    (1)

where ã_j(i2) − ã_j(i1) = γ means that γ products arrive between the i1-th and i2-th arrivals at workstation j, and c̃_j(i2) − c̃_j(i1) ∈ {γ + r, γ − r} means that either γ + r or γ − r products are completed between the i1-th and i2-th completions at workstation j. That is, the i-th arriving product may not be the i-th completed product at the workstation, because some products are completed faster or slower than expected. The maximum shuffling range R at workstation j is calculated as R = max{ r | Q_{j,r} > 0 }, and Q_{j,r} is defined for r ≤ R.
Although Q_{j,r} plays an important role in developing the graphical model, it cannot be calculated analytically when the range of r is wide, because there are many machines whose processing times are random. We therefore adopt a simulation method to estimate Q_{j,r}. The first step is to generate simulated data consisting of simulated production routes for n̂ products. Let V(p) be the processing route of simulated product p, a tuple of the machine, the start time of processing, and the end time of processing at each workstation:
V(p) = ( (M_1(p), a_1(p), c_1(p)), (M_2(p), a_2(p), c_2(p)), …, (M_m(p), a_m(p), c_m(p)) )    (2)

where M_j(p), a_j(p), and c_j(p) indicate the machine that processes product p, its start time, and its end time (release time) at workstation W_j, respectively.
A processing route generation algorithm is presented as Algorithm 1. It repeats (1) randomly selecting one of the idle machines to process a product at a workstation, (2) sampling processing time from the distribution of the selected machine’s processing time, (3) calculating the start and end time of the selected machine for the product, and (4) updating the busy/idle status of the selected machine for every product and workstation.
Algorithm 1 Product Route Generation Algorithm
Input: n̂, M_{j,k}, F_{j,k} for all j, k
Procedure:
01  Initialize I_{j,k}(t) as 1 for all t, j, k
02  Initialize t as 1
03  For (p = 1; p ≤ n̂; p++) do {
04    For (j = 1; j ≤ m; j++) do {
05      While (Σ_{k=1}^{m_j} I_{j,k}(t) = 0) do { Increase t by 1 }
06      k̂ ← Random{ k | I_{j,k}(t) = 1 }
07      a_j(p) = t
08      b_j(p) ← sampling from F_{j,k̂}(·)
09      c_j(p) = t + b_j(p)
10      I_{j,k̂}(t + τ) = 0 for τ = 1, 2, …, b_j(p)  }}
Output: V(p) for p = 1, 2, …, n̂
A more specific explanation of Algorithm 1 follows. Every machine's state is initialized as idle for all t (Line 01), and the time is initialized (Line 02). Each product p is then simulated (Lines 03–10): if there is no idle machine at workstation j at time t, then t is increased by 1 until one or more machines become idle, i.e., the product waits for processing (Line 05). The product then enters a randomly selected idle machine k̂ at workstation j (Line 06), the start time for processing product p is set to t (Line 07), and the processing time is sampled from its probability distribution (Line 08). The processing time determines the end time of processing product p (Line 09), and the selected machine's state becomes busy for that duration (Line 10).
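For readers who want to reproduce the simulation, the following Python sketch is a free transcription of Algorithm 1 under a few assumptions of mine: each F_{j,k} is passed as a callable sampler (e.g., `lambda: random.expovariate(0.05)` for the Exp(0.05) machines of Table 1), products are dispatched to each workstation in first-in-first-out order of their arrival there rather than through the pseudocode's single global clock, and machines are tracked by their next-free time instead of a per-tick idle array. All function and variable names are my own, not the authors'.

```python
import random
from typing import Callable, Dict, List, Tuple

def generate_routes(n_products: int,
                    machines_per_ws: List[int],
                    samplers: Dict[Tuple[int, int], Callable[[], float]],
                    seed: int = 0) -> List[List[Tuple[int, float, float]]]:
    """Sketch of Algorithm 1: products traverse serial workstations, are dispatched FIFO,
    pick a random idle machine (or the earliest-free one if none is idle), and occupy it
    for a processing time drawn from samplers[(j, k)] ~ F_{j,k}."""
    random.seed(seed)
    m = len(machines_per_ws)
    routes = [[None] * m for _ in range(n_products)]    # routes[p][j] = (machine, start a_j(p), end c_j(p))
    arrival = [0.0] * n_products                        # arrival times at the current workstation
    for j in range(m):
        free_at = [0.0] * machines_per_ws[j]            # next time each machine of workstation j is free
        for p in sorted(range(n_products), key=lambda q: arrival[q]):   # FIFO dispatch at workstation j
            t = arrival[p]
            idle = [k for k, f in enumerate(free_at) if f <= t]
            k_hat = random.choice(idle) if idle else min(range(len(free_at)), key=lambda k: free_at[k])
            start = max(t, free_at[k_hat])              # wait if the chosen machine is still busy
            b = samplers[(j, k_hat)]()                  # Line 08: sample the processing time
            free_at[k_hat] = start + b                  # Line 10: the machine is busy until completion
            routes[p][j] = (k_hat, start, start + b)
            arrival[p] = start + b                      # the product moves on to the next workstation
    return routes
```

For instance, the two machines of workstation S2 in the numerical example could both be given `lambda: random.expovariate(0.05)` as their sampler, matching the Exp(0.05) processing time in Table 1.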
Using V(p) for p = 1, 2, …, n̂ obtained from Algorithm 1, we can estimate Q_{j,r} in Equation (1). First, we calculate ã_j(p) and c̃_j(p) for all p, based on {a_j(1), a_j(2), …, a_j(n̂)} and {c_j(1), c_j(2), …, c_j(n̂)}, respectively. Then, Q_{j,r} is estimated as the ratio of pairs (i1, i2) that satisfy c̃_j(i2) − c̃_j(i1) ∈ {γ + r, γ − r} among those with ã_j(i2) − ã_j(i1) = γ.
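A correspondingly simplified estimator of Q_{j,r} can then be computed from the simulated routes. For brevity, this sketch uses the per-product definition r = |ã_j(i) − c̃_j(i)| from the beginning of this subsection rather than the pairwise ratio of Equation (1); `routes` is the output of the sketch above.

```python
from collections import Counter

def shuffle_probabilities(routes, j):
    """Estimate Q_{j,r} for workstation j (0-indexed): the fraction of products whose
    arrival order and completion order at workstation j differ by r."""
    n = len(routes)
    starts = [routes[p][j][1] for p in range(n)]   # under FIFO dispatch, start order equals arrival order
    ends = [routes[p][j][2] for p in range(n)]
    arrival_rank = {p: rank for rank, p in enumerate(sorted(range(n), key=lambda q: starts[q]))}
    completion_rank = {p: rank for rank, p in enumerate(sorted(range(n), key=lambda q: ends[q]))}
    counts = Counter(abs(arrival_rank[p] - completion_rank[p]) for p in range(n))
    return {r: counts[r] / n for r in range(max(counts) + 1)}

# The maximum shuffling range at workstation j is then
# R_j = max(r for r, q in shuffle_probabilities(routes, j).items() if q > 0).
```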

3.2. Graphical Model Design (Graphical Model for Shuffled Data)

We introduce a latent variable z(i) = (z_1(i), z_2(i), …, z_m(i)) to indicate the i-th completed product at each workstation. For example, if z_j(7) = 9, then the 7th completed product at workstation j is the one that entered that workstation as the 9th. z_j(i) is unknown if workstation j is unobservable, but there can be only one i′ such that z_j(i′) = i. By introducing these latent variables, we can develop a graphical model that represents the relationships between all x^(i) and x̂^(i) values, and inference for defect diagnosis becomes possible.
An exemplary structure of the graphical model, built with the latent variables z(i), is presented in Figure 3 for the case R = 2.
In this figure, shaded circles indicate observable variables, while the others are unobservable. In general, node z(i) is connected to z(i−R), z(i−R+1), …, z(i+R) because the variable z(i) mathematically affects z(i−R), z(i−R+1), …, z(i+R), and x̂(i).
With the help of the graphical structure, x̂(i) given z(i) can be expressed as:

x̂(i) | z(i) = ( x_1(z_1(i)), x_2(z_2(i)), …, x_m(z_m(i)) )    (3)

x(i) can then be inferred from x̂(i′) and z(i′) for i′ = i−R, i−R+1, …, i+R as:

x(i) = ( Σ_{i′ = i−R}^{i+R} I(z_j(i′) = i) × x̂_j(i′) )_{j=1}^{m}    (4)

where I(condition) is 1 if the condition is satisfied, and 0 otherwise.
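As an illustration of Equation (4), the indicator sum is simply a lookup: for each workstation j, the one neighbour i′ in [i−R, i+R] whose latent assignment satisfies z_j(i′) = i contributes its observed feature. A minimal sketch, assuming `x_hat[p][j]` holds x̂_j(p) and `z[p][j]` holds z_j(p) (hypothetical container layout):

```python
def recover_feature(i, z, x_hat, R):
    """Equation (4): rebuild x^(i) from the observed vectors and a latent assignment table,
    where z[p][j] = z_j(p) is the index of the product actually observed as the p-th
    completion at workstation j."""
    m = len(x_hat[i])
    recovered = []
    for j in range(m):
        # the unique i' in [i-R, i+R] with z_j(i') = i owns product i's true feature at workstation j
        matches = [ip for ip in range(max(0, i - R), min(len(x_hat), i + R + 1)) if z[ip][j] == i]
        recovered.append(x_hat[matches[0]][j] if matches else None)
    return recovered
```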

3.3. Inference (Defect Diagnosis)

The objective of this paper is to calculate the probability Pr(y(i) | x(i)) that product i is defective given its feature vector, which could be calculated by a probabilistic supervised model such as logistic regression. However, it cannot be obtained directly, because x(i) is unobservable, unlike the usual situation in which a supervised learning model is applied. Thus, we expand Pr(y(i) | x(i)) by means of the designed graphical model structure to consider all possible situations in which x(i) is distributed among x̂(i′) for i′ = i−R, i−R+1, …, i+R:

Pr(y(i) | x(i)) = Σ_{z̃(i)} Pr( y(i) | ( Σ_{i′ = i−R}^{i+R} I(z̃_j(i′) = i) × x̂_j(i′) )_{j=1}^{m} ) × Pr(z(i) = z̃(i))    (5)
Now, each part of the right-hand side of Equation (5) can be calculated.
(i)  Calculation of Pr(z(i) = z̃(i))
Pr(z(i) = z̃(i)) in Equation (5) can be calculated using the Q_{j,r} obtained from the production-line analysis in Section 3.1. When R = 0 (i.e., there is no partial shuffling), Pr(z(i) = i × 1) = 1 for all i, where 1 is a vector of ones. In this case, ( Σ_{i′ = i−R}^{i+R} I(z_j(i′) = i) × x̂_j(i′) )_{j=1}^{m} equals x(i), implying that Pr(y(i) | x(i)) is directly calculated from a probabilistic supervised model because x(i) = x̂(i). When R ≥ 1, a conditional distribution of z_j(i) given z_{j−1}(i) can be obtained:

Pr( z_j(i) = z_{j−1}(i) | z_{j−1}(i) ) = Q_{j,0}    (6)

Pr( z_j(i) = z_{j−1}(i) + r | z_{j−1}(i) ) = Pr( z_j(i) = z_{j−1}(i) − r | z_{j−1}(i) ) = Q_{j,r} / 2,  for r = 1, …, R    (7)

The distribution of z(i) can be obtained recursively using Equations (6) and (7).
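Equations (6) and (7) define a simple Markov chain across workstations, which is how the worked example in Section 4.1 multiplies its factors. A small sketch, assuming Q is given as a list of per-workstation dictionaries mapping the range r to Q_{j,r} (my own data layout):

```python
from itertools import product as cartesian

def assignment_probability(z_tilde, i, Q):
    """Pr(z(i) = z_tilde) via Equations (6) and (7); z_tilde = (z_1(i), ..., z_m(i))."""
    prob, prev = 1.0, i                        # deviations at the first workstation are measured from i
    for j, zj in enumerate(z_tilde):
        r = abs(zj - prev)
        q = Q[j].get(r, 0.0)
        prob *= q if r == 0 else q / 2.0       # Eq. (6) when r = 0, Eq. (7) when r >= 1
        prev = zj
    return prob

def enumerate_assignments(i, Q, R):
    """All z_tilde with nonzero probability whose components stay inside [i-R, i+R]."""
    for z_tilde in cartesian(range(i - R, i + R + 1), repeat=len(Q)):
        p = assignment_probability(z_tilde, i, Q)
        if p > 0.0:
            yield z_tilde, p
```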
(ii)  Calculation of Pr( y(i) | ( Σ_{i′ = i−R}^{i+R} I(z̃_j(i′) = i) × x̂_j(i′) )_{j=1}^{m} )
This probability is obtained by a supervised machine learning model with a training dataset composed of (x̂(i), y(i)) = (x̂_1(i), x̂_2(i), …, x̂_m(i), y(i)), i.e., the observed feature vector for product i and its label. Specifically, x̂_j(i), j = 1, 2, …, m, is the i-th feature collected at workstation j; y(i) is 1 if the i-th product is defective and 0 otherwise. Note that we do not know the exact value of x(i), and it cannot be guaranteed that x(i) = x̂(i) because of possible shuffling. Instead, we utilize the value z̃(i) of the latent variable z(i) to compose a dataset to train the model.
All terms I(z̃_j(i′) = i) × x̂_j(i′) (i−R ≤ i′ ≤ i+R) in the condition part of the probability vanish, except the one that satisfies z̃_j(i′) = i. Therefore, only those observed feature vectors x̂_j(i′) with z̃_j(i′) = i are required. This implies that the feature vectors used for learning each model are determined by the distribution of z(i). As an example, suppose that z(i) can take the value (i, i+1, i), (i, i−1, i), or (i, i, i) when the shuffle range is R = 1. When z(i) = (i, i+1, i) or z(i) = (i, i−1, i), {x̂_1, x̂_3} is used as the feature set, but when z(i) = (i, i, i), the feature set is {x̂_1, x̂_2, x̂_3}.
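Combining the two parts, Equation (5) weights the classifier output under each plausible assignment z̃ by Pr(z(i) = z̃). The sketch below reuses `enumerate_assignments` from the sketch in Section 3.2 and assumes two things of mine: `x_hat[i][j]` is the list of observed summary statistics of product i at workstation j, and `clf_for(feature_set)` returns a probabilistic classifier (e.g., a scikit-learn logistic regression) already trained on that subset of workstations, as described above. Dividing by the accumulated weight normalises over the enumerated assignments, a small deviation from the raw sum in Equation (5).

```python
def defect_probability(i, x_hat, Q, R, clf_for):
    """Expected probability that product i is defective (Equation (5))."""
    total, weight = 0.0, 0.0
    for z_tilde, p_z in enumerate_assignments(i, Q, R):
        kept = tuple(j for j, zj in enumerate(z_tilde) if zj == i)   # workstations usable under z_tilde
        if not kept:
            continue
        flat = sum((x_hat[i][j] for j in kept), [])                  # concatenate the kept features
        p_defect = clf_for(kept).predict_proba([flat])[0][1]         # Pr(y(i) = 1 | kept features)
        total += p_defect * p_z
        weight += p_z
    return total / weight if weight > 0.0 else 0.0
```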

4. Numerical Example

This section provides a numerical example to which the proposed model is applied. In addition, the classification performance of the defect diagnosis is compared between our model, which takes the possibility of shuffling of the product order into consideration, and a model that ignores it and assumes x(i) = x̂(i) for all i. The F1 score, the harmonic mean of precision and recall (F1 = 2 × precision × recall / (precision + recall)), was employed as the classification performance measure to account for the costs of both false positives (FPs) and false negatives (FNs). FN cost occurs when an actually defective product is not detected; this cost covers the consequences of selling defective products, such as customer claims and safety accidents (note that the addressed product is a brake disc, which is extremely safety-critical). FP cost occurs when a normal product is flagged as defective; this cost covers re-checking, re-working, or disposal. Even though the FN cost is much greater than the FP cost, considering only one of them makes a classifier biased, so we employed the F1 score.

4.1. Line Data
We generated an illustrative dataset for this numerical example by referring to a disk-brake processing line of a large auto parts company in Korea. The generated data are very similar to the real data except for the machine specifications and related parameters. Even though the data are artificial, we believe that the research results can be effectively applied to real production sites where a shuffle of production data is possible, because the main idea and methodology of this research focus on data shuffling and its effect on the detection of defective products.
The processing line consists of seven workstations ( S 1 , S 2 , , S 7 ) including rough grinding, drilling, and tapping. Figure 4 illustrates the processing line, where a circle, a rectangle, and a colored rectangle denote a machine, a workstation, and a workstation that is not observable, respectively.
As seen in this figure, S2, S3, and S5 are composed of parallel machines, and the others each include a single machine. We assumed that when a machine is abnormal, the products processed by it are all defective.
Table 1 shows the machine specifications, including the probability density functions of the processing times and of the sensor values collected from each machine in the normal and abnormal states. N(μ, σ²) denotes a normal distribution with mean μ and standard deviation σ, and Exp(λ) denotes an exponential distribution with mean 1/λ. The numbers were all created artificially for the purpose of illustration.
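A machine row of Table 1 can be turned into synthetic data by first drawing the machine state from the abnormality probability and then drawing one sensor reading per time unit from the matching normal distribution. A hedged sketch (names are mine), shown here with the M_{1,1} parameters:

```python
import random

def simulate_machine_readings(p_abnormal, mu_normal, sd_normal, mu_abnormal, sd_abnormal, n_readings):
    """Draw the machine state, then n_readings sensor values from the matching distribution."""
    state = 1 if random.random() < p_abnormal else 0                  # 1: abnormal, 0: normal
    mu, sd = (mu_abnormal, sd_abnormal) if state else (mu_normal, sd_normal)
    return state, [random.gauss(mu, sd) for _ in range(n_readings)]

# M_{1,1}: Exp(0.1) processing time, abnormal with probability 0.001,
# N(10, 3^2) readings when normal and N(15, 4^2) when abnormal.
processing_time = random.expovariate(0.1)
state, readings = simulate_machine_readings(0.001, 10, 3, 15, 4, int(processing_time) + 1)
```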

4.2. Production-Line Analysis

One million simulated products were generated using Algorithm 1 to estimate the partial shuffling probabilities Q_{j,r}. The results are summarized in Table 2.
With Table 2 and Equations (6) and (7), we can obtain a probability distribution of z(i). For example, consider the probability that the i = 78th arriving product keeps its order at Workstation 1, overtakes the 77th product at Workstation 2, is overtaken by two products at Workstation 3, and keeps its order afterward. The probability is then given by:

Pr( z(i) = (i, i−1, i+1, i+1, i+1, i+1, i+1) ) = Q_{1,0} × Q_{2,1}/2 × Q_{3,2}/2 × Q_{4,0} × Q_{5,0} × Q_{6,0} × Q_{7,0} = 1.000 × 0.2272/2 × 0.0901/2 × 0.7254 × 1.000 × 0.8504 × 1.000 × 1.000 = 0.00315699526

4.3. Feature Data

A total of 100,000 values of (x_j(i), y(i)) were generated for all j (i.e., n̂ = 100,000), and they were artificially shuffled to obtain the observed feature values x̂_j(i). Product i can then be expressed as a concatenation of the machine sequence, start and end times, machine states, feature vector, and label as follows:

( M_1(i), …, M_7(i), a_1(i), …, a_7(i), c_1(i), c_2(i), …, c_7(i), s_1(i), s_2(i), …, s_7(i), x_1(i), x_2(i), …, x_7(i), y(i) )

where x_j(i) represents the feature of statistical values, including the variance, skewness, mean, kurtosis, and median extracted from the sensor values of the machine processing the product at workstation j [10], and s_j(i) indicates the machine state (0: normal, 1: abnormal). y(i) is the label (i.e., whether product i is defective or not), calculated as max_j(s_j(i)). The number of defective products was 2,136; that is, the defective ratio was 2.136%.
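The five summary statistics named above could be extracted from a machine's raw readings roughly as follows (numpy and scipy are my choice of tooling, not necessarily the authors'):

```python
import numpy as np
from scipy import stats

def summary_features(sensor_values):
    """Variance, skewness, mean, kurtosis, and median of the readings collected
    while one machine processed the product."""
    v = np.asarray(sensor_values, dtype=float)
    return [float(np.var(v)), float(stats.skew(v)), float(np.mean(v)),
            float(stats.kurtosis(v)), float(np.median(v))]
```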
Among the 100,000 values of (x(i), y(i)), 70,000 randomly chosen values were used to train the proposed graphical model as well as the model that ignores shuffling of the product order. The remaining 30,000 values were reserved for testing the two models.

4.4. Defect Diagnosis Result

A logistic regression model was employed as the classifier because it is one of the simplest probabilistic classifiers; other probabilistic classification models may yield similar results. We compared the classification performance between our model and a model that classifies products without considering possible shuffling. The proposed model considers the probability that feature values are shuffled among products when classifying a product, whereas the other, "no shuffling," model assumes that the order in which feature values are collected is the same as the production order. The preparation procedure for the dataset used for the comparison is depicted in Figure 5.
Figure 6 shows the F1 scores of the proposed model and the no-shuffling model for various values of the cutoff threshold δ. A product was declared defective if Pr(y(i) = 1 | x̂(i)) ≥ δ.
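As a point of reference, the "no shuffling" baseline can be sketched as a scikit-learn logistic regression trained directly on the observed (possibly shuffled) feature vectors, with a product declared defective when the predicted probability reaches the cutoff δ; the proposed model would instead route the same classifier through the Equation (5) marginalisation sketched in Section 3.3. This is an assumption-laden sketch, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def baseline_f1(X_train, y_train, X_test, y_test, delta):
    """F1 score of the no-shuffling baseline at cutoff delta."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]          # Pr(y = 1 | x_hat)
    y_pred = (probs >= delta).astype(int)            # defective if the probability reaches delta
    return f1_score(y_test, y_pred, zero_division=0)
```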
From Figure 6, we can see that the proposed model produces a much higher F1 score over all thresholds. This implies that our graphical model deals with the shuffling of product order and can detect defective products far more effectively than a model that ignores shuffling. One can therefore expect an improved classification result by considering the shuffling when some workstations are unobservable from the outside and shuffling of the product order is expected. When the threshold was small, i.e., below δ = 0.1, the F1 scores of the two models were almost identical, which was expected because both models judge almost all products as defective. Another observation is that the best performance of the "no shuffling" model appeared when the cutoff threshold was near δ = 0.08, while the proposed model performed best at δ = 0.17. Therefore, the proposed model was less likely to produce a false alarm (classifying a non-defective product as defective) than the "no shuffling" model.
Figure 7 shows the F1 score, recall, precision, and specificity of the proposed model for the range 0.1 ≤ δ ≤ 0.5. As seen in this figure, the specificity is close to 1 irrespective of δ, implying that most non-defective products were diagnosed as non-defective (i.e., negative); this is natural because the number of negative records is much greater than the number of positive records. Precision increased as δ grew, while recall decreased. The F1 score was affected more by recall than by precision because of the very small values of recall. From this analysis, the following implications are drawn. First, specificity may not be a proper measure for defect diagnosis when classes are imbalanced. Second, precision and recall show opposite trends as δ increases, implying that using only one of them may lead to a wrong decision. Specifically, a classifier that labels most records as positive obtains a higher recall, and the precision of a classifier that labels most records as negative tends to be higher. Since the F1 score on a highly imbalanced dataset tends to follow a trend similar to recall, a classifier that is not biased toward the negative class, such as a naïve Bayes classifier with a uniform prior, can be suggested to reduce both misclassification costs.
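The four measures discussed here can be reproduced from predicted probabilities at a given cutoff, for example as in the following sketch (scikit-learn metrics, my choice of tooling):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

def threshold_report(y_true, probs, delta):
    """Precision, recall, specificity, and F1 at cutoff delta."""
    y_pred = [1 if p >= delta else 0 for p in probs]
    tn = sum(1 for t, yp in zip(y_true, y_pred) if t == 0 and yp == 0)
    fp = sum(1 for t, yp in zip(y_true, y_pred) if t == 0 and yp == 1)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (precision_score(y_true, y_pred, zero_division=0),
            recall_score(y_true, y_pred, zero_division=0),
            specificity,
            f1_score(y_true, y_pred, zero_division=0))
```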

5. Conclusions

This paper addresses a realistic data quality problem that arises when developing a product-defect diagnosis model from manufacturing equipment data: the partially shuffled data problem. Each record should correspond to a single product in order to develop a proper defect diagnosis model, but most records in partially shuffled data do not correspond to one product. This problem results from one or more unobservable workstations that consist of parallel machines. One alternative may be to omit the features that may be partially shuffled, if the workstations that produce partially shuffled data are near the end of the manufacturing process. However, if one or more of these workstations is near the front of the process, as in the numerical example of this paper, omitting features yields a very poor classifier with an insufficient feature vector. This is because all the features collected at the workstations that follow the shuffling workstation may themselves be shuffled and would therefore also have to be removed. Thus, omitting features from those workstations cannot recover the shuffle of the other features.
Therefore, this paper presents a probabilistic graphical model to diagnose product defects by considering the probabilities that product orders are partially shuffled. The graphical model was designed by means of a production-line analysis and was trained with partially shuffled data. From the numerical example, we verified that the graphical model exhibits performance similar to that of a classifier trained on data that were not shuffled.
For future research, we will expand the graphical model to diagnose defects in situations where the manufacturing sequence of some products varies due to machine breakdowns, reworking, and other factors. In addition, the model can be applied to other fields such as traffic management and bioinformatics. Finally, to enhance the model's diagnostic accuracy and to improve its computational efficiency, it will be necessary to develop a proper method for extracting critical features under shuffling.

Author Contributions

Author contributions are as follows: data curation, D.S.; funding acquisition, S.H.; investigation, G.A. and D.S.; methodology, G.A. and S.H.; project administration, S.H. and D.S.; software, G.A.; validation, G.A.; writing—original draft, G.A.; writing—review and editing, S.H., D.S., and Y.-J.P.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2019R1A2C1088255).

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Indices
i Product index, i = 1 ,   2 ,   ,   n
j Workstation index, j = 1 ,   2 ,   ,   m  
k Machine index, k = 1 ,   2 ,   ,   m j  
t Time index, t = 1 ,   2 ,   , T  
p Simulated product index, p = 1 ,   2 ,   ,   n ^
r Shuffle range index, r = 0 ,   1 ,   2 ,   ,   R  
Production Line-Related Terms
W j Workstation j  
M j , k Machine k at workstation j  
F j , k ( ) Probability distribution function of M j , k   ’s processing time
Q_{j,r} Probability that the processing orders of two products are switched at workstation j, where r products are released between their release times at workstation j−1
M j ( i ) Machine that processes product i at workstation j  
a j ( i ) Start time of processing of product i at M j ( i )  
b j ( i ) Processing time of product i by M j ( i )  
c j ( i ) Completion time of processing of product i by M j ( i ) , c j ( i ) = a j ( i ) + b j ( i )  
a ˜ j ( i ) Arrival order of product i at workstation j  
c ˜ j ( i ) Completion order of product i at workstation j  
I_{j,k}(t) Binary variable indicating whether M_{j,k} is idle at time t: I_{j,k}(t) = 1 if M_{j,k} is idle at t, and 0 otherwise
Model-Related Terms
x ( i ) Feature vector for product i , x ( i ) = ( x j ( i ) ) j = 1 m = ( x 1 ( i ) ,   x 2 ( i ) ,   , x m ( i ) )
h j ,   k Time-series data collected from machine k at workstation j , h j ,   k = ( h j ,   k ( 1 ) ,   h j ,   k ( 2 ) , , h j ,   k ( T ) )
y(i) Label of product i: y(i) = 1 if product i is defective, and 0 otherwise
x ^ ( i ) Observed feature vector for product i  
f ( · ) Classifier to diagnose whether each product is defective or not
z ( i ) Latent variable of actual product whose data is collected as x ^ ( i )

References

1. Manco, G.; Ritacco, E.; Rullo, P.; Gallucci, L.; Astill, W.; Kimber, D.; Antonelli, M. Fault detection and explanation through big data analysis on sensor streams. Expert Syst. Appl. 2017, 87, 141–156.
2. Khalastchi, E.; Kalech, M.; Rokach, L. A hybrid approach for improving unsupervised fault detection for robotic systems. Expert Syst. Appl. 2017, 81, 372–383.
3. Song, B.; Shi, H. Fault detection and classification using quality-supervised double-layer method. IEEE Trans. Ind. Electron. 2018, 65, 8163–8172.
4. Fakhfakh, O.; Toguyeni, A.; Korbaa, O. On-line fault diagnosis of FMS based on flows analysis. J. Intell. Manuf. 2018, 29, 1891–1904.
5. Ma, Z.; Marchette, D.J.; Priebe, C.E. Fusion and inference from multiple data sources in a commensurate space. Stat. Anal. Data Min. 2012, 5, 187–193.
6. Hall, D.L.; Llinas, J. An introduction to multisensor data fusion. Proc. IEEE 1997, 85, 6–23.
7. Chen, Z.; Li, W. Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network. IEEE Trans. Instrum. Meas. 2017, 66, 1693–1702.
8. Vazifeh, M.R.; Hao, P.; Abbasi, F. Fault diagnosis based on multikernel classification and information fusion decision. Comput. Technol. Appl. 2013, 4, 404–409.
9. Khaleghi, B.; Khamis, A.; Karray, F.O.; Razavi, S.N. Multisensor data fusion: A review of the state-of-the-art. Inf. Fusion 2013, 14, 28–44.
10. Safizadeh, M.S.; Latifi, S.K. Using multi-sensor data fusion for vibration fault diagnosis of rolling element bearings by accelerometer and load cell. Inf. Fusion 2014, 18, 1–8.
Figure 1. Example to illustrate processing orders that can be partially shuffled.
Figure 2. Gantt chart describing when product orders are shuffled.
Figure 3. Structure of the graphical model (Case: R = 2).
Figure 4. Layout of a disk-brake processing line in the numerical example.
Figure 5. Preparation procedure for the dataset to be used for comparison.
Figure 6. Defect diagnosis result: F1 scores of the proposed model and no shuffling model.
Figure 7. Performance measures of the proposed model according to threshold.
Table 1. Machine specifications of the example.

| Machine | Processing Time | Probability That the Machine Is Abnormal | Sensor Values at Normal State | Sensor Values at Abnormal State |
| --- | --- | --- | --- | --- |
| M_{1,1} | Exp(0.1) | 0.001 | N(10, 3²) | N(15, 4²) |
| M_{2,1}, M_{2,2} | Exp(0.05) | 0.005 | N(8, 3²) | N(12, 3²) |
| M_{3,1}, M_{3,2}, M_{3,3} | Exp(0.1) | 0.003 | N(9, 3²) | N(7, 2²) |
| M_{4,1} | Exp(0.2) | 0.010 | N(10, 3²) | N(12, 3²) |
| M_{5,1}, M_{5,2} | Exp(0.15) | 0.001 | N(10, 3²) | N(13, 3²) |
| M_{6,1} | Exp(0.1) | 0.001 | N(12, 3²) | N(15, 4²) |
| M_{7,1} | Exp(0.1) | 0.001 | N(12, 3²) | N(15, 4²) |
Table 2. Partial shuffling probability matrix of the numerical example.

| Workstation j | r = 0 | r = 1 | r = 2 | r = 3 | r = 4 | r = 5 | r = 6 | r = 7 | r = 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 2 | 0.6212 | 0.2272 | 0.0901 | 0.0360 | 0.0151 | 0.0063 | 0.0025 | 0.0010 | 0.0004 |
| 3 | 0.7254 | 0.1808 | 0.0618 | 0.0207 | 0.0072 | 0.0026 | 0.0010 | 0.0003 | 0.0001 |
| 4 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 5 | 0.8504 | 0.1204 | 0.0229 | 0.0050 | 0.0010 | 0.0003 | 0.0000 | 0.0000 | 0.0000 |
| 6 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 7 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |

