
Development of a Face Prediction System for Missing Children in a Smart City Safety Network

1
Department of Information Management, Southern Taiwan University of Science and Technology, Tainan 71005, Taiwan
2
Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 70101, Taiwan
3
Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology, Tainan 71005, Taiwan
*
Author to whom correspondence should be addressed.
Submission received: 16 February 2022 / Revised: 24 April 2022 / Accepted: 25 April 2022 / Published: 29 April 2022
(This article belongs to the Special Issue Advances of Future IoE Wireless Network Technology)

Abstract

Cases in which missing children are never found are rare, but they continue to occur. If a child is not found promptly, the parents may no longer be able to recognize the child's appearance because they have not seen their child for a long time. Therefore, our purpose is to predict children's faces as they grow up and to help parents search for missing children. DNA paternity testing is the most accurate way to determine whether two people are related by blood. However, DNA paternity testing for every unidentified child would be costly. Therefore, we propose the Face Prediction System for Missing Children in a Smart City Safety Network. It can predict the faces of missing children at their current age, and parents can quickly assess the possibility of a blood relation with any unidentified child. The advantage is that it can eliminate incorrect matches and narrow down the search at a low cost. Our system combines the StyleGAN2 and FaceNet methods to achieve prediction: StyleGAN2 is used to style mix two face images, and FaceNet is used to compare the similarity of two face images. Experiments show that the similarity between predicted and expected results exceeds 75%, which means that the system can predict children's grown-up faces well. Our system produces more natural results and higher similarity scores than the Conditional Adversarial Autoencoder (CAAE), High Resolution Face Age Editing (HRFAE) and Identity-Preserved Conditional Generative Adversarial Networks (IPCGAN).

1. Introduction

The introduction is divided into three subsections: "Status of Missing Children's Cases", "Problems of Current Face Aging Methods", and "Contribution".

1.1. Status of Missing Children’s Cases

According to the Federal Bureau of Investigation's National Crime Information Center (NCIC) Missing Person and Unidentified Person Statistics, 365,348 children were reported missing in the United States in 2020 [1]. According to the National Crime Agency's Missing Persons Statistics, over 65,800 children are estimated to have gone missing in the UK between 2019 and 2020 [2].
These statistics show that cases of missing children continue to occur. Therefore, we built the Face Prediction System for Missing Children to predict the faces of children as they grow up and help parents or police search for, detect, and identify missing children. Moreover, we applied our system to the Smart City Safety Network. According to S. P. Mohanty et al. [3], a smart city includes smart infrastructure, smart transportation, smart energy, smart health care, and smart technology. The key to transforming traditional cities into smart cities is information and communication technology (ICT); smart cities use ICT to solve a variety of urban problems. In addition, M. Lacinák et al. emphasized the importance of safe cities [4,5]. They state that every smart city must also be a safe city, and that a safe city should be regarded as part of a smart city. A safe city system should include smart safety systems for surveillance, search, detection, identification, etc. The purpose of our system conforms to the concept of smart city safety, and we use the concepts of IoE and AIoT to implement our system and form a network. More details can be found in Section 3.
Currently, future faces can be predicted with face-aging image generation methods from the field of machine learning. However, existing face-aging models only consider facial features that are typical of older people. In fact, head size and genetics also affect appearance. Therefore, current face-aging methods cannot predict children's grown-up faces well. Section 1.2 describes the problems of current face-aging methods in detail.

1.2. Problems of Current Face-Aging Methods

We reviewed many face-aging image generation methods, such as CAAE [6], which extends the VAE, and F-GAN [7], HRFAE [8] and IPCGAN [9], which extend GANs. Details can be found in Section 2.3. Overall, these face-aging methods add or smooth irregular wrinkles on the face, making the generated results appear older or younger. However, these methods only work for adults, not for children under 12. According to medical research [3,4,5,6,7,8,9,10], the period from birth to adolescence is the fastest-growing period for human beings, and appearance (including facial appearance, body shape, etc.) changes greatly during it. Therefore, if we only consider the facial features of older people and ignore other factors (such as head size and genetics), it is impossible to predict children's faces when they grow up.
Figure 1 illustrates the problems of current face-aging methods. In Figure 1, (a) is the original image, whose subject is between 0 and 20 years old, while (b)~(d) are the face images converted from the original image using the Group-GAN model; (b) is between 20 and 40 years old, (c) is between 40 and 60 years old, and (d) is over 60 years old.
We can see from Figure 1 that the child and the adult are transformed from (a) to (b)~(d), respectively. Two problems arise during this transition:
  • The change of both the child and the adult from (a) to (b) is small; in particular, the child in (b) does not look like someone between 20 and 40 years old. The reason is that most aging models only consider facial texture and do not consider that head size also changes with age;
  • The child's transition from (a) to (d) is very unnatural, while the adult's transition from (a) to (d) is relatively natural. The reason is not only that head size is not taken into account, but also that people grow most rapidly before puberty (children under the age of 12), so their appearance changes greatly. Therefore, considering only facial texture is not enough for face prediction.
Overall, because existing aging models only consider the appearance characteristics that older adults have, they cannot predict children’s future faces.
We propose a Face Prediction System for Missing Children. In addition to the facial features of children before their disappearance, it also considers the facial features of their blood relatives. According to genetics [10,11,12,13], life on Earth mainly uses DNA as genetic material, so offspring carry the parents' traits (including appearance and disease). In addition, according to Mendelian inheritance [14,15], human looks are mainly determined by genetics, which means that children are born with traits, such as appearance and disease susceptibility, that come from their parents. Our system takes human genetics into account and then estimates and predicts children's future faces. This is a more reasonable estimate than that of other aging models. To the best of our knowledge, no one has proposed this prediction method to date.

1.3. Contribution

Suppose a parent wants to search for, detect and identify their child among a group of unidentified children. Performing a DNA paternity test on every unidentified child would be costly, and the tests take time. If parents use our system instead, they can quickly assess the possibility of a blood relation with any unidentified child. Additionally, there are the following benefits:
  • We can directly eliminate a pairing when the similarity between our system's face prediction image and a child's face image is low. This is useful for narrowing down the search;
  • When the similarity between our system's face prediction image and a child's face image is high, we can keep this pairing and then conduct a DNA paternity test. This is faster and less expensive than DNA paternity testing for every unidentified child.
Overall, we have the following two contributions:
  • Our system takes into account the facial features of children's blood relatives, and the similarity between its output predictions and the expected results exceeds 75%. When parents search for missing children, our system helps to eliminate low-similarity matches and narrow the search;
  • Parents can quickly and inexpensively confirm the possibility of blood relation with any child.

2. Related Work

2.1. Generative Adversarial Networks (GANs)

GANs [15,16,17] are unsupervised learning networks trained only on images without labels. A GAN mainly consists of a generator network and a discriminator network. The final goal of a GAN is for the generator to create realistic-looking images that cannot be distinguished from the training images. Many scholars have developed different methods and applications based on the concept of GANs. For example, in the field of image generation, the Progressive GAN [18] proposed by Tero Karras et al. at NVIDIA can randomly generate high-resolution images. StyleGAN [19], proposed in 2019, follows the training network of Progressive GAN and adds style conversion, which can control changes in different styles of images.
However, due to the droplet and phase artifact problems in StyleGAN, StyleGAN2 [20] was proposed in 2020 to solve the above problems and make the output results more natural. StyleGAN has had a huge impact on image generation and editing, and many scholars have used StyleGAN for different studies. For example, Image2StyleGAN [21,22], SEAN [23], Editing in Style [24], StyleFlow [25], Pixel2style2pixel [26], StyleCLIP [27] and StyleMapGAN [28] are all image generation and editing methods developed based on StyleGAN.
Figure 2 shows the generator architecture of StyleGAN [18,19,20]. The generator of StyleGAN consists of a mapping network and a synthesis network. The mapping network is a non-linear network using an 8-layer MLP. Its input is the latent code (or latent variable) z in the latent space Z, and its output is the intermediate latent code (or dlatents) w in the intermediate latent space W. The latent space is simply a representation of compressed data in which similar data points lie closer together. The latent space is useful for learning data features and finding simpler data representations for analysis. We can interpolate data in the latent space and use the model's decoder to generate data samples. The purpose of the mapping network is to convert the input z to w. Because using z directly to control image features is limited, the mapping network is needed to convert z to w, which reduces the correlation between features and controls the generation of different images.
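As a concrete illustration, the following is a minimal PyTorch sketch of such an 8-layer MLP mapping network; the layer widths, activation and normalization are simplified assumptions based on the description above, not the official StyleGAN implementation.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Minimal sketch of an 8-layer MLP mapping network (z -> w); dimensions and
    activation follow the textual description, other details are simplified."""
    def __init__(self, z_dim: int = 512, w_dim: int = 512, num_layers: int = 8):
        super().__init__()
        layers, in_dim = [], z_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, w_dim), nn.LeakyReLU(0.2)]
            in_dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Normalize z before mapping, in the spirit of StyleGAN's pixel-norm step.
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)   # intermediate latent code w in the space W

# Usage: map a batch of random latent codes into the intermediate latent space.
w = MappingNetwork()(torch.randn(4, 512))
```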
The synthesis network is used to generate images of different styles and add affine transformation A and random noise B to each sub-network layer. A is used to control the style of the generated image, which can affect the pose of a face, identity features, etc. B is used for the details of the generated image and can affect details such as hair strands, wrinkles, skin color, etc.
Figure 3 shows the style mixing result of StyleGAN. Sources A and B are pre-trained models using StyleGAN to project the images into the corresponding latent space. Finally, the images are directly generated by the latent code. The coarse styles from source B mainly control the coarser low-resolution features (no more than 8 × 8), affecting posture, general hairstyle, facial shape, etc. The middle styles from source B mainly control the finer features of the middle resolution (16 × 16 to 32 × 32), including facial features, hairstyles, opening or closing of eyes, etc. The fine styles from source B mainly control the relatively high-quality, high-resolution features (64 × 64 to 1024 × 1024), affecting the color of eyes, hair, and skin.
Table 1 shows the training results of StyleGAN and StyleGAN2. The ↑ indicates that higher is better and the ↓ that lower is better. In 2020, StyleGAN2 addressed the shortcomings of StyleGAN, including the droplet and phase artifact problems, and enhanced its quality through weight demodulation, path length regularization and the removal of progressive growing. Because StyleGAN2 can generate higher-quality images and trains faster, this study mainly uses the pre-trained StyleGAN2 model for style mixing and image projection.

2.2. FaceNet

FaceNet [29] is a unified framework proposed by Google for solving face recognition and verification problems. According to Florian Schroff et al., FaceNet mainly uses convolutional neural networks to analyze face information and project it into a Euclidean space. The similarity between two faces can then be obtained directly from their distance in this space. In 2015, FaceNet achieved the highest accuracy of 99.63% on LFW [30] and attracted attention as a result. At present, face recognition and verification are quite mature: OpenFace [31], Deep-Face [32], VGG-Face [33,34], DeepID [35,36,37,38], ArcFace [39] and Dlib [40] all reach over 90% accuracy, and VarGFaceNet [41] is among the most accurate face-recognition models today. We used the Euclidean distance (L2) to measure distance and converted it into a similarity score.
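For illustration, the sketch below shows how an L2 distance between two FaceNet-style embeddings might be converted into a similarity score; the linear mapping and the threshold value are assumptions made for this example, since FaceNet itself defines only the embedding distance.

```python
import numpy as np

def l2_distance(emb1: np.ndarray, emb2: np.ndarray) -> float:
    """Euclidean (L2) distance between two face embeddings."""
    return float(np.linalg.norm(emb1 - emb2))

def distance_to_similarity(distance: float, threshold: float = 1.1) -> float:
    """Map an L2 distance to a 0-100 similarity score.

    The linear mapping and the threshold are illustrative assumptions; any
    monotonically decreasing function of the distance could be used instead.
    """
    return max(0.0, 100.0 * (1.0 - distance / (2.0 * threshold)))

# Example with two random 128-dimensional embeddings (FaceNet's output size),
# normalized so that they lie on the unit hypersphere as FaceNet embeddings do.
a, b = np.random.randn(128), np.random.randn(128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(distance_to_similarity(l2_distance(a, b)))
```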

2.3. Image Generation Method of Face Aging

In the field of image generation, variational autoencoder (VAE) [42] and generative adversarial network (GAN) [17] methods are the mainstream.

2.3.1. VAE

Research methods extended by VAE include the adversarial autoencoder (AAE) [43,44], the conditional adversarial autoencoder (CAAE) [6], and the conditional adversarial consistent identity autoencoder (CACIAE) [45], etc. AAE is a training method that combines the encoder-decoder idea of VAE and the generator-discriminator of GAN. CAAE is a face-aging method proposed by Z. Zhang et al. It builds a discriminator based on AAE to make the generated images more realistic. CAAE can learn the face manifold and achieve smooth age progression and regression so that the results can appear more aged or younger. In addition, the CACIAE proposed by Bian et al. can reduce the loss of identity information, making the results more realistic and age-appropriate. In the experimental results, our system is compared with CAAE. Since CAAE only considers facial lines, it cannot predict the appearance of children when they grow up.

2.3.2. GANs

The methods of synthesizing face images using GANs can be divided into two categories: translation-based and condition-based.

Translation-Based Method

The translation-based face image synthesis method converts one set of style images into another set of style images. This concept first came from Cycle-GAN [46], proposed by Zhu et al. Its advantage is that it does not require paired data from the two domains, making it suitable for face-style transfer, unlike pix2pix [47], which can only be trained on paired data. The disadvantage is that it can only convert between two domains; Choi et al. later proposed StarGAN [48], which can learn multiple domains and solves this problem of Cycle-GAN.
In terms of face-aging models, Palsson et al. proposed F-GAN [7], based on the style transfer architecture of Cycle-GAN. F-GAN combines the advantages of Group-GAN and FA-GAN: when the age span is large (about 20 years or more), Group-GAN gives better results and is used for face conversion; otherwise, FA-GAN is used. The problem with F-GAN is that the conversion is not natural and the image quality is low. After 2018, the FFHQ dataset released with StyleGAN made it easier to generate high-quality images, but StyleGAN still had artifact problems. Subsequently, Shen et al. proposed InterFaceGAN [49], which can semantically edit the learned latent semantic information (for example, changing age, gender and angle) and repair the artifacts in the image, making the resulting image more natural. Although it produces higher-quality images, it is not suitable for predicting the appearance of children because it only takes the texture of the face into account.

Condition-Based Method

The condition-based face image synthesis method can be regarded as a supervised GAN. It adds an additional condition to the inputs of the generator and the discriminator; the condition can be a label, a picture, etc. The condition guides the generator and the discriminator during training. This concept first came from cGAN [50,51,52], proposed by Mirza et al. It performs better than the original GAN and has since been widely used.
In terms of face-aging models, Wang et al. proposed IPCGAN [9], an architecture that successfully generates new synthetic face images and preserves identities in specific age groups. It generates realistic, age-appropriate faces and guarantees that the synthesized faces have the same identity as the input image. In the experimental results, our system is compared with IPCGAN. Since IPCGAN only changes the facial lines, it cannot predict the appearance of children when they grow up.
In addition, HRFAE [8], proposed by Yao et al., combines age labels and latent vectors and can be used for face age editing on high-resolution images. The core idea is to create a latent space containing face identities and a feature modulation layer corresponding to the individual’s age and then combine these two elements so that the generated output image is the specified target age. In the experimental results, our system is compared with HRFAE. Because HRFAE only considers facial lines, it cannot predict the appearance of children when they grow up.

3. Method

We propose a Face Prediction System for Missing Children, whose purpose is to predict children's future faces. It allows parents to quickly and inexpensively assess the possibility of a blood relation with any child. When parents search for missing children, our system helps to eliminate low-similarity matches and narrow the search. To predict the future face, our system considers the features of two kinds of face images: images of the child before the disappearance and images of the child's blood relatives. Our system combines the StyleGAN2 and FaceNet methods to achieve prediction: StyleGAN2 is used to style mix two face images, and FaceNet is used to compare the similarity of two face images. The input is an image of the missing child taken before the disappearance and multiple images of family members related by blood. The output is a prediction result. More details can be found in Section 3.1, Section 3.2, Section 3.3 and Section 3.4.
At the application level, we apply our Face Prediction System for Missing Children and the issues of searching for missing children to the concepts of IoE and AIoT, as shown in Figure 4, which will be described in detail below.
On the left side of Figure 4 is the IoE, which combines machine-to-machine (M2M), people-to-people (P2P), and people-to-machine (P2M) connections. The difference between IoE and IoT is that IoT only focuses on the pillar of things, while IoE includes four pillars: things, people, process and data. IoE is the intelligent connection of these four pillars. The process pillar is defined as providing the right information to the right person or machine at the right time, to make the connections among people, things, and data more valuable. M2M is defined as the transmission of data from one machine or thing to another, including sensors, robots, computers, mobile devices, etc.; these M2M connections can be considered the IoT. P2P is defined as the transmission of information from one person to another. At present, P2P is mainly realized through mobile devices (such as PCs, TVs, and smartphones) and social networks (such as Facebook, Twitter, and LinkedIn). P2M is defined as the transmission of information between people and machines: people conduct complex data analysis through machines to obtain useful key information that helps them make informed decisions. The following explains the application mode of our system in terms of M2M, P2P and P2M.
  • M2M: The user uses a device with a photographing function to transmit the photos of the children before going missing and the photos of their relatives to our system through the cloud network to predict the faces of the missing children and finally transmit the prediction results to the user through the cloud network (the above process can also be regarded as AIoT, as shown on the right side of Figure 4);
  • P2P: Family members and friends of the missing children, or the police, can publish relevant information about the missing children (including the time and place of disappearance, photos taken before the disappearance and the predicted images from our system) to social media through mobile devices and social networks, hoping that it will be seen and shared by netizens. The aim is to find witnesses or people who know the context of the incident, who can provide relevant information to the families or the police to assist in apprehending the suspect;
  • P2M: Family members, friends or police officers of missing children can use our system to predict the face of a missing child at their present age and use the prediction result as a clue. They can then spread the image through TV, newspapers, magazines and various social media so that more people learn about the case and can recall and judge whether they have seen this person. Finally, anyone with information can provide clues to the police to help solve the case.
Our system mainly combines StyleGAN2 and FaceNet methods. StyleGAN2 is used to mix two images, and FaceNet is used to compare the similarity of the two images. The architecture of this system will be described in detail below.

3.1. Overview of the System Architecture

Figure 5 shows a flowchart of the image processing steps, divided into three main parts: data preprocessing, phase 1: filtering the best new face image, and phase 2: predicting the age and appearance of a missing child.
  • Data preprocessing: This is used to take a single face image from the original image and output the dlatents for each face image. We will need the dlatents of each face image to use StyleGAN2 for face mixing. At the beginning of the first and second phases, we load the dlatents, and then proceed to mix the two images;
  • Phase 1—filtering the best new face image: This is used to mix the two relatives with the highest similarity with the missing child. The face mixing result will have the appearance characteristics of the above two relatives. Finally, the system will select the best new face from multiple mixing results;
  • Phase 2—predicting the age and appearance of a missing child: This is used to mix the best new face and the image of the missing child so that the face mixing result has not only the appearance characteristics of the missing child but also the appearance characteristics of the above two relatives. Finally, the system will select the best prediction result from multiple mixed results.
The details of these three parts will be described in order below.
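Before the details, the following conceptual Python sketch summarizes how the three parts fit together; all of the callables it receives are stand-ins for the modules described in Section 3.2, Section 3.3 and Section 3.4 and do not refer to any actual library API.

```python
# Conceptual sketch of the Figure 5 pipeline. The helper callables (align, project,
# rank_top2, style_mix_36, pick_best) are placeholders passed in by the caller.
def predict_missing_child_face(child_photo, relative_photos,
                               align, project, rank_top2, style_mix_36, pick_best):
    # Data preprocessing: align/crop every face and project it to StyleGAN2 dlatents.
    child_face = align(child_photo)
    relatives = [align(p) for p in relative_photos]
    child_w = project(child_face)
    relative_ws = [project(f) for f in relatives]

    # Phase 1: mix the two relatives most similar to the child; keep the best new face.
    i, j = rank_top2(child_face, relatives)
    candidates = style_mix_36(relative_ws[i], relative_ws[j])
    best_new_face, best_new_w = pick_best(candidates, phase=1, reference=child_face)

    # Phase 2: mix the best new face with the child's own face; keep the best prediction.
    predictions = style_mix_36(child_w, best_new_w)
    prediction, _ = pick_best(predictions, phase=2, reference=best_new_face)
    return prediction
```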

3.2. Data Preprocessing

Data preprocessing consists of the following two steps:
  • Dlib Face Alignment Module: This module aligns and crops each face in the missing child’s available image;
  • StyleGAN2 Project Image Module: Each face image is subjected to StyleGAN2 projection processing, and finally, the projected image and dlatents file are obtained. These data will be required as input during the first and second phases.

3.2.1. Dlib Face Alignment Module

Figure 6 shows the flowchart of the Dlib Face Alignment Module, which aligns and crops each face in the original image and outputs it as a face image of size 1024 × 1024. The Dlib Face Alignment Module contains three functions, 'Face Detector', 'Facial Landmark Predictor' and 'Face Alignment', which are described below and sketched in code after the list.
  • Face Detector: Each face in the original image is detected and labelled with a number;
  • Facial Landmark Predictor: The 68 landmarks of each face are predicted;
  • Face Alignment: The image is rotated so that the eye landmarks are horizontally aligned, then the face region is cropped and resized to 1024 × 1024.
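The sketch below illustrates these three steps with dlib and Pillow; the landmark-model filename, the crop margin and the rotation convention are assumptions, and the alignment script actually used may handle these details differently.

```python
import dlib
import numpy as np
from PIL import Image

# The 68-landmark model file must be downloaded separately from dlib's model repository.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def align_face(image_path: str, output_size: int = 1024) -> Image.Image:
    img = np.array(Image.open(image_path).convert("RGB"))
    rect = detector(img, 1)[0]                         # Face Detector: first detected face
    shape = predictor(img, rect)                       # Facial Landmark Predictor: 68 points
    pts = np.array([[p.x, p.y] for p in shape.parts()])
    left_eye, right_eye = pts[36:42].mean(axis=0), pts[42:48].mean(axis=0)

    # Face Alignment: rotate so the eye landmarks are horizontal, then crop and resize.
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0]))
    center = (float(pts[:, 0].mean()), float(pts[:, 1].mean()))
    rotated = Image.fromarray(img).rotate(angle, center=center, resample=Image.BILINEAR)

    size = int(1.8 * max(rect.width(), rect.height()))  # generous margin around the face
    left, top = int(center[0] - size / 2), int(center[1] - size / 2)
    return rotated.crop((left, top, left + size, top + size)).resize((output_size, output_size))
```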

3.2.2. StyleGAN2 Project Image Module

Figure 7 shows the StyleGAN2 projection process, whose input is an image of a missing child (the child in Figure 6). Image projection is a function provided by StyleGAN2 that iterates continuously on an input image (the missing child's face) to produce a very similar one (a projection). In Figure 7, the projected image at iteration 1 is the default image produced by the trained StyleGAN2 model. By the 1000th iteration, the projected image is very similar to the input image (the missing child's face), so we store the dlatents at this point as a NumPy file to be used later for StyleGAN2 style mixing or interpolation.
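Conceptually, the projection step can be sketched as a small optimization loop over the dlatents; in the sketch below, `synthesis` stands for a pre-trained StyleGAN2 generator's synthesis network, and a simple pixel loss replaces the perceptual loss and noise regularization used by the official projector.

```python
import numpy as np
import torch
import torch.nn.functional as F

def project(synthesis, target, w_init, num_iterations: int = 1000, lr: float = 0.01):
    """Iteratively optimize dlatents so that synthesis(dlatents) matches the target face.

    `synthesis` maps a dlatents tensor (e.g., shape (1, 18, 512)) to an image tensor,
    `target` is the aligned face image as a tensor, and `w_init` is a starting dlatents
    tensor (typically the average w). All three are assumed inputs of this sketch.
    """
    dlatents = w_init.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([dlatents], lr=lr)
    for _ in range(num_iterations):
        optimizer.zero_grad()
        generated = synthesis(dlatents)          # projected image at this iteration
        loss = F.mse_loss(generated, target)     # simplified stand-in for a perceptual loss
        loss.backward()
        optimizer.step()
    result = dlatents.detach().cpu().numpy()
    np.save("missing_child_dlatents.npy", result)  # stored for later style mixing
    return result
```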

3.3. Phase 1: Filter the Best New Face Image

The first phase of the system mainly filters the best new face image. The input data are a projected image of the missing child and several projected images of family members. The output is the best new face image, which is one of 36 mixed faces. There are four modules in the first processing phase: the Similarity Sequence Module, the StyleGAN2 Style Mixing Module, the FaceNet Face Compare Module and the Best New Face Filter Module.

3.3.1. Similarity Sequence Module

The Similarity Sequence Module selects the two family members most similar to the child from among multiple family members. The missing child is first compared with each family member using FaceNet. All the similarities are ranked in descending order, and the images of the top two family members with the highest similarities are output.
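A minimal sketch of this ranking step is shown below; `embed` stands for any function that returns a FaceNet embedding for a face image (for example, one built on the facenet-pytorch package) and is an assumption of this example.

```python
import numpy as np

def top_two_relatives(child_img, family_imgs, embed):
    """Return the indices of the two family members most similar to the child."""
    child_emb = embed(child_img)
    # Smaller L2 distance to the child's embedding means higher similarity.
    distances = [np.linalg.norm(child_emb - embed(img)) for img in family_imgs]
    order = np.argsort(distances)
    return int(order[0]), int(order[1])
```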

3.3.2. StyleGAN2 Style Mixing Module

The StyleGAN2 Style Mixing Module takes as input the two dlatents with the highest similarity from the first phase and, after StyleGAN2 style mixing, generates a total of 36 mixed new faces. Figure 8 shows an example of the StyleGAN2 style mixing result.
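The sketch below shows the core style-mixing operation on (18, 512) dlatent arrays and one plausible way of enumerating candidate mixes; the exact scheme that yields 36 faces is not spelled out here, so the crossover layers chosen are illustrative assumptions.

```python
import numpy as np

def style_mix(dlatents_a: np.ndarray, dlatents_b: np.ndarray, crossover: int) -> np.ndarray:
    """Copy the coarse layers (below `crossover`) from A and the remaining
    middle/fine layers from B, for dlatents of shape (18, 512)."""
    mixed = dlatents_a.copy()
    mixed[crossover:] = dlatents_b[crossover:]
    return mixed

def candidate_mixes(dlatents_a, dlatents_b, crossovers=(2, 4, 6, 8, 10, 12)):
    # Vary the crossover layer and swap the roles of the two faces; each resulting
    # dlatents array is rendered with the StyleGAN2 synthesis network to obtain a face.
    candidates = []
    for c in crossovers:
        candidates.append(style_mix(dlatents_a, dlatents_b, c))
        candidates.append(style_mix(dlatents_b, dlatents_a, c))
    return candidates
```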

3.3.3. Best New Face Filter Module

The Best New Face Filter Module filters one of the 36 new faces mixed by StyleGAN2 as the best new face. In the first phase, this module uses the similarity percentage metric to evaluate the 36 new faces. Each of the 36 new faces $n_1, n_2, n_3, \ldots, n_{36}$ is assigned a weight $W_{1,n_k}$. The system then ranks the weights $W_{1,n_k}$ from smallest to largest, and the face with the minimum weight $\min W_{1,n_k}$ is the best new face.
$$W_{1,n_k} = W_{1,P_{n_k}} + W_{1,S_{n_k}} \tag{1}$$
$$W_{1,P_{n_k}} = \left| P_{C,Top1} - P_{n_k,Top1} \right| = \left| \frac{S_{C,Top1}}{S_{C,Top1} + S_{C,Top2}} - \frac{S_{n_k,Top1}}{S_{n_k,Top1} + S_{n_k,Top2}} \right| \tag{2}$$
$$W_{1,S_{n_k}} = \left| S_{C,Top1} - S_{n_k,Top1} \right| + \left| S_{C,Top2} - S_{n_k,Top2} \right| \tag{3}$$
Here, Equation (1) is the formula for the weight $W_{1,n_k}$ (refer to Table 2 for the symbol definitions); it combines the similarity-percentage term $W_{1,P_{n_k}}$ and the similarity term $W_{1,S_{n_k}}$ between the child and the 36 new faces, and their sum is called the Similarity Percentage Metric. Equation (2) is the similarity-percentage term $W_{1,P_{n_k}}$, which compares $P_{C,Top1}$ and $P_{n_k,Top1}$; the smaller the gap between the two, the better, indicating that the two proportion values are closer. Equation (3) is the similarity term $W_{1,S_{n_k}}$; the smaller its value, the better, indicating that the new face image is more similar to the family members. After calculating $W_{1,n_k}$, the system sorts the weights, and the face with the smallest weight $\min W_{1,n_k}$ is the best new face.
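The following sketch implements Equations (1)–(3); the nested dictionary `S`, holding FaceNet similarity values on a 0–100 scale, is a data layout assumed for illustration.

```python
def phase1_weight(S, n):
    """Weight W_{1,n_k} for candidate face n, following Equations (1)-(3).

    S["C"]["Top1"] is the child's similarity to the most similar relative,
    S[n]["Top1"] is candidate n's similarity to that relative, and so on.
    """
    p_child = S["C"]["Top1"] / (S["C"]["Top1"] + S["C"]["Top2"])
    p_cand = S[n]["Top1"] / (S[n]["Top1"] + S[n]["Top2"])
    w_p = abs(p_child - p_cand)                                # Equation (2)
    w_s = abs(S["C"]["Top1"] - S[n]["Top1"]) + \
          abs(S["C"]["Top2"] - S[n]["Top2"])                   # Equation (3)
    return w_p + w_s                                           # Equation (1)

def best_new_face(S, new_faces):
    # The candidate with the smallest weight is kept as the best new face.
    return min(new_faces, key=lambda n: phase1_weight(S, n))
```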

3.4. Phase 2: Predicting the Age and Appearance of a Missing Child

The second phase of the system predicts the missing child's appearance at their current age. The input data are the best new face and the image of the missing child, and the output is the prediction result. Four modules are used in the second phase, in the order Data Preprocessing, StyleGAN2 Style Mixing Module, FaceNet Face Compare Module and Best New Face Filter Module. The Data Preprocessing and StyleGAN2 Style Mixing modules operate in the same way as the corresponding modules in Phase 1, while the other modules differ.

3.4.1. FaceNet Face Compare Module

The FaceNet Face Compare Module mainly compares the best new face and the missing child image with each new face in the second phase. Finally, it records the similarity comparison information in the JSON file for subsequent analysis.

3.4.2. Best New Face Filter Module

The Best New Face Filter Module selects the best prediction result from the 36 new faces in the second phase. In this phase, the gap in similarity to the best new face is used to evaluate the 36 new faces. Each of the 36 new faces $n_1, n_2, n_3, \ldots, n_{36}$ is assigned a weight $W_{2,n_k}$. The system then ranks the weights $W_{2,n_k}$ from smallest to largest, and the face with the minimum weight $\min W_{2,n_k}$ is selected.
$$W_{2,n_k} = \left| S_{C,B} - S_{n_k,B} \right| \tag{4}$$
Equation (4) is the formula for the weight $W_{2,n_k}$ (refer to Table 2 for the symbol definitions); it is the gap between the child's similarity to the best new face and each new face's similarity to the best new face. The smaller $W_{2,n_k}$ is, the more similar the new face is to the best new face. After calculating $W_{2,n_k}$, the system sorts the weights, and the face with the smallest weight $\min W_{2,n_k}$ is output as the prediction result.
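A corresponding sketch for Equation (4) is given below, reusing the same assumed similarity dictionary, where `S["C"]["B"]` is the child's similarity to the best new face.

```python
def phase2_weight(S, n):
    # Equation (4): gap between the child's similarity to the best new face (B)
    # and candidate n's similarity to B.
    return abs(S["C"]["B"] - S[n]["B"])

def best_prediction(S, new_faces):
    # The candidate with the smallest gap is selected as the prediction result.
    return min(new_faces, key=lambda n: phase2_weight(S, n))
```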

4. Experiment

Figure 9 shows the experimental results of this system. The input data for this experiment were obtained from members of the same family. The first column in Figure 9 shows the images of the missing children; these three images are of different people, each about 3 years old. The second column shows images of family members or relatives of the missing children, and the third column represents our system, which contains the first and second phases. The input to the first phase is an image of the missing child taken before the disappearance and multiple images of family members related by blood (dotted box in Figure 9), and the output is the best new face image, which mixes the facial features of two blood relatives. The input to the second phase is the image of the missing child taken before the disappearance and the best new face image. The fourth column shows the predicted results (output) of our system; the fifth column shows the similarity comparison between the predicted result and the expected result; and the sixth column shows the expected output, which is the ground truth, i.e., the faces of the missing children at the age of 20. The system uses the face compare function provided by SKEye [55] for similarity comparison; the comparison procedure is given in Algorithm 1. The predictions for the three sisters were compared with the expected outputs, and the results were 77%, 76% and 77%, respectively.
From physical appearance alone, it is difficult for humans to identify the gender of children under the age of three. The child in the second row of Figure 9 looks like a boy, and the predicted image also looks male, but this does not affect the final similarity comparison of our prediction system. Because our system excludes human subjective judgments (including hairstyles) and only compares the similarity of facial features, it is not misled by physical appearance.
Algorithm 1 SKFace [55] Feature Comparison
Input: F1: features of the first face; F2: features of the second face;
Output: S: similarity between F1 and F2;
1: Load F1 and F2;
2: Get the base64 codes of F1 and F2;
3: Verify whether F1 and F2 are recognized;
4: Calculate the distance between F1 and F2;
5: Convert the distance between F1 and F2 into the similarity S.
Figure 10 shows the comparison of our system with CAAE [6], HRFAE [8] and IPCGAN [9]. The first line is the input child image; Line 2 is the expected output; Lines 3~6 are the prediction results of our system, CAAE, HRFAE and IPCGAN, respectively, together with their similarity to the expected output. The similarities correspond to Table 3. It can be seen from Figure 10 that, compared with the other aging models, our system produces more natural, higher-resolution images and achieves the highest prediction similarity (above 75%), which means that it can predict children's grown-up appearance well.
Our system works for ordinary families, direct blood relatives, and images with intact, undamaged faces. The following conditions are not supported because they may result in low similarity:
  • Special family, including half-brothers and half-sisters, etc.;
  • Non-direct blood relatives, including an aunt’s husband, uncle’s wife and cousins, etc.;
  • Incomplete or damaged face image, including poor image quality and face injuries, angles that are too skewed, expressions that are too exaggerated, etc.;
  • Twins.

5. Conclusions

This study proposes a Face Prediction System for Missing Children, which enables parents to quickly assess whether a parent-child relationship with any missing child is possible, in the hope of helping parents find their missing child. The system combines the FaceNet and StyleGAN2 methods to predict the appearance of missing children at their present age through similarity comparison and style mixing. Finally, we compared this system with other aging models, including CAAE, HRFAE and IPCGAN. Experiments show that this system has the highest prediction accuracy among the compared aging models, and its prediction results are of higher image quality and more natural.

Author Contributions

Conceptualization, G.-J.H.; methodology, Z.-J.T. and D.-C.W.; software, Z.-J.T.; validation, Z.-J.T.; investigation, Z.-J.T. and D.-C.W.; resources, D.-C.W.; writing—original draft preparation, Z.-J.T. and G.-J.H.; writing—review and editing, Z.-J.T. and G.-J.H.; supervision, C.-C.C.; project administration, G.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Acknowledgments

This work was supported in part by the Ministry of Science and Technology (MOST) of Taiwan under Grants MOST 110-2221-E-218-002 and in part by the “Allied Advanced Intelligent Biomedical Research Center, STUST” from Higher Education Sprout Project, Ministry of Education, Taiwan, and in part by the Ministry of Science and Technology (MOST) of Taiwan under Grant MOST 110-2221-E-218-007.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Federal Bureau of Investigation, 2020 NCIC Missing Person and Unidentified Person Statistics. Available online: https://www.fbi.gov (accessed on 28 January 2022).
  2. National Crime Agency, UK Missing Persons Unit. Available online: http://www.missingpersons.police.uk (accessed on 28 January 2022).
  3. Mohanty, S.P.; Choppali, U.; Kougianos, E. Everything you wanted to know about smart cities: The Internet of things is the backbone. IEEE Consum. Electron. Mag. 2016, 5, 60–70.
  4. Lacinák, M.; Ristvej, J. Smart City, Safety and Security. Procedia Eng. 2017, 192, 522–527.
  5. Ristvej, J.; Lacinák, M.; Ondrejka, R. On Smart City and Safe City Concepts. Mob. Netw. Appl. 2020, 25, 836–845.
  6. Zhang, Z.; Song, Y.; Qi, H. Age Progression/Regression by Conditional Adversarial Autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4352–4360.
  7. Palsson, S.; Agustsson, E.; Timofte, R.; Gool, L.V. Generative Adversarial Style Transfer Networks for Face Aging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 2165–21658.
  8. Yao, X.; Puy, G.; Newson, A.; Gousseau, Y.; Hellier, P. High Resolution Face Age Editing. In Proceedings of the International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 8624–8631.
  9. Tang, X.; Wang, Z.; Luo, W.; Gao, S. Face Aging with Identity-Preserved Conditional Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7939–7947.
  10. Tanner, J.M.; Davies, P.S. Clinical longitudinal standards for height and height velocity for North American children. J. Pediatr. 1985, 107, 317–329.
  11. Weitzman, J. Epigenetics: Beyond face value. Nature 2011, 477, 534–535.
  12. Marioni, R.E.; Belsky, D.W.; Deary, I.J. Association of facial ageing with DNA methylation and epigenetic age predictions. Clin. Epigenet. 2018, 10, 140.
  13. Richmond, S.; Howe, L.J.; Lewis, S.; Stergiakouli, E.; Zhurov, A. Facial Genetics: A Brief Overview. Front. Genet. 2018, 9, 462.
  14. Miko, I. Gregor Mendel and the principles of inheritance. Nat. Educ. 2008, 1, 134.
  15. Bowler, P.J. The Mendelian Revolution: The Emergence of Hereditarian Concepts in Modern Science and Society. Baltim. J. Hist. Biol. 1989, 24, 167–168.
  16. The Human Life Cycle. Available online: https://med.libretexts.org/@go/page/1918 (accessed on 14 February 2022).
  17. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Bengio, Y. Generative adversarial networks. In Proceedings of the Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
  18. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
  19. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
  20. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116.
  21. Abdal, R.; Qin, Y.; Wonka, P. Image2StyleGAN: How to embed images into the StyleGAN latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 4431–4440.
  22. Abdal, R.; Qin, Y.; Wonka, P. Image2StyleGAN++: How to edit the embedded images? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
  23. Zhu, P.; Abdal, R.; Qin, Y.; Wonka, P. SEAN: Image synthesis with semantic region-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
  24. Collins, E.; Bala, R.; Price, B.; Susstrunk, S. Editing in style: Uncovering the local semantics of GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
  25. Abdal, R.; Zhu, P.; Mitra, N.; Wonka, P. StyleFlow: Attribute-Conditioned Exploration of StyleGAN-Generated Images Using Conditional Continuous Normalizing Flows. ACM Trans. Graph. (TOG) 2021, 40, 1–21.
  26. Richardson, E.; Alaluf, Y.; Patashnik, O.; Nitzan, Y.; Azar, Y.; Shapiro, S.; Cohen-Or, D. Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2287–2296.
  27. Patashnik, O.; Wu, Z.; Shechtman, E.; Cohen-Or, D.; Lischinski, D. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. arXiv 2021, arXiv:2103.17249.
  28. Kim, H.; Choi, Y.; Kim, J.; Yoo, S.; Uh, Y. Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 852–861.
  29. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823.
  30. Huang, G.B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Available online: http://vis-www.cs.umass.edu/lfw/lfw.pdf (accessed on 28 January 2022).
  31. Baltrušaitis, T.; Robinson, P.; Morency, L.-P. OpenFace: An open source facial behavior analysis toolkit. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Placid, NY, USA, 7–10 March 2016; pp. 1–10.
  32. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
  33. Parkhi, O.; Vedaldi, A.; Zisserman, A. Deep Face Recognition. In Proceedings of the British Machine Vision Conference (BMVC), Nottingham, UK, 1–5 September 2014; pp. 41.1–41.12.
  34. Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.; Zisserman, A. VGGFace2: A dataset for recognising faces across pose and age. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 67–74.
  35. Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation from Predicting 10,000 Classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1891–1898.
  36. Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation by Joint Identification-Verification. In Proceedings of the 27th International Conference on Neural Information Processing Systems—Volume 2 (NIPS’14), Montreal, QC, Canada, 8–13 December 2014; pp. 1988–1996.
  37. Sun, Y.; Wang, X.; Tang, X. Deeply learned face representations are sparse, selective, and robust. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2892–2900.
  38. Sun, Y.; Liang, D.; Wang, X.; Tang, X. DeepID3: Face Recognition with Very Deep Neural Networks. arXiv 2015, arXiv:1502.00873.
  39. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  40. King, D.E. Dlib-ml: A Machine Learning Toolkit. JMLR 2009, 10, 1755–1758.
  41. Yan, M.; Zhao, M.; Xu, Z.; Zhang, Q.; Wang, G.; Su, Z. VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; pp. 2647–2654.
  42. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2021, arXiv:1312.6114.
  43. Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial autoencoders. arXiv 2015, arXiv:1511.05644.
  44. Balasundaram, P.; Avulakunta, I.D. Human Growth and Development. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2022. Available online: https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/books/NBK567767/ (accessed on 14 February 2022).
  45. Bian, X.; Li, J. Conditional adversarial consistent identity autoencoder for cross-age face synthesis. Multimed. Tools Appl. 2021, 80, 14231–14253.
  46. Zhu, J.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251.
  47. Isola, P.; Zhu, J.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976.
  48. Choi, Y.; Choi, M.; Kim, M.; Ha, J.-W.; Kim, S.; Choo, J. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8789–8797.
  49. Shen, Y.; Gu, J.; Tang, X.; Zhou, B. Interpreting the Latent Space of GANs for Semantic Face Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9240–9249.
  50. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
  51. Roberts, A.M. The Complete Human Body: The Definitive Visual Guide; DK: London, UK, 2016.
  52. Polan, E.; Taylor, D. Journey across the Life Span: Human Development and Health Promotion; F.A. Davis Co.: Philadelphia, PA, USA, 1998.
  53. The Internet of Everything: How More Relevant and Valuable Connections Will Change the World. Available online: https://www.cisco.com/c/dam/global/en_my/assets/ciscoinnovate/pdfs/IoE.pdf (accessed on 28 January 2022).
  54. Session 2: Pillars of the IoE. Available online: https://www.open.edu/openlearn/mod/oucontent/view.php?id=48819 (accessed on 28 January 2022).
  55. SKEye, Face Compare. Available online: https://www.sk-ai.com/Experience/face-compare (accessed on 28 January 2022).
Figure 1. Group-GAN face aging results.
Figure 2. Generator architecture of StyleGAN [18,19,20].
Figure 3. Examples that were generated by a mixture of Source A and B latent codes using StyleGAN [19,20].
Figure 4. Our system is applied to the concepts of IoE and AIoT [53,54].
Figure 5. Flowchart of image processing steps.
Figure 6. Dlib Face Alignment Module flowchart.
Figure 7. StyleGAN2 projection image process.
Figure 8. StyleGAN2 Style Mixing result.
Figure 9. A comparison experiment of the similarity between the predicted image and the expected image.
Figure 10. Comparison diagram of our system.
Table 1. Training results of StyleGAN [19] and StyleGAN2 [20].
Configuration (FFHQ, 1024 × 1024) | FID ↓ | Path Length ↓ | Precision ↑ | Recall ↑
A: Baseline StyleGAN [19] | 4.40 | 212.1 | 0.721 | 0.399
B: + Weight demodulation | 4.39 | 175.4 | 0.702 | 0.425
C: + Lazy regularization | 4.38 | 158.0 | 0.719 | 0.427
D: + Path length regularization | 4.34 | 122.5 | 0.715 | 0.418
E: + No growing, new G & D arch. | 3.31 | 124.5 | 0.705 | 0.449
F: + Large networks (StyleGAN2 [20]) | 2.84 | 145.0 | 0.689 | 0.492
The "+" in the table represents the experimental results based on StyleGAN (A) plus the (B) to (F) configurations.
Table 2. Symbol definitions.
Symbol | Meaning
$n_k$ | $\{n_k\}_{k=1}^{36} = \{n_1, n_2, n_3, \ldots, n_{36}\}$, where $n$ denotes a generated new face image and $k$ is its index.
$W_1$, $W_2$ | $W_1$ is the weight value of the first phase; $W_2$ is the weight value of the second phase.
$S_{x,y}$ | Similarity comparison value of $x$ and $y$; $0 \le S_{x,y} \le 100$.
$P_{x,y_1}$ or $P_{x,y_2}$ | Similarity proportion of $S_{x,y_1}$ or $S_{x,y_2}$ among $S_{x,y_1}$ and $S_{x,y_2}$.
$Top1$, $Top2$ | $Top1$: family member with the highest similarity to the child; $Top2$: family member with the second-highest similarity to the child.
$C$ | Missing child.
$B$ | Best new face.
Table 3. Similarity comparison results of our system.
Input | Expected | Our | CAAE [6] | HRFAE [8] | IPCGAN [9]
(child image 1) | (expected image 1) | 77% | 68% | 73% | 74%
(child image 2) | (expected image 2) | 76% | 33% | 61% | 68%
(child image 3) | (expected image 3) | 77% | 72% | 67% | 74%
The Input and Expected columns show face images in the original article; the remaining columns report the similarity between input and expected image for our system and each compared method.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
