Reliable Multi-View Deep Patent Classification

Zhang, Liyuan; Liu, Wei; Chen, Yufei; Yue, Xiaodong

doi:10.3390/math10234545


by Liyuan Zhang ¹,², Wei Liu ¹, Yufei Chen ¹,* and Xiaodong Yue ³

¹ College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
² Shanghai IC Technology & Industry Promotion Center, Shanghai 201203, China
³ School of Computer Engineering and Sciences, Shanghai University, Shanghai 200444, China
* Author to whom correspondence should be addressed.

Mathematics 2022, 10(23), 4545; https://0-doi-org.brum.beds.ac.uk/10.3390/math10234545

Submission received: 6 November 2022 / Revised: 21 November 2022 / Accepted: 22 November 2022 / Published: 1 December 2022

(This article belongs to the Special Issue Soft Computing and Uncertainty Learning with Applications)


Abstract

Patent classification has long been regarded as a crucial task in patent information management and patent knowledge mining. In recent years, studies on automatic patent classification with deep neural networks have increased significantly. Although great efforts have been made in deep patent classification, existing methods mainly focus on extracting information from a single view (e.g., the title or abstract view), and few studies address multi-view deep patent classification, which aims to improve classification performance by integrating information from different views. To that end, we propose a reliable multi-view deep patent classification method. Within this method, we fuse multi-view patent information at the evidence level from the perspective of evidence theory, which not only effectively improves classification performance but also provides a reliable uncertainty estimate that addresses the unreliability of classification results caused by property differences and inconsistencies among the different patent information sources. In addition, we theoretically prove that our approach reduces the uncertainty of classification results through the fusion of multiple patent views, thus improving both the performance and the reliability of the classification results. Experimental results on 759,809 real-world multi-view patent records from Shanghai, China, demonstrate the effectiveness, reliability, and robustness of our approach.

Keywords:

patent classification; multi-view deep learning; evidence theory

MSC:

68T01

1. Introduction

Patent documents collected from various domains are important intellectual resources for information and knowledge management, helping people understand detailed concepts and the underlying technologies of components [1,2]. With the rapid development of technology, the number of patents has increased significantly in recent years. Automatic patent classification, which assigns each patent to its corresponding categories by extracting information with machine learning methods, has become an essential task in patent information management and patent knowledge mining [3].

Early patent classification methods were generally designed using traditional natural language processing algorithms; for example, the bag-of-words (BOW) model is used to represent patent information, and the K-nearest neighbors (KNN) model is then used to classify the patent representations [4]. Recently, deep patent classification methods, which classify patent information with deep neural networks (DNNs), have become an important direction because they can effectively improve patent classification performance.

However, although great efforts have been made in deep patent classification, existing methods mainly focus on extracting information from a single view (e.g., the title or abstract view), and few studies address multi-view deep patent classification; as a result, the complementary information in the latent patent feature space is ignored. Taking the real-world patent information collected from Shanghai, China, in Table 1 as an example, a classification model that uses only the title view will produce incorrect results, because identical titles correspond to different categories (Case 1 and Case 2 in Table 1). Moreover, because the patent information sources can be inconsistent or contain unknown content (Case 3 and Case 4 in Table 1, where the title view and the abstract view contain abnormal information), multi-view deep classification results may be uncertain and unreliable [5]: traditional deep neural networks focus on classification accuracy but ignore the credibility of their results, which greatly limits patent classification applications. It is therefore vital to devise a multi-view deep patent classification model that not only improves classification performance by integrating information from different views but also provides a reliable uncertainty estimate for each result, so that it can express “I do not know” for uncertain predictions. To achieve this goal, in this paper we revisit multi-view deep patent classification from the perspective of evidence theory to generate accurate and reliable patent classification results.

Evidence theory, also referred to as Dempster–Shafer Theory (DST) [6,7], is a generalization of Bayesian theory to subjective probabilities [8] for reasoning with partial, unreliable, incomplete, deceptive, or conflicting evidence. In contrast to existing uncertainty-based Bayesian methods, such as the Laplace approximation [9], Markov Chain Monte Carlo (MCMC) [10], and variational techniques [11,12,13], which are computationally expensive, evidence theory directly quantifies different dimensions of uncertainty, such as vacuity (i.e., a lack of evidence), which helps provide reliable uncertainty estimates for classification results [14].

From the perspective of evidence theory, we devise a reliable multi-view deep patent classification (RMDPC) method. Within our method, we fuse multi-view patent information at the evidence level instead of at the feature or output level, as done previously. Concretely, we use a BERT network to extract information from each patent view and then adopt evidence theory to represent the information collected from each view as a subjective opinion, which contains beliefs about the truth of propositions under degrees of uncertainty. With a simple fusion strategy, we integrate these opinions to potentially improve both classification performance and reliability. Extensive empirical results show that RMDPC provides reliable multi-view classifications. In summary, the contributions of this paper are as follows:

(1): We first propose a reliable multi-view deep patent classification (RMDPC) method that integrates the complementary information from each view in the latent patent feature space to improve classification performance and guarantee the reliability of the patent classification results.
(2): By constructing a simple and effective fusion strategy using evidence theory, RMDPC can provide a theoretical reliability guarantee that reduces the overall uncertainty of the patent classification results while integrating each patent view effectively.
(3): The experimental results on a large-scale real-world multi-view patent dataset containing 759,809 samples validate the superiority of our method in terms of accuracy, reliability, and robustness.

The rest of this article is organized as follows. Section 2 briefly reviews related work on deep patent classification, multi-view deep learning, uncertainty in deep learning, and evidence theory. Section 3 describes the proposed reliable multi-view deep patent classification (RMDPC) method in detail. Section 4 demonstrates the effectiveness and reliability of RMDPC on a large-scale real-world multi-view patent dataset. Finally, Section 5 concludes this manuscript.

2. Related Works

2.1. Deep Patent Classification

Recently, with the development of deep learning, deep patent classification methods that combine patent classification with DNNs have become an important direction for automatic patent classification [15,16,17,18]. Owing to the inherent properties of patents, textual information is usually the most widely exploited factor for distinguishing different types of patents. Therefore, automatic deep patent classification methods typically build a deep neural network, such as a convolutional neural network (CNN) [19,20], a residual network (ResNet) [21], BERT [22], or a graph neural network (GNN) [23,24], to extract and classify the textual information. For example, PatentBERT [22] uses the pre-trained language model BERT to extract features and classify them into the corresponding classes, and DeepPatent [2] develops a convolutional neural network to recognize the type of a patent document. However, few works consider patent classification from the perspective of multi-view learning. Patent2vec [24] devises a novel framework for patent classification from a multi-view graph-based perspective, but it is only suitable for graph-structured patent data and lacks a reliability guarantee. To that end, we propose a reliable multi-view deep patent classification method that fuses multiple views to potentially improve both classification performance and reliability.

2.2. Multi-View Deep Learning

In recent years, many multi-view deep learning methods have been proposed to extract the correlations among multiple views [25,26,27,28,29,30]. The deep CCA (DCCA) method [25] captures the nonlinear relationships between different views. The deep canonically correlated autoencoder (DCCAE) [27] trains autoencoders to obtain common representations. The enhanced trusted multi-view classification (ETMC) network [28] focuses on the trusted multi-view classification results. The MvNNcor method [29] seamlessly embeds and fuses various view-specific information and deep interaction information to promote multi-view performance.

2.3. Uncertainty in Deep Learning

Aleatoric and epistemic uncertainties are commonly considered in deep learning. In general, uncertainty-based models can be divided into two main categories: Bayesian and non-Bayesian models. Bayesian neural networks (BNNs) replace deterministic weight parameters with distributions to estimate uncertainty. For example, MC-dropout [31,32] performs dropout sampling of the weights during both training and testing. Bayesian methods are usually computationally expensive. To avoid this cost, a number of non-Bayesian methods have been proposed. The deep ensemble method [33] trains and fuses multiple deep neural networks to improve classification performance. Evidential neural networks [14,34,35] model uncertainty by introducing subjective logic.

2.4. Uncertainty in Evidence Theory

Evidence theory, also referred to as Dempster–Shafer Theory (DST), directly models the inherent uncertainty in information resulting from unreliable, incomplete, deceptive, and/or conflicting evidence, and combines evidence from different sources with various fusion operators to produce new representations [36]. The uncertainty typically considered in evidence theory is vacuity (caused by a lack of evidence), which has been used for out-of-distribution detection in deep learning [14]. Recently, further dimensions of uncertainty have been studied, such as dissonance (due to conflicting evidence) and consonance (due to evidence about composite subsets of state values) [37].

3. Method

3.1. Overview

In this section, we describe how to construct the RMDPC model in detail. Figure 1 presents the framework of our reliable multi-view deep patent classification method, which includes three major components: (1) a backbone feature extractor f that maps each input patent view to an embedded representation, (2) an evidential neural network head g that collects evidence from the patent embeddings, and (3) a multi-view aggregation strategy that fuses the evidence from multiple views into a unified patent evidence representation. Based on evidence theory, the aggregated evidence is then transformed into a patent opinion that contains the class probability $P$ and the uncertainty degree $u$ of the input patent. Within this framework, we fuse multi-view patent information at the evidence level, which not only effectively improves classification performance but also provides a reliable uncertainty estimate that addresses the unreliability of classification results caused by property differences and inconsistencies among the different patent information sources. The details of our RMDPC model are described in the following subsections.

3.2. Reliable Multi-View Deep Patent Classification

Given the patent multi-view inputs $X = \{\{x_i^v\}_{i=1}^{n}\}_{v=1}^{m}$, where n is the number of samples and m is the number of patent views, RMDPC first uses the backbones $\{f^v(\theta)\}_{v=1}^{m}$, parameterized by $\theta$, to extract features from each view of the input, yielding the patent feature embedding $f^v(x_i^v, \theta)$. Note that the backbone can be any neural network without a softmax layer. In this paper, considering its strength in text processing, we use a BERT network as the backbone to extract features from multi-view patent information. The patent feature embedding $f^v(x_i^v, \theta)$ is then fed into the evidential head.

In contrast to existing deep learning-based models, which typically place a softmax layer on top of deep neural networks (DNNs) for classification and therefore produce over-confident outputs even for false predictions [38], the evidential head introduces the evidence framework of Dempster–Shafer theory (DST) and subjective logic (SL) [36] to overcome the limitations of softmax-based DNNs. It provides a simple and efficient way to jointly formulate multi-class classification and uncertainty modeling through a minor change: the softmax layer is replaced with an activation layer (i.e., a ReLU layer) that produces a non-negative output, termed evidence [14].

Formally, for a K-class classification task, the evidential head of each patent view transforms the feature embedding $f^v(x_i^v, \theta)$ from the backbone into evidence $e_i^v$ according to

$$ e_i^v = g^v\big(f^v(x_i^v, \theta)\big), \tag{1} $$

where $e_i^v = \{e_{i1}^v, e_{i2}^v, \dots, e_{iK}^v\}$ and $g^v(\cdot)$ is the v-th evidential function, which keeps the evidence $e_i^v$ non-negative. In particular, we assume that the class probability follows a prior Dirichlet distribution $\mathrm{Dir}(P \mid \alpha)$, which is parameterized by $\alpha \in \mathbb{R}^K$ and given by

$$ \mathrm{Dir}(P \mid \alpha) = \begin{cases} \dfrac{1}{B(\alpha)} \prod_{k=1}^{K} p_k^{\alpha_k - 1}, & \text{for } P \in S_K, \\ 0, & \text{otherwise}, \end{cases} \tag{2} $$

where $S_K$ is the K-dimensional unit simplex,

$$ S_K = \Big\{ P \;\Big|\; \sum_{k=1}^{K} p_k = 1 \text{ and } 0 \le p_1, \dots, p_K \le 1 \Big\}, \tag{3} $$

and $B(\alpha)$ is the K-dimensional multinomial beta function, $B(\alpha) = \prod_{k=1}^{K} \Gamma(\alpha_k) \big/ \Gamma\big(\sum_{k=1}^{K} \alpha_k\big)$.
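As a small illustration of Eq. (2), the following Python sketch (ours, not the authors' code) evaluates the Dirichlet density with the standard library's `math.lgamma`, using $B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k)$. The function names `log_multivariate_beta` and `dirichlet_pdf` are hypothetical.

```python
import math
from typing import Sequence


def log_multivariate_beta(alpha: Sequence[float]) -> float:
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(p: Sequence[float], alpha: Sequence[float], tol: float = 1e-9) -> float:
    """Density of Dir(P | alpha) from Eq. (2); returns 0 outside the simplex S_K."""
    if len(p) != len(alpha) or any(a <= 0 for a in alpha):
        raise ValueError("need K positive concentration parameters for K probabilities")
    if any(pk < 0.0 or pk > 1.0 for pk in p) or abs(sum(p) - 1.0) > tol:
        return 0.0  # the "otherwise" branch of Eq. (2)
    log_density = -log_multivariate_beta(alpha)
    for pk, ak in zip(p, alpha):
        if ak == 1.0:
            continue  # p_k^0 = 1, also when p_k = 0
        if pk == 0.0:
            return math.inf if ak < 1.0 else 0.0  # boundary of the simplex
        log_density += (ak - 1.0) * math.log(pk)
    return math.exp(log_density)


# The uniform Dirichlet <1, 1, 1> has constant density Gamma(3) = 2 on S_3.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
```

The uniform case $\alpha = \langle 1, \dots, 1 \rangle$ is the "I do not know" reference distribution that reappears in the KL term of Equation (6).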

Based on DST and SL, the Dirichlet parameter $\alpha_i^v$ is linked to the learned evidence $e_i^v$ by the equality $\alpha_i^v = e_i^v + 1$, where $\alpha_i^v = \{\alpha_{i1}^v, \alpha_{i2}^v, \dots, \alpha_{iK}^v\}$. At inference, the predicted probability $P_i^v = \{p_{i1}^v, p_{i2}^v, \dots, p_{iK}^v\}$ is computed for the k-th class as $p_{ik}^v = \alpha_{ik}^v / S_i^v$, where $S_i^v = \sum_{k=1}^{K} \alpha_{ik}^v$ is the Dirichlet strength. The predictive uncertainty $u_i^v = K / S_i^v$ then represents the vacuity of evidence for each patent view. Equivalently, each class k receives the belief mass $b_{ik}^v = e_{ik}^v / S_i^v$, and the beliefs and the uncertainty sum to one, $\sum_{k=1}^{K} b_{ik}^v + u_i^v = 1$. The evidential head uses the Dirichlet distribution parameterized by the evidence to represent the density of such probability assignments, so that the learner's prediction is a distribution over possible softmax outputs. This second-order probability indicates the uncertainty of the network's results and enables the model to “know the unknown”.
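The mapping from one view's evidence to $\alpha$, $S$, $P$, the belief masses $b_k = e_k / S$ (a standard subjective-logic quantity) and the vacuity $u$ can be sketched in a few lines of Python. This is a minimal illustration for a single sample stored as plain lists; `evidence_from_logits` stands in for the ReLU evidential head described above, and all names are hypothetical.

```python
from typing import Dict, List, Sequence


def evidence_from_logits(logits: Sequence[float]) -> List[float]:
    """ReLU activation in place of softmax, so the evidence is non-negative."""
    return [max(0.0, z) for z in logits]


def subjective_opinion(evidence: Sequence[float]) -> Dict[str, object]:
    """Map one view's evidence e^v to alpha, S, P, belief masses b and vacuity u."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]   # alpha_k = e_k + 1
    S = sum(alpha)                         # Dirichlet strength
    prob = [a / S for a in alpha]          # p_k = alpha_k / S
    belief = [e / S for e in evidence]     # b_k = e_k / S
    u = K / S                              # vacuity u = K / S
    return {"alpha": alpha, "S": S, "prob": prob, "belief": belief, "u": u}


# Hypothetical head output for a 3-class problem.
op = subjective_opinion(evidence_from_logits([2.0, -1.0, 0.0]))
print(op["prob"], op["u"])  # [0.6, 0.2, 0.2] 0.6
```

Two units of evidence for the first of three classes leave a vacuity of 0.6; as evidence accumulates, $S$ grows and $u$ moves toward zero.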

Each evidential head collects the evidence $e_i^v$ from a single patent view. We now turn to patent classification with multiple patent views. Given the set of m evidence vectors $\{e_i^v\}_{v=1}^{m}$ for the i-th sample, we devise a simple and efficient aggregation strategy for multi-view deep patent classification based on evidence theory, given in Definition 1:

Definition 1 (Evidence-theory-based aggregation strategy for multi-view deep patent classification). The aggregation strategy for multi-view deep patent classification with evidence theory consists simply of adding the evidence parameters. Given the i-th sample with m views for a K-class classification task, we obtain a set of evidence $\{e_i^v\}_{v=1}^{m}$ collected from the m evidential heads, where $e_i^v = \{e_{i1}^v, \dots, e_{iK}^v\}$. For $k = 1, \dots, K$, we have $e_{ik} = \sum_{v=1}^{m} e_{ik}^v$, which represents the multi-view aggregation, and $S_i = \sum_{k=1}^{K} (e_{ik} + 1)$, which is the aggregated Dirichlet strength.

Following Definition 1, we combine the evidence from multiple patent views into the aggregated representation $\alpha_i = \{\alpha_{i1}, \dots, \alpha_{iK}\}$, where $\alpha_{ik} = e_{ik} + 1$. The final probability of each patent class is then given by $P(\alpha_i) = \{p_{i1}, \dots, p_{iK}\}$, where $p_{ik} = \alpha_{ik} / S_i$. Having obtained the aggregated representation $\alpha_i$, we next discuss how to train our reliable multi-view deep patent classification model.
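Definition 1 reduces to element-wise addition of the per-view evidence. The Python sketch below is an illustration only: it assumes two views (title and abstract) with made-up evidence values, fuses them, and compares the fused vacuity with each single-view vacuity. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_evidence(view_evidence: Sequence[Sequence[float]]) -> List[float]:
    """Definition 1: e_ik = sum_v e_ik^v (evidence parameter addition)."""
    K = len(view_evidence[0])
    if any(len(e) != K for e in view_evidence):
        raise ValueError("all views must produce K-dimensional evidence")
    return [sum(e[k] for e in view_evidence) for k in range(K)]


def prob_and_uncertainty(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """alpha_k = e_k + 1, S = sum_k alpha_k, p_k = alpha_k / S, u = K / S."""
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    return [a / S for a in alpha], len(evidence) / S


# Hypothetical evidence for one patent, K = 3.
title_ev = [2.0, 0.0, 0.0]
abstract_ev = [1.0, 1.0, 0.0]
fused_ev = aggregate_evidence([title_ev, abstract_ev])
p, u = prob_and_uncertainty(fused_ev)
print(fused_ev, round(u, 4))  # [3.0, 1.0, 0.0] 0.4286
print([round(prob_and_uncertainty(e)[1], 4) for e in (title_ev, abstract_ev)])  # [0.6, 0.6]
```

Here the fused uncertainty $3/7 \approx 0.43$ is lower than the single-view value 0.6 of both views, which is the behavior that Theorem 1 in Section 3.3 formalizes.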

For the classification task, the loss function aims to minimize the generalization risk $R = \mathbb{E}[L(P(\alpha), Y)]$, where $P(\alpha)$ is the predicted class probability, $Y$ is the ground truth, and $L(\cdot)$ is a loss function, e.g., the mean squared error or the cross-entropy loss. However, the generalization risk is hard to compute because the data distribution is unknown. The most common approach approximates it by minimizing the empirical risk on the labeled data, i.e., $\min \hat{R} = \sum_{i=1}^{n} L(P(\alpha_i), y_i)$. In this work, we adopt this approach with a cross-entropy loss function.

For conventional neural networks, the cross-entropy loss is usually written as

$$ L_{ce} = -\sum_{j=1}^{K} y_{ij} \log(p_{ij}), \tag{4} $$

where $y_{ij}$ is the ground-truth label and $p_{ij}$ is the predicted probability of the i-th sample for class j. In our model, given the evidence $\{e_i^v\}_{v=1}^{m}$ of the i-th sample with m views obtained from the evidential heads, we obtain the overall evidence $e_i$ by fusing the evidence from the multiple patent views. We then obtain the parameter $\alpha_i$ (i.e., $\alpha_i = e_i + 1$) and the corresponding class probability $P_i$ (i.e., $P_i = \alpha_i / S_i$) of the Dirichlet distribution $\mathrm{Dir}(P_i \mid \alpha_i)$. Treating $\mathrm{Dir}(P_i \mid \alpha_i)$ as a prior on the likelihood, we modify Equation (4) into the following form:

$$ \begin{aligned} L_{ce}(\alpha_i) &= \int \Big[ \sum_{j=1}^{K} -y_{ij} \log(p_{ij}) \Big] \frac{1}{B(\alpha_i)} \prod_{j=1}^{K} p_{ij}^{\alpha_{ij} - 1} \, dP_i \\ &= \sum_{j=1}^{K} y_{ij} \big( \psi(S_i) - \psi(\alpha_{ij}) \big), \end{aligned} \tag{5} $$
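The closed form of Equation (5) is easy to evaluate numerically. Python's standard library has no digamma function (`scipy.special.digamma` provides one), so this self-contained sketch uses the recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the standard asymptotic series; it is an illustration under that approximation, with hypothetical function names, not the authors' implementation.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0 (the standard library has no digamma)."""
    if x <= 0:
        raise ValueError("this sketch only supports x > 0")
    shift = 0.0
    while x < 6.0:               # recurrence psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)         # asymptotic series, accurate to about 1e-9 here
    series = math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return shift + series


def expected_cross_entropy(y: Sequence[float], alpha: Sequence[float]) -> float:
    """Eq. (5): sum_j y_j * (psi(S) - psi(alpha_j)) with S = sum_j alpha_j."""
    S = sum(alpha)
    return sum(yj * (digamma(S) - digamma(aj)) for yj, aj in zip(y, alpha))


# One-hot label for class 0; alpha = e + 1 for hypothetical fused evidence [2, 0, 0].
loss = expected_cross_entropy([1, 0, 0], [3.0, 1.0, 1.0])
print(round(loss, 4))  # 0.5833, i.e. 1/3 + 1/4
```

Raising the evidence for the true class lowers the loss: with $\alpha = (11, 1, 1)$ the same call gives $1/11 + 1/12 \approx 0.174$. In an actual training loop, an autodiff-aware routine such as `torch.digamma` would replace this scalar approximation.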

where $\psi(\cdot)$ is the digamma function and $B(\alpha_i)$ is the K-dimensional multinomial beta function. The modified cross-entropy loss $L_{ce}(\alpha_i)$ encourages the evidence of the ground-truth category to reach a large value; however, it cannot guarantee that less evidence is generated for incorrect categories. To address this limitation, we introduce a Kullback–Leibler (KL) divergence term into our loss function that shrinks the evidence for the incorrect categories to zero. This term regularizes the predictive distribution by penalizing divergences from the “I do not know” state that do not contribute to data fitting. The KL divergence term is

$$ L_{KL}(\alpha_i) = \mathrm{KL}\big[ \mathrm{Dir}(P_i \mid \tilde{\alpha}_i) \,\|\, \mathrm{Dir}(P_i \mid \langle 1, \dots, 1 \rangle) \big], \tag{6} $$

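where

$$ \tilde{\alpha}_i = y_i + (1 - y_i) \odot \alpha_i, \tag{7} $$

and $\mathrm{KL}[\mathrm{Dir}(P_i \mid \tilde{\alpha}_i) \,\|\, \mathrm{Dir}(P_i \mid \langle 1, \dots, 1 \rangle)]$ is calculated as

$$ \log\left( \frac{\Gamma\big(\sum_{k=1}^{K} \tilde{\alpha}_{ik}\big)}{\Gamma(K) \prod_{k=1}^{K} \Gamma(\tilde{\alpha}_{ik})} \right) + \sum_{k=1}^{K} (\tilde{\alpha}_{ik} - 1) \Big[ \psi(\tilde{\alpha}_{ik}) - \psi\Big( \sum_{j=1}^{K} \tilde{\alpha}_{ij} \Big) \Big], \tag{8} $$

where $\Gamma(\cdot)$ is the gamma function, $\psi(\cdot)$ is the digamma function, and $\mathrm{Dir}(P_i \mid \langle 1, \dots, 1 \rangle)$ denotes the uniform Dirichlet distribution. Because Equation (7) resets the ground-truth entry of $\tilde{\alpha}_i$ to 1, only the evidence assigned to incorrect classes is penalized.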
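Equations (7) and (8) can be checked numerically with the following Python sketch. It assumes one-hot labels and reuses the same series approximation of the digamma function in place of a library routine such as `scipy.special.digamma`; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0 via recurrence plus asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def alpha_tilde(y: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """Eq. (7): y + (1 - y) * alpha; the ground-truth entry is reset to 1."""
    return [yk + (1 - yk) * ak for yk, ak in zip(y, alpha)]


def kl_to_uniform_dirichlet(a: Sequence[float]) -> float:
    """Eq. (8): KL[Dir(P | a) || Dir(P | <1, ..., 1>)]."""
    K, S = len(a), sum(a)
    log_ratio = math.lgamma(S) - math.lgamma(K) - sum(math.lgamma(ak) for ak in a)
    return log_ratio + sum((ak - 1.0) * (digamma(ak) - digamma(S)) for ak in a)


# Hypothetical case: true class 0, but class 1 also received evidence (alpha = 2).
print(round(kl_to_uniform_dirichlet(alpha_tilde([1, 0, 0], [3.0, 2.0, 1.0])), 4))  # 0.2653
# No evidence on the wrong classes, so the KL term is zero.
print(kl_to_uniform_dirichlet(alpha_tilde([1, 0, 0], [3.0, 1.0, 1.0])))
```

The first case is penalized because class 1 holds spurious evidence; once that evidence is removed, $\tilde{\alpha}_i$ equals the uniform parameter and the penalty vanishes, regardless of how much evidence the true class has.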

Combining the cross-entropy loss $L_{ce}(\alpha_i)$ and the KL divergence regularization term $L_{KL}(\alpha_i)$, the overall optimization problem of our RMDPC model is formulated as

$$ \min_{\alpha} L(\alpha) = \sum_{i=1}^{n} \Big[ L_{ce}(\alpha_i) + \lambda_t L_{KL}(\alpha_i) \Big], \tag{9} $$

where $\lambda_t$ is the balance factor. In general, we treat $\lambda_t$ as a warm-up function whose value increases gradually. This prevents the model from paying too much attention to the KL divergence term $L_{KL}(\alpha_i)$ in the initial stage of training and avoids premature convergence to the uniform distribution for misclassified samples that may be correctly classified in later epochs.
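Equation (9) with a warm-up factor can be sketched as follows. The document does not specify the warm-up function, so the linear ramp $\lambda_t = \min(1, t/T)$ with $T = 10$ epochs is our assumption (a common choice for evidential losses), as are the names `warmup_lambda` and `rmdpc_objective`; the per-sample terms would come from Equations (5) and (8).

```python
from typing import Sequence


def warmup_lambda(epoch: int, warmup_epochs: int = 10) -> float:
    """Assumed linear warm-up: lambda_t = min(1, t / warmup_epochs)."""
    if warmup_epochs <= 0:
        raise ValueError("warmup_epochs must be positive")
    return min(1.0, epoch / warmup_epochs)


def rmdpc_objective(ce_terms: Sequence[float], kl_terms: Sequence[float],
                    epoch: int, warmup_epochs: int = 10) -> float:
    """Eq. (9): sum_i [L_ce(alpha_i) + lambda_t * L_KL(alpha_i)]."""
    if len(ce_terms) != len(kl_terms):
        raise ValueError("need one CE term and one KL term per sample")
    lam = warmup_lambda(epoch, warmup_epochs)
    return sum(ce + lam * kl for ce, kl in zip(ce_terms, kl_terms))


# Hypothetical per-sample terms for a mini-batch of two patents.
ce_terms, kl_terms = [0.5833, 0.2], [0.0, 0.2653]
for epoch in (0, 5, 20):
    print(epoch, rmdpc_objective(ce_terms, kl_terms, epoch))
```

At epoch 0 only the cross-entropy terms contribute; the KL penalty grows linearly and reaches full weight after `warmup_epochs` epochs, which matches the motivation given above for delaying the regularizer.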

Algorithm 1 summarizes the overall optimization procedure of the RMDPC method.

Algorithm 1 Reliable Multi-view Deep Patent Classification (RMDPC)

1: /*Training*/
2: Input: multi-view patent dataset $D = \{\{x_i^v\}_{v=1}^{m}, y_i\}_{i=1}^{n}$.
3: Initialize: the parameters $\theta$ of the neural networks.
4: while not converged do
5:     for $v = 1 : m$ do
6:         $f^v(x_i^v, \theta) \leftarrow$ feature embedding from the BERT backbone;
7:         $e_i^v \leftarrow$ evidence from the evidential head according to Equation (1);
8:     end for
9:     $e_i \leftarrow$ aggregation according to Definition 1;
10:    $\alpha_i \leftarrow e_i + 1$;
11:    Compute the overall loss with Equation (9);
12:    Update the neural networks by gradient descent according to Equation (9);
13: end while
14: Output: parameters of the neural networks.
15: /*Test*/
16: Obtain the patent class probability and the corresponding uncertainty degree.

3.3. Theoretical Study

Existing deep patent classification methods focus only on a single view and cannot provide a confidence degree to express “I do not know” when abnormal patent information is fed in after the network has been trained on normal patent information. To this end, we devise a reliable multi-view deep patent classification model that not only effectively improves classification performance but also generates reliable multi-view classification results by reducing the overall uncertainty after fusing multiple patent data views. In this section, we prove that our model yields reliable classification results whose overall uncertainty decreases after fusing multiple data views.

The following theorem follows directly from the evidence-theory-based aggregation strategy in Definition 1.

Theorem 1 (Overall Uncertainty Minimization). After the fusion of multiple patent data views, the overall uncertainty $u$ of the multi-view patent classification results is smaller than the uncertainty $u^v$ of any single patent view $x^v$, provided that at least one of the other views contributes non-zero evidence; otherwise, the two are equal.

Proof. Let $e_i^v$ be the evidence obtained from the v-th evidential head for the i-th sample, and let $e_i$ be the aggregated evidence obtained with the evidence-theory-based aggregation strategy of Definition 1. Since the belief masses $e_{ij}/S_i$ and the uncertainty $u_i$ sum to one, and $S_i = \sum_{j=1}^{K} e_{ij} + K$, the overall uncertainty $u_i$ is

$$ u_i = 1 - \sum_{j=1}^{K} \frac{e_{ij}}{S_i} = \frac{K}{S_i} = \frac{K}{S_i^v + \sum_{v' \neq v} \sum_{j=1}^{K} e_{ij}^{v'}}, \tag{10} $$

which is smaller than the uncertainty of the single-patent-view result, $u_i^v = K / S_i^v$, because

$$ S_i^v + \sum_{v' \neq v} \sum_{j=1}^{K} e_{ij}^{v'} > S_i^v \tag{11} $$

whenever at least one other view $v' \neq v$ provides non-zero evidence. Since all evidence is non-negative, the left-hand side of (11) can never be smaller than $S_i^v$, so fusion never increases the uncertainty.

□

Theorem 1 theoretically guarantees that our RMDPC model can reduce overall uncertainty after the aggregation of multiple patent data views, which can produce reliable multi-view patent classification results.

4. Experiments

In this section, we extensively evaluate the proposed method on a real-world multi-view patent dataset and compare it with existing deep patent classification methods, uncertainty-aware patent classification methods, and multi-view classification methods. Furthermore, we also provide an analysis of the reliability estimation on noisy patent data. The experimental results show that our algorithm achieves state-of-the-art performance.

4.1. Datasets

In this part, we introduce the real-world multi-view patent dataset used in our work, which contains 759,809 Chinese patent samples with two views: a patent title view and a patent abstract view. The dataset was collected from Shanghai, China. Following the International Patent Classification (IPC) established by the Strasbourg Agreement of 1971, the samples are divided into eight categories: (A) human necessities; (B) performing operations, transporting; (C) chemistry, metallurgy; (D) textiles, paper; (E) fixed constructions; (F) mechanical engineering, lighting, heating, weapons, blasting; (G) physics; and (H) electricity.
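For reference, the eight IPC sections above can serve as the K = 8 class labels. The snippet below is a hypothetical label encoding: the document does not say how labels were derived from the patents' IPC symbols, so reading the section from the first character of an IPC symbol (such as `G06F 16/35`) is our assumption, as are the names used.

```python
# The eight IPC sections used as class labels (K = 8).
IPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
}
# Hypothetical encoding: sections in alphabetical order map to indices 0..7.
LABEL_TO_INDEX = {section: k for k, section in enumerate(sorted(IPC_SECTIONS))}


def section_label(ipc_code: str) -> int:
    """Map an IPC symbol such as 'G06F 16/35' to a class index via its section letter."""
    section = ipc_code.strip()[:1].upper()
    if section not in LABEL_TO_INDEX:
        raise ValueError(f"unknown IPC section in {ipc_code!r}")
    return LABEL_TO_INDEX[section]


print(section_label("G06F 16/35"), IPC_SECTIONS["G"])  # 6 Physics
```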

4.2. Implementations

For our algorithm, we use BERT as the backbone to extract information from the multiple patent data views and fully connected networks with batch normalization as the evidential heads that collect evidence from each view. The network is trained with the Adam optimizer [39], with the $\ell_2$-norm regularization coefficient set to $1 \times 10^{-5}$. We use 5-fold cross-validation to select the learning rate from $\{1 \times 10^{-4}, 3 \times 10^{-4}, 1 \times 10^{-3}, 3 \times 10^{-3}\}$. For all datasets, 20% of the samples are used as the test set, and no data augmentation or preprocessing is applied in our experiments, except that the maximum padding size of the two views is set to 32 and 128, respectively. We run each method five times and report the average values in the figures and the means and standard deviations in the tables. The model is implemented in PyTorch and run on an NVIDIA A100 GPU with 40 GB of memory.
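The implementation settings above can be collected into a configuration together with a 5-fold learning-rate search. This Python sketch only reproduces the stated values; assigning 32 to the title view and 128 to the abstract view follows the order in which the views are listed in Section 4.1 and is an assumption, and `score_fn`, which would train and validate the model on one fold, is a hypothetical callback.

```python
import random
from typing import Callable, List, Tuple

# Values stated in Section 4.2; the title/abstract assignment of max_len is assumed.
CONFIG = {
    "weight_decay": 1e-5,                 # l2-norm regularization for Adam
    "lr_grid": [1e-4, 3e-4, 1e-3, 3e-3],  # candidates for 5-fold cross-validation
    "n_folds": 5,
    "test_fraction": 0.2,
    "max_len": {"title": 32, "abstract": 128},
}


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and return (train_idx, val_idx) pairs for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    return [([i for g, fold in enumerate(folds) if g != f for i in fold], folds[f])
            for f in range(k)]


def select_learning_rate(n_train: int,
                         score_fn: Callable[[float, List[int], List[int]], float]) -> float:
    """Return the learning rate with the best mean validation score across folds."""
    splits = kfold_indices(n_train, CONFIG["n_folds"])
    mean_score = {lr: sum(score_fn(lr, tr, va) for tr, va in splits) / len(splits)
                  for lr in CONFIG["lr_grid"]}
    return max(mean_score, key=mean_score.get)
```

With a real `score_fn`, each of the four candidate rates would be trained five times, so this search costs twenty training runs on the non-test 80% of the data.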

4.3. Comparative Methods

We compare our method with existing patent deep classification methods, uncertainty-based classification methods, and state-of-the-art multi-view classification methods.

The three patent deep classification methods are:

DCPC: The CNN-based deep patent classification method extracts information using a traditional convolutional neural network.
DRPC: The ResNet-based deep patent classification method uses a ResNet neural network to obtain the information from the patent data to produce the classification results.
DBPC: The BERT-based deep patent classification method utilizes a BERT neural network as the information extractor to classify the patent data.

The three uncertainty-based classification methods are:

MCDO: The Monte Carlo dropout method [32] treats dropout training of a network as approximate inference in a Bayesian neural network.
DE: The deep ensemble method [33] generates multiple deep sub-models and combines the predictions of these sub-models.
EDL: The evidential deep learning method [14] forms a subjective opinion in terms of evidence theory to obtain predictions.

The four multi-view classification methods are:

DCCA: Deep canonical correlation analysis [25] learns correlations through deep neural networks by maximizing the correlation between views.
DCCAE: Deep canonically correlated autoencoders [27] employ autoencoders to search for a common representation.
MVTCAE: A multi-view total correlation autoencoder [28] obtains the correlations among multiple views using an autoencoder to learn a complete representation.
ETMC: Enhanced trusted multi-view classification [29] focuses on the uncertainty estimation problem and produces reliable classification results.

4.4. Ablation Study

In this part, we first conduct a detailed ablation study to demonstrate the effectiveness of our two main technical components: the multi-view fusion strategy and the KL divergence loss term. Apart from the first and second rows of Table 2, which report the best accuracy achieved with each single patent view, we evaluate these two components on the multi-view patent dataset with all views, giving four combinations of the components. As shown in Table 2, the full RMDPC outperformed all other combinations over five runs, which validates the effectiveness of each component of our method.

4.5. Comparison with the Patent Deep Classification Methods Using the Best View

In this part, we first compare our RMDPC model with three existing patent deep classification methods. The detailed experimental results are shown in Table 3. Since most current patent deep classification methods only focus on a single patent view, we report the experimental results of each method with the best-performing patent view in terms of accuracy. As shown in Table 3, our model outperformed other patent deep classification methods and accuracy was improved by about 6% compared to the second-best model (DBPC), which verifies the improved performance of the proposed multi-view patent deep classification method.

4.6. Comparison with the Uncertainty-Based Classification Methods Using the Best View

In this subsection, we compare our RMDPC model with three uncertainty-based deep classification methods. For all of these methods, we use a BERT backbone, as in our method. The detailed experimental results are shown in Table 4. Since these uncertainty-based methods operate on a single patent view, we report the results of each method with its best-performing view in terms of accuracy. As shown in Table 4, our model performed better than the other uncertainty-based deep classification methods and improved accuracy by nearly 3% over the second-best model (EDL), which indicates that RMDPC is more effective than single-view uncertainty-based deep classification methods.

4.7. Comparison with the State-of-the-Art Multi-View Deep Classification Methods

We now turn to a performance comparison with multi-view deep classification methods. In this subsection, we compare our RMDPC model with four state-of-the-art multi-view deep classification methods, all of which use a BERT backbone, as in RMDPC. The detailed experimental results are shown in Table 5, where we report the accuracy of each method when fusing multiple patent views. As shown in Table 5, our model outperformed the other multi-view deep classification methods, with an improvement of about 2% over the second-best method (ETMC), which shows that our fusion strategy is more effective than those of the other multi-view methods.

4.8. Reliability Estimation

Our RMDPC method is based on evidence theory, which can model the uncertainty of a multi-view patent classification. In this part, we conduct qualitative experiments to provide some insights into the estimated uncertainty, which can evaluate the uncertainty estimation performance of our method.

We first randomly sampled 1000 instances from the multi-view patent test data (repeating this procedure over five runs) and then added noise to both views of half of these test samples. Similar to the work of [40], the noise vectors, denoted by $\epsilon^{v}$, were sampled from the Gaussian distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$ and added to the selected samples; i.e., the $i$-th sample of the $v$-th view becomes

$$\tilde{x}_{i}^{v} = x_{i}^{v} + \epsilon_{i}^{v}.$$

Figure 2 shows the average uncertainty histogram of the clean test patent data (green) and the noisy patent data (red) over five runs. We can observe that after noise is added to the test data, the overall uncertainty of the noisy data increases significantly. This indicates that the estimated uncertainty is associated with sample quality and that our method can capture the uncertainty of patent data, which supports the reliability of our RMDPC method.
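The corruption protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: each view is already represented as a fixed-length feature vector (the 768-dimensional size below is a hypothetical choice), the same randomly chosen half of the samples is corrupted in both views, and random features stand in for real patent data. The function `add_view_noise` and its parameters are hypothetical; computing the uncertainty of each corrupted sample would additionally require a trained model.

```python
from typing import List, Sequence, Tuple

import numpy as np


def add_view_noise(
    views: Sequence[np.ndarray], noisy_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[np.ndarray], np.ndarray]:
    """Add N(0, I) noise to the same randomly chosen samples in every view.

    views: one array of shape (n_samples, dim_v) per view (assumed feature vectors).
    Returns the corrupted copies and a boolean mask marking the noisy samples.
    """
    rng = np.random.default_rng(seed)
    n_samples = views[0].shape[0]
    assert all(x.shape[0] == n_samples for x in views), "views must be aligned"
    chosen = rng.choice(n_samples, size=int(n_samples * noisy_fraction), replace=False)
    mask = np.zeros(n_samples, dtype=bool)
    mask[chosen] = True

    noisy_views = []
    for x in views:
        x_tilde = np.array(x, dtype=float, copy=True)
        # x~_i^v = x_i^v + eps_i^v with eps_i^v ~ N(0, I), only for the masked samples i
        x_tilde[mask] += rng.standard_normal((int(mask.sum()), x_tilde.shape[1]))
        noisy_views.append(x_tilde)
    return noisy_views, mask


# Five runs over 1000 test instances with two views (random stand-in features).
data_rng = np.random.default_rng(42)
title = data_rng.standard_normal((1000, 768))
abstract = data_rng.standard_normal((1000, 768))
runs = [add_view_noise([title, abstract], seed=run) for run in range(5)]
```

Keeping the mask makes it straightforward to split the resulting uncertainties into the clean and noisy groups that are compared in Figure 2.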

5. Conclusions

Existing patent deep classification methods consider only a single view, which ignores the complementary information in the latent patent feature space. Moreover, softmax-based patent deep classification tends to produce over-confident outputs for incorrect predictions, which makes the patent classification results unreliable. To address these issues, we proposed a reliable multi-view deep patent classification (RMDPC) method, which not only improves performance by fusing multiple patent views but also generates reliable patent classification results. Furthermore, our theoretical study shows that the RMDPC method reduces the overall uncertainty after fusing multiple data views, which theoretically guarantees the reliability of the method. The experimental results on large-scale real-world multi-view patent data validate the effectiveness, reliability, and robustness of the proposed multi-view deep patent classification method.
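The uncertainty-reduction property can be illustrated with a small numerical example. The sketch below assumes cumulative (additive) evidence fusion, one standard rule in subjective logic [36] that matches the "accumulated" evidence described in Figure 1; RMDPC's actual aggregation strategy and its proof are given in the Method section. Under this assumption, fusion cannot increase $u = K / S$, because the fused Dirichlet strength $S$ is at least as large as that of any single view. The evidence values are hypothetical.

```python
import numpy as np


def uncertainty(evidence) -> float:
    """u = K / S with S = sum_k (e_k + 1), as in subjective logic."""
    e = np.asarray(evidence, dtype=float)
    return e.shape[-1] / float(np.sum(e + 1.0))


def fuse_additive(evidences) -> np.ndarray:
    """Cumulative fusion (assumed rule): sum the non-negative evidence of all views."""
    return np.sum(np.stack([np.asarray(e, dtype=float) for e in evidences]), axis=0)


title_ev = np.array([4.0, 1.0, 0.0])      # hypothetical title-view evidence
abstract_ev = np.array([6.0, 0.5, 0.5])   # hypothetical abstract-view evidence
fused_ev = fuse_additive([title_ev, abstract_ev])

u_title = uncertainty(title_ev)           # 3 / 8  = 0.375
u_abstract = uncertainty(abstract_ev)     # 3 / 10 = 0.3
u_fused = uncertainty(fused_ev)           # 3 / 15 = 0.2
assert u_fused <= min(u_title, u_abstract)
```

Other fusion rules (e.g., Dempster's rule applied to subjective opinions) behave differently in detail, so this example only conveys the intuition behind the theoretical result.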

Although our method performs well on the supervised patent classification task, it has difficulty handling patent data with insufficient labels. In future work, we will develop a reliable semi-supervised multi-view deep patent classification method to address patent classification with limited labels.

Author Contributions

Conceptualization, L.Z., Y.C. and X.Y.; methodology, L.Z. and W.L.; software, L.Z.; validation, L.Z.; formal analysis, L.Z. and W.L.; investigation, L.Z.; resources, L.Z. and W.L.; data curation, L.Z.; writing—original draft preparation, L.Z. and W.L.; visualization, L.Z.; supervision, Y.C. and X.Y.; project administration, Y.C. and X.Y.; funding acquisition, Y.C. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Shanghai (NO.21ZR1423900), Science and Technology Innovation Program of Shanghai (NO.22511101902), and Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province, China (No.CICIP2021001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request due to privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Li, Z.; Tate, D.; Lane, C.; Adams, C. A framework for automatic TRIZ level of invention estimation of patents using natural language processing, knowledge-transfer and patent citation metrics. Comput. Aided Des. 2012, 44, 987–1010.
Li, S.; Hu, J.; Cui, Y.; Hu, J. DeepPatent: Patent classification with convolutional neural networks and word embedding. Scientometrics 2018, 117, 721–744.
Zhang, L.; Li, L.; Li, T. Patent mining: A survey. ACM SIGKDD Explor. Newsl. 2015, 16, 1–19.
Larkey, L. Some issues in the automatic classification of US patents. In Proceedings of the Working Notes for the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, USA, 27 July 1998; pp. 87–90.
Liu, W.; Yue, X.; Chen, Y.; Denoeux, T. Trusted Multi-View Deep Learning with Opinion Aggregation. Proc. AAAI Conf. Artif. Intell. 2022, 36, 7585–7593.
Denœux, T.; Younes, Z.; Abdallah, F. Representing uncertainty on set-valued variables using belief functions. Artif. Intell. 2010, 174, 479–499.
Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976.
Dempster, A.P. A generalization of Bayesian inference. J. R. Stat. Soc. Ser. B (Methodol.) 1968, 30, 205–232.
MacKay, D.J.C. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992, 4, 448–472.
Neal, R.M. Bayesian Learning for Neural Networks; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2012.
Graves, A. Practical variational inference for neural networks. Adv. Neural Inf. Process. Syst. 2011, 24, 2348–2356.
Ranganath, R.; Gerrish, S.; Blei, D. Black box variational inference. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014; pp. 814–822.
Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1613–1622.
Sensoy, M.; Kaplan, L.; Kandemir, M. Evidential deep learning to quantify classification uncertainty. Adv. Neural Inf. Process. Syst. 2018, 31, 3179–3189.
Krestel, R.; Chikkamath, R.; Hewel, C.; Risch, J. A survey on deep learning for patent analysis. World Pat. Inf. 2021, 65, 102035.
Yoo, Y.; Lim, D.; Heo, T.S. Solar cell patent classification method based on keyword extraction and deep neural network. arXiv 2021, arXiv:2109.08796.
Haghighian Roudsari, A.; Afshar, J.; Lee, W.; Lee, S. PatentNet: Multi-label classification of patent documents using deep learning based language understanding. Scientometrics 2022, 127, 207–231.
Roudsari, A.H.; Afshar, J.; Lee, S.; Lee, W. Comparison and analysis of embedding methods for patent documents. In Proceedings of the 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju Island, Republic of Korea, 17–20 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 152–155.
Hu, J.; Li, S.; Hu, J.; Yang, G. A hierarchical feature extraction model for multi-label mechanical patent classification. Sustainability 2018, 10, 219.
Abdelgawad, L.; Kluegl, P.; Genc, E.; Falkner, S.; Hutter, F. Optimizing neural networks for patent classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2019; pp. 688–703.
Kucer, M.; Oyen, D.; Castorena, J.; Wu, J. DeepPatent: Large scale patent drawing recognition and retrieval. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 2309–2318.
Roudsari, A.H.; Afshar, J.; Lee, C.C.; Lee, W. Multi-label patent classification using attention-aware deep learning model. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Republic of Korea, 19–22 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 558–559.
Tang, P.; Jiang, M.; Xia, B.N.; Pitera, J.W.; Welser, J.; Chawla, N.V. Multi-label patent categorization with non-local attention-based graph convolutional network. Proc. AAAI Conf. Artif. Intell. 2020, 34, 9024–9031.
Fang, L.; Zhang, L.; Wu, H.; Xu, T.; Zhou, D.; Chen, E. Patent2Vec: Multi-view representation learning on patent-graphs for patent classification. World Wide Web 2021, 24, 1791–1812.
Andrew, G.; Arora, R.; Bilmes, J.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1247–1255.
Zhang, C.; Cui, Y.; Han, Z.; Zhou, J.T.; Fu, H.; Hu, Q. Deep partial multi-view learning. IEEE Trans. Pattern Anal. Mach. Intell. 2020; early access.
Wang, W.; Arora, R.; Livescu, K.; Bilmes, J. On deep multi-view representation learning. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1083–1092.
Han, Z.; Zhang, C.; Fu, H.; Zhou, J.T. Trusted Multi-View Classification with Dynamic Evidential Fusion. In IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: Piscataway, NJ, USA, 2022.
Xu, J.; Li, W.; Liu, X.; Zhang, D.; Liu, J.; Han, J. Deep embedded complementary and interactive information for multi-view classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6494–6501.
Xu, S.; Chen, Y.; Ma, C.; Yue, X. Deep evidential fusion network for medical image classification. Int. J. Approx. Reason. 2022, 150, 188–198.
Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1050–1059.
Gal, Y.; Ghahramani, Z. Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv 2015, arXiv:1506.02158.
Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 2017, 30, 6402–6413.
Zhou, X.; Yue, X.; Xu, Z.; Denoeux, T.; Chen, Y. PENet: Prior evidence deep neural network for bladder cancer staging. Methods 2022, 207, 20–28.
Zhou, X.; Yue, X.; Xu, Z.; Denoeux, T.; Chen, Y. Deep Neural Networks with Prior Evidence for Bladder Cancer Staging. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 1221–1226.
Jøsang, A. Subjective Logic: A Formalism for Reasoning under Uncertainty; Springer: Berlin/Heidelberg, Germany, 2018.
Jøsang, A.; Cho, J.H.; Chen, F. Uncertainty characteristics of subjective opinions. In Proceedings of the 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1998–2005.
Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Geng, Y.; Han, Z.; Zhang, C.; Hu, Q. Uncertainty-aware multi-view representation learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 7545–7553.

Figure 1. The framework of the reliable multi-view deep patent classification (RMDPC) method. (1) The embedding representation of each patent view is captured by a backbone neural network. (2) Evidence is estimated from each patent embedding representation by the evidential head. (3) The evidence from the different patent views is accumulated and combined into unified evidence using the aggregation strategy based on evidence theory, and the aggregated evidence is then transformed into a patent opinion containing belief and uncertainty degrees.

Figure 2. Uncertainty estimation on multi-view patent data.

Table 1. Limitations of existing single-view-based and multi-view-based patent classification methods.

Case	Patent Title View	Patent Abstract View	Category
1	Eco-board	Steel plate with a layer of an inorganic material fireproof composite board glued on each side.	B32B3/18
2	Eco-board	A protective layer of an inorganic nano waterproof fireproof material.	E04F13/075
3	@&$% $\hat{7}$ @%	A new environmentally friendly aluminum alloy material.	Inconsistency sources
4	@&$% $\hat{7}$ @%	(@$&)(*$%% $\hat{@}$ @!$&%	Unknown sources

Table 2. Ablation study on Chinese multi-view patent data. “✔” means that RMDPC uses the corresponding component; “-” means that the component is not applied.

Main Components		Metric
Fusion	KL Divergence	Accuracy (%)
-	-	82.93 ± 0.60
-	✔	83.12 ± 0.01
✔	-	84.31 ± 0.20
✔	✔	85.21 ± 0.50
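Table 2 lists a "KL Divergence" component. Assuming it follows the regulariser of evidential deep learning [14], it penalises evidence assigned to the wrong classes through $\mathrm{KL}[\mathrm{Dir}(\tilde{\alpha}) \,\|\, \mathrm{Dir}(\mathbf{1})]$ with $\tilde{\alpha} = y + (1 - y) \odot \alpha$, which has the closed form $\ln\Gamma(\sum_k \tilde{\alpha}_k) - \ln\Gamma(K) - \sum_k \ln\Gamma(\tilde{\alpha}_k) + \sum_k (\tilde{\alpha}_k - 1)[\psi(\tilde{\alpha}_k) - \psi(\sum_j \tilde{\alpha}_j)]$. The dependency-free sketch below evaluates this term; the exact form and weighting used in RMDPC are specified in the Method section, and the helper names and the asymptotic digamma approximation are illustrative choices.

```python
import math
from typing import List, Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via recurrence plus an asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def remove_target_evidence(alpha: Sequence[float], label: int) -> List[float]:
    """alpha~ = y + (1 - y) * alpha: reset the true-class parameter to 1."""
    return [1.0 if k == label else float(a) for k, a in enumerate(alpha)]


def kl_to_uniform_dirichlet(alpha_tilde: Sequence[float]) -> float:
    """Closed-form KL[Dir(alpha~) || Dir(1, ..., 1)]."""
    k = len(alpha_tilde)
    s = sum(alpha_tilde)
    kl = math.lgamma(s) - math.lgamma(k) - sum(math.lgamma(a) for a in alpha_tilde)
    psi_s = digamma(s)
    kl += sum((a - 1.0) * (digamma(a) - psi_s) for a in alpha_tilde)
    return kl


# True class 0: its evidence is ignored; leftover evidence on class 1 is penalised.
alpha = [9.0, 3.0, 1.0]  # hypothetical Dirichlet parameters
penalty = kl_to_uniform_dirichlet(remove_target_evidence(alpha, label=0))  # ln 6 - 7/6, about 0.625
```

The penalty vanishes only when all misleading evidence is zero (every $\tilde{\alpha}_k = 1$), which is why such a term can sharpen the uncertainty estimates without rewarding evidence for the true class.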

Table 3. Comparison with other patent deep classification methods.

Method	Accuracy (%)
DCPC	76.75 ± 0.02
DRPC	78.50 ± 0.12
DBPC	79.55 ± 0.32
RMDPC	85.21 ± 0.50

Table 4. Comparison with other uncertainty-based deep classification methods.

Method	Accuracy (%)
MCDO	80.95 ± 0.12
DE	82.00 ± 0.01
EDL	82.93 ± 0.60
RMDPC	85.21 ± 0.50

Table 5. Comparison with other multi-view deep classification methods.

Method	Accuracy (%)
DCCA	78.65 ± 0.01
DCCAE	79.00 ± 0.00
MVTCAE	81.16 ± 0.21
ETMC	83.79 ± 0.43
RMDPC	85.21 ± 0.50

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

