Article

Question Answer System: A State-of-Art Representation of Quantitative and Qualitative Analysis

1 Symbiosis Institute of Technology, Symbiosis International (Deemed University) (SIU), Lavale, Pune 412115, India
2 Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University) (SIU), Lavale, Pune 412115, India
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2022, 6(4), 109; https://doi.org/10.3390/bdcc6040109
Submission received: 9 August 2022 / Revised: 15 September 2022 / Accepted: 29 September 2022 / Published: 7 October 2022

Abstract: The Question Answer System (QAS) automatically answers questions asked in natural language. Due to the varying dimensions and approaches that are available, QAS has a very diverse solution space, and a proper bibliometric study is required to map the entire domain. This work presents a bibliometric and literature analysis of QAS. Scopus and Web of Science, two well-known research databases, are used for the study. A systematic analytical study comprising performance analysis and science mapping is performed. Recent research trends, seminal works, and influential authors are identified in the performance analysis using statistical tools on research constituents. Science mapping, on the other hand, is performed via network analysis on citation and co-citation network graphs. Through this analysis, the domain's conceptual evolution and intellectual structure are shown. We have divided the literature into four important architecture types and provide a literature analysis of Knowledge Base (KB)-based and GNN-based approaches for QAS.

1. Introduction

A question answer system (QAS) is a standard Natural Language Processing (NLP) task. In this digital era, we are drowning in a sea of information. Web search engines help us sail through it, but their utility is limited: when looking for answers, they can only point to the answer's probable locations, and one must still sort through them to find it. It is far more appealing to have an automatic system that can fetch or generate the answer from retrieved documents instead of merely displaying them to the user. Thus, a QAS finds natural language answers to natural language questions.
Since QA lies at the intersection of NLP, Information Retrieval (IR), logical reasoning, knowledge representation, machine learning, and semantic search, it can be used to quantifiably measure any Artificial Intelligence (AI) system's understanding and reasoning capability [1,2]. Efforts in this direction have been made since the 1960s. However, due to the availability of massive computational power and the emergence of many state-of-the-art deep learning algorithms, the field has gained momentum lately. The latest deep learning models have outperformed humans on single-paragraph question answering benchmarks, including SQuAD [3,4,5]. However, a QAS capable of answering complex questions is still elusive [6].
QA systems can be classified based on the type of questions they try to answer. The different question types are given in Figure 1. Every type comes with its own set of requirements and needs different treatment. However, we can broadly divide the QAS process into three parts, as shown in Figure 2. Part one deals with understanding the question asked in natural language, part two with finding the appropriate background information needed to answer it, and part three with finding probable answers and selecting the most appropriate among them. Many techniques have been used for each phase. Regarding matching the reading comprehension level of humans, Ding et al. [7] identify three main challenges for QA systems: (1) reasoning ability, (2) explainability, and (3) scalability.
Bibliometric analysis has a history of 60–70 years, but its popularity has increased remarkably in the last couple of decades. Bibliometric methodology analyzes the literature from a quantitative perspective [8]: quantitative techniques are applied to bibliometric data such as citations, publications, and keywords, which are rigorously analyzed to uncover interesting trends and relationships between various research constituents. Databases such as SCOPUS and Web of Science have made collecting large amounts of such data easier, and software tools such as VOSViewer, Gephi, and BibExcel have been created to transform, analyze, and visualize it. This has further fueled the recent popularity of bibliometric analysis.
Even though the question answering domain has been studied since the 1960s, it is a dynamic and still relevant field. It amalgamates various important fields such as IR, NLP, knowledge representation, logical reasoning, machine learning, and many more. Due to this rich context, many diverse approaches have been proposed. Many qualitative survey papers describe the publications and the methodologies used, but given the vast number of publications, a systematic literature review becomes infeasible. It is also very interesting to study the effect of the interaction between these fields and the solutions that have emerged from it. Only a single previous publication, by Blooma et al. [9], attempts a bibliometric analysis of the QA domain; however, the time span considered in that study was very small (2000–2007), and only co-word analysis was done. Hence, a thorough bibliometric analysis of the QA domain has not yet been performed. To bridge this research gap, we explore the following research objectives in this study.
  • To uncover the research themes and their evolution in the QA domain
  • To identify important constituents and their contribution to the domain
  • To identify seminal works/ publications in the field of QA
This paper is organized as follows. To collect relevant data for the analysis, large databases need to be searched; hence, the search strategy, described in Section 2, is essential. Section 3 discusses the results of the quantitative analysis with graphical visualizations. Section 4 contains the qualitative analysis, Section 5 summarizes the important findings, and Section 6 concludes the article.
In bibliometric analysis, a vast number of documents are considered and analyzed. Hence, the methodology framework plays a crucial role in the success of the study. Figure 3 shows the framework of this study, which the authors designed following the bibliometric analysis toolbox described by Donthu et al. [10].
There are two basic dimensions along which any scientific field can be analyzed. The first determines the contribution of a particular research constituent to the field; this is called performance analysis. Contribution can be represented using various metrics, such as year-wise publications, most productive countries, most cited publications, and top publishers. The second dimension examines the inter-relationships between these research constituents: authors, publications, and keywords are represented as an interconnected network, and network analysis is performed to identify the key nodes. This is called science mapping. Our performance analysis and science mapping results are explained in Section 3.1 and Section 3.2, respectively.

2. Search Strategy

Bibliometric analysis is carried out on published documents to explore the “question answering system” research space. The search strategy shown in Figure 4 is followed. The two most important bibliometric databases (viz., Web of Science (WOS) and SCOPUS) are queried to find related documents. The term “Question-Answer” is very general and is used in many subject areas such as medicine, psychology, social science, and arts & humanities. After careful examination of all the subject areas, it was observed that results from subject areas other than Computer Science and Engineering are irrelevant to this study. While manually verifying some sample documents, it was also observed that a few articles discussed question-answering systems based on images, visuals, or multimedia. However, our scope is strictly limited to text-based QA systems; hence, such documents are also excluded from this study. Section 2.1 briefly discusses the data collection procedure, and Section 2.2 describes the data pre-processing methods.

2.1. Sources and Methods

The Science Citation Index Expanded (SCI-EXPANDED) and Conference Proceedings Citation Index—Science (CPCI-S) editions of the WOS core collection and the SCOPUS database are used to collect the documents. The queries given in Table 1 were executed on the respective databases in December 2021. Due to the problems discussed earlier, the results are further refined using the following inclusion criteria:
  • Document Type: Proceeding Paper or Article
  • Research area: Computer science or engineering
  • Language: English

2.2. Data Pre-Processing

The labels used to denote the same bibliometric information differ between the two databases, and the type of information also varies. Due to this inconsistent representation, we cannot directly combine the results from both databases. The maximum-common-attribute (MCA) set of both sources is identified to join the data, and the data are combined manually in a field-wise manner. This ‘merged dataset’ (containing 4459 documents) is then used for most of the analysis. For analyses requiring attributes not present in the MCA, experimentation is done separately on the WOS and SCOPUS data.
Furthermore, there are many inconsistencies in the form of misspellings and typing errors in titles or journal names and non-standardized ways of representing dates. Due to these inconsistencies, pre-processing of the data is necessary. An n-gram fingerprint key-collision algorithm and a nearest-neighbor algorithm are applied to cluster similar document titles, using the OpenRefine software tool. After manual inspection of each cluster, all titles in the same cluster are replaced by one consistent representation.
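As an illustration of the clustering step, the following is a minimal sketch of the n-gram fingerprint key-collision idea (in the spirit of OpenRefine's method, not its exact implementation); the sample titles are hypothetical:

```python
import re
from collections import defaultdict

def fingerprint(title: str, n: int = 2) -> str:
    # Normalize case and strip punctuation/whitespace, then build an
    # order-insensitive key from the set of character n-grams.
    s = re.sub(r"[^a-z0-9]", "", title.lower())
    grams = {s[i:i + n] for i in range(len(s) - n + 1)}
    return "".join(sorted(grams))

clusters = defaultdict(list)
for title in ["Question Answering Systems", "question-answering systems",
              "Question answering  systems"]:
    clusters[fingerprint(title)].append(title)

print(list(clusters.values()))  # all three variants collide into one cluster
```

Titles whose fingerprints collide form a cluster, which is then inspected manually as described above.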
Both databases can index the same document, so our data may contain duplicate entries. Using MS Excel, 1253 duplicate documents were identified and removed from the merged dataset, and 3206 unique documents were retained. These documents are then used for the quantitative survey explained in Section 3.

3. Quantitative Survey

There are two vital aspects of a quantitative survey: (1) the contribution of individual research constituents and (2) the relationships between those constituents. Performance analysis, discussed in Section 3.1, highlights the contributions of research constituents, while science mapping, in Section 3.2, examines their relationships.

3.1. Performance Analysis

Performance analysis is a study to discover the contribution of various research constituents to a given field. Typically, this analysis is descriptive in nature, yet it helps to recognize the importance of the various constituents. Various metrics are available: some relate to the number of publications, signifying the field's productivity; some are citation-based, measuring impact or significance; and a few others, such as total publications with citations and the h-index, involve both publications and citations. All three kinds of measures are considered in this study. Section 3.1.1 discusses the publication-related measures, Section 3.1.2 deals with citation-based measures, and publication-citation-based measures are discussed in Section 3.1.3.

3.1.1. Publication Related Metrics

The existence of research papers from the early 1960s, as seen in Figure 5, indicates that automatically answering questions and reading comprehension were already significant areas then. However, this field gained significant traction in the 2000s. The availability of massive computing resources and the emergence of state-of-the-art (SOTA) language models in the 2010s and 2020s further fuelled research in this area, which is also clearly visible in the figure. The fact that 50% of the research publications have been produced in the last six years demonstrates that researchers are now pursuing this field more rigorously.
Figure 6 represents the publication count by geographical area. This kind of analysis is required to understand the opportunities available worldwide in the field. China (520 publications) and the USA (368) are the top two countries, producing the majority of the research papers, while India (140) and Germany (119) come a distant third and fourth in the ranking. Together, these top four countries have produced more than 50% of the publications. Table 2 shows the exact publication counts for the top 10 countries.
Figure 7 shows the top 15 authors from the WOS and SCOPUS datasets. Penas A has the highest number of publications, most of which are indexed in both databases. A few other influential authors are Nakov P, Moschitti A, and Lehmann J.
Figure 8 and Figure 9 show the top sources publishing the research indexed in SCOPUS and WOS, respectively, in the question-answer domain. The majority of the research documents indexed in WOS are published by Springer Nature, IEEE, ACM, and Elsevier, while CEUR Workshop Proceedings, NIST Special Publications, ACM conferences, and IEEE Access are the most preferred sources in SCOPUS.

3.1.2. Citation-Related Metrics

The number of citations and trends in citation counts indicate a research field's relevance. The extracted documents have 32,182 citations in total, or 17.11 citations per cited publication. Such a high number of publications and such a large average citation ratio indicate that this field is still relevant and that many researchers are currently working in this domain. Table 3 and Table 4 list the top 10 highly cited publications from SCOPUS and WOS in the last five years. Of the 15 unique publications listed in Table 3 and Table 4 together, six are directly related to either the ‘knowledge graph’ or ‘knowledge base’ keyword, highlighting that these are the latest research trends in the QA domain.

3.1.3. Publication-Citation-Related Metrics

Even though the total number of publications in a field is sufficient to show its activeness, the research's impact can also be measured using the h-index, a quantitative metric indicating that there are n publications with at least n citations each. Hence, the higher the h-index, the higher the importance and significance. As seen in Figure 5, most of the publications are from recent years and, due to their short citation window, have not received any citations yet. However, of the 3206 documents under consideration, 1881 have at least one citation, and the h-index of the collection is 80.
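As a small illustration, the h-index of a collection can be computed directly from its citation counts; a minimal sketch:

```python
def h_index(citations: list[int]) -> int:
    # Sort counts in descending order; h is the largest rank n such that
    # the n-th publication still has at least n citations.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```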

3.2. Science Mapping

Science mapping helps us study the internal structure of research constituents. Its primary usage is to understand the relationships between various entities. In science mapping, the connections between research constituents such as citations, authors, documents, and keywords are presented spatially. The interaction between these entities brings out the field's structure and highlights how it has been explored. Of the many approaches to science mapping, this study uses citation analysis, co-citation analysis, and co-word analysis.
The interaction between these constituents is often viewed as a graph or network, and many network metrics are deployed to enrich the assessment of bibliometric information. Metrics such as degree centrality, betweenness centrality, eigenvector centrality, and PageRank can be used to understand the network more clearly; the role of an entity in the overall network and its relative importance can be assessed using these metrics, described below (a short illustrative sketch follows the list).
  • Degree centrality: The most basic metric of all, degree centrality measures the connections of an entity (which can be a document, author, or keyword) to other entities in a network. An entity connected to many other entities has a high degree centrality, representing its overall importance in the network.
  • Betweenness centrality: This measures the ability of an entity (node) to transmit information from one part of a network to other parts. An entity with high betweenness centrality plays a crucial role in connecting otherwise disconnected parts of a network; such entities often act as bridges between sub-fields of a research area.
  • PageRank: PageRank is the most widely used method to find the importance of an entity. An entity with a small degree centrality can still profoundly affect the network if it influences highly connected entities; PageRank takes this into account.
  • Eigenvector centrality: Eigenvector centrality is another metric measuring influence on the overall network. The position of a node determines its overall influence: a node connected to many highly connected nodes influences a large part of the network. This forms the basic idea behind eigenvector centrality, which is therefore high for nodes connected to many highly connected nodes.
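The sketch below computes all four metrics on a toy citation graph with the networkx library; the node names and edges are hypothetical placeholders:

```python
import networkx as nx

# Toy directed citation graph: an edge (u, v) means "u cites v".
G = nx.DiGraph([("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
                ("P4", "P3"), ("P4", "P5"), ("P5", "P3")])

degree = nx.degree_centrality(G)                      # normalized connection count
betweenness = nx.betweenness_centrality(G)            # bridging role in the network
pagerank = nx.pagerank(G)                             # influence via influential neighbours
eigen = nx.eigenvector_centrality(G.to_undirected())  # position-based influence

for node in G:
    print(node, round(degree[node], 2), round(pagerank[node], 2),
          round(betweenness[node], 2), round(eigen[node], 2))
```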
Clustering is also an essential tool that can enrich the bibliometric assessment. Clustering can be used to identify the various sub-themes. By observing the clusters created, one can visualize the major underlying themes and how they have evolved over the period. In this study, VOSViewer [26] and Gephi [27] have been used for visualization, network analysis, and clustering. Microsoft Excel is used for basic visualization like chart plotting. VOSViewer is a popular visualization tool for bibliometric data. Gephi, on the other hand, is a tool popularly used for network analysis. It can also be used for visualization. In this study, networks are constructed using VOSViewer, and Gephi has been used to perform the network analysis.

3.2.1. Citation Analysis

Citation analysis is the most basic analysis for science mapping. Publications are linked based on citation linkage, i.e., if one publication cites another, there is a link from the first publication to the second; in this way, a network is formed. The citation thus becomes the basis for an intellectual relationship between publications.
Citation analysis is done on both databases separately; Table 5 describes the strategy for this analysis. From SCOPUS, 1009 publications out of 2601, and from WOS, 494 out of 1858, fulfill the minimum criterion of 5 citations. These publications are used for network formation, and the largest connected component is considered for the network analysis. The visualization is then created using Gephi's Fruchterman-Reingold layout: nodes are clustered by citation similarity and colored by cluster. Figure 10 and Figure 11 show the citation networks for SCOPUS and WOS, respectively.
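A hedged sketch of this filtering-and-extraction step, assuming a hypothetical list of (citing, cited) pairs; the citation threshold is lowered to 2 so the toy example stays non-trivial (the study uses 5):

```python
import networkx as nx
from collections import Counter

citations = [("A", "B"), ("C", "B"), ("B", "C"), ("D", "C"), ("A", "C"), ("E", "F")]
MIN_CITATIONS = 2  # 5 in the study

# Keep only publications that meet the minimum citation count.
counts = Counter(cited for _, cited in citations)
keep = {p for p, c in counts.items() if c >= MIN_CITATIONS}

# Induce the citation network on those publications and take the
# largest (weakly) connected component for analysis.
G = nx.DiGraph(citations).subgraph(keep).copy()
giant = G.subgraph(max(nx.weakly_connected_components(G), key=len)).copy()
print(giant.nodes(), giant.edges())
```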
The network thus created is analyzed for measures such as PageRank, eigenvector centrality, and betweenness centrality. Table 6 and Table 7 highlight the important nodes with respect to these measures. Survey papers always have a major influence on a citation network, underlined by the fact that 10 publications from Table 6 and Table 7 are surveys. While Liu et al. [28] review general QA approaches, IR-based and deep-learning-based approaches are reviewed by Kolomiyets and Moens [29] and Huang [30], respectively. Hoeffner et al. [23], Shah et al. [31], Athenikos and Han [32], Srba and Bielikova [33], Lopez et al. [34], Wang et al. [35], and Dimitrakis et al. [36] review the efforts taken by researchers in various subdomains.

3.2.2. Co-Citation Analysis

Co-citation is a fundamental technique for identifying the various themes in a research field. In a co-citation network, two publications are connected if they are cited by a third paper. Publications appearing together in reference sections are therefore thematically similar, enabling one to identify seminal references and underlying themes. The main disadvantage of co-citation analysis is its strong bias toward publications with more citations, thereby ignoring the latest or niche publications (Donthu et al. [10]).
Co-citation analysis is also done on both datasets separately, on the cited references. The full counting method from VOSViewer has been used to generate the network map; Table 8 describes the co-citation strategy. The threshold for considering a reference was set to a minimum of 5 citations: 407 out of 56,134 references from SCOPUS and 1125 out of 27,426 from WOS met this criterion. It is difficult to visualize the network correctly with so many nodes and edges; hence, for visualization, we show a smaller network formed with a minimum citation criterion of 25, while the complete network is used for the network analysis. The smaller networks for SCOPUS and WOS are shown in Figure 12 and Figure 13, respectively. These visualizations are created using LinLog/modularity normalization in the VOSViewer tool.
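To illustrate how co-citation links arise, the sketch below counts co-cited reference pairs from hypothetical per-paper reference lists; every citing paper whose bibliography contains both references contributes one co-citation:

```python
from collections import Counter
from itertools import combinations

reference_lists = {  # citing paper -> its references (hypothetical)
    "paper1": ["R1", "R2", "R3"],
    "paper2": ["R1", "R3"],
    "paper3": ["R2", "R3"],
}

cocitations = Counter()
for refs in reference_lists.values():
    for a, b in combinations(sorted(set(refs)), 2):
        cocitations[(a, b)] += 1  # undirected pair, counted once per citing paper

print(cocitations.most_common(2))  # e.g. (('R1', 'R3'), 2), (('R2', 'R3'), 2)
```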
PageRank, eigenvector centrality, and betweenness centrality were calculated through network analysis using Gephi. Seminal references ranked by these measures are listed in Table 9 and Table 10. Most of the papers in these lists are highly appreciated publications and foundational papers in their respective sub-themes.

3.2.3. Co-Word Analysis

While co-citation helps uncover the intellectual structure of the domain, it reveals far less about publication content. Keywords, on the other hand, carry scientific concepts, ideas, and knowledge; hence, co-word analysis is an important technique for identifying research trends, and it is worthwhile to look at the keywords used in the collected documents. Unfortunately, inconsistencies creep into the keywords in the form of synonyms, plurals, abbreviations, and inconsistent punctuation. A few examples of this inconsistency are given in Table 11.
These similar keywords are clustered by applying the n-gram fingerprint key-collision and nearest-neighbor algorithms (as in Section 2.2). The clusters are then manually inspected, and synonyms are replaced by one consistent representation. In total, 3697 unique keywords were identified from the merged dataset. The top 10 keywords are given in Figure 14. As expected, question answering and natural language processing are the two keywords most used by authors. Several studies have also used information retrieval, deep learning, machine learning, knowledge base, community question answering, and knowledge graph as keywords.
As discussed in Section 3.1.1, research in the QA domain gathered momentum in the 2000s, and the last 5–6 years are critically important. To further understand the undercurrent of the research trend, this duration is split into time frame 1: 2001–2016 and time frame 2: 2017–2021 for further analysis. Table 12 shows the keywords used in those time frames.
If we ignore the generic QA and NLP keywords, the other important keywords from timeframe 1 are information retrieval, ontology, passage retrieval, and semantic web, while those from timeframe 2 are deep learning, knowledge graph, and convolutional neural network. This usage shows that efforts have now shifted toward neural network-based and knowledge graph-based solutions. The same is also evident from Figure 15, where the yellow cluster represents recent trends in keyword usage.
Co-occurrence analysis is performed on author keywords using VOSViewer. Only keywords occurring more than five times are considered for the analysis; 178 keywords out of 3697 met this criterion. The top 15 keywords for question-answering systems are listed in Table 13.
Figure 16 is the density map of the co-occurrence graph and illustrates the crucial aspects of QA systems. It shows four prominent clusters and two small ones. The first cluster offers a generalized view of the QA system. The second cluster includes deep learning-based solutions, including the latest SOTA attention-based models and RNNs. The third cluster covers a very important subset of QA systems, i.e., Community Question Answer (CQA) systems. The fourth cluster is about ontology- and semantics-related work. The fifth and sixth clusters are less significant. Table 14 lists the top 10 keywords from each cluster.

4. Qualitative Analysis

This section focuses on the qualitative survey of the QA domain. Table 12 shows the recent shift in QA system approaches. Since the WWW has matured and now stores information in a more structured way, researchers are increasingly inclined toward Knowledge Base/Graph-based QA systems; this is also evident from Section 3.1.2. Hence, we consider systems from this sub-domain for the qualitative analysis. The section is organized as follows: Section 4.1 paints a general picture of the QA domain, while Section 4.2 focuses on KB-based QA systems (KBQA). Graph Neural Networks (GNNs) have recently shown better results in processing data represented as graphs; hence, Section 4.3 explores GNN-based solutions for KBQA.

4.1. General Approaches

One of the first well-known efforts in the QA domain was BASEBALL, developed by Green Jr. and Laughery [69]. QA systems have changed drastically since then. We can broadly categorize the attempts into three important classes: the linguistic approach, the statistical approach, and the pattern matching approach. The linguistic approach attempts to understand natural language text by deploying techniques such as tokenization, POS tagging, and parsing; a few notable efforts include Green Jr. and Laughery [69], M. et al. [70], Clark et al. [71], Mishra et al. [72], Bobrow et al. [73], and Xiaoyan et al. [74]. Due to the availability of huge amounts of information, the use of statistical methods for QAS has increased recently. Kim et al. [75], Mansouri et al. [76], Liu and Peng [77], Moschitti [78], Zhang and Zhao [79], and Quarteroni and Manandhar [80] use the SVM statistical method for question classification, for identifying word features, or as a text classifier. Another statistical method, n-gram mining, is used in Soricut and Brill [81] and Berger et al. [82] to form chunks from a question. Cai et al. [83] use a sentence similarity model for a web-based Chinese QA system with answer validation, while a maximum entropy model for question/answer classification is used in Ittycheriah et al. [84] along with various n-gram and bag-of-words features.
Surface-based pattern matching techniques are mostly used for factoid questions, while closed-domain questions are answered using template-based pattern matching techniques. Molla and Vicedo [85], Ravichandran and Hovy [86], Cui et al. [87], and Du et al. [88] are a few surface-based pattern matching techniques where patterns are either handcrafted or automatically detected. Template-based techniques are also widely used, e.g., RDF in Unger et al. [89], Zhang and Zou [90], and To and Reformat [91]; FAQ in Burke et al. [44], Liu et al. [92], and Otsuka et al. [93]; and SPARQL in Cocco et al. [94] and Hu et al. [95]. Beyond these, a few hybrid approaches combine more than one technique: Kwok et al. [96] integrate linguistic techniques and pattern matching, Wang et al. [3] use a rule-based approach with an SVM, and surface pattern matching and entropy are combined in Xia et al. [97].

4.2. Knowledge Base-Based Approaches

Diefenbach et al. [98] divide the QA process into five tasks: question analysis, phrase mapping, disambiguation, query construction, and querying distributed knowledge. Question analysis deals with syntactic features and extracts information about the question. In phrase mapping, each entity identified during question analysis is mapped to the highest-probability corresponding resource in the KG. Disambiguation selects the right resources for the entities from phrase mapping. Query construction deals with building a SPARQL query that can be used to retrieve information from the KG. Sometimes information must be retrieved from more than one KB; the distributed-knowledge task comprises the techniques for this. Critical phases/sub-tasks of each task are explained in Figure 17. Not all QA systems classify the tasks this cleanly; however, every QA system must perform them.
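As a concrete illustration of the query construction and querying tasks, the following hedged sketch builds a SPARQL query for a question such as "Who wrote Hamlet?" and runs it against the public DBpedia endpoint; the entity and predicate choices are illustrative, not taken from any surveyed system:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?author WHERE {
        <http://dbpedia.org/resource/Hamlet> dbo:author ?author .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["author"]["value"])  # expected: the William_Shakespeare resource
```

In a full KBQA pipeline, the resource (Hamlet) would come from phrase mapping and disambiguation, and the predicate (dbo:author) from relation detection.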

4.2.1. Initial Data Transformations

A crucial part of any QA system is understanding the question and its context. Hence, many KBQA systems perform question analysis by adopting a dependency tree or Neural Network (NN)-based classification approach. In the dependency tree approach, key information about the question is extracted from the generated dependency tree. Many KBQA systems, such as Nicula et al. [99], Le et al. [100], and Shin et al. [101], use standard dependency tree generation methods. Phrase structure grammar [102] and feature-based grammar [103] have been used to generate the dependency tree, but the most prominent methods use parsing tools such as TALN [104] and the Stanford Parser [105,106,107]. Hu et al. [95] propose the Valuable Dependency Parser, which uses the Stanford Parser for initial parsing and then prioritizes a few tags for query generation. While a dependency parser gives the relations between words, constituency parsing uses context-free grammar and divides statements into sub-phrases; Zhu et al. [108] generate a constituency tree, which is then matched against the KB using a graph traversal technique. NN-based methods [109,110,111] are also used for question or entity classification, but due to the many existing question/entity types, these methods perform sub-optimally.
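For illustration, a minimal dependency parse of a question using spaCy, as a stand-in for parsing tools such as the Stanford Parser mentioned above (requires the en_core_web_sm model to be installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Which river flows through Pune?")

for token in doc:
    # token text, its dependency relation, and its syntactic head
    print(f"{token.text:<8} {token.dep_:<8} {token.head.text}")
```

From such a parse, a KBQA system can read off the question word, the main predicate ("flows through"), and the anchor entity ("Pune").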
Entity linking is another critical task in a QA system. It consists of two subtasks, viz., Named Entity Recognition (NER) and disambiguation of the extracted entities to the correct entities in the KB. Apart from classical NER methods, probabilistic graphical methods such as the Maximum Entropy Markov Model (MEMM) and Conditional Random Field (CRF) are also very popular for NER. Chen et al. [112] use a two-stage MEMM for NER, whereas Bach et al. [113], Hu et al. [114], Wu et al. [115], and Sui [116] use the CRF method and various RNN models. Hu et al. [114] use a BiLSTM to capture context while a CRF layer generates the probability distribution over tag sequences. Bach et al. [113] propose a three-stage NER model: the first stage is a CNN that captures word-level representations, a BiLSTM in the second stage captures sentence-level representations, and these representations are fed to a third-stage CRF-based inference layer for named entity detection. However, the BERT-BiLSTM-CRF-based NER method proposed in Sui [116] achieves state-of-the-art results.
To map the extracted entities to entities in the KB, many QA systems use existing tools such as DBpedia Spotlight [106,109,110,117,118,119,120] and S-MART [114]. Similarity measures such as Jaro-Winkler [101], UMBC+LSA [104], and Siamese LSTM [121] have also been used for mapping entities. A few QA systems use approaches based on NNs [122], hierarchical RNNs [16], BiLSTMs [123], and BERT [124,125].
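A minimal sketch of similarity-based entity mapping, using the standard library's difflib as a stand-in for measures such as Jaro-Winkler; the KB labels are hypothetical:

```python
from difflib import SequenceMatcher

kb_labels = ["Barack Obama", "Michelle Obama", "Obama (Fukui)"]

def link(mention: str, candidates: list[str]) -> str:
    # Return the KB label with the highest surface-similarity ratio.
    return max(candidates,
               key=lambda c: SequenceMatcher(None, mention.lower(), c.lower()).ratio())

print(link("barak obama", kb_labels))  # -> "Barack Obama"
```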

4.2.2. Architectural Classification

Almost all QA systems can be classified into four architectures, viz., Semantic Parsing-based (Figure 18), Subgraph Matching-based (Figure 19), Template-based (Figure 20), and Information Extraction (IE)-based (Figure 21). These architectures are discussed one by one below.
  • Semantic Parsing-Based Methods:
A semantic parsing-based method maps a natural language question to a logical form query, which can then be executed against the knowledge base to retrieve the answer. The dependency tree and question analysis produced in the initial data transformation help generate the query. Semankey [126] generates multiple trees for each combination of extracted entities by considering the underlying ontology and using a BFS-based algorithm, starting the BFS from the main entity of interest. Each tree is then converted into a SPARQL query and executed against the KB. Instead of generating multiple trees, the method proposed by Maheshwari et al. [127] generates multiple candidate paths starting from the main entity mapped in the KB. These paths, along with the question, are encoded using a BiLSTM and ranked by the dot product of the question encoding vector and the path encoding vector. Alternatively, in the method proposed by Zafar et al. [128], latent representations of the candidate paths and question are obtained using a Tree-LSTM, and a similarity function ranks these representations.
A Generator-Reranker architecture is proposed by Inan et al. [129]: the generator produces a list of potential candidates, and the reranker orders them based on the similarity between each candidate and the input sentence. Lu et al. [130] propose a recall-oriented information extraction method in which query generation is viewed as a linear programming problem and solved using a Steiner tree. There are many other approaches to query generation, such as the use of DAGs [131], Hidden Markov Models [132], Siamese neural networks [109], and RNNs [133].
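A hedged PyTorch sketch of the BiLSTM dot-product ranking idea described above: the question and each candidate path are encoded with a shared BiLSTM, and the pair is scored by a dot product. Vocabulary size, dimensions, and inputs are illustrative:

```python
import torch
import torch.nn as nn

class PathRanker(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.encoder(self.embed(token_ids))
        return torch.cat([h[-2], h[-1]], dim=-1)  # final forward + backward states

    def forward(self, question_ids, path_ids):
        # Dot product of the two encodings; higher score = better match.
        return (self.encode(question_ids) * self.encode(path_ids)).sum(-1)

ranker = PathRanker(vocab_size=1000)
question = torch.randint(0, 1000, (1, 7))  # one tokenized question
path = torch.randint(0, 1000, (1, 4))      # one tokenized candidate path
print(ranker(question, path))
```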
The semantic pipeline for KBQA seems natural and intuitive; however, looking at the current research and publication trends, we can conclude that this type of solution is already maturing, and other types of solutions should be explored for future research.
  • Subgraph Matching-Based Methods:
Executing a logical form query amounts to finding a subgraph. Hence, the subgraph matching-based method performs the same initial data transformation as the semantic parsing method but, instead of generating a formal query, compiles a query graph and models answering as a subgraph-finding problem.
The majority of approaches [105,106,119] extract several triplet patterns (Es, R, Ed), where Es and Ed are the source and destination entities and R is their relation. These triplets are identified using the initial transformations discussed earlier. Jin et al. [105] then retrieve candidate subgraphs from the KB and use semantic similarity to evaluate the match between the triplets and each retrieved subgraph. Bakhshi et al. [119] propose a novel data-driven graph similarity framework, where disambiguation is handled with the help of user input; the QA problem is then remodeled as Integer Linear Programming (ILP) and solved using the proposed golden graph alignment method and standard optimizers. The method in Li et al. [106], on the other hand, uses the extracted triplets to build a query graph and resolves ambiguities by finding isomorphic graphs of this query graph in the KB, computing semantic similarity between the vector of an edge in the query graph and a path in the KB.
Hu et al. [114] use a rule-based approach to generate a semantic query graph (SQG) using the state-transition paradigm. First, entities are recognized using a BiLSTM and CRF-based model; these recognized entities then undergo four state-transition operations, Connect, Merge, Expand, and Fold, to generate the SQG. While the method in Hu et al. [114] generates an SQG, Hu et al. [107] generate a dependency tree and Zhu et al. [108] a constituency tree. The generated tree/SQG is then located in the KB using a graph traversal technique and a path ranking method.
  • Template-Based Methods:
In template-based methods, the initial data transformation yields a template for the question, which is then matched against templates stored in a repository. These templates are essentially pseudo-queries with slots. The best matching template is selected and instantiated using statistical methods to produce the SPARQL query [89]. Generating the template repository and extracting the query structure from the question are the two essential tasks in template-based systems.
Ever since this type of solution was proposed in Unger et al. [89], researchers have come up with many template-based QA systems. Most approaches use handcrafted rules to extract the query structure of the question [120,134] and hand-crafted templates [115,134].
Vollmers [135] proposes TeBaQA, an isomorphic graph-based approach in which basic graph patterns of SPARQL queries are maintained in the repository; using an ML classifier, natural language questions are then classified into these isomorphic basic graph patterns (i.e., templates). To select the appropriate template from a database, Wu et al. [115] identify the question type using an LSTM and use a BiLSTM+CRF model as an NER tool to extract the entities, which are then used to instantiate the template. Apart from neural network methods, a few solutions, such as Abujabal et al. [18] and To and Reformat [91], involve the dependency tree generated from the question: an isomorphic tree of this dependency tree is selected from the template repository, and the SPARQL query is generated.
Though this type of solution offers good results, we found very few such systems. One possible reason is that they depend too heavily on the collection of templates. At the same time, researchers are also moving toward more end-to-end approaches involving newer deep learning and transfer learning solutions, discussed in the next section.
  • Information Extraction-Based Methods:
In IE-based architectures, the question and the KB subgraph are encoded into a common embedding space using a machine learning approach and matched directly using a predefined scoring function. To capture more accurate features for encoding, initial data transformations are sometimes performed.
As discussed in earlier sections, there are many ML solutions for the initial transformation tasks; similarly, there are many end-to-end solutions for QA tasks. These approaches usually need no hand-crafted features or expert-designed rules and can scale better to large and complex KBs. The two main DNN architectures used are CNNs and RNNs. Dai et al. [136] and Yin et al. [137] propose end-to-end neural network models for answering factoid questions, whereas Golub and He [138] use a character-level encoder-decoder framework, a Seq2Seq LSTM model, for the same task. Lukovnikov et al. [12] also use a neural network to answer simple questions end-to-end, leaving all decisions to the model. Wang et al. [139,140] and Xie et al. [141] are a few methods that use CNNs to obtain the embedding/vector representations.
Recently, however, researchers have started using hybrid ML solutions where individual subtasks are solved with appropriate ML models. Budiharto et al. [142] use two encoders, one RNN-based and one CNN-based, whose outputs form the hidden vectors; Bidirectional Attention Flow (BiDAF) is then used to match the hidden vector of the KB with that of the question. Song et al. [143] propose a method in which semantic features are learned using an LSTM, with an attention mechanism added to focus on particular parts of the answer when generating the embeddings. Qu et al. [144] train three separate models: an Entity Alignment Model (a simple NN), an Object Answering Model (an SVM), and a Prediction-Verification Model (an encoder-decoder model). In Luo et al. [145], the query structure is encoded into a uniform vector representation to capture the interactions between individual semantic components of the question.
Instead of finding vector representations directly, performing the initial transformation to better understand the question can yield better representations. With this aim, Tong et al. [146] generate a dependency tree and propose a tree-structured LSTM model that accepts tree-structured input to obtain the embeddings.
Due to limited data availability, these LSTM-based approaches tend to overfit. Hence, the use of Bidirectional Encoder Representations from Transformers (BERT) is explored in Luo et al. [124], Lukovnikov et al. [147], and Panchbhai et al. [148]. One can also use separate BERT models for different tasks, such as question expected answer type (Q-EAT) and answer type (AT) classification models and a question answering model, as suggested by Day and Kuo [149], with each BERT model fine-tuned for its respective task.
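For illustration, extractive QA with a fine-tuned BERT-style model is a one-liner with the Hugging Face pipeline API; this generic sketch uses the library's default model and an illustrative context, and is not the multi-model setup of Day and Kuo [149]:

```python
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default fine-tuned model
result = qa(
    question="Who proposed the five-task division of the KBQA process?",
    context="Diefenbach et al. divide the KBQA process into five tasks.",
)
print(result["answer"], result["score"])  # extracted span and its confidence
```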
Thus, models generating vector representations avoid semantic pipelines. Given the availability of many DNN approaches and the possibilities of combining them in various ways, many such approaches are reported in the literature. LSTM-based approaches combined with attention mechanisms have reported promising results and can be explored further, while pre-trained model-based approaches are still nascent.

4.3. GNN-Based Approaches

Previous efforts for QA systems in multi-hop settings were based on RNNs, but the DAG structures used in RNNs cannot represent rich inference. Cao et al. [150] and Song et al. [151] are among the first attempts to use GNNs for this purpose. In Cao et al. [150], entities mentioned in multiple documents are identified as nodes, and edges represent relations between the entities (either within or across documents). The graph thus formed is then processed using GCNs to perform multi-hop reasoning.
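A hedged sketch of one graph-convolution (message-passing) step of the kind GCNs apply over such entity graphs: each node averages its neighbours' features (including its own, via self-loops) and applies a learned projection. Shapes and data are illustrative:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) adjacency with self-loops; h: (N, in_dim) node features.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.proj(adj @ h / deg))  # mean-aggregate, then project

adj = torch.eye(4) + torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0],
                                   [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
h = torch.randn(4, 8)                 # 4 entity nodes with 8-dim features
print(GCNLayer(8, 16)(adj, h).shape)  # -> torch.Size([4, 16])
```

Stacking k such layers propagates information k hops, which is what enables multi-hop reasoning over the entity graph.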
Instead of using a single co-reference edge, Song et al. [151] propose two additional edge types: ‘same’ and ‘window.’ The ‘same’ edge connects multiple mentions of an entity appearing far apart, which helps capture global information, while the ‘window’ edge connects entities mentioned within a fixed window. A GCN and a graph recurrent network are then used to demonstrate the usefulness of the resulting graph.
The Bi-directional Attention Entity Graph Convolutional Network (BAG) method for multi-hop QA, proposed in Cao et al. [152], generates the graph by extracting entities and relationships from multiple documents. Relation-aware representations for the identified entity nodes are acquired from a Relational Graph Convolutional Network (R-GCN). BAG then uses a bi-directional attention mechanism to generate a more meaningful relationship between the query and the graph.
Tu et al. [153] introduce a Heterogeneous Document-Entity (HDE) graph to represent the knowledge obtained from multiple documents. Instead of using nodes of a single type (as in other KG approaches), heterogeneous graphs contain nodes of various types, such as candidate, entity, and document nodes, related to each other by different types of edges, which yields a more structured representation. To demonstrate the effectiveness of HDE, the authors use a Graph Neural Network (GNN) with a message-passing algorithm.
Xiao [154] argues that instead of using the same static graph for every query, one should use a dynamic graph built explicitly for the given query, and proposes a novel method, the Dynamically Fused Graph Network (DFGN), for this purpose. Graph generation in DFGN starts with the main entity of the given query, and the graph is constructed by connecting related entities around this start entity. This process continues until a probable answer is discovered.
The three main challenges of comprehending text for QA are reasoning ability, explainability, and scalability (Ding et al. [7]). To tackle these issues, Ding et al. [7] propose a method mimicking the human cognitive process. The proposed method, CogQA, has two modules that build a cognitive graph. The first module performs implicit extraction: it extracts the meaningful entities of questions and probable answers and forms the graph in working memory. The second module performs explicit reasoning over the constructed graph and gathers important hints for module one, which uses them to extract next-hop entities. This iterative process continues until all probable answers are found; the final answer depends on the second module's reasoning in the final round. The authors propose a BERT-based extraction module and a GNN-based reasoning module.
Vakulenko et al. [155] propose QAmp, another approach for complex QA. Here, important entities and relations are identified and mapped to the relevant portion of the graph. The approach uses unsupervised message passing to propagate confidence values across the graph; these scores lead to the correct answer entities and are finally aggregated depending on the type of question.
Due to the flexibility of natural language, questions are highly unstructured, whereas knowledge graphs maintain information in a very structured way, so a means of converting an unstructured sentence into a structured query is needed to process the information in a KG. Ambiguity in the question itself also leads to wrong answers, and many disambiguation models, such as Hu et al. [95], Xiong et al. [156], Zheng et al. [157], and Zhu and Iglesias [158], have been proposed. Ambiguity can be reduced by asking the asker a few questions; hence, Zheng et al. [157] propose a novel interactive method that allows users to query the KG in natural language. While Xiong et al. [156] and Zhu and Iglesias [158] use semantic tools for disambiguation, the NLQSK framework proposed in Hu et al. [95] generates the top-k SPARQL statements for a given question, uses keyword search to fetch neighboring information for unmapped entities, and finally combines the two to produce the answer.

5. Summary

To explore the complete domain space of QAS, publications were extracted from the WOS and SCOPUS databases. Both sets of extracted publications were merged by considering the MCA, and 4459 publications were retained overall. Inconsistencies were then removed using the n-gram fingerprint key-collision algorithm. Performance analysis is done on this merged dataset, whereas science mapping is done individually on the WOS and SCOPUS data. The research objectives set out in Section 1 are addressed by the following important findings.
  • Even though publications in the QAS domain are from the 1960s, 50% are from the last six years, indicating that QA is attracting many researchers nowadays.
  • Penas A, Nakov P, Moschitti A, and Lehmann J are influential contributors with more than 15 publications each. China is the most productive country and, along with the USA, India, and Germany, contributes more than 50% of the overall publications.
  • We have also identified the highly cited publications in the last 5 years from both SCOPUS and WOS (listed in Table 3 and Table 4).
  • Using network analysis measures like Page rank, Eigen centrality, and Betweenness centrality on the citation graph, we have identified the most influential publications (listed in Table 6 and Table 7).
  • Co-citation analysis highlighted the QA domain’s seminal and foundational publications (listed in Table 9 and Table 10).
  • Co-occurrence analysis helped us to identify the four major sub-domains of QAS. This analysis also helped us conclude that neural network and knowledge base-based solutions are recent research trends.
  • This study also gave a summary of important methods in KB-based solutions. We classified the approaches into four important classes and discussed the various approaches to perform the sub-tasks in each class.
  • Finally, we have also discussed the approaches belonging to one of the most upcoming and promising areas, i.e., GNN-based approaches.

6. Conclusions

A bibliometric analysis of QAS has been performed along with a literature review of two important subdomains. This study highlights the important constituents of the QAS domain along with its seminal and foundational publications. We have also observed that neural network-based and knowledge graph-based solutions are the current research trend.
Of the four architecture types of KB-based solutions, the semantic parsing-based approach is already maturing and provides less scope for novel future work. Subgraph-based and template-based approaches show promising results but, due to their heavy dependence on hand-crafted features, have not been explored completely. The Information Extraction-based approach, however, provides an end-to-end solution for all sub-tasks; hence, most research is moving in this direction.

Author Contributions

Conceptualization, B.Z. and S.M.; Formal analysis, B.Z. and R.V.B.; Investigation, B.Z. and R.V.B.; Methodology, B.Z. and S.M.; Project administration, S.M., K.S., D.R.V. and K.K.; Software, B.Z., S.M. and R.V.B.; Supervision, S.M., K.S., D.R.V. and K.K.; Validation, B.Z., S.M., K.S., D.R.V., K.K. and R.V.B.; Visualization, B.Z. and R.V.B.; Writing—original draft, B.Z.; Writing—review & editing, S.M., K.S., D.R.V., K.K. and R.V.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hermann, K.M.; Kočiský, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching Machines to Read and Comprehend. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2015; Volume 1, pp. 1693–1701.
  2. Rajpurkar, P.; Jia, R.; Liang, P. Know What You Don’t Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 2, pp. 784–789.
  3. Wang, W.; Yan, M.; Wu, C. Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1, pp. 1705–1714.
  4. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
  5. Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv 2016, arXiv:1606.05250.
  6. Tu, M.; Wang, G.; Huang, J.; Tang, Y.; He, X.; Zhou, B. Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2704–2713.
  7. Ding, M.; Zhou, C.; Chen, Q.; Yang, H.; Tang, J. Cognitive Graph for Multi-Hop Reading Comprehension at Scale. arXiv 2019, arXiv:1905.05460.
  8. Bidwe, R.; Mishra, S.; Patil, S.; Shaw, K.; Vora, D.; Kotecha, K.; Zope, B. Deep Learning Approaches for Video Compression: A Bibliometric Analysis. Big Data Cogn. Comput. 2022, 6, 44.
  9. Blooma, M.; Chua, A.; Goh, D.L.; Keong, L. A Trend Analysis of the Question Answering Domain. In Proceedings of the 2009 Sixth International Conference on Information Technology: New Generations, Las Vegas, NV, USA, 27–29 April 2009; Volumes 1–3, pp. 1522–1527.
  10. Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
  11. Wang, W.; Yang, N.; Wei, F.; Chang, B.; Zhou, M. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the ACL 2017—55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 189–198.
  12. Lukovnikov, D.; Fischer, A.; Lehmann, J.; Auer, S. Neural network-based question answering over knowledge graphs on word and character level. In Proceedings of the 26th International World Wide Web Conference, Perth, Australia, 3–7 April 2017; pp. 1211–1220.
  13. Hao, Y. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 221–231.
  14. Yang, Z. HotpotQA: A Dataset for Diverse Explainable Multi-hop Question Answering. arXiv 2018, arXiv:1809.09600.
  15. Xiong, C.; Zhong, V.; Socher, R. Dynamic Coattention Networks For Question Answering. arXiv 2017, arXiv:1611.01604.
  16. Yu, M.; Yin, W.; Hasan, K.S.; Santos, C.D.; Xiang, B.; Zhou, B. Improved Neural Relation Detection for Knowledge Base Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 571–581.
  17. Huang, X.; Zhang, J.; Li, D.; Li, P. Knowledge Graph Embedding-Based Question Answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 105–113.
  18. Abujabal, A.; Riedewald, M.; Yahya, M.; Weikum, G. Automated template generation for question answering over knowledge graphs. In Proceedings of the 26th International World Wide Web Conference, Perth, Australia, 3–7 April 2017; pp. 1191–1200.
  19. Wang, S. R3: Reinforced ranker-reader for open-domain question answering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5981–5988.
  20. Khot, T.; Sabharwal, A.; Clark, P. Scitail: A textual entailment dataset from science question answering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5189–5197.
  21. Das, A.; Datta, S.; Gkioxari, G.; Lee, S.; Parikh, D.; Batra, D. Embodied question answering. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2135–2144.
  22. Cui, W.; Xiao, Y.; Wang, H.; Song, Y.; Hwang, S.; Wang, W. KBQA: Learning Question Answering over QA Corpora and Knowledge Bases. Proc. VLDB Endow. 2017, 10, 565–576.
  23. Hoeffner, K.; Walter, S.; Marx, E.; Usbeck, R.; Lehmann, J.; Ngomo, A.C. Survey on challenges of Question Answering in the Semantic Web. Semant. Web 2017, 8, 895–920.
  24. Neshati, M.; Fallahnejad, Z.; Beigy, H. On dynamicity of expert finding in community question answering. Inf. Process. Manag. 2017, 53, 1026–1042.
  25. Esposito, M.; Damiano, E.; Minutolo, A.; Pietro, G.; Fujita, H. Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Inf. Sci. 2020, 514, 88–105.
  26. van Eck, N.J.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538.
  27. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An Open Source Software for Exploring and Manipulating Networks. In Proceedings of the International AAAI Conference on Weblogs and Social Media, San Jose, CA, USA, 17–20 May 2009.
  28. Liu, Y.; Yi, X.; Chen, R.; Song, Y. A Survey on Frameworks and Methods of Question Answering. In Proceedings of the 2016 3rd International Conference On Information Science And Control Engineering, Beijing, China, 8–10 July 2016; pp. 115–119.
  29. Kolomiyets, O.; Moens, M.F. A survey on question answering technology from an information retrieval perspective. Inf. Sci. 2011, 181, 5412–5434.
  30. Huang, Z. Recent Trends in Deep Learning-Based Open-Domain Textual Question Answering Systems. IEEE Access 2020, 8, 94341–94356.
  31. Shah, A.; Ravana, S.; Hamid, S.; Ismail, M. Accuracy evaluation of methods and techniques in Web-based question answering systems: A survey. Knowl. Inf. Syst. 2019, 58, 611–650.
  32. Athenikos, S.; Han, H. Biomedical question answering: A survey. Comput. Methods Programs Biomed. 2010, 99, 1–24.
  33. Srba, I.; Bielikova, M. A Comprehensive Survey and Classification of Approaches for Community Question Answering. ACM Trans. Web 2016, 10, 1–63.
  34. Lopez, V.; Uren, V.; Sabou, M.; Motta, E. Is Question Answering fit for the Semantic Web?: A survey. Semant. Web 2011, 2, 125–155.
  35. Wang, X.; Huang, C.; Yao, L.; Benatallah, B.; Dong, M. A Survey on Expert Recommendation in Community Question Answering. J. Comput. Sci. Technol. 2018, 33, 625–653.
  36. Dimitrakis, E.; Sgontzos, K.; Tzitzikas, Y. A survey on question answering systems over linked data and documents. J. Intell. Inf. Syst. 2020, 55, 233–259.
  37. Hirschman, L.; Gaizauskas, R. Natural Language Question Answering: The View from Here. Nat. Lang. Eng. 2001, 7, 275–300.
  38. Toba, H.; Ming, Z.Y.; Adriani, M.; Chua, T.S. Discovering high quality answers in community question answering archives using a hierarchy of classifiers. Inf. Sci. 2014, 261, 101–115.
  39. Lopez, V.; Uren, V.; Motta, E.; Pasin, M. AquaLog: An ontology-driven question answering system for organizational semantic intranets. J. Web Semant. 2007, 5, 72–105.
  40. Zhao, Z.; Zhang, L.; He, X.; Ng, W. Expert Finding for Question Answering via Graph Regularized Matrix Completion. IEEE Trans. Knowl. Data Eng. 2015, 27, 993–1004.
  41. Kwok, C.; Etzioni, O.; Weld, D. Scaling question answering to the web. ACM Trans. Inf. Syst. 2001, 19, 242–262.
  42. Khodadi, I.; Abadeh, M. Genetic programming-based feature learning for question answering. Inf. Process. Manag. 2016, 52, 340–357.
  43. Nguyen, D.; Nguyen, D.; Pham, S. Ripple Down Rules for Question Answering. Semant. Web 2017, 8, 511–532.
  44. Burke, R.; Hammond, K.; Kulyukin, V.; Lytinen, S.; Tomuro, N.; Schoenberg, S. Question answering from frequently asked question files: Experiences with the FAQ FINDER system. AI Mag. 1997, 18, 57–66.
  45. Soricut, R.; Brill, E. Automatic question answering: Beyond the factoid. In Proceedings of the Hlt-Naacl 2004: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, MA, USA, 2–7 May 2004; pp. 57–64.
  46. Dong, L.; Wei, F.; Zhou, M.; Xu, K. Question answering over freebase with multi-column convolutional neural networks. In Proceedings of the ACL-IJCNLP 2015—53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 1, pp. 260–269.
  45. Soricut, R.; Brill, E. Automatic question answering: Beyond the factoid. In Proceedings of the Hlt-Naacl 2004: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, MA, USA, 2–7 May 2004; pp. 57–64. [Google Scholar]
  46. Dong, L.; Wei, F.; Zhou, M.; Xu, K. Question answering over freebase with multi-column convolutional neural networks. In Proceedings of the ACL-IJCNLP 2015—53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 1, pp. 260–269. [Google Scholar] [CrossRef] [Green Version]
  47. Fader, A.; Zettlemoyer, L.; Etzioni, O. Open question answering over curated and extracted knowledge bases. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 1156–1165. [Google Scholar] [CrossRef]
  48. Rodrigo, A.; Penas, A. A study about the future evaluation of Question-Answering systems. Knowl.-Based Syst. 2017, 137, 83–93. [Google Scholar] [CrossRef]
  49. Zou, L.; Huang, R.; Wang, H.; Yu, J.; He, W.; Zhao, D. Natural language question answering over RDF—A graph data driven approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 313–324. [Google Scholar] [CrossRef]
  50. Qiu, X.; Huang, X. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 1305–1311. [Google Scholar]
  51. Moldovan, D.; Pasca, M.; Harabagiu, S.; Surdeanu, M. Performance issues and error analysis in an open-domain Question Answering system. ACM Trans. Inf. Syst. 2003, 21, 133–154. [Google Scholar] [CrossRef]
  52. Pal, A.; Harper, F.; Konstan, J. Exploring question selection bias to identify experts and potential experts in community question answering. ACM Trans. Inf. Syst. 2012, 30, 1–28. [Google Scholar] [CrossRef]
  53. Figueroa, A.; Neumann, G. Category-specific models for ranking effective paraphrases in community Question Answering. Expert Syst. Appl. 2014, 41, 4730–4742. [Google Scholar] [CrossRef]
  54. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  55. Pennington, J.; Socher, R.; Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar] [CrossRef]
  56. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems—Volume 2 (NIPS’13); Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 3111–3119. [Google Scholar]
  57. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data—SIGMOD’08, Vancouver, BC, Canada, 10–12 June 2008; p. 1247. [Google Scholar] [CrossRef]
  58. Sutskever, I.; Vinyals, O.; Le, Q. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Montréal, QC, Canada, 2014; Volume 27. [Google Scholar]
  59. Seo, M.; Kembhavi, A.; Farhadi, A.; Hajishirzi, H. Bidirectional Attention Flow for Machine Comprehension. arXiv 2016, arXiv:1611.01603. [Google Scholar]
  60. Ferrucci, D. Building Watson: An Overview of the DeepQA Project. AI Mag. 2010, 31, 59. [Google Scholar] [CrossRef] [Green Version]
  61. Chen, D.; Fisch, A.; Weston, J.; Bordes, A. Reading Wikipedia to Answer Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1870–1879. [Google Scholar] [CrossRef] [Green Version]
  62. Miller, G. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  63. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. Available online: https://openai.com/blog/better-language-models/ (accessed on 27 April 2022).
  64. Pedregosa, F. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  65. Berant, J.; Chou, A.; Frostig, R.; Liang, P. Semantic parsing on freebase from question-answer pairs. In Proceedings of the EMNLP 2013—2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1533–1544. [Google Scholar]
  66. Blei, D.; Ng, A.; Jordan, M. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  67. Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit. 2014. Available online: https://nlp.stanford.edu/pubs/StanfordCoreNlp2014.pdf (accessed on 20 April 2022).
  68. Lehman, J.; Stanley, K. Revising the evolutionary computation abstraction: Minimal criteria novelty search. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation—GECCO’10, Portland, OR, USA, 7–11 July 2010; p. 103. [Google Scholar] [CrossRef]
  69. Green, B.F., Jr.; Wolf, A.K.; Chomsky, C.; Laughery, K. Baseball: An automatic question-answerer. In Proceedings of the Western Joint Computer Conference: Extending Man’s Intellect, Los Angeles, CA, USA, 9–11 May 1961; pp. 219–224. [Google Scholar] [CrossRef]
  70. Akour, M.; Abufardeh, S.; Magel, K.; Al-Radaideh, Q. QArabPro: A Rule-Based Question Answering System for Reading Comprehension Tests in Arabic. Am. J. Appl. Sci. 2011, 8, 652–661. [Google Scholar] [CrossRef]
  71. Clark, P.; Thompson, J.; Porter, B. A Knowledge-Based Approach to Question-Answering. In Proceedings of the AAAI’99 Fall Symposium on Question-Answering Systems, Orlando, FL, USA, 18–22 July 1999; pp. 43–51. [Google Scholar]
  72. Mishra, A.; Mishra, N.; Agrawal, A. Context-Aware Restricted Geographical Domain Question Answering System. In Proceedings of the 2010 International Conference on Computational Intelligence and Communication Networks, Bhopal, India, 26–28 November 2010; pp. 548–553. [Google Scholar] [CrossRef]
  73. Bobrow, D.; Kaplan, R.; Kay, M.; Norman, D.; Thompson, H.; Winograd, T. GUS, a frame-driven dialog system. Artif. Intell. 1977, 8, 155–173. [Google Scholar] [CrossRef]
  74. Xiaoyan, H.; Xiaoming, C.; Kaiying, L. A rule-based chinese question answering system for reading comprehension tests. In Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP 2007, Kaohsiung, Taiwan, 26–28 November 2007; Volume 2, pp. 325–329. [Google Scholar] [CrossRef]
  75. Kim, M.Y.; Xu, Y.; Goebel, R. Legal Question Answering Using Ranking SVM and Syntactic/Semantic Similarity. In Proceedings of the JSAI International Symposium on Artificial Intelligence, Tokyo, Japan, 23–25 November 2014; Volume 9067, pp. 244–258. [Google Scholar] [CrossRef]
  76. Mansouri, A.; Affendey, L.; Mamat, A.; Kadir, R. Semantically Factoid Question Answering Using Fuzzy SVM Named Entity Recognition. Int. Symp. Inf. Technol. 2008, 1–4, 1014–1020. [Google Scholar]
  77. Liu, X.; Peng, T. A SVM and Co-seMLP integrated method for document-based question answering. In Proceedings of the 14th International Conference on Computational Intelligence and Security, CIS 2018, Hangzhou, China, 16–19 November 2018; pp. 179–182. [Google Scholar] [CrossRef]
  78. Moschitti, A. Answer Filtering via Text Categorization in Question Answering Systems. In Proceedings of the International Conference on Tools with Artificial Intelligence, Sacramento, CA, USA, 3–5 November 2003; pp. 241–248. [Google Scholar]
  79. Zhang, K.; Zhao, J. A Chinese question-answering system with question classification and answer clustering. In Proceedings of the 2010 7th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2010, Yantai, China, 10–12 August 2010; Volume 6, pp. 2692–2696. [Google Scholar] [CrossRef]
  80. Quarteroni, S.; Manandhar, S. Designing an interactive open-domain question answering system. Nat. Lang. Eng. 2009, 15, 73–95. [Google Scholar] [CrossRef] [Green Version]
  81. Soricut, R.; Brill, E. Automatic Question Answering using the Web: Beyond the factoid. Inf. Retr. 2006, 9, 191–206. [Google Scholar] [CrossRef]
  82. Berger, A.; Caruana, R.; Cohn, D.; Freitag, D.; Mittal, V. Bridging the lexical chasm: Statistical approaches to answer-finding. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval—SIGIR, Athens, Greece, 24–28 July 2000; pp. 192–199. [Google Scholar] [CrossRef]
  83. Cai, D.; Cui, H.; Miao, X.; Zhao, C.; Ren, X. A web-based Chinese automatic question answering system. In Proceedings of the Fourth International Conference on Computer and Information Technology, Wuhan, China, 14–16 September 2004; pp. 1141–1146. [Google Scholar] [CrossRef]
  84. Ittycheriah, A.; Franz, M.; Zhu, W.J.; Ratnaparkhi, A.; Mammone, R.J. IBM’s Statistical Question Answering System. In Proceedings of the Tenth Text REtrieval Conference, Gaithersburg, MD, USA, 13–16 November 2000. [Google Scholar]
  85. Molla, D.; Vicedo, J. Question answering in restricted domains: An overview. Comput. Linguist. 2007, 33, 41–61. [Google Scholar] [CrossRef]
  86. Ravichandran, D.; Hovy, E. Learning surface text patterns for a question answering system. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 41–47. [Google Scholar]
  87. Cui, H.; Kan, M.Y.; Chua, T.S. Soft pattern matching models for definitional question answering. ACM Trans. Inf. Syst. 2007, 25, 8-es. [Google Scholar] [CrossRef]
  88. Du, Y.; Huang, X.; Li, X.; Wu, L. A Novel Pattern Learning Method for Open Domain Question Answering. In Proceedings of the Natural Language Processing—IJCNLP 2004, Hainan Island, China, 22–24 March 2004. [Google Scholar] [CrossRef]
  89. Unger, C.; Buhmann, L.; Lehmann, J.; Ngomo, A.C.; Gerber, D.; Cimiano, P. Template-based question answering over RDF data. In Proceedings of the 21st Annual Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 639–648. [Google Scholar] [CrossRef] [Green Version]
  90. Zhang, X.; Zou, L. IMPROVE-QA: An interactive mechanism for RDF question/answering systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1753–1756. [Google Scholar] [CrossRef]
  91. To, N.; Reformat, M. Question-Answering System with Linguistic Terms over RDF Knowledge Graphs. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020. [Google Scholar] [CrossRef]
  92. Liu, S.; Zhong, Y.X.; Ren, F.J. Interactive Question Answering Based on FAQ. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8208, pp. 73–84. [Google Scholar]
  93. Otsuka, A.; Nishida, K.; Bessho, K.; Asano, H.; Tomita, J. Query Expansion with Neural Question-to-Answer Translation for FAQ-based Question Answering. In Proceedings of the Companion Proceedings of the World Wide Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1063–1068. [Google Scholar] [CrossRef] [Green Version]
  94. Cocco, R.; Atzori, M.; Zaniolo, C. Machine learning of SPARQL templates for question answering over LinkedSpending. In CEUR Workshop Proceedings, Napoli, Italy, 12–14 June 2019; Volume 2400. [Google Scholar]
  95. Hu, X.; Duan, J.; Dang, D. Natural language question answering over knowledge graph: The marriage of SPARQL query and keyword search. Knowl. Inf. Syst. 2021, 63, 819–844. [Google Scholar] [CrossRef]
  96. Kwok, C.; Etzioni, O.; Weld, D. Scaling question answering to the web. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 150–161. [Google Scholar] [CrossRef]
  97. Xia, L.; Teng, Z.; Ren, F. An Integrated Approach for Question Classification in Chinese Cuisine Question Answering System. In Proceedings of the Second International Symposium on Universal Communication, Osaka, Japan, 15–16 December 2008; pp. 317–321. [Google Scholar] [CrossRef]
  98. Diefenbach, D.; Lopez, V.; Singh, K.; Maret, P. Core Techniques of Question Answering Systems over Knowledge Bases: A Survey. Knowl. Inf. Syst. 2018, 55, 529–569. [Google Scholar] [CrossRef] [Green Version]
  99. Nicula, B.I.; Ruseti, S.; Rebedea, T. Enhancing property and type detection for a QA system over linked data. In Proceedings of the 2015 14th RoEduNet International Conference—Networking in Education and Research, Craiova, Romania, 24–26 September 2015; pp. 167–172. [Google Scholar]
  100. Le, H.; Phan, X.; Nguyen, T.D. Using Dependency Analysis to Improve Question Classification. In Proceedings of the Knowledge and Systems Engineering, Hanoi, Vietnam, 17–19 October 2014. [Google Scholar]
  101. Shin, S.; Jin, X.; Jung, J.; Lee, K.H. Predicate constraints-based question answering over knowledge graph. Inf. Process. Manag. 2019, 56, 445–462. [Google Scholar] [CrossRef]
  102. Tran, Q.; Nguyen, M.; Pham, S. Question Analysis for a Community-Based Vietnamese Question Answering System. In Knowledge and Systems Engineering; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  103. Song, D. TR discover: A natural language interface for querying and analyzing interlinked datasets. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9367, pp. 21–37. [Google Scholar] [CrossRef]
  104. Meditskos, G.; Dasiopoulou, S.; Vrochidis, S.; Wanner, L.; Kompatsiaris, I. Question answering over pattern-based user models. In ACM International Conference Proceeding Series, Proceedings of the 12th International Conference on Semantic Systems 2016, New York, NY, USA, 13–14 September 2016; ACM: New York, NY, USA, 2016; pp. 153–160. [Google Scholar] [CrossRef] [Green Version]
  105. Jin, H.; Luo, Y.; Gao, C.; Tang, X.; Yuan, P. ComQA: Question Answering Over Knowledge Base via Semantic Matching. IEEE Access 2019, 7, 75235–75246. [Google Scholar] [CrossRef]
  106. Li, G.; Yuan, P.; Jin, H. Svega: Answering Natural Language Questions over Knowledge Base with Semantic Matching. In Proceedings of the 30th International Conference on Software Engineering and Knowledge Engineering, San Francisco, CA, USA, 1–3 July 2018. [Google Scholar]
  107. Hu, S.; Zou, L.; Yu, J.; Wang, H.; Zhao, D. Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs. IEEE Trans. Knowl. Data Eng. 2018, 30, 824–837. [Google Scholar] [CrossRef]
  108. Zhu, C.; Ren, K.; Liu, X.; Wang, H.; Tian, Y.; Yu, Y. A Graph Traversal-Based Approach to Answer Non-Aggregation Questions over DBpedia. arXiv 2015, arXiv:1510.04780. [Google Scholar]
  109. Jiao, J.; Wang, S.; Zhang, X.; Wang, L.; Feng, Z.; Wang, J. gMatch: Knowledge base question answering via semantic matching. Knowl.-Based Syst. 2021, 228. [Google Scholar] [CrossRef]
  110. Wang, S.; Jiao, J.; Zhang, X. A Semantic Similarity-based Subgraph Matching Method for Improving Question Answering over RDF. In Proceedings of the Web Conference 2020—Companion of the World Wide Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 63–64. [Google Scholar] [CrossRef]
  111. Singh, K.; Both, A.; Sethupat, A.; Shekarpour, S. Frankenstein: A Platform Enabling Reuse of Question Answering Components. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2018; Volume 10843, pp. 624–638. [Google Scholar] [CrossRef]
  112. Chen, Y.H.; Lu, E.L.; Ou, T.A. Intelligent SPARQL Query Generation for Natural Language Processing Systems. IEEE Access 2021, 9, 158638–158650. [Google Scholar] [CrossRef]
  113. Bach, N.; Thanh, P.; Oanh, T. Question Analysis towards a Vietnamese Question Answering System in the Education Domain. Cybern. Inf. Technol. 2020, 20, 112–128. [Google Scholar] [CrossRef]
  114. Hu, S.; Zou, L.; Zhang, X. A State-transition Framework to Answer Complex Questions over Knowledge Base. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  115. Wu, L.; Wu, P.; Zhang, X. A Seq2seq-Based Approach to Question Answering over Knowledge Bases. In Semantic Technology, Proceedings of the 9th Joint International Conference, JIST 2019, Hangzhou, China, 25–27 November 2019; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1157, pp. 170–181. [Google Scholar] [CrossRef]
  116. Sui, Y. Question answering system based on tourism knowledge graph. J. Phys. Conf. Ser. 2021, 1883, 012064. [Google Scholar] [CrossRef]
  117. Ruseti, S.; Mirea, A.; Rebedea, T.; Trausan-Matu, S. QAnswer—Enhanced entity matching for question answering over linked data. In CEUR Workshop Proceedings, Toulouse, France, 8–11 September 2015; Volume 1391. [Google Scholar]
  118. Diefenbach, D.; Amjad, S.; Both, A.; Singh, K.; Maret, P. Trill: A Reusable Front-End for QA Systems. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2017; Volume 10577, pp. 48–53. [Google Scholar] [CrossRef] [Green Version]
  119. Bakhshi, M.; Nematbakhsh, M.; Mohsenzadeh, M.; Rahmani, A. Data-driven construction of SPARQL queries by approximate question graph alignment in question answering over knowledge graphs. Expert Syst. Appl. 2020, 146, 113205. [Google Scholar] [CrossRef]
  120. Jabalameli, M.; Nematbakhsh, M.; Zaeri, A. Ontology-lexicon–based question answering over linked data. ETRI J. 2020, 42, 239–246. [Google Scholar] [CrossRef]
  121. Chen, D.; Yang, M.; Zheng, H.T.; Li, Y.; Shen, Y. Answer-enhanced path-aware relation detection over knowledge base. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 1021–1024. [Google Scholar] [CrossRef]
  122. Wang, R.Z.; Ling, Z.H.; Hu, Y. Knowledge Base Question Answering with Attentive Pooling for Question Representation. IEEE Access 2019, 7, 46773–46784. [Google Scholar] [CrossRef]
  123. Zheng, H.T.; Fu, Z.Y.; Chen, J.Y.; Sangaiah, A.; Jiang, Y.; Zhao, C.Z. Novel knowledge-based system with relation detection and textual evidence for question answering research. PLoS ONE 2018, 13, e0205097. [Google Scholar] [CrossRef] [PubMed]
  124. Luo, D.; Su, J.; Yu, S. A BERT-based Approach with Relation-aware Attention for Knowledge Base Question Answering. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020. [Google Scholar] [CrossRef]
  125. Patil, S.; Chavan, L.; Mukane, J.; Vora, D.; Chitre, V. State-of-the-Art Approach to e-Learning with Cutting Edge NLP Transformers: Implementing Text Summarization, Question and Distractor Generation, Question Answering. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 445–453. [Google Scholar] [CrossRef]
  126. Abad-Navarro, F.; Martinez-Costa, C.; Fernandez-Breis, J. Semankey: A Semantics-Driven Approach for Querying RDF Repositories Using Keywords. IEEE Access 2021, 9, 91282–91302. [Google Scholar] [CrossRef]
  127. Maheshwari, G.; Trivedi, P.; Lukovnikov, D.; Chakraborty, N.; Fischer, A.; Lehmann, J. Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2019; Volume 11778, pp. 487–504. [Google Scholar] [CrossRef] [Green Version]
  128. Zafar, H.; Napolitano, G.; Lehmann, J. Formal Query Generation for Question Answering over Knowledge Bases. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2018; Volume 10843, pp. 714–728. [Google Scholar] [CrossRef]
  129. Inan, H.; Tomar, G.; Pan, H. Improving Semantic Parsing with Neural Generator-Reranker Architecture. arXiv 2019, arXiv:1909.12764. [Google Scholar]
  130. Lu, X.; Pramanik, S.; Roy, R.; Abujabal, A.; Wang, Y.; Weikum, G. Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019. [Google Scholar]
  131. Xu, K.; Feng, Y.; Huang, S.; Zhao, D. Question answering via phrasal semantic parsing. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9283, pp. 414–426. [Google Scholar] [CrossRef]
  132. Shekarpour, S.; Marx, E.; Ngomo, A.C.; Auer, S. SINA: Semantic interpretation of user queries for question answering on interlinked data. J. Web Semant. 2015, 30, 39–51. [Google Scholar] [CrossRef]
  133. Xu, K.; Wu, L.; Wang, Z.; Yu, M.; Chen, L.; Sheinin, V. Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model. arXiv 2018, arXiv:1808.07624. [Google Scholar]
  134. Lu, J.; Sun, X.; Li, B.; Bo, L.; Zhang, T. BEAT: Considering question types for bug question answering via templates. Knowl.-Based Syst. 2021, 225, 107098. [Google Scholar] [CrossRef]
  135. Vollmers, D. Knowledge Graph Question Answering using Graph-Pattern Isomorphism. arXiv 2021, arXiv:2103.06752. [Google Scholar]
  136. Dai, Z.; Li, L.; Xu, W. CFO: Conditional Focused neural question answering with large-scale knowledge bases. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 2, pp. 800–810. [Google Scholar] [CrossRef] [Green Version]
  137. Yin, J.; Xin, J.; Lu, Z.; Shang, L.; Li, H.; Li, X. Neural generative question answering. IJCAI Int. Jt. Conf. Artif. Intell. 2016, 2016, 2972–2978. [Google Scholar]
  138. Golub, D.; He, X. Character-level question answering with attention. In Proceedings of the EMNLP 2016—Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1598–1607. [Google Scholar]
  139. Wang, Y.; Chen, Q.; He, C.; Liu, H.; Wu, X. Knowledge Base Question Answering System Based on Knowledge Graph Representation Learning. In Proceedings of the 2020 the 4th International Conference on Innovation in Artificial Intelligence, Xiamen, China, 8–11 May 2020. [Google Scholar]
  140. Wang, L.; Zhang, Y.; Liu, T. A Deep Learning Approach for Question Answering Over Knowledge Base. In Natural Language Understanding and Intelligent Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 10102, pp. 885–892. [Google Scholar] [CrossRef]
  141. Xie, Z.; Zeng, Z.; Zhou, G.; He, T. Knowledge Base Question Answering Based on Deep Learning Models. In Natural Language Understanding and Intelligent Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 10102, pp. 300–311. [Google Scholar] [CrossRef]
  142. Budiharto, W.; Andreas, V.; Gunawan, A. Deep learning-based question answering system for intelligent humanoid robot. J. Big Data 2020, 7, 77. [Google Scholar] [CrossRef]
  143. Song, B.; Zhuo, Y.; Li, X. Research on question-answering system based on deep learning. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2018; Volume 10942, pp. 522–529. [Google Scholar] [CrossRef]
  144. Qu, Y.; Liu, J.; Kang, L.; Shi, Q.; Ye, D. Question Answering over Freebase via Attentive RNN with Similarity Matrix-based CNN. arXiv 2018, arXiv:1804.03317. [Google Scholar]
  145. Luo, K.; Lin, F.; Luo, X.; Zhu, K. Knowledge base question answering via encoding of complex query graphs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2185–2194. [Google Scholar]
  146. Tong, P.; Yao, J.; He, L.; Xu, L. Leveraging Domain Context for Question Answering over Knowledge Graph. Data Sci. Eng. 2019, 4, 323–335. [Google Scholar] [CrossRef] [Green Version]
  147. Lukovnikov, D.; Fischer, A.; Lehmann, J. Pretrained Transformers for Simple Question Answering over Knowledge Graphs. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2019; Volume 11778, pp. 470–486. [Google Scholar] [CrossRef] [Green Version]
  148. Panchbhai, A.; Soru, T.; Marx, E. Exploring sequence-to-sequence models for SPARQL pattern composition. Commun. Comput. Inf. Sci. 2020, 1232, 158–165. [Google Scholar] [CrossRef]
  149. Day, M.Y.; Kuo, Y.L. A Study of Deep Learning for Factoid Question Answering System. In Proceedings of the 2020 IEEE 21ST International Conference on Information Reuse and Integration for Data Science, Las Vegas, NV, USA, 11–13 August 2020; pp. 419–424. [Google Scholar] [CrossRef]
  150. De Cao, N.; Aziz, W.; Titov, I. Question Answering by Reasoning Across Documents with Graph Convolutional Networks; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018. [Google Scholar]
  151. Song, L.; Wang, Z.; Yu, M.; Zhang, Y.; Florian, R.; Gildea, D. Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks. arXiv 2018, arXiv:1809.02040. [Google Scholar]
  152. Cao, Y.; Fang, M.; Tao, D. BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering. arXiv 2019, arXiv:1904.04969. [Google Scholar]
  153. Tu, M.; Wang, G.; Huang, J.; Tang, Y.; He, X.; Zhou, B. Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs. arXiv 2019, arXiv:1905.07374. [Google Scholar]
  154. Xiao, Y. Dynamically Fused Graph Network for Multi-hop Reasoning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  155. Vakulenko, S.; Garcia, J.; Polleres, A.; de Rijke, M.; Cochez, M. Message Passing for Complex Question Answering over Knowledge Graphs. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 883–894. [Google Scholar] [CrossRef]
  156. Xiong, H.; Wang, S.; Tang, M.; Wang, L.; Lin, X. Knowledge Graph Question Answering with semantic oriented fusion model. Knowl.-Based Syst. 2021, 221, 106954. [Google Scholar] [CrossRef]
  157. Zheng, W.; Cheng, H.; Yu, J.; Zou, L.; Zhao, K. Interactive natural language question answering over knowledge graphs. Inf. Sci. 2019, 481, 141–159. [Google Scholar] [CrossRef]
  158. Zhu, G.; Iglesias, C. Exploiting semantic similarity for named entity disambiguation in knowledge graphs. Expert Syst. Appl. 2018, 101, 8–24. [Google Scholar] [CrossRef]
Figure 1. Types of Questions.
Figure 2. Phases of QAS.
Figure 3. Methodological Framework.
Figure 4. Search Strategy. The asterisk (*) is a wildcard matching any group of characters, including none.
Figure 5. Year-wise publication.
Figure 6. Most productive countries.
Figure 7. Top 15 authors.
Figure 8. Top Publishers (SCOPUS).
Figure 9. Top Publishers (WOS).
Figure 10. Citation Network (SCOPUS).
Figure 11. Citation Network (WOS).
Figure 12. Co-citation (SCOPUS).
Figure 13. Co-citation (WOS).
Figure 14. Top 10 Keywords.
Figure 15. Co-occurrence Overlay Visualization.
Figure 16. Co-occurrence Clusters.
Figure 17. Important Tasks in KBQA.
Figure 18. Semantic Parsing Architecture.
Figure 19. Subgraph-Based Architecture.
Figure 20. Template-Based Architecture.
Figure 21. Information Extraction-Based Architecture.
Table 1. Queries.

| Database | Query | No. of Documents |
| --- | --- | --- |
| WOS | “question answer *” (Title) not Visual (Title) not image (title) not Multimedia (title) | 1858 |
| Scopus | TITLE (“question Answer *” and not (visual or image or video)) AND (LIMIT-TO (SRCTYPE,“p”) OR LIMIT-TO (SRCTYPE,“j”)) AND (LIMIT-TO (SUBJAREA,“COMP”) OR LIMIT-TO (SUBJAREA,“ENGI”)) AND (LIMIT-TO (DOCTYPE,“cp”) OR LIMIT-TO (DOCTYPE,“ar”)) AND (LIMIT-TO (LANGUAGE,“English”)) | 2601 |

“*” is a wildcard matching any group of characters, including none.
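The two queries can also be replayed offline against an exported record list. The following illustrative Python sketch (assuming a hypothetical records.csv export with a Title column; the file name and column name are not from the study) applies the same include/exclude title logic as Table 1:

```python
import re
import pandas as pd

# Hypothetical CSV export of raw database records with a "Title" column.
records = pd.read_csv("records.csv")

# "question answer *" from Table 1: the wildcard admits answer/answering/answers.
include = re.compile(r"question answer\w*", re.IGNORECASE)
# Titles on visual/image/video/multimedia QA are excluded, as in both queries.
exclude = re.compile(r"visual|image|video|multimedia", re.IGNORECASE)

mask = (records["Title"].str.contains(include, na=False)
        & ~records["Title"].str.contains(exclude, na=False))
print(f"{mask.sum()} documents retained")
```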
Table 2. Most productive countries.

| Country | Number of Publications |
| --- | --- |
| China | 520 |
| United States | 368 |
| India | 140 |
| Germany | 119 |
| Spain | 97 |
| Japan | 90 |
| England | 58 |
| South Korea | 55 |
| Canada | 51 |
| Italy | 51 |
Table 3. Highly cited publications in the last five years (SCOPUS).

| Sr. No. | Reference | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Wang et al. [11] | 3 | 63 | 94 | 119 | 43 | 322 |
| 2 | Lukovnikov et al. [12] | 5 | 18 | 32 | 45 | 41 | 143 |
| 3 | Hao [13] | 0 | 14 | 31 | 55 | 30 | 130 |
| 4 | Yang [14] | 0 | 0 | 11 | 78 | 32 | 122 |
| 5 | Xiong et al. [15] | 7 | 23 | 25 | 47 | 11 | 115 |
| 6 | Yu et al. [16] | 0 | 8 | 27 | 43 | 30 | 108 |
| 7 | Huang et al. [17] | 0 | 0 | 8 | 36 | 62 | 107 |
| 8 | Abujabal et al. [18] | 1 | 15 | 29 | 31 | 25 | 101 |
| 9 | Wang [19] | 0 | 4 | 21 | 45 | 25 | 95 |
| 10 | Khot et al. [20] | 0 | 7 | 24 | 45 | 19 | 95 |
Table 4. Highly cited publications in the last five years (WOS).

| Sr. No. | Reference | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Wang et al. [11] | 1 | 34 | 65 | 52 | 30 | 182 |
| 2 | Lukovnikov et al. [12] | 3 | 13 | 22 | 24 | 18 | 80 |
| 3 | Das et al. [21] | 0 | 2 | 22 | 25 | 20 | 69 |
| 4 | Hao [13] | 0 | 7 | 21 | 28 | 12 | 68 |
| 5 | Cui et al. [22] | 0 | 6 | 17 | 20 | 14 | 57 |
| 6 | Yu et al. [16] | 0 | 5 | 19 | 13 | 15 | 53 |
| 7 | Hoeffner et al. [23] | 0 | 10 | 16 | 11 | 12 | 49 |
| 8 | Neshati et al. [24] | 0 | 2 | 15 | 19 | 10 | 46 |
| 9 | Abujabal et al. [18] | 0 | 6 | 17 | 12 | 7 | 42 |
| 10 | Esposito et al. [25] | 0 | 0 | 0 | 16 | 22 | 38 |
Table 5. Citation analysis strategy.

| Database | Total Publications | Publications with More Than 5 Citations | Greatest Connected Component (No. of Nodes) | Greatest Connected Component (No. of Edges) |
| --- | --- | --- | --- | --- |
| SCOPUS | 2601 | 1009 | 408 | 646 |
| WOS | 1858 | 494 | 375 | 886 |
Table 6. Citation Analysis (SCOPUS).

| Sr. No. | Reference | Top 10 (Page Rank) | Top 10 (Eigen Centrality) | Top 10 (Betweenness Centrality) |
| --- | --- | --- | --- | --- |
| 1 | Hirschman and Gaizauskas [37] | YES | YES | YES |
| 2 | Toba et al. [38] | YES | YES | YES |
| 3 | Lopez et al. [39] | YES | YES | YES |
| 4 | Kolomiyets and Moens [29] | YES | YES | YES |
| 5 | Zhao et al. [40] | YES | YES | YES |
| 6 | Liu et al. [28] | NO | YES | YES |
| 7 | Kwok et al. [41] | YES | NO | YES |
| 8 | Shah et al. [31] | NO | YES | YES |
| 9 | Wang et al. [11] | YES | NO | YES |
| 10 | Khodadi and Abadeh [42] | NO | YES | NO |
| 11 | Athenikos and Han [32] | NO | YES | NO |
| 12 | Nguyen et al. [43] | NO | YES | NO |
| 13 | Burke et al. [44] | YES | NO | NO |
| 14 | Soricut and Brill [45] | YES | NO | NO |
| 15 | Huang [30] | NO | NO | YES |
| 16 | Dong et al. [46] | YES | NO | NO |
Table 7. Citation Analysis (WOS).

| Sr. No. | Reference | Top 10 (Page Rank) | Top 10 (Eigen Centrality) | Top 10 (Betweenness Centrality) |
| --- | --- | --- | --- | --- |
| 1 | Srba and Bielikova [33] | YES | YES | YES |
| 2 | Lopez et al. [34] | YES | YES | YES |
| 3 | Fader et al. [47] | YES | YES | YES |
| 4 | Toba et al. [38] | NO | YES | YES |
| 5 | Kolomiyets and Moens [29] | YES | NO | YES |
| 6 | Rodrigo and Peñas [48] | NO | YES | YES |
| 7 | Zou et al. [49] | YES | YES | NO |
| 8 | Hoeffner et al. [23] | YES | YES | NO |
| 9 | Wang et al. [35] | YES | YES | NO |
| 10 | Zhao et al. [40] | YES | YES | NO |
| 11 | Qiu and Huang [50] | YES | NO | YES |
| 12 | Dimitrakis et al. [36] | NO | NO | YES |
| 13 | Moldovan et al. [51] | YES | NO | NO |
| 14 | Pal et al. [52] | NO | YES | NO |
| 15 | Burke et al. [44] | NO | NO | YES |
| 16 | Figueroa and Neumann [53] | NO | NO | YES |
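The rankings in Tables 6 and 7 combine three standard network centrality measures. As a rough guide to reproducing them, the minimal Python sketch below computes the same measures with networkx over a toy edge list; the real input would be the (citing, cited) pairs from the Scopus/WoS exports, filtered and reduced to the greatest connected component as in Table 5. This is an illustrative sketch, not the study's exact pipeline (which relied on tools such as Gephi [27]).

```python
import networkx as nx

# Hypothetical (citing paper, cited paper) pairs standing in for the exports.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "B"), ("D", "C"), ("E", "A")]
G = nx.DiGraph(edges)

# The three rankings reported in Tables 6 and 7.
pagerank = nx.pagerank(G)
eigen = nx.eigenvector_centrality(G.to_undirected(), max_iter=1000)
betweenness = nx.betweenness_centrality(G)

def top(scores, k=10):
    """Return node ids sorted by descending score, truncated to k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

for name, scores in (("PageRank", pagerank),
                     ("Eigen centrality", eigen),
                     ("Betweenness", betweenness)):
    print(name, top(scores))
```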
Table 8. Co-citation analysis strategy.

| Database | Total Cited References | References Cited More Than 5 Times | Greatest Connected Component (Number of Nodes) | Greatest Connected Component (Number of Edges) |
| --- | --- | --- | --- | --- |
| SCOPUS | 56,134 | 407 | 403 | 5954 |
| WOS | 27,426 | 1125 | 1000 | 57,299 |
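Tables 9 and 10 below rank nodes of the co-citation network, in which two references are linked whenever they appear together in the bibliography of the same citing paper, with the edge weight counting how often this happens. A minimal construction sketch, assuming hypothetical per-paper reference lists (the data below are placeholders, not from the study):

```python
from itertools import combinations
import networkx as nx

# Hypothetical reference lists, one per citing paper in the corpus.
bibliographies = [
    ["r1", "r2", "r3"],
    ["r1", "r3", "r4"],
    ["r2", "r3"],
]

C = nx.Graph()
for refs in bibliographies:
    # Two references co-cited by the same paper gain one unit of link weight.
    for u, v in combinations(sorted(set(refs)), 2):
        if C.has_edge(u, v):
            C[u][v]["weight"] += 1
        else:
            C.add_edge(u, v, weight=1)

# As in Table 8, only references cited more than 5 times would be kept
# before extracting the greatest connected component.
print(sorted(C.edges(data="weight")))
```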
Table 9. Co-Citation (SCOPUS).

| Sr. No. | Reference | Top 10 (Page Rank) | Top 10 (Eigen Centrality) | Top 10 (Betweenness Centrality) |
| --- | --- | --- | --- | --- |
| 1 | Hochreiter and Schmidhuber [54] | YES | YES | YES |
| 2 | Pennington et al. [55] | YES | YES | YES |
| 3 | Mikolov et al. [56] | YES | YES | YES |
| 4 | Bollacker et al. [57] | YES | YES | NO |
| 5 | Hermann et al. [1] | YES | YES | NO |
| 6 | Devlin et al. [4] | NO | YES | YES |
| 7 | Sutskever et al. [58] | YES | YES | NO |
| 8 | Rajpurkar et al. [5] | YES | YES | NO |
| 9 | Seo et al. [59] | YES | YES | NO |
| 10 | Wang et al. [11] | YES | YES | NO |
| 11 | Ferrucci [60] | YES | NO | NO |
| 12 | Chen et al. [61] | NO | NO | YES |
| 13 | Miller [62] | NO | NO | YES |
| 14 | Radford et al. [63] | NO | NO | YES |
| 15 | Pedregosa [64] | NO | NO | YES |
Table 10. Co-Citation (WOS).

| Sr. No. | Reference | Top 10 (Page Rank) | Top 10 (Eigen Centrality) | Top 10 (Betweenness Centrality) |
| --- | --- | --- | --- | --- |
| 1 | Hochreiter and Schmidhuber [54] | YES | YES | YES |
| 2 | Mikolov et al. [56] | YES | YES | YES |
| 3 | Berant et al. [65] | YES | YES | YES |
| 4 | Bollacker et al. [57] | YES | YES | YES |
| 5 | Pennington et al. [55] | YES | YES | YES |
| 6 | Devlin et al. [4] | YES | YES | YES |
| 7 | Blei et al. [66] | YES | YES | YES |
| 8 | Ferrucci [60] | YES | YES | YES |
| 9 | Miller [62] | YES | NO | YES |
| 10 | Rajpurkar et al. [5] | YES | YES | NO |
| 11 | Manning et al. [67] | NO | NO | YES |
| 12 | Lehman and Stanley [68] | NO | YES | NO |
Table 11. Keyword Synonyms.

| Sr. No. | Keyword | Synonyms |
| --- | --- | --- |
| 1 | question answer | question answering, question-answer, QA, question answering system, question answering systems |
| 2 | natural language processing | nlp, NLP, Natural Language Processing |
| 3 | Convolution Neural Network | CNN, cnn, Convolution neural networks |
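Before the keyword counts were taken, variant spellings were merged as listed in Table 11. A small Python sketch of such normalization; the synonym map is illustrative and mirrors the table, not an exhaustive copy of the study's mapping:

```python
# Illustrative synonym map following Table 11; keys are lowercased variants,
# values are the canonical keyword used for counting.
SYNONYMS = {
    "question answering": "question answer",
    "question-answer": "question answer",
    "qa": "question answer",
    "question answering system": "question answer",
    "question answering systems": "question answer",
    "nlp": "natural language processing",
    "cnn": "convolution neural network",
    "convolution neural networks": "convolution neural network",
}

def normalize(keyword: str) -> str:
    """Map a raw author keyword to its canonical form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)

assert normalize("QA") == "question answer"
assert normalize("NLP") == "natural language processing"
```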
Table 12. Keyword trend.

| Author Keywords (2017–2022) | Count | Author Keywords (2001–2016) | Count |
| --- | --- | --- | --- |
| question answering | 472 | Question Answering | 536 |
| natural language processing | 140 | Information Retrieval | 95 |
| deep learning | 93 | Natural Language Processing | 93 |
| information retrieval | 65 | Ontology | 70 |
| knowledge graph | 64 | Community Question Answering | 50 |
| community question answering | 58 | Passage Retrieval | 31 |
| knowledge base | 35 | Machine Learning | 28 |
| question classification | 29 | Semantic web | 25 |
| convolution neural network | 25 | Information extraction | 25 |
| Ontology | 24 | Query expansion | 24 |
Table 13. Top 15 keywords.

| Keyword | Links | Occurrences | TLS |
| --- | --- | --- | --- |
| question answering | 155 | 1018 | 1302 |
| natural language processing | 96 | 233 | 437 |
| information retrieval | 75 | 161 | 326 |
| deep learning | 63 | 97 | 184 |
| ontology | 47 | 94 | 169 |
| community question answering | 54 | 108 | 142 |
| semantic web | 27 | 42 | 122 |
| machine learning | 43 | 51 | 109 |
| knowledge graph | 42 | 66 | 97 |
| information extraction | 29 | 39 | 81 |
| knowledge base | 31 | 43 | 75 |
| sparql | 27 | 26 | 74 |
| question classification | 36 | 45 | 74 |
| natural language | 27 | 22 | 71 |
| passage retrieval | 26 | 40 | 69 |
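In the terminology of VOSviewer [26], "occurrences" is the number of publications containing a keyword, a "link" is a distinct keyword it co-occurs with at least once, and total link strength (TLS) sums the co-occurrence weights over all links, as reported in Table 13. The following illustrative sketch computes all three measures from hypothetical per-paper keyword lists (placeholder data, not the study's corpus):

```python
from collections import Counter
from itertools import combinations

# Hypothetical normalized author-keyword lists, one per publication.
papers = [
    ["question answer", "deep learning"],
    ["question answer", "knowledge graph", "deep learning"],
    ["knowledge graph", "sparql"],
]

# Occurrences: number of publications containing the keyword.
occurrences = Counter(k for kws in papers for k in set(kws))

# Co-occurrence counts over unordered keyword pairs.
cooccur = Counter()
for kws in papers:
    for pair in combinations(sorted(set(kws)), 2):
        cooccur[pair] += 1

links, tls = Counter(), Counter()
for (u, v), w in cooccur.items():
    for k in (u, v):
        links[k] += 1  # number of distinct co-occurring keywords
        tls[k] += w    # total link strength: summed co-occurrence weights

for k in sorted(occurrences):
    print(k, links[k], occurrences[k], tls[k])
```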
Table 14. Cluster-wise top keywords.

| Keyword | Cluster | TLS | Keyword | Cluster | TLS |
| --- | --- | --- | --- | --- | --- |
| question answering | 1 | 1302 | ontology | 4 | 169 |
| natural language processing | 1 | 437 | semantic web | 4 | 112 |
| information retrieval | 1 | 326 | sparql | 4 | 74 |
| machine learning | 1 | 109 | natural language | 4 | 71 |
| information extraction | 1 | 81 | linked data | 4 | 64 |
| question classification | 1 | 74 | rdf | 4 | 35 |
| passage retrieval | 1 | 69 | semantic search | 4 | 33 |
| query expansion | 1 | 65 | dbpedia | 4 | 30 |
| answer extraction | 1 | 45 | artificial intelligence | 4 | 23 |
| question analysis | 1 | 42 | semantics | 4 | 21 |
| deep learning | 2 | 184 | knowledge engineering | 5 | 22 |
| knowledge graph | 2 | 97 | knowledge acquisition | 5 | 20 |
| knowledge base | 2 | 75 | big data | 5 | 15 |
| convolutional neural network | 2 | 45 | summarization | 5 | 14 |
| neural networks | 2 | 40 | non-factoid question answering | 5 | 13 |
| bert | 2 | 33 | user interaction | 5 | 9 |
| lstm | 2 | 29 | information seeking | 5 | 4 |
| neural network | 2 | 29 | social question answering | 5 | 3 |
| answer selection | 2 | 28 | data mining | 6 | 15 |
| attention mechanism | 2 | 26 | cross-lingual question answering | 6 | 12 |
| community question answering | 3 | 142 | natural language interfaces | 6 | 10 |
| text mining | 3 | 46 | | | |
| learning to rank | 3 | 42 | | | |
| expert finding | 3 | 32 | | | |
| question retrieval | 3 | 30 | | | |
| language model | 3 | 21 | | | |
| tf-idf | 3 | 19 | | | |
| question routing | 3 | 18 | | | |
| crowdsourcing | 3 | 17 | | | |
| expert recommendation | 3 | 17 | | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
