4.2. Evaluation Measures
After performing the experiment, we used several measures to evaluate our results. We calculated the confusion matrix for all the themes in the experiment. For each theme, the entries of the confusion matrix represent the following: true positives (TP), the number of lines correctly assigned to the theme; false positives (FP), the number of lines incorrectly assigned to the theme; true negatives (TN), the number of lines correctly assigned to other themes; and false negatives (FN), the number of lines belonging to the theme that were incorrectly assigned to other themes.
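As an illustration of how these per-theme counts can be derived, the following sketch treats each theme in turn as the positive class; the function name and label values are our own, not taken from the paper:

```python
from typing import Dict, List

def confusion_counts(y_true: List[str], y_pred: List[str],
                     themes: List[str]) -> Dict[str, Dict[str, int]]:
    """Per-theme TP/FP/TN/FN, treating each theme as the positive class in turn."""
    counts = {}
    for theme in themes:
        # Lines of this theme classified as this theme
        tp = sum(t == theme and p == theme for t, p in zip(y_true, y_pred))
        # Lines of other themes classified as this theme
        fp = sum(t != theme and p == theme for t, p in zip(y_true, y_pred))
        # Lines of this theme classified as another theme
        fn = sum(t == theme and p != theme for t, p in zip(y_true, y_pred))
        # Everything else: lines of other themes not classified as this theme
        tn = len(y_true) - tp - fp - fn
        counts[theme] = {"TP": tp, "FP": fp, "TN": tn, "FN": fn}
    return counts
```

For example, with gold labels `["A", "A", "B"]` and predictions `["A", "B", "B"]`, theme "A" yields TP = 1, FP = 0, TN = 1, FN = 1.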
We then assessed the performance of the model using accuracy, precision, recall, and F1 score (Davis, 2006), based on the confusion matrix.
Accuracy is the proportion of all lines that have been classified correctly: all true positive and true negative lines divided by all lines being classified. It is defined as: Accuracy = (TP + TN) / (TP + TN + FP + FN).
Precision is the proportion of lines classified as positive that are correctly positive: all true positive lines divided by all lines classified as positive. It is defined as: Precision = TP / (TP + FP).
Recall indicates the proportion of actually positive lines that are correctly classified as positive: all true positive lines divided by all lines that are actually positive. It is defined as: Recall = TP / (TP + FN).
F1 score is the harmonic mean of precision and recall. It is defined as: F1 = 2 × (Precision × Recall) / (Precision + Recall).
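As a minimal sketch, the four measures can be computed from one theme's counts as follows (the function name is ours; the guards against empty denominators are an implementation assumption, not stated in the paper):

```python
def evaluation_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from one theme's confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Define precision/recall/F1 as 0.0 when their denominators are empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For instance, a theme with TP = 8, FP = 2, TN = 85, FN = 5 gives an accuracy of 0.93 and a precision of 0.80.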
4.3. Results
In this section, we first present our classification results on all the measures mentioned above. We then analyze the lines that were misclassified by the models. Lastly, we discuss our findings and limitations.
Table 4, below, presents the results of the thematic analysis of concordance lines using both long short-term memory (LSTM) and recurrent neural network (RNN) models. The results show the higher accuracy of LSTM, which reached 84%, compared with the 79% accuracy achieved by RNN.
Table 5, below, shows the results of the automatic thematic analysis of concordance lines. The results show a comparison between both LSTM and RNN for the four identified themes. We can clearly see that LSTM outperforms RNN within each theme, with regard to the accuracy of classifying the concordance lines. The lines are categorized into four themes: (1) Evaluation of the Syrian situation, (2) International stance, (3) Representation of the UK and USA together, and (4) Local context of the UK. The obtained accuracy was tested on unseen concordance lines.
Using the LSTM, the “Evaluation of the Syrian situation” theme achieved 100% recall without misclassifying any lines. This is the same result achieved by RNN. Next, the “Representation of the UK and USA together” theme achieved the second-highest recall of 94%, with only three lines having been misclassified. RNN achieved a very close result, reaching a recall of 93%. Furthermore, the “International stance” theme achieved a low recall of 69%, with 16 lines having been misclassified. RNN achieved the worst result, reaching a recall of 44%.
In terms of precision, the “Local context of the UK”, “Representation of the UK and USA together”, and “Evaluation of the Syrian situation” themes achieved high precision using LSTM, with results of 89%, 88%, and 85%, respectively. The “International stance” theme achieved a low precision of 72%, with 14 lines from other themes incorrectly classified as belonging to it. Generally, RNN achieved lower precision scores than LSTM. For the “International stance” and “Representation of the UK and USA together” themes, RNN obtained slightly better results (77% and 94%, respectively) than LSTM. However, LSTM obtained better results for the “Evaluation of the Syrian situation” (74% in RNN) and “Local context of the UK” (72% in RNN) themes. Considering the overall results in Table 4 and Table 5, we can see that LSTM is, in general, more accurate for the thematic analysis than the RNN model.
The confusion matrix in Table 6 shows how many lines match the correct theme and how many match incorrect themes. Thus, we consider the percentage accuracy for each theme rather than only the overall accuracy.
Table 6 shows that the lines relating to the first and third themes have the highest rates of correct categorization among the themes. The “International stance” theme comes after the first and third themes in categorization accuracy, while the fourth theme, “Local context of the UK”, has the lowest categorization accuracy. In the following, we discuss how the lines were categorized in each theme and the instances of incorrect categorization, which highlights the effectiveness and the issues of the proposed methodology.
As shown in the table above, the “Evaluation of the Syrian situation” theme was the most accurately categorized: all of its lines match the original classification. Thus, for this theme, there are no incorrect instances to discuss.
The second most accurate theme was the “Representation of the UK and USA together” theme, which had three incorrect classifications. For example:
In the experiment, Example 1 was categorized under the “International stance” theme, while it originally belonged to “Representation of the UK and USA together”. Categorizing such examples is difficult, as this theme lies in a blurred area between more than one theme (this point is expanded below in the discussion of the other themes). This example and the other lines in the corpus show the high accuracy of classifying the lines of this theme, even though the algorithm missed three instances.
A less accurately categorized theme was “International stance”, for which the algorithm missed the correct categorization of 16 lines. From the lexical perspective, words such as “UK” and “Britain” were frequently used both in the representation of the local context and in the international stance. For example:
2. International community needs to ask: would Assad utilize chemical weapons stage bring potential western military intervention international community to investigate the reasons behind the use of chemical weapons.
3. Later came comments from the USA House speaker John Boehner backing a Syria war resolution, adding to the likelihood of Congress voting for USA action.
Examples 2 and 3 were classified under the “Local context of the UK” theme, while they were originally categorized under the “International stance” theme. Lexically, words such as “international”, “community”, “internationally”, and “prohibited” were frequently used with the “International stance” theme. In Example 3, the algorithm considered terms such as “voting” and “resolution” to be strongly connected to the local context, while the semantic function of the whole line suggests that the example belongs to the “International stance” theme.
The least accurate theme in the categorization was the “Local context of the UK” theme.
Table 6, above, shows that the “Local context of the UK” theme had various incorrect classifications across the different themes. One reason might be that this theme contains strong terms that recur both in this theme and, at the same time, in other themes. The incorrectly classified lines in the “Local context of the UK” theme show almost the same issues, for example:
4. He (General Sir Nick Houghton) revealed no decisions have been made on military involvement in Syria.
5. We have seen the unwinnable nature of the Afghan conflict. The terrible sores of the Balkan civil wars are still raw enough to remind us of what little effect our intervention had there.
Examples 4 and 5 were classified under the “International stance” theme, while they were originally categorized under the “Local context of the UK” theme. Terms such as “Syria”, “Afghan”, “Balkan”, and “civil” occurred strongly in both themes. In the local context, they were used to refer to the social imagination of the country regarding the UK’s experience in international interventions, such as in Iraq and Afghanistan. At the same time, these terms were also used to represent the role of the United Nations and other actors, such as the US. Thus, these findings are consistent with those of Altameemi (2020), in that some lines lie in a blurred area between more than one theme.