1. Introduction
Suicide is a leading cause of death and a global disease burden, accounting for nearly one million annual deaths across the world [
1]. Annual suicide figures are critical to understanding risk and guiding research, including the study of biological, social, psychological, and economic factors that may vary with data monitoring [
2]. However, a significant lag between monitoring and public reporting of suicides often delays and hinders real-time interventions [3]. This becomes a significant barrier when factors that affect suicide rates shift rapidly and may have peaked and waned before their association with elevated suicide risk can be identified. The multifactorial nature of suicide risk poses a further challenge, as risk factors may change over time and differ across demographic subgroups.
Suicide is the 18th leading cause of death across all ages, and the 2nd leading cause of death among young people (12 to 24 years old) [
4]. In 2015, 425 deaths by suicide were recorded in Ireland, representing a rate of 9.2 per 100,000 of the population. Similar to increased risk observed in Europe and the United States [
4], the majority of suicide deaths (>74%) were among males [
5]. The Institute of Medicine (IOM) further estimates that an additional 25 suicide attempts (100–200 for youth) occur for every suicide death, accounting for nearly 500,000 emergency visits annually in the United States [
6,
7]. A national suicide reduction strategy has been developed in Ireland [
8], which aligns with coordinated strategies by the World Health Organization [
9]. The Central Statistics Office in Ireland is responsible for releasing national suicide statistics, which are published with a delay of approximately two years or longer [
5]. This issue has prompted increased calls for reporting advancements to guide epidemiology and enhanced surveillance.
Individuals at risk of suicide may use the Internet for a number of suicide-related reasons, such as to anonymously share suicidal thoughts with others and to seek out social connections [
10], to access confidential support from suicide prevention service programs [
11], and to visit websites that may contain information on topics such as suicide methods [
12]. Longitudinal studies conducted by Sueki and colleagues reported that suicide-related Internet use increased suicidal ideation and depression over time [
13,
14]. However, there are also opportunities to harness the positive potential of the Internet: clinicians can explore an individual’s Internet use as part of a suicide risk assessment process and help develop personalized online safety practices as part of crisis planning. Given that an estimated 85% of the global population is covered by a commercially available wireless signal [
15], and in 2012, 72% of United States Internet users searched the Internet for health topics [
16], researchers have looked to Internet searches as a potential new information source for the surveillance and monitoring of suicidal behavior, to inform advances in risk detection and intervention opportunities [3,13,14,17,18,19,20,21]. Facebook recently released a press briefing describing real-time suicide prevention tools that use artificial intelligence to identify signs of risk, with advanced options to enhance connection to additional services (e.g., in Facebook Live). These tools potentially offer the promise of real-time safety monitoring, as well as research that may advance risk prediction [
22]. However, such approaches must be transparent and ethically sound.
Google is the most commonly used search engine in the world, holding 74.54% of the global market share in 2017 [
23]. Epidemiologists have monitored the use of Internet search engines, such as Google, to successfully track epidemics to accelerate real-time understanding of risk or data trends [
24,
25]. For example, monitoring changes in Internet search volumes for phrases closely linked with a specific pathogen, a form of help-seeking behavior, allows disease outbreaks and epidemics to be identified and acted on. Google Trends is a website that gives public access to statistics on queries performed on the Google search engine. It reports relative search volumes rather than raw counts: the volume for a particular term is expressed as a proportion of the total number of searches in a given geographical area. The search data date back to January 2004 [
26].
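The difference between raw counts and relative search volume can be illustrated with a minimal sketch. The function name, the counts, and the rescaling so that the peak period equals 100 are assumptions made for illustration; they are not taken from the study, and Google does not publish the underlying raw counts.

```python
def relative_search_volume(term_counts, total_counts):
    """Express a term's searches as a share of all searches per period,
    then rescale so that the period with the largest share equals 100."""
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # Share of all searches in each period (0 when no searches were made).
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Hypothetical monthly counts for one term and for all searches in a region.
term = [120, 150, 300, 240]
total = [10_000, 10_000, 15_000, 12_000]
rsv = relative_search_volume(term, total)
print(rsv)
```

In this made-up example the last two months have different raw counts (300 and 240) but the same relative volume, because overall search activity also differed between them.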
Several researchers have utilized Google Trends to identify outbreaks of infectious diseases, such as influenza [
24,
25], chickenpox [
27], and gastroenteritis [
28]. Ginsberg et al. [
25] analyzed Google Trends search queries to track influenza-like illness in the United States. The authors reported that the relative frequency of certain queries (e.g., cold/flu remedies, influenza symptoms) was highly correlated with the percentage of physician visits in which a patient presented with influenza-like symptoms. With respect to suicide reporting, McCarthy [3] applied Google Trends analysis to the study of suicide risk at the population level. Google Trends was used to generate search volumes for the terms “suicide”, “teen suicide”, “depression”, “divorce”, and “unemployment”. These data were then compared with official Centers for Disease Control and Prevention statistics on suicide deaths and intentional self-injury in the United States for the years 2004–2007. The results showed that, among the general population, there was no correlation between search volume for the term “suicide” and intentional self-injury. In contrast, there was a strong negative correlation between search volume for the term “suicide” and suicide deaths (
r = −0.9002). Importantly, data for youth (i.e., aged 18–25 years) differed markedly from those of the general population. Search volume for “suicide” was positively correlated with both intentional self-injury (
r = 0.498) and suicide deaths (
r = 0.699). The author hypothesized that the inverse correlation between suicide-related Internet searches and suicides in the general population indicates that the Internet is used by many to seek help or otherwise reduce suicide risk, and that this use may vary significantly by age. Given that suicide-related Internet searches were positively correlated with self-injury and suicide among youth, the author proposed that this group may use the Internet to facilitate self-injury, suggesting that Internet use carries greater risk for this demographic group.
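The r values quoted above are correlation coefficients. As a minimal sketch, and assuming the Pearson product-moment coefficient, r can be computed for two yearly series as follows. The numbers below are hypothetical and are not McCarthy's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly values: relative search volume and an outcome count.
search_volume = [60, 70, 80, 90]
outcome = [40, 35, 33, 28]
r = pearson_r(search_volume, outcome)
print(f"r = {r:.3f}")  # negative: the outcome falls as search volume rises
```

With only four yearly observations, as in a 2004–2007 comparison, a coefficient of this kind rests on very few data points and should be interpreted cautiously.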
Furthermore, several researchers have extended this research by exploring associations between Internet searches relating to suicide and suicide rates in various populations. This includes exploring the volume of search terms for suicide (e.g., “suicide”, “commit suicide”), risk factors for suicide (e.g., “depression”, “divorce”, “unemployment”), and specific suicide methods (e.g., “suicide by jumping”, “hanging”). Yang et al. [
19] explored the association between monthly suicide rates in Taipei City, Taiwan, and Internet search volumes for 37 suicide-related terms during the period from January 2004 to December 2009. Results revealed that many of the Internet search terms were associated with actual suicide deaths. Searches for “major depression” and “divorce” accounted for, at most, 30.2% of the variance in suicide data. Their analysis also revealed that Internet search trends were associated with different means of suicide. Non-violent suicide was associated with searches for domestic violence and insomnia. The search trend for the title of a forbidden but popular pro-suicide book in Taiwan and Japan was associated with violent and male suicide deaths. In Japan, the monthly search volume for the terms “suicide” and “suicide method” was not significantly correlated with the monthly suicide rate. However, the volume of Google searches using the search term “utsu” (depression) was positively correlated with the suicide rates in the same or previous month and was negatively correlated with suicide rates after three months [
17]. In the United States, Gunn and Lester [
18] reported marginally significant positive associations between suicide rates and search volumes for the terms “commit suicide” (
p = 0.01) and “how to suicide” (
p = 0.07). The association between suicide rates and the search volume for “suicide prevention” was significant and positive (
p = 0.001), suggesting that people are looking to the Internet for help and are potentially not finding it. Such findings may inform opportunities for intervention, as well as real-time monitoring of suicide risk at the population-level, in some cases, according to age and suicidal behavior.
To increase model validity, researchers have controlled for variables that may confound the relationship between suicide-related search data and suicidal behavior. Bruckner et al. [
29] applied rigorous time-series routines to control for temporal patterns of suicide when exploring the association between Internet search terms and suicide rates in England and Wales from 2004 to 2010. The researchers also controlled for unemployment rates and for Google searches in the news, which often peak after suspected suicides of celebrities but may or may not signal increases in help-seeking or depression. This is relevant because such cases have been previously reported in the literature and reveal the effects of media contagion [
30]. For the three searches that included the term “depression”, a positive relationship with suicide in that month was found. The strongest positive relationship occurred between the Google Trends query for “depression and help” and suicide incidence in the same month (
p = 0.002). No relationship was found between searches for “suicide” or “suicide and methods” and suicide incidence. In contrast, Kristoufek et al. [
21] found that a greater number of searches for the term “depression” was related to fewer suicides, whereas a greater number of searches for the term “suicide” was related to more suicides in England between 2004 and 2013.
In 2017, Tran et al. [
20] evaluated the validity and utility of Google Trends search volumes for behavioral forecasting of risk/suicide rates in the United States of America, Germany, Austria, and Switzerland. The researchers concluded that the validity of Google Trends search volumes for behavioral forecasting of national suicide rates is low, and they proposed several recommendations to increase the reliability and stability of data obtained from Google Trends, which are incorporated into the present study. These recommendations include using specific search terms instead of broad terms (e.g., “suicide”), in contrast to previous approaches in the literature, and attending to the presence or absence of quotation marks when retrieving search query volumes.
Nonetheless, Internet sources should be used with caution for statistical purposes. Selection bias is a predominant issue owing to uneven Internet penetration among and within countries; the population covered by these sources is also subject to daily changes, and it is often difficult to link the data to other datasets [
31,
32,
33]. For search queries in particular, one must also be wary of factors such as changes to the search algorithm [34] and media events that lead to unexpected behavior [
35].
The present study aims to apply search query volumes to help forecast suicide outcomes in Ireland. This contrasts with the common practice of forecasting from historical suicide records alone, without considering other sources. Furthermore, our comparison with the United Kingdom aims to show that, despite cultural and geographic proximity, online search behavior can vary; approaches must therefore be targeted at the country level. This study addresses a gap in the existing literature, wherein the use of search query data to forecast suicide occurrences in Ireland remains unexplored. Ireland is a prime case study for the application of search query data: English is the predominant language, and Internet access is present in almost 90% of households [
36]. Suicide is also a leading cause of death in Ireland, particularly in young people and women [
37]. Similar research has not been conducted in this jurisdiction to date. Thus, we identify the most informative terminologies used by the population of Ireland and state the benefits of applying Google Trends for suicide forecasting in Ireland. The current study employs a broad dataset spanning eleven years. Our study requests Google Trends search volumes for search queries relating to “depression” and “suicide” and employs a collection of specific search queries (e.g., “how to commit suicide”) gathered from Tran et al. [
20], in addition to suggested queries specific to Ireland. The generated search volumes are explored in terms of their relationship to Irish deaths by suicide statistics published by the Central Statistics Office, while controlling for unemployment and temporal patterns in suicide. Furthermore, we apply vector autoregression and neural network autoregression techniques to forecast the suicide outcomes in Ireland.
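To make the forecasting idea concrete, the following minimal sketch fits a single lag-1 autoregressive equation with one lagged search-volume predictor by ordinary least squares. It is not the authors' implementation: a vector autoregression would fit one such equation per variable, and neural network autoregression replaces the linear form with a small network. All names and the synthetic series are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx1(y, x):
    """Least-squares fit of y[t] = c + a * y[t-1] + b * x[t-1]."""
    if len(y) != len(x) or len(y) < 4:
        raise ValueError("need two equal-length series with at least 4 points")
    rows = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    targets = y[1:]
    k = 3
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast_next(y, x, coef):
    """One-step-ahead forecast from the most recent observations."""
    c, a, b = coef
    return c + a * y[-1] + b * x[-1]


# Synthetic, noise-free series generated from known coefficients (illustrative only).
true_c, true_a, true_b = 5.0, 0.5, 0.2
searches = [10.0, 30.0, 20.0, 50.0, 40.0, 60.0, 25.0]
outcome = [20.0]
for t in range(1, len(searches)):
    outcome.append(true_c + true_a * outcome[-1] + true_b * searches[t - 1])

coef = fit_arx1(outcome, searches)
next_value = forecast_next(outcome, searches, coef)
print([round(v, 3) for v in coef], round(next_value, 3))
```

Because the synthetic series contains no noise, the fit recovers the generating coefficients; with real monthly data the estimates would carry uncertainty, and further controls (such as unemployment) would enter as additional lagged predictors.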
5. Conclusions
Our work extends previous research by improving the methodology, focusing on country-specific search queries, and applying neural network autoregression to the forecasting of suicide rates in Ireland, where such analysis has not been completed previously. In addition to using previously determined search queries, we gathered terms specific to the region of interest. Through a selection approach, we determined the most relevant queries, which indicate the strong relevance of pro-suicide queries (i.e., “suicide methods”, “suicidal”, “how to commit suicide”) and related medical conditions (i.e., “anxiety” and “postnatal depression”). Applying search query volumes geographically restricted to Ireland improves the prediction of changes in the number of suicide occurrences in the country. Furthermore, the performance achieved by neural network autoregression suggests that this approach can yield more accurate suicide forecasts than traditional autoregression.
Our results support the value of applying indirect sources, namely Google Trends, to the forecasting of suicide occurrences in Ireland. These models are an added benefit for public health officials, who can use them to anticipate changes in the number of suicide occurrences and thus identify when more attention or caution is needed. Hence, this collaborative research has created a novel tool for improving current health policies in Ireland. As suicide is influenced by a variety of psychosocial, biological, environmental, economic, and cultural factors, the prediction of suicides is a highly complex task. Our approach utilizes search query volumes and unemployment records as a proxy for some of these factors. The knowledge and applications provided by this work are three-fold: (1) the approach allows us to infer the search behavior of people at risk of suicide, e.g., the query “depressed” is commonly related to searches for “suicide”; (2) it can be used to determine early predictors of increased suicidal behavior, e.g., the search volume of suicide-related queries can indicate an increase or decrease in suicide occurrences; and (3) it can provide further insights into new trends (e.g., economic or behavioral) that are related to suicide occurrences, e.g., movie/TV releases can lead to an increase in suicide-related queries. Public health agencies can apply these findings directly through improved and targeted suicide prevention campaigns that address the predominant issues discovered through the query analyses and reach the largest possible number of people. For example, the search queries used here appear to be significant for the prediction of suicide occurrences; hence, when these are queried, supportive messages and counselling services can be displayed to the user.
Furthermore, search queries can also reveal the timing and targets of prevention campaigns. For example, highly publicized suicides (e.g., in movies, TV shows, or among celebrities) lead to increases in suicide-related queries [64]; hence, by identifying the queries that drive suicide-related searches, we can target sources of increased suicide risk.
This approach was also tested in another English-speaking country, the UK, to determine the quality and adequacy of the selected search queries for suicide forecasting. Our positive results further support the benefits of utilizing Google Trends (even in less populous countries such as Ireland), as well as the forecasting ability and generalization capabilities of a limited number of queries for suicide forecasting. Although our models were tested with United Kingdom data, other English-speaking countries, such as the United States of America, could be used for evaluations; however, it is important to acknowledge the additional challenges this brings, for example, regional and state-level differences, as well as within-state variations (e.g., between rural and urban areas).
Future research includes the identification of events that trigger increases in the public’s attention or interest in suicide, leading to a change in their online search behavior. This information could potentially be extracted from other data sources and added to the models, as additional knowledge may improve forecasting ability. Recent technological advancements show promise and new opportunities for the forecasting of suicide occurrences. Potential directions for future research include the application of machine learning algorithms, as well as natural language processing to extract information from textual records and conduct prediction with a large number of variables [
65].