While the presence of global warming is largely undisputed among experts and public climate change awareness has reached high levels in Western countries (Lee et al. 2015; Poortinga et al. 2018; Baiardi and Morana 2020), the verification of a change in the global surface temperature in terms of statistical significance remains a challenging task. The main problem is that there is no consensus about the true nature of the underlying process which has generated the observed temperature record. It may have a deterministic trend, a stochastic trend, or a combination of both. It may also share its trend in certain subperiods with other time series such as greenhouse gas emissions, which would allow us to draw conclusions about the cause of the warming trend. In order to address questions of this type, various sophisticated time series models have been employed in previous studies, a selection of which will be reviewed in Section 1. However, when we are only interested in the question whether the global surface temperature is changing or not, we may be able to choose a relatively simple setting and thereby avoid imposing implausibly strong assumptions.
In this paper, we consider only fractionally integrated processes with constant mean, and try to distinguish between the stationary case, where the memory parameter d is less than 0.5, and the non-stationary case, where d ≥ 0.5. In this setting, the rejection of the stationarity hypothesis d < 0.5 can always be interpreted as evidence of climate change, regardless of whether the rejection is due to a large value of d (infinite variance) or a nonconstant mean (deterministic trend). To emphasize the importance of including the non-stationary, yet mean-reverting case 0.5 ≤ d < 1 in the alternative hypothesis of climate change, we plotted realizations of fractional white noise for different values of d and different sample sizes. Figure 1g shows that even for a relatively large value of d in the stationary range, the long-term fluctuations are modest. In contrast, the changes are quite dramatic for a value of d in the non-stationary range (see Figure 1h). Changes of this magnitude would be disastrous in the case of the global temperature.
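The contrast between the two ranges of d can be reproduced with a short simulation. The following Python sketch is our own illustration (the function name and the truncation of the moving average at the start of the sample are implementation choices, not taken from the paper); it generates fractional white noise via the MA(∞) representation of (1 − B)^(−d):

```python
import numpy as np

def frac_white_noise(n, d, rng):
    # MA(inf) coefficients of (1 - B)^(-d): psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    e = rng.standard_normal(n)
    # truncate the infinite moving average at the start of the sample (type-II process)
    return np.array([psi[:t + 1][::-1] @ e[:t + 1] for t in range(n)])

rng = np.random.default_rng(42)
x_stationary = frac_white_noise(1000, 0.40, rng)     # d in the stationary range
x_nonstationary = frac_white_noise(1000, 0.75, rng)  # mean-reverting but non-stationary
```

Plotting the two realizations reproduces the qualitative difference between Figure 1g and Figure 1h: modest long-term fluctuations in the first case, large and persistent excursions in the second.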
However, when testing the hypothesis d < 0.5, we have to strike a balance between simplicity and plausibility. On the one hand, we must allow for short-range dependence in order to keep the test realistic, hence using only fractionally integrated white noise is not an option. On the other hand, the class of data generating processes (DGPs) must also not be too rich because Pötscher (2002) has shown that inference on unit roots and cointegration as well as the problem of estimating the memory parameter fall into the category of ill-posed estimation problems under commonly used specifications for the set of feasible DGPs. For example, reliable unit root inference is not possible when this set is rich enough to contain all ARMA(1,1) processes.
Conventional tests of hypotheses about the memory parameter d are based on the asymptotic normality of semiparametric estimators for d which are obtained from the periodogram ordinates at Fourier frequencies in a small neighborhood of frequency zero. The restriction to only the lowest Fourier frequencies certainly helps to avoid any distortions caused by short-range dependencies, but it also compromises the large-sample assumption required for the asymptotics to kick in. Mangat and Reschenhofer therefore developed a small-sample test which requires only that the total sample size is large but not the number of Fourier frequencies used in the test procedure. Building on this test, Reschenhofer and Mangat introduced a new estimator for d, which performed well in terms of bias and root-mean-square error (RMSE) in an extensive simulation study. Both the test and the estimator will be used together with other methods in Section 3 to search for conclusive evidence against the stationarity of the global air surface temperature. Beforehand, in Section 2, all employed methods are described in detail. Section 4 concludes the paper.
A popular way to test for a trend in a time series is based on the simple linear trend model. Under the assumption that the errors are stationary with mean zero and finite variance, the OLS estimator for the slope parameter is, for a large sample size n, approximately normally distributed with mean equal to the true slope (Grenander and Rosenblatt 1957). Using the fact that its asymptotic variance is proportional to the spectral density of the error process at frequency zero, the latter can be estimated by a suitable lag window estimator based on the sample autocovariances of the OLS residuals. However, as pointed out by Fomby and Vogelsang, this approach breaks down when there is a unit root in the errors or the errors have long memory because in these cases the spectral density at frequency zero does not exist. In their analysis of six global temperature series, Fomby and Vogelsang therefore used the robust trend test proposed by Vogelsang instead and were able to reject the null hypothesis of a nonpositive linear trend in all but one case. The robustness of this test to strong serial correlation implies that it cannot be used for the detection of a stochastic trend, hence other tests must be used for this purpose.
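The construction described above can be sketched in a few lines. This is a generic illustration, not the robust test of Vogelsang: the slope is estimated by OLS and its standard error uses a Bartlett lag window estimate of the long-run error variance (2π times the spectral density at frequency zero), computed from the residuals; the bandwidth rule is an assumption of ours.

```python
import numpy as np

def ols_trend_with_lrv_se(y, bandwidth=None):
    """OLS slope of a linear trend; the standard error uses a Bartlett
    lag window estimate of the long-run variance of the errors."""
    n = len(y)
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    # Newey-West style bandwidth rule (an arbitrary but common choice)
    m = bandwidth if bandwidth is not None else int(4 * (n / 100.0) ** (2.0 / 9.0))
    lrv = e @ e / n
    for k in range(1, m + 1):
        lrv += 2.0 * (1.0 - k / (m + 1.0)) * (e[:-k] @ e[k:]) / n
    # Var(slope) ~ lrv / sum((t - tbar)^2) for stationary short-memory errors
    se = np.sqrt(lrv / ((t - t.mean()) ** 2).sum())
    return beta[1], se
```

As the paragraph explains, this recipe is only valid when the long-run variance is finite; under a unit root or long memory in the errors, the lag window estimate does not converge to a meaningful quantity.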
Using various unit root tests to test for the presence of stochastic trends in three temperature series (northern hemisphere, southern hemisphere, global; period: 1865–1994), Stern and Kaufmann obtained conflicting results. The augmented Dickey–Fuller (ADF) test (Dickey and Fuller 1979) and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (Kwiatkowski et al. 1992) suggested that all temperature series are I(1) (integrated of order one), which means that they require differencing to achieve stationarity, whereas the Phillips–Perron test (Phillips and Perron 1988) and the Schmidt–Phillips test (Schmidt and Phillips 1992) suggested that all temperature series are I(0) (trend stationary), which means that the subtraction of a deterministic trend is sufficient to achieve stationarity. Taking also the spatial distribution of the temperature into account, Chang et al. detected functional unit roots in time series of cross-sectional temperature densities.
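As a minimal illustration of a test of the ADF family, the sketch below computes the t-statistic of the non-augmented Dickey–Fuller regression with constant and trend; lag augmentation and the exact critical values (about −3.41 at the 5% level with trend, from the Dickey–Fuller tables) are deliberately omitted simplifications of ours.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on rho in the Dickey-Fuller regression with
    constant and trend: diff(y)_t = c + b*t + rho*y_{t-1} + u_t."""
    dy = np.diff(y)
    n = len(dy)
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    u = dy - X @ beta
    s2 = u @ u / (n - 3)                      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
    return beta[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(1)
e = rng.standard_normal(500)
ar1 = np.zeros(500)                           # trend-stationary-type series
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]
walk = np.cumsum(rng.standard_normal(500))    # series with a unit root
```

For the stationary AR(1) series the statistic is strongly negative (unit root rejected), while for the random walk it typically stays above the Dickey–Fuller critical values.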
Once we have established that surface temperature follows a stochastic trend, we might go one step further and investigate the causes of the changes in surface temperature. This could be done by testing whether surface temperature cointegrates with independent variables like greenhouse gases. The existence of a cointegrating relationship would indicate that there is a common stochastic trend and consequently also that human activity is responsible for the rising temperature. However, the methodological problems do not become any smaller when we move from a univariate to a multivariate analysis. Indeed, the reports of cointegrating relationships by Kaufmann and Stern and Kaufmann et al. were challenged by Gay-Garcia et al., who argued that surface temperature is stationary around a broken linear trend. Kaufmann et al. responded that the in-sample forecasts generated by the cointegration model are more accurate than the in-sample forecasts generated by the broken trend model. Conceding that a single break may not be enough for the rejection of non-stationarity, Lai and Yoon tried up to three breaks and were then able to reject the unit root hypothesis in favor of stationarity about a broken trend for both global and hemispheric temperatures. They used the unit root test proposed by Harvey et al., which extends the minimum GLS detrended Dickey–Fuller (MDF) test of Perron and Rodríguez by allowing for multiple breaks.
However, the evidence presented by Lai and Yoon must be interpreted with caution. Firstly, in relation to the modest sample size of n = 136 (observation period from 1880 to 2015), there are simply too many tuning parameters besides the maximum number of breaks, namely the trimming parameters determining the times of the earliest possible break and the latest possible break, respectively, the separation parameter determining the minimum number of observations between breaks, and the number p of lags used in the ADF-type autoregression (or the maximum number of lags if p is chosen automatically, e.g., by the modified Akaike information criterion (MAIC) procedure of Ng and Perron and Perron and Qu). Moreover, Lai and Yoon also considered three different types of breaks, occurring either in the intercept, in the slope, or in both the intercept and the slope. Less problematic are the specifications required for the calculation of the asymptotic critical values because they leave no room for data snooping when the values provided by Harvey et al. are used. Secondly, the underlying asymptotics of this test is intuitively not very plausible. Suppose that three breaks in 150 years are a realistic estimate; then we would expect six breaks in 300 years and 30 breaks in 1500 years. Why should the maximum number of breaks remain fixed when the sample size increases? This can only happen when we keep the length of the observation period fixed and increase only the sampling frequency. For example, when we switch from annual observations to weekly observations, the sample size n increases dramatically (by a factor of about 52), but the number of breaks does not change, and the minimum number of observations between breaks increases by the same factor. Clearly, this kind of asymptotics is inappropriate for the investigation of climate change. Not much new can be learned about the long-term trend of the global surface temperature when weekly, daily, or hourly measurements are used instead of annual measurements. Unfortunately, the “right asymptotics”, where the number of breaks grows at the same rate as the length of the observation period, does not appear to be a practical option.
Instead of assuming that surface temperature and greenhouse gases have stochastic trends and searching for evidence of a cointegrating relationship, we may as well assume trend-stationarity and search for a common deterministic trend in order to establish that human factors are responsible for the rising temperature. Estrada et al. provided evidence for the existence of such a trend, which is characterized by common breaks in the slope. Perron and Yabu designed a test for structural breaks in a deterministic trend, which does not require any prior knowledge as to whether the deviations from the trend are difference- or trend-stationary. Applying this test to various temperature series (land and sea; global, northern and southern hemispheres) as well as series of greenhouse gases and other factors contributing to radiative forcing (the difference between the sunlight absorbed by the Earth and the energy radiated back to space), Estrada et al. found breaks in the slope of the trend function in almost all series but also large differences in the break date estimates. They attributed these differences to natural variability, e.g., low-frequency oscillations such as the Atlantic Multidecadal Oscillation (AMO) and shorter-term variations such as the El Niño/Southern Oscillation (ENSO). The results of the cotrending analyses of Estrada et al. and Estrada and Perron, which take the AMO into account, show that temperatures and forcing variables share the same long-term trend, which is highly influenced by greenhouse gases.
3. Empirical Results
Important theoretical results for the analysis of long-range dependence have typically been derived under the assumption d < 0.5, hence they cannot be used for the construction of tests of hypotheses such as H0: d ≤ 0.5 and H0: d ≤ −0.5, which are the practically testable versions of the hypotheses of interest, d < 0.5 and d < −0.5, respectively. Fortunately, there is a big difference between the two cases, which still allows us to get meaningful test results. For example, when we test a hypothesis such as H0: d ≤ 0.4 instead of the hypothesis d < 0.5 for the original time series, we clearly cannot interpret a rejection of that hypothesis as evidence against stationarity because the alternative hypothesis still contains the stationary region (0.4, 0.5). In contrast, when we test the hypothesis H0: d ≤ −0.5 instead of the hypothesis d < −0.5 for the differenced series, a rejection of that hypothesis implies that d > −0.5 for the differenced series and consequently also that d > 0.5 for the original series, which is strong evidence against stationarity. In our empirical study, we will therefore also have a closer look at the first differences of the temperature series.
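The relation between the memory parameter of a series and that of its first difference (differencing lowers d by exactly one) can be checked numerically with any semiparametric estimator of d. Here we use the classical log-periodogram (GPH) regression as a stand-in, not the Section 2.2 estimator, applied to a simulated fractionally integrated series and its first difference:

```python
import numpy as np

def gph(x, m):
    """Log-periodogram (GPH) regression estimate of d from the
    first m Fourier frequencies."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    reg = -2 * np.log(2 * np.sin(lam / 2))   # the slope on this regressor is d
    slope, _ = np.polyfit(reg, np.log(I), 1)
    return slope

def frac_noise(n, d, rng):
    """Fractionally integrated noise via the truncated MA(inf) expansion."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    e = rng.standard_normal(n)
    return np.array([psi[:t + 1][::-1] @ e[:t + 1] for t in range(n)])

x = frac_noise(2000, 0.75, np.random.default_rng(3))
d_orig = gph(x, 80)           # should be near the true value 0.75
d_diff = gph(np.diff(x), 80)  # should be near 0.75 - 1 = -0.25
```

This illustrates the argument above: an estimate of d clearly above −0.5 for the differenced series translates into d above 0.5, i.e., non-stationarity, for the original series.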
The main datasets used in our investigation were the hemispheric and global annual means from 1850 to 2019 of the combined land and marine temperature anomalies (HadCRUT4; see Morice et al. 2012), the land air temperature anomalies (CRUTEM4; see Jones et al. 2012), and the sea surface temperature anomalies (HadSST3; see Kennedy et al. 2011). These datasets were developed by the Climatic Research Unit of the University of East Anglia and the Met Office Hadley Centre and are available at the website http://www.cru.uea.ac.uk/. Temperature anomalies were defined relative to the 1961–1990 temperature mean. Secondary datasets were the NASA Goddard Institute for Space Studies (GISS) hemispheric annual means from 1880 to 2019, which are available at the website https://data.giss.nasa.gov/gistemp/ (Lenssen et al. 2019). In addition, we also used the annual unsmoothed (1856–2019) and smoothed (1861–2014) AMO series obtained by averaging the corresponding monthly series, which are available at the website https://psl.noaa.gov/data/timeseries/AMO/ of the NOAA Physical Sciences Laboratory.
First, we looked at the global means of the combined land and marine temperature anomalies from the HadCRUT4 dataset. The central question was whether the clearly noticeable recent rise in temperature (see Figure 2a) is just a transient phenomenon or a strong indication of non-stationarity.
The periodograms of the temperature series and the differenced temperature series, which are displayed in Figure 2c,d, respectively, were consistent with a pole at frequency zero in the spectral density of the first series and a zero at frequency zero in the spectral density of the second series. A pole in an ARFIMA spectral density implies that the memory parameter d is greater than zero and a zero implies that d is less than zero, hence Figure 2c suggests that d > 0 for the original series and Figure 2d suggests that d < 0 for the differenced series, which implies that d < 1 for the original series. More informative was a plot of the log periodogram against minus the log frequency. The slope appeared to be greater than the slope of the reference lines in the background, which correspond to d = 0.5 in the case of the temperature series (see Figure 2e) and d = −0.5 in the case of the differenced series (see Figure 2f), hence we concluded that d > 0.5, which implies that the temperature series is non-stationary. This finding was corroborated by the estimates of d obtained with several estimators, namely log periodogram regression, simple smoothing by averaging over neighboring periodogram ordinates, smoothing with a Parzen window, the narrow-band Whittle likelihood, and an estimator based on the Kolmogorov–Smirnov goodness-of-fit statistic, which were all greater than 0.86 (see Table 1).
Finally, in order to verify our findings in terms of statistical significance, we also employed the small-sample test described in Section 2.2. The hypotheses to be tested were of the form H0: d ≤ d0 in the case of the original temperature series and H0: d ≤ d0 − 1 in the case of the differenced series. For this type of hypothesis, a one-sided test is appropriate. The test results obtained by using different numbers of Fourier frequencies are shown in the third column of Table 2. Clearly, the choice of a small number increases the probability of a type II error (non-rejection of a false null hypothesis) because of insufficient sample size, whereas a large number increases the probability of a type I error (rejection of a true null hypothesis) because of distortions caused by short-range dependencies. In all cases, the null hypothesis could be rejected. As pointed out earlier, the rejections for d0 = 0.4 and d0 = 0.49 were not conclusive because the alternative hypotheses still contained the stationary regions (0.4, 0.5) and (0.49, 0.5), respectively. The problem with d0 = 0.5 was the lack of theoretical justification. We have therefore carried out a simulation study to search for possible overrejections. The results showed that this problem was of no great concern (see Table 3). However, the test results obtained with d0 = 0.5 for the differenced series were significant throughout anyway.
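The exact form of the small-sample statistic is given in Section 2.2; the sketch below only illustrates the general idea on which it is based, namely comparing the cumulative normalized periodogram near frequency zero with the uniform CDF. The rescaling by λ^(2·d0), the normalization, and the one-sided form are our own simplifications, and critical values would have to come from the statistic's small-sample distribution.

```python
import numpy as np

def cum_pgram_stat(x, m, d0):
    """KS-type statistic: rescale the first m periodogram ordinates to
    remove a hypothesized pole of order 2*d0, normalize their cumulative
    sums, and compare with the uniform CDF."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    z = I * lam ** (2 * d0)        # roughly flat if the true d equals d0
    F = np.cumsum(z) / z.sum()     # cumulative normalized periodogram
    U = np.arange(1, m + 1) / m    # uniform CDF at the Fourier frequencies
    return np.max(F - U)           # one-sided: large when d > d0
```

When d exceeds d0, the residual pole concentrates mass at the lowest frequencies, so the cumulative normalized periodogram rises above the uniform CDF and the statistic becomes large.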
In our frequency-domain analysis, we assumed that the periodogram in the neighborhood of frequency zero was not distorted by any short- and medium-range dependencies. However, as pointed out by Estrada et al., there is some low-frequency natural variability caused by oceanic cycles which needs to be addressed. In the following, we will focus on the AMO, which is a striking pattern visible in a plot of the detrended North Atlantic sea surface temperature. Compared to shorter-term variations such as the ENSO, the AMO has a much greater amplitude and a much longer period (six decades or more). Naturally, it has a large influence on the northern hemisphere climate. Estrada and Perron rejected the unit root hypothesis in favor of trend stationarity with a break in the rate of growth both for temperature and forcing series and observed that all series have approximately the same break dates once the AMO is filtered out from the global and northern hemisphere temperature series. Because of the different degrees to which the two hemispheres are affected by the AMO, we tested the stationarity hypothesis separately for the northern and southern hemispheres. For this purpose, we used the two hemispheric temperature series from the HadCRUT4 dataset. The results, shown in the 5th and 6th columns of Table 2, were similar to those obtained for the global series in the case of the northern hemisphere (NH) and generally weaker in the case of the southern hemisphere (SH). Noticeable in the latter case was the discrepancy between the original series and the differenced series, which could not be explained by a simple type of short-term autocorrelation. Indeed, Table 4 shows that in the case of a simple ARFIMA(1,d,0) model, the number of rejections of a false null hypothesis generally decreased as d0 increased from 0.4 through 0.49 and 0.5 to 0.6 (which corresponds to −0.4 after differencing). Clearly, the effect of differencing is less predictable in the case of a large low-frequency oscillation such as the AMO. The best way to deal with the AMO properly is to remove it. The problem is that it is unknown and, despite its long history of possibly hundreds of years (see Delworth and Mann 2000), also difficult to estimate because each cycle has a different period, phase, amplitude, and non-sinusoidal shape.
By definition, the AMO is closely related to the NH sea temperature series. In Figure 3h, this close relationship is illustrated graphically by applying the Hodrick–Prescott (HP) filter with a medium and a large smoothing parameter to the NH sea temperature series and comparing the difference with the smoothed AMO series. The similarity to the smoothed AMO series is still visible when the global sea temperature series (Figure 3g) or the corresponding combined sea and land temperature series (Figure 3b: NH, Figure 3a: global) are used instead. This means that, in principle, an oscillation like the AMO could also have been defined on the basis of the global combined temperature series. However, in this case, we would be reluctant to filter out the whole oscillation because we would be aware that it is the sum of an unknown “periodic” component and random fluctuations with similar periods. This line of reasoning is corroborated by Figure 4d, which shows the log periodogram of the global temperature series after the removal of the AMO (by a regression on the smoothed AMO or on an estimate obtained by applying an HP filter to the unsmoothed AMO), and by the corresponding figures for the two hemispheric series (Figure 4e,f). While the third periodogram ordinate (from the right, because the log periodogram is plotted against −1 times the log frequency) was well above the regression lines in Figure 2e, it is now substantially smaller, which indicates that the AMO removal has possibly gone too far. Noting that the AMO can be approximated very well by a sinusoid with a period of 61 years and still quite satisfactorily by a sinusoid with a period of about 57 years (see Figure 4a), which corresponds to the third Fourier frequency, we may alternatively account for the AMO in the frequency domain simply by omitting the third periodogram ordinate.
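Removing a cycle at an exact Fourier frequency, whether by subtracting the fitted sinusoid in the time domain or by omitting the corresponding periodogram ordinate, can be sketched as follows (a generic illustration of ours, not code from the paper):

```python
import numpy as np

def fourier_component(x, j):
    """Least-squares fit of a sinusoid at the j-th Fourier frequency
    2*pi*j/n to the demeaned series x; subtracting the result removes
    exactly the j-th periodogram ordinate."""
    n = len(x)
    t = np.arange(n)
    lam = 2.0 * np.pi * j / n
    X = np.column_stack([np.cos(lam * t), np.sin(lam * t)])
    coef, *_ = np.linalg.lstsq(X, x - x.mean(), rcond=None)
    return X @ coef
```

For n = 170 annual observations (1850–2019), j = 3 corresponds to a period of 170/3 ≈ 57 years, so subtracting `fourier_component(x, 3)` is the time-domain counterpart of omitting the third periodogram ordinate.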
For a robustness check of our earlier test results, we employed the small-sample test again, first after the removal of the AMO in the time domain by regressing the temperature on the smoothed AMO, and second after the omission of the third periodogram ordinate. The fact that the first approach probably removes too much and the second approach too little is not the only possible cause of discrepancies in the results. There was also a difference in the sample sizes, which was due to the fact that the AMO series were shorter. We therefore used the HadCRUT4 dataset only for the examination of the effect of the AMO removal in the frequency domain. For the AMO removal in the time domain, we used the shorter NH temperature series from the GISS dataset and, in addition, replaced the smoothed AMO series by the series obtained by applying the HP filter with a large smoothing parameter to the unsmoothed AMO. Table 2 shows that the test results obtained for the global temperature series hardly changed when the third Fourier frequency was omitted. Furthermore, comparing the last two columns of Table 2, which show the test results for the original and the AMO-detrended NH temperature series, respectively, we again found no noticeable differences, hence we concluded that the rejection of the null hypothesis of stationarity was not just due to the AMO.
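The HP filter used for the time-domain removal has a simple closed form. The following sketch (a textbook dense-linear-algebra implementation, adequate for series of this length; the choice of smoothing parameter is left to the user) computes the trend component; applying it with two different smoothing parameters and taking the difference isolates a medium-frequency oscillation, as in Figure 3.

```python
import numpy as np

def hp_trend(x, lamb):
    """Hodrick-Prescott trend: minimizes sum((x - tau)^2) plus lamb times
    the sum of squared second differences of tau, i.e., it solves the
    linear system (I + lamb * D'D) tau = x."""
    n = len(x)
    D = np.zeros((n - 2, n))                 # second-difference matrix
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(n) + lamb * D.T @ D,
                           np.asarray(x, dtype=float))
```

A larger `lamb` yields a smoother trend; the difference `hp_trend(x, medium) - hp_trend(x, large)` then contains the oscillations that are smooth at the first scale but not at the second.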
In this article, we have employed new frequency-domain methods for the investigation of global and hemispheric temperature series in order to obtain conclusive evidence of a long-term persistent change in the climate. Regardless of whether there is a deterministic upward trend, a stochastic trend, or very strong serial correlation in the temperature series, there will always be a very steep increase in the spectral density in the neighborhood of frequency zero, which can be estimated or subjected to a formal statistical test. The advantage of this focus on a narrow frequency band is that no strong assumptions about the data generating process are required. The disadvantage is that the proof of a steep increase implies only that the temperature series are non-stationary but does not provide any details about the true nature of these series.
While establishing evidence of global warming in a statistically sound way is important in its own right, issues of more practical relevance must, of course, still be studied in a more concrete setting. For example, employing highly complex econometric models, Morana and Sbrana found that global warming (which is driven by radiative forcing) might directly enhance Atlantic hurricane intensity and risk, which is inconsistent with the steady decline in the risk premia for catastrophe bonds since their introduction in the mid-1990s. Pretis showed that energy-balance models, which describe the change in temperatures as a function of incoming and outgoing radiation, can be mapped to cointegrated vector autoregressions, which allows the use of standard econometric techniques for the estimation of climate responses and the testing of their feedback.
Unfortunately, conventional tests of stationarity are based on asymptotic arguments and are therefore not really appropriate for temperature series, which are relatively short because there are too few data available from the period before 1850 (or even 1880 under more stringent criteria) for the estimation of global averages (for analyses of longer time series which are based on estimates rather than measurements, see, e.g., Mills 2012; Davidson et al. 2016). We have therefore employed a new small-sample test proposed by Mangat and Reschenhofer, which is based on the comparison of the cumulative normalized periodogram in a narrow frequency band close to frequency zero with the cumulative distribution function of the standard uniform distribution. This test requires only that the standard distributional assumptions for the included periodogram ordinates hold approximately, which is plausible when the sample size is not too small. However, in contrast to conventional frequency-domain tests, the number of included frequencies does not also have to be large. The results obtained with this test after suitably transforming the temperature series and slightly modifying the null hypothesis are highly significant. These findings are corroborated by the estimates of the memory parameter obtained with various methods, which all lie in the region of non-stationarity.
However, even when the frequency band under consideration is very small, there is still the danger of distortions caused by low-frequency oscillations such as the AMO. Since the AMO mainly influences the northern hemisphere climate, we checked the robustness of our findings by testing the hypothesis of stationarity separately for the NH temperature series after the removal of the AMO. Regardless of whether the AMO was removed in the time domain or in the frequency domain, the test results remained significant. We conclude that our empirical evidence of a persistent change in the surface temperature is genuine and not just due to a stationary phenomenon like the AMO.