#### Appendix A.1. On Early Non-Systematic Measurements of CO_{2}

This Appendix (not contained in Version 1 of our paper) addresses comments by all three reviewers of Version 1, Yog Aryal [85], Ronan Connolly [14], and Stavros Alexandris [86], about the reasons why we delimit our analysis to the period 1980–2019. The two latter reviewers suggested using earlier data compiled by Beck (2007), who referred to old chemical analyses of atmospheric concentration of CO_{2}.

We are sympathetic to the passion of the late Ernst-Georg Beck who, being a biology teacher, sacrificed a lot of time and effort to the exciting exercise of digging out old CO_{2} measurements. Indeed, it could be worthwhile to have a critical look at the historical data and to try to bring order to them and utilize them. However, this would certainly warrant an individual paper with this particular aim.

Historically, Beck's was not the first review paper of this sort. For instance, in his Table 1, Beck [87] refers to old works by Letts and Blake (~1900; [88]), who considered 252 papers with data (all in the 19th century), and to Stepanova [89], who considered 229 papers with data (130 in the 19th century and 99 in the 20th). Beck himself [87] considered 156 papers with data (82 in the 19th century and 74 in the 20th).

As usual, it is instructive to consider the paper by Beck [87] jointly with critical commentaries published later in the journal where the original paper appeared [90,91]. In particular, R.F. Keeling [90] opined that the old chemical measurements examined by Beck [87] “exhibit far too much geographic and short-term temporal variability to plausibly be representative of the background. The variability of these early measurements must therefore be attributed to ‘local or regional’ factors or poor measurement practice”. Keeling [90] also noted “basic accounting problems”: “Beck’s 11-year averages show large swings, including an increase from 310 to 420 ppm between 1920 and 1945 (Beck’s Figure 11)”. “To drive an increase of this magnitude globally requires the release of 233 billion metric tons of carbon to the atmosphere. The amount is equivalent to more than a third of all the carbon contained in land plants globally. […] To make a credible case, Beck needed to offer evidence for losses or gains of carbon of this magnitude from somewhere. He offered none.”

Meijer [91] expressed the opinion that Beck’s work “contains major flaws, such that the conclusions are wrong”. He also wrote: “The measurements presented in the paper are indeed useless for the purpose the author wants to use them, certainly in the way the author interprets them”. He further noted the lack of interpretation of diurnal and seasonal variability (effects called the “diurnal” and the “seasonal” rectifier in the literature) and of consideration of atmospheric mixing or lack thereof. Meijer also criticized the lack of metadata: “The necessary data to judge, namely measurement height, consecutive length of a record and especially temporal resolution, are lacking in [Beck’s] Table 2. In the light of the above, the whole ‘Discussion and Conclusion’ section is invalid, including [Beck’s] Figure 11, Figure 12, Figure 13 and Figure 14”. Indeed, the records mentioned in Beck’s Table 2 were local and short-lasting, with the longest period being 1920–1926. Beck’s Figure 11 and Figure 13 show concatenated short segments of data from different places.

There are some other puzzling elements in Beck’s paper. For instance, in his Figure 5, referring to data from a meteorological station near Giessen, the variability of high amplitude seems suspicious and not physically realistic. In particular, from June to August 1940, the measured CO_{2} concentration increases from 340 to 550 ppm (much more than in Beck’s Figure 11, discussed by Keeling [90] and Meijer [91] as quoted above), with weird seasonal behaviour. Beck himself admitted that the results for Giessen “need to be adjusted downwards to take account of anthropogenic sources of CO_{2} from nearby city, an influence that has been estimated as lying between 10 and 70 ppm […] by different authors”.

The controversy and disputes among these authors extended beyond pure scientific issues. Thus, Beck [87] wrote: “[t]he data accepted […] had to be sufficiently low to be consistent with the greenhouse hypothesis of climate change controlled by rising CO_{2} emissions from fossil fuel burning”. On the other hand, Meijer [91] wrote: “The author even accuses the pioneers Callendar and [Charles David] Keeling of selective data use, errors or even something close to data manipulation”. In addition, [R.F.] Keeling [90] noted: “Beck is […] wrong when he asserts that the earlier data have been discredited only because they don’t fit a preconceived hypothesis of CO_{2} and climate. […] Instead, the data have been ignored because they cannot be accepted as representative without violating our understanding of how fast the atmosphere mixes”.

In view of the above questions about data reliability, as well as the controversies and disputes, we decided to limit the period of our study to 1980–2019, in which the measurements are systematic and verifiable, as they are made at several locations simultaneously.

#### Appendix A.3. Some Notes on Time Directionality of Causal Systems

In a unidirectional causal system in continuous time $t$, in which the process $\underset{\_}{x}\left(t\right)$ is the cause of $\underset{\_}{y}\left(t\right)$, an equation of the form

$$\underset{\_}{y}\left(t\right)={\int}_{-\infty}^{\infty}\alpha \left(h\right)\underset{\_}{x}\left(t-h\right)\mathrm{d}h$$

should hold [67], where $\alpha \left(t\right)$ is the impulse response function. The causality condition is thus

$$\alpha \left(t\right)=0\text{ for }t<0$$

Here, we consider systems with positive dependence, in which $\alpha \left(t\right)\ge 0$ for $t\ge 0$, which are possibly also excited by another process $\underset{\_}{v}\left(t\right)$, independent of $\underset{\_}{x}\left(t\right)$. Working in discrete time, we write

$${\underset{\_}{y}}_{\tau}={\sum}_{j=0}^{\infty}{\alpha}_{j}{\underset{\_}{x}}_{\tau -j}+{\underset{\_}{v}}_{\tau}$$

Assuming (without loss of generality) zero means for all processes, multiplying by ${\underset{\_}{x}}_{\tau -\eta}$, taking expected values, and denoting the cross-covariance function as ${c}_{xy}\left[\eta \right]≔\mathrm{E}\left[{\underset{\_}{x}}_{\tau -\eta}{\underset{\_}{y}}_{\tau}\right]$ and the autocovariance function as ${c}_{x}\left[\eta \right]≔\mathrm{E}\left[{\underset{\_}{x}}_{\tau -\eta}{\underset{\_}{x}}_{\tau}\right]$, we find

$${c}_{xy}\left[\eta \right]={\sum}_{j=0}^{\infty}{\alpha}_{j}{c}_{x}\left[\eta -j\right]$$

For $\eta >0$, using the property that ${c}_{x}\left[\eta \right]$ is an even function (${c}_{x}\left[\eta \right]={c}_{x}\left[-\eta \right]$), we get

$${c}_{xy}\left[\eta \right]={\sum}_{j=0}^{\eta -1}{\alpha}_{j}{c}_{x}\left[\eta -j\right]+{\sum}_{j=\eta}^{\infty}{\alpha}_{j}{c}_{x}\left[j-\eta \right]\quad \text{(A9)}$$

and for the negative part

$${c}_{xy}\left[-\eta \right]={\sum}_{j=0}^{\infty}{\alpha}_{j}{c}_{x}\left[j+\eta \right]\quad \text{(A10)}$$

With intuitive reasoning, assuming that the autocovariance function is decreasing (${c}_{x}\left[{j}^{\prime}\right]<{c}_{x}\left[j\right]$ for ${j}^{\prime}>j$), as usually happens in natural processes, we may see that the rightmost terms of Equations (A9) and (A10) are decreasing functions of η (as for ${j}^{\prime}>j$ it will be ${c}_{x}\left[{j}^{\prime}-\eta \right]<{c}_{x}\left[j-\eta \right]$ and ${c}_{x}\left[{j}^{\prime}+\eta \right]<{c}_{x}\left[j+\eta \right]$). However, the term ${{\displaystyle \sum}}_{j=0}^{\eta -1}{\alpha}_{j}{c}_{x}\left[\eta -j\right]$ of Equation (A9) is not decreasing. Therefore, ${c}_{xy}\left[\eta \right]$ should attain a maximum value at some positive lag $\eta ={\eta}_{1}$. Thus, a positive maximizing lag, $\eta ={\eta}_{1}>0$, is a necessary condition for the causality direction from ${\underset{\_}{x}}_{\tau}$ to ${\underset{\_}{y}}_{\tau}$. Conversely, the condition that the maximizing lag is negative is a sufficient condition to exclude the causality direction being exclusively from ${\underset{\_}{x}}_{\tau}$ to ${\underset{\_}{y}}_{\tau}$.

All above arguments remain valid if we standardize (divide) by the product of standard deviations of the processes ${\underset{\_}{x}}_{\tau}$ and ${\underset{\_}{y}}_{\tau}$ and, thus, we can replace cross-covariances ${c}_{xy}\left[\eta \right]$ with cross-correlations ${r}_{xy}\left[\eta \right]$ (or, in the case of differenced processes, ${r}_{\tilde{x}\tilde{y}}\left[\nu ,\eta \right]$).
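The necessary condition above can be checked numerically. The following minimal sketch (not part of our analysis; the AR(1) input and the geometric impulse response are arbitrary illustrative choices) simulates a discrete-time causal system with positive dependence, ${\underset{\_}{y}}_{\tau}={\sum}_{j}{\alpha}_{j}{\underset{\_}{x}}_{\tau -j}+{\underset{\_}{v}}_{\tau}$, and verifies that the empirical cross-correlation attains its maximum at a positive lag:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Positive impulse response alpha_j >= 0 for j >= 0 (geometric decay, arbitrary choice)
alpha = 0.5 * 0.8 ** np.arange(20)

# Persistent input x: AR(1) with lag-1 correlation 0.9 (decreasing autocovariance)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

# y_tau = sum_j alpha_j x_{tau-j} + v_tau, with v independent of x
y = np.convolve(x, alpha)[:n] + rng.standard_normal(n)

# Empirical cross-correlation r_xy[eta] = corr(x_{tau-eta}, y_tau) over a lag window
m = 30
lags = np.arange(-m, m + 1)
r = [np.corrcoef(x[m - k:n - m - k], y[m:n - m])[0, 1] for k in lags]

eta1 = lags[int(np.argmax(r))]
print("maximizing lag eta_1 =", eta1)  # positive, as the necessary condition requires
```

Because correlation is covariance divided by the (lag-independent) standard deviations, locating the maximum of $r_{xy}\left[\eta \right]$ is equivalent to locating the maximum of ${c}_{xy}\left[\eta \right]$, in line with the standardization remark above.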

#### Appendix A.4. Some Notes on the Alternative Procedures on Causality

Reviewer Yog Aryal [85] opined that we missed referring to the recent relevant works by Hannart et al. [92] and Verbitsky et al. [93]. In response to this comment, we include this Appendix (not contained in Version 1 of our paper) explaining, in brief, why we do not compare our results with the ones of those studies, also noting that only the latter study contains material that is prima facie comparable to ours. The former study, focusing on the so-called causal counterfactual theory, is more theoretical and also much more interesting. While we, too, are preparing a theoretical study, in which we will discuss some theories in detail, in this Appendix we give some key elements of our theoretical disagreements and a counterexample that illustrates the disagreements.

We first note that, in order to define causality, Hannart et al. [92] refer to the work of the 18th-century philosopher David Hume and, in particular, his famous book Enquiry concerning Human Understanding [94], first published in 1748. From this book, we wish to quote the following important passage, which emphasizes the difficulties even in defining causality:

Our thoughts and enquiries are, therefore, every moment, employed about this relation: Yet so imperfect are the ideas which we form concerning it, that it is impossible to give any just definition of cause, except what is drawn from something extraneous and foreign to it.

Hannart et al. [92], while studying the probability of occurrence of an event Y, introduced the two-valued variable ${X}_{f}$ to indicate whether or not a forcing f is present, and continue as follows:

The probability ${p}_{1}=P(Y=1|{X}_{f}=1)$ of the event occurring in the real world, with f present, is referred to as factual, while ${p}_{0}=P(Y=1|{X}_{f}=0)$ is referred to as counterfactual. Both terms will become clear in the light of what immediately follows. The so-called fraction of attributable risk (FAR) is then defined as $\mathrm{FAR}=1-{p}_{0}/{p}_{1}$. The FAR is interpreted as the fraction of the likelihood of an event that is attributable to the external forcing.

They also show that under some conditions, FAR is a probability which they denote PN and call probability of necessary causality. They stress that it “is important to distinguish between necessary and sufficient causality” and they associate PN (or FAR) “with the first facet of causality, that of necessity”. They claim to have “introduced its second facet, that of sufficiency, which is associated with the symmetric quantity $1-\left(1-{p}_{1}\right)/\left(1-{p}_{0}\right)$”; they denote it as PS, standing for probability of sufficient causality.
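In this notation, PN and PS are simple functions of the factual and counterfactual probabilities. A short sketch (illustrative only; the values of ${p}_{0}$ and ${p}_{1}$ here are arbitrary and not taken from any study):

```python
def pn_ps(p1: float, p0: float) -> tuple[float, float]:
    """PN = FAR = 1 - p0/p1 ("necessary causality") and
    PS = 1 - (1 - p1)/(1 - p0) ("sufficient causality"),
    computed from the factual (p1) and counterfactual (p0) probabilities."""
    return 1 - p0 / p1, 1 - (1 - p1) / (1 - p0)

# Illustrative values (arbitrary):
pn, ps = pn_ps(p1=0.8, p0=0.4)
print(round(pn, 3), round(ps, 3))  # 0.5 0.667
```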

Central to the logical framework of Hannart et al. [92] is the notion of intervention of an experimenter, which is equivalent to experimentation with the ability to set the value of the assumed cause to a desired value. Clearly, this is feasible in laboratory experiments and infeasible in natural processes. The authors resort to the “so-called in silico experimentation” which, despite the impressive name chosen, is intervention in a mathematical model that represents the process. Hence, objectively, they examine the “causality” that is embedded in the model rather than the natural causality. One may argue that this is totally unnecessary: it would be better to inspect the model’s equations or code to investigate what causality has been embedded in the model, instead of running simulations and calculating probabilities. In particular, if the models used are climate models, as in [92], their inability to effectively describe (perform in “prime time”) the real-world processes [50,95,96,97,98,99,100] makes the entire endeavour futile. Another notion these authors use is exogeneity, which is related to the so-called causal graph, reflecting the assumed dependencies among the studied variables. Specifically, they state that “a sufficient condition for X to be exogenous wrt any variable is to be a top node of a causal graph”.

Here, we will use the simple example of Section 4.2, temperature–clothes weight–sweat, to show that using the quantities FAR (or PN) and PS may give spurious results that do not correspond to necessary or sufficient conditions for causality, at least with their meaning in our paper.

We use the two-valued random variables $\underset{\_}{x},\underset{\_}{y},\underset{\_}{z}$ to model the states of temperature, clothes weight, and sweat, respectively. We designate the following states:

x = 1: being hot above a threshold;

y = 1: wearing clothes with weight below a threshold;

z = 1: sweat quantity above a threshold;

and the opposite states with $x=0,y=0,z=0$, respectively. We choose the threshold of temperature so that $P\left\{\underset{\_}{x}=0\right\}=P\left\{\underset{\_}{x}=1\right\}=0.5$ and that of clothes weight so that $P\left\{\underset{\_}{y}=0\right\}=P\left\{\underset{\_}{y}=1\right\}=0.5$. We choose a small probability, 0.05, of wearing light clothes when cold, or heavy clothes when hot, i.e., $P\left\{\underset{\_}{y}=1|\underset{\_}{x}=0\right\}=P\left\{\underset{\_}{y}=0|\underset{\_}{x}=1\right\}=0.05$ (generally, we avoid choosing zero probabilities; rather the minimum value we choose is 0.05).

Using the definition of conditional probability,

$$P\left\{\underset{\_}{x}=x,\underset{\_}{y}=y\right\}=P\left\{\underset{\_}{y}=y|\underset{\_}{x}=x\right\}P\left\{\underset{\_}{x}=x\right\}$$

we find the probability matrix A with elements ${a}_{ij}=P\left\{\underset{\_}{x}=i,\underset{\_}{y}=j\right\}$ as follows:

$$\mathit{A}=\left[\begin{array}{cc}0.475& 0.025\\ 0.025& 0.475\end{array}\right]$$

Now, we assign plausible values to the conditional probabilities of high sweat, $P\left\{\underset{\_}{z}=1|\underset{\_}{x}=x,\underset{\_}{y}=y\right\}$, as follows:

Cold, heavy clothes: $P\left\{\underset{\_}{z}=1|\underset{\_}{x}=0,\underset{\_}{y}=0\right\}=0.2$

Cold, light clothes: $P\left\{\underset{\_}{z}=1|\underset{\_}{x}=0,\underset{\_}{y}=1\right\}=0.1$

Hot, heavy clothes: $P\left\{\underset{\_}{z}=1|\underset{\_}{x}=1,\underset{\_}{y}=0\right\}=0.95$

Hot, light clothes: $P\left\{\underset{\_}{z}=1|\underset{\_}{x}=1,\underset{\_}{y}=1\right\}=0.80$

Again, we have avoided setting any of the conditional probabilities to 0 (or 1), and we have used multiples of 0.05 for all of them.

Using the definition of conditional probability in the form

$$P\left\{\underset{\_}{x}=x,\underset{\_}{y}=y,\underset{\_}{z}=z\right\}=P\left\{\underset{\_}{z}=z|\underset{\_}{x}=x,\underset{\_}{y}=y\right\}P\left\{\underset{\_}{x}=x,\underset{\_}{y}=y\right\}$$

we find the joint probabilities for each of the triplets $\left\{x,y,z\right\}$ that are shown in Table A1.

**Table A1.**
Joint probabilities $P\left\{\underset{\_}{x}=x,\underset{\_}{y}=y,\underset{\_}{z}=z\right\}$ for all triplets $\left\{x,y,z\right\}$.

| x | y | z = 0 | z = 1 |
|---|---|---|---|
| 0 | 0 | 0.38 | 0.095 |
| 0 | 1 | 0.0225 | 0.0025 |
| 1 | 0 | 0.00125 | 0.02375 |
| 1 | 1 | 0.095 | 0.38 |
| | $P\left\{\underset{\_}{z}=z\right\}=$ | 0.49875 | 0.50125 |
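The numbers in Table A1 follow mechanically from the probabilities assigned above. A short sketch reproducing them (for the reader's convenience; it simply encodes the stated marginal and conditional probabilities):

```python
# Probabilities of the example: temperature x, clothes weight y, sweat z
p_x = {0: 0.5, 1: 0.5}                        # P{x}
p_y_given_x = {(0, 0): 0.95, (0, 1): 0.05,    # P{y | x}
               (1, 0): 0.05, (1, 1): 0.95}
p_z1_given_xy = {(0, 0): 0.20, (0, 1): 0.10,  # P{z = 1 | x, y}
                 (1, 0): 0.95, (1, 1): 0.80}

# Joint probabilities P{x, y, z} = P{z | x, y} P{y | x} P{x}
joint = {}
for x in (0, 1):
    for y in (0, 1):
        p_xy = p_x[x] * p_y_given_x[(x, y)]
        joint[(x, y, 1)] = p_xy * p_z1_given_xy[(x, y)]
        joint[(x, y, 0)] = p_xy * (1 - p_z1_given_xy[(x, y)])

print(round(joint[(0, 0, 0)], 5), round(sum(joint.values()), 5))  # 0.38 1.0
```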

Now, assume that we let an “artificial intelligence entity” (AIE) decide on causality based on the probability rules of the Hannart et al. [92] framework. Our AIE has access to numerous videos of people and is “trained” to assign accurate values of y and z, referring to clothes and sweat, based on the images in the videos. In the video images, no thermometers are shown and, thus, our AIE cannot assign values of x, nor can it be aware of the notion of temperature. Our AIE tries to construct a causal graph putting, say, $\underset{\_}{y}$ as a top node and $\underset{\_}{z}$ as an end node; hence, it assumes that $\underset{\_}{y}$ is exogenous. Based on the huge information it can access, our AIE can (a) claim that it has constructed a prediction model based on one part of the data (e.g., using the so-called deep-learning technique) and, hence, is able to perform “in silico experimentation” (even though this is not absolutely necessary) and (b) accurately estimate the joint and conditional probabilities related to $\left\{y,z\right\}$ using either the model, the data, or both. Provided that the dataset is large enough, it will come up with the true values of the probabilities ${b}_{ij}=P\left\{\underset{\_}{y}=i,\underset{\_}{z}=j\right\}$ and ${c}_{ij}=P\left\{\underset{\_}{z}=j|\underset{\_}{y}=i\right\}$, forming the matrices B and C, respectively, with values as follows:

$$\mathit{B}=\left[\begin{array}{cc}0.38125& 0.11875\\ 0.1175& 0.3825\end{array}\right],\quad \mathit{C}=\left[\begin{array}{cc}0.7625& 0.2375\\ 0.235& 0.765\end{array}\right]$$

Here, the true values ${b}_{ij}$ have been determined from the values of Table A1, noting that

$${b}_{ij}=P\left\{\underset{\_}{y}=i,\underset{\_}{z}=j\right\}={\sum}_{x=0}^{1}P\left\{\underset{\_}{x}=x,\underset{\_}{y}=i,\underset{\_}{z}=j\right\}$$

and the true values ${c}_{ij}$ have been determined from the definition of conditional probability:

$${c}_{ij}=P\left\{\underset{\_}{z}=j|\underset{\_}{y}=i\right\}=\frac{{b}_{ij}}{P\left\{\underset{\_}{y}=i\right\}}=\frac{{b}_{ij}}{{b}_{i0}+{b}_{i1}}$$
Our AIE will then implement the causality conditions of sweat on clothes weight, assigning ${p}_{0}=P\left\{\underset{\_}{z}=1|\underset{\_}{y}=0\right\}=0.2375$ and ${p}_{1}=P\left\{\underset{\_}{z}=1|\underset{\_}{y}=1\right\}=0.765$. It will further calculate the probability of necessary causality as PN = 0.690, and the probability of sufficient causality even higher, PS = 0.692. Hence, our AIE will inform us that there is all necessary and sufficient evidence that light clothes cause high sweat.

Now, coming to the study by Verbitsky et al. [93], we notice that it assumes that “each time series is a variable produced by its hypothetical low dimensional system of dynamical equations” and uses the technique of distances of multivariate vectors for reconstructing the system dynamics. As demonstrated in Koutsoyiannis [101], such assumptions and techniques are good for simple toy models but, when real-world systems are examined, low dimensionality appears as a statistical artifact, because the reconstruction actually needs an incredibly high number of observations to work, which are hardly available. The fact that the sums of multivariate vectors of distances are statistical estimators with huge uncertainty is often missed in studies of this type, which treat data as deterministic quantities and thus obtain unreliable results. We do not believe that the Earth system and Earth processes (including global temperature and CO_{2}) are of low dimensionality, and we deem it unnecessary to discuss the issue further. We only note that global temperature and CO_{2} virtually behave as Gaussian, which enables reliable estimation of standard correlations and dismisses the need to use the overly complex and uncertain correlation sums.