#### 2.2. Quantization of Marine Abnormal Variations from Raster-Formatted Datasets

An abnormal variation is a deviation from the average state obtained from a long-term series (e.g., daily, monthly, seasonal or yearly). Long-term marine parameters have seasonal variations that are mainly driven by solar radiation. However, against the background of global climate change, spatiotemporal patterns that deviate from normal seasonal cycles are of particular interest in the analysis of anomalous climate events. When little prior knowledge is available, the z-score algorithm is a suitable choice for removing seasonal fluctuations [28].

Neither quantitative nor Boolean mining can handle continuous values directly. Therefore, before the rule mining is carried out, the abnormal variations need to be quantized into a small number of intervals, each represented by a discrete rank that expresses the intensity of the variation [15,28]. Many quantization strategies are available (e.g., cluster-based, equal-density, equal-area, or equal-depth methods), but they are very often closely tied to a specific domain [14]. In this manuscript, our goal is to discover abnormal association relationships among marine parameters against the background of global climate change, so the quantization algorithm should describe the intensities of abnormal variations. The mean and standard deviation of the time series were used as the criterion to quantize the marine parameters into three ranks, −1, 0 and +1, indicating negative changes, no changes and positive changes, respectively. For a specified grid pixel, i.e., the $i$th row and $j$th column in a raster-formatted dataset, the formula is shown as Equation (1).

where $\mu $ and $\delta $ are the mean and standard deviation of the time series in the specified grid pixel ($i$th row and $j$th column), respectively, and $V$ is the abnormal variation at a given time in the specified grid pixel.
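Written out from the description above, a three-rank rule of this kind would take a form like the following. This is a hedged reconstruction, not the published equation: the symbol $Q$ for the quantized rank and the band-width factor $c$ are assumptions introduced here (a one-standard-deviation band, $c = 1$, is a common choice, but the text does not state the value).

```latex
Q_{i,j}(t) =
\begin{cases}
-1, & V_{i,j}(t) < \mu_{i,j} - c\,\delta_{i,j} \\
\phantom{-}0, & \mu_{i,j} - c\,\delta_{i,j} \le V_{i,j}(t) \le \mu_{i,j} + c\,\delta_{i,j} \\
+1, & V_{i,j}(t) > \mu_{i,j} + c\,\delta_{i,j}
\end{cases}
```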

From long-term raster-formatted datasets, the quantification of abnormal marine parameter variations consists of the following steps:

Step 1: For each grid pixel, calculate the mean and standard deviation of the observed time series of each marine parameter from the long-term raster-formatted datasets.

Step 2: Extract the abnormal variations of marine parameters using the z-score algorithm.

Step 3: Calculate the mean and standard deviation values on the basis of long-term abnormal variations of marine parameters.

Step 4: Quantize the abnormal variations into three discrete ranks (i.e., −1, 0 and +1), using Equation (1) for each time and each grid pixel.
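The four steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `quantize_anomalies`, the monthly sampling (`period=12`), the per-calendar-month z-score used to remove the seasonal cycle, and the one-standard-deviation band (`k=1.0`) are all choices made here for the example.

```python
import numpy as np

def quantize_anomalies(cube, period=12, k=1.0):
    """Quantize a raster time series of shape (time, rows, cols) into ranks.

    Assumptions (not from the source): monthly sampling, z-scores taken
    per calendar month, and thresholds at mean +/- k * std with k = 1.
    """
    cube = np.asarray(cube, dtype=float)
    phase = np.arange(cube.shape[0]) % period
    z = np.empty_like(cube)

    # Steps 1-2: per-pixel mean/std for each calendar phase, then z-score.
    for p in range(period):
        sel = phase == p
        mu = cube[sel].mean(axis=0)
        sd = cube[sel].std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)  # guard against constant pixels
        z[sel] = (cube[sel] - mu) / sd

    # Step 3: per-pixel mean/std of the abnormal variations themselves.
    mu_v = z.mean(axis=0)
    sd_v = z.std(axis=0)

    # Step 4: three ranks per time step and pixel.
    ranks = np.zeros(cube.shape, dtype=np.int8)
    ranks[z > mu_v + k * sd_v] = 1
    ranks[z < mu_v - k * sd_v] = -1
    return z, ranks

# Synthetic 10-year monthly series on a 2 x 2 grid (illustrative data only):
# a seasonal cycle, a small perturbation, and one injected warm anomaly.
t = np.arange(120)
series = 10 * np.sin(2 * np.pi * t / 12) + 0.1 * np.sin(1.7 * t)
series[66] += 5.0
cube = series[:, None, None] * np.ones((1, 2, 2))

z, ranks = quantize_anomalies(cube)
print(ranks[66, 0, 0])  # rank assigned to the injected anomaly
```

Because the seasonal cycle is removed before thresholding, the injected anomaly stands out even though it is smaller than the seasonal amplitude.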

#### 2.3. Identification of ENSO Events

Many indices describe ENSO events, including the Southern Oscillation Index; SST anomalies in the Niño 1+2 region (90° W–80° W, 10° S–0°), the Niño 3 region (150° W–90° W, 5° S–5° N), and the Niño 4 region (160° E–150° W, 5° S–5° N) [29]; the Multivariate ENSO Index (MEI); the Oceanic Niño Index [30]; and the precipitation-based ENSO index [31]. In this study, we used the MEI (http://www.esrl.noaa.gov/psd/enso/mei/), provided by the Physical Sciences Division of the U.S. National Oceanic and Atmospheric Administration’s Earth System Research Laboratory. It is based on six observed variables over the tropical Pacific: sea-level pressure, the zonal and meridional components of the surface wind, SST, surface air temperature, and the total cloudiness fraction of the sky [32].

Different percentile definitions are used to rank ENSO events as strong, moderate or weak [32]. However, using too many categories would make it difficult to identify the abnormal variations of marine parameters related to ENSO. For consistency between the abnormal variations in marine parameters and the ENSO events, the mean-standard deviation algorithm was used to classify ENSO events into three ranks: −1, 0 and +1, which indicate a La Niña event, a neutral condition, and an El Niño event, respectively. The criteria are similar to Equation (1).
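As a small illustration of the three-rank classification, the sketch below applies a mean ± standard deviation rule to an index series. The function name `rank_enso`, the one-standard-deviation band (`k=1.0`), and the index values are hypothetical; the values are made up for the example and are not MEI data.

```python
import numpy as np

def rank_enso(index_values, k=1.0):
    """Classify an ENSO index series into -1 (La Nina), 0 (neutral), +1 (El Nino).

    Assumption (not from the source): thresholds at mean +/- k * std, k = 1.
    """
    x = np.asarray(index_values, dtype=float)
    mu, sd = x.mean(), x.std()
    ranks = np.zeros(x.shape, dtype=int)
    ranks[x > mu + k * sd] = 1
    ranks[x < mu - k * sd] = -1
    return ranks

# Hypothetical index values, for illustration only.
index = [-1.8, -0.2, 0.1, 0.3, 2.1, -0.4, 0.0, -0.1]
enso_ranks = rank_enso(index)
print(enso_ranks.tolist())  # [-1, 0, 0, 0, 1, 0, 0, 0]
```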

#### 2.4. A Recursive Algorithm

Apriori is a seminal algorithm for finding frequent itemsets using candidate generation and is based on the three steps referred to as link–prune–generation [33]. Since its introduction and subsequent widespread application, the core idea of Apriori has been shared and improved in the development of quantitative relationship mining [34]. This manuscript uses the core idea of link–prune–generation to design the EOMSAP for exploring abnormal association patterns among marine parameters against ENSO events. The key implementation consists of two steps. The database in these steps is composed of mining transaction tables.

Step 1: Generate the frequent 1-itemset related to the ENSO by scanning the database one time for each item (i.e., marine parameter) and each quantification type (i.e., −1, 0 and +1). Next, use Equation (2) to calculate support (S), denoted as $S\left(\mathrm{A}\left[k\right]\right)$, and use Equation (3) to calculate conditional support (CS) against ENSO events, denoted as $CS(\mathrm{A}\left[k\right]|\mathrm{ENSO}\left[l\right])$. If and only if the inequalities in Equation (4) are true, the frequent 1-itemset related to ENSO is generated, denoted as $CS({\mathrm{A}}_{1}\left[{k}_{1}\right]|\mathrm{ENSO}\left[l\right])$.

where $m$ is the number of items involved in the mining model, ranging from 1 to the total number of marine parameters ($M$), so that $m = 1$ for a single item and $m = M$ when all items are included; $n\left({\mathrm{A}}_{1}\left[{k}_{1}\right]{\mathrm{A}}_{2}\left[{k}_{2}\right]\dots {\mathrm{A}}_{m}\left[{k}_{m}\right]\right)$ is the number of co-occurrences of items ${\mathrm{A}}_{1},{\mathrm{A}}_{2}\dots {\mathrm{A}}_{m}$ at levels $k_{1}, k_{2}, \dots, k_{m}$; $n\left(\mathrm{ENSO}\left[l\right]\right)$ is the number of occurrences of an ENSO[$l$] event; $n\left({\mathrm{A}}_{1}\left[{k}_{1}\right]{\mathrm{A}}_{2}\left[{k}_{2}\right]\dots {\mathrm{A}}_{m}\left[{k}_{m}\right]\mathrm{ENSO}\left[l\right]\right)$ is the number of co-occurrences of items ${\mathrm{A}}_{1},{\mathrm{A}}_{2}\dots {\mathrm{A}}_{m}$ at levels $k_{1}, k_{2}, \dots, k_{m}$ together with the ENSO[$l$] event; each of $k_{1}, k_{2}, \dots, k_{m}$ is one of the quantification types (i.e., −1, 0 and +1); $l$ is the ENSO type (+1 for El Niño and −1 for La Niña); and ${\tau}_{s}$ is the user-specified minimum support threshold for marine parameters. The first inequality in Equation (4) means that a combination of variation types $k_{1}, k_{2}, \dots, k_{m}$ of marine parameters ${\mathrm{A}}_{1},{\mathrm{A}}_{2}\dots {\mathrm{A}}_{m}$ with the ENSO[$l$] event is meaningful only if it satisfies the user-specified minimum support. The second means that the co-variations of marine parameters ${\mathrm{A}}_{1},{\mathrm{A}}_{2}\dots {\mathrm{A}}_{m}$ at variation types $k_{1}, k_{2}, \dots, k_{m}$ are regarded as association patterns against ENSO[$l$] only when their conditional support against the ENSO[$l$] event is not less than their support in the database.
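Written out from the symbol definitions above, Equations (2)–(4) would take a form like the following. This is a reconstruction under one assumption: $N$ denotes the total number of transactions in the database. It agrees with the values in Example 1, where A_{4} (joint support 0.0% < 20.0%) fails the first inequality and A_{2} (conditional support 66.7% < support 80.0%) fails the second.

```latex
\begin{align}
S\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}]\right)
  &= \frac{n\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}]\right)}{N} \tag{2} \\
CS\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}] \mid \mathrm{ENSO}[l]\right)
  &= \frac{n\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}]\,\mathrm{ENSO}[l]\right)}{n\left(\mathrm{ENSO}[l]\right)} \tag{3} \\
\frac{n\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}]\,\mathrm{ENSO}[l]\right)}{N} \ge \tau_{s},
  \quad
CS\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}] \mid \mathrm{ENSO}[l]\right)
  &\ge S\left(\mathrm{A}_{1}[k_{1}]\dots\mathrm{A}_{m}[k_{m}]\right) \tag{4}
\end{align}
```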

Step 2: Generate frequent (m + 1)-itemsets from the frequent m-itemsets using a recursive algorithm with linking–pruning, so that every generated itemset contains at least two items. Within this step, the linking and pruning functions are run recursively until no more frequent itemsets are generated. The Linking Function generates the candidate (m + 1)-itemsets from the m-itemsets by step-by-step linking without scanning the database, while the Pruning Function removes the false (m + 1)-itemsets according to Equation (4).

To describe the workflow for finding frequent itemsets against ENSO events clearly, we give an example with simulated data in Table 1.

Example 1:

Table 1 shows the quantitative changes for five marine parameters (A_{1}, A_{2}, …, A_{5}) and an ENSO event. The +1, 0 and −1 of the marine parameters mean positive changes, no changes and negative changes, respectively. The +1 and −1 of the ENSO mean an El Niño and a La Niña event, respectively. In this case, the support threshold is set to 20.0%.

The supports of A_{1}, A_{2}, A_{3}, A_{4} and A_{5} with El Niño events are 30.0%, 20.0%, 30.0%, 0.0% and 20.0%, respectively. Their independent supports are 80.0%, 80.0%, 80.0%, 50.0% and 60.0%, and their conditional supports against El Niño are 100%, 66.7%, 100%, 0.0% and 66.7%, respectively. According to Equation (4), A_{4} fails to meet the first inequality and A_{2} fails the second inequality; thus, the frequent 1-itemsets are A_{1}, A_{3} and A_{5}, denoted as (A_{1}[+1]|ENSO[+1], A_{3}[+1]|ENSO[+1] and A_{5}[+1]|ENSO[+1]). The Linking Function generates three candidate 2-itemsets, which are (A_{1}[+1]A_{3}[+1]|ENSO[+1], A_{1}[+1]A_{5}[+1]|ENSO[+1] and A_{3}[+1]A_{5}[+1]|ENSO[+1]). The Pruning Function verifies that they are all frequent 2-itemsets. Repeating the Linking Function and Pruning Function generates one frequent 3-itemset, which is A_{1}[+1]A_{3}[+1]A_{5}[+1]|ENSO[+1].

With similar processing, the frequent itemsets against a La Niña event comprise three 1-itemsets (A_{3}[+1]|ENSO[−1], A_{4}[+1]|ENSO[−1] and A_{5}[+1]|ENSO[−1]) and two 2-itemsets (A_{3}[+1]A_{4}[+1]|ENSO[−1] and A_{3}[+1]A_{5}[+1]|ENSO[−1]).
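The two steps of this section can be sketched in Python as follows. This is an illustrative reading of the link–prune recursion, not the authors' EOMSAP code: the function names, the transaction layout, and the 10-row table are hypothetical (the table is not Table 1), and the check in `passes_eq4` assumes that Equation (4) requires the joint support to reach the threshold and the conditional support to be no less than the support.

```python
from itertools import combinations

def count(rows, itemset, enso=None):
    """Count rows matching every (parameter, level) pair, optionally with an ENSO level."""
    return sum(
        1 for levels, e in rows
        if all(levels[a] == k for a, k in itemset) and (enso is None or e == enso)
    )

def passes_eq4(rows, itemset, enso, tau):
    """Assumed form of Equation (4): joint support >= tau and CS >= S."""
    n_total = len(rows)
    n_enso = sum(1 for _, e in rows if e == enso)
    if n_enso == 0:
        return False
    n_items = count(rows, itemset)
    n_joint = count(rows, itemset, enso)
    first = n_joint / n_total >= tau
    second = n_joint * n_total >= n_items * n_enso  # CS >= S, in integers
    return first and second

def link(frequent):
    """Join m-itemsets into candidate (m + 1)-itemsets without scanning the data."""
    size = len(frequent[0]) + 1
    out = set()
    for a, b in combinations(frequent, 2):
        c = a | b
        # Keep unions of the right size that use each parameter only once.
        if len(c) == size and len({name for name, _ in c}) == size:
            out.add(c)
    return sorted(out, key=sorted)

def grow(rows, frequent, enso, tau, found):
    """Recursive link-prune step: stops when no frequent itemsets remain."""
    if not frequent:
        return found
    found.extend(frequent)
    survivors = [c for c in link(frequent) if passes_eq4(rows, c, enso, tau)]
    return grow(rows, survivors, enso, tau, found)

def mine(rows, enso, tau):
    singles = sorted({(a, k) for levels, _ in rows for a, k in levels.items()})
    first = [frozenset([s]) for s in singles
             if passes_eq4(rows, frozenset([s]), enso, tau)]
    return grow(rows, first, enso, tau, [])

# Hypothetical transaction table: columns are A1, A2, A3, ENSO
# (+1 El Nino, -1 La Nina, 0 neutral).
table = [
    (+1, +1, 0, +1), (+1, +1, -1, +1), (+1, 0, 0, +1), (0, +1, 0, 0),
    (0, +1, 0, 0), (+1, +1, 0, 0), (-1, 0, 0, -1), (-1, -1, +1, -1),
    (+1, +1, 0, 0), (0, 0, -1, 0),
]
rows = [({"A1": a1, "A2": a2, "A3": a3}, e) for a1, a2, a3, e in table]

patterns = mine(rows, enso=+1, tau=0.2)
for p in patterns:
    print(sorted(p))
```

With this table and a 20% threshold, A3[0] co-occurs with El Niño often enough to pass the first inequality but is so common overall that it fails the second, which mirrors the role of A_{2} in Example 1.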

#### 2.5. Generating Meaningful Marine Spatial Association Patterns

In this step, the key issue is to determine which frequent itemsets are meaningful according to the minimum thresholds of the evaluation indicators. Generally, the specified thresholds are defined by users according to their research domains. For each frequent itemset, its evaluation indicators (e.g., confidence and lift) are calculated by scanning the database once. If the evaluation indicators satisfy the user-specified thresholds, a frequent itemset is meaningful.

In this manuscript, we use confidence and lift as evaluation indicators for generating meaningful marine spatial association patterns. Confidence describes the occurrence probability of marine abnormal variations (${\mathrm{A}}_{1}\left[{k}_{1}\right]{\mathrm{A}}_{2}\left[{k}_{2}\right]\dots {\mathrm{A}}_{m}\left[{k}_{m}\right]$) assuming that an ENSO event occurs, which has the same formula as Equation (3).

Lift describes the impact on marine abnormal variations of the occurring ENSO event; that is, once an ENSO event has occurred, how much does the occurrence probability of marine abnormal variations change?

Lift is defined as:

where $n({\mathrm{A}}_{1}[{k}_{1}]{\mathrm{A}}_{2}[{k}_{2}]\cdots {\mathrm{A}}_{m}[{k}_{m}]\mathrm{ENSO}[l])$, $n({\mathrm{A}}_{1}[{k}_{1}]{\mathrm{A}}_{2}[{k}_{2}]\cdots {\mathrm{A}}_{m}[{k}_{m}])$, $n(\mathrm{ENSO}[l])$ and $N$ have the same meanings as in Equations (2) and (3).
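The two indicators can be computed directly from the four counts named above. The sketch below assumes the standard association-rule definition of lift, $N \cdot n(\mathrm{A}\,\mathrm{ENSO}[l]) / \left(n(\mathrm{A}) \cdot n(\mathrm{ENSO}[l])\right)$, which may differ in form from the published equation. The function names and thresholds are hypothetical, and the counts are those implied by the percentages quoted for A_{1} in Example 1 if the table has 10 rows (an assumption).

```python
def confidence_and_lift(n_joint, n_items, n_enso, n_total):
    """Confidence as in Equation (3); lift in its standard form (assumed here)."""
    confidence = n_joint / n_enso
    lift = (n_joint * n_total) / (n_items * n_enso)
    return confidence, lift

def is_meaningful(confidence, lift, min_confidence, min_lift):
    """Keep a frequent itemset only if both user-specified thresholds are met."""
    return confidence >= min_confidence and lift >= min_lift

# A1[+1] | ENSO[+1]: joint support 30%, support 80%, three El Nino rows, N = 10.
conf, lift = confidence_and_lift(n_joint=3, n_items=8, n_enso=3, n_total=10)
print(conf, lift)  # 1.0 1.25

# Hypothetical thresholds chosen only for this example.
print(is_meaningful(conf, lift, min_confidence=0.6, min_lift=1.0))  # True
```

A lift above 1 indicates that the abnormal variation is more likely during the ENSO event than over the whole record; a lift of exactly 1 indicates no dependence.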