Article

A Gaussian Process Method with Uncertainty Quantification for Air Quality Monitoring

1 Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M15 6BH, UK
2 Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield S10 2TN, UK
3 Department of Civil and Structural Engineering, The University of Sheffield, Sheffield S10 2TN, UK
4 Department of Physics, University of Peshawar, KPK, Peshawar 25120, Pakistan
5 Institute of Environmental Sciences and Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan
6 Yueqing Xinshou Agricultural Development Co., Ltd., Yueqing 325604, China
7 College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325035, China
8 College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
* Author to whom correspondence should be addressed.
Submission received: 13 September 2021 / Revised: 30 September 2021 / Accepted: 2 October 2021 / Published: 14 October 2021
(This article belongs to the Special Issue Air Quality in the UK)

Abstract:
The monitoring and forecasting of particulate matter (e.g., PM2.5) and gaseous pollutants (e.g., NO, NO2, and SO2) are of significant importance, as these pollutants have adverse impacts on human health. However, model performance can easily degrade due to measurement noise, environmental variations, and other factors. This paper proposes a general solution for analysing how the measurement noise level and the hyperparameters of a Gaussian process model affect the prediction accuracy and uncertainty, with a comparative case study of atmospheric pollutant concentration prediction in Sheffield, UK, and Peshawar, Pakistan. The Neumann series is exploited to approximate the matrix inverse involved in the Gaussian process approach. This enables us to derive a theoretical relationship between any independent variable (e.g., the measurement noise level or the hyperparameters of a Gaussian process method) and the prediction uncertainty and accuracy. It also helps us to gain insight into how these independent variables affect the evidence lower bound of the algorithm. The theoretical results are verified by applying a Gaussian process approach and its sparse variants to air quality data forecasting.

1. Introduction

It is generally believed that urban areas provide better economic, political, and social opportunities than rural areas. As a result, more and more people are migrating to urban areas. At present, more than fifty percent of the world's population lives in urban areas, and this percentage is increasing with time. This has led to several environmental issues in large cities, such as air pollution [1].
Landrigan reported that air pollution caused 6.4 million deaths worldwide in 2015 [2]. According to World Health Organization (WHO) statistics, three million premature deaths were caused by air pollution worldwide in 2012 [3]. Air pollution has also been strongly linked with dementia, a condition affecting 850,000 people in the UK [4]. Children growing up in residential houses near busy roads and junctions have a much higher risk of developing various respiratory diseases, including asthma, due to high levels of air pollution [5]. Polluted air, especially air with high levels of NO, NO2, SO2, and particulate matter (PM2.5), is considered the most serious environmental risk to public health in urban areas [6]. Therefore, many national and international organisations are actively working on understanding the behaviour of various air pollutants [7]. This eventually leads to the development of air quality forecasting models so that people can be alerted in time [8].
Being essentially time series, air quality data can be readily processed by models capable of handling time series data. For instance, Shen applies an autoregressive moving average (ARMA) model to PM2.5 concentration prediction in a few Chinese cities [9]. Filtering techniques such as the Kalman filter have also been applied to adjust data biases and improve air quality prediction accuracy [10]. These methods, though good results have been reported, are limited by the requirement of specifying a model before data processing. Machine learning methods, on the other hand, can learn a model from the data directly. This has enabled them to attract wide attention in recent decades in the field of air quality forecasting. For instance, Lin et al. propose the support vector regression with logarithm preprocessing procedure and immune algorithms (SVRLIA) method, which outperforms general regression neural networks (GRNN) [11] and backpropagation neural networks (BPNN) [12] in Taiwan air quality forecasting [13].
Recently, inspired by the accumulation of large-scale data, deep learning models have been applied to air quality prediction [14]. Some works have equipped these deep learning models with the ability to quantify uncertainties introduced by the inputs. For instance, Garriga-Alonso et al. endow a deep convolutional network with uncertainty quantification by treating it as equivalent to a Gaussian process (GP) model [15]. This is because GP predictions are accompanied by confidence intervals, which are usually taken as a metric of prediction uncertainty. Applications of GPs in air quality forecasting can be found in [16,17]. However, the matrix inversion involved in GPs limits their application to large-scale datasets [18]. This has inspired research on improving the efficiency of GP models, and a series of efficient GP models have been published [19]. We also proposed an efficient GP model with application to air quality forecasting [17]. Despite the rich number of GP models published, little work has investigated how the noise level, hyperparameters, etc., affect the performance of GP models. Such an investigation is necessary because air quality data vary due to seasonal variations and sensor degradation. A well-trained GP model may fail when fed with new data, simply due to a change in the measurement noise level. By knowing how the variation of GP performance can be attributed to the noise level, hyperparameters, etc., we will still be able to perform analysis when the noise level or hyperparameters vary.
To this end, a general solution is proposed in this paper. It provides insights into how a GP model's performance is related to the measurement noise level, hyperparameters, etc. The main contributions of this work include: (1) a general method for analysing how the noise level and hyperparameters of a GP model affect the prediction performance, together with the variation of the evidence lower bound (ELBO) and the upper bound of the marginal likelihood (UBML) with respect to the noise level and hyperparameters; (2) the exploitation of the Neumann series to approximate the matrix inversion involved in GPs, which helps construct an analytical relation between the noise level, hyperparameters, etc., and the model performance; (3) a comparative air quality forecasting study between Sheffield, UK, and Peshawar, Pakistan, demonstrating that the proposed solution is able to capture how the noise level and hyperparameters affect GP performance.
The remainder of this paper is organised as follows. Section 2 provides the theoretical fundamentals involved in this paper; Section 3 elaborates the proposed uncertainty quantification solution. In Section 4, we provide a comparative study of air quality prediction over the same period between Sheffield, UK, and Peshawar, Pakistan, and the paper is concluded in Section 5. Appendix A describes the data collection process in Peshawar, Pakistan, and in Sheffield, United Kingdom, and presents maps of the considered areas of these cities. Appendix B gives the World Health Organisation (WHO) criteria for air pollutants. Appendix C gives the approximate derivatives of the GP kernel.

2. Background Knowledge

2.1. Gaussian Processes

Given a set of training data $\mathcal{D} = \{(\mathbf{x}_i, y_i), i = 1, \ldots, n\}$, where $\mathbf{x}_i \in \mathcal{X}$ is the input and $y_i \in \mathbb{R}$ is the observation, we can determine a GP model $f(\cdot)$ to predict $y_*$ for a new input $\mathbf{x}_*$. For instance, when the output is one-dimensional, the GP model is formulated as
$$f \sim \mathcal{GP}\big(\bar{f}(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big), \qquad y = f(\mathbf{x}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2), \tag{1}$$
where $\bar{f} : \mathcal{X} \to \mathbb{R}$ is the mean function defined as
$$\bar{f}(\mathbf{x}) = \mathbb{E}\big[f(\mathbf{x})\big], \tag{2}$$
and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is the kernel function [18] defined as
$$k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\big[(f(\mathbf{x}) - \bar{f}(\mathbf{x}))(f(\mathbf{x}') - \bar{f}(\mathbf{x}'))\big], \tag{3}$$
where $\varepsilon$ is the additive, independent, identically distributed Gaussian measurement noise with variance $\sigma^2 \geq 0$, and $\mathbb{E}$ denotes the mathematical expectation operation.
Given that each $\mathbf{x}_i$ is a $D \times 1$ vector, the $n$ inputs can be aggregated into a matrix $\mathbf{X} \in \mathbb{R}^{D \times n}$, or briefly $\mathbf{X}$, with the corresponding output vector $\mathbf{y} \in \mathbb{R}^{n \times 1}$, or $\mathbf{y}$. Similarly, the function values at the test inputs $\mathbf{X}_*$, with dimensions $D \times N$, can be denoted as $\mathbf{f}_*$, and we next write the joint distribution of $\mathbf{y}$ and $\mathbf{f}_*$ as
$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K_{nn} + \sigma^2 I & K_{nN} \\ K_{Nn} & K_{NN} \end{bmatrix}\right), \tag{4}$$
where $I$ represents the identity matrix. $K_{nn} + \sigma^2 I$ is the $n \times n$ prior covariance matrix of $\mathbf{y}$ with entries $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) + \sigma^2 \delta_{ij}$, where $\delta_{ij}$ is one iff $i = j$ and zero otherwise, and $\mathbf{x}_i$ and $\mathbf{x}_j$ are column vectors from $\mathbf{X}$. The matrix $K_{NN}$ denotes the $N \times N$ prior covariance matrix of $\mathbf{f}_*$ with entries $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathbf{x}_i$ and $\mathbf{x}_j$ are column vectors from $\mathbf{X}_*$. The matrices $K_{Nn}$ and $K_{nN}$ satisfy $K_{Nn} = K_{nN}^T$, and the entries of the $N \times n$ prior covariance matrix of $\mathbf{f}_*$ and $\mathbf{y}$ are $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathbf{x}_i$ is a column vector from $\mathbf{X}_*$ and $\mathbf{x}_j$ is a column vector from $\mathbf{X}$.
By deriving the conditional distribution of $\mathbf{f}_*$ from (4), where the prior mean is set to zero for simplicity [20], we have the predictive posterior at the new inputs $\mathbf{X}_*$ as
$$\mathbf{f}_* \mid \mathbf{X}, \mathbf{y}, \mathbf{X}_* \sim \mathcal{N}\big(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*)\big), \tag{5}$$
where
$$\bar{\mathbf{f}}_* \triangleq \mathbb{E}\big[\mathbf{f}_* \mid \mathbf{X}, \mathbf{y}, \mathbf{X}_*\big] = K_{Nn}\big(K_{nn} + \sigma^2 I\big)^{-1}\mathbf{y} \tag{6}$$
is the prediction at $\mathbf{X}_*$, and
$$\operatorname{cov}(\mathbf{f}_*) = K_{NN} - K_{Nn}\big(K_{nn} + \sigma^2 I\big)^{-1} K_{Nn}^T \tag{7}$$
denotes the covariance of $\mathbf{f}_*$.
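As an illustration (not the authors' implementation), Equations (6) and (7) can be sketched directly in NumPy. The squared-exponential kernel, the toy sine data, and the noise level below are illustrative assumptions:

```python
import numpy as np

def se_kernel(A, B, sf=1.0, ell=1.0):
    """Squared-exponential kernel k(x, x') = sf^2 exp(-(x - x')^2 / (2 ell^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

def gp_predict(X, y, Xs, sigma2=0.1):
    """Zero-mean GP posterior mean and covariance, Equations (6) and (7)."""
    Ainv = np.linalg.inv(se_kernel(X, X) + sigma2 * np.eye(len(X)))
    KNn = se_kernel(Xs, X)
    mean = KNn @ Ainv @ y                              # Equation (6)
    cov = se_kernel(Xs, Xs) - KNn @ Ainv @ KNn.T       # Equation (7)
    return mean, cov

# Toy 1-D example: noiseless sine observations, one test input.
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X)
mean, cov = gp_predict(X, y, np.array([2.5]))
```

In practice a Cholesky factorisation would replace the explicit inverse; the direct form is kept here to mirror Equations (6) and (7).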
The hyperparameters $\boldsymbol{\theta}$ incorporated in the mean and covariance functions underpin the predictive performance of GP models, and they are usually estimated by maximising the logarithm of the marginal likelihood
$$\log p(\mathbf{y} \mid \mathbf{X}) = -\frac{1}{2}\mathbf{y}^T\big(K_{nn} + \sigma^2 I\big)^{-1}\mathbf{y} - \frac{1}{2}\log\big|K_{nn} + \sigma^2 I\big| - \frac{n}{2}\log 2\pi. \tag{8}$$

2.2. Neumann Series Approximation

Given a matrix inverse $A^{-1}$, it can be expanded as the following Neumann series [21]:
$$A^{-1} = \sum_{n=0}^{\infty}\big(X^{-1}(X - A)\big)^{n} X^{-1}, \tag{9}$$
which holds if $\lim_{n \to \infty}\big(I - X^{-1}A\big)^{n} = 0$ is satisfied. In our case, suppose
$$A = K + \sigma_n^2 I \triangleq D_A + E_A, \tag{10}$$
where $D_A$ is the main diagonal of $A$ and $E_A$ is the hollow (off-diagonal) part. If we substitute $X$ in Equation (9) by $D_A$, we get
$$A^{-1} = \sum_{n=0}^{\infty}\big(-D_A^{-1} E_A\big)^{n} D_A^{-1}, \tag{11}$$
which is guaranteed to converge when $\lim_{n \to \infty}\big(D_A^{-1} E_A\big)^{n} = 0$. We investigated the convergence condition in [17], where we proved that if $A$ is diagonally dominant, then the Neumann series can approximate $A^{-1}$ both quickly and accurately. In case $A$ is not diagonally dominant, we also provided a way in [17] to convert it into a diagonally dominant matrix, such that $A^{-1}$ can still be approximated by the Neumann series. When the Neumann series given in (11) converges, we can approximate $A^{-1}$ with only the first $L$ terms. The $L$-term approximation is computed as follows:
$$\tilde{A}_L^{-1} = \sum_{n=0}^{L-1}\big(-D_A^{-1} E_A\big)^{n} D_A^{-1}. \tag{12}$$
For instance, when $L = 1, 2, 3$, we have the approximations
$$\tilde{A}_L^{-1} = \begin{cases} D_A^{-1}, & L = 1, \\ D_A^{-1} - D_A^{-1} E_A D_A^{-1}, & L = 2, \\ D_A^{-1} - D_A^{-1} E_A D_A^{-1} + D_A^{-1} E_A D_A^{-1} E_A D_A^{-1}, & L = 3. \end{cases} \tag{13}$$
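A minimal sketch of the $L$-term approximation in Equation (12), with an illustrative diagonally dominant matrix (the function name and test matrix are assumptions for demonstration only):

```python
import numpy as np

def neumann_inverse(A, L_terms):
    """L-term Neumann approximation of A^{-1}, Equation (12):
    sum_{n=0}^{L-1} (-D_A^{-1} E_A)^n D_A^{-1}, where A = D_A + E_A."""
    D_inv = np.diag(1.0 / np.diag(A))     # inverse of the main diagonal D_A
    E = A - np.diag(np.diag(A))           # the hollow (off-diagonal) part E_A
    M = -D_inv @ E
    term = D_inv                          # n = 0 term
    approx = D_inv.copy()
    for _ in range(1, L_terms):           # accumulate terms n = 1, ..., L-1
        term = M @ term
        approx = approx + term
    return approx

# Diagonally dominant example, so the series converges.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
approx = neumann_inverse(A, 3)
```

For this matrix the spectral radius of $D_A^{-1} E_A$ is about 0.29, so the error shrinks roughly by that factor per added term.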

3. Uncertainty Quantification in Gaussian Processes

3.1. Uncertainty in Measurements

It is intuitive that noisy measurements result in less accurate predictions, just as a poor model does. However, this is not directly evident from Equations (6) and (7). We will show in detail how the measurement noise affects the prediction accuracy.
From Equations (6) and (7), we can see that, in comparison to the noise-free scenario [20], the measurement noise $\varepsilon$ affects the prediction and the covariance by adding a term $\sigma_n^2 I$ to the prior covariance $K$. From the way they originate, we know that both $K$ and $\sigma_n^2 I$ are symmetric. Then, a matrix $P$ exists such that
$$K = P^{-1} D_K P, \tag{14}$$
where $D_K$ is a diagonal matrix with the eigenvalues of $K$ along the diagonal. As $\sigma_n^2 I$ is a diagonal matrix itself, we have
$$\sigma_n^2 I = P^{-1}\big(\sigma_n^2 I\big) P. \tag{15}$$
Therefore, we have the partial derivative of Equation (6) with respect to $\sigma_n^2$ as
$$\frac{\partial \bar{\mathbf{f}}_*}{\partial \sigma_n^2} = -K_* P\big(D_K + \sigma_n^2 I\big)^{-2} P^{-1}\mathbf{y}. \tag{16}$$
The element-wise form of Equation (16) can therefore be obtained as
$$\left[\frac{\partial \bar{\mathbf{f}}_*}{\partial \sigma_n^2}\right]_o = -\sum_{h=1}^{n}\sum_{i=1}^{n}\sum_{j=1}^{n} p_{hj}\, p_{ij}\, k_{oh}\, \Lambda_j^{-1}\, y_i, \tag{17}$$
where $\Lambda_j = (\lambda_j + \sigma_n^2)^2$; $p_{hj}$ and $p_{ij}$ are the entries of $P$ in the $j$-th column and the $h$-th and $i$-th rows, respectively; $k_{oh}$ is the entry in the $o$-th row and $h$-th column of $K_*$; $y_i$ is the $i$-th element of $\mathbf{y}$; and $o = 1, \ldots, s$ denotes the $o$-th element of the partial derivative.
We can see that the sign of Equation (17) is determined by $p_{hj}$ and $p_{ij}$. This is because we can transform $\mathbf{y}$ to be either positive or negative with a linear transformation, which is not an issue for the GP model. When we impose no constraints on $p_{hj}$ and $p_{ij}$, Equation (17) could be any real number, indicating that $\bar{\mathbf{f}}_*$ is multimodal with respect to $\sigma_n^2$; that is, one $\sigma_n^2$ can lead to different $\bar{\mathbf{f}}_*$, or, equivalently, different $\sigma_n^2$ can lead to the same $\bar{\mathbf{f}}_*$. In such cases, it is difficult to investigate how $\sigma_n^2$ affects the prediction accuracy. In this paper, to facilitate the study of the monotonicity of $\bar{\mathbf{f}}_*$, we constrain $p_{hj}$ and $p_{ij}$ to satisfy
$$\left[\frac{\partial \bar{\mathbf{f}}_*}{\partial \sigma_n^2}\right]_o \begin{cases} > 0, & p_{hj}\, p_{ij} < 0, \\ < 0, & p_{hj}\, p_{ij} > 0, \\ = 0, & p_{hj}\, p_{ij} = 0. \end{cases} \tag{18}$$
Then, $\bar{\mathbf{f}}_*$ is monotonic. This means that changes in $\sigma_n^2$ can cause arbitrarily large or small predictions, whereas a robust method should bound the prediction errors regardless of how $\sigma_n^2$ varies.
Similarly, the partial derivative of Equation (7) with respect to $\sigma_n^2$ is
$$\frac{\partial \operatorname{cov}(\mathbf{f}_*)}{\partial \sigma_n^2} = (K_* P)\big(D_K + \sigma_n^2 I\big)^{-2}(K_* P)^T = \sum_{i=1}^{n} \Lambda_i^{-1}\, \mathbf{p}_i \mathbf{p}_i^T, \tag{19}$$
where we denote the $m \times n$ matrix $K_* P$ as
$$K_* P = [\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n], \tag{20}$$
with $\mathbf{p}_i$ an $m \times 1$ vector, and $i = 1, \ldots, n$.
As the uncertainty is indicated by the diagonal elements, we only show how these elements change with respect to $\sigma_n^2$. The diagonal elements are given as
$$\operatorname{diag}\left(\sum_{i=1}^{n} \Lambda_i^{-1}\, \mathbf{p}_i \mathbf{p}_i^T\right) = \left[\sum_{i=1}^{n} \Lambda_i^{-1} p_{1i}^2,\ \sum_{i=1}^{n} \Lambda_i^{-1} p_{2i}^2,\ \ldots,\ \sum_{i=1}^{n} \Lambda_i^{-1} p_{mi}^2\right] \triangleq \big[\Sigma_{11}, \Sigma_{22}, \ldots, \Sigma_{mm}\big], \tag{21}$$
with $\operatorname{diag}(\cdot)$ denoting the diagonal elements of a matrix. We see that $\Sigma_{jj} \geq 0$ holds for $j = 1, \ldots, m$, which implies that each diagonal element of $\operatorname{cov}(\mathbf{f}_*)$ is non-decreasing as $\sigma_n^2$ increases. This means that an increase in the measurement noise level causes a non-decreasing prediction uncertainty.
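This monotonicity can be checked numerically by evaluating the diagonal of Equation (7) for increasing noise levels; the SE kernel and the grids below are illustrative assumptions:

```python
import numpy as np

def se_kernel(A, B, sf=1.0, ell=1.0):
    """Squared-exponential kernel."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

def posterior_var(X, Xs, sigma2):
    """Diagonal of cov(f_*) in Equation (7) for noise level sigma2."""
    KNn = se_kernel(Xs, X)
    A = se_kernel(X, X) + sigma2 * np.eye(len(X))
    cov = se_kernel(Xs, Xs) - KNn @ np.linalg.solve(A, KNn.T)
    return np.diag(cov)

X = np.linspace(0.0, 5.0, 15)
Xs = np.linspace(0.0, 5.0, 7)
# Predictive variances for a sequence of increasing noise levels.
variances = [posterior_var(X, Xs, s2) for s2 in (0.01, 0.1, 1.0, 10.0)]
```

Each array in `variances` dominates the previous one element-wise, matching the non-decreasing behaviour implied by Equation (21).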

3.2. Uncertainty in Hyperparameters

Another factor that affects the prediction of a GP model is the hyperparameters. In Gaussian processes, the posterior, shown in Equation (5), is used for prediction, while the marginal likelihood is used for hyperparameter selection [18]. The log marginal likelihood, shown in Equation (22), is usually optimised to determine the hyperparameters for a specified kernel function:
$$\log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = -\frac{1}{2}\mathbf{y}^T\big(K + \sigma_n^2 I\big)^{-1}\mathbf{y} - \frac{1}{2}\log\big|K + \sigma_n^2 I\big| - \frac{N}{2}\log 2\pi. \tag{22}$$
However, the log marginal likelihood can be non-convex with respect to the hyperparameters, which implies that the optimisation may not converge to the global maximum [22]. A common solution is to sample multiple starting points from a prior distribution and then choose the best set of hyperparameters according to the optima of the log marginal likelihood. Let us assume that $\boldsymbol{\theta} = \{\theta_1, \theta_2, \ldots, \theta_s\}$ is the hyperparameter set, with $\theta_s$ denoting the $s$-th element; then the derivative of $\log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$ with respect to $\theta_s$ is
$$\frac{\partial}{\partial \theta_s}\log p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = \frac{1}{2}\operatorname{tr}\left(\big(\boldsymbol{\alpha}\boldsymbol{\alpha}^T - (K + \sigma_n^2 I)^{-1}\big)\frac{\partial (K + \sigma_n^2 I)}{\partial \theta_s}\right), \tag{23}$$
where $\boldsymbol{\alpha} = (K + \sigma_n^2 I)^{-1}\mathbf{y}$ and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The derivative in Equation (23) is often multimodal, which is why a fair number of initialisations are used when conducting the optimisation. Chen et al. show that the optimisation process with various initialisations can result in different hyperparameters [22]. Nevertheless, the performance (prediction accuracy) with regard to the standardised root mean square error does not change much. However, the authors do not show how the variation of hyperparameters affects the prediction uncertainty [22].
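The multi-start idea can be sketched as follows. This is a simplified stand-in: each sampled start is only evaluated under Equation (22) rather than locally optimised with gradients, and the synthetic data, log-normal prior, and SE kernel are illustrative assumptions:

```python
import numpy as np

def se_kernel(X, sf, ell):
    d2 = (X[:, None] - X[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

def log_marginal_likelihood(X, y, sf, ell, sigma2):
    """Equation (22), computed stably via a Cholesky factorisation."""
    n = len(y)
    L = np.linalg.cholesky(se_kernel(X, sf, ell) + sigma2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 25)
y = np.sin(X) + 0.1 * rng.standard_normal(25)

# Draw (s_f, l, sigma_n^2) starting points from a log-normal prior and keep
# the set with the highest log marginal likelihood.
starts = np.exp(rng.normal(0.0, 1.0, size=(200, 3)))
scores = [log_marginal_likelihood(X, y, sf, ell, s2) for sf, ell, s2 in starts]
best = starts[int(np.argmax(scores))]
```

In a full implementation, each start would seed a gradient-based optimiser driven by Equation (23) before comparing optima.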
An intuitive explanation for different hyperparameters resulting in similar predictions is that the prediction in Equation (6) is itself non-monotonic with respect to the hyperparameters. To demonstrate this, a direct way is to examine how the derivative of Equation (6) with respect to any hyperparameter $\theta_s \in \boldsymbol{\theta}$ changes, and ultimately how it affects the prediction accuracy and uncertainty. The derivatives of $\bar{\mathbf{f}}_*$ and $\operatorname{cov}(\mathbf{f}_*)$ with respect to $\theta_s$ are as follows:
$$\frac{\partial \bar{\mathbf{f}}_*}{\partial \theta_s} = \left(K_* \frac{\partial (K + \sigma_n^2 I)^{-1}}{\partial \theta_s} + \frac{\partial K_*}{\partial \theta_s}(K + \sigma_n^2 I)^{-1}\right)\mathbf{y}, \tag{24}$$
$$\frac{\partial \operatorname{cov}(\mathbf{f}_*)}{\partial \theta_s} = \frac{\partial K(\mathbf{X}_*, \mathbf{X}_*)}{\partial \theta_s} - \frac{\partial K_*}{\partial \theta_s}(K + \sigma_n^2 I)^{-1} K_*^T - K_* \frac{\partial (K + \sigma_n^2 I)^{-1}}{\partial \theta_s} K_*^T - K_*(K + \sigma_n^2 I)^{-1}\frac{\partial K_*^T}{\partial \theta_s}. \tag{25}$$
We can see that Equations (24) and (25) both involve calculating $(K + \sigma_n^2 I)^{-1}$, which becomes enormously complex as the dimension increases. In this paper, we focus on investigating how the hyperparameters affect the predictive accuracy and uncertainty in general. Therefore, we use the Neumann series to approximate the inverse [21].

3.3. Derivatives Approximation with Neumann Series

The approximation accuracy and computational complexity of the Neumann series vary with $L$. This has been studied in [21,23], as well as in our previous work [17]. This paper aims at providing a way to quantify the uncertainties involved in GPs; we therefore choose the 2-term approximation as an example to carry out the derivations. By substituting the 2-term approximation into Equations (24) and (25), we have
$$\frac{\partial \bar{\mathbf{f}}_*}{\partial \theta_s} \approx \left(K_* \frac{\partial\big(D_A^{-1} - D_A^{-1} E_A D_A^{-1}\big)}{\partial \theta_s} + \frac{\partial K_*}{\partial \theta_s}\big(D_A^{-1} - D_A^{-1} E_A D_A^{-1}\big)\right)\mathbf{y}, \tag{26}$$
$$\frac{\partial \operatorname{cov}(\mathbf{f}_*)}{\partial \theta_s} \approx \frac{\partial K(\mathbf{X}_*, \mathbf{X}_*)}{\partial \theta_s} - \frac{\partial K_*}{\partial \theta_s}\big(D_A^{-1} - D_A^{-1} E_A D_A^{-1}\big) K_*^T - K_* \frac{\partial\big(D_A^{-1} - D_A^{-1} E_A D_A^{-1}\big)}{\partial \theta_s} K_*^T - K_*\big(D_A^{-1} - D_A^{-1} E_A D_A^{-1}\big)\frac{\partial K_*^T}{\partial \theta_s}. \tag{27}$$
Due to the simple structure of the matrices $D_A$ and $E_A$, we can obtain the element-wise form of Equation (26) as
$$\left[\frac{\partial \bar{\mathbf{f}}_*}{\partial \theta_s}\right]_o = \sum_{i=1}^{n}\sum_{j=1}^{n}\left(k_{oj}\frac{\partial d_{ji}}{\partial \theta_s} + \frac{\partial k_{oj}}{\partial \theta_s}\, d_{ji}\right) y_i. \tag{28}$$
Similarly, the element-wise form of Equation (27) is
$$\left[\frac{\partial \operatorname{cov}(\mathbf{f}_*)}{\partial \theta_s}\right]_{oo} = \frac{\partial\big[K(\mathbf{X}_*, \mathbf{X}_*)\big]_{oo}}{\partial \theta_s} - \sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{\partial k_{oj}}{\partial \theta_s}\, d_{ji}\, k_{oi} + k_{oj}\frac{\partial d_{ji}}{\partial \theta_s}\, k_{oi} + k_{oj}\, d_{ji}\frac{\partial k_{oi}}{\partial \theta_s}\right), \tag{29}$$
where $o = 1, \ldots, m$ denotes the $o$-th output; $d_{ji}$ is the entry in the $j$-th row and $i$-th column of $D_A^{-1} - D_A^{-1} E_A D_A^{-1}$; and $k_{oj}$ and $k_{oi}$ are the entries in the $o$-th row and the $j$-th and $i$-th columns of $K_*$, respectively. When the kernel function is determined, Equations (26)–(29) can be used for GP uncertainty quantification.

3.4. Impacts of Noise Level and Hyperparameters on ELBO and UBML

The minimisation of $\operatorname{KL}\big(q(\mathbf{f}, \mathbf{u})\,\|\,p(\mathbf{f}, \mathbf{u} \mid \mathbf{y})\big)$ is equivalent to maximising the ELBO [18,24], as shown in
$$\mathcal{L}_{\mathrm{lower}} = -\frac{1}{2}\mathbf{y}^T G_n^{-1}\mathbf{y} - \frac{1}{2}\log|G_n| - \frac{N}{2}\log(2\pi) - \frac{t}{2\sigma_n^2}, \tag{30}$$
where $G_n = G_{\mathbf{xx}} + \sigma_n^2 I$ and $t = \operatorname{Tr}(K_{\mathbf{xx}} - G_{\mathbf{xx}})$. Combining it with the UBML, shown in Equation (31), an interval can be given to quantify the uncertainty in the marginal likelihood:
$$\mathcal{L}_{\mathrm{upper}} = -\frac{1}{2}\mathbf{y}^T\big(G_n + tI\big)^{-1}\mathbf{y} - \frac{1}{2}\log|G_n| - \frac{N}{2}\log(2\pi). \tag{31}$$
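The two bounds can be sketched numerically. Here $G_{\mathbf{xx}}$ is taken as the Nystrom approximation $K_{\mathbf{xz}} K_{\mathbf{zz}}^{-1} K_{\mathbf{zx}}$, the standard choice in VFE-style sparse GPs; the unit-variance SE kernel, inducing-point layout, and data are illustrative assumptions:

```python
import numpy as np

def se_kernel(A, B, sf=1.0, ell=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf**2 * np.exp(-d2 / (2 * ell**2))

def elbo_and_ubml(X, y, Z, sigma2):
    """Equations (30) and (31) with G_xx = K_xz K_zz^{-1} K_zx."""
    n = len(y)
    Kxz = se_kernel(X, Z)
    Kzz = se_kernel(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for stability
    Gxx = Kxz @ np.linalg.solve(Kzz, Kxz.T)
    Gn = Gxx + sigma2 * np.eye(n)
    t = n * 1.0 - np.trace(Gxx)        # Tr(K_xx - G_xx); unit-diagonal SE kernel
    _, logdet = np.linalg.slogdet(Gn)
    const = 0.5 * n * np.log(2 * np.pi)
    lower = -0.5 * y @ np.linalg.solve(Gn, y) - 0.5 * logdet - const - t / (2 * sigma2)
    upper = -0.5 * y @ np.linalg.solve(Gn + t * np.eye(n), y) - 0.5 * logdet - const
    return lower, upper

X = np.linspace(0.0, 5.0, 25)
y = np.sin(X)
lower, upper = elbo_and_ubml(X, y, Z=X[::5], sigma2=0.1)
```

Since $K_{\mathbf{xx}} - G_{\mathbf{xx}}$ is positive semi-definite, $t \geq 0$ and the lower bound never exceeds the upper bound.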
This paper, however, focuses on investigating how the ELBO and UBML change with respect to $\sigma_n^2$ only, because investigating how they change with respect to the kernel hyperparameters involves multiple Neumann series approximations, which would make the analysis less convincing. We shall leave it as an open problem for future study. The derivatives of Equations (30) and (31) with respect to $\sigma_n^2$ are as follows:
$$\frac{\partial \mathcal{L}_{\mathrm{lower}}}{\partial \sigma_n^2} = \frac{1}{2}\left(\sum_{i=1}^{n}\big(\lambda_i + \sigma_n^2\big)^{-2}\Big(\sum_{j=1}^{n} y_j v_{ji}\Big)^{2} - \sum_{i=1}^{n}\frac{1}{\lambda_i + \sigma_n^2} + \frac{t}{\sigma_n^4}\right), \tag{32}$$
$$\frac{\partial \mathcal{L}_{\mathrm{upper}}}{\partial \sigma_n^2} = \frac{1}{2}\left(\sum_{i=1}^{n}\big(\lambda_i + \sigma_n^2 + t\big)^{-2}\Big(\sum_{j=1}^{n} y_j v_{ji}\Big)^{2} - \sum_{i=1}^{n}\frac{1}{\lambda_i + \sigma_n^2}\right), \tag{33}$$
where $\lambda_i$ and $v_{ji}$ denote the $i$-th eigenvalue of $G_{\mathbf{xx}}$ and the $(j, i)$-th entry of its eigenvector matrix, respectively.
Figure 1 shows how $\sigma_n^2$ affects the ELBO and UBML. We set $\sigma_n^2$ to increase from 0.1 to 200.0 with a step of 0.01, and both the ELBO and UBML are recorded at each step. From the figure, we can see that when $\sigma_n^2$ is small ($\sigma_n^2 \in [0.1, 1.5]$), the ELBO increases at varying speeds, while the UBML fluctuates as its derivative jumps between positive and negative. When $\sigma_n^2$ is in $[1.5, 3.0]$, the ELBO still increases, but the speed slows down significantly. In comparison, the UBML keeps decreasing at a reducing speed. The decrease of the UBML means that when $\sigma_n^2$ increases, although the ELBO may still increase, its maximum (which is the UBML) can decrease. When $\sigma_n^2 \in [3.0, 20.0]$, the ELBO starts to decrease at around $\sigma_n^2 \approx 3.2$, while the UBML keeps decreasing. This means that as $\sigma_n^2$ increases, both the ELBO and UBML decrease, indicating that the model becomes less and less able to explain the data. As $\sigma_n^2$ keeps increasing ($\sigma_n^2 \in [20.0, 200.0]$), the decreasing speeds of the ELBO and UBML become similar and approach zero. This means that the UBML and ELBO both converge and together define an interval for the marginal likelihood, which, however, can result in non-optimal hyperparameters. Our conclusion is that when $\sigma_n^2$ increases, the UBML tends to decrease, which lowers the maximum that the ELBO can reach. The ELBO, on the other hand, is robust to changes in $\sigma_n^2$, as it keeps increasing while $\sigma_n^2$ is below $\sim$3.2. However, when $\sigma_n^2$ exceeds this threshold, the ELBO turns to decrease, indicating that the GP model becomes less and less reliable. Both the ELBO and UBML converge even when $\sigma_n^2$ becomes very large, though we can no longer trust the model.

4. Experiments and Analysis

To verify that the proposed solution can help to identify the impacts of $\sigma_n^2$ and $\boldsymbol{\theta}$ on the prediction accuracy and uncertainty of the GP model and its sparse variants, such as the fully independent training conditional (FITC) [25] and variational free energy (VFE) [24] models, we conduct various experiments on air quality data collected from Sheffield, UK, and Peshawar, Pakistan (see Appendix A), over the three-week period of 24 June 2019–14 July 2019; the weeks are denoted as W1, W2, and W3 hereafter. The data were collected with digital sensors called AQMesh pods at a 15 min sampling interval. Though the sensors are able to measure the concentrations of quite a few atmospheric pollutants, here we only analyse the concentrations of NO, NO2, SO2, and PM2.5. Figure 2 shows the raw data. We can see directly that the air quality of Sheffield is, on average, much better than that of Peshawar. Especially during the daytime, concentrations of NO2 and PM2.5 in Peshawar exceed the WHO criteria (see Appendix B), while those in Sheffield are much lower than the criteria. Being a postindustrial city itself, Sheffield has improved its air quality significantly, and this experience could be shared to help cities like Peshawar improve their air quality.

4.1. Air Quality Prediction

Figure 3 and Figure 4 show the Sheffield and Peshawar forecasting results of GPs, FITC, and VFE, with $3\sigma$ confidence intervals (denoted as Conf in the figures) indicated by the shaded areas. We can see that the full GP model reports the best results in general, in terms of the absolute error between predictions and measurements (denoted as Meas in the figures). However, the performance of all the models varies across pollutant types and cities. This is actually one of the reasons why the investigation of how the measurement noise level and hyperparameters affect prediction accuracy and uncertainty is necessary. To make the results more convincing, we normalise the data from both cities for the uncertainty quantification studies.

4.2. Impacts of Measurement Noise Level and Hyperparameters

To demonstrate how the noise level $\sigma_n^2$ and the hyperparameters affect prediction accuracy and uncertainty, three sets of experiments are conducted. This paper adopts the squared exponential (SE) kernel, with hyperparameters $s_f$ and $l$; the analytical derivations can be found in Appendix C. The prediction accuracy is measured by the root mean square error (RMSE), as shown in Equation (34), while the uncertainty is measured by the $2\sigma$ confidence bound. The configurations of the experiments are as follows.
Experiment 1: Impacts of σ n 2 on prediction accuracy and uncertainty. Both s f and l are fixed to be the optimised values. σ n 2 varies from 0.1 through to 20.0. NO, NO 2 , SO 2 , and PM 2.5 data from both cities are processed. Six inducing points are applied to both FITC and VFE.
Experiment 2: Impacts of s f on prediction accuracy and uncertainty. l is set to the optimised value. s f varies from 0.1 through to 30.0. σ n 2 is set to 0.5 and 1.5, respectively. NO data from both cities are processed. Six inducing points are applied to both FITC and VFE.
Experiment 3: Impacts of l on prediction accuracy and uncertainty. s f is set to the optimised value. l varies from 0.1 through to 30.0. σ n 2 is set to 0.5 and 1.5, respectively. NO data from both cities are processed. Six inducing points are applied to both FITC and VFE.
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{Num}\big(y_i - \hat{y}_i\big)^2}{Num}}, \tag{34}$$
where $y_i$ is the ground truth value, $\hat{y}_i$ represents the predicted mean, and $Num$ is the number of samples in the testing set.
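Equation (34) amounts to a one-line computation; the function name below is an illustrative choice:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Equation (34): root mean square error over the testing set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```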
Figure 5 and Figure 6 show the results from Experiment 1. To make the results more distinguishable, the horizontal axes of the figures are set to $\log(\sigma_n^2)$. We can see from Figure 5 that when $\sigma_n^2$ is small, GPs perform the best in general, while the performance of FITC and VFE varies. We can also observe that as $\sigma_n^2$ keeps increasing, the RMSE becomes very significant for all methods and pollutants. Similar results can be observed from Figure 6 as well. Both comply with our theoretical conclusions, despite the fact that the Neumann series is used to approximate the matrix inverse. We also notice that $\sigma_n^2$ has a more significant impact on the Sheffield data, as the RMSE increases earlier after $\log(\sigma_n^2)$ reaches zero. From Figure 6b,c, we also see that the uncertainty bounds of the Sheffield data are greater after $\log(\sigma_n^2)$ reaches zero. We think the reason is that the Sheffield data are generally less periodic than the Peshawar data (see Figure 2), which influences the performance of the models.

4.3. Impacts of Noise Level on ELBO and UBML

Figure 7 shows the results from Experiment 2. According to our theoretical results, the impact of $s_f$ on the uncertainty should become greater as $s_f$ increases. This is verified by the results shown in Figure 7b,d. Our theoretical results also suggest that the variation of $s_f$ should not affect the prediction accuracy. We can see from Figure 7a,c that when $s_f$ is small, it does affect the prediction accuracy, but once it exceeds a certain value, the impact becomes negligible. Considering the Neumann series approximation, we would say that the experimental results comply with the theoretical conclusion.
The results of Experiment 3 are shown in Figure 8. We can see that when $l$ is small, both the RMSE and the uncertainty bounds change rapidly, while after $l$ exceeds a certain value, both converge. This again complies with our theoretical conclusions and simulation results. We should also notice from Figure 7 and Figure 8 that increasing $s_f$ tends to increase the uncertainty, whereas increasing $l$ tends to decrease it. Taking both into consideration, an optimised uncertainty bound can be obtained.
We also conduct an experiment to demonstrate how the noise level $\sigma_n^2$ affects the ELBO and UBML. In this experiment, we set $\sigma_n^2$ to vary from 0.5 to 4.5. The results are shown in Figure 9. To make the results distinguishable, we set the vertical axes to $\log(\mathrm{ELBO}/\mathrm{UBML})$; to make the logarithm work, we reverse the signs of both the ELBO and UBML, which is why the ELBO appears 'greater' than the UBML in Figure 9. The full GP model is trained by setting $\sigma_n^2$ to $\{1, 7, 13, 19, 25, 31, 37, 43, 49\}$ to obtain nine sets of hyperparameters. For each set, we then let $\sigma_n^2$ vary from 0.5 to 4.5. The darker the colour in Figure 9, the smaller the $\sigma_n^2$ used for model training. We can see that, generally, a greater $\sigma_n^2$ slows down the convergence speed of both the ELBO and UBML while training a model. Once the model is trained, increasing $\sigma_n^2$ lowers the UBML, which is the maximum that the ELBO can reach. This implies that an increase in $\sigma_n^2$ can cause the failure of a sparse GP model, as the ELBO is central to determining a sparse GP model. The experimental results again comply with our theoretical conclusions.

5. Conclusions

This paper proposes a general method to investigate how the performance variation of a Gaussian process model can be attributed to its hyperparameters, measurement noise, etc. The method is demonstrated by applying it to particulate matter (e.g., PM2.5) and gaseous pollutant (e.g., NO, NO2, and SO2) data from both Sheffield, UK, and Peshawar, Pakistan. Experimental results show that the proposed method provides insights into how measurement noise and hyperparameters affect the prediction performance of a Gaussian process. The results align with the analytical derivations, which are enabled by adopting the Neumann series to approximate the matrix inversions in Gaussian process models. The theoretical findings and experimental results combined demonstrate that the proposed method can generate air quality forecasting results and, in the meantime, provides a way to link the uncertainties in measurements, hyperparameters, etc., with the forecasting results. This will help with forecasting performance analysis when the measurement noise level or model hyperparameters vary, making the method more general.

Author Contributions

Conceptualization, P.W., L.M., M.M., R.C., S.M., K.A. and M.F.K.; methodology, P.W.; software, P.W.; validation, P.W., Z.Z., C.J. and H.F.; formal analysis, P.W., L.M.; investigation, P.W.; data curation, S.M., R.C., K.A. and M.F.K.; writing—original draft preparation, P.W., L.M., R.C., S.M., K.A. and M.F.K.; writing—review and editing, P.W. and L.M.; visualization, P.W., R.C.; supervision, L.M., M.M.; funding acquisition, L.M., P.W., M.M., S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the UK EPSRC through the EP/T013265/1 project NSF-EPSRC: ShiRAS. Towards Safe and Reliable Autonomy in Sensor Driven Systems, a joint project with the USA National Science Foundation under Grant NSF ECCS 1903466. Other funders are the NSFC (61703387) and the Global Challenges Research Funds (QR GCRF Pump priming awards (Round 2), project entitled: "Collaborating with North Pakistan for monitoring and reducing the air pollution" (X/160978)).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to the UK EPSRC for funding this work through the EP/T013265/1 project NSF-EPSRC: ShiRAS. Towards Safe and Reliable Autonomy in Sensor Driven Systems. This work was also supported by the USA National Science Foundation under Grant NSF ECCS 1903466. We also appreciate the support of the NSFC (61703387). We are also grateful to the Global Challenges Research Funds (QR GCRF Pump priming awards (Round 2), project entitled: "Collaborating with North Pakistan for monitoring and reducing the air pollution" (X/160978)). We also thank the Urban Flows Observatory, the University of Sheffield, for providing the air quality sensors for collecting air pollution data in Pakistan.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Data Collection

Peshawar (34.015° N, 71.52° E) is a city located in Khyber Pakhtunkhwa, Pakistan, situated at an elevation of 340 m above sea level. Peshawar covers an area of 1257 km 2 and has a population of 1,218,773, making it the largest city in Khyber Pakhtunkhwa. Peshawar is predominantly hot during summer (May to mid-July), with an average maximum temperature of 40 °C, followed by the monsoon season and a cold winter.
Local vehicular emissions, fossil-fuel power plants and industrial processes are the significant sources of air pollution in Peshawar. Wind direction and wind speed also play a crucial role in transboundary pollution build-up. Furthermore, at this site, the distribution and dispersion of air pollution are further affected by the nearby buildings and the proximity to the Grand Trunk Road, which together create a built-up street-canyon environment dominated by increasing traffic pollution.
The air quality monitoring sensor (AQMS) was installed at the University of Peshawar’s Physics Department building (see Figure A1), at a height of 6 m above ground level. It is described as an urban background site.
Sheffield (53°23′ N, 1°28′ W) is a geographically diverse city located in the county of South Yorkshire, UK; built on several hills, it lies at elevations of 29–500 m above sea level. Sheffield covers a total area of 367.9 km 2 with a growing population of 582,506. Sheffield is claimed to be the “greenest city” in England by the local city council. Sheffield enjoys a temperate climate, with July considered the hottest month, with an average maximum temperature of 20.8 °C.
The air pollution in the city is primarily due to both road transport and industry, and to a lesser extent, fossil fuel-run processes, such as energy supply and commercial or domestic heating systems (for example, wood burners).
The AQMS is installed at a height of 2.5 m above the elevated ground surface in the playground of Hunter’s Bar Infants School (see Figure A2), which lies in close proximity to a busy roundabout at the intersection of Ecclesall Road, Brocco Bank, Sharrow Vale Road and Junction Road; thus, traffic is the primary source of pollution. It is also described as an urban background site.
Figure A1. Peshawar study site © OpenStreetMap contributors.
In our case, the AQMSs are low-cost commercial AQMesh sensor nodes, deployed at the two sites in Peshawar and Sheffield. A “black box” post-calibration is applied to the data by the manufacturer to eliminate the impact of humidity and temperature on the sensors and to remove cross-sensitivity. The data are aggregated and sampled every 15 min, transferred to the cloud-based AQMesh database via the integrated standard GPRS communication, and then accessed through the dedicated API.
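As a hypothetical illustration of the 15-min aggregation step (the column names, timestamps and values below are invented; the actual AQMesh API schema may differ), raw readings can be resampled to the 15-min cadence with pandas:

```python
import pandas as pd

# Invented 5-min readings for two pollutants (illustrative values only).
raw = pd.DataFrame(
    {"no2": [41.0, 43.5, 40.2, 39.8, 44.1, 42.3],
     "pm25": [12.1, 11.8, 13.0, 12.6, 12.2, 11.9]},
    index=pd.date_range("2020-03-01 00:00", periods=6, freq="5min"),
)

# Aggregate to 15-min means, the sampling interval used in this study.
agg = raw.resample("15min").mean()
print(agg)
```

Each 15-min bin averages the raw readings that fall inside it, so six 5-min samples collapse to two aggregated rows.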
Figure A2. Sheffield study site © OpenStreetMap contributors.

Appendix B. The WHO Concentration Criteria for Pollutants

All data are from ‘WHO Air Quality Guidelines for Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide’ [26].
  • WHO NO 2
Table A1. WHO nitrogen dioxide guidelines.

Nitrogen Dioxide | Annual Mean | 1-h Mean
NO2              | 40 μg/m³    | 200 μg/m³
  • WHO SO 2
Table A2. WHO sulfur dioxide guidelines.

Sulfur Dioxide | 24-h Mean | 10-min Mean
SO2            | 20 μg/m³  | 500 μg/m³
  • WHO PM 2.5 and PM 10
Table A3. WHO particulate matter guidelines.

Particulate Matter | Annual Mean | 24-h Mean
PM2.5              | 10 μg/m³    | 25 μg/m³
PM10               | 20 μg/m³    | 50 μg/m³
  • WHO O 3
Table A4. WHO ozone guidelines.

Ozone | 8-h Mean
O3    | 100 μg/m³
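For reference, the guideline values in Tables A1–A4 can be encoded as a small lookup table. The snippet below is an illustrative sketch (the function and variable names are ours, not from the paper); all concentrations are in μg/m³.

```python
# WHO (2005 global update) guideline values from Tables A1-A4, in ug/m^3.
WHO_GUIDELINES = {
    ("NO2", "annual"): 40,   ("NO2", "1-h"): 200,
    ("SO2", "24-h"): 20,     ("SO2", "10-min"): 500,
    ("PM2.5", "annual"): 10, ("PM2.5", "24-h"): 25,
    ("PM10", "annual"): 20,  ("PM10", "24-h"): 50,
    ("O3", "8-h"): 100,
}

def exceeds_guideline(pollutant, averaging_period, concentration):
    """Return True if a mean concentration exceeds the WHO guideline value."""
    return concentration > WHO_GUIDELINES[(pollutant, averaging_period)]

print(exceeds_guideline("PM2.5", "24-h", 30.0))   # 30 exceeds the 25 ug/m^3 limit
print(exceeds_guideline("NO2", "annual", 35.0))   # 35 is within the 40 ug/m^3 limit
```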

Appendix C. Approximated Derivatives of SE Kernel

By specifying a kernel function, we can immediately obtain analytical forms of Equations (28) and (29). In this paper, we adopt the widely used squared-exponential (SE) kernel, shown in Equation (A1), as an example.
$$k_{\mathrm{SE}}(x, x') = s_f^2 \exp\!\left(-\frac{(x - x')^2}{2 l^2}\right). \tag{A1}$$
Two hyperparameters are involved: the signal variance s f and the length-scale l . Equations (A2) and (A3) give the expectation (prediction mean) partial derivative (EPD) and the covariance partial derivative (CPD) with respect to s f ,
$$\left.\frac{\partial \bar{f}_*}{\partial \theta_s}\right|_{\theta_s = s_f} = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( k_{oj} \frac{\partial d_{ji}}{\partial s_f} + \frac{\partial k_{oj}}{\partial s_f} d_{ji} \right) y_i = \sum_{i=1}^{n} \sum_{j=1}^{n} y_i \begin{cases} 0, & j \neq i \\ 0, & j = i, \end{cases} \tag{A2}$$
$$\left.\frac{\partial\, \operatorname{cov}(f_*)_{oo}}{\partial \theta_s}\right|_{\theta_s = s_f} = \frac{\partial K(X_*, X_*)_{oo}}{\partial s_f} - \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \frac{\partial k_{oj}}{\partial s_f} d_{ji} k_{oi} + k_{oj} \frac{\partial d_{ji}}{\partial s_f} k_{oi} + k_{oj} d_{ji} \frac{\partial k_{oi}}{\partial s_f} \right) = 2 s_f - \sum_{i=1}^{n} \sum_{j=1}^{n} \begin{cases} 2 s_f \exp\!\left(-\dfrac{(x_o - x_j)^2 + (x_j - x_i)^2 + (x_o - x_i)^2}{2 l^2}\right), & j \neq i \\ 2 s_f \exp\!\left(-\dfrac{(x_o - x_j)^2 + (x_o - x_i)^2}{2 l^2}\right), & j = i. \end{cases} \tag{A3}$$
The derivatives with respect to l are given in Equations (A4) and (A5),
$$\left.\frac{\partial \bar{f}_*}{\partial \theta_s}\right|_{\theta_s = l} = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( k_{oj} \frac{\partial d_{ji}}{\partial l} + \frac{\partial k_{oj}}{\partial l} d_{ji} \right) y_i = \sum_{i=1}^{n} \sum_{j=1}^{n} y_i \begin{cases} \exp\!\left(-\dfrac{(x_o - x_j)^2 + (x_j - x_i)^2}{2 l^2}\right) \dfrac{(x_o - x_j)^2 + (x_j - x_i)^2}{l^3}, & j \neq i \\ \exp\!\left(-\dfrac{(x_o - x_j)^2}{2 l^2}\right) \dfrac{(x_o - x_j)^2}{l^3}, & j = i, \end{cases} \tag{A4}$$
$$\left.\frac{\partial\, \operatorname{cov}(f_*)_{oo}}{\partial \theta_s}\right|_{\theta_s = l} = \frac{\partial K(X_*, X_*)_{oo}}{\partial l} - \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \frac{\partial k_{oj}}{\partial l} d_{ji} k_{oi} + k_{oj} \frac{\partial d_{ji}}{\partial l} k_{oi} + k_{oj} d_{ji} \frac{\partial k_{oi}}{\partial l} \right) = -\sum_{i=1}^{n} \sum_{j=1}^{n} \begin{cases} s_f^2 \exp\!\left(-\dfrac{(x_o - x_j)^2 + (x_j - x_i)^2 + (x_o - x_i)^2}{2 l^2}\right) \dfrac{(x_o - x_j)^2 + (x_j - x_i)^2 - (x_o - x_i)^2}{l^3}, & j \neq i \\ 0, & j = i. \end{cases} \tag{A5}$$
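The elementwise SE-kernel derivatives underlying Equations (A2)–(A5) are easy to check numerically. The sketch below (with arbitrary example values for x, x′, s f and l) verifies the analytic partial derivatives of Equation (A1) with respect to s f and l against central finite differences:

```python
import numpy as np

def k_se(x, xp, s_f, l):
    # Squared-exponential kernel of Equation (A1).
    return s_f**2 * np.exp(-(x - xp) ** 2 / (2.0 * l**2))

def dk_dsf(x, xp, s_f, l):
    # Analytic partial derivative with respect to the signal variance s_f.
    return 2.0 * s_f * np.exp(-(x - xp) ** 2 / (2.0 * l**2))

def dk_dl(x, xp, s_f, l):
    # Analytic partial derivative with respect to the length-scale l.
    r2 = (x - xp) ** 2
    return s_f**2 * np.exp(-r2 / (2.0 * l**2)) * r2 / l**3

# Central finite-difference check of both analytic derivatives.
x, xp, s_f, l, h = 1.3, 0.4, 2.0, 0.7, 1e-6
fd_sf = (k_se(x, xp, s_f + h, l) - k_se(x, xp, s_f - h, l)) / (2 * h)
fd_l = (k_se(x, xp, s_f, l + h) - k_se(x, xp, s_f, l - h)) / (2 * h)
print(abs(fd_sf - dk_dsf(x, xp, s_f, l)))
print(abs(fd_l - dk_dl(x, xp, s_f, l)))
```

Both printed discrepancies are at the level of finite-difference truncation error, confirming the analytic forms.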

References

  1. WHO. WHO Global Ambient Air Quality Database (Update 2018); World Health Organization: Geneva, Switzerland, 2018.
  2. Landrigan, P.J. Air pollution and health. Lancet Public Health 2017, 2, e4–e5.
  3. WHO. Health Effects of Particulate Matter: Policy Implications for Countries in Eastern Europe, Caucasus and Central Asia; World Health Organization Regional Office for Europe: Copenhagen, Denmark, 2013.
  4. Chen, H.; Kwong, J.C.; Copes, R.; Tu, K.; Villeneuve, P.J.; Van Donkelaar, A.; Hystad, P.; Martin, R.V.; Murray, B.J.; Jessiman, B.; et al. Living near major roads and the incidence of dementia, Parkinson’s disease, and multiple sclerosis: A population-based cohort study. Lancet 2017, 389, 718–726.
  5. Khreis, H.; de Hoogh, K.; Nieuwenhuijsen, M.J. Full-chain health impact assessment of traffic-related air pollution and childhood asthma. Environ. Int. 2018, 114, 365–375.
  6. Improving Air Quality in the UK: Tackling Nitrogen Dioxide in Our Towns and Cities; UK Overview Document; Department for Environment, Food & Rural Affairs and Department for Transport: London, UK, 2017.
  7. Rai, A.C.; Kumar, P.; Pilla, F.; Skouloudis, A.N.; Di Sabatino, S.; Ratti, C.; Yasar, A.; Rickerby, D. End-user perspective of low-cost sensors for outdoor air pollution monitoring. Sci. Total Environ. 2017, 607, 691–705.
  8. Zheng, T.; Bergin, M.H.; Sutaria, R.; Tripathi, S.N.; Caldow, R.; Carlson, D.E. Gaussian process regression model for dynamically calibrating and surveilling a wireless low-cost particulate matter sensor network in Delhi. Atmos. Meas. Tech. 2019, 12, 5161–5181.
  9. Shen, J. PM2.5 concentration prediction using time series based data mining. City 2012, 2013, 2014–2020.
  10. Silibello, C.; D’Allura, A.; Finardi, S.; Bolignano, A.; Sozzi, R. Application of bias adjustment techniques to improve air quality forecasts. Atmos. Pollut. Res. 2015, 6, 928–938.
  11. Specht, D.F. A general regression neural network. IEEE Trans. Neural Netw. 1991, 2, 568–576.
  12. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
  13. Lin, K.; Pai, P.; Yang, S. Forecasting concentrations of air pollutants by logarithm support vector regression with immune algorithms. Appl. Math. Comput. 2011, 217, 5318–5327.
  14. Mao, Y.; Lee, S. Deep Convolutional Neural Network for Air Quality Prediction. J. Phys. Conf. Ser. 2019, 1302, 032046.
  15. Garriga-Alonso, A.; Rasmussen, C.E.; Aitchison, L. Deep convolutional networks as shallow Gaussian processes. arXiv 2018, arXiv:1808.05587.
  16. Bai, L.; Wang, J.; Ma, X.; Lu, H. Air pollution forecasts: An overview. Int. J. Environ. Res. Public Health 2018, 15, 780.
  17. Wang, P.; Mihaylova, L.; Munir, S.; Chakraborty, R.; Wang, J.; Mayfield, M.; Alam, K.; Khokhar, M.F.; Coca, D. A computationally efficient symmetric diagonally dominant matrix projection-based Gaussian process approach. Signal Process. 2021, 183, 108034.
  18. Burt, D.R.; Rasmussen, C.E.; Van Der Wilk, M. Rates of Convergence for Sparse Variational Gaussian Process Regression. arXiv 2019, arXiv:1903.03571.
  19. Liu, H.; Ong, Y.S.; Shen, X.; Cai, J. When Gaussian process meets big data: A review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4405–4423.
  20. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
  21. Wu, M.; Yin, B.; Wang, G.; Dick, C.; Cavallaro, J.R.; Studer, C. Large-scale MIMO detection for 3GPP LTE: Algorithms and FPGA implementations. IEEE J. Sel. Top. Signal Process. 2014, 8, 916–929.
  22. Chen, Z.; Wang, B. How priors of initial hyperparameters affect Gaussian process regression models. Neurocomputing 2018, 275, 1702–1710.
  23. Zhu, D.; Li, B.; Liang, P. On the matrix inversion approximation based on Neumann series in massive MIMO systems. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; pp. 1763–1769.
  24. Titsias, M. Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009; pp. 567–574.
  25. Snelson, E.; Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, Canada, 4–9 December 2006; pp. 1257–1264.
  26. WHO. Air Quality Guidelines for Particulate Matter, Ozone, Nitrogen Dioxide and Sulphur Dioxide: Global Update 2005; World Health Organization: Geneva, Switzerland, 2006.
Figure 1. Impacts of σ n 2 on ELBO and UBML: (a) σ n 2 [ 0.1 , 1.5 ] , (b) σ n 2 [ 1.5 , 3.0 ] , (c) σ n 2 [ 3.0 , 20.0 ] , (d) σ n 2 [ 20.0 , 200.0 ] .
Figure 2. Concentration of pollutants recorded at the same time period in both Sheffield and Peshawar: (a) NO, (b) NO 2 , (c) SO 2 , (d) PM 2.5 .
Figure 3. Prediction and absolute error of pollutants in Sheffield: (a) NO, (b) NO 2 , (c) SO 2 , (d) PM 2.5 .
Figure 4. Prediction and absolute error of pollutants in Peshawar: (a) NO, (b) NO 2 , (c) SO 2 , (d) PM 2.5 .
Figure 5. Relationship of σ n 2 with the prediction RMSE of four pollutants: (a) NO, (b) NO 2 , (c) SO 2 , (d) PM 2.5 .
Figure 6. Relationship of σ n 2 with the prediction uncertainty bound of four pollutants: (a) NO, (b) NO 2 , (c) SO 2 , (d) PM 2.5 .
Figure 7. Relationship of s f with NO prediction RMSE and uncertainty bound: (a) σ n 2 = 0.5 , (b) σ n 2 = 0.5 , (c) σ n 2 = 1.5 , (d) σ n 2 = 1.5 .
Figure 8. Relationship of l with NO prediction RMSE and uncertainty bound: (a) σ n 2 = 0.5 , (b) σ n 2 = 0.5 , (c) σ n 2 = 1.5 , (d) σ n 2 = 1.5 .
Figure 9. Effects of σ n 2 on ELBO and UBML: (a) NO in Sheffield, (b) NO in Peshawar.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wang, P.; Mihaylova, L.; Chakraborty, R.; Munir, S.; Mayfield, M.; Alam, K.; Khokhar, M.F.; Zheng, Z.; Jiang, C.; Fang, H. A Gaussian Process Method with Uncertainty Quantification for Air Quality Monitoring. Atmosphere 2021, 12, 1344. https://0-doi-org.brum.beds.ac.uk/10.3390/atmos12101344


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
