## 1. Introduction

Currently, air pollution and its related health problems have become research hot spots [1]. Numerous studies have indicated that particles smaller than 2.5 $\mu\mathrm{m}$ in aerodynamic diameter (PM_{2.5}) have adverse effects on human health and can cause pulmonary and cardiovascular diseases [2,3].

People exposed to polluted environments are prone to illness or even death. Thus, PM_{2.5} exposure monitoring and pattern analysis are critical to air quality assessment and environmental epidemiologic studies [4,5]. PM_{2.5} concentrations are traditionally obtained from ground monitoring sites distributed throughout a country. However, because of high construction costs, the existing ground monitoring sites are too sparse to provide spatially continuous PM_{2.5} monitoring. In contrast, satellite remote sensing has wide and continuous spatial coverage and has been widely applied in the estimation of PM_{2.5} concentrations [6,7], although cloud cover can limit data availability.

Satellite-derived Aerosol Optical Depth (AOD) represents the quantity of light removed from a beam by aerosol scattering or absorption along its path [8,9]. Furthermore, previous studies have indicated that there is a direct relationship between atmospheric particles (such as PM_{2.5}) and AOD [10]. Thus, satellite AOD products provide a potentially cost-effective way to estimate ground-level PM_{2.5} mass concentrations [11,12]. A series of AOD products have been applied to such studies [13], e.g., those from the Moderate Resolution Imaging Spectroradiometer (MODIS) [14,15], the Multi-angle Imaging Spectroradiometer [16], Himawari-8 (H8) [17,18], and the Visible Infrared Imaging Radiometer Suite [19]. However, the relatively coarse spatial resolutions (usually 3 km or 10 km) of the above-mentioned products limit precise estimation of PM_{2.5} in urban areas. Recently, the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm, which utilizes time-series analysis and image-based processing techniques, was developed to conduct aerosol retrievals and atmospheric corrections.

MAIAC is a new generic algorithm applied to Collection 6 (C6) MODIS measurements to retrieve AOD over land at high spatial resolution (1 km) [20]. The related AOD product MCD19A2 (the MODIS C6 daily AOD dataset), which is based on the MAIAC algorithm, was released in 2018 [21]. Although the goal of the MAIAC AOD product is aerosol monitoring, its 1 km resolution gives us the chance to estimate PM_{2.5} concentrations at a higher spatial and temporal resolution.

Previous studies have shown that the relationship between PM_{2.5} and AOD is relatively complex and may be affected by a series of parameters, such as the aerosol type and the vertical structure of the aerosol distribution [22], the relative humidity (RH) [23], the planetary boundary layer height (PBLH) [24], wind speed and direction [25], the depth and temperature difference of the inversion layer [24], and land cover [26]. More recently, more sophisticated methods for estimating PM_{2.5} that take these parameters into account have been developed.

Many studies have used statistical approaches to explore the relationship between PM_{2.5} and AOD. Examples include, but are not limited to, the linear regression model, the geographically weighted regression model [27,28], the two-stage model [29], and the newly developed neural network methods [30,31]. As geospatial data, PM_{2.5} concentration data have spatial heterogeneity and spatial dependence. The statistical characteristics of PM_{2.5} concentrations may vary over space and time. This space–time anisotropy may violate the assumption of independent and identically distributed random variables that underlies most machine learning methods [32].

The Support Vector Machine (SVM), which is based on the principle of structural risk minimization and was initially developed for solving classification problems with small-sample learning, has been found to be promising for solving regression problems. The SVM for regression, termed Support Vector Regression (SVR), has shown superior performance due to its inherent capability to circumvent the overfitting problem in regression and its improved response approximation ability [33].

We chose SVR in view of the following characteristics of the experiment:

1. Nonlinear and complex relationship. In atmospheric research, the relationship between PM_{2.5}, AOD and the auxiliary variables is rather complicated and is better described by a nonlinear model. The kernel function simplifies the inner product operation in the mapped feature space and avoids computing in the high-dimensional space directly.

2. Relatively small dataset. Regression algorithms generally become more accurate as the number of samples grows. SVR, which is built on the SVM, makes it possible to achieve relatively good results with small-sample learning. Furthermore, "small sample" is a relative concept; we consider the samples in our experiment sufficient for uncovering the nonlinear relationship.

3. High-dimensional data. SVR handles high-dimensional datasets well and, compared to most other machine learning methods, can capture the nonlinear relationship between data and features on relatively small datasets.
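
The remark about the kernel function in item 1 can be illustrated with a minimal sketch. It assumes a degree-2 polynomial kernel on two-dimensional inputs, chosen only because its feature map can be written out explicitly; the function names and the sample vectors are hypothetical and are not taken from the experiment. The sketch checks that evaluating the kernel in the input space gives the same value as an inner product in the mapped space, and includes a Gaussian (RBF) kernel, a common choice for SVR whose mapped space cannot be enumerated at all.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def poly2_feature_map(x):
    """Explicit feature map for a 2-D input whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an arbitrary illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical input vectors, not data from the paper.
x, y = (1.0, 2.0), (3.0, 0.5)

# Inner product computed explicitly in the 3-D mapped space.
explicit = sum(a * b for a, b in zip(poly2_feature_map(x), poly2_feature_map(y)))

# Same quantity computed directly in the 2-D input space via the kernel.
implicit = poly2_kernel(x, y)

print(explicit, implicit)
```

For the polynomial kernel the two values agree up to floating-point rounding, which is the sense in which the kernel avoids working in the high-dimensional space directly.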

Thus, this paper proposes a modified SVR (MSVR) method to improve the estimation accuracy of PM_{2.5} concentrations. MSVR considers the impact of spatial distance on estimation accuracy and adds to the model input the factors that are commonly included and have relatively significant influence, with MAIAC AOD as the primary predictor and meteorological and land cover information as ancillary information.
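
This section does not specify how spatial distance enters MSVR, so the following is only an illustrative sketch of one generic possibility and should not be read as the paper's formulation. It assumes that an inverse-distance-weighted (IDW) average of nearby station readings is supplied to a regressor as an extra input feature. The function names, the `power` parameter, and all coordinates and concentrations are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_feature(target, stations, power=2.0, min_dist_km=1e-6):
    """Inverse-distance-weighted mean of station values around a target point.

    target   -- (lat, lon) of the location to be estimated
    stations -- list of (lat, lon, value) tuples
    power    -- distance exponent (assumed value, not from the paper)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in stations:
        # Clamp tiny distances to avoid division by zero at a station location.
        dist = max(haversine_km(target[0], target[1], lat, lon), min_dist_km)
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Made-up example: two stations equally far north and south of the target.
target_point = (30.0, 114.0)
example_stations = [(30.1, 114.0, 40.0), (29.9, 114.0, 60.0)]
spatial_term = idw_feature(target_point, example_stations)
print(spatial_term)
```

In such a setup the resulting value would simply be appended to the predictor vector alongside AOD and the ancillary variables; nearer stations dominate the term because their weights grow as distance shrinks.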

## 5. Conclusions

The satellite AOD data used in this experiment are superior to the conventional DT/DB AOD in terms of resolution and accuracy. Higher-resolution (1 km) satellite AOD data are used so that the retrieved PM_{2.5} reflects more accurate and detailed temporal and spatial characteristics. Additionally, the MAIAC algorithm has been shown to be more accurate than the DT/DB algorithm over dark surfaces. The experiment verified the feasibility of using 1 km MAIAC AOD for PM_{2.5} retrieval and its superiority over the 3 km MODIS AOD in terms of spatial resolution and retrieval accuracy.

The MSVR proposed in this paper modifies the traditional SVR for the regression between AOD and PM_{2.5} and improves the experimental accuracy. The results showed that, compared to the traditional SVR, the MSVR model improved the R^{2} of the regression from 0.60 to 0.74 in 2017 and from 0.66 to 0.78 in 2018.
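
For readers who want to reproduce this kind of comparison, the sketch below computes R^{2} under the common definition 1 - SS_res / SS_tot. Whether the reported values use this definition or the squared correlation coefficient is not stated in this section, so the choice is an assumption. The function name and the sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares between observations and model estimates.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares around the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only, not results from the experiment.
obs = [35.0, 50.0, 42.0, 61.0]
est = [38.0, 47.0, 45.0, 58.0]
print(r_squared(obs, est))
```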

We introduced commonly used meteorological parameters to reduce, to a certain extent, the influence of complex factors on PM_{2.5} retrieval from satellite AOD. Integrating the meteorological parameters and land cover data demonstrated that appropriate auxiliary variables can improve the performance of PM_{2.5} retrieval.

The experimental results also showed that PM_{2.5} has obvious spatial and temporal differences. We analyzed the spatial and temporal distribution and characteristics of PM_{2.5} in Hubei Province, both overall and by season, and discussed the possible reasons for these spatiotemporal differences.

In our future work, we will make efforts in three directions. First, we will try to use satellites with higher resolution and aerosol retrieval algorithms with better performance. It is worth noting that recent studies have proposed combining satellite remote-sensing techniques with newly established low-cost sensor networks to increase the measurement density when estimating long-term PM_{2.5} concentrations. Second, we will try to determine the specific influence of meteorological, topographic and social factors on the distribution characteristics of PM_{2.5} and on the retrieval of PM_{2.5} from satellite AOD. Third, we will attempt to analyze PM_{2.5} distribution characteristics over longer time series and wider areas.