Abstract: The presence of pollutants in the air has a direct impact on our health and causes detrimental changes to our environment. Air quality monitoring is therefore of paramount importance. The high cost of the acquisition and maintenance of accurate air quality stations implies that only a small number of these stations can be deployed in a country. To improve the spatial resolution of the air monitoring process, an interesting idea is to develop data-driven models to predict air quality based on readily available data. In this paper, we investigate the correlations between air pollutants concentrations and meteorological and road traffic data. Using machine learning, regression models are developed to predict pollutants concentration. Both linear and non-linear models are investigated in this paper. It is shown that non-linear models, namely Random Forest (RF) and Support Vector Regression (SVR), better describe the impact of traffic flows and meteorology on the concentrations of pollutants in the atmosphere. It is also shown that more accurate prediction models can be obtained when including some pollutants’ concentration as predictors. This may be used to infer the concentrations of some pollutants using those of other pollutants, thereby reducing the number of air pollution sensors.
Keywords: Air pollution, meteorological features, traffic features, Machine Learning, Linear Regression, Support Vector Machine, Random Forest