Deciphering Environmental Air Pollution with Large Scale City Data

Mayukh Bhattacharyya1*
Sayan Nag2*
Udita Ghosh3
1Stony Brook University
2University of Toronto
3Zendrive Inc

* denotes equal contribution

Spotlight and Oral at International Joint Conference on Artificial Intelligence (IJCAI) 2022



Abstract
Air pollution poses a serious threat to sustainable environmental conditions in the 21st century. Its importance in determining health and living standards in urban settings is only expected to increase with time. Various factors, ranging from artificial emissions to natural phenomena, are known to be primary causal agents or influencers behind rising air pollution levels. However, the lack of large-scale data covering the major artificial and natural factors has hindered research on the causes and relations governing the variability of the different air pollutants. Through this work, we introduce a large-scale city-wise dataset for exploring the relationships among these agents over a long period of time. We also introduce a transformer-based model, cosSquareFormer, for the problem of pollutant level estimation and forecasting. Our model outperforms most of the benchmark models for this task. We also analyze and explore the dataset through our model and other methodologies to draw out important inferences that help us understand the dynamics of the causal agents at a deeper level. Through our paper, we seek to provide a strong foundation for further research in this domain, which will demand critical attention in the near future.




Results
The table below shows the prediction performance of the different models for all 6 pollutants. LSTM E and Attention LSTM E are trained with explicit weekday and month information, whereas this explicit information is excluded when training the remaining models.
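As a rough illustration of what "explicit information of weekday and month" could look like as model input, the sketch below one-hot encodes both for a given date. The encoding choice and the function name `explicit_calendar_features` are assumptions made for this example; the page does not state how the "E" variants actually encode these fields.

```python
from datetime import date


def explicit_calendar_features(day: date) -> list[float]:
    """Return a 19-dim vector: 7 weekday slots followed by 12 month slots.

    Hypothetical helper: one-hot encoding is an assumption, not the
    encoding confirmed by the paper.
    """
    weekday = [0.0] * 7
    weekday[day.weekday()] = 1.0  # Monday is index 0, Sunday is index 6
    month = [0.0] * 12
    month[day.month - 1] = 1.0  # January is index 0
    return weekday + month


# Example: features for a single day, to be concatenated with that day's
# other input features in an "E" style model.
features = explicit_calendar_features(date(2022, 7, 23))
```

Models without the "E" suffix would simply omit this vector from their inputs.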



In the figures below, it can be observed that our proposed model closely follows sudden daily fluctuations in the pollutant levels.



To explore the sequential nature of the pollutants, we designed an ablation study over multiple sequence lengths, keeping the experimental setup identical to maintain parity across runs. The results in the figure below (left) show that pollutants such as PM2.5, PM10 and NO2 perform better with longer sequence lengths, whereas the others either degrade or show a flat trend. This suggests that the daily concentration of some pollutants depends substantially on past concentrations, whereas that of others is largely independent of them.
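A minimal sketch of how input windows for such a sequence-length ablation could be built from a daily series. The helper `make_windows`, its `first_target` parameter, and the sample values are hypothetical and only illustrate one way to keep the prediction targets identical across sequence lengths; the authors' actual pipeline is not described on this page.

```python
def make_windows(series, seq_len, first_target=None):
    """Pair each run of `seq_len` past values with the value that follows it.

    `first_target` optionally fixes the index of the first predicted day so
    that runs with different sequence lengths share the same targets.
    """
    start = seq_len if first_target is None else max(seq_len, first_target)
    return [(series[t - seq_len:t], series[t]) for t in range(start, len(series))]


# Made-up daily concentrations, for illustration only.
daily = [12.0, 15.0, 11.0, 18.0, 20.0, 17.0]

# Every sequence length predicts the same three days (indices 3, 4, 5).
windows_by_len = {
    seq_len: make_windows(daily, seq_len, first_target=3)
    for seq_len in (1, 2, 3)
}
```

Fixing the first target to the longest sequence length means any difference in error comes from the amount of history used rather than from a different evaluation set.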

The visualizations in the figure below (right) show how closely each city conforms to the universal model. They highlight the cities whose pollutant levels were much higher than those estimated by our model, which gives us leads for exploring the context and reasons behind each such outlier city. An analysis on this basis allows researchers to identify problematic cases in a meaningful way instead of just flagging cities with high pollutant levels.
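The idea of ranking cities by how far observed levels exceed the model's estimates can be sketched as below. The function `rank_cities_by_excess`, the city names, and all numbers are hypothetical; the mean residual is one plausible conformity measure and is not confirmed as the metric behind the authors' visualization.

```python
from statistics import mean


def rank_cities_by_excess(observed, estimated):
    """Rank cities by mean (observed - estimated), largest excess first."""
    excess = {
        city: mean(o - e for o, e in zip(observed[city], estimated[city]))
        for city in observed
    }
    return sorted(excess.items(), key=lambda item: item[1], reverse=True)


# Made-up values, for illustration only.
observed = {
    "city_A": [50.0, 60.0],
    "city_B": [30.0, 30.0],
    "city_C": [80.0, 90.0],
}
estimated = {
    "city_A": [40.0, 50.0],
    "city_B": [32.0, 30.0],
    "city_C": [50.0, 60.0],
}

ranked = rank_cities_by_excess(observed, estimated)
```

Under this measure, a city with high pollution that the model also predicts as high is not flagged; only cities exceeding what their input factors would suggest rise to the top.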




Citation
 
Mayukh Bhattacharyya, Sayan Nag, and Udita Ghosh. Deciphering Environmental Air Pollution with Large Scale City Data. In IJCAI 2022.