Data-Driven Analysis of Distributed Air Quality Monitoring Data Reveals Hyperlocal Insight

Jiajun Gu, Jintao Gu, and K. Max Zhang

Distributed air quality sensor networks have been established in many cities around the world to continuously monitor concentrations of air pollutants detrimental to human health, and there is great interest in developing new analytical techniques capable of generating useful insights from these networks. In this study, we applied data-driven techniques, including machine-learning modeling and network analysis, to predict the spatial and temporal trends of NO2 and PM2.5 concentrations and to identify local emission sources. We used data from the Breathe London project, in which more than 100 low-cost air quality sensor pods were installed on lamp posts and buildings throughout Greater London, England. In the network analysis, we defined the general temporal trends of pollutant concentrations driven by regional phenomena across the sensor network and used them as a reference to identify locally anomalous concentrations and their associated local drivers. In parallel, we implemented machine-learning models to predict the spatial and temporal trends of pollutants using meteorological and land-use features. While the machine-learning models effectively captured the general trends, we found that poor model performance was often indicative of local emission sources, consistent with those identified in the network analysis. A major accomplishment of this study is the identification of the influence of local emission sources that have been poorly characterized in emission inventories, such as cooking and unpaved surfaces. A better understanding of these emission sources will help inform new policies that can reduce pollution and empower communities to take local actions to address their air pollution problems.
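The idea of using a network-wide trend as a reference for spotting local anomalies can be illustrated with a minimal sketch. This is not the procedure used in the study. It assumes, purely for illustration, that the regional trend is the median across sensors at each time step and that a reading is locally anomalous when its residual exceeds a multiple of the median absolute deviation (MAD). The function name, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def local_anomalies(conc, threshold=3.0):
    """Flag readings that deviate strongly from a network-wide reference.

    conc: array of shape (n_times, n_sensors) with pollutant concentrations.
    Returns (baseline, flags), where baseline has shape (n_times,) and
    flags is a boolean array with the same shape as conc.
    Assumption: median baseline and MAD threshold are illustrative choices.
    """
    conc = np.asarray(conc, dtype=float)
    # Regional reference: median across sensors at each time step.
    baseline = np.nanmedian(conc, axis=1)
    # Residual of each sensor relative to the regional reference.
    residual = conc - baseline[:, None]
    # Robust spread of the residuals (MAD scaled to match a normal std).
    mad = np.nanmedian(np.abs(residual - np.nanmedian(residual)))
    if mad == 0:
        return baseline, np.zeros(conc.shape, dtype=bool)
    flags = np.abs(residual) > threshold * 1.4826 * mad
    return baseline, flags


# Synthetic example: 6 hourly readings from 4 hypothetical sensors that
# share a regional trend, with one local spike at sensor 3, hour 3.
regional = np.array([10.0, 12.0, 15.0, 20.0, 18.0, 14.0])
offsets = np.array([0.0, 1.0, -1.0, 0.5])
conc = regional[:, None] + offsets[None, :]
conc[3, 3] += 30.0

baseline, flags = local_anomalies(conc)
print(np.argwhere(flags))  # (time, sensor) indices of flagged readings
```

In this synthetic case only the injected spike is flagged, because the median baseline is largely insensitive to a single outlying sensor. A real network would also need handling of sensor drift, missing data, and persistent site-to-site offsets, none of which this sketch addresses.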