Data-Driven Analysis of Distributed Air Quality Monitoring Data Reveals Hyperlocal Insight

Jiajun Gu, Jintao Gu, and K. Max Zhang

Distributed air quality sensor networks have been established in many cities around the world to continuously monitor concentrations of air pollutants detrimental to human health. There is great interest in developing new analytical techniques capable of generating useful insights from these networks. In this study, we applied data-driven techniques, including machine-learning modeling and network analysis, to predict the spatial and temporal trends of NO2 and PM2.5 concentrations and to identify local emission sources using data from the Breathe London project, in which more than 100 low-cost air quality sensor pods were installed on lamp posts and buildings throughout Greater London, England. In the network analysis, we defined the general temporal trends of pollutant concentrations driven by regional phenomena across the sensor network and used them as a reference to identify anomalous local concentrations and their associated local drivers. In parallel, we implemented machine-learning models to predict the spatial and temporal trends of pollutants using meteorological and land-use features. While the machine-learning models effectively captured the general trends, we found that poor model performance was often indicative of local emission sources, consistent with the findings of the network analysis. A major accomplishment of this study is the identification of the influence of local emission sources that have been poorly characterized in emission inventories, such as cooking and unpaved surfaces. A better understanding of these emission sources will help inform new policies that can reduce pollution and empower local communities to take action on their air pollution problems.
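
As a minimal sketch of the network-analysis idea described above, the snippet below takes the network-wide median at each time step as the regional reference and flags sensor readings that exceed it by more than a fixed margin. The median reference, the `margin` threshold, the function names, and the sample values are illustrative assumptions, not the authors' actual method.

```python
from statistics import median


def regional_trend(readings):
    """Network median at each time step.

    readings: dict mapping sensor id -> list of concentrations,
    all lists the same length (one value per time step).
    """
    series = list(readings.values())
    return [median(values) for values in zip(*series)]


def local_anomalies(readings, margin):
    """Return {sensor_id: [time indices]} where a sensor exceeds the
    regional trend by more than `margin` (same units as the readings)."""
    trend = regional_trend(readings)
    flagged = {}
    for sensor, values in readings.items():
        hits = [t for t, (v, ref) in enumerate(zip(values, trend))
                if v - ref > margin]
        if hits:
            flagged[sensor] = hits
    return flagged


# Hypothetical NO2 readings (ug/m3) for three sensors over four time steps.
sample = {
    "pod_a": [20.0, 22.0, 25.0, 21.0],
    "pod_b": [19.0, 23.0, 24.0, 20.0],
    "pod_c": [21.0, 45.0, 26.0, 22.0],  # local spike at t=1
}
anomalies = local_anomalies(sample, margin=10.0)
```

In this toy case only `pod_c` departs from the regional reference, at the second time step; a real analysis would need a threshold chosen from the data and some handling of sensor gaps.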



Improving U.S. Wind Gust Prediction Using Machine Learning

Julian Arnheim and Jacob Coburn

Wind gusts impact the transportation sector (aviation and road transportation), the environment (windthrow), and other infrastructure (e.g., bridges). Short-term forecasting of wind gusts, especially infrequent strong (17-25.7 m s⁻¹) and damaging (≥25.7 m s⁻¹) wind gusts, is particularly challenging because of the range of phenomena responsible for generating wind gusts and their high spatiotemporal variability. Machine learning (ML) offers opportunities to examine relationships with key atmospheric predictors and improve wind gust prediction. In this work, we present wind gust observations from eight high-passenger-volume airports across the coterminous United States (CONUS) that represent different wind climates. The airports considered are Atlanta (ATL), Denver (DEN), Dallas-Fort Worth (DFW), Los Angeles (LAX), Orlando (MCO), Minneapolis-St. Paul (MSP), Phoenix (PHX), and Seattle-Tacoma (SEA). Wind gusts at 10 m a.g.l. are measured using 2-D sonic anemometers operated as part of the National Weather Service (NWS) Automated Surface Observing System (ASOS) network. The 5-minute ASOS data are pre-processed to select the maximum wind gust value in each hour over the 15-year period 2005-2019. Predictors are sampled at the top of each hour from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis product. The pool of potential predictors at each airport includes an array of upper-level parameters sampled on multiple pressure levels (500, 750, 850, and 950 hPa): geopotential heights, temperature gradients, specific humidity, horizontal wind components, and vertical pressure tendency.
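
The hourly-maximum pre-processing step described above can be sketched as follows. This is a hedged illustration only: the function name, the input layout (timestamp and gust pairs), the treatment of missing reports as `None`, and the choice to bin by the clock hour in which an observation starts are all assumptions, since the abstract does not specify them.

```python
from collections import defaultdict
from datetime import datetime


def hourly_max_gust(observations):
    """Reduce (timestamp, gust) pairs to the maximum gust in each clock hour.

    observations: iterable of (datetime, gust speed in m/s or None).
    Missing gust values (None) are skipped.
    """
    by_hour = defaultdict(list)
    for stamp, gust in observations:
        if gust is None:
            continue
        # Truncate the timestamp to the start of its clock hour (assumed binning).
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        by_hour[hour].append(gust)
    return {hour: max(values) for hour, values in sorted(by_hour.items())}


# Hypothetical 5-minute observations.
obs = [
    (datetime(2005, 1, 1, 0, 5), 8.2),
    (datetime(2005, 1, 1, 0, 10), None),
    (datetime(2005, 1, 1, 0, 55), 11.3),
    (datetime(2005, 1, 1, 1, 0), 7.7),
]
hourly = hourly_max_gust(obs)
```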

There is marked seasonality in the frequency and intensity of wind gusts, and there are large differences across the eight airports. The marginal probability of a wind gust of any magnitude (>7.2 m s⁻¹) ranges from 0.10 at LAX to 0.33 at MSP. The marginal probability of a strong wind gust ranges from 0.0017 at LAX to 0.018 at DEN. Thus, wind gusts in excess of 17 m s⁻¹ at DEN occur in almost 2% of all calendar hours and are roughly 10 times more frequent than at LAX. DEN also has the highest probability of a damaging wind gust (0.0005, or 1 in 2000 hours). No wind gusts in excess of the NWS threshold for damaging wind gusts were reported at LAX over the study period. Because the driving factors of wind gust variability may differ throughout the year, each station's gust time series is divided into a cold (October-March) and a warm (April-September) season. The marginal probability of a wind gust in each season ranges from 0.081 for the cold season at PHX to 0.361 for the warm season at MSP. Spatially, the East Coast and Pacific Northwest have a greater conditional probability of a wind gust in the cold season, and the Great Plains and Southwest have a greater conditional probability of a wind gust in the warm season. Among other signatures of synoptic climatological phenomena discussed herein, a monsoonal signal is evident in the strong seasonal dependence of wind gust probability at PHX, where the marginal probability of a wind gust (0.16 overall) is nearly three times as great in the warm season (0.238) as in the cold season (0.081). Wind gust seasonality at PHX is amplified for more extreme gusts: the conditional probability of a strong wind gust is four times as great in the warm season (0.04) as in the cold season (0.01), and the conditional probability of a damaging wind gust in the warm season (0.001) is five times that in the cold season.
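
To make the probabilities above concrete, the following sketch computes the fraction of hours with a gust at or above a threshold, either over all hours or within one season. The record layout, the function name, the use of `>=` for every threshold, and the sample data are assumptions for illustration; hours with no reported gust are counted in the denominator.

```python
WARM_MONTHS = {4, 5, 6, 7, 8, 9}  # April-September, as defined in the text


def gust_probability(records, threshold, season=None):
    """Fraction of hours whose maximum gust is >= threshold (m/s).

    records: list of (month, hourly maximum gust or None if no gust).
    season: None for all hours, or "warm" / "cold" to restrict the hours.
    """
    if season == "warm":
        hours = [g for m, g in records if m in WARM_MONTHS]
    elif season == "cold":
        hours = [g for m, g in records if m not in WARM_MONTHS]
    else:
        hours = [g for _, g in records]
    if not hours:
        raise ValueError("no hours in the requested season")
    exceed = sum(1 for g in hours if g is not None and g >= threshold)
    return exceed / len(hours)


# Hypothetical records: (month, hourly maximum gust in m/s, None if no gust).
records = [(1, None), (1, 9.0), (2, None), (3, None),
           (6, 18.5), (7, 8.1), (7, None), (8, 26.0)]
p_any = gust_probability(records, 7.2)            # all hours
p_warm = gust_probability(records, 7.2, "warm")   # warm-season hours only
p_cold = gust_probability(records, 7.2, "cold")   # cold-season hours only
```

The same function with thresholds of 17 or 25.7 m s⁻¹ would give the corresponding exceedance fractions for strong-or-greater and damaging gusts.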

Through this study, we seek to develop and assess predictive statistical models for wind gust occurrence and magnitude. We use multiple logistic regression as a reference forecast method for gust occurrence and multiple linear regression as a reference forecast method for gust magnitude to determine the degree to which more complex ML approaches (artificial neural networks, ANNs) enhance skill. Stepwise procedures are used for predictor selection, and tools are applied to reduce predictor collinearity. ANN models with 1 to 20 hidden layers and regression models with and without an autoregressive (AR) term are trained on 70% of the data and applied to the remaining, independent 30%. All models are developed separately for the warm and cold seasons at each station. Generally, the ANNs exhibit moderately higher skill (e.g., lower false alarm rates for gust occurrence and lower RMSE for gust magnitude) than the regression models at all airports studied. However, the overall predictive skill and the degree of ANN skill improvement are subject to substantial seasonal and spatial variability.
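
The two skill measures mentioned above can be sketched as below. The abstract does not say which definition of "false alarm rate" is used; this example assumes FP / (FP + TN), the share of non-gust hours that were forecast as gust hours. The function names and the sample forecasts are hypothetical.

```python
import math


def false_alarm_rate(observed, predicted):
    """FP / (FP + TN) for binary occurrence forecasts (1 = gust, 0 = no gust)."""
    fp = sum(1 for o, p in zip(observed, predicted) if p and not o)
    tn = sum(1 for o, p in zip(observed, predicted) if not p and not o)
    if fp + tn == 0:
        raise ValueError("no non-event hours to evaluate")
    return fp / (fp + tn)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted magnitudes."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical held-out hours: observed occurrence and two forecasts.
obs_occ = [1, 0, 0, 1, 0, 0]
ref_occ = [1, 1, 0, 0, 1, 0]  # stand-in for a reference (regression) forecast
ann_occ = [1, 0, 0, 1, 1, 0]  # stand-in for an ANN forecast
far_ref = false_alarm_rate(obs_occ, ref_occ)
far_ann = false_alarm_rate(obs_occ, ann_occ)

# Hypothetical gust magnitudes (m/s) on held-out hours.
magnitude_error = rmse([10.0, 20.0, 15.0, 15.0], [13.0, 24.0, 15.0, 15.0])
```

Comparing `far_ref` with `far_ann` on the same held-out hours mirrors the kind of reference-versus-ANN comparison the abstract describes.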



Precipitation Response to CO2-Induced Warming in Millennial-Length AOGCM Simulations Using LongRunMIP

Kinen Kao ’22

The Long Run Model Intercomparison Project (LongRunMIP) is a collection of millennial-length climate simulations that enables investigation of the long-term equilibration of the climate in response to external forcings. The goal of this project is to explain the impact of anthropogenic climate change on global precipitation over millennial time scales in these simulations.

First, we will explore how climate sensitivity and hydrological sensitivity change over millennial time scales. Next, we will explore how the sensitivities of various radiative and heat fluxes change with time, and how each of these fluxes affects the hydrological sensitivity. Finally, we will investigate regional differences in precipitation and hydrological sensitivity on millennial time scales to understand how climate change will affect them differently across geographic regions of the world.
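
As a rough illustration of tracking hydrological sensitivity over time, the sketch below assumes the common definition of hydrological sensitivity as the change in global-mean precipitation per degree of global-mean surface warming, and estimates it as a least-squares slope within consecutive time windows. The windowing scheme, function names, and sample anomalies are hypothetical and are not taken from the project.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def windowed_sensitivity(d_temp, d_precip, window):
    """Slope of precipitation change against warming in consecutive,
    non-overlapping windows of `window` time steps."""
    return [
        ols_slope(d_temp[i:i + window], d_precip[i:i + window])
        for i in range(0, len(d_temp) - window + 1, window)
    ]


# Hypothetical global-mean anomalies: warming (K) and precipitation change (%).
d_temp = [0.0, 1.0, 2.0, 3.0, 3.5, 4.0]
d_precip = [0.0, 2.0, 4.0, 5.0, 5.5, 6.0]
slopes = windowed_sensitivity(d_temp, d_precip, window=3)  # % per K
```

In this made-up series the slope is smaller in the later window, which is the kind of time dependence the first research question asks about.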
