This website presents (partly preliminary) results from applying several machine learning and mathematical techniques to different atmospheric topics.
In this project we aim to create lumped reaction schemes for highly complex chemical systems such as the oxidation of volatile organic compounds (VOC). The method is characterised by an iterative process of scheme creation, empirical rate coefficient assignment, stochastic testing, and AI-driven network analysis to highlight missing dependencies in the chemistry schemes. As a result, the degree of lumping is automatically adjusted to the experimentally reported complexity of a VOC system.
We have created a ‘toy’ scheme with one fictional precursor VOC to test and demonstrate all computational approaches.
This chemistry scheme aims to reflect the complex interactions and main reaction pathways in the autoxidation of carbon-centred radicals. The precursor species is a fictional VOC called TOY. In this scheme, autoxidation reactions (see “Autoxidation: RO2 -> H-shift + O2 addition -> RO2” in the scheme) compete with bimolecular reactions (RO2 + NO/HO2/RO2) that form stable, closed-shell species (RO2 + NO: T_O*_NO3; RO2 + HO2: T_O*_OOH; RO2 + RO2: T_O*_2OH, T_O*_O and TO*_TO*; where * represents an integer). Additionally, some reaction branches form highly reactive alkoxy radicals (T_RO_O*). These can rapidly (rate: KDEC = 10⁶ s⁻¹) undergo autoxidation to form peroxy radicals (T_RO2_O*).
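The competition between unimolecular autoxidation and the bimolecular RO2 sinks can be illustrated with pseudo-first-order loss rates. The sketch below is a simplified illustration, not the toy scheme itself: all rate coefficients and concentrations are hypothetical values chosen only to show the trend, and RO2 + RO2 is treated through the summed RO2 concentration. Only KDEC is taken from the text, giving an alkoxy lifetime of about 1 µs.

```python
# Hypothetical rate coefficients, chosen only for illustration.
K_H_SHIFT = 0.1    # s^-1, unimolecular autoxidation (H-shift + O2 addition)
K_NO = 9e-12       # cm^3 s^-1, RO2 + NO
K_HO2 = 2e-11      # cm^3 s^-1, RO2 + HO2
K_RO2 = 1e-12      # cm^3 s^-1, RO2 + sum of RO2
KDEC = 1e6         # s^-1, alkoxy rate as given in the text

def ro2_branching(no, ho2, ro2_sum):
    """Fraction of one peroxy radical lost via each pathway.

    Concentrations are in molecules cm^-3; every bimolecular sink is turned
    into a pseudo-first-order rate (s^-1) by multiplying with the partner
    concentration.
    """
    rates = {
        "autoxidation": K_H_SHIFT,
        "RO2 + NO": K_NO * no,
        "RO2 + HO2": K_HO2 * ho2,
        "RO2 + RO2": K_RO2 * ro2_sum,
    }
    total = sum(rates.values())
    return {path: rate / total for path, rate in rates.items()}

print(1.0 / KDEC)  # alkoxy lifetime in s: 1e-06

low_no = ro2_branching(no=2.5e8, ho2=1e8, ro2_sum=1e9)
high_no = ro2_branching(no=2.5e11, ho2=1e8, ro2_sum=1e9)
print(round(low_no["autoxidation"], 3), round(high_no["autoxidation"], 3))
```

With these illustrative numbers, autoxidation dominates at low NO and becomes a minor channel at high NO.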
To investigate the toy chemistry, the following chemical states are to be studied in order to determine multi-experiment-based, realistic rate coefficients (see Pichelstorfer et al., 2024, Section 3.3 for more details):
To investigate these effects, the branching ratios of the reactions may be varied. For example, the branching of RO2 + NO into a) RONO2 and b) alkoxy radicals may be shifted towards higher nitrate formation by increasing channel a), or towards higher alkoxy radical formation by increasing channel b). Similarly, the branching between the H-shift reaction forming a peroxy radical (i.e., the RO2) and its competing pathway to a closed-shell species (Tuni_O*_O) may be changed, either to promote autoxidation or to limit it via the (almost) unimolecular termination pathway.
The formation of adducts, which are typically very low-volatility species, is important for the early growth of nanoclusters. It can be enhanced by keeping NO and HO2 concentrations low while increasing the VOC turnover. If the adducts do not form in sufficient amounts, the competing reaction pathways can be tuned down (i.e., the reactions with the sum of RO2, indicated by the term “RO2” in the rate coefficient).
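Varying a branching ratio can be sketched as splitting a fixed total rate coefficient into channel coefficients, so that only the product distribution changes while the overall RO2 loss stays the same. The function name, the total RO2 + NO coefficient and the 20/80 default split below are hypothetical; the same idea would apply to the H-shift versus Tuni_O*_O branching.

```python
def split_channels(k_total, branching):
    """Split a total rate coefficient into per-channel coefficients.

    `branching` maps channel name -> fraction. The fractions must sum to 1,
    so re-weighting the channels never changes the total loss rate.
    """
    if abs(sum(branching.values()) - 1.0) > 1e-9:
        raise ValueError("branching fractions must sum to 1")
    return {channel: k_total * frac for channel, frac in branching.items()}

K_RO2_NO = 9e-12  # hypothetical total RO2 + NO rate coefficient (cm^3 s^-1)

# a) RONO2 and b) alkoxy channels; both splits are illustrative only.
default = split_channels(K_RO2_NO, {"RONO2": 0.2, "alkoxy": 0.8})
nitrate_rich = split_channels(K_RO2_NO, {"RONO2": 0.5, "alkoxy": 0.5})
print(default, nitrate_rich)
```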
AutoCONSTRAINT work:
Pichelstorfer, L., O’Meara, S. P., and McFiggans, G. B.: Theory informed, experiment based, constraint on the rate of autoxidation chemistry – An analytical approach, Aerosol Research Discuss. [preprint], https://doi.org/10.5194/ar-2024-40, in review, 2024.
To infer rate coefficients for the autoREACTIONS-generated autoxidation scheme (see table xx), we applied the same steps as outlined in a recent work introducing the autoCONSTRAINTS method (Pichelstorfer et al., 2024):
Figure 2: The abscissa shows peroxy radicals (denoted by their numbers of carbon, oxygen and nitrogen atoms) reacting with rate coefficient k (lower ordinate) to form ROOR species. The upper ordinate shows the precision of rate coefficient recovery from the (simulated) mass spectrum (recovered divided by set rate coefficient). The colour code denotes the calculated contribution of each reaction to the chemically driven change in CIMS mass peak concentration.
Figure 4: Peroxy radicals (named on the abscissa) react at the rate shown on the lower ordinate to form ROH (downward-facing triangles), alkyl species (upward-facing triangles) and alkoxy radicals (crosses). A circle shows the sum of the individual reaction channels' rates. The upper ordinate shows the precision of rate coefficient recovery from the (simulated) mass spectrum (recovered divided by set rate coefficient). The colour code denotes the calculated contribution of each reaction to the chemically driven change in CIMS mass peak concentration.
The reaction rate constants should be calibrated using measured mass spectra. However, the problem is ill-posed: the reaction schemes contain a large number of reactions, while the experimental data for each autoxidation module are limited to a handful of mass spectra with limited measurement accuracy. Here we demonstrate the impact of measurement uncertainty in the toy case. We assume ideal data: all concentrations are measured at several time points, and a small relative noise of 5 per cent is added. Adaptive MCMC methods (Haario, H., et al., Stat. Comput., 16, 339-354, 2006; Wang, S., et al., Environ. Sci. Technol., 51, 8442-8449, 2017) are then used to find ‘all’ reaction rate values that agree with the data within the measurement accuracy.
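A minimal sketch of the calibration idea, under stated assumptions: instead of the toy scheme, it uses a hypothetical two-reaction chain A → B → C with an analytic solution, synthetic observations with 5 per cent relative noise, a uniform prior on box bounds, and parameters in scaled units (default value 1). The sampler is plain adaptive Metropolis, where the proposal covariance is the scaled running covariance of the chain. It omits the delayed-rejection step of the cited DRAM method, so it is a simplified stand-in rather than the project's implementation. Requires NumPy.

```python
import numpy as np

K_TRUE = np.array([0.5, 0.2])        # hypothetical "true" rate coefficients (s^-1)
T_OBS = np.linspace(1.0, 20.0, 10)   # hypothetical observation times (s)

def simulate(theta):
    """[A] and [B] for A -k1-> B -k2-> C with k = theta * K_TRUE (scaled units)."""
    k1, k2 = theta * K_TRUE
    a = np.exp(-k1 * T_OBS)
    b = k1 / (k2 - k1) * (np.exp(-k1 * T_OBS) - np.exp(-k2 * T_OBS))
    return np.concatenate([a, b])

# Synthetic data: exact model output plus 5 % relative Gaussian noise.
data_rng = np.random.default_rng(42)
y_true = simulate(np.ones(2))
y_obs = y_true * (1.0 + 0.05 * data_rng.standard_normal(y_true.size))
sigma = 0.05 * np.abs(y_obs)
LOWER, UPPER = 0.1, 10.0             # box bounds of the uniform prior

def log_post(theta):
    if np.any(theta < LOWER) or np.any(theta > UPPER):
        return -np.inf
    resid = (simulate(theta) - y_obs) / sigma
    return -0.5 * np.sum(resid**2)

def adaptive_metropolis(log_post, x0, n_iter=20000, adapt_start=500, seed=0):
    rng = np.random.default_rng(seed)
    d = len(x0)
    sd, eps = 2.4**2 / d, 1e-6       # standard scaling and regularisation
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    mean, m2, n = x.copy(), np.zeros((d, d)), 1  # running chain statistics
    cov = 1e-4 * np.eye(d)           # fixed proposal until adaptation starts
    chain = np.empty((n_iter, d))
    for i in range(n_iter):
        if i >= adapt_start:
            cov = sd * (m2 / (n - 1) + eps * np.eye(d))
        y = x + np.linalg.cholesky(cov) @ rng.standard_normal(d)
        lp_y = log_post(y)
        if np.log(rng.uniform()) < lp_y - lp:   # Metropolis acceptance test
            x, lp = y, lp_y
        chain[i] = x
        n += 1                        # Welford update of mean and covariance
        delta = x - mean
        mean += delta / n
        m2 += np.outer(delta, x - mean)
    return chain

chain = adaptive_metropolis(log_post, np.ones(2))
samples = chain[len(chain) // 2:]    # discard the first half as burn-in
print(samples.mean(axis=0), samples.std(axis=0))
```

Sampling in scaled units keeps all parameters on a comparable scale, which helps a single Gaussian proposal fit every direction.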
Figures 6, 7, and 8 show histograms of the sampled parameter values in scaled units. The system is sensitive to several reaction rates: these parameters remain close to the default value of 1. However, some parameters have much wider histograms, in some cases close to a uniform distribution between the set minimum and maximum bounds. This means that the system is largely insensitive to changes in those reaction rates.
Figure 9 verifies the approach: we simulate the system with all parameter combinations found by the MCMC sampling and construct histograms of the concentration values at the final steady-state observation point. Again in scaled units, the concentration values are tightly clustered around the correct synthetic data values, to within roughly 5 per cent.
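One way to quantify "much wider histograms" is to compare the width of each parameter's central sample interval with the width of its prior bounds. The sketch below uses hypothetical stand-in samples (one tightly constrained, one uniform) and an arbitrary threshold of 0.5; neither the metric nor the threshold is claimed to be the one used for Figures 6 to 8. Requires NumPy.

```python
import numpy as np

def relative_spread(samples, lower, upper, q=(5.0, 95.0)):
    """Width of the central sample interval divided by the prior width.

    Near 0: the data pin the parameter down (sensitive reaction).
    Near 0.9 for the 5-95 % interval: the histogram resembles the uniform
    prior, i.e. the data barely constrain that reaction rate.
    """
    lo, hi = np.percentile(samples, q, axis=0)
    return (hi - lo) / (upper - lower)

rng = np.random.default_rng(1)
n = 20000
# Hypothetical MCMC output in scaled units with box bounds [0.1, 10].
samples = np.column_stack([
    rng.normal(1.0, 0.02, n),     # well-constrained parameter
    rng.uniform(0.1, 10.0, n),    # unconstrained parameter
])
spread = relative_spread(samples, 0.1, 10.0)
insensitive = spread > 0.5        # arbitrary illustrative threshold
print(spread.round(3), insensitive)
```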
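The verification step amounts to a posterior predictive check: rerun the model for every sampled parameter set and inspect the spread of the predicted final concentrations. The sketch assumes a hypothetical one-species steady state, dX/dt = P - kX, and uses normally distributed stand-in samples instead of real MCMC output; P, K_TRUE and the 2 per cent parameter spread are illustrative. Requires NumPy.

```python
import numpy as np

P = 1.0e6        # hypothetical constant production rate of X (cm^-3 s^-1)
K_TRUE = 1.0e-2  # hypothetical true first-order loss rate of X (s^-1)

def steady_state(theta):
    """Steady state of dX/dt = P - k X with k = theta * K_TRUE (scaled units)."""
    return P / (theta * K_TRUE)

rng = np.random.default_rng(3)
theta_samples = rng.normal(1.0, 0.02, 5000)   # stand-in for MCMC samples

# Predicted steady states in scaled units (1 = synthetic truth).
x_scaled = steady_state(theta_samples) / steady_state(1.0)

counts, edges = np.histogram(x_scaled, bins=30)
within_5pct = np.mean(np.abs(x_scaled - 1.0) < 0.05)
print(f"fraction within 5 %: {within_5pct:.3f}")
```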
Figure 6: Histograms of the sampled parameter values in scaled units (Reactions 0-15).
Figure 7: Histograms of the sampled parameter values in scaled units (Reactions 15-31).
Figure 8: Histograms of the sampled parameter values in scaled units (Reactions 32-48).
The chemical schemes describing autoxidation are detailed; however, they are still lumped. As a result, a scheme may not explain all experimentally investigated chemical regimes. To overcome this, a graph neural network (GNN) is set up to highlight dependencies (e.g., reaction pathways) not yet covered. Findings will be converted into chemical reactions and included in a new, autoREACTIONS-created chemical scheme. The complex chemical system can be represented as a chemical graph with nodes and edges. A GNN efficiently processes chemical graphs, learning the intricate connections and adjusting its weights to capture important relationships. We will use the obtained rate coefficients to initialise the GNN's parameters and optimise it to explore novel associations between newly generated chemical components and others present in the system. Moreover, the GNN can include meteorological variables (temperature, humidity, etc.) to learn how chemical pathways change in different physical environments.
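To make "a chemical graph with nodes and edges" concrete, the sketch below builds an adjacency matrix from reactant-to-product links and applies one generic mean-aggregation message-passing layer (graph-convolution style). The species names only imitate the toy-scheme naming and are hypothetical. The features and weights are random placeholders. This is a textbook GNN layer, not the project's network. Requires NumPy.

```python
import numpy as np

# Hypothetical reactant -> product links imitating the toy-scheme naming.
reactions = [
    ("TOY", "T_RO2_O2"),
    ("T_RO2_O2", "T_RO2_O4"),   # autoxidation: H-shift + O2 addition
    ("T_RO2_O2", "T_O2_NO3"),   # RO2 + NO
    ("T_RO2_O2", "T_O2_OOH"),   # RO2 + HO2
    ("T_RO2_O4", "T_O4_OOH"),   # RO2 + HO2
]
species = sorted({s for edge in reactions for s in edge})
index = {s: i for i, s in enumerate(species)}

n = len(species)
adj = np.zeros((n, n))
for src, dst in reactions:
    # Treated as undirected so information flows both ways during training.
    adj[index[src], index[dst]] = adj[index[dst], index[src]] = 1.0

def gcn_layer(h, adj, w):
    """One message-passing step: each node averages its own and its
    neighbours' features, applies a linear map, then a ReLU."""
    a_hat = adj + np.eye(len(adj))             # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)     # row-normalisation factors
    return np.maximum(0.0, (a_hat / deg) @ h @ w)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(n, 4))   # hypothetical node features
w = rng.normal(size=(4, 8))    # learnable weights (random placeholders)
h1 = gcn_layer(h0, adj, w)
print(h1.shape)                # (6, 8)
```

In a trained model, the rate coefficients from the previous steps could, for example, serve as edge weights or initial features, as the paragraph above suggests.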
Figure 10: Pipeline for using the neural network to estimate chemical reaction rates.
Figure 11: Comparison of the estimated reaction rates with the ground truth for the toy case.
Figure 1: The proposed ChemNNE for chemical concentration prediction. It takes environmental parameters and initial chemical concentrations as input and predicts the subsequent course of the chemical reactions.
Figure 2: Visualization of the time evolution of the chemistry. In (a), (b), and (c), the ground truth is shown as red lines and the predictions in other colours. Four chemical compounds are compared, and the regions in red boxes are enlarged to highlight prediction errors.
Air quality prediction is key to mitigating health impacts and guiding decisions, yet existing models tend to focus on temporal trends while overlooking spatial generalization. We propose AQ-Net, a spatiotemporal reanalysis model for both observed and unobserved stations in the near future. AQ-Net uses an LSTM and multi-head attention for temporal regression. We also propose a cyclic encoding technique to ensure a continuous time representation. To learn fine-grained spatial air quality estimation, we combine AQ-Net with a neural kNN for feature-based interpolation, which allows us to fill the spatial gaps left by a coarse network of observation stations. To demonstrate the effectiveness of our model for spatiotemporal reanalysis, we use data collected in northern China from 2013 to 2017 for PM2.5 analysis. Extensive experiments show that AQ-Net excels in air quality reanalysis, highlighting the potential of hybrid spatiotemporal models to better capture environmental dynamics, especially in urban areas where both spatial and temporal variability are critical.
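A common form of cyclic encoding maps each periodic time quantity onto the unit circle with a sine/cosine pair, so that 23:00 and 00:00 end up close together rather than 23 units apart. The sketch below shows this standard construction. The exact CE layer in AQ-Net may differ, and the three-time-scale feature layout in `encode_timestamp` is a hypothetical choice.

```python
import math

def cyclic_encode(value, period):
    """Map a periodic quantity onto the unit circle as (sin, cos), so the
    end of one cycle is adjacent to the start of the next."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def encode_timestamp(hour, weekday, month):
    # Hypothetical layout: (sin, cos) pairs for hour, weekday and month.
    return (*cyclic_encode(hour, 24),
            *cyclic_encode(weekday, 7),
            *cyclic_encode(month - 1, 12))

h23, h0 = cyclic_encode(23, 24), cyclic_encode(0, 24)
print(math.dist(h23, h0))  # about 0.26: 23:00 and 00:00 are neighbours
print(abs(23 - 0))         # 23: raw hour values place them far apart
```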
The code can be found at AQ-Net.
Our model predicts PM2.5 by combining temporal and spatial dependencies. AQ-Net comprises three core components: an LSTM-MHA module, which combines LSTM and multi-head attention for temporal feature extraction; a neural kNN module for spatial interpolation; and a Cyclic Encoding (CE) layer for time embedding. We use hourly measurements of PM2.5, PM10, CO, NO2, SO2, and O3 from monitoring stations and estimate PM2.5 for the coming hours and days. The LSTM captures long-range pollutant fluctuations, while multi-head attention highlights critical time steps. A temporal pooling step condenses the latent sequence into a single feature vector, which the neural kNN module uses to interpolate at unobserved stations from their nearest neighbors. This integrated architecture produces 168-hour (7-day) PM2.5 estimates for both observed and unobserved locations, leveraging key temporal patterns and spatial relationships.
To illustrate the effectiveness of AQ-Net on a large-scale dataset, we show a spatiotemporal reanalysis covering the whole of northern China. As shown in Figure 3, the proposed neural kNN yields a complete spatial interpolation covering both observed and unobserved areas. Notably, pollution hotspots around northern and central Beijing are consistent with known urban emission sources. These results highlight the model’s ability to generalize beyond monitored stations, which is crucial for accurate city-wide air quality assessments.
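The pooling-plus-interpolation step can be sketched as follows. Each station's latent sequence is averaged over time, and the feature vector at an unobserved location is a distance-weighted average of its k nearest stations. Several assumptions apply. Mean pooling is a guess. The weights here are fixed inverse-distance weights, whereas a neural kNN would presumably learn them. The station coordinates and latent features are random placeholders for the LSTM-MHA output. Requires NumPy.

```python
import numpy as np

def temporal_pool(latent_seq):
    """Condense a (time, features) latent sequence into one feature vector
    by averaging over time (mean pooling is an assumption here)."""
    return latent_seq.mean(axis=0)

def knn_interpolate(target_xy, station_xy, station_feat, k=3, eps=1e-9):
    """Inverse-distance-weighted average of the features of the k nearest
    stations: a fixed-weight stand-in for a learned (neural) kNN."""
    dist = np.linalg.norm(station_xy - target_xy, axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + eps)   # eps avoids division by zero
    weights /= weights.sum()
    return weights @ station_feat[nearest]

rng = np.random.default_rng(0)
n_stations, n_hours, n_latent = 6, 24, 8
station_xy = rng.uniform(0.0, 100.0, size=(n_stations, 2))  # hypothetical km grid
latent = rng.normal(size=(n_stations, n_hours, n_latent))   # placeholder latents
pooled = np.stack([temporal_pool(seq) for seq in latent])   # (stations, features)

target = np.array([50.0, 50.0])            # hypothetical unobserved location
feat_target = knn_interpolate(target, station_xy, pooled)
print(feat_target.shape)                   # (8,)
```

A downstream regression head (not shown) would map the interpolated feature vector to a PM2.5 estimate.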
contact@amc-lahti.fi
Location
LAHDEN YLIOPISTOKAMPUS Niemen kampus Niemenkatu 73 (B-osa) 15140 Lahti