Breathing Data

HPHR Fellow Ellen Considine

By Ellen Considine

Air Quality Data Availability (or Lack Thereof)

To estimate the health impacts of air pollution, we need to know (or at least have a decent estimate of) air pollution concentrations. This is important both in developing exposure-response curves (described in my second blog) and in using the exposure-response curves to obtain population-level estimates of the health burden from air pollution.


Measuring air pollution starts with having ground-level monitoring. These sensors can use light scattering or the mass of particles to measure concentrations of particulate matter and electrochemical reactions to measure concentrations of gases. Here is a map of world air quality monitors (screenshot from the WAQI on June 7, 2021).

As you can see, there are large gaps in monitoring data. In fact, a 2020 report by OpenAQ found that half of the world’s governments (representing 1.4 billion people) don’t produce air quality data. This situation is even more troubling when we consider that countries with more severe air pollution tend to have less available monitoring data. Note that in the figure below (from the OpenAQ report), the six countries with more than 150 monitors (China, France, Germany, India, Spain, and the United States) were excluded.

These air quality information gaps are problematic for several reasons. In air pollution epidemiology, restricting our analyses to areas with monitors dramatically reduces our sample size and can bias our health effect estimates if the distribution of monitor locations is not representative of air pollution concentrations and/or individuals in the population of interest. The figure above illustrates that the global distribution of monitors is indeed biased with respect to air pollution concentrations. Even within the US, there is evidence (summarized here) that some government monitors are sited to help counties achieve and maintain attainment of the National Ambient Air Quality Standards (NAAQS), meaning that the monitors are preferentially located away from hotspots.


If we assume that air pollution is the same across a larger area surrounding each monitor, then variation in individuals’ actual exposures (which are not exactly the same as the monitor value) can act as noise in the statistical model and can drive health effect estimates towards the null (showing no effect). To address this exposure measurement error problem, many researchers interpolate between monitoring locations, often with a geostatistical or machine learning model. This approach can improve estimates but may also introduce bias associated with the interpolation model. A related issue is that health data and social covariates are often reported at different spatial and temporal scales than air quality information. This too can affect the accuracy of health effect estimates.
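To make the "towards the null" point concrete, here is a minimal simulation sketch. It is not drawn from any study; it assumes a simple linear exposure-response relationship and classical measurement error (measured exposure = true exposure + independent noise), and every number and function name in it is made up for illustration. Other error structures can behave differently.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_attenuation(n=20000, true_slope=0.5, exposure_sd=5.0,
                         error_sd=5.0, seed=0):
    """Return (slope using true exposure, slope using noisy exposure).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Each person's actual exposure (arbitrary concentration units).
    true_exposure = [rng.gauss(10.0, exposure_sd) for _ in range(n)]
    # The health outcome depends on the true exposure, plus unrelated noise.
    outcome = [true_slope * x + rng.gauss(0.0, 2.0) for x in true_exposure]
    # Assumed classical error: what we record is the truth plus random noise.
    measured = [x + rng.gauss(0.0, error_sd) for x in true_exposure]
    return ols_slope(true_exposure, outcome), ols_slope(measured, outcome)


if __name__ == "__main__":
    slope_true, slope_measured = simulate_attenuation()
    print("slope with true exposure:    ", round(slope_true, 3))
    print("slope with measured exposure:", round(slope_measured, 3))
```

Under these assumptions, the slope estimated from the noisy exposure is expected to shrink by roughly the ratio of true-exposure variance to total variance (here, about one half), which is why noisier exposure estimates tend to understate the health effect.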


OpenAQ also found that 62% of countries (representing 2.1 billion people) don’t share real-time air quality information. Real-time air quality information can help alert the public when air quality is bad. For instance, in my home state of Colorado, receiving alerts about smoke from wildfires can help individuals choose to stay inside and protect their lungs, which is especially important for vulnerable groups such as children, older adults, and asthmatics.


Of course, real-time air quality information isn’t very relevant for individuals who don’t live or work anywhere near an air quality monitor, and many individuals (especially those with lower socioeconomic status) may not have the choice to avoid high air pollution even if they know about it. This is to say, producing some real-time air quality information is a necessary but insufficient condition for helping protect the public from high levels of air pollution.


What are some alternative approaches to traditional air quality monitoring, which uses high-accuracy but costly ground-level instrumentation? In recent years, data from lower-cost ground-level sensors and satellite imagery have been used to fill in some of the gaps.

Networks of low-cost sensors can provide fine-resolution air quality data in space and time; however, they tend to be less accurate than reference monitors. Exploring ways to algorithmically increase the accuracy of low-cost sensor measurements is an ongoing area of research. Some scientists have also experimented with using low-cost sensors for mobile monitoring, mostly in urban areas.
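As a rough illustration of what "algorithmically increasing accuracy" can mean in its simplest form, the sketch below fits a straight-line correction to a low-cost sensor that sits next to a reference monitor. The data are synthetic, the bias of the hypothetical sensor (reading about 1.6 times too high plus an offset) is invented, and the function names are mine. Corrections used in practice are often more involved, for example accounting for humidity.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x


def rmse(estimates, truth):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth))
                     / len(truth))


def calibration_demo(n=500, seed=1):
    """Return (RMSE before correction, RMSE after correction) on held-out data."""
    rng = random.Random(seed)
    # Synthetic reference-monitor readings (hypothetical concentration units).
    reference = [rng.uniform(2.0, 60.0) for _ in range(n)]
    # Assumed sensor behavior: reads high, with an offset and random noise.
    raw = [1.6 * r + 3.0 + rng.gauss(0.0, 2.0) for r in reference]

    # Fit the correction on the first half, evaluate on the second half.
    half = n // 2
    slope, intercept = fit_line(raw[:half], reference[:half])
    corrected = [slope * v + intercept for v in raw[half:]]

    return rmse(raw[half:], reference[half:]), rmse(corrected, reference[half:])


if __name__ == "__main__":
    before, after = calibration_demo()
    print("RMSE before correction:", round(before, 2))
    print("RMSE after correction: ", round(after, 2))
```

Fitting on one half of the data and checking on the other half is a small guard against overstating how well the correction works; a real evaluation would also test the correction at other locations and seasons.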


Satellite measurements such as aerosol optical depth (converted into air pollution concentrations using ground-level calibrations) can provide air quality estimates in places without any ground-level monitors. However, these measurements tend to have coarser resolution in space and time and can be affected by air quality higher in the atmosphere, which is less relevant for health. Satellite estimates are also increasingly used in air pollution interpolation models, helping to improve exposure estimates in areas without monitoring data.


In my next blog, I will introduce environmental justice and discuss the issue of air inequality. 




If you’re looking for an interesting visualization that draws on open data, check out AirPollution.io: The 100 Most Polluted Places in India.
