Air Monitoring Station HAP Data Rules and Calculations

Introduction

The ECATT Air Monitoring Stations Search captures data from a network of U.S. ambient air monitoring stations for Hazardous Air Pollutants (HAPs). An air monitoring station is a physical monitor permanently placed at a location to collect air quality data. Data owners (typically state, local, and federal agencies) are responsible for maintaining the air monitoring stations and uploading the data to the Air Quality System (AQS), EPA’s official database for ambient monitoring data. EPA encourages state, local, and tribal air agencies to upload all monitoring data, but only programs receiving EPA funding are required to submit data to AQS. These programs include EPA’s National Air Toxics Trends System (NATTS), the Urban Air Toxics Monitoring Program (UATMP), and community-scale air toxics monitoring grant sites.

EPA compiles data for HAPs from AQS as well as other data sources, many of which are specific to localized areas, into the Ambient Monitoring Archive (AMA) for HAPs. The AMA for HAPs standardizes the concentration data from these sources and performs quality assurance checks to validate or invalidate measurements. The AMA for HAPs serves as the source data for the ECATT Air Monitoring Station Search Results and Report.

Flowchart visually explaining how data moves from ambient monitoring stations to ECATT

ECATT Data Processing

The following describes how the raw AMA data are transformed for display in ECATT results and reports.

Raw Data Collection and Validation

Data Validation: EPA performs completeness checks on the raw AMA data to validate daily samples. A daily sample must be at least 75% complete from sub-daily measurements to be considered valid. This means that the amount of data collected must be at least 75% of the expected amount of data based on sampling frequency (e.g., 18 hours out of 24 hours).
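
The completeness rule above can be illustrated with a minimal sketch. The function name and the integer-percent parameter are hypothetical and chosen only for illustration; the 75% threshold and the 18-of-24 example come from the text.

```python
def is_valid_daily_sample(collected: int, expected: int, min_percent: int = 75) -> bool:
    """Return True when the collected sub-daily measurements meet the completeness threshold.

    collected   -- number of sub-daily measurements actually collected (e.g., hours)
    expected    -- number expected from the sampling frequency (e.g., 24 for hourly data)
    min_percent -- completeness threshold in percent (75 per the rule described above)
    """
    if expected <= 0:
        raise ValueError("expected must be a positive count")
    # Integer arithmetic avoids floating-point surprises exactly at the threshold.
    return collected * 100 >= min_percent * expected


print(is_valid_daily_sample(18, 24))  # 18 of 24 hours is exactly 75%
print(is_valid_daily_sample(17, 24))  # 17 of 24 hours falls short
```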

Calculations for Daily Measurements

Data Selection: Measurements are selected for inclusion in the AMS calculations based on the following criteria (in order of priority). These priorities are used to select one concentration per parameter, per site, per day. A unique record or measurement in the AMA for HAPs data is defined by:
- Site Code: A unique 9-digit number to identify the monitoring site
- Parameter Code: Identifies the pollutant and type of measurement taken by the monitor
- Date and Time: Reports when the sample was taken
- Parameter Occurrence Code (POC): Distinguishes measurements of the same parameter at the same site using multiple instruments. POCs can be specific to a single Program or apply to multiple Programs.
- Sample Validity: The measurement must be a valid daily sample (75% complete)
- Priority Trends: As part of the AMA for HAPs database, EPA evaluates all the HAPs taken at each monitoring site for each year, and prioritizes (or ranks) the data based on:
  - Program (i.e., network)
  - Annual completeness
  - Sample durations
  - Classification of whether the monitor is Primary vs. Secondary within a Program
Handling Non-Detects: ECATT offers three methodologies for handling measurements that are reported or identified as "ND" (non-detects). One concentration set uses a value of zero for non-detects. The second set uses a value equal to one half of the method detection limit for non-detects.
The third method, Regression on Order Statistics (ROS), fits a simple linear regression to the ordered detected values, assuming a lognormal distribution, to estimate the concentrations of the censored values. In this case, "censored values" refers to concentrations flagged as non-detects. "Uncensored values" are the concentrations measured above the detection limit. The basic steps for the ECATT ROS methodology are as follows:
1. Order raw concentration data (detects and non-detects) by descending concentration and assign a rank to each concentration.
2. Separate data into censored and uncensored datasets. Data flagged as non-detects are in the censored group.
3. Perform a log transformation (LN(Conc)) by calculating the natural log of the uncensored (detected) concentrations.
4. Perform a linear regression on the uncensored dataset to identify the slope (m) and intercept (b) of the linear equation y = mx+b, where y is the natural log of the uncensored concentrations calculated in Step 3 and x is the rank (sort order) of the uncensored concentrations determined in Step 1.
5. Use the linear equation, y = mx+b, to calculate the y values for censored data, where x is the rank (sort order) of the censored concentrations determined in Step 1; m is the slope of the linear equation determined in Step 4; and b is the y-intercept of the linear equation determined in Step 4.
6. Calculate the concentrations of the censored values as Concentration (µg/m3) = exp(y).
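
The six steps above can be sketched roughly as follows. This is an illustration of the steps as written, not ECATT's actual code. The function name is hypothetical, the value a non-detect carries for ranking (here its reported value, such as 0) is an assumption, and ties keep their input order, which the text does not specify.

```python
import math


def ros_estimate(concentrations, is_non_detect):
    """Estimate non-detect concentrations following the six ROS steps described above."""
    n = len(concentrations)
    # Step 1: order by descending concentration and assign ranks 1..n.
    # Assumption: ties keep their input order (sorted() is stable).
    order = sorted(range(n), key=lambda i: concentrations[i], reverse=True)
    rank = {idx: r for r, idx in enumerate(order, start=1)}

    # Step 2: split into uncensored (detected) and censored (non-detect) records.
    detected = [i for i in range(n) if not is_non_detect[i]]
    if len(detected) < 2:
        raise ValueError("at least two detected values are needed for the regression")

    # Step 3: natural log of the detected concentrations.
    xs = [rank[i] for i in detected]
    ys = [math.log(concentrations[i]) for i in detected]

    # Step 4: ordinary least squares for y = m*x + b on the detected values.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - m * mean_x

    # Steps 5 and 6: predict ln(conc) at each censored rank, then exponentiate.
    return [
        math.exp(m * rank[i] + b) if is_non_detect[i] else concentrations[i]
        for i in range(n)
    ]


# Three detects that are exactly log-linear in rank, plus two non-detects reported as 0.
values = [math.exp(-1), math.exp(-2), math.exp(-3), 0.0, 0.0]
flags = [False, False, False, True, True]
print(ros_estimate(values, flags))
```

In this constructed example the fitted line has slope -1 and intercept 0, so the two non-detects are estimated as exp(-4) and exp(-5). Real data will not fit this cleanly.
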
Comparisons to AirToxScreen: AirToxScreen is EPA's ongoing comprehensive evaluation of air toxics in the United States. AirToxScreen is a tool that uses emissions and meteorological data to model annual air pollutant concentrations and estimate cancer and chronic non-cancer risks at census tract resolution. The most recent AirToxScreen is for the year 2020. ECATT links ambient monitored daily concentrations to the 2020 AirToxScreen annual census tract concentrations using the census tract for the monitoring location. ECATT then compares AMA data to AirToxScreen to identify discrepancies between measured ambient air concentrations and ambient air concentrations that are modeled based on reported emissions. ECATT flags daily concentrations that are more than 2 times, 10 times, 100 times, 1,000 times, or 10,000 times the modeled concentrations.
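
One way to picture the flagging is the sketch below, which returns the largest multiple that a daily concentration exceeds. The function and constant names are hypothetical, and treating every threshold as a strict "greater than" comparison is an assumption. How ECATT handles a modeled concentration of zero is not stated, so the sketch simply returns no flag in that case.

```python
# Multiples of the modeled concentration named in the text, largest first.
THRESHOLDS = (10_000, 1_000, 100, 10, 2)


def airtoxscreen_flag(measured_ug_m3, modeled_ug_m3):
    """Return the largest threshold multiple exceeded, or None if no flag applies."""
    if modeled_ug_m3 <= 0:
        # Assumption: no ratio can be formed without a positive modeled value.
        return None
    ratio = measured_ug_m3 / modeled_ug_m3
    for multiple in THRESHOLDS:
        if ratio > multiple:
            return multiple
    return None


print(airtoxscreen_flag(250.0, 1.0))  # ratio of 250 exceeds the 100x threshold
print(airtoxscreen_flag(1.5, 1.0))    # ratio of 1.5 exceeds no threshold
```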

Annual Calculations

Annual Statistics: The following annual statistics are calculated:

Average Concentration: The sum of all valid daily measurements divided by the number of measurements
Count of Daily Measurements: The number of valid daily measurements in a year
Count of Non-Detects: The number of measurements reported as "not detected," which means that the pollutant concentration was below what the laboratory method can detect. The actual concentration may be zero or some value between zero and the level the instrument can detect.
Median Concentration: The middle measurement in the yearly set of data, such that half the measurements are above and half the measurements are below this value
Concentration Variance: The variance (σ²) describes how much the set of data varies from the mean value and is calculated as follows:

σ² = Σ (X − µ)² / N

Where X is the variable (concentration measurement), µ is the average concentration, and N is the number of measurements.
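
As a rough illustration, the annual statistics listed above could be computed as in the sketch below. It assumes the variance is the population variance (dividing by N), which the symbols σ², µ, and N suggest but the text does not state outright. The function name and input layout are hypothetical.

```python
import statistics


def annual_statistics(daily_concentrations, non_detect_flags):
    """Summarize one year of valid daily measurements for a single pollutant and site."""
    return {
        "average": statistics.fmean(daily_concentrations),
        "count": len(daily_concentrations),
        "non_detects": sum(1 for flag in non_detect_flags if flag),
        "median": statistics.median(daily_concentrations),
        # Population variance: sum of squared deviations from the mean, divided by N.
        "variance": statistics.pvariance(daily_concentrations),
    }


stats = annual_statistics([1.0, 2.0, 3.0, 4.0], [True, False, False, False])
print(stats)
```

For the four sample values, the average and median are both 2.5 and the population variance is 1.25.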

The annual average concentrations are then used to approximate the Cancer Risk and Hazard Quotient. EPA uses the health benchmarks, primarily consistent with those in AirToxScreen, to determine the nature and magnitude of the health risks of various pollutants to humans. Other data sources may include the ATSDR Chronic Minimal Risk Levels and HEM3 health benchmarks.

Cancer Risk: The probability of contracting cancer over the course of a lifetime (assumed to be 70 years), assuming continuous exposure.
- Each cancer-causing pollutant is assigned a Unit Risk Estimate (URE), and the cancer risk is estimated at the annual measurement level using the following equation:
  Cancer Risk (“n” in a million) = Annual Average Sample (µg/m3) * URE * 1,000,000
- The URE is the upper-bound excess lifetime cancer risk estimated to result from continuous exposure to an agent at a concentration of 1 µg/m3 in air. The interpretation of the URE would be as follows: if the URE = 1.5 x 10^-6 per µg/m3, 1.5 excess tumors are expected to develop per 1,000,000 people if exposed daily for a lifetime to 1 µg of the chemical in 1 cubic meter of air. UREs are considered upper-bound estimates, meaning they represent a plausible upper limit to the true value. (Note that this is usually not a true statistical confidence limit.) The true risk is likely to be less but could be greater.
- The Site Cancer Risk is the sum of the annual average pollutant cancer risks for a site.
- AMS Report: The value displayed in the AMS Results table and the map popup reflects the year that the user selected for their search. If the user selected “Last 5 Years”, then the highest annual average cancer risk for the site is displayed.
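
The cancer risk equation and the site-level sum can be worked through in a short sketch. The function names and the dictionary layout are hypothetical, and the pollutant values are invented purely to exercise the arithmetic.

```python
def cancer_risk_per_million(annual_avg_ug_m3, ure_per_ug_m3):
    """Cancer risk ("n" in a million) = annual average (µg/m3) * URE * 1,000,000."""
    return annual_avg_ug_m3 * ure_per_ug_m3 * 1_000_000


def site_cancer_risk(pollutants):
    """Sum the per-pollutant cancer risks for a site.

    pollutants maps a pollutant name to (annual average in µg/m3, URE per µg/m3).
    """
    return sum(cancer_risk_per_million(avg, ure) for avg, ure in pollutants.values())


# Illustrative values only: an annual average of 1 µg/m3 with a URE of 1.5e-6
# corresponds to roughly 1.5 in a million, matching the interpretation in the text.
example_site = {"pollutant_a": (1.0, 1.5e-6), "pollutant_b": (2.0, 1.0e-6)}
print(cancer_risk_per_million(1.0, 1.5e-6))
print(site_cancer_risk(example_site))
```
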
Hazard Quotient: The ratio of the potential exposure to the pollutant and the level at which no adverse effects are expected. A hazard quotient (HQ) less than or equal to one indicates that adverse noncancer effects are not likely to occur, and thus can be considered to have negligible hazard. HQs greater than one are not statistical probabilities of harm occurring. Instead, they are a simple statement of whether (and by how much) an exposure concentration exceeds the reference concentration (RfC). An HQ of 100 does not mean that the hazard is 10 times greater than an HQ of 10. Also, an HQ of 10 for one substance may not have the same meaning (in terms of hazard) as another substance resulting in the same HQ.
- The Hazard Quotient is calculated for a specific pollutant using the following equation:
  Hazard Quotient = Annual Average Concentration / Reference Concentration (RfC), with both concentrations expressed in the same units
- The Site Hazard Index is the sum of the hazard quotients for all pollutants within a Target System for a site (i.e., only pollutants affecting the same Target System are summed together).
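
A minimal sketch of the hazard calculations follows. It assumes the hazard quotient is the annual average concentration divided by the RfC in matching units, as the definition above implies. The function names, record layout, and sample values are hypothetical.

```python
from collections import defaultdict


def hazard_quotient(annual_avg, rfc):
    """HQ = annual average concentration / reference concentration (same units)."""
    if rfc <= 0:
        raise ValueError("reference concentration must be positive")
    return annual_avg / rfc


def site_hazard_index(records):
    """Sum hazard quotients per Target System for one site.

    records is an iterable of (annual average, RfC, target system) tuples;
    quotients are only added together within the same target system.
    """
    totals = defaultdict(float)
    for annual_avg, rfc, target_system in records:
        totals[target_system] += hazard_quotient(annual_avg, rfc)
    return dict(totals)


# Illustrative values only.
records = [
    (2.0, 4.0, "respiratory"),
    (1.0, 4.0, "respiratory"),
    (3.0, 2.0, "neurological"),
]
print(site_hazard_index(records))
```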

Best Rolling 12 Months of Data - Annual Completeness Criteria

Cancer and hazard risks are calculated based on the assumption of continued, long-term exposure. Therefore, it may be misleading to present cancer or hazard risks if a monitor only has readings for part of the year. EPA has established the following criteria to select the Best Rolling 12 Months of Data to represent annual concentrations:

- The annual dataset must comprise at least 3 valid quarters of data; and
- Valid quarters must have at least 70% of expected measurements for a pollutant (for example, if measurements are daily, about 63 out of 90 measurements in a quarter).
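
The two completeness criteria can be checked with a sketch like the one below. It covers only the validity test for a candidate 12-month window, not the selection among competing windows. The function names and the (measured, expected) tuple layout are hypothetical.

```python
def is_valid_quarter(measured, expected, min_percent=70):
    """A quarter is valid when it has at least 70% of the expected measurements."""
    if expected <= 0:
        raise ValueError("expected must be a positive count")
    # Integer comparison keeps the boundary case (e.g., 63 of 90) exact.
    return measured * 100 >= min_percent * expected


def meets_annual_completeness(quarters, min_valid_quarters=3):
    """Check a 12-month window given four (measured, expected) quarter tuples."""
    valid = sum(1 for measured, expected in quarters if is_valid_quarter(measured, expected))
    return valid >= min_valid_quarters


# Daily sampling: three quarters reach 70%, one does not, so the window qualifies.
window = [(63, 90), (70, 91), (10, 92), (80, 92)]
print(meets_annual_completeness(window))
```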

When selecting the best 12-month period, ECATT further applies the following priorities:
Priority 1: Maximize the use of data from quarters in the calendar year before rolling to quarters in the previous years
Priority 2: Prioritize the use of more current data over older data

The Best Rolling 12 Months of Data methodology is the default setting in the ECATT Air Monitoring Stations Search. Users also have the option to generate results for all calendar year measurements without applying the completeness criteria by selecting the "Calendar Year" option in the search form.

Multi-Year Statistics

A Spearman Correlation is run across all years for a single site, parameter, and POC. It is used to indicate whether there is an overall increasing or decreasing trend in average concentration.
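
As an illustration of the trend check, the sketch below computes a Spearman rank correlation between year and annual average concentration. Pairing the averages with the year is an assumption based on the stated purpose; the text does not spell out the inputs. Tied values receive average ranks, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (sd_x * sd_y)


years = [2018, 2019, 2020, 2021, 2022]
annual_averages = [5.0, 4.0, 3.0, 2.0, 1.0]  # illustrative, steadily decreasing
print(spearman(years, annual_averages))
```

A coefficient near -1 points to a decreasing trend and one near +1 to an increasing trend; the steadily decreasing sample series gives a value of about -1.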

Creating the Pollution Rose

ECATT uses station latitude and longitude coordinates to link air monitoring stations to the nearest weather station. With these linkages, ECATT can combine daily pollutant measurements with the corresponding data on wind direction and speed for that day. This provides a way for ECATT users to determine the direction of the pollution source relative to the air monitor where the concentration was detected. For each pollutant monitored, ECATT creates a Pollution Rose, which is a radial chart displaying pollutant concentration magnitude as a function of wind direction. Each point represents one reading from the monitoring station for the selected pollutant. Its direction in relation to the center of the rose represents the cardinal direction of the wind on the day of the reading, and its distance from the center represents the magnitude of the reading. Below is an example pollution rose.

ECATT calculates the X and Y coordinates for the plot as follows:

Equation to calculate x and y coordinates for pollution rose plot

Where Wind Direction is in degrees, and Concentration is the daily pollutant concentration in µg/m3.
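
The equation itself is not reproduced in this text, so the following is only a plausible sketch of such a conversion. It assumes the usual compass convention (0 degrees is north, with angles increasing clockwise), which gives x = Concentration * sin(Wind Direction) and y = Concentration * cos(Wind Direction). ECATT's actual formula may differ, and the function name is hypothetical.

```python
import math


def pollution_rose_xy(wind_direction_deg, concentration_ug_m3):
    """Convert a wind direction (degrees) and concentration into plot coordinates.

    Assumes compass bearings: 0 = north (+y axis), 90 = east (+x axis).
    The distance of the point from the origin equals the concentration.
    """
    theta = math.radians(wind_direction_deg)
    x = concentration_ug_m3 * math.sin(theta)
    y = concentration_ug_m3 * math.cos(theta)
    return x, y


print(pollution_rose_xy(0.0, 2.0))   # lies on the positive y axis (north)
print(pollution_rose_xy(90.0, 2.0))  # lies on the positive x axis (east)
```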