Generic QC Filters¶
This section describes how to configure each of the existing QC filters in UFO. All filters can also use the “where” statement to act only on observations meeting certain conditions. By default, each filter acts on all the variables marked as simulated variables in the ObsSpace. The filter variables
keyword can be used to limit the action of the filter to a subset of these variables or to specific channels, as shown in the examples from the Bounds Check Filter section below.
Bounds Check Filter¶
This filter rejects observations whose values (@ObsValue
in the ioda files) lie outside specified limits:
- filter: Bounds Check
filter variables:
- name: brightness_temperature
channels: 4-6
minvalue: 240.0
maxvalue: 300.0
In the above example the filter checks if brightness temperature for channels 4, 5 and 6 is outside of the [240, 300] range. Suppose we have the following observation data with 3 locations and 4 channels:
channel 3: [100, 250, 450]
channel 4: [250, 260, 270]
channel 5: [200, 250, 270]
channel 6: [340, 200, 250]
In this example, all observations from channel 3 will pass QC because the filter isn’t configured to act on this channel. All observations for channel 4 will pass QC because they are within [minvalue, maxvalue]. 1st observation in channel 5, and first and second observations in channel 6 will be rejected.
- filter: Bounds Check
filter variables:
- name: air_temperature
minvalue: 230
- filter: Bounds Check
filter variables:
- name: eastward_wind
- name: northward_wind
minvalue: -40
maxvalue: 40
In the above example two filters are configured, one testing temperature, and the other testing wind components. The first filter would reject all temperature observations that are below 230. The second, all wind component observations whose magnitude is above 40.
In practice, one would be more likely to want to filter out wind component observations based on the value of the wind speed sqrt(eastward_wind**2 + northward_wind**2)
. This can be done using the test variables
keyword, which rejects observations of a variable if the value of another lies outside specified bounds. The “test variable” does not need to be a simulated variable; in particular, it can be an ObsFunction, i.e. a quantity derived from simulated variables. For example, the following snippet filters out wind component observations if the wind speed is above 40:
- filter: Bounds Check
filter variables:
- name: eastward_wind
- name: northward_wind
test variables:
- name: Velocity@ObsFunction
maxvalue: 40
If there is only one entry in the test variables
list, the same criterion is applied to all filter variables. Otherwise the number of test variables needs to match that of filter variables, and each filter variable is filtered according to the values of the corresponding test variable.
Background Check Filter¶
This filter checks for bias corrected distance between observation value and model simulated value (\(y-H(x)\)) and rejects obs where the absolute difference is larger than absolute threshold
or threshold
* sigma_o when the filter action is set to reject
. This filter can also adjust observation error through a constant inflation factor when the filter action is set to inflate error
. If no action section is included in the yaml, the filter is set to reject the flagged observations.
- filter: Background Check
filter variables:
- name: air_temperature
threshold: 2.0
absolute threshold: 1.0
action:
name: reject
- filter: Background Check
filter variables:
- name: eastward_wind
- name: northward_wind
threshold: 2.0
where:
- variable:
name: latitude@MetaData
minvalue: -60.0
maxvalue: 60.0
action:
name: inflate error
inflation: 2.0
The first filter would flag temperature observations where abs((y+bias)-H(x)) > min ( absolute_threshold, threshold * sigma_o), and then the flagged data are rejected due to filter action is set to reject.
The second filter would flag wind component observations where abs((y+bias)-H(x)) > threshold * sigma_o and latitude of the observation location are within 60 degree. The flagged data will then be inflated with a factor 2.0.
Please see the Filter Actions section for more detail.
There is an option for the background check filter checks for distance between observation value and model simulated value without bias correction (\(y-H(x)\)) when the additional parameter bias correction parameter is set to 1.0 and rejects obs where the absolute difference is larger than absolute threshold
or threshold
* sigma_o when the filter action is set to reject
.If no action section is included in the yaml, the filter is set to reject the flagged observations.
- filter: Background Check
filter variables:
- name: brightness_temperature
channels: 1-24
absolute threshold: 3.5
bias correction parameter: 1.0
action:
name: reject
This filter would flag temperature observations where abs(y-H(x)) > min ( absolute_threshold, threshold * sigma_o), and then the flagged data are rejected due to filter action is set to reject.
Domain Check Filter¶
This filter retains all observations selected by the “where” statement and rejects all others. Below, the filter is configured to retain only observations
* taken at locations where the sea surface temperature retrieved from the model is between 200 and 300 K (inclusive)
* with valid height
metadata (not set to “missing value”)
* taken by stations with IDs 3, 6 or belonging to the range 11-120
* without valid air_pressure
metadata.
- filter: Domain Check
where:
- variable:
name: sea_surface_temperature@GeoVaLs
minvalue: 200
maxvalue: 300
- variable:
name: height@MetaData
is_defined:
- variable:
name: station_id@MetaData
is_in: 3, 6, 11-120
- variable:
name: air_pressure@MetaData
is_not_defined:
BlackList Filter¶
This filter behaves like the exact opposite of Domain Check: it rejects all observations selected by the “where” statement statement. The status of all others remains the same. Below, the filter is configured to reject observations taken by stations with IDs 1, 7 or belonging to the range 100-199:
- filter: BlackList
where:
- variable:
name: station_id@MetaData
is_in: 1, 7, 100-199
RejectList Filter¶
This is an alternative name for the BlackList filter.
AcceptList Filter¶
This filter sets the QC flag to pass for all observations selected by the “where” statement that have previously been rejected for any reason other than missing data, a pre-processing flag indicating rejection, or failure of the ObsOperator. This is mostly useful in QC procedures where all observations are initially rejected and then those fulfilling certain criteria are accepted, overriding the rejection.
Below, the filter is configured to accept only observations taken by stations with IDs 1, 7 or belonging to the range 100-199 (inclusive):
- filter: RejectList # initially reject all observations
- filter: AcceptList # accept back selected observations
where:
- variable:
name: station_id@MetaData
is_in: 1, 7, 100-199
Perform Action Filter¶
This filter performs the action specified in the action
parameter on observations selected by the “where” statement.
Example 1¶
Here the filter is configured to inflate errors of all observations from the Southern hemisphere by a factor of two:
- filter: Perform Action
action:
name: inflate error
inflation: 2.0
where:
- variable: latitude
maxvalue: 0
Note
Technically, the same result could be obtained by replacing Perform Action
in the listing
above by RejectList
. However, having a RejectList
filter that does not actually
reject any observations can be confusing.
Example 2¶
The filter configured in this way behaves like RejectList
:
- filter: Perform Action
action:
name: reject
Example 3¶
The filter configured in this way behaves like AcceptList
:
- filter: Perform Action
action:
name: accept
Thinning Filter¶
This filter rejects a specified fraction of observations, selected at random. It supports the following YAML parameters:
amount
: the fraction of observations to reject (a number between 0 and 1).random seed
(optional): an integer used to initialize a random number generator if it has not been initialized yet. If not set, the seed is derived from the calendar time.
Note: because of how this filter is implemented, the fraction of rejected observations may not be exactly equal to amount
, especially if the total number of observations is small.
Example:
- filter: Thinning
amount: 0.75
random seed: 125
Gaussian Thinning Filter¶
This filter thins observations by preserving only one observation in each cell of a grid. Cell assignment can be based on an arbitrary combination of:
horizontal position
vertical position (in terms of air pressure)
time
category (arbitrary integer associated with each observation).
Selection of the observation to preserve in each cell is based on
its position in the cell
optionally, its priority.
The following YAML parameters are supported:
Horizontal grid:
horizontal_mesh
: Approximate width (in km) of zonal bands into which the Earth’s surface is split. Thinning in the horizontal direction is disabled if this parameter is negative. Default: approx. 111 km (= 1 deg of latitude).use_reduced_horizontal_grid
: True to use a reduced grid, with high-latitude zonal bands split into fewer cells than low-latitude bands to keep cell size nearly uniform. False to use a regular grid, with the same number of cells at all latitudes. Default:true
.round_horizontal_bin_count_to_nearest
: True to set the number of zonal bands so that the band width is as close as possible tohorizontal_mesh
, and the number of cells (“bins”) in each zonal band so that the cell width in the zonal direction is as close as possible to that in the meridional direction. False to set the number of zonal bands so that the band width is as small as possible, but no smaller thanhorizontal_mesh
, and the cell width in the zonal direction is as small as possible, but no smaller than in the meridional direction. Default:false
.
Vertical grid:
vertical_mesh
: Cell size (in Pa) in the vertical direction. Thinning in the vertical direction is disabled if this parameter is not specified or negative.vertical_min
: Lower bound of the pressure interval split into cells of sizevertical_mesh
. Default: 100 Pa.vertical_max
: Upper bound of the pressure interval split into cells of sizevertical_mesh
. This parameter is rounded upwards to the nearest multiple ofvertical_mesh
starting fromvertical_min
. Default: 110,000 Pa.
Temporal grid:
time_mesh
: Cell size in the temporal direction. Temporal thinning is disabled if this this parameter is not specified or set to 0.time_min
: Lower bound of the time interval split into cells of sizetime_mesh
. Temporal thinning is disabled if this parameter is not specified.time_max
: Upper bound of the time interval split into cells of sizetime_mesh
. This parameter is rounded upwards to the nearest multiple oftime_mesh
starting fromtime_min
. Temporal thinning is disabled if this parameter is not specified.
Observation categories:
category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately.
Selection of observations to retain:
priority_variable
: Variable storing observation priorities. Among all observations in a cell, only those with the highest priority are considered as candidates for retaining. If not specified, all observations are assumed to have equal priority.distance_norm
: Determines which of the highest-priority observations lying in a cell is retained. Allowed values:geodesic
: retain the observation closest to the cell center in the horizontal direction (air pressure and time are ignored when selecting the observation to retain)maximum
: retain the observation lying furthest from the cell’s bounding box in the system of coordinates in which the cell is a unit cube (all dimensions along which thinning is enabled are taken into account).
Default:
geodesic
.
Example 1 (thinning by the horizontal position only):
- filter: Gaussian Thinning
horizontal_mesh: 1111.949266 #km = 10 deg at equator
Example 2 (thinning observations from multiple categories and with non-equal priorities by their horizontal position, pressure and time):
- filter: Gaussian Thinning
distance_norm: maximum
horizontal_mesh: 5000
vertical_mesh: 10000
time_mesh: PT01H
time_min: 2018-04-14T21:00:00Z
time_max: 2018-04-15T03:00:00Z
category_variable:
name: instrument_id@MetaData
priority_variable:
name: priority@MetaData
Temporal Thinning Filter¶
This filter thins observations so that the retained ones are sufficiently separated in time. It supports the following YAML parameters:
min_spacing
: Minimum spacing between two successive retained observations. Default:PT1H
.seed_time
: If not set, the thinning filter will consider observations as candidates for retaining in chronological order.If set, the filter will start from the observation taken as close as possible to
seed_time
, then consider all successive observations in chronological order, and finally all preceding observations in reverse chronological order.category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately. If not specified, all observations are thinned together.priority_variable
: Variable storing integer-valued observation priorities. If not specified, all observations are assumed to have equal priority.tolerance
: Only relevant ifpriority_variable
is set.If set to a nonzero duration, then whenever an observation O lying at least
min_spacing
from the previous retained observation O’ is found, the filter will inspect all observations lying no more thantolerance
further from O’ and retain the one with the highest priority. In case of ties, observations closer to O’ are preferred.
Example 1 (selecting at most one observation taken by each station per 1.5 h, starting from the observation closest to seed time):
- filter: Temporal Thinning
min_spacing: PT01H30M
seed_time: 2018-04-15T00:00:00Z
category_variable:
name: call_sign@MetaData
Example 2 (selecting at most one observation taken by each station per 1 h, starting from the earliest observation, and allowing the filter to retain an observation taken up to 20 min after the first qualifying observation if its quality score is higher):
- filter: Temporal Thinning
min_spacing: PT01H
tolerance: PT20M
category_variable:
name: call_sign@MetaData
priority_variable:
name: score@MetaData
Poisson Disk Thinning Filter¶
This filter thins observations by iterating over them in random order and retaining each observation lying outside the exclusion volumes (ellipsoids or cylinders) surrounding observations that have already been retained.
The following YAML parameters are supported:
Exclusion volume:
min_horizontal_spacing
: Size of the exclusion volume in the horizontal direction (in km).If the priority_variable parameter is set, this parameter may be a map assigning an exclusion volume size to each observation priority, or a floating-point constant. If the priority_variable parameter is not set (and hence all observations have the same priority), this parameter must be a floating-point constant. Exclusion volumes of lower-priority observations must be at least as large as those of higher-priority ones. If this parameter is not set, horizontal position is ignored during thinning.
Note: Owing to a bug in the eckit YAML parser, maps need to be written in the JSON style, with keys quoted. Example:
min_horizontal_spacing: {"1": 123, "2": 321}
This will not work:
min_horizontal_spacing: {1: 123, 2: 321}
and neither will this:
min_horizontal_spacing: 1: 123 2: 321
nor this:
min_horizontal_spacing: "1": 123 "2": 321
min_vertical_spacing
: Size of the exclusion volume in the vertical direction (in Pa).Like
min_horizontal_spacing
, this parameter can be either a constant or a map. If not set, vertical position is ignored during thinning.min_time_spacing
: Size of the exclusion volume in the temporal direction.Like
min_horizontal_spacing
, this parameter can be either a constant or a map. If not set, observation time is ignored during thinning.exclusion_volume_shape
: Shape of the exclusion volume surrounding each observation.Allowed values:
cylinder
: the exclusion volume of an observation taken at latitude lat, longitude lon, pressure p and time t is the set of all locations (lat’, lon’, p’, t’) for which all of the following conditions are met:the geodesic distance between (lat, lon) and (lat’, lon’) is smaller than min_horizontal_spacing
|p - p’| < min_vertical_spacing
|t - t’| < min_time_spacing.
ellipsoid
: the exclusion volume of an observation taken at latitude lat, longitude lon, pressure p and time t is the set of all locations (lat’, lon’, p’, t’) for which the following condition is met:geodesic_distance((lat, lon), (lat’, lon’))^2 / min_horizontal_spacing^2 + (p - p’)^2 / min_vertical_spacing^2 + (t - t’)^2 / min_time_spacing^2 < 1.
Default:
cylinder
.
Observation categories:
category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately. If not set, all observations are thinned together.
Selection of observations to retain:
priority_variable
: Variable storing observation priorities. An observation will not be retained if it lies within the exclusion volume of an observation with a higher priority.As noted in the documentation of
min_horizontal_spacing
, the exclusion volume size must be a (weakly) monotonically decreasing function of observation priority, i.e. the exclusion volumes of all observations with the same priority must have the same size, and the exclusion volumes of lower-priority observations must be at least as large as those of higher-priority ones.If this parameter is not set, all observations are assumed to have equal priority.
shuffle
: If true, observations will be randomly shuffled before being inspected as candidates for retaining. Default: true.Note: It is recommended to leave shuffling enabled in production code, since the performance of the spatial point index (kd-tree) used in the filter’s implementation may be degraded if observation locations are ordered largely monotonically (and random shuffling essentially prevents that from happening).
random_seed
: Seed with which to initialize the random number generator used to shuffle the observations ifshuffle
is set to true.If omitted, a seed will be generated based on the current (calendar) time.
Example 1¶
With the following parameters, observations are thinned by horizontal position only. The exclusion volume size depends on the observation priority. Each scan is thinned separately.
- filter: Poisson Disk Thinning
min_horizontal_spacing: {"0": 600, "1": 200} # priority -> km
category_variable:
name: scan_index@MetaData
priority_variable:
name: priority@MetaData
random_seed: 12345
Example 2¶
With the following parameters, observations are thinned by the horizontal position, vertical position and time. The exclusion volumes are ellipsoidal. Shuffling is disabled.
- filter: Poisson Disk Thinning
min_horizontal_spacing: 1000 # km
min_vertical_spacing: 10000 # Pa
min_time_spacing: PT1H
exclusion_volume_shape: ellipsoid
shuffle: false
Difference Check Filter¶
This filter will compare the difference between a reference variable and a second variable and assign a QC flag if the difference is outside of a prescribed range.
For example:
- filter: Difference Check
reference: brightness_temperature_8@ObsValue
value: brightness_temperature_9@ObsValue
minvalue: 0
The above YAML is checking the difference between brightness_temperature_9@ObsValue
and brightness_temperature_8@ObsValue
and rejecting negative values.
In psuedo-code form:
if (brightness_temperature_9@ObsValue - brightness_temperature_8@ObsValue < minvalue) reject_obs()
- The options for YAML include:
minvalue
: the minimum value the differencevalue - reference
can be. Set this to 0, for example, and all negative differences will be rejected.maxvalue
: the maximum value the differencevalue - reference
can be. Set this to 0, for example, and all positive differences will be rejected.threshold
: the absolute value the differencevalue - reference
can be (sign independent). Set this to 10, for example, and all differences outside of the range from -10 to 10 will be rejected.
Note that threshold
supersedes minvalue
and maxvalue
in the filter.
Derivative Check Filter¶
This filter will compute a local derivative over each observation record and assign a QC flag if the derivative is outside of a prescribed range.
- By default, this filter will compute the local derivative at each point in a record.
For the first location (1) in a record:
dy/dx = (y(2)-y(1))/(x(2)-x(1))
For the last location (n) in a record:
dy/dx = (y(n)-y(n-1))/(x(n)-x(n-1))
For all other locations (i):
dy/dx = (y(i+1)-y(i-1))/(x(i+1)-x(i-1))
Alternatively if one wishes to use a specific range/slope for the entire observation record, i1
and i2
can be defined in the YAML.
For this case, For all locations in the record:
dy/dx = (y(i2)-y(i1))/(x(i2)-x(i1))
Note that this filter really only works/makes sense for observations that have been sorted by the independent variable and grouped by some other field.
An example:
- filter: Derivative Check
independent: datetime
dependent: air_pressure
minvalue: -50
maxvalue: 0
passedBenchmark: 238 # number of passed obs
The above YAML is checking the derivative of air_pressure
with respect to datetime
for a radiosonde profile and rejecting observations where the derivative is positive or less than -50 Pa/sec.
- The options for YAML include:
independent
: the name of the independent variable (dx
)dependent
: the name of the dependent variable (dy
)minvalue
: the minimum value the derivative can be without the observations being rejectedmaxvalue
: the maximum value the derivative can be without the observations being rejectedi1
: the index of the first observation location in the record to usei2
: the index of the last observation location in the record to use
- A special case exists for when the independent variable is ‘distance’, meaning the dx is computed from the difference of latitude/longitude pairs converted to distance.
Additionally, when the independent variable is ‘datetime’ and the dependent variable is set to ‘distance’, the derivative filter becomes a speed filter, removing moving observations when the horizontal speed is outside of some range.
Track Check Filter¶
This filter checks tracks of mobile weather stations, rejecting observations inconsistent with the rest of the track.
Each track is checked separately. The algorithm performs a series of sweeps over the observations from each track. For each observation, multiple estimates of the instantaneous speed and (optionally) ascent/descent rate are obtained by comparing the reported position with the positions reported during a number a nearby (earlier and later) observations that haven’t been rejected in previous sweeps. An observation is rejected if a certain fraction of these estimates lie outside the valid range. Sweeps continue until one of them fails to reject any observations, i.e. the set of retained observations is self-consistent.
Note that this filter was originally written with aircraft observations in mind. However, it can potentially be useful also for other observation types.
The following YAML parameters are supported:
temporal_resolution
: Assumed temporal resolution of the observations, i.e. absolute accuracy of the reported observation times. Default: PT1M.spatial_resolution
: Assumed spatial resolution of the observations (in km), i.e. absolute accuracy of the reported positions.Instantaneous speeds are estimated conservatively with the formula
speed_estimate = (reported_distance - spatial_resolution) / (reported_time + temporal_resolution).
The default spatial resolution is 1 km.
num_distinct_buddies_per_direction
,distinct_buddy_resolution_multiplier
: Control the size of the set of observations against which each observation is compared.Let O_i (i = 1, …, N) be the observations from a particular track ordered chronologically. Each observation O_i is compared against m observations immediately preceding it and n observations immediately following it. The number m is chosen so that {O_{i-m}, …, O_{i-1}} is the shortest sequence of observations preceding O_i that contains
num_distinct_buddies_per_direction
observations distinct from O_i that have not yet been rejected. Two observations taken at times t and t’ and locations x and x’ are deemed to be distinct if the following conditions are met:|t’ - t| >
distinct_buddy_resolution_multiplier
*temporal_resolution
|x’ - x| >
distinct_buddy_resolution_multiplier
*spatial_resolution
Similarly, the number n is chosen so that {O_{i+1}, …, O_{i+n)} is the shortest sequence of observations following O_i that contains
num_distinct_buddies_per_direction
observations distinct from O_i that have not yet been rejected.Both parameters default to 3.
max_climb_rate
: Maximum allowed rate of ascent and descent (in Pa/s). If not specified, climb rate checks are disabled.max_speed_interpolation_points
: Encoding of the function mapping air pressure (in Pa) to the maximum speed (in m/s) considered to be realistic.The function is taken to be a linear interpolation of a series of (pressure, speed) points. The pressures and speeds at these points should be specified as keys and values of a JSON-style map. Owing to a bug in the eckit YAML parser, the keys must be enclosed in quotes. For example,
max_speed_interpolation_points: { "0": 900, "100000": 100 }
encodes a linear function equal to 900 m/s at 0 Pa and 100 m/s at 100000 Pa.
rejection_threshold
: Maximum fraction of climb rate or speed estimates obtained by comparison with other observations that are allowed to fall outside the allowed ranges before an observation is rejected. Default: 0.5.station_id_variable
: Variable storing string- or integer-valued station IDs. Observations taken by each station are checked separately.If not set and observations were grouped into records when the observation space was constructed, each record is assumed to consist of observations taken by a separate station. If not set and observations were not grouped into records, all observations are assumed to have been taken by a single station.
Note: the variable used to group observations into records can be set with the
ObsSpace.ObsDataIn.obsgrouping.group_variable
YAML option.
Example:
- filter: Track Check
temporal_resolution: PT30S
spatial_resolution: 20 # km
num_distinct_buddies_per_direction: 3
distinct_buddy_resolution_multiplier: 3
max_climb_rate: 200 # Pa/s
max_speed_interpolation_points: {"0": 1000, "20000": 400, "110000": 200} # Pa: m/s
rejection_threshold: 0.5
station_id_variable: station_id@MetaData
Met Office Buddy Check Filter¶
This filter cross-checks observations taken at nearby locations against each other, updating their gross error probabilities (PGEs) and rejecting observations whose PGE exceeds a threshold specified in the filter parameters. For example, if an observation has a very different value than several other observations taken at nearby locations and times, it is likely to be grossly in error, so its PGE is increased. PGEs obtained in this way can be taken into account during variational data assimilation to reduce the weight attached to unreliable observations without necessarily rejecting them outright.
The YAML parameters supported by this filter are listed below.
General parameters:
filter variables
(a standard parameter supported by all filters): List of the variables to be checked. Currently only surface (single-level) variables are supported. Variables can be either scalar or vector (with two Cartesian components, such as the eastward and northward wind components). In the latter case the two components need to be specified one after the other in thefilter variables
list, with the first component having thefirst_component_of_two
option set to true. Example:filter variables: - name: air_temperature - name: eastward_wind options: first_component_of_two: true - name: northward_wind
rejection_threshold
: Observations will be rejected if the gross error probability lies at or above this threshold. Default: 0.5.traced_boxes
: A list of quadrangles bounded by two meridians and two parallels. Tracing information (potentially useful for debugging) will be output for observations lying within any of these quadrangles. Example:traced_boxes: - min_latitude: 30 max_latitude: 45 min_longitude: -180 max_longitude: -150 - min_latitude: -45 max_latitude: -30 min_longitude: -180 max_longitude: -150
Default: empty list.
Buddy pair identification:
search_radius
: Maximum distance between two observations that may be classified as buddies, in km. Default: 100 km.station_id_variable
: Variable storing string- or integer-valued station IDs.If not set and observations were grouped into records when the observation space was constructed, each record is assumed to consist of observations taken by a separate station. If not set and observations were not grouped into records, all observations are assumed to have been taken by a single station.
Note: the variable used to group observations into records can be set with the
obs space.obsdatain.obsgrouping.group_variable
YAML option. An example of its use can be found in the Profile consistency checks section above.num_zonal_bands
: Number of zonal bands to split the Earth’s surface into when building a search data structure.Note: Apart from the impact on the speed of buddy identification, both this parameter and
sort_by_pressure
affect the order in which observations are processed and thus the final estimates of gross error probabilities, since the probability updates made when checking individual observation pairs are not commutative.Default: 24.
sort_by_pressure
: Whether to include pressure in the sorting criteria used when building a search data structure, in addition to longitude, latitude and time. See the note next tonum_zonal_bands
. Default: false.max_total_num_buddies
: Maximum total number of buddies of any observation.Note: In the context of this parameter,
max_num_buddies_from_single_band
andmax_num_buddies_with_same_station_id
, the number of buddies of any observation O is understood as the number of buddy pairs (O, O’) where O’ != O. This definition facilitates the buddy check implementation (and makes it compatible with the original version from the OPS system), but is an underestimate of the true number of buddies, since it doesn’t take into account pairs of the form (O’, O).Default: 15.
max_num_buddies_from_single_band
: Maximum number of buddies of any observation belonging to a single zonal band. See the note next tomax_total_num_buddies
. Default: 10.max_num_buddies_with_same_station_id
: Maximum number of buddies of any observation sharing that observation’s station ID. See the note next tomax_total_num_buddies
. Default: 5.use_legacy_buddy_collector
: Set to true to identify pairs of buddy observations using an algorithm reproducing exactly the algorithm used in Met Office’s OPS system, but potentially skipping some valid buddy pairs. Default: false.
Control of gross error probability updates:
horizontal_correlation_scale
: Encoding of the function that maps the latitude (in degrees) to the horizontal correlation scale (in km).The function is taken to be a piecewise linear interpolation of a series of (latitude, scale) points. The latitudes and scales at these points should be specified as keys and values of a JSON-style map. Owing to a limitation in the eckit YAML parser (https://github.com/ecmwf/eckit/pull/21), the keys must be enclosed in quotes. For example,
horizontal_correlation_scale: { "-90": 200, "90": 100 }
encodes a function varying linearly from 200 km at the south pole to 100 km at the north pole.
Default:
{ "-90": 100, "90": 100 }
, i.e. a constant function equal to 100 km everywhere.temporal_correlation_scale
: Temporal correlation scale. Default: PT6H.damping_factor_1
Parameter used to “damp” gross error probability updates using method 1 described in section 3.8 of the OPS Scientific Documentation Paper 2 to make the buddy check better-behaved in data-dense areas. See the reference above for the full description. Default: 1.0.damping_factor_2
Parameter used to “damp” gross error probability updates using method 2 described in section 3.8 of the OPS Scientific Documentation Paper 2 to make the buddy check better-behaved in data-dense areas. See the reference above for the full description. Default: 1.0.
Example:
- filter: Met Office Buddy Check:
filter variables:
- name: eastward_wind
options:
first_component_of_two: true
- name: northward_wind
- name: air_temperature
rejection_threshold: 0.5
traced_boxes: # trace all observations
- min_latitude: -90
max_latitude: 90
min_longitude: -180
max_longitude: 180
search_radius: 100 # km
station_id_variable:
name: station_id@MetaData
num_zonal_bands: 24
sort_by_pressure: false
max_total_num_buddies: 15
max_num_buddies_from_single_band: 10
max_num_buddies_with_same_station_id: 5
use_legacy_buddy_collector: false
horizontal_correlation_scale: { "-90": 100, "90": 100 }
temporal_correlation_scale: PT6H
damping_factor_1: 1.0
damping_factor_2: 1.0
Implementation Notes¶
The implementation of this filter consists of four steps: sorting, buddy pair identification, PGE update and observation flagging. Observations are grouped into zonal bands and sorted by (a) band index, (b) longitude, (c) latitude, in descending order, (d) pressure (if the sort_by_pressure
option is on), and (e) datetime. Observations are then iterated over, and for each observation a number of nearby observations (lying no further than search_radius
) are identified as its buddies. The size and “diversity” of the list of buddy pairs can be controlled with the max_total_num_buddies
, max_num_buddies_from_single_band
and max_num_buddies_with_same_station_id
options. Subsequently, the PGEs of the observations forming each buddy pair are updated. Typically, the PGEs are decreased if the signs of the innovations agree and increased if they disagree. The magnitude of this change depends on the background error correlation between the two observation locations, the error estimates of the observations and background values, and the prior PGEs of the observations: the PGE change is the larger, the stronger the correlation between the background errors and the narrower the error margins. Once all buddy pairs have been processed, observations whose PGEs exceed the specified rejection_threshold
are flagged.
Variable Assignment Filter¶
This “filter” (it is not a true filter; rather, a “processing step”) assigns specified values to
specified variables at locations selected by the where
statement, or at all locations if
the where
keyword is not present.
The values can be constants or vectors generated by ObsFunctions. If the variables don’t exist
yet, they are created; in this case locations not selected by the where
statement are
initialized with missing-value markers.
The values assigned to individual variables are specified in the assignments
list in the
YAML file. Each element of this list can contain the following options:
name
: Name of the variable to which new values should be assigned.channels
: (Optional) Set of channels to which new values should be assigned.value
: Value to be assigned to the specified variable. Either this option orfunction
(but not both) must be present.function
: Variable (typically an ObsFunction) that should be evaluated and assigned to the specified variable. Either this option orvalue
(but not both) must be present.type
: Type (int
,float
,string
ordatetime
) of the variable to which new values should be assigned. This option only needs to be provided if the variable doesn’t exist yet. If this option is provided and the variable already exists, its type must match the value of this option, otherwise an exception is thrown.
Example 1¶
Create new variables air_temperature@GrossErrorProbability
and
relative_humidity@GrossErrorProbability
and set them to 0.1 at all locations.
- filter: Variable Assignment
assignments:
- name: air_temperature@GrossErrorProbability
type: float # type must be specified if the variable doesn't already exist
value: 0.1
- name: relative_humidity@GrossErrorProbability
type: float
value: 0.1
Example 2¶
Set air_temperature@GrossErrorProbability
to 0.05 at all locations in the tropics.
- filter: Variable Assignment
where:
- variable:
name: latitude@MetaData
minvalue: -30
maxvalue: 30
assignments:
- name: air_temperature@GrossErrorProbability
value: 0.05
Example 3¶
Set relative_humidity@GrossErrorProbability
to values computed by an ObsFunction
(0.1 in the southern extratropics and 0.05 in the northern extratropics, with a linear
transition in between).
- filter: Variable Assignment
assignments:
- name: relative_humidity@GrossErrorProbability
function:
name: ObsErrorModelRamp@ObsFunction
options:
xvar:
name: latitude@MetaData
x0: [-30]
x1: [30]
err0: [0.1]
err1: [0.05]
RTTOV 1D-Var Check (RTTOVOneDVar) Filter¶
This filter performs a 1-dimensional variational assimilation (1D-Var) that produces optimal retrievals of physical parameters that describe the atmosphere and surface and on which there is information in the measurement. It takes as input a set of observations (brightness temperatures) and model background fields which are used to initialise the retrieval profile. A retrieval (or analysis) is performed using an iterative procedure that attempts to find the minimum of a cost function that represents the most likely profile vector given the error characteristics of the two data sources.
The elements contained in the retrieval profile depend on the sensitivity of the measuring instruments to atmospheric and surface properties and also what can be modelled with a relatively high degree of accuracy. Most retrieval profiles will consist of atmospheric temperature and humidity, and surface skin temperature, with other possible constituents being liquid and ice water or some other cloud parameter measure, and emissivity parameters.
The filter provides some retrieval parameters to the assimilation which may be missing in the background or insufficiently accurate, such as surface skin temperature, and to filter out observations for which a retrieval could not be performed and thus may be difficult to assimilate in the full variational assimilation.
The filter is a port of the Met Office OPS 1D-Var and makes use of the Fortran RTTOV interface within JEDI. The code is written predominantly in Fortran. Files containing the observation error covariance (R) and the background error covariance (B) are expected as inputs.
This filter requires the following YAML parameters:
BMatrix
: path to the b-matrix file.RMatrix
: path to the r-matrix file.nlevels
: the number of levels used in the retrieval profile.retrieval variables
: list of retrieval variables (e.g. temperature etc) which form the 1D-Var retrieval vector (x). This needs to match the b-matrix file.ModOptions
: options needed for the observation operator (RTTOV only at the moment).filter variables
: list of variables (brightness_temperature) and channels which form the 1D-Var observation vector (y).
The following are optional YAML parameters with appropriate defaults:
ModName
: forward model name (only RTTOV at the moment). Default:RTTOV
.qtotal
: flag for total humidity (qt = q + qclw + qi). If this is true the b-matrix must include qt or the code will abort. If this is false then the b-matrix must not contain qt or the code will abort. Default:false
.UseMLMinimization
: flag to turn on Marquardt-Levenberg minimizer otherwise a Newton minimizer is used Default:false
.UseJforConvergence
: flag to use J for the measure of convergence. Default is comparison of the profile absolute differences to background error multiplied byConvergenceFactor
. Default:false
.UseRHwaterForQC
: flag to use liquid water in the q saturation calculations. Default:true
.FullDiagnostics
: flag to turn on full diagnostics. Default:false
.Max1DVarIterations
: maximum number of iterations. Default:7
.JConvergenceOption
: integer to select convergence option. 1 equals percentage change in cost tested between iterations. Otherwise the absolute change in cost is tested between iterations. Default:1
.IterNumForLWPCheck
: choose which iteration to start checking the liquid water path. Default:2
.MaxMLIterations
: the maximum number of iterations for the internal Marquardt-Levenberg loop. Default:7
.ConvergenceFactor
: cost factor used when the absolute difference in the profile is used to determine convergence. Default:0.4
.CostConvergenceFactor
: the cost threshold used for convergence check when cost function value is used for convergence. Default:0.01
.EmissLandDefault
: the default emissivity value to use over land. Default:0.95
.EmissSeaIceDefault
: the default emissivity value to use over seaice. Default:0.92
.
Example:
- filter: RTTOV OneDVar Check
BMatrix: ../resources/bmatrix/rttov/atms_bmatrix_70_test.dat
RMatrix: ../resources/rmatrix/rttov/atms_noaa_20_rmatrix_test.nc4
nlevels: 70
retrieval variables:
- air_temperature
- specific_humidity
- mass_content_of_cloud_liquid_water_in_atmosphere_layer
- mass_content_of_cloud_ice_in_atmosphere_layer
- surface_temperature
- specific_humidity_at_two_meters_above_surface
- skin_temperature
- air_pressure_at_two_meters_above_surface
ModOptions:
Absorbers: [Water_vapour, CLW, CIW]
obs options:
RTTOV_default_opts: OPS
SatRad_compatibility: false # done in filter
Sensor_ID: noaa_20_atms
CoefficientPath: Data/
filter variables:
- name: brightness_temperature
channels: 1-22
qtotal: true
ModelOb Threshold Filter¶
This filter applies a threshold to a model profile interpolated to the observation height.
The specified model profile variable is linearly (vertical) interpolated to the observation height using the specified model vertical coordinate variable. This is referred to as the “ModelOb”. Note that the ModelOb is not necessarily one of the HofX variables.
The observation height must be in the same coordinate system as that specified for the model vertical coordinate, e.g. both pressure.
The ModelOb is compared against a set of height-dependent thresholds. We supply a vector of threshold values, and a vector of vertical coordinate values corresponding to those thresholds. The coordinate values must be in the same vertical coordinate as the observation, e.g. pressure. The threshold values are then linearly interpolated to the observation height.
The observation is flagged for rejection if the ModelOb lies outside the threshold value according to threshold type - min or max. E.g. if the threshold type is min, then the observation is flagged if ModelOb is less than the interpolated threshold value.
This filter requires the following YAML parameters:
model profile
: name of the model profile variable (GeoVaLs).model vertical coordinate
: name of the model vertical coordinate variable (GeoVal).observation height
: name of the observation height variable to interpolate to.thresholds
: vector of threshold values.coordinate values
: vector of vertical coordinate values corresponding tothresholds
.threshold type
:min
. ormax
.
Example
- filter: ModelOb Threshold
model profile:
name: relative_humidity@GeoVaLs
model vertical coordinate:
name: air_pressure@GeoVaLs
observation height:
name: air_pressure_levels@MetaData
thresholds: [50,50,40,30]
coordinate values: [100000,80000,50000,20000]
threshold type: min