Generic QC Filters¶
This section describes how to configure each of the existing QC filters in UFO. All filters can also use the “where” statement to act only on observations meeting certain conditions. By default, each pre and prior filter acts on all the variables marked as observed variables in the ObsSpace, where as post filters act on all simulated variables in the ObsSpace. The filter variables
keyword can be used to limit the action of the filter to a subset of these variables or to specific channels, as shown in the examples from the Bounds Check Filter section below.
Bounds Check Filter¶
This filter rejects observations whose values (ObsValue/
in the ioda files) lie outside specified limits:
- filter: Bounds Check
filter variables:
- name: brightnessTemperature
channels: 4-6
minvalue: 240.0
maxvalue: 300.0
In the above example the filter checks if brightness temperature for channels 4, 5 and 6 is outside of the [240, 300] range. Suppose we have the following observation data with 3 locations and 4 channels:
channel 3: [100, 250, 450]
channel 4: [250, 260, 270]
channel 5: [200, 250, 270]
channel 6: [340, 200, 250]
In this example, all observations from channel 3 will pass QC because the filter isn’t configured to act on this channel. All observations for channel 4 will pass QC because they are within [minvalue, maxvalue]. 1st observation in channel 5, and first and second observations in channel 6 will be rejected.
- filter: Bounds Check
filter variables:
- name: airTemperature
minvalue: 230
- filter: Bounds Check
filter variables:
- name: windEastward
- name: windNorthward
minvalue: -40
maxvalue: 40
In the above example two filters are configured, one testing temperature, and the other testing wind components. The first filter would reject all temperature observations that are below 230. The second, all wind component observations whose magnitude is above 40.
In practice, one would be more likely to want to filter out wind component observations based on the value of the wind speed sqrt(windEastward**2 + windNorthward**2)
. This can be done using the test variables
keyword, which rejects observations of a variable if the value of another lies outside specified bounds. The “test variable” does not need to be a simulated or observed variable; in particular, it can be an ObsFunction, i.e. a quantity derived from simulated variables. For example, the following snippet filters out wind component observations if the wind speed is above 40:
- filter: Bounds Check
filter variables:
- name: windEastward
- name: windNorthward
test variables:
- name: ObsFunction/Velocity
maxvalue: 40
If there is only one entry in the test variables
list, the same criterion is applied to all filter variables. Otherwise the number of test variables needs to match that of filter variables, and each filter variable is filtered according to the values of the corresponding test variable.
Background Check Filter¶
This filter checks for bias corrected distance between observation value and model simulated value (\(y-H(x)\)) and rejects obs where the absolute difference is larger than absolute threshold
, or threshold
* \({\sigma}_o\), or threshold
* \({\sigma}_b\), where \({\sigma}_o\) is observation error and \({\sigma}_b\) is background error. This filter can also adjust observation error through a constant inflation factor when the filter action is set to inflate error
. If no action section is included in the yaml, the filter is set to reject the flagged observations.
- filter: Background Check
filter variables:
- name: airTemperature
threshold: 2.0
absolute threshold: 1.0
action:
name: reject
- filter: Background Check
filter variables:
- name: windEastward
- name: windNorthward
threshold: 2.0
where:
- variable:
name: MetaData/latitude
minvalue: -60.0
maxvalue: 60.0
action:
name: inflate error
inflation: 2.0
- filter: Background Check
filter variables:
- name: sea_surface_height
threshold wrt background error: true
threshold: 2.0
The first filter would flag temperature observations where \(|y-(H(x)+bias)| > \min (\) absolute_threshold
, threshold
* \({\sigma}_o)\), and
then the flagged data are rejected due to the filter action being set to reject
.
The second filter would flag wind component observations where \(|y-(H(x)+bias)| >\) threshold
* \({\sigma}_o\) and latitude of the observation location are within 60 degree. The flagged data will then be inflated with a factor 2.0.
Please see the Filter Actions section for more detail.
The third filter compares the departure against the background error rather than the observation error. It would flag sea surface height observations where \(|y-(H(x)+bias)| >\) threshold
* \({\sigma}_b\), and reject the flagged observations as no filter action is specified. If threshold wrt background error
is set to true
, then threshold
must be set and absolute threshold
must not.
There is an option for the background check filter to check for distance between observation value and model simulated value without bias correction (\(y-H(x)\)) when the additional parameter bias correction parameter
is set to 1.0 and rejects obs where the absolute difference is larger than absolute threshold
or threshold
* \({\sigma}_o\) when the filter action is set to reject
. If no action section is included in the yaml, the filter is set to reject the flagged observations.
- filter: Background Check
filter variables:
- name: brightnessTemperature
channels: 1-24
absolute threshold: 3.5
bias correction parameter: 1.0
action:
name: reject
This filter would flag temperature observations where \(|y-H(x)| > \min (\) absolute_threshold
, threshold
* \({\sigma}_o)\), and then the flagged data are rejected due to filter action is set to reject.
Bayesian Background Check Filter¶
Similar to the standard Background Check filter, which rejects observations based on the difference between observation value and model simulated value (\(y-H(x)\)), the Bayesian Background Check also takes into account the probability that an observation is “bad”, i.e. “in gross error”. It is expected that the initial Probability of Gross Error (PGE) is set before calling the Bayesian Background Check filter (e.g. using a Variable Assignment filter). In the Bayesian Background Check filter, this initial PGE
value determines the weight given to the uniform (“bad”) probability distribution - while (1-PGE)
is the weight given to the “good” distribution (a Gaussian in \([y-H(x)]\), with variance \({\sigma}^2\) given by the sum of background uncertainty and observation uncertainty variances). The initial PGE
divided by the combined probability distribution, gives the conditional probability that the observation is in gross error. This conditional probability value is the after-check PGE, PGEBd
. It is saved in the ObsSpace for optional later use in the buddy check, and observations are also rejected if it exceeds a given threshold. There is also the option of the Bayesian Background Check filter performing a “squared difference” check, to reject observations if \([y-H(x)]^2/{\sigma}^2\) exceeds a threshold.
The .yaml file requires the following filter parameter:
prob density bad obs
(PdBad
): the value of the prior uniform probability distribution for the observation to be bad (e.g. 0.1/K for a domain 273-283 K for some temperature observation).
The .yaml file can also contain optional filter parameters, which override the default values in ufo/filters/BayesianBackgroundCheck.h and ufo/utils/ProbabilityOfGrossErrorParameters.h:
PGE threshold
(PGECrit
, default 0.1): if the adjusted (after-check) PGE exceeds this value, the observation is rejected;perform obs minus BG threshold check
(PerformSDiffCheck
: defaulttrue
): if true perform an additional squared difference check, that \([y-H(x)]^2/{\sigma}^2\) does not exceed a threshold;obs minus BG threshold
(SDiffCrit
, default 100.0): threshold value for the squared difference check;max exponent
(ExpArgMax
, default 80.0): maximum allowed value of the exponent in the “good” probability distribution;obs error multiplier
(ObErrMult
, default 1.0): weight of observation error in the combined error variance;BG error multiplier
(BkgErrMult
, default 1.0): weight of background error in the combined error variance;bg error
: constant background error term. If present this will be used instead of the real background errors;bg error suffix
(BkgErrSuffix
, default “_background_error”): suffix which has been appended to variable name for background errors which are to be read in;bg error group
(BkgErrGroup
, default “ObsDiag”): group name which background errors for each variable are stored in;save total pd
(SaveTotalPd
, default false): if true, save the total (combined) probability distribution to theGrossErrorProbabilityTotal
group. This is required as an input by the Bayesian Whole Report filter.max error variance
(ErrVarMax
): a maximum value for the error variance. If not set, no maximum is applied.
- filter: Variable Assignment
assignments:
- name: GrossErrorProbability/ice_area_fraction
type: float
value: 0.04
- filter: Bayesian Background Check
filter variables:
- name: ice_area_fraction
prob density bad obs: 1.0
PGE threshold: 0.07
obs minus BG threshold: 100.0
Note that this filter requires the background value (HofX) and background error. Unless a constant background error term ‘bg error’ is provided in the yaml, the latter is accessed from the obs diagnostics - as an interim measure, supplied in a separate .nc4 file (see .yaml snippet below), with variable name e.g. ice_area_fraction_background_error
(no group name) to go with ice_area_fraction
.
HofX: HofX
obs diagnostics:
filename: Data/ufo/testinput_tier_1/background_errors_for_bayesianbgcheck_test.nc4
By default, a filter variable is treated as scalar. But for vectors, such as wind, the two components must be specified one after the other in the .yaml, and the first must have the option first_component_of_two
set to true.
- filter: Bayesian Background Check
filter variables:
- name: windEastward
options:
first_component_of_two: true
- name: windNorthward
Bayesian Background check currently only works for single-level observations, not profiles.
Bayesian Background QC Flags filter¶
The Bayesian Background QC Flags filter sets Met Office OPS QC flags based on values of probability of gross error (PGE). This filter should be invoked after any other filters which modify PGE, such as the Bayesian background check and the buddy check, have been run. If the PGE is larger than a chosen threshold then the observation is rejected by setting flags at the observation location. Eventually the Met Office QC flags will be replaced with Diagnostic Flags, but the core functionality will remain the same.
The following filter parameters can be set:
PGE threshold
: value of PGE above which an observation is rejected.PGE variable name substitutions
: a list of pairs of variable names. The PGE of the second variable in each pair is used to set the QC flags of the first variable; by default this happens for wind u and v components.
An example yaml section is as follows:
- filter: Bayesian Background QC Flags
filter variables: [airTemperature, windEastward, windNorthward]
PGE threshold: 0.8
PGE variable name substitutions: {"windEastward", "windNorthward"}
Air temperature QC flags are set if the temperature PGE is greater than 0.8. Due to the use of the variable name substitutions, both eastward and northward wind flags are set if the northward wind PGE is greater than 0.8. This could be useful if the PGE of only one of the wind components has been modified by the QC filters.
Bayesian Whole Report Filter¶
Synoptic stations typically provide reports at regular intervals. A report is a combination of variables observed by different sensors at a single location. Reports may include some, but not necessarily all, of pressure, temperature, dew point and wind speed and direction.
This filter calculates the probability that a whole report is affected by gross error, through the Bayesian combination of the probability of gross error of individual observations. This is based on the logic that if multiple observations within a report appear dubious based on a Bayesian Background check, it is likely that the whole report is affected by, for example, location error. This filter should be called after the Bayesian Background Check. The probability that whole report is affected by gross error is calculated from all the gross error probability of all the variables in the filter variables
list, except where the not_used_in_whole_report
option is specified for a given variable.
Once the probability that whole report is affected by gross error has
been calculated, it is used to update the probability of gross error
for each variable in the filter variables
list. Where this
updated probability of gross error exceeds the PGE threshold
,
the observation is flagged. PGE threshold
is an optional yaml parameter
which applies to the whole filter, and has a default value of 0.1
.
Variables can be either scalar or vector (with two Cartesian components, such as the eastward and northward wind components). In
the latter case the two components need to be specified one after the other in the filter variables
list, with the second component having the second_component_of_two option
set to true.
For each variable, the optional parameter probability_density_bad
(default value 0.1
) is used
to set the prior probability density of that variable being
“bad”. The filter can also apply a specific prior probability density of bad observations for the following observation types, identified by the integer ID MetaData/ObsType
:
Bogus
bogus_probability_density_bad
Synop (SynopManual, SynopAuto, MetarManual, MetarAuto, SynopMob, SynopBufr, WOW)
synop_probability_density_bad
These are both optional parameters. If they are not specified,
probability_density_bad
is used in their place, as for all other observation types.
For each filter variable, the following groups must be available from the ObsSpace:
GrossErrorProbability/
: the latest value of GrossErrorProbability,GrossErrorProbabilityInitial/
: the initial value of GrossErrorProbability before updates by any other filter, which can be saved using the Variable Assignment filter,GrossErrorProbabilityTotal/
: the total (combined) probability distribution, which is optionally saved the Bayesian Background Check filter,QCFlags/
: Met Office QC flags, which will eventually be replaced with Diagnostic Flags, must be initialized before this filter.
Additionally, the prior probability of gross error applying to the whole report must be available from MetaData/grossErrorProbabilityReport
.
Example:
- filter: Bayesian Whole Report
filter variables:
- name: pressure_at_model_surface
options:
probability_density_bad: 0.1
bogus_probability_density_bad: 0.1
- name: air_temperature_at_2m
options:
probability_density_bad: 0.1
- name: windEastward
options:
probability_density_bad: 0.1
synop_probability_density_bad: 0.1
bogus_probability_density_bad: 0.1
- name: windNorthward
options:
not_used_in_whole_report: true
second_component_of_two: true
- name: relativeHumidityAt2M
options:
not_used_in_whole_report: true
probability_density_bad: 0.1
PGE threshold: 0.15
Domain Check Filter¶
This filter retains all observations selected by the “where” statement and rejects all others. Below, the filter is configured to retain only observations
* taken at locations where the sea surface temperature retrieved from the model is between 200 and 300 K (inclusive)
* with valid height
metadata (not set to “missing value”)
* taken by stations with IDs 3, 6 or belonging to the range 11-120
* without valid pressure
metadata.
- filter: Domain Check
where:
- variable:
name: GeoVaLs/sea_surface_temperature
minvalue: 200
maxvalue: 300
- variable:
name: MetaData/height
is_defined:
- variable:
name: MetaData/stationIdentification
is_in: 3, 6, 11-120
- variable:
name: MetaData/pressure
is_not_defined:
BlackList Filter¶
This filter behaves like the exact opposite of Domain Check: it rejects all observations selected by the “where” statement statement. The status of all others remains the same. Below, the filter is configured to reject observations taken by stations with IDs 1, 7 or belonging to the range 100-199:
- filter: BlackList
where:
- variable:
name: MetaData/stationIdentification
is_in: 1, 7, 100-199
RejectList Filter¶
This is an alternative name for the BlackList filter.
AcceptList Filter¶
This filter sets the QC flag to pass for all observations selected by the “where” statement that have previously been rejected for any reason other than missing data, a pre-processing flag indicating rejection, or failure of the ObsOperator. This is mostly useful in QC procedures where all observations are initially rejected and then those fulfilling certain criteria are accepted, overriding the rejection.
Below, the filter is configured to accept only observations taken by stations with IDs 1, 7 or belonging to the range 100-199 (inclusive):
- filter: RejectList # initially reject all observations
- filter: AcceptList # accept back selected observations
where:
- variable:
name: MetaData/stationIdentification
is_in: 1, 7, 100-199
Perform Action Filter¶
This filter performs the action specified in the action
parameter on observations selected by the “where” statement.
Example 1¶
Here the filter is configured to inflate errors of all observations from the Southern hemisphere by a factor of two:
- filter: Perform Action
action:
name: inflate error
inflation: 2.0
where:
- variable: latitude
maxvalue: 0
Note
Technically, the same result could be obtained by replacing Perform Action
in the listing
above by RejectList
. However, having a RejectList
filter that does not actually
reject any observations can be confusing.
Example 2¶
The filter configured in this way behaves like RejectList
:
- filter: Perform Action
action:
name: reject
Example 3¶
The filter configured in this way behaves like AcceptList
:
- filter: Perform Action
action:
name: accept
Thinning Filter¶
This filter rejects a specified fraction of observations, selected at random. It supports the following YAML parameters:
amount
: the fraction of observations to reject (a number between 0 and 1).random seed
(optional): an integer used to initialize a random number generator if it has not been initialized yet. If not set, the seed is derived from the calendar time.
Note: because of how this filter is implemented, the fraction of rejected observations may not be exactly equal to amount
, especially if the total number of observations is small.
Example:
- filter: Thinning
amount: 0.75
random seed: 125
Gaussian Thinning Filter¶
This filter thins observations by preserving only one observation in each cell of a grid. Cell assignment can be based on an arbitrary combination of:
horizontal position
vertical position (in terms of height or pressure)
time
category (arbitrary integer associated with each observation).
Selection of the observation to preserve in each cell is based on
its position in the cell
optionally, its priority.
The following YAML parameters are supported:
Horizontal grid:
horizontal_mesh
: Approximate width (in km) of zonal bands into which the Earth’s surface is split. Thinning in the horizontal direction is disabled if this parameter is negative. Default: approx. 111 km (= 1 deg of latitude).use_reduced_horizontal_grid
: True to use a reduced grid, with high-latitude zonal bands split into fewer cells than low-latitude bands to keep cell size nearly uniform. False to use a regular grid, with the same number of cells at all latitudes. Default:true
.round_horizontal_bin_count_to_nearest
: True to set the number of zonal bands so that the band width is as close as possible tohorizontal_mesh
, and the number of cells (“bins”) in each zonal band so that the cell width in the zonal direction is as close as possible to that in the meridional direction. False to set the number of zonal bands so that the band width is as small as possible, but no smaller thanhorizontal_mesh
, and the cell width in the zonal direction is as small as possible, but no smaller than in the meridional direction.Defaults to
false
unless theops_compatibility_mode
option is enabled, in which case it’s set totrue
.partition_longitude_bins_using_mesh
: True to calculate partioning of longitude bins explicitly using horizontal mesh distance. By default this option is set tofalse
and calculating the number of longitude bins per latitude bin index involves the integer number of latitude bins. Setting this option totrue
adopts the Met Office OPS method whereby the integer number of latitude bins is replaced, in the calculation of longitude bins, by the Earth half-circumference divided by the horizontal mesh distance.Defaults to
false
unless theops_compatibility_mode
option is enabled, in which case it’s set totrue
.define_meridian_20000_km
: True to define horizontalMesh with respect to a value for the Earth’s meridian distance (half Earth circumference) of exactly 20000.0 km. By default this option is set tofalse
and the Earth’s meridian is defined for the purposes of calculating thinning boxes aspi*Constants::mean_earth_rad
~ 20015.087 km.Defaults to
false
unless theops_compatibility_mode
option is enabled, in which case it’s set totrue
.
Vertical grid:
vertical_mesh
: Cell size in the vertical direction. Thinning in the vertical direction is disabled if this parameter is not specified or negative.vertical_min
: Lower bound of the vertical coordinate interval split into cells of sizevertical_mesh
. Default: 100 (Pa).vertical_max
: Upper bound of the vertical coordinate interval split into cells of sizevertical_mesh
. This parameter is rounded upwards to the nearest multiple ofvertical_mesh
starting fromvertical_min
. Default: 110,000 (Pa).vertical_coordinate
: Name of the observation vertical coordinate. Default:pressure
.
Temporal grid:
time_mesh
: Cell size in the temporal direction. Temporal thinning is disabled if this this parameter is not specified or set to 0.time_min
: Lower bound of the time interval split into cells of sizetime_mesh
. Temporal thinning is disabled if this parameter is not specified.time_max
: Upper bound of the time interval split into cells of sizetime_mesh
. This parameter is rounded upwards to the nearest multiple oftime_mesh
starting fromtime_min
. Temporal thinning is disabled if this parameter is not specified.
Observation categories:
category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately.
Selection of observations to consider for thinning:
retain_only_if_all_filter_variables_are_valid
: Determines how to treat observations where multiple filter variables are present and their QC flags may differ (for example, a satellite observation with multiple channels).true
: include an observation in the set of locations to be thinned only if all filter variables have passed QC. For invalid observation locations (selected by a where clause but where one or more filter variables have failed QC) any remaining unflagged filter variables are rejected.false
: include an observation in the set of locations to be thinned if any filter variable has passed QC.
Default:
false
.
Selection of observations to retain:
priority_variable
: Variable storing observation priorities. Among all observations in a cell, only those with the highest priority are considered as candidates for retaining. If not specified, all observations are assumed to have equal priority.distance_norm
: Determines which of the highest-priority observations lying in a cell is retained. Allowed values:geodesic
: retain the observation closest to the cell center in the horizontal direction (the vertical coordinate and time are ignored when selecting the observation to retain)maximum
: retain the observation lying furthest from the cell’s bounding box in the system of coordinates in which the cell is a unit cube (all dimensions along which thinning is enabled are taken into account).
Defaults to
geodesic
unless theops_compatibility_mode
option is enabled, in which case it’s set tomaximum
.records_are_single_obs
: When set totrue
, thinning is performed on whole records (profiles), rather than treating every observation in every record as an individual observation. (See here for an example of using theobs space.obsdatain.obsgrouping
YAML option to group observations into records.) Thus if a record (specifically the earliest non-missing observation in a record) is deemed to be thinned, or accepted, every observation in that record is respectively thinned or accepted. This option does nothing if observations are not grouped into records. Can be used in combination with other options, such aspriority_variable
andcategory_variable
. Ifcategory_variable
is not empty andrecords_are_single_obs
istrue
, an exception will be thrown if the elements in any profile lie in two or more categories.select_median
: When set totrue
, retain the observation whoseObsValue
(orDerivedObsValue
- the latest modified valid type) is closest to the median value of all observations in the cell. (Cells containing no observations are ignored; option not tested withpriority_variable
orcategory_variable
set.)min_num_obs_per_bin
: Set to an integer to retain observations only from cells with greater than or equal to this number of observations in the cell. All observations in cells with less than this many observations are rejected. If set to <= \(1\), accept the single observation in any cell with only one observation. (Only applies whenselect_median: true
; otherwise this option does nothing; ifmin_num_obs_per_bin
is not set whenselect_median: true
, the default value is \(5\).)tiebreaker_pick_latest
: Set this option totrue
to make the filter select the observation with the later time within a cell, when the distance to the centre of the cell is equal between the observations being compared and the observations have equal priorities.ops_compatibility_mode
: Set this option totrue
to make the filter produce identical results as theOps_Thinning
subroutine from the Met Office OPS system when both are run serially (on a single process).This modifies the filter behavior in the following ways:
The
round_horizontal_bin_count_to_nearest
option is set totrue
.The
distance_norm
option is set tomaximum
.The
partition_longitude_bins_using_mesh
option is set totrue
.The
define_meridian_2000_km
option is set totrue
.Bin indices are calculated by rounding values away from rather towards zero. This can alter the bin indices assigned to observations lying at bin boundaries.
The bin lattice is assumed to cover the whole real axis (for times and pressures) or the [-360, 720] degrees interval (for longitudes) rather than just the intervals [
time_min
,time_max
], [pressure_min
,pressure_max
] and [0, 360] degrees, respectively. This may cause observations lying at the boundaries of the latter intervals to be put in bins of their own, which is normally undesirable.A different (non-stable) sorting algorithm is used to order observations before inspection. This can alter the set of retained observations if some bins contain multiple equally good observations (with the same priority and distance to the cell center measured with the selected norm). If this happens for a significant fraction of bins, it may be a sign the criteria used to rank observations (the priority and the distance norm) are not specific enough.
Example 1 (thinning by the horizontal position only):
- filter: Gaussian Thinning
horizontal_mesh: 1111.949266 #km = 10 deg at equator
Example 2 (thinning observations from multiple categories and with non-equal priorities by their horizontal position, pressure and time):
- filter: Gaussian Thinning
distance_norm: maximum
horizontal_mesh: 5000
vertical_mesh: 10000
time_mesh: PT01H
time_min: 2018-04-14T21:00:00Z
time_max: 2018-04-15T03:00:00Z
category_variable:
name: MetaData/instrument_id
priority_variable:
name: MetaData/thinningPriority
Temporal Thinning Filter¶
This filter thins observations so that the retained ones are sufficiently separated in time. It supports the following YAML parameters:
min_spacing
: Minimum spacing between two successive retained observations. Default:PT1H
.seed_time
: If not set, the thinning filter will consider observations as candidates for retaining in chronological order.If set, the filter will start from the observation taken as close as possible to
seed_time
, then consider all successive observations in chronological order, and finally all preceding observations in reverse chronological order.category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately. If not specified, all observations are thinned together.priority_variable
: Variable storing integer-valued observation priorities. If not specified, all observations are assumed to have equal priority.tolerance
: Only relevant ifpriority_variable
is set.If set to a nonzero duration, then whenever an observation O lying at least
min_spacing
from the previous retained observation O’ is found, the filter will inspect all observations lying no more thantolerance
further from O’ and retain the one with the highest priority. In case of ties, observations closer to O’ are preferred.
Example 1 (selecting at most one observation taken by each station per 1.5 h, starting from the observation closest to seed time):
- filter: Temporal Thinning
min_spacing: PT01H30M
seed_time: 2018-04-15T00:00:00Z
category_variable:
name: MetaData/call_sign
Example 2 (selecting at most one observation taken by each station per 1 h, starting from the earliest observation, and allowing the filter to retain an observation taken up to 20 min after the first qualifying observation if its quality score is higher):
- filter: Temporal Thinning
min_spacing: PT01H
tolerance: PT20M
category_variable:
name: MetaData/call_sign
priority_variable:
name: MetaData/score
Poisson Disk Thinning Filter¶
This filter thins observations by iterating over them in random order and retaining each observation lying outside the exclusion volumes (ellipsoids or cylinders) surrounding observations that have already been retained.
The following YAML parameters are supported:
Exclusion volume:
min_horizontal_spacing
: Size of the exclusion volume in the horizontal direction (in km).If the priority_variable parameter is set, this parameter may be a map assigning an exclusion volume size to each observation priority, or a floating-point constant. If the priority_variable parameter is not set (and hence all observations have the same priority), this parameter must be a floating-point constant. Exclusion volumes of lower-priority observations must be at least as large as those of higher-priority ones. If this parameter is not set, horizontal position is ignored during thinning.
Note: Owing to a bug in the eckit YAML parser, maps need to be written in the JSON style, with keys quoted. Example:
min_horizontal_spacing: {"1": 123, "2": 321}
This will not work:
min_horizontal_spacing: {1: 123, 2: 321}
and neither will this:
min_horizontal_spacing: 1: 123 2: 321
nor this:
min_horizontal_spacing: "1": 123 "2": 321
min_vertical_spacing
: Size of the exclusion volume in the vertical direction (in Pa).Like
min_horizontal_spacing
, this parameter can be either a constant or a map. If not set, vertical position is ignored during thinning.min_time_spacing
: Size of the exclusion volume in the temporal direction.Like
min_horizontal_spacing
, this parameter can be either a constant or a map. If not set, observation time is ignored during thinning.exclusion_volume_shape
: Shape of the exclusion volume surrounding each observation.Allowed values:
cylinder
: the exclusion volume of an observation taken at latitude lat, longitude lon, pressure p and time t is the set of all locations (lat’, lon’, p’, t’) for which all of the following conditions are met:the geodesic distance between (lat, lon) and (lat’, lon’) is smaller than min_horizontal_spacing
|p - p’| < min_vertical_spacing
|t - t’| < min_time_spacing.
ellipsoid
: the exclusion volume of an observation taken at latitude lat, longitude lon, pressure p and time t is the set of all locations (lat’, lon’, p’, t’) for which the following condition is met:geodesic_distance((lat, lon), (lat’, lon’))^2 / min_horizontal_spacing^2 + (p - p’)^2 / min_vertical_spacing^2 + (t - t’)^2 / min_time_spacing^2 < 1.
Default:
cylinder
.
Observation categories:
category_variable
: Variable storing integer-valued IDs associated with observations. Observations belonging to different categories are thinned separately. If not set, all observations are thinned together.
Selection of observations to retain:
priority_variable
: Variable storing observation priorities. An observation will not be retained if it lies within the exclusion volume of an observation with a higher priority.As noted in the documentation of
min_horizontal_spacing
, the exclusion volume size must be a (weakly) monotonically decreasing function of observation priority, i.e. the exclusion volumes of all observations with the same priority must have the same size, and the exclusion volumes of lower-priority observations must be at least as large as those of higher-priority ones.If this parameter is not set, all observations are assumed to have equal priority.
shuffle
: If true, observations will be randomly shuffled before being inspected as candidates for retaining. Default: true.Note: It is recommended to leave shuffling enabled in production code, since the performance of the spatial point index (kd-tree) used in the filter’s implementation may be degraded if observation locations are ordered largely monotonically (and random shuffling essentially prevents that from happening).
random_seed
: Seed with which to initialize the random number generator used to shuffle the observations ifshuffle
is set to true.If omitted, a seed will be generated based on the current (calendar) time.
select median
: If true, the retained observation is the one that has the median value of the observations in the exclusion volume. If there is an even number of observations in the exclusion volume, the first of the central pair is retained unlesswrite median
is set to true. The name of one (and only one) filter variable must be passed to the filter (see example 3). Default: false.Note: the overlap of exclusion volumes of retained observations may mean that the shape and size of the exclusion volume from which the median is calculated is not consistent with what might be expected. The code was implemented to enable consistency with a Met Office legacy code base and is expected to be deprecated once this requirement no longer exists.
write median
: If true, the median observation value in each exclusion volume is written to theDerivedObsValue
group. Values that contributed to the median but which were not the median are set to missing. If there are an even number of observations contributing to the median, the value that is written is the mean of the two central observations. Default: false.Note: this can only be used if
select median
is set to true.
Example 1¶
With the following parameters, observations are thinned by horizontal position only. The exclusion volume size depends on the observation priority. Each scan is thinned separately.
- filter: Poisson Disk Thinning
min_horizontal_spacing: {"0": 600, "1": 200} # priority -> km
category_variable:
name: MetaData/scan_index
priority_variable:
name: MetaData/thinningPriority
random_seed: 12345
Example 2¶
With the following parameters, observations are thinned by the horizontal position, vertical position and time. The exclusion volumes are ellipsoidal. Shuffling is disabled.
- filter: Poisson Disk Thinning
min_horizontal_spacing: 1000 # km
min_vertical_spacing: 10000 # Pa
min_time_spacing: PT1H
exclusion_volume_shape: ellipsoid
shuffle: false
Example 3¶
With the following parameters, observations are thinned by selecting the median observation within an exclusion volume.
- filter: Poisson Disk Thinning
filter variables:
- name: waterTemperature
min_horizontal_spacing: 50
shuffle: false
select median: true
Stuck Check Filter¶
This filter thins observations by iterating over them by station and flagging each observation that is part of a “streak” of sequential observations. The first condition for a “streak” is that the observation values are the same over a certain count of sequential observations. The second condition is either (a) that this set of observations is longer than a user-defined duration or (b) that it covers the full trajectory of a station.
Alternatively, a percentage can be specified, where if observation values are the same over more than this percentage of all non-missing values in a record, they are flagged as a streak. See here for an example of using the obs space.obsdatain.obsgrouping
YAML option to group observations into records. With no obsgrouping, the full set of valid observations counts as a single record.
The observation values which are used for evaluation of whether a “streak” exists are the
filter variables
. If multiple filter variables
are present, then each variable is
considered independently. In other words the filter flags observations based on each variable,
independent to the other variables. Any observations that form streaks in at least one
variable will be flagged.
The following YAML parameters are supported:
filter variables
: the variables to use to classify observations as “stuck”. This required parameter must be entered as a string vector.number stuck tolerance
: the maximum number of observations in a row with the same observation value before its classification as a potential streak is made. This required parameter must be entered as a non-negative integer.time stuck tolerance
: the maximum time duration before a potential streak is rejected This required parameter must be entered in ISO 8601 duration format. Ifnumber stuck tolerance
is exceeded and all of the station’s observations are part of the same streak,time stuck tolerance
is ignored and all of the observations are rejected regardless of the duration.percentage stuck tolerance
: the maximum percentage out of all non-missing values in each record, above which this many observations with the same value in a row are rejected as a streak. The percentage is first converted to a number for each record; if the number is less than 2, no observations are flagged in that record (otherwise every observation would be flagged as a streak of 1).
If percentage stuck tolerance
is defined, number stuck tolerance
and time stuck tolerance
must NOT be defined.
If number stuck tolerance
and time stuck tolerance
are defined, percentage stuck tolerance
must NOT be defined.
Example 1¶
With the following parameters, a “streak” of observations is defined as sequential observations with identical air temperature measured values. All observations in the streak will be flagged if the streak (a) consists of more than 2 observations and (b) lasts longer than 2 hours or consists of the full set of observations from the station.
- filter: Stuck Check:
filter variables: [airTemperature]
number stuck tolerance: 2
time stuck tolerance: PT2H
Example 2¶
With the following parameters, 2 types of streaks will be identified independently and the observations will be flagged accordingly if either of the following observed values are classified as “stuck”: air temperature and air pressure.
- filter: Stuck Check:
filter variables: [airTemperature, pressure]
number stuck tolerance: 2
time stuck tolerance: PT2H
Say we have 5 observations each taken an hour apart. Let the air temperature values equal: 274, 274, 274, 275, 275; and the air pressure values equal 4, 4, 5, 5, 5. In this case, all of the observations would be rejected.
Example 3¶
With the following parameters, a “streak” of observations is defined as sequential observations with identical air temperature measured values. A streak is rejected if it is longer than 50 % of the record.
- filter: Stuck Check:
filter variables: [airTemperature]
percentage stuck tolerance: 50
Say we have 5 observations in one record: 274, 274, 274, 275, 275; and 4 in another: 274, 274, 275, 275. The first 3 observations in the first record form a streak and are rejected (3 is greater than 50 % of 5). They are the only ones rejected. This is because the next record comprises 2 streaks each 2 observations long, and 2 is exactly 50 % of 4, not greater than 50 % of 4; therefore neither clear the threshold for rejection.
Difference Check Filter¶
This filter will compare the difference between a reference variable and a second variable and assign a QC flag if the difference is outside of a prescribed range.
For example:
- filter: Difference Check
reference: ObsValue/brightnessTemperature_8
value: ObsValue/brightnessTemperature_9
minvalue: 0
The above YAML is checking the difference between ObsValue/brightnessTemperature_9
and ObsValue/brightnessTemperature_8
and rejecting negative values.
In psuedo-code form:
if (ObsValue/brightnessTemperature_9 - ObsValue/brightnessTemperature_8 < minvalue) reject_obs()
- The options for YAML include:
minvalue
: the minimum value the differencevalue - reference
can be. Set this to 0, for example, and all negative differences will be rejected.maxvalue
: the maximum value the differencevalue - reference
can be. Set this to 0, for example, and all positive differences will be rejected.threshold
: the absolute value the differencevalue - reference
can be (sign independent). Set this to 10, for example, and all differences outside of the range from -10 to 10 will be rejected.
Note that threshold
supersedes minvalue
and maxvalue
in the filter.
Derivative Check Filter¶
This filter will compute a local derivative over each observation record and assign a QC flag if the derivative is outside of a prescribed range.
- By default, this filter will compute the local derivative at each point in a record.
For the first location (1) in a record:
dy/dx = (y(2)-y(1))/(x(2)-x(1))
For the last location (n) in a record:
dy/dx = (y(n)-y(n-1))/(x(n)-x(n-1))
For all other locations (i):
dy/dx = (y(i+1)-y(i-1))/(x(i+1)-x(i-1))
Alternatively if one wishes to use a specific range/slope for the entire observation record, i1
and i2
can be defined in the YAML.
For this case, For all locations in the record:
dy/dx = (y(i2)-y(i1))/(x(i2)-x(i1))
Note that this filter really only works/makes sense for observations that have been sorted by the independent variable and grouped by some other field.
An example:
- filter: Derivative Check
independent: datetime
dependent: pressure
minvalue: -50
maxvalue: 0
passedBenchmark: 238 # number of passed obs
The above YAML is checking the derivative of pressure
with respect to datetime
for a radiosonde profile and rejecting observations where the derivative is positive or less than -50 Pa/sec.
- The options for YAML include:
independent
: the name of the independent variable (dx
)dependent
: the name of the dependent variable (dy
)minvalue
: the minimum value the derivative can be without the observations being rejectedmaxvalue
: the maximum value the derivative can be without the observations being rejectedi1
: the index of the first observation location in the record to usei2
: the index of the last observation location in the record to use
- A special case exists for when the independent variable is ‘distance’, meaning the dx is computed from the difference of latitude/longitude pairs converted to distance.
Additionally, when the independent variable is ‘datetime’ and the dependent variable is set to ‘distance’, the derivative filter becomes a speed filter, removing moving observations when the horizontal speed is outside of some range.
Spike and Step Check Filter¶
This filter goes through each record and flags observations where the value of the dependent variable (as specified by the user) is classified as a spike or step relative to adjacent points along the (user-specified) independent variable, e.g. profiles of ocean temperature against depth. (Only tested for data grouped into records - set grouping with the obs space.obsdatain.obsgrouping.group_variable
YAML option. An example of its use can be found in the Profile consistency checks section.)
A spike is a point whose dependent variable value differs from the adjacent points on either side of it by more than a given tolerance. A step is when two adjacent points’ dependent variable values differ from each other by more than the tolerance. The tolerance can vary along the independent variable (more below). Points only count as spikes or steps if they are isolated, and not part of a trend spanning multiple points. A spike results in the point in question being flagged; a step results in both points on either side of the step being flagged.
Required parameters:
independent
: the independent (\(x\)) variable, e.g. depth in ocean profiles. (Must be float type.)dependent
: the dependent (\(y\)) variable, e.g. temperature or salinity in ocean profiles. (Must be float type.) NB: only one of each must be given.tolerance.nominal value
: the tolerance value where \(x = 0\). The tolerance is the value against which adjacent differences \(dy\) in the dependent variable are compared, to determine whether points are spikes or steps.
Optional parameters:
count spikes
: If false, do not count spikes. Default: true.count steps
: If false, do not count steps. Default: true.tolerance.threshold
: For checking conditions for a large spike or large consistent gradient. The smallertolerance.threshold
is, the more symmetrical a spike must be to be considered a spike, and the more tightly the point must be aligned with the points on either side to be considered a consistent gradient (in which case the point would not be considered a spike). Default: \(0.5\).tolerance.gradient
: \(dy/dx\) tolerance. If a point doesn’t meet the conditions for a large spike, it may yet count as a small spike if its gradient on either side exceeds the gradient tolerance (plus other conditions). Default: numeric maximum, i.e. nothing can exceed the gradient tolerance - small spikes are not counted if this option is left out.tolerance.gradient x resolution
: precision of \(dx\) when calculating \(dy/dx\). Default: epsilon, i.e. the smallest possible to avoid a divide by \(0\) error.tolerance.factors
andtolerance.x boundaries
: vector floats of respectively the multiplier factors and \(x\)-points which when joined by straight line segments, determine the tolerance against \(x\): tolerance equals nominal tolerance multiplied by this line segment function thus defined. Either bothfactors
andx boundaries
must be given and of the same size, or neither given.x boundaries
must be given in order of increasing \(x\) (andfactors
must match up with them). Default: nominal tolerance applies across whole \(x\) domain if neither are given.boundary layer.x range
: a 2-element vector[min, max]
defining the \(x\)-domain,min
\(\le x <\)max
, such that within it, the tolerance is modified (seestep tolerance range
below). Default:{0, 0}
.boundary layer.step tolerance range
: if \(x\) is within the boundary layer defined byboundary layer.x range
, then if the adjacent difference \(dy\) is within the range defined by this 2-element vectorstep tolerance range
, it cannot count as a step. Default:{0, 0}
.boundary layer.maximum x interval
: a 2-element vector [within, outside] such that if the spacing \(dx\) between two points is greater than the first element (when \(x\) within theboundary layer.x range
) or the second (when \(x\) outside theboundary layer.x range
), then ignore the corresponding \(dy\); do not check if it is a spike or step. Default: {numeric max, numeric max}, i.e. check every observation.
A call to Spike and Step Check MUST be preceded by creating Diagnostic Flags for the dependent variables in question, and the flags MUST be named “spike” and “step”:
- filter: Create Diagnostic Flags
filter variables:
- name: waterTemperature
- name: salinity
flags:
- name: spike
initial value: false
- name: step
initial value: false
This is because the Spike and Step Check sets these flags separately within the code itself. The flags thus set can then be used in the YAML, e.g. to count how many spikes and steps are in each record, and reject entire records whose sum of spikes and steps exceeds a given threshold. An example of this can be found in qc_spike_and_step_check.yaml
An example of applying the Spike and Step Check filter:
- filter: Spike and Step Check
filter variables:
- name: ObsValue/waterTemperature
dependent: ObsValue/waterTemperature # dy/
independent: MetaData/depthBelowWaterSurface # dx
count spikes: true
count steps: true
tolerance:
nominal value: 10 # K, in the case of temperature (not real value)
gradient: 0.1 # K/m - if dy/dx greater, could be a spike
gradient x resolution: 10 # m - can't know dx to better precision
factors: [1.0, 1.0, 0.5, 0.5, 0.1] # multiply tolerance, for ranges bounded by...
x boundaries: [0, 200, 300, 600, 600] # ...these values of x (depth in m)
boundary layer:
x range: [0.0, 300.0] # when bounded by these x values (depth in m)...
step tolerance range: [-1.0, -2.0] # ...relax tolerance for steps in boundary layer...
maximum x interval: [50.0, 100.0] # ...and ignore level if dx greater than this
action:
name: reject
In this case, both spikes and steps are counted for waterTemperature
profiles, and rejected for waterTemperature
only, since that is the only filter variable
listed. If other filter variables were listed, they would all be rejected at locations where spikes and steps in waterTemperature
(the dependent variable) are found. If looking for spikes and steps in other variables, the Spike and Step Check needs to be called again on each of them as the dependent variable separately.
The tolerance value as a function of \(x\), is the nominal value
(\(10\) K) multiplied by the tolerance factor function. In this example, the filter is more sensitive to spikes and steps the deeper you go. Note that tolerance function is constant at the last value in factors
when \(x\) exceeds the last value in x boundaries
. For jumps in tolerance such as at \(x = 600\) m, the value on the left hand side (smaller \(x\)) is used.
The temperature gradient (in K/m) is computed for each profile, and any point that does not count as a large spike but whose gradient on either side exceeds the gradient tolerance \(0.1\) K/m (amongst other conditions), is counted as a small spike. (The flagging does not distinguish between large and small spikes, they are all spikes.) For any points separated by less than \(10\) m (gradient x resolution
), the gradient is computed as the dependent variable adjacent difference \(dy\) divided by \(10\) m, preserving the sign of \(dx\).
The boundary layer is defined by boundary layer.x range
to be 0
\(\le x <\)300
m. When \(x\) is within the boundary layer, a step is unflagged if \(dy\) is within the step tolerance range
multiplied by the tolerance function - as shown in the figure below:
If two adjacent points have \(y\) value differing by more than the tolerance at their level \(x\), and if neither is a spike nor part of a large consistent gradient, they are flagged as a step (i.e. if \(dy\) is in the dark grey region). However, the condition is more lenient within the boundary layer, 0
\(\le x <\)300
m: the points are accepted as not a step if their \(dy\) falls within the light grey region, which is \(-1\) to \(-2\) times the tolerance (boundary layer.step tolerance range: [-1.0, -2.0]
).
Additionally, if the spacing \(dx\) between adjacent points is \(> 50\) m while \(x\) within the boundary layer, then the corresponding \(dy\) is skipped when checking for spikes and steps. That is, points spaced too far apart cannot be confidently flagged as spikes or steps. Outside of the boundary layer, the condition is applied when \(dx > 100\) m, as boundary layer.maximum x interval: [50.0, 100.0]
.
The reason for the boundary layer
options section is to accomodate a thermocline or halocline in the ocean, where a large negative gradient is expected and is not cause to flag a step, unless very large indeed, or large and positive. There is no impact on spike flagging. If the section is left out, the rest of the code applies, there is no relaxation of tolerance conditions anywhere.
Note that this filter does not currently support use of “where” clauses.
Track Check Filter¶
This filter checks tracks of mobile weather stations, rejecting observations inconsistent with the rest of the track.
Each track is checked separately. The algorithm performs a series of sweeps over the observations from each track. For each observation, multiple estimates of the instantaneous speed and (optionally) ascent/descent rate are obtained by comparing the reported position with the positions reported during a number a nearby (earlier and later) observations that haven’t been rejected in previous sweeps. An observation is rejected if a certain fraction of these estimates lie outside the valid range. Sweeps continue until one of them fails to reject any observations, i.e. the set of retained observations is self-consistent.
Note that this filter was originally written with aircraft observations in mind. However, it can potentially be useful also for other observation types.
The following YAML parameters are supported:
temporal_resolution
: Assumed temporal resolution of the observations, i.e. absolute accuracy of the reported observation times. Default: PT1M.spatial_resolution
: Assumed spatial resolution of the observations (in km), i.e. absolute accuracy of the reported positions.Instantaneous speeds are estimated conservatively with the formula
speed_estimate = (reported_distance - spatial_resolution) / (reported_time + temporal_resolution).
The default spatial resolution is 1 km.
num_distinct_buddies_per_direction
,distinct_buddy_resolution_multiplier
: Control the size of the set of observations against which each observation is compared.Let O_i (i = 1, …, N) be the observations from a particular track ordered chronologically. Each observation O_i is compared against m observations immediately preceding it and n observations immediately following it. The number m is chosen so that {O_{i-m}, …, O_{i-1}} is the shortest sequence of observations preceding O_i that contains
num_distinct_buddies_per_direction
observations distinct from O_i that have not yet been rejected. Two observations taken at times t and t’ and locations x and x’ are deemed to be distinct if the following conditions are met:|t’ - t| >
distinct_buddy_resolution_multiplier
*temporal_resolution
|x’ - x| >
distinct_buddy_resolution_multiplier
*spatial_resolution
Similarly, the number n is chosen so that {O_{i+1}, …, O_{i+n)} is the shortest sequence of observations following O_i that contains
num_distinct_buddies_per_direction
observations distinct from O_i that have not yet been rejected.Both parameters default to 3.
max_climb_rate
: Maximum allowed rate of ascent and descent (in Pa/s). If not specified, climb rate checks are disabled.max_speed_interpolation_points
: Encoding of the function mapping air pressure (in Pa) to the maximum speed (in m/s) considered to be realistic.The function is taken to be a linear interpolation of a series of (pressure, speed) points. The pressures and speeds at these points should be specified as keys and values of a JSON-style map. Owing to a bug in the eckit YAML parser, the keys must be enclosed in quotes. For example,
max_speed_interpolation_points: { "0": 900, "100000": 100 }
encodes a linear function equal to 900 m/s at 0 Pa and 100 m/s at 100000 Pa.
rejection_threshold
: Maximum fraction of climb rate or speed estimates obtained by comparison with other observations that are allowed to fall outside the allowed ranges before an observation is rejected. Default: 0.5.station_id_variable
: Variable storing string- or integer-valued station IDs. Observations taken by each station are checked separately.If not set and observations were grouped into records when the observation space was constructed, each record is assumed to consist of observations taken by a separate station. If not set and observations were not grouped into records, all observations are assumed to have been taken by a single station.
Note: the variable used to group observations into records can be set with the
obs space.obsdatain.obsgrouping.group_variable
YAML option.
Example:
- filter: Track Check
temporal_resolution: PT30S
spatial_resolution: 20 # km
num_distinct_buddies_per_direction: 3
distinct_buddy_resolution_multiplier: 3
max_climb_rate: 200 # Pa/s
max_speed_interpolation_points: {"0": 1000, "20000": 400, "110000": 200} # Pa: m/s
rejection_threshold: 0.5
station_id_variable: MetaData/stationIdentification
Ship Track Check Filter¶
This filter checks tracks of mobile weather stations, rejecting observations inconsistent with the
rest of the track. It differs from Track Check Filter
in that it only considers
inconsistencies in the lat-lon and time dimensions of each observation.
Each track is checked separately. The algorithm starts by performing the following calculations between consecutive observations:
Distances between each observation
The speed between each observation
Angles of the track formed by each triplet of consecutive observations
Various track statistics will be calculated:
The number of track segments (tracks between two consecutive observations) with less than an hour between the two observations.
The number of track segments which exceed a user-defined maximum speed.
The average speed of all track segments which do not fall into categories (1) and (2).
The number of track angles which are greater than or equal to 90 degrees.
If (1), (2), and (4) exceed a percentage of the total observations and the user-defined
early break check
setting is enabled, then the track is skipped over, with all
observations left unflagged.
If the filter proceeds, observations are flagged iteratively by removing one of the two
observations forming the fastest segment, until either (a) the segment with the fastest speed is
less than a user-defined max speed (m/s)
and the angles formed by this segment with its
adjacent segments are both less than 90 degrees or (b) the segment with the fastest speed is less
than 80 percent of max speed (m/s)
.
Numerous criteria are applied to choose which of the two observations forming the fastest track
segment should be removed, and track statistic (3) is heavily used in this assessment.
If the percentage of observations rejected rises greater than a
user-defined rejection threshold
fraction, the full track is rejected.
The following YAML parameters are supported:
temporal resolution
: Assumed temporal resolution of the observations (i.e. absolute accuracy of the reported observation times), used for the speed calculations. Required parameter.spatial resolution (km)
: Assumed spatial resolution of the observations (in km), i.e. absolute accuracy of the reported positions. Required parameter.max speed (m/s)
: The maximum speed (in m/s) between any two observations, above which requires the rejection of one of the comprising observations. Required parameter.rejection threshold
: The maximum fraction of track observations to be rejected, above which causes the full track to be rejected. Required parameter.early break check
: A boolean setting that determines if a track should be skipped (unfiltered) if its count of track statistics (1), (2), and (4) are too large a percentage of the total number of observations. Required parameter.input category
: The type of input source. If a static source such as BUOY, track statistic (1) will not be considered in deciding if a track should be skipped. Default: SHPSYN. The supported sources are: LNDSYN, SHPSYN, BUOY, MOBSYN, OPENROAD, TEMP, BATHY, TESAC, BUOYPROF, LNDSYB, and SHPSYB.records_are_single_obs
: If true, then treat each record as a single location within the track - accept or reject entire records according to the above criteria. Default: false. If option set to true while observations are not grouped into records, an error will be thrown. Set grouping with theobs space.obsdatain.obsgrouping.group_variable
YAML option. An example of its use can be found in the Profile consistency checks section.station_id_variable
: The variable that defines the tracks - note that this may be different from the obs grouping variable(s) that define records (there may be multiple records per track). If not given and ifrecords_are_single_obs: true
OR if not given while not grouped into records at all, then all the observations (records or individual) are treated as belonging to a single continuous track. However, if not given while grouped into records butrecords_are_single_obs: false
, then each record is treated as a separate track.
Example:
- filter: Ship Track Check
temporal resolution: PT30S
spatial resolution (km): .1
max speed (m/s): 3.0
rejection threshold: 0.5
station_id_variable:
name: MetaData/stationIdentification
records_are_single_obs: true
Met Office Buddy Check Filter¶
This filter cross-checks observations taken at nearby locations against each other, updating their gross error probabilities (PGEs) and rejecting observations whose PGE exceeds a threshold specified in the filter parameters. For example, if an observation has a very different value than several other observations taken at nearby locations and times, it is likely to be grossly in error, so its PGE is increased. PGEs obtained in this way can be taken into account during variational data assimilation to reduce the weight attached to unreliable observations without necessarily rejecting them outright.
The YAML parameters supported by this filter are listed below.
General parameters:
filter variables
(a standard parameter supported by all filters): List of the variables to be checked. Surface data, single-level and multi-level variables. are supported. Variables can be either scalar or vector (with two Cartesian components, such as the eastward and northward wind components). In the latter case the two components need to be specified one after the other in thefilter variables
list, with the first component having thefirst_component_of_two
option set to true. Example:filter variables: - name: airTemperature - name: windEastward options: first_component_of_two: true - name: windNorthward
rejection_threshold
: Observations will be rejected if the gross error probability lies at or above this threshold. Default: 0.5.traced_boxes
: A list of quadrangles bounded by two meridians and two parallels. Tracing information (potentially useful for debugging) will be output for observations lying within any of these quadrangles. Example:traced_boxes: - min_latitude: 30 max_latitude: 45 min_longitude: -180 max_longitude: -150 - min_latitude: -45 max_latitude: -30 min_longitude: -180 max_longitude: -150
Default: empty list.
Buddy pair identification:
num_levels
: Number of levels. Optional parameter.This would not be specified for surface fields. It should be set to 1 for single level fields and be set to >1 for multi-level fields (i.e. corresponding to the number of levels).
search_radius
: Maximum distance between two observations that may be classified as buddies, in km. Default: 100 km.station_id_variable
: Variable storing string- or integer-valued station IDs.If not set and observations were grouped into records when the observation space was constructed, each record is assumed to consist of observations taken by a separate station. If not set and observations were not grouped into records, all observations are assumed to have been taken by a single station.
Note: the variable used to group observations into records can be set with the
obs space.obsdatain.obsgrouping.group_variable
YAML option. An example of its use can be found in the Profile consistency checks section above.override_obs_grouping
: Override observation space grouping (defaulttrue
).If the observation space has been divided into records according to at least one grouping variable then, by default, the multi-level buddy check will be performed. However, if the parameter num_levels is equal to 1, the division into records is disregarded if the parameter override_obs_grouping is set to true. In that case individual observations are treated separately in the buddy check. The value of override_obs_grouping only has an effect if num_levels has been set to 1. In all other cases it is ignored.
num_zonal_bands
: Number of zonal bands to split the Earth’s surface into when building a search data structure.Note: Apart from the impact on the speed of buddy identification, both this parameter and
sort_by_pressure
affect the order in which observations are processed and thus the final estimates of gross error probabilities, since the probability updates made when checking individual observation pairs are not commutative.Default: 24.
sort_by_pressure
: Whether to include pressure in the sorting criteria used when building a search data structure, in addition to longitude, latitude and time. See the note next tonum_zonal_bands
. Default: false.max_total_num_buddies
: Maximum total number of buddies of any observation.Note: In the context of this parameter,
max_num_buddies_from_single_band
andmax_num_buddies_with_same_station_id
, the number of buddies of any observation O is understood as the number of buddy pairs (O, O’) where O’ != O. This definition facilitates the buddy check implementation (and makes it compatible with the original version from the OPS system), but is an underestimate of the true number of buddies, since it doesn’t take into account pairs of the form (O’, O).Default: 15.
max_num_buddies_from_single_band
: Maximum number of buddies of any observation belonging to a single zonal band. See the note next tomax_total_num_buddies
. Default: 10.max_num_buddies_with_same_station_id
: Maximum number of buddies of any observation sharing that observation’s station ID. See the note next tomax_total_num_buddies
. Default: 5.use_legacy_buddy_collector
: Set to true to identify pairs of buddy observations using an algorithm reproducing exactly the algorithm used in Met Office’s OPS system, but potentially skipping some valid buddy pairs. Default: false.
Control of gross error probability updates:
horizontal_correlation_scale
: Encoding of the function that maps the latitude (in degrees) to the horizontal correlation scale (in km).The function is taken to be a piecewise linear interpolation of a series of (latitude, scale) points. The latitudes and scales at these points should be specified as keys and values of a JSON-style map. Owing to a limitation in the eckit YAML parser (https://github.com/ecmwf/eckit/pull/21), the keys must be enclosed in quotes. For example,
horizontal_correlation_scale: { "-90": 200, "90": 100 }
encodes a function varying linearly from 200 km at the south pole to 100 km at the north pole.
Default:
{ "-90": 100, "90": 100 }
, i.e. a constant function equal to 100 km everywhere.horizontal_correlation_scale_2
(optional): In the same format ashorizontal_correlation_scale
, define a second latitude-dependent length scale to use in the calculation of the PGE adjustment. E.g. the oceans use mesoscale and synoptic length scales. N.B.: for this option to be used, bothanisotropy
,anisotropy_2
andbackground_error_group_2
must be specified (see below).anisotropy
: Latitude-dependent anisotropy factor, specified in the same format ashorizontal_correlation_scale
. Must be given if using 2-scale Buddy Check. A factor of 1 means isotropic; >1 means a buddy pair of observations a given distance apart would cause a greater PGE increase if the line joining them is aligned closer to an E-W line than to N-S. If not specified, the filter reverts to 1-scale isotropic.anisotropy_2
: As foranisotropy
, but to match the second length scale. Must be given if using 2-scale Buddy Check. If not specified, the filter reverts to 1-scale isotropic.temporal_correlation_scale
: Temporal correlation scale. Default: PT6H.vertical_correlation_scale
: Vertical correlation scale which relates to the ratio of pressures. Default: 6.damping_factor_1
Parameter used to “damp” gross error probability updates using method 1 described in section 3.8 of the OPS Scientific Documentation Paper 2 to make the buddy check better-behaved in data-dense areas. See the reference above for the full description. Default: 1.0.damping_factor_2
Parameter used to “damp” gross error probability updates using method 2 described in section 3.8 of the OPS Scientific Documentation Paper 2 to make the buddy check better-behaved in data-dense areas. See the reference above for the full description. Default: 1.0.background_error_group
: Group name of the background error variable. Default:ObsDiag
.background_error_suffix
: Suffix of the background error variable. Default:_background_error
, i.e. if neither the group nor suffix are specified, the background error is assumed to beObsDiag/<var>_background_error
for the corresponding filter variable<var>
.background_error_group_2
: As forbackground_error_group
, but for the second length scale. If not specified, the filter reverts to 1-scale isotropic.background_error_suffix_2
: As forbackground_error_suffix
, but for the second length scale. Default:""
, e.g. if not specified butbackground_error_group_2: MesoscaleError
, then the second background error is assumed to beMesoscaleError/<var>
for the filter variable<var>
.
Example:
- filter: Met Office Buddy Check:
filter variables:
- name: windEastward
options:
first_component_of_two: true
- name: windNorthward
- name: airTemperature
rejection_threshold: 0.5
traced_boxes: # trace all observations
- min_latitude: -90
max_latitude: 90
min_longitude: -180
max_longitude: 180
search_radius: 100 # km
station_id_variable:
name: MetaData/stationIdentification
num_zonal_bands: 24
sort_by_pressure: false
max_total_num_buddies: 15
max_num_buddies_from_single_band: 10
max_num_buddies_with_same_station_id: 5
use_legacy_buddy_collector: false
horizontal_correlation_scale: { "-90": 100, "90": 100 }
temporal_correlation_scale: PT6H
damping_factor_1: 1.0
damping_factor_2: 1.0
Implementation Notes¶
The implementation of this filter consists of four steps: sorting, buddy pair identification, PGE update and observation flagging. Observations are grouped into zonal bands and sorted by (a) band index, (b) longitude, (c) latitude, in descending order, (d) pressure (if the sort_by_pressure
option is on), and (e) datetime. Observations are then iterated over, and for each observation a number of nearby observations (lying no further than search_radius
) are identified as its buddies. The size and “diversity” of the list of buddy pairs can be controlled with the max_total_num_buddies
, max_num_buddies_from_single_band
and max_num_buddies_with_same_station_id
options. Subsequently, the PGEs of the observations forming each buddy pair are updated. Typically, the PGEs are decreased if the signs of the innovations agree and increased if they disagree. The magnitude of this change depends on the background error correlation between the two observation locations, the error estimates of the observations and background values, and the prior PGEs of the observations: the PGE change is the larger, the stronger the correlation between the background errors and the narrower the error margins. Once all buddy pairs have been processed, observations whose PGEs exceed the specified rejection_threshold
are flagged.
In calculation of the background error correlation, for both surface and multi-level fields, a vertical correlation of 1 is assumed. For single-level data, the estimate of the background error correlation depends upon the ratio of pressures between each pair of observations.
History Check Filter¶
This filter runs the Ship Track Check filter and/or the Stuck Check filter (depending on the observation type) on an auxiliary obs space. The auxiliary obs space should be a superset of the original obs space, with an earlier start time than the assimilation window and either the same, or optionally a later, end time. The equivalent observations to those which were flagged in the auxiliary obs space are then flagged in the original obs space. This filter is motivated by the fact that the Ship Track Check and Stuck Check filters both rely on viewing observations within the context of their surrounding observations. Thus, this filter makes the underlying filters more reliable for observations close to the temporal boundaries of the assimilation window. The filters are run independently: any observations within the assimilation window flagged by either of the sub-filters will be flagged by this filter.
The following YAML parameters are supported:
input category
: Surface observation subtype which determines if the ship track check and/or the stuck check filters should be run. Supported options are LNDSYN, SHPSYN, BUOY, MOBSYN, OPENROAD, TEMP, BATHY, TESAC, BUOYPROF, LNDSYB, and SHPSYB. Required parameter.time before start of window
: The duration of time before the start of the assimilation window to collect for the history check. This required parameter must be entered in ISO 8601 duration format.time after end of window
: The duration of time after the end of the assimilation window to collect for the history check. This optional parameter must be entered in ISO 8601 duration format.ship track check parameters
: The options for running the ship track check filter, should the subtype not be LNDSYN or LNDSYB. These must be filled in for the ship track check filter to run. The particular sub-parameters to fill in aretemporal resolution
,spatial resolution (km)
,max speed (m/s)
,rejection threshold
, andearly break check
. Please refer to the Ship Track Check filter documentation for additional details on how each of these sub-parameters works. Optional parameter.stuck check parameters
: The options for running the stuck check filter, should the subtype not be TEMP, BATHY, TESAC, or BUOYPROF. These must be filled in for the stuck check filter to run. The particular sub-parameters to fill in arenumber stuck tolerance
andtime stuck tolerance
. Please refer to the Stuck Check Filter documentation for additional details on how each of these sub-parameters works. Optional parameter.obs space
: The options used to create the auxiliary obs space that is determined by the observation subtype. A user needs to enter the following fields: name, simulated variables, and obsdatain. It additionally may be necessary to specify the distribution as InefficientDistribution. This prevents the observations from distributing to different processors between the original obs space and the auxiliary obs space, which could cause in-window observations flagged in the auxiliary obs space to be left unflagged in the original obs space. Required parameter.station_id_variable
: Variable storing string- or integer-valued station IDs. Observations taken by each station are checked separately. Applies to assimilation observation space. Optional parameter.If not set and observations were grouped into records when the observation space was constructed, each record is assumed to consist of observations taken by a separate station. If not set and observations were not grouped into records, all observations are assumed to have been taken by a single station.
Example:¶
With the following parameters, the history check filter will be run on the obs space explicitly
simulated, using the generated air temperature values for the stuck check and the lat-lon-dt values
for the ship track check. time before start of window
set as 3 hours will cause the
filters to run from 3 hours before the start of the assimilation window (regardless of the time
range present in the auxiliary obs space).
- filter: History Check
input category: 'SHPSYN'
time before start of window: PT3H
filter variables: [airTemperature]
stuck check parameters:
number stuck tolerance: 2
time stuck tolerance: PT2H
ship track check parameters:
temporal resolution: PT1S
spatial resolution (km): 0.001
max speed (m/s): 0.01
rejection threshold: 0.5
early break check: false
station_id_variable:
name: MetaData/stationIdentification
obs space:
name: Ship
distribution: InefficientDistribution
simulated variables: [airTemperature]
generate:
list:
lats: [-37.1, -37.2, -37.3]
lons: [82.5, 82.5, 82.5]
datetimes: [ '2010-01-01T00:00Z', '2010-01-01T01:30Z', '2010-01-01T03:00Z']
obs errors: [1.0]
Variable Assignment Filter¶
This “filter” (it is not a true filter; rather, a “processing step”) assigns specified values to
specified variables at locations selected by the where
statement, or at all locations if
the where
keyword is not present. The where operator
parameter can be used to
specify the logical operator used to combine conditions used in the where
statement.
The possible values are and
(the default) and or
.
Note that it is possible to use the where operator
option without the where
statement.
The option has no impact in that case.
The assigned values can be constants, existing ObsSpace variables or vectors generated by
ObsFunctions. If the variables don’t exist yet, they are created; in this case locations not
selected by the where
statement are initialized with missing-value markers.
The values assigned to individual variables are specified in the assignments
list in the
YAML file. Each element of this list can contain the following options:
name
: Name of the variable to which new values should be assigned. The variable can be from any group except forObsValue
(useDerivedObsValue
instead).channels
: (Optional) Set of channels to which new values should be assigned.value
: Value to be assigned to the specified variable. If this parameter is set to the stringmissing
, the variable will be set to the relevant missing value at all locations that pass thewhere
clause. The missing value to use is deduced from the type of the variable. Note it is therefore not possible to assign the stringmissing
to a variable because it will be automatically converted to the missing string signifier. Exactly one of thevalue
,source variable
andfunction
options must be present.source variable
: Variable that should be copied into the destination variable (specified in thename
option). Exactly one of thevalue
,source variable
andfunction
options must be present.function
: An ObsFunction that should be evaluated and assigned to the specified variable. Exactly one of thevalue
,source variable
andfunction
options must be present.type
: Type (int
,float
,string
ordatetime
) of the variable to which new values should be assigned. This option only needs to be provided if the variable doesn’t exist yet. If this option is provided and the variable already exists, its type must match the value of this option, otherwise an exception is thrown.
It is possible to assign variables or ObsFunctions of type int
to variables of type
float
and vice versa. No other type conversions are supported.
If the modified variable belongs to the DerivedObsValue
group and is a observed variable,
QC flags previously set to missing
are reset to pass
at locations where a valid
observed value has been assigned. Conversely, QC flags previously set to pass
are reset to
missing
at locations where the observed value has been set to missing.
Example 1¶
Create new variables GrossErrorProbability/airTemperature
and
GrossErrorProbability/relativeHumidity
and set them to 0.1 at all locations.
- filter: Variable Assignment
assignments:
- name: GrossErrorProbability/airTemperature
type: float # type must be specified if the variable doesn't already exist
value: 0.1
- name: GrossErrorProbability/relativeHumidity
type: float
value: 0.1
Example 2¶
Set GrossErrorProbability/airTemperature
to 0.05 at all locations in the tropics.
- filter: Variable Assignment
where:
- variable:
name: MetaData/latitude
minvalue: -30
maxvalue: 30
assignments:
- name: GrossErrorProbability/airTemperature
value: 0.05
Example 3¶
Set GrossErrorProbability/relativeHumidity
to values computed by an ObsFunction
(0.1 in the southern extratropics and 0.05 in the northern extratropics, with a linear
transition in between).
- filter: Variable Assignment
assignments:
- name: GrossErrorProbability/relativeHumidity
function:
name: ObsFunction/ObsErrorModelRamp
options:
xvar:
name: MetaData/latitude
x0: [-30]
x1: [30]
err0: [0.1]
err1: [0.05]
Example 4¶
Copy the variable MetaData/height
to DerivedMetaData/geopotentialHeight
.
- filter: Variable Assignment
assignments:
- name: DerivedMetaData/geopotentialHeight
type: float # type must be specified if the variable doesn't already exist
source variable: MetaData/height
Example 5¶
Initialise the variable MetaData/pressure
to the missing floating-point value
at all locations.
- filter: Variable Assignment
assignments:
- name: MetaData/pressure
type: float # type must be specified if the variable doesn't already exist
value: missing
Create Diagnostic Flags Filter¶
This “filter” (it is not a true filter; rather, a “processing step”) makes it possible to define new diagnostic flags and to reinitialize existing ones.
Diagnostic flags are stored in Boolean ObsSpace variables. A diagnostic flag Flag associated with a observed variable var is stored in the variable DiagnosticFlags/Flag/var
.
The diagnostic flags to create or reinitialize are specified in the flags
list in the
YAML file. Each element of this list can contain the following keys:
name
(required): The flag name. Conventionally, flag names follow the CamelCase naming convention (like group names).initial value
: Initial value for the flag (eithertrue
orfalse
). If not specified, defaults tofalse
.force reinitialization
: Determines what happens if the flag already exists. By default, the flag is not reinitialized, i.e. its current value is preserved. Setforce reinitialization
totrue
to reset the flag toinitial value
.
In addition, the filter recognizes the standard filter options filter variables
and defer to post
, but not where
or action
.
Setting and unsetting of diagnostic flags is normally performed using actions on a given filter; examples can be seen in Filter Actions.
Example 1¶
The following YAML snippet creates diagnostic flags Duplicate
and ExtremeValue
for all observed variables and initializes them to false
unless they already exist, in which cause their current values are preserved.
- filter: Create Diagnostic Flags
flags:
- name: Duplicate
- name: ExtremeValue
For instance, if the list of observed variables in the ObsSpace is [airTemperature, relativeHumidity]
, the filter will create the following Boolean variables: DiagnosticFlags/Duplicate/airTemperature
, DiagnosticFlags/Duplicate/relativeHumidity
, DiagnosticFlags/ExtremeValue/airTemperature
and DiagnosticFlags/ExtremeValue/relativeHumidity
.
Example 2¶
The following YAML snippet creates a diagnostic flag OriginallyMeasuredInMmHg
for the observed variable stationPressure
and initializes it to true
, overwriting any current values if this flag already exists:
- filter: Create Diagnostic Flags
filter variables: [stationPressure]
flags:
- name: OriginallyMeasuredInMmHg
initial value: true
force reinitialization: true
RTTOV 1D-Var Check (RTTOVOneDVar) Filter¶
This filter performs a 1-dimensional variational assimilation (1D-Var) that produces optimal retrievals of physical parameters that describe the atmosphere and surface on which there is information in the measurement. It takes as input a set of observations (brightness temperatures) and model background fields which are used to initialise the retrieval profile. A retrieval (or analysis) is performed using an iterative procedure that attempts to find the minimum of a cost function that represents the most likely profile vector given the error characteristics of the two data sources.
The elements contained in the retrieval profile depend on the sensitivity of the measuring instruments to atmospheric and surface properties and also what can be modelled with a relatively high degree of accuracy. Most retrieval profiles will consist of atmospheric temperature and humidity, and surface skin temperature, with other possible constituents being liquid and ice water or some other cloud parameter measure, and emissivity parameters.
The filter provides some retrieval parameters to the main assimilation which may be missing in the background or insufficiently accurate, such as surface skin temperature, and to filter out observations for which a retrieval could not be performed and thus may be difficult to assimilate in the full variational assimilation.
The filter is a port of the Met Office OPS 1D-Var and makes use of the Fortran RTTOV interface within JEDI. The code is written predominantly in Fortran. Files containing the observation error covariance (R) and the background error covariance (B) are expected as inputs.
This filter requires the following YAML parameters:
BMatrix
: path to the b-matrix file.RMatrix
: path to the r-matrix file.nlevels
: the number of levels used in the retrieval profile.retrieval variables from geovals
: list of retrieval variables (e.g. temperature etc) which form the 1D-Var retrieval vector (x) and are provided by the model interface. These need to be in the b-matrix file.ModOptions
: options needed for the observation operator (RTTOV only at the moment).filter variables
: list of variables (brightnessTemperature) and channels which form the 1D-Var observation vector (y).
The following are optional YAML parameters with defaults listed where a variable has a default value:
retrieval variables not from geovals
: list of retrieval variables (e.g. pressureAtCloudTop etc) which form the 1D-Var retrieval vector (x) and are not provided by the model interface. These need to be in theObsSpace
and b-matrix file.ModName
: forward model name (only RTTOV at the moment). Default:RTTOV
.surface emissivity
: there is a parameter section which includes all the options required for the surface emissivity. There is a separate section below which describes all the available options.qtotal
: flag for total humidity (qt = q + qclw + qi). If this is true the b-matrix must include qt or the code will abort. If this is false then the b-matrix must not contain qt or the code will abort. Default:false
.UseQtSplitRain
: flag to choose if rain is included in the non-vapour part of qtotal when split. e.g. qnv = ql + qi + qr. Default:true
.RTTOVMWScattSwitch
: flag to make sure the retrieval profile is setup for use with output with RTTOV-Scatt. Default:false
.RTTOVUseTotalIce
: flag to use the total ice option for cloud ice water with RTTOV-Scatt. This will only have an effect if the aboveRTTOVMWScattSwitch
is true. Default:true
.UseMLMinimization
: flag to turn on the Marquardt-Levenberg minimizer otherwise a Newton minimizer is used. Default:false
.UseJforConvergence
: flag to use the cost function value (J) for the measure of convergence. Default is comparison of the profile absolute differences to background error multiplied byConvergenceFactor
. Default:false
.UseRHwaterForQC
: flag to use liquid water in the q saturation calculations. Default:true
.UseColdSurfaceCheck
: flag to reset low level temperatures over sea ice and cold low land. Default:false
.Store1DVarLWP
: flag to write the retrieved liquid water path to the observation database. Default:false
.Store1DVarIWP
: flag to write the retrieved ice water path to the observation database. Default:false
.Store1DVarCLW
: flag to write the retrieved liquid water profiles to the observation database. Default:false
.Store1DVarTransmittance
: flag to write the retrieved surface to space transmittance to the observation database. Default:false
.RecalculateBT
: flag to recalculate the brightness temperatures using retrieved surface variables (emissivity, skin temperature) and retrieved cloud layer variables (CTP, ECA). Default:false
.Max1DVarIterations
: maximum number of iterations. Default:7
.JConvergenceOption
: integer to select convergence option. 1 equals percentage change in cost function value tested between iterations. Otherwise the absolute change in cost function value is tested between iterations. Default:1
.IterNumForLWPCheck
: choose which iteration to start checking the liquid water path. Default:2
.MaxMLIterations
: the maximum number of iterations for the internal Marquardt-Levenberg loop. Default:7
.ConvergeCheckChansAfterIteration
: if the iteration number is greater than this value then the channels specified byConvergeCheckChans
have the observation error inflated to 100000.0. Default:3
.ConvergeCheckChans
: a vector of channels which will have there error inflated to 100000.0 after the number of iterations specified byConvergeCheckChansAfterIteration
is exceeded.RetrievedErrorFactor
: a float value which is multiplied by the ObsError to provide a bounds check for the retrieved brightness temperatures. If any of the channels used in the retrieval fail this check the profile is rejected. When retrieving pressureAtTopOfCloud only those channels which are active after the cloudy channel selection are evaluated in this test. If this value is less than zero then this check is not performed. Default:4.0
.ConvergenceFactor
: the factor used when the absolute difference in the profile is used to determine convergence. Default:0.4
.CostConvergenceFactor
: the cost function threshold used for convergence check when cost function value is used for convergence. Default:0.01
.IRCloud_Threshold
the fraction of the air_temperature Jacobian (\(\partial BT_i/ \partial T_j\)) integrated (in ln(pressure)) from the top of the atmosphere to the surface that is permitted to be below the retrieved pressureAtTopOfCloud when retrieving a cloud layer in the IR. Default0.01
.SkinTempErrorLand
the value to scale the skin temperature error over land. If less than zero no scaling is applied. Default-1.0
.MaxLWPForCloudyCheck
the maximum value, in kg/m2, of the liquid water path when checking the profile during the minimization. Default2.0
.MaxIWPForCloudyCheck
the maximum value, in kg/m2, of the ice water path when checking the profile during the minimization. Default2.0
.
The following are the options contained in the surface emissivity
section of the YAML parameters. For all options, if the emissivity
is set to zero, rttov will calculate the value when called.
type
: there are five different options which setup how the surface emissivity will be initialized and in some cases how it will be retrieved. The default method isfixed
.rttovtocalculate
: this specifies that rttov will calculate theemissivity
for all surface types. The model used can be specified in theModOptions
or the RTTOV default will be used.fixed
: Theemissivity
values specified byEmissSeaDefault
,EmissLandDefault
andEmissSeaIceDefault
will be used for a given surface type.readfromdb
: Theemissivity
values are read from theObsSpace
from the group specified by thegroup in obs space
option within this parameter.readfromdbwitherror
: Theemissivity
and associatedemissivityError
are read from theObsSpace
. The values should both be present in the group specified by thegroup in obs space
option within this parameter.principalcomponent
: This will setup the principal component object within the code which is needed when retrieving principal component emissivity. Initial values are set depending on the presence of an atlas.
EmissSeaDefault
: the default emissivity value to use over sea surface types. Default:0.0
.EmissLandDefault
: the default emissivity value to use over land surface types. Default:0.95
.EmissSeaIceDefault
: the default emissivity value to use over seaice surface types. Default:0.92
.group in obs space
: the group in theObsSpace
where theemissivity
(andemissivityError
if requested) are read from. This is relevant for thereadfromdb
andreadfromdbwitherror
types.EmisEigVecPath
: the filename for the eigenvector file needed when thetype
isprincipalcomponent
.EmisAtlas
: the filename for the emissivity eigenvector atlas to setup the first values of the emissivity. This is used with theprincipalcomponent
type and is optional. If this file is not included a first guess value for each channel is available from the file specified by the EmisEigVecPath.mwEmissRetrieval
: a flag to set the emissivity retrieval as active for the mw instruments. The b-matrix file must contain entries for this retrieval to work correctly. Default isfalse
.number of surface emissivity retrieval elements
: the number of emissivity channels to retieve. This must match the number in the b-matrix file. This option is used ifmwEmissRetrieval
is true. Default is5
.emissivity to channel mapping
: a vector of channels corresponding to the emissivity channels to be retrieved. The size of this vector must match thenumber of surface emissivity retrieval elements
. This option is used ifmwEmissRetrieval
is true. Default is1, 2, 3, 16, 17
.channel to emissivity mapping
: a vector of emissivity elements. This needs to be the same size as the number of channels used in the 1d-var. This option is used ifmwEmissRetrieval
is true. Default is1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5
.
The following are optional YAML parameters to provide diagnostics for developers:
FullDiagnostics
: flag to turn on full diagnostics. Default:false
.StartOb
: the starting observation number for the main loop over all observations. This has been added for testing to allow a subset of observations in an ObsSpace to be evaluated by the filter. Default:0
.FinishOb
: the finishing observation number for the main loop over all observations. This has been added for testing to allow a subset of observations in an ObsSpace to be evaluated by the filter. Default:0
.obs bias group for testing
: specify the group which contains the ObsBias value. This allows for testing without an obs bias section. If this is not specified then the ObsBias passed into the filter is used.
Follow this hyperlink for example 1 in a test yaml.
### Example 1 for IASI hyperspectral IR ###
- filter: RTTOV OneDVar Check
ModOptions:
Absorbers: *rttov_absorbers
obs options:
RTTOV_default_opts: UKMO_PS45
SatRad_compatibility: false # done in filter
RTTOV_GasUnitConv: *gasunitconv
UseRHwaterForQC: false
UseColdSurfaceCheck: false
RTTOV_ScaleRefOzone: *RTTOV_ScaleRefOzone
WMO_ID: *wmo_id
Sat_ID: *sat_id
Instrument_Name: *instrument_id
CoefficientPath: *coefpath
RTTOV_clw_data: false
BMatrix: ../resources/bmatrix/rttov/iasi_bmatrix_70_test.dat
RMatrix: ../resources/rmatrix/rttov/iasi_metopb_rmatrix_test.nc4
filter variables:
- name: brightnessTemperature
channels: *1dvarchannels
retrieval variables from geovals:
- air_temperature # 1
- specific_humidity # 2
- surface_temperature # 3
- specific_humidity_at_two_meters_above_surface # 4
- skin_temperature # 5
- surface_pressure # 6
retrieval variables not from geovals:
- cloud_top_pressure # 16
- cloud_fraction # 17
- emissivity_pc # 18
nlevels: 70
UseMLMinimization: true
obs bias group for testing: ObsBias
Max1DVarIterations: 10
MaxMLIterations: 10
SkinTempErrorLand: 5.0
surface emissivity:
type: principalcomponent
EmisEigVecPath: ../resources/auxillary/IASI_EmisEigenVec.dat
RecalculateBT: true
Follow this hyperlink for example 2 in a test yaml.
### Example 2 for ATOVs mw sounder ###
- filter: RTTOV OneDVar Check
ModOptions:
Absorbers: *rttov_absorbers
obs options:
RTTOV_default_opts: UKMO_PS45
SatRad_compatibility: false
RTTOV_GasUnitConv: true
UseRHwaterForQC: false
UseColdSurfaceCheck: false
Do_MW_Scatt: true
RTTOV_clw_data: *rttovclwdata
WMO_ID: *wmo_id
Sat_ID: *sat_id
Instrument_Name: *inst_name
QtSplitRain: *qtsplitrain
CoefficientPath: Data/
BMatrix: ../resources/bmatrix/rttov/atms_bmatrix_70_test.dat #using atms to ignore emiss for now
RMatrix: ../resources/rmatrix/rttov/atovs_metopb_rmatrix_test.nc4
filter variables:
- name: brightnessTemperature
channels: *all_channels
retrieval variables from geovals:
- air_temperature # 1
- specific_humidity # 10
- mass_content_of_cloud_liquid_water_in_atmosphere_layer
- mass_content_of_cloud_ice_in_atmosphere_layer
- surface_temperature # 3
- specific_humidity_at_two_meters_above_surface # 4
- skin_temperature # 5
- surface_pressure # 6
surface emissivity:
type: fixed # default
EmissSeaDefault: 0.0 # default
EmissLandDefault: 0.95 # default
EmissSeaIceDefault: 0.92 # default
nlevels: 70
qtotal: true
UseQtSplitRain: false
UseJforConvergence: true
JConvergenceOption: 2
CostConvergenceFactor: 0.05
Max1DVarIterations: 20
MaxLWPForCloudyCheck: 6.0
MaxIWPForCloudyCheck: 6.0
RTTOVMWScattSwitch: true
UseRHwaterForQC: *UseRHwaterForQC
UseColdSurfaceCheck: *UseColdSurfaceCheck
Store1DVarLWP: true
Store1DVarIWP: true
Store1DVarTransmittance: true
ModelOb Threshold Filter¶
This filter applies a threshold to a model profile interpolated to the observation height.
The specified model profile variable is linearly (vertical) interpolated to the observation height using the specified model vertical coordinate variable. This is referred to as the “ModelOb”. Note that the ModelOb is not necessarily one of the HofX variables.
The observation height must be in the same coordinate system as that specified for the model vertical coordinate, e.g. both pressure.
The ModelOb is compared against a set of height-dependent thresholds. We supply a vector of threshold values, and a vector of vertical coordinate values corresponding to those thresholds. The coordinate values must be in the same vertical coordinate as the observation, e.g. pressure. The threshold values are then linearly interpolated to the observation height.
The observation is flagged for rejection if the ModelOb lies outside the threshold value according to threshold type - min or max. E.g. if the threshold type is min, then the observation is flagged if ModelOb is less than the interpolated threshold value.
This filter requires the following YAML parameters:
model profile
: name of the model profile variable (GeoVaLs).model vertical coordinate
: name of the model vertical coordinate variable (GeoVal).observation height
: name of the observation height variable to interpolate to.thresholds
: vector of threshold values.coordinate values
: vector of vertical coordinate values corresponding tothresholds
.threshold type
:min
. ormax
.
Example
- filter: ModelOb Threshold
model profile:
name: GeoVaLs/relative_humidity
model vertical coordinate:
name: GeoVaLs/air_pressure
observation height:
name: MetaData/pressure
thresholds: [50,50,40,30]
coordinate values: [100000,80000,50000,20000]
threshold type: min
Satwind Inversion Filter¶
This filter is a processing step which modifies the assigned pressure of AMV observations if a temperature inversion is detected in the model profile and defined criteria are met.
The model profile is searched for the presence of a temperature inversion. Where there are multiple temperature inversions, only the lowest one is found. This is intended only for use on low level AMVs, typically below 700 hPa height.
The pressure of the AMV is corrected downwards in height if the following conditions are true:
Originally assigned pressure is greater than or equal to min_pressure (Pa).
AMV derived from IR and visible channels only.
Temperature inversion is present in the model profile for pressures less than or equal to max_pressure (Pa).
In order to be considered significant, the temperature difference across the top and base of the inversion must be greater than or equal to the inversion_temperature (K) value.
Relative humidity at the top of the inversion is less than the rh_threshold value.
AMV has been assigned above the height of the inversion base.
The AMV is then re-assigned to the base of the inversion.
Reference for initial implementation:
Cotton, J., Forsythe, M., Warrick, F., (2016). Towards improved height assignment and quality control of AMVs in Met Office NWP. Proceedings for the 13th International Winds Workshop 27 June - 1 July 2016, Monterey, California, USA.
This filter requires the following YAML parameters:
observation pressure
: name of the observation pressure variable to correct.RH threshold
: relative humidity (%) threshold value.
The following are optional YAML parameters with appropriate defaults:
minimum pressure
: minimum AMV pressure (Pa) to consider for correction. Default:70000.
Pa.maximum pressure
: maximum model pressure (Pa) to consider. Default:105000.
Pa.inversion temperature
: temperature difference (K) between the inversion base and top. Default:2.0
K.
Example:
- filter: Satwind Inversion Correction
observation pressure:
name: MetaData/pressure
RH threshold: 50
maximum pressure: 96000
GNSS-RO 1D-Var Check (GNSSROOneDVar) Filter¶
This filter performs a 1-dimensional variational assimilation (1D-Var) that acts as a quality-control check for GNSS-RO profile data. It finds the optimal set of bending angles based on the background departures from the observations. If these optimal values are too far from the observation, or the minimisation does not converge within a given number of iterations, then the full profile of observations is rejected. Other, smaller, tests are also included.
The bending angle observations are normally stored individually, rather than being kept as a profile. Therefore the profile is constructed using the record number as an identifier for which observations belong to a given profile. These observations are sorted according to their impact parameter (smallest first) and the GeoVaL for the first observation is used to represent the model background values for the whole profile. This filter is currently tied to the Met Office’s bending angle operator for GNSS-RO and thus requires the appropriate inputs for that operator.
This filter requires the following parameters to be set in the yaml:
bmatrix_filename
: The file-name of the background-error covariance used.capsupersat
: If true calculate saturation vapour pressure with respect to water and ice (below zero degrees), else calculate it with respect to water everywhere.cost_funct_test
: The profile is rejected if the final cost-function is larger thancost_funct_test
times the number of observations.Delta_ct2
: The minimisation is considered to have converged if the absolute value of the change in the solution for an iteration, divided by the gradient in the cost-function is less thanDelta_ct2
times the number of observations in the profile, divided by 200.Delta_factor
: The minimisation is considered to have converged if the absolute change in the cost-function at this iteration is less thanDelta_factor
times either the previous value of the cost-function or the number of observations (whichever is the smaller). That is, the minimisation has converged if:ABS(J_new - J_old) < Delta_factor * min(J_old, nObs)
min_temp_grad
: Threshold for the minimum temperature gradient before a profile is considered isothermal (units: K per model level). Only applies if pseudo-levels are being used.n_iteration_test
: The maximum number of iterations - the profile is rejected if it does not converge in time.OB_test
: If the RMS difference between the observations and the background bending angle is greater thanOB_test
then the whole profile is rejected.pseudo_ops
: Whether to use pseudo-levels to reduce interpolation errors in the forward model.vert_interp_ops
: If true use linear interpolation in ln(pressure), otherwise use linear interpolation in exner.y_test
: If an observation is more thany_test
times the observation error away from the solution bending angle, then the observation (not the whole profile) is rejected.
Example
Model Best Fit Pressure Filter¶
This filter calculates the model best-fit pressure and flags cases where this estimate is poorly constrained. Optionally, it can output the best-fit eastward and northward wind vectors, which are the model background winds interpolated to the model best-fit pressure.
The model best-fit pressure is defined as the model pressure (Pa) with the smallest vector difference between the AMV and model background wind, but additionally is not allowed to be lower than the threshold specified in the top pressure parameter. Vertical interpolation is performed between model levels to find the minimum vector difference.
Checking if the pressure is well-constrained:
Remove any winds where the minimum vector difference between the AMV u (windEastward) and v (windNorthward) and the background column u and v is greater than the threshold specified in the upper vector diff parameter. This check aims to remove cases where there is no good agreement between the AMV and the winds at any level in the background wind column.
Remove any winds where the vector difference is less than the lower vector diff anywhere outside the band of width 2 * pressure band half-width centered around the best-fit pressure level. This aims to catch cases where there are secondary minima or very broad minima. In both cases the best-fit pressure is not well constrained.
This filter accepts the following YAML parameters:
observation pressure
: Name of the observation pressure variable. Required parameter.model pressure
: Name of the model pressure variable. Required parameter.top pressure
: Minimum allowed pressure region. Default:10000.
Pa.pressure band half-width
: Pressure band, for calculating constraint. Default:10000.
Pa.upper vector diff
: Max vector difference allowed, for calculating constraint. Default:4.
m/s.lower vector diff
: Min vector difference allowed, for calculating constraint. Default:2.
m/s.tolerance vector diff
: Tolerance for vec_diff comparison. Default:1.0e-8
m/s.tolerance pressure
: Tolerance for pressure comparison. Default:0.01
Pa.calculate bestfit winds
: To calculate best-fit winds by linear interpolation. Output stored in “DerivedValue/model_bestfit_eastward_wind” and “DerivedValue/model_bestfit_northward_wind”. Default:false
Example
- filter: Model Best Fit Pressure
observation pressure:
name: MetaData/pressure
model pressure:
name: GeoVaLs/air_pressure_levels
top pressure: 10000
pressure band half-width: 10000
upper vector diff: 4
lower vector diff: 2
tolerance vector diff: 1.0e-8
tolerance pressure: 0.01
calculate bestfit winds: true
Process AMV QI¶
This “filter” (it is not a true filter; rather, a “processing step”) converts AMV Quality Index (QI) values stored in the 3-10-077 BUFR template into variables with names corresponding to the wind generating application number.
If not present, new QI variables are created. Created QI variables depend on “windGeneratingApplication_<number>” and fills them with the values found in “windPercentConfidence_<number>”.
The wind generating application numbers are associated as below:
Wind generating application number |
QI type |
Variable name |
---|---|---|
1 |
Full weighted mixture of individual quality tests |
QI_full_weighted_mixture |
2 |
Weighted mixture of individual tests, but excluding forecast comparison |
QI_weighted_mixture_exc_forecast_comparison |
3 |
Recursive filter function |
qiRecursiveFilterFunction |
4 |
Common quality index (QI) without forecast |
QI_common |
5 |
QI without forecast |
qiWithoutForecast |
6 |
QI with forecast |
qiWithForecast |
7 |
Estimated Error (EE) in m/s converted to a percent confidence |
QI_estimated_error |
This filter accepts the following YAML parameters:
number of generating apps
: How many generating application variables to search for. Required parameter.
Example
- filter: Process AMV QI
number of generating apps: 4
Satname Filter¶
This filter creates a string variable that makes it simpler to identify Satwind (AMV) observations by combining satellite and channel information. This is useful for later processing where we want to apply filters to subsets of observations.
To identify the type of motion that has been tracked, AMV BUFR observations are supplied with a channel central frequency (Hz) and a wind computation method as described in code table 002023 below:
Num |
Method |
Description |
---|---|---|
0 |
Reserved |
|
1 |
Infrared |
Motion observed in the infrared channel |
2 |
Visible |
Motion observed in the visible channel |
3 |
Vapour cloud |
Motion observed in the water vapour channel |
4 |
Combination |
Motion observed in a combination of spectral channels |
5 |
Vapour clear |
Motion observed in the water vapour channel in clear air |
6 |
Ozone |
Motion observed in the ozone channel |
7 |
Vapour |
Motion observed in water vapour channel (cloud or clear) |
13 |
Root-mean-square |
The most common use of the wind computation method is to distinguish between clear-sky and cloudy water vapour targets.
This filter combines this channel information, together with the satellite name, to
create a string MetaData/satwind_id
that defines the satellite/channel combination of each observation.
We also output a diagnostic variable Diag/satwind_id
which provides information on unidentified
satellites or channels.
Required variables:
MetaData/sensorCentralFrequency
MetaData/satelliteIdentifier
MetaData/windComputationMethod
Outputs variables:
MetaData/satwind_id
Diag/satwind_id
This filter requires the following YAML parameters:
min WMO Satellite id
: Minimum WMO platform number to considermax WMO Satellite id
: Maximum WMO platform number to considermin frequency
: For each channel, the minimum central frequency (Hz)max frequency
: For each channel, the maximum central frequency (Hz)wind channel
: For each channel, the string name to call this channelSat ID
: For each satellite, the WMO identifier for each platformSat name
: For each satellite, the string name for this platform
This following YAML parameter is optional:
satobchannel
: Wind computation method number, ignored if none.
Example:
- filter: satname
SatName assignments:
- min WMO Satellite id: 1
max WMO Satellite id: 999
Satellite_comp:
- satobchannel: 1
min frequency: 2.6e+13
max frequency: 2.7e+13
wind channel: ir112
- satobchannel: 1
min frequency: 7.5e+13
max frequency: 8.2e+13
wind channel: ir38
Satellite_id:
- Sat ID: 270
Sat name: GOES16
This yaml will attempt to identify two infrared channels with computation method
value of 1 and central frequencies falling between the min and max frequency bounds.
If observations are identified from GOES-16 (platform number 270) they are also labelled.
This will fill MetaData/satwind_id
with values of “GOES16ir112”,”GOES16ir38” if these are present
in the observations.
If either the satellite or channel are not identified, then MetaData/satwind_id
is set to
“*** MISSING ***”. To help track down why observations are set to missing, Diag/satwind_id
has the form id<satellite identifier>_comp<cloud motion method>_freq<central frequency>
.
For example, if the satellite is identified but the channel is not “GOES16_comp3_freq0.484317e14”,
if the satellite is not identified but the channel is “id270ir112”.
Met Office Duplicate Check Filter¶
This filter can be used to thin data that are both spatially and temporally dense. The algorithm employed is designed to reproduce the operation of the Met Office OPS code; more generic thinning algorithms can be performed with the Gaussian Thinning, Temporal Thinning and Poisson Disk Thinning filters.
The duplicate check algorithm divides the globe into latitude bands of width latitude band width
(degrees).
Observations in each latitude band are sorted on longitude from lowest to highest; for simplicity, the discontinuity at the edges is not considered.
Starting at the band nearest to the North Pole, and moving southwards, the check determines whether any pairs of observations are colocated inside a volume defined by the parameters
latitude bin half-width
(degrees), longitude bin half-width
(degrees), time bin half-width
(Pa) and pressure bin half-width
(s).
If such a pair is found, the observation with the largest value of the priority variable is retained.
The priority variable is in the MetaData
group and its name is governed by the parameter priority name
.
In the event of a tie in values of priority, the observation with the lower value of longitude is retained.
For a given latitude band the algorithm searches in that band and the adjacent one in order to avoid discontinuities at band edges.
This filter is designed to reproduce the Met Office OPS code so the sorting by longitude is performed using the Met Office sorting algorithm.
In addition to the above mandatory parameters there are two overridable parameters.
pre-sort by record ID
will trigger an additional sort of the data in ascending identifier order before running the filter (all observations are guaranteed to have a record ID). This ensures that the results from a run using more than one MPI processor will match the results on a single processor. The default for this option is false
.
vertical coordinate variable
allows the specification of the variable to use for the vertical coordinate information. By default this is MetaData/pressure
.
An example configuration of this filter is as follows:
- filter: Met Office Duplicate Check
priority name: thinningPriority
latitude band width: 1.5
latitude bin half-width: 1.0
longitude bin half-width: 1.0
time bin half-width: 30.0
pressure bin half-width: 250.0
pre-sort by record ID: true
vertical coordinate variable: ObsValue/waterPressure
In this example the globe is divided into bands of width 1.5 degrees. The search volume spans 2 degrees in latitude, 2 degrees in longitude, 500 Pa in pressure and 60 seconds in time.
The priority variable name is MetaData/thinningPriority
; observations with higher values of this variable are retained in preference to those with lower values.