Additional QC Filter Options
Where Statement
By default, filters are applied to all observations of the variables specified in the `filter variables` list (or, if this list is not present, all simulated variables). The `where` keyword can be used to apply a filter only to observations meeting certain conditions.
Consider the following set of observations:
| Obs. index | latitude (°) | longitude (°) | air_temperature (K) |
|---|---|---|---|
| 0 | 0 | 50 | 300 |
| 1 | 20 | 60 | 200 |
| 2 | 40 | 70 | 290 |
| 3 | 60 | 80 | 260 |
| 4 | 80 | 90 | 220 |
Suppose that we want to reject air temperature observations below 230 K taken in the tropical zone (between 30°S and 30°N). We could do this using the Bounds Check filter with a `where` statement:
```yaml
- filter: Bounds Check
  filter variables:
  - name: air_temperature
  minvalue: 230
  action:
    name: reject # this is the default action, specified explicitly for clarity
  where:
  - variable:
      name: latitude@MetaData
    minvalue: -30
    maxvalue: 30
```
This would cause the filter to be applied only to air temperature observations selected by the `where` statement, i.e. those meeting the condition -30 <= latitude@MetaData <= 30. Note that this does not mean all these observations would be rejected. Rather, it means the Bounds Check filter would inspect only these observations and apply its usual criterion (in this case, “is the air temperature below the minimum allowed value of 230 K?”) to decide whether any of them should be rejected. In our example, only observation 1 would be rejected, since it is the only observation that was (a) taken in the range of latitudes selected by the `where` statement and (b) has a value below the minimum value passed to the Bounds Check filter.
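The two-stage logic of the worked example can be sketched as follows. The `where` statement first builds a selection mask, and the filter's own criterion is then evaluated only on selected observations. This is a simplified illustration, not UFO code. The `Obs` struct and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified observation record mirroring the table above.
struct Obs {
  double latitude;        // degrees
  double airTemperature;  // K
};

// Stage 1: the where statement selects observations with
// minLat <= latitude <= maxLat.
std::vector<bool> applyWhere(const std::vector<Obs>& obs, double minLat, double maxLat) {
  std::vector<bool> selected(obs.size());
  for (std::size_t i = 0; i < obs.size(); ++i)
    selected[i] = obs[i].latitude >= minLat && obs[i].latitude <= maxLat;
  return selected;
}

// Stage 2: the Bounds Check criterion (value < minValue) is evaluated only
// for selected observations; unselected ones are never flagged.
std::vector<bool> boundsCheckRejections(const std::vector<Obs>& obs,
                                        const std::vector<bool>& selected,
                                        double minValue) {
  assert(obs.size() == selected.size());
  std::vector<bool> rejected(obs.size(), false);
  for (std::size_t i = 0; i < obs.size(); ++i)
    if (selected[i] && obs[i].airTemperature < minValue) rejected[i] = true;
  return rejected;
}
```

With the five observations from the table, `applyWhere(obs, -30, 30)` selects observations 0 and 1. `boundsCheckRejections(obs, selected, 230)` then flags only observation 1. Observation 4 (220 K) is never inspected because it lies at 80°N.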
The list passed to the `where` keyword can contain more than one item, each representing a separate condition imposed on a particular variable. The filter is applied only to observations meeting all of these conditions. The following kinds of conditions are accepted:
- `minvalue` and/or `maxvalue`: filter applied only to observations for which the condition variable lies in the specified range. The upper and lower bounds can be floating-point numbers or datetimes in the ISO 8601 format. If any date/time components are set to `*`, they are disregarded; see Example 2 below for a case where this is useful. Each datetime string must be 20 characters long (`YYYY-MM-DDThh:mm:ssZ`), so "any year" is indicated by `****`.
- `is_defined`: filter applied only to observations for which the condition variable has a valid value (not a missing data indicator).
- `is_not_defined`: filter applied only to observations for which the condition variable is set to a missing data indicator.
- `is_in`: filter applied only to observations for which the condition variable is set to a value belonging to the given whitelist.
- `is_close_to_any_of`: filter applied only to observations for which the condition variable (a float) is close to any of the values in the given reference list. Two values are considered close if they differ by less than the provided tolerance. The tolerance must be provided and can be either absolute (`absolute_tolerance`) or relative (`relative_tolerance`).
- `is_not_in`: filter applied only to observations for which the condition variable is set to a value not belonging to the given blacklist.
- `is_not_close_to_any_of`: filter applied only to observations for which the condition variable (a float) is not close to any of the values in the given reference list. Closeness and the required tolerance are defined as for `is_close_to_any_of`.
- `any_bit_set_of`: filter applied only to observations for which the condition variable is an integer with at least one of the bits with the specified indices set.
- `any_bit_unset_of`: filter applied only to observations for which the condition variable is an integer with at least one of the bits with the specified indices unset (i.e. zero).
- `matches_regex`: filter applied only to observations for which the condition variable is a string that matches the specified regular expression, or an integer whose decimal representation matches that expression. The regular expression should conform to the ECMAScript syntax described at http://www.cplusplus.com/reference/regex/ECMAScript.
- `matches_wildcard`: filter applied only to observations for which the condition variable is a string that matches the specified wildcard pattern, or an integer whose decimal representation matches that pattern. The following wildcards are recognized: `*` (matching any number of characters, including zero) and `?` (matching any single character).
- `matches_any_wildcard`: filter applied only to observations for which the condition variable is a string that matches at least one of the specified wildcard patterns, or an integer whose decimal representation matches at least one of these patterns. The same wildcards are recognized as for `matches_wildcard`.

The elements of both whitelists and blacklists can be strings, non-negative integers or ranges of non-negative integers. It is not necessary to put any value after the colon following `is_defined` and `is_not_defined`. Bits are numbered from zero, starting from the least significant bit.
The following examples illustrate the use of these conditions.
Example 1
```yaml
where:
- variable:
    name: sea_surface_temperature@GeoVaLs
  minvalue: 200
  maxvalue: 300
- variable:
    name: latitude@MetaData
  maxvalue: 60.
- variable:
    name: height@MetaData
  is_defined:
- variable:
    name: station_id@MetaData
  is_in: 3, 6, 11-120
```
In this example, the filter will be applied only to observations for which all four of the following criteria are met:
- the sea surface temperature is within the range [200, 300] K,
- the latitude is at most 60°N (latitude@MetaData <= 60),
- the observation location's altitude has a valid value (is not set to a missing data indicator), and
- the station ID is one of the IDs in the whitelist (3, 6, or any integer from 11 to 120).
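The integer whitelist in Example 1 mixes single values with an inclusive range. The following sketch shows one way to expand such a list and test membership. It is an assumed approach for illustration only, not UFO's parser. `parseIntWhitelist` and `isIn` are hypothetical names, and only non-negative integers and `a-b` ranges are handled.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Expands a whitelist such as "3, 6, 11-120" into a set of integers.
// Ranges are inclusive at both ends.
std::set<int> parseIntWhitelist(const std::string& spec) {
  std::set<int> values;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ',')) {
    const std::string::size_type dash = item.find('-');
    if (dash == std::string::npos) {
      values.insert(std::stoi(item));  // std::stoi skips leading whitespace
    } else {
      const int first = std::stoi(item.substr(0, dash));
      const int last = std::stoi(item.substr(dash + 1));
      assert(first <= last);
      for (int v = first; v <= last; ++v) values.insert(v);
    }
  }
  return values;
}

// is_in: true if the value belongs to the whitelist.
bool isIn(int value, const std::set<int>& whitelist) {
  return whitelist.count(value) > 0;
}
```

Expanding every range into individual values keeps the lookup simple. A real implementation handling very large ranges would more likely store the intervals themselves.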
Example 2
```yaml
where:
- variable:
    name: datetime@MetaData
  minvalue: "****-01-01T00:00:00Z"
  maxvalue: "****-05-25T00:00:00Z"
- variable:
    name: datetime@MetaData
  minvalue: "****-**-**T09:00:00Z"
  maxvalue: "****-**-**T18:00:00Z"
```
In this example, the filter will be applied only to observations taken between 09:00:00 and 18:00:00, between 1 January and 25 May of every year (end inclusive). Note that datetime components are not yet "loop aware": a range cannot wrap around the end of the year. For example, selecting observations from May to February would require two filters, one covering the January to February period and a second covering the May to December period.
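One way to evaluate bounds containing `*` is sketched below. Each masked character of the bound is replaced with the corresponding character of the observation's datetime, so masked components cannot affect the comparison. The two fixed-width ISO 8601 strings are then compared lexicographically, which matches chronological order. This is an assumed approach and not necessarily how UFO implements it. `unmask` and `inDatetimeRange` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Replaces every '*' in a 20-character bound ("YYYY-MM-DDThh:mm:ssZ") with the
// character at the same position in the observation datetime.
std::string unmask(const std::string& bound, const std::string& obs) {
  assert(bound.size() == 20 && obs.size() == 20);
  std::string result = bound;
  for (std::string::size_type i = 0; i < result.size(); ++i)
    if (result[i] == '*') result[i] = obs[i];
  return result;
}

// Fixed-width ISO 8601 strings sort chronologically, so plain string
// comparison implements minvalue <= obs <= maxvalue.
bool inDatetimeRange(const std::string& obs, const std::string& minBound,
                     const std::string& maxBound) {
  return obs >= unmask(minBound, obs) && obs <= unmask(maxBound, obs);
}
```

Both conditions of Example 2 must hold. An observation at `2021-03-15T12:00:00Z` passes the date range and the time-of-day range. One at `2021-03-15T20:00:00Z` passes only the date range.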
Example 3
```yaml
where:
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_set_of: 0, 1
```
In this example, the filter will be applied only to observations for which the `mass_concentration_of_chlorophyll_in_sea_water@PreQC` variable is an integer whose binary representation has a 1 at position 0 and/or position 1. (Position 0 denotes the least significant bit; in other words, bits are numbered "from right to left".)
Example 4
```yaml
where:
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_set_of: 4
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_unset_of: 10-12
```
In this example, the filter will be applied only to observations for which the `mass_concentration_of_chlorophyll_in_sea_water@PreQC` variable is an integer whose binary representation has a 1 at position 4 and a 0 in at least one of positions 10 to 12.
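The bit tests used in Examples 3 and 4 reduce to shifting the value right by the bit index and masking the lowest bit. The following is a minimal sketch assuming 64-bit unsigned values. The function names mirror the YAML keys but are hypothetical, not UFO's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// any_bit_set_of: true if at least one listed bit (0 = least significant) is 1.
bool anyBitSetOf(std::uint64_t value, const std::vector<int>& bits) {
  for (int b : bits) {
    assert(b >= 0 && b < 64);
    if ((value >> b) & 1u) return true;
  }
  return false;
}

// any_bit_unset_of: true if at least one listed bit is 0.
bool anyBitUnsetOf(std::uint64_t value, const std::vector<int>& bits) {
  for (int b : bits) {
    assert(b >= 0 && b < 64);
    if (((value >> b) & 1u) == 0) return true;
  }
  return false;
}
```

Example 4 combines the two conditions with a logical AND. A value with bits 4, 10 and 12 set but bit 11 clear satisfies both. Setting bit 11 as well makes `any_bit_unset_of: 10-12` false.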
Example 5
```yaml
where:
- variable:
    name: station_id@MetaData
  matches_regex: 'EUR[A-Z]*'
```
In this example, the filter will be applied only to observations taken by stations whose IDs match the regular expression `EUR[A-Z]*`, i.e. consist of the string `EUR` followed by any number of capital letters.
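The C++ standard library's `std::regex` uses the ECMAScript grammar by default, so the condition can be sketched with `std::regex_match`. The sketch assumes that the whole string must match, as the description of Example 5 implies, rather than any substring. Integers are matched through their decimal representation. `matchesRegex` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Whole-string match using the ECMAScript grammar.
bool matchesRegex(const std::string& value, const std::string& pattern) {
  return std::regex_match(value, std::regex(pattern, std::regex::ECMAScript));
}

// Integers are matched via their decimal representation.
bool matchesRegex(int value, const std::string& pattern) {
  return matchesRegex(std::to_string(value), pattern);
}
```

Under this reading, `EURABC` matches `EUR[A-Z]*`, but `EUR1` and `XEURA` do not.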
Example 6
```yaml
where:
- variable:
    name: station_id@MetaData
  matches_wildcard: 'EUR??TEST*'
```
In this example, the filter will be applied only to observations taken by stations whose IDs match the wildcard pattern `EUR??TEST*`, i.e. consist of the string `EUR` followed by two arbitrary characters, the string `TEST` and any number of arbitrary characters.
Example 7
```yaml
where:
- variable:
    name: observation_type@MetaData
  matches_any_wildcard: ['102*', '103*']
```
In this example, assuming that `observation_type@MetaData` is an integer variable, the filter will be applied only to observations whose types have decimal representations starting with `102` or `103`.
Example 8
```yaml
where:
- variable:
    name: model_elevation@GeoVaLs
  is_close_to_any_of: [0.0, 1.0]
  absolute_tolerance: 1.0e-12
```
In this example, assuming that `model_elevation@GeoVaLs` is a float variable, the filter will be applied only to observations whose `model_elevation` is within 1.0e-12 of either 0.0 or 1.0.
Example 9
```yaml
where:
- variable:
    name: model_elevation@GeoVaLs
  is_not_close_to_any_of: [100.0, 200.0]
  relative_tolerance: 0.1
```
In this example, assuming that `model_elevation@GeoVaLs` is a float variable, the filter will be applied only to observations whose `model_elevation` is not within 10% of either 100.0 or 200.0.
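The closeness tests in Examples 8 and 9 can be sketched as follows. The sketch assumes an absolute tolerance bounds |x - ref| directly, while a relative tolerance is scaled by |ref|, so that 0.1 means "within 10% of ref". This matches the wording of Example 9 but is an assumption about UFO's exact formula. `ToleranceKind`, `isCloseToAnyOf` and `isNotCloseToAnyOf` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

enum class ToleranceKind { Absolute, Relative };

// Close means "differs by less than the tolerance" (strict inequality).
bool isCloseToAnyOf(double x, const std::vector<double>& refs, double tol,
                    ToleranceKind kind) {
  assert(tol >= 0.0);
  for (double ref : refs) {
    const double allowed =
        (kind == ToleranceKind::Absolute) ? tol : tol * std::fabs(ref);
    if (std::fabs(x - ref) < allowed) return true;
  }
  return false;
}

bool isNotCloseToAnyOf(double x, const std::vector<double>& refs, double tol,
                       ToleranceKind kind) {
  return !isCloseToAnyOf(x, refs, tol, kind);
}
```

Under these assumptions, 150.0 is not close to either 100.0 or 200.0 with a 10% relative tolerance, whereas 105.0 and 190.0 are. A relative tolerance admits nothing around a reference value of exactly zero.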
ObsFunction and ObsDiagnostic Suffixes
In addition to suffixes such as `@GeoVaLs`, `@MetaData`, `@ObsValue` and `@HofX`, there are two new suffixes that can be used.

`@ObsFunction` indicates that a particular variable should be a registered `ObsFunction` (`ObsFunction` classes are defined in the `ufo/src/ufo/filters/obsfunctions` folder). One example of an `ObsFunction` is `Velocity@ObsFunction`, which uses the two wind components to produce wind speed and can be used as follows:
```yaml
- filter: Domain Check
  filter variables:
  - name: eastward_wind
  - name: northward_wind
  where:
  - variable:
      name: Velocity@ObsFunction
    maxvalue: 20.0
```
Warning: ObsFunctions are evaluated for all observations, including those that have been unselected by previous elements of the `where` list or rejected by filters run earlier. This can lead to problems if these ObsFunctions incorrectly assume they will always be given valid inputs.
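Because of this, an ObsFunction should check its inputs rather than assume they are valid. The sketch below computes wind speed from the two components, as `Velocity@ObsFunction` is described to do, and propagates missing inputs instead of computing a meaningless value from them. It is illustrative only. `kMissing` is a hypothetical stand-in for the missing-data indicator used by UFO/OOPS, and `velocity` is not the real class's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical missing-data indicator (UFO/OOPS define their own).
const float kMissing = std::numeric_limits<float>::lowest();

// Wind speed sqrt(u^2 + v^2) per observation; any missing component yields a
// missing result, so rejected or unselected observations cannot cause trouble.
std::vector<float> velocity(const std::vector<float>& u, const std::vector<float>& v) {
  assert(u.size() == v.size());
  std::vector<float> speed(u.size(), kMissing);
  for (std::size_t i = 0; i < u.size(); ++i)
    if (u[i] != kMissing && v[i] != kMissing)
      speed[i] = std::hypot(u[i], v[i]);
  return speed;
}
```

In the Domain Check above, an observation with a missing wind component would then carry a missing velocity rather than a spurious value that happens to fall below 20.0.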
`@ObsDiagnostic` will be used to store non-H(x) diagnostic values from the `simulateObs` function in individual `ObsOperator` classes. The `ObsDiagnostics` interface class in OOPS is used to pass those diagnostics to the `ObsFilters`. Because the diagnostics are provided by `simulateObs`, they can only be used in filters that implement the `postFilter` function (currently only Background Check and Met Office Buddy Check). The `simulateObs` interface to `ObsDiagnostics` will be first demonstrated in CRTM.

In order to set up `ObsDiagnostics` for use in a filter, the following changes need to be made:
1. In the constructor of the filter, ensure that the diagnostic is added to the `allvars_` variable, for instance `allvars_ += Variable("refractivity@ObsDiag");`. This step informs the code to set up the object, ready for use in the operator.
2. In the observation operator, make sure that the `ObsDiagnostics` object is received, check that it contains the variables that you are expecting to save, and save the variables. An example of this (in Fortran) is in the Met Office GNSS-RO operator.
3. Use the variable in the filter via the `data_.get()` routine. For instance, add `Variable refractivityVariable = Variable("refractivity@ObsDiag"); data_.get(refractivityVariable, iLevel, inputData);` in the main filter body.
Filter Actions
The action taken on observations flagged by the filter can be adjusted using the `action` option recognized by each filter. So far, four actions have been implemented:
- `reject`: observations flagged by the filter are marked as rejected.
- `accept`: observations flagged by the filter are marked as accepted if they have previously been rejected for any reason other than missing data, a pre-processing flag indicating rejection, or failure of the ObsOperator.
- `inflate error`: the error estimates of observations flagged by the filter are multiplied by a factor. This can be either a constant (specified using the `inflation factor` option) or a variable (specified using the `inflation variable` option).
- `assign error`: the error estimates of observations flagged by the filter are set to a specified value. Again, this can be either a constant (specified using the `error parameter` option) or a variable (specified using the `error function` option).

The default action for almost all filters (taken when the `action` keyword is omitted) is `reject`. There are two exceptions: the default action of the `AcceptList` filter is `accept`, and the `Perform Action` filter has no default action (it requires the `action` keyword to be present).
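The effect of the constant-valued actions on flagged observations can be sketched as follows. The `ObsState` struct, the flag value `kRejectedByFilter` and `applyAction` are all hypothetical and do not reflect UFO's data structures or QC flag values. `accept` is omitted because its behaviour depends on why an observation was previously rejected, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-observation state.
struct ObsState {
  double error;  // observation error estimate
  int qcFlag;    // 0 = passed; nonzero = rejected (illustrative convention)
};

const int kRejectedByFilter = 1;  // hypothetical flag value

// Applies a constant-valued action to every flagged observation only.
void applyAction(std::vector<ObsState>& obs, const std::vector<bool>& flagged,
                 const std::string& action, double value) {
  assert(obs.size() == flagged.size());
  for (std::size_t i = 0; i < obs.size(); ++i) {
    if (!flagged[i]) continue;
    if (action == "reject") {
      obs[i].qcFlag = kRejectedByFilter;
    } else if (action == "inflate error") {
      obs[i].error *= value;  // value plays the role of `inflation factor`
    } else if (action == "assign error") {
      obs[i].error = value;   // value plays the role of `error parameter`
    } else {
      assert(false && "action not modelled in this sketch");
    }
  }
}
```

Only flagged observations change. With errors {1.0, 1.5, 2.0}, flags {false, true, true} and `inflate error` with a factor of 2.0, the errors become {1.0, 3.0, 4.0}.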
Example 1
```yaml
- filter: Background Check
  filter variables:
  - name: air_temperature
  threshold: 2.0
  absolute threshold: 1.0
  action:
    name: reject
- filter: Background Check
  filter variables:
  - name: eastward_wind
  - name: northward_wind
  threshold: 2.0
  where:
  - variable:
      name: latitude@MetaData
    minvalue: -60.0
    maxvalue: 60.0
  action:
    name: inflate error
    inflation factor: 2.0
- filter: BlackList
  filter variables:
  - name: brightness_temperature
    channels: *all_channels
  action:
    name: assign error
    error function:
      name: ObsErrorModelRamp@ObsFunction
      channels: *all_channels
      options:
        channels: *all_channels
        xvar:
          name: CLWRetSymmetricMW@ObsFunction
          options:
            clwret_ch238: 1
            clwret_ch314: 2
            clwret_types: [ObsValue, HofX]
        x0:    [ 0.050,  0.030,  0.030,  0.020,  0.000,
                 0.100,  0.000,  0.000,  0.000,  0.000,
                 0.000,  0.000,  0.000,  0.000,  0.030]
        x1:    [ 0.600,  0.450,  0.400,  0.450,  1.000,
                 1.500,  0.000,  0.000,  0.000,  0.000,
                 0.000,  0.000,  0.000,  0.000,  0.200]
        err0:  [ 2.500,  2.200,  2.000,  0.550,  0.300,
                 0.230,  0.230,  0.250,  0.250,  0.350,
                 0.400,  0.550,  0.800,  3.000,  3.500]
        err1:  [20.000, 18.000, 12.000,  3.000,  0.500,
                 0.300,  0.230,  0.250,  0.250,  0.350,
                 0.400,  0.550,  0.800,  3.000, 18.000]
```
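The per-channel options `x0`, `x1`, `err0` and `err1` suggest a piecewise-linear ramp. Under this reading, the error equals `err0` when the predictor `xvar` is at or below `x0`, equals `err1` at or above `x1`, and is linearly interpolated in between. The sketch below implements that assumed reading for a single channel. It is not a transcription of `ObsErrorModelRamp`, and `rampError` is a hypothetical name. Channels with `x0 == x1` are treated as a step, which also avoids a division by zero.

```cpp
#include <cassert>

// Assumed ramp: err0 for x <= x0, err1 for x >= x1, linear in between.
double rampError(double x, double x0, double x1, double err0, double err1) {
  assert(x0 <= x1);
  if (x <= x0) return err0;
  if (x >= x1) return err1;  // also reached when x0 == x1 and x > x0
  return err0 + (err1 - err0) * (x - x0) / (x1 - x0);
}
```

For the first channel (`x0` = 0.05, `x1` = 0.6, `err0` = 2.5, `err1` = 20.0), a predictor value of 0.325 lies halfway along the ramp and gives an error of 11.25.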
Example 2 - DrawObsErrorFromFile@ObsFunction
Next, we demonstrate deriving the observation error from a NetCDF file that defines the variance/covariance:
```yaml
- filter: Perform Action
  filter variables:
  - name: air_temperature
  action:
    name: assign error
    error function:
      name: DrawObsErrorFromFile@ObsFunction
      options:
        file: <filepath>
        interpolation:
        - name: satellite_id@MetaData
          method: exact
        - name: processing_center@MetaData
          method: exact
        - name: air_pressure@MetaData
          method: linear
```
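The `interpolation` list combines two lookup methods. Coordinates marked `exact` select matching table entries, and the coordinate marked `linear` is interpolated between neighbouring entries. The sketch below illustrates that idea on an in-memory table. The `ErrorTableRow` struct, the clamping at the ends of the pressure range, and `lookupError` are all assumptions for illustration. The sketch does not read NetCDF and does not describe `DrawObsErrorFromFile` internals, including whether the file's variances are converted to standard deviations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table row: two exact-match coordinates and one coordinate
// used for linear interpolation.
struct ErrorTableRow {
  int satelliteId;
  int processingCenter;
  double pressure;  // Pa
  double value;     // tabulated error statistic
};

double lookupError(std::vector<ErrorTableRow> rows, int satId, int center, double p) {
  // "exact": keep only rows whose coordinates equal the observation's.
  rows.erase(std::remove_if(rows.begin(), rows.end(),
                            [&](const ErrorTableRow& r) {
                              return r.satelliteId != satId || r.processingCenter != center;
                            }),
             rows.end());
  assert(!rows.empty());
  std::sort(rows.begin(), rows.end(), [](const ErrorTableRow& a, const ErrorTableRow& b) {
    return a.pressure < b.pressure;
  });
  // Assumed behaviour outside the tabulated range: clamp to the end values.
  if (p <= rows.front().pressure) return rows.front().value;
  if (p >= rows.back().pressure) return rows.back().value;
  // "linear": interpolate between the bracketing pressures.
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (p <= rows[i].pressure) {
      const ErrorTableRow& lo = rows[i - 1];
      const ErrorTableRow& hi = rows[i];
      return lo.value + (hi.value - lo.value) * (p - lo.pressure) / (hi.pressure - lo.pressure);
    }
  }
  return rows.back().value;  // not reached given the checks above
}
```

The exact-match coordinates are applied first, so interpolation in pressure only ever mixes entries belonging to the same satellite and processing centre.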
Outer Loop Iterations
By default, filters are applied only before the first iteration of the outer loop of the data assimilation process. Use the `apply at iterations` parameter to customize the set of iterations after which a particular filter is applied. In the example below, the Background Check filter will be run before the outer loop starts (“after the zeroth iteration”) and after the first iteration:
```yaml
- filter: Background Check
  apply at iterations: 0,1
  threshold: 0.25
```
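The scheduling rule can be sketched as a membership test over iteration points, where point 0 stands for "before the outer loop starts". A default set of {0} reproduces the default behaviour described above. `filterSchedule` and its `lastIteration` parameter are hypothetical. The sketch takes an already-parsed set of integers and makes no claim about which other syntaxes `apply at iterations` accepts.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Lists the iteration points 0..lastIteration after which a filter configured
// with the given `apply at iterations` set would run.
std::vector<int> filterSchedule(const std::set<int>& applyAtIterations, int lastIteration) {
  assert(lastIteration >= 0);
  std::vector<int> schedule;
  for (int iter = 0; iter <= lastIteration; ++iter)
    if (applyAtIterations.count(iter) > 0) schedule.push_back(iter);
  return schedule;
}
```

For the configuration above, `filterSchedule({0, 1}, 3)` yields {0, 1}. With the default {0}, the filter runs only once, before the first outer iteration.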