Additional QC Filter Options

Where Statement

By default, filters are applied to all observations of the variables specified in the filter variables list (or if this list is not present, all simulated variables). The where keyword can be used to apply a filter only to observations meeting certain conditions.

Consider the following set of observations:

Obs. index   latitude   longitude   air_temperature (K)
0            0          50          300
1            20         60          200
2            40         70          290
3            60         80          260
4            80         90          220

and suppose that we want to reject air temperature observations below 230 K taken in the tropical zone (between 30°S and 30°N). We could do this using the Bounds Check filter with a where statement:

- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  action:
    name: reject # this is the default action, specified explicitly for clarity
  where:
  - variable:
      name: latitude@MetaData
    minvalue: -30
    maxvalue:  30

This would cause the filter to be applied only to air temperature observations selected by the where statement, i.e. meeting the specified condition -30 <= latitude@MetaData <= 30. Please note this does not mean all these observations would be rejected; rather, it means the Bounds Check filter would inspect only these observations and apply its usual criteria (in this case, “is the air temperature below the minimum allowed value of 230 K?”) to decide whether any of them should be rejected. In our example, only observation 1 would be rejected, since this is the only observation (a) taken in the range of latitudes selected by the where statement and (b) with a value lying below the minimum value passed to the Bounds Check filter.

The list passed to the where keyword can contain more than one item, each representing a separate condition imposed on a particular variable. The filter is applied only to observations meeting all of these conditions. The following kinds of conditions are accepted:

  • minvalue and/or maxvalue: filter applied only to observations for which the condition variable lies in the specified range. The upper and lower bounds can be floating-point numbers or datetimes in the ISO 8601 format. Any date/time component set to * is disregarded in the comparison; see Example 2 below for a case where this is useful. Each datetime string must be exactly 20 characters long, so ‘any year’ is written as ****.

  • is_defined: filter applied only to observations for which the condition variable has a valid value (not a missing data indicator).

  • is_not_defined: filter applied only to observations for which the condition variable is set to a missing data indicator.

  • is_in: filter applied only to observations for which the condition variable is set to a value belonging to the given whitelist.

  • is_close_to_any_of: filter applied only to observations for which the condition variable (a float) is close to any of the values in the given reference list. Two values are considered close if they differ by less than the provided tolerance. A tolerance must be provided and can be either absolute (absolute_tolerance) or relative (relative_tolerance).

  • is_not_in: filter applied only to observations for which the condition variable is set to a value not belonging to the given blacklist.

  • is_not_close_to_any_of: filter applied only to observations for which the condition variable (a float) is not close to any of the values in the given reference list. Two values are considered close if they differ by less than the provided tolerance. A tolerance must be provided and can be either absolute (absolute_tolerance) or relative (relative_tolerance).

  • is_true: filter applied only to observations for which the condition variable (normally a diagnostic flag) is set to true.

  • is_false: filter applied only to observations for which the condition variable (normally a diagnostic flag) is set to false.

  • any_bit_set_of: filter applied only to observations for which the condition variable is an integer with at least one of the bits with specified indices set.

  • any_bit_unset_of: filter applied only to observations for which the condition variable is an integer with at least one of the bits with specified indices unset (i.e. zero).

  • matches_regex: filter applied only to observations for which the condition variable is a string that matches the specified regular expression or an integer whose decimal representation matches that expression. The regular expression should conform to the ECMAScript syntax described at http://www.cplusplus.com/reference/regex/ECMAScript.

  • matches_wildcard: filter applied only to observations for which the condition variable is a string that matches the specified wildcard pattern or an integer whose decimal representation matches that pattern. The following wildcards are recognized: * (matching any number of characters, including zero) and ? (matching any single character).

  • matches_any_wildcard: filter applied only to observations for which the condition variable is a string that matches at least one of the specified wildcard patterns, or an integer whose decimal representation matches at least one of these patterns. The same wildcards are recognized as for matches_wildcard.

The elements of both whitelists and blacklists can be strings, non-negative integers or ranges of non-negative integers. It is not necessary to put any value after the colon following is_defined and is_not_defined. Bits are numbered from zero starting from the least significant bit.
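None of the numbered examples below exercises is_not_in or is_not_defined. As a minimal sketch, assuming these keywords are written in the same way as is_in and is_defined (the variable names and list values are reused from Example 1 purely for illustration), a where statement selecting observations from stations outside a blacklist that also lack a valid height could look like this:

```yaml
where:
# Select stations whose id is NOT 3, 6 or anything in the range 11-120 ...
- variable:
    name: station_id@MetaData
  is_not_in: 3, 6, 11-120
# ... and whose height is set to the missing data indicator.
# No value is needed after the colon.
- variable:
    name: height@MetaData
  is_not_defined:
```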

By default, if multiple conditions are used in a where statement then the logical and of the results is used to determine which locations are selected by the statement. The logical operator used to combine the results can be chosen explicitly with the where operator parameter; the permitted operators are and and or. The use of the or operator is illustrated in Example 11. Note that it is possible to use the where operator option without the where statement. The option has no impact in that case.

The following examples illustrate the use of these conditions.

Example 1

where:
- variable:
    name: sea_surface_temperature@GeoVaLs
  minvalue: 200
  maxvalue: 300
- variable:
    name: latitude@MetaData
  maxvalue: 60.
- variable:
    name: height@MetaData
  is_defined:
- variable:
    name: station_id@MetaData
  is_in: 3, 6, 11-120

In this example, the filter will be applied only to observations for which all of the following four criteria are met:

  • the sea surface temperature is within the range of [200, 300] K,

  • the latitude is at most 60°N,

  • the observation location’s altitude has a valid value (is not set to a missing data indicator), and

  • the station id is one of the ids in the whitelist.

Example 2

where:
- variable:
    name:  datetime@MetaData
  minvalue: "****-01-01T00:00:00Z"
  maxvalue: "****-05-25T00:00:00Z"
- variable:
    name:  datetime@MetaData
  minvalue: "****-**-**T09:00:00Z"
  maxvalue: "****-**-**T18:00:00Z"

In this example, the filter will be applied only to observations taken between 09:00:00 and 18:00:00 on days between 1st January and 25th May of every year (end inclusive). Note that datetime components are not yet ‘loop aware’: a range cannot wrap around the end of the year. For example, a where clause covering the period from May to February would require two filters, one covering the January to February period and a second covering the May to December period.
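The two-filter workaround for a May to February period might be sketched as follows. This is an illustration only: the Bounds Check settings are borrowed from the first example in this section, the period is assumed to run from 1st May to 28th February, and the handling of 29th February in leap years is not addressed.

```yaml
# First filter: 1st May to 31st December of any year.
- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  where:
  - variable:
      name: datetime@MetaData
    minvalue: "****-05-01T00:00:00Z"
    maxvalue: "****-12-31T23:59:59Z"
# Second filter: the same check, 1st January to 28th February of any year.
- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  where:
  - variable:
      name: datetime@MetaData
    minvalue: "****-01-01T00:00:00Z"
    maxvalue: "****-02-28T23:59:59Z"  # 29th February is not covered by this sketch
```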

Example 3

where:
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_set_of: 0, 1

In this example, the filter will be applied only to observations for which the mass_concentration_of_chlorophyll_in_sea_water@PreQC variable is an integer whose binary representation has a 1 at position 0 and/or position 1. (Position 0 denotes the least significant bit – in other words, bits are numbered “from right to left”.)

Example 4

where:
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_set_of: 4
- variable:
    name: mass_concentration_of_chlorophyll_in_sea_water@PreQC
  any_bit_unset_of: 10-12

In this example, the filter will be applied only to observations for which the mass_concentration_of_chlorophyll_in_sea_water@PreQC variable is an integer whose binary representation has a 1 at position 4 and a 0 at one or more of positions 10 to 12.

Example 5

where:
- variable:
    name: station_id@MetaData
  matches_regex: 'EUR[A-Z]*'

In this example, the filter will be applied only to observations taken by stations whose IDs match the regular expression EUR[A-Z]*, i.e. consist of the string EUR followed by any number of capital letters.

Example 6

where:
- variable:
    name: station_id@MetaData
  matches_wildcard: 'EUR??TEST*'

In this example, the filter will be applied only to observations taken by stations whose IDs match the wildcard pattern EUR??TEST*, i.e. consist of the string EUR followed by two arbitrary characters, the string TEST and any number of arbitrary characters.

Example 7

where:
- variable:
    name: observation_type@MetaData
  matches_any_wildcard: ['102*', '103*']

In this example, assuming that observation_type@MetaData is an integer variable, the filter will be applied only to observations whose types have decimal representations starting with 102 or 103.

Example 8

where:
- variable:
    name: model_elevation@GeoVaLs
  is_close_to_any_of: [0.0, 1.0]
  absolute_tolerance: 1.0e-12

In this example, assuming that model_elevation@GeoVaLs is a float variable, the filter will be applied only to observations whose model_elevation is within 1.0e-12 of either 0.0 or 1.0.

Example 9

where:
- variable:
    name: model_elevation@GeoVaLs
  is_not_close_to_any_of: [100.0, 200.0]
  relative_tolerance: 0.1

In this example, assuming that model_elevation@GeoVaLs is a float variable, the filter will be applied only to observations whose model_elevation is not within 10 % of either 100.0 or 200.0.

Example 10

where:
- variable:
    name: DiagnosticFlags/ExtremeValue/air_temperature
  is_true:
- variable:
    name: DiagnosticFlags/ExtremeValue/relative_humidity
  is_false:

In this example, the filter will be applied only to observations with the ExtremeValue diagnostic flag set for the air temperature, but not for the relative humidity.

Example 11

where:
- variable:
    name: latitude@MetaData
  minvalue: 60.
- variable:
    name: latitude@MetaData
  maxvalue: -60.
where operator: or

In this example, the filter will be applied only to observations for which at least one of the following criteria is met:

  • the latitude is further north than 60°N,

  • the latitude is further south than 60°S.
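Example 11 shows the where block in isolation. In a complete filter entry, where operator sits at the same level as where. A sketch, reusing the Bounds Check settings from the start of this section as an assumed host filter:

```yaml
- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  where:
  - variable:
      name: latitude@MetaData
    minvalue: 60.
  - variable:
      name: latitude@MetaData
    maxvalue: -60.
  # Combine the two conditions with "or": select high-latitude
  # observations in either hemisphere.
  where operator: or
```

With the default operator (and), no latitude could satisfy both conditions at once, so the filter would not be applied to any observation.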

ObsFunction and ObsDiagnostic Suffixes

In addition to suffixes such as @GeoVaLs, @MetaData, @ObsValue and @HofX, there are two new suffixes that can be used.

  • @ObsFunction indicates that a particular variable should be a registered ObsFunction (ObsFunction classes are defined in the ufo/src/ufo/filters/obsfunctions folder). One example of an ObsFunction is Velocity@ObsFunction, which uses the two wind components to produce wind speed and can be used as follows:

- filter: Domain Check
  filter variables:
  - name: eastward_wind
  - name: northward_wind
  where:
  - variable: Velocity@ObsFunction
    maxvalue: 20.0

Warning: ObsFunctions are evaluated for all observations, including those that have been unselected by previous elements of the where list or rejected by filters run earlier. This can lead to problems if these ObsFunctions incorrectly assume they will always be given valid inputs.

  • @ObsDiagnostic will be used to store non-H(x) diagnostic values from the simulateObs function in individual ObsOperator classes. The ObsDiagnostics interface class in OOPS is used to pass those diagnostics to the ObsFilters. Because the diagnostics are provided by simulateObs, they can only be used in filters that implement the postFilter function (currently only Background Check and Met Office Buddy Check). The simulateObs interface to ObsDiagnostics will be first demonstrated in CRTM.

  • In order to set up ObsDiagnostics for use in a filter, the following changes need to be made:

    • In the constructor of the filter, ensure that the diagnostic is added to the allvars_ variable. For instance: allvars_ += Variable("refractivity@ObsDiag");. This step informs the code to set up the object, ready for use in the operator.

    • In the observation operator, make sure that the ObsDiagnostics object is received, check that it contains the variables that you are expecting to save, and save the variables. An example of this (in Fortran) is in the Met Office GNSS-RO operator.

    • Use the variable in the filter via the data_.get() routine. For instance, add the following to the main filter body:

      Variable refractivityVariable = Variable("refractivity@ObsDiag");
      data_.get(refractivityVariable, iLevel, inputData);

Filter Actions

The action taken on observations flagged by the filter can be adjusted using the action option recognized by each filter. The following actions are available:

  • reject: observations flagged by the filter are marked as rejected.

  • accept: observations flagged by the filter are marked as accepted if they have previously been rejected for any reason other than missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator.

  • passivate: observations flagged by the filter are marked as passive.

  • inflate error: the error estimates of observations flagged by the filter are multiplied by a factor. This can be either a constant (specified using the inflation factor option) or a variable (specified using the inflation variable option).

  • assign error: the error estimates of observations flagged by the filter are set to a specified value. Again, this can be either a constant (specified using the error parameter option) or a variable (specified using the error function option).

  • set and unset: the diagnostic flag indicated by the flag option will be set to true or false, respectively, at observations flagged by the filter. These actions recognize a further optional keyword ignore, which can be set to:

    • rejected observations if the diagnostic flag should not be changed at observations that have previously been rejected or

    • defective observations if the diagnostic flag should not be changed at observations that have previously been rejected because of a missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator.
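The numbered examples below do not cover passivate or the constant form of assign error. A minimal sketch, assuming the option name given in the list above (error parameter) and reusing the Bounds Check settings from the start of this section; the value 2.0 is arbitrary:

```yaml
# Mark cold observations as passive instead of rejecting them.
- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  action:
    name: passivate
# Set the error estimate of cold observations to a constant value.
- filter: Bounds Check
  filter variables: air_temperature
  minvalue: 230
  action:
    name: assign error
    error parameter: 2.0
```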

To perform multiple actions, replace the action option, which takes a single action, by actions, which takes a list of actions. This list may contain at most one action altering quality control flags (reject, accept or passivate); if present, such an action must be the last in the list. The action and actions options are mutually exclusive.

The default action for almost all filters (taken when both the action and actions keywords are omitted) is reject. There are two exceptions: the default action of the AcceptList filter is accept and the Perform Action filter has no default action (either the action or actions keyword must be present).

Example 1 - rejection, error inflation and assignment

- filter: Background Check
  filter variables:
  - name: air_temperature
  threshold: 2.0
  absolute threshold: 1.0
  action:
    name: reject
- filter: Background Check
  filter variables:
  - name: eastward_wind
  - name: northward_wind
  threshold: 2.0
  where:
  - variable: latitude@MetaData
    minvalue: -60.0
    maxvalue: 60.0
  action:
    name: inflate error
    inflation factor: 2.0
- filter: BlackList
  filter variables:
  - name: brightness_temperature
  channels: *all_channels
  action:
    name: assign error
    error function:
      name: ObsErrorModelRamp@ObsFunction
      channels: *all_channels
      options:
        channels: *all_channels
        xvar:
          name: CLWRetSymmetricMW@ObsFunction
          options:
            clwret_ch238: 1
            clwret_ch314: 2
            clwret_types: [ObsValue, HofX]
        x0:    [ 0.050,  0.030,  0.030,  0.020,  0.000,
                0.100,  0.000,  0.000,  0.000,  0.000,
                0.000,  0.000,  0.000,  0.000,  0.030]
        x1:    [ 0.600,  0.450,  0.400,  0.450,  1.000,
                1.500,  0.000,  0.000,  0.000,  0.000,
                0.000,  0.000,  0.000,  0.000,  0.200]
        err0:  [ 2.500,  2.200,  2.000,  0.550,  0.300,
                0.230,  0.230,  0.250,  0.250,  0.350,
                0.400,  0.550,  0.800,  3.000,  3.500]
        err1:  [20.000, 18.000, 12.000,  3.000,  0.500,
                0.300,  0.230,  0.250,  0.250,  0.350,
                0.400,  0.550,  0.800,  3.000, 18.000]

Example 2 - error assignment using DrawObsErrorFromFile@ObsFunction

Next we demonstrate deriving the observation error from a NetCDF file which defines the variance/covariance:

- filter: Perform Action
  filter variables:
  - name: air_temperature
  action:
    name: assign error
    error function:
      name: DrawObsErrorFromFile@ObsFunction
      options:
        file: <filepath>
        interpolation:
        - name: satellite_id@MetaData
          method: exact
        - name: processing_center@MetaData
          method: exact
        - name: air_pressure@MetaData
          method: linear

Example 3 - setting and unsetting a diagnostic flag

- filter: Bounds Check
  filter variables:
  - name: air_temperature
  minvalue: 250
  maxvalue: 350
  # Set the ExtremeValue diagnostic flag at particularly
  # hot and cold observations, but do not reject them
  action:
    name: set
    flag: ExtremeValue
- filter: Perform Action
  filter variables:
  - name: air_temperature
  where:
  - variable:
      name: latitude@MetaData
    maxvalue: -60
  - variable:
      name: air_temperature@ObsValue
    maxvalue: 250
  # Unset the ExtremeValue diagnostic flag at cold observations
  # in the Antarctic
  action:
    name: unset
    flag: ExtremeValue

Example 4 - setting a diagnostic flag at observations rejected by a filter

In this example, a Domain Check filter rejecting observations outside the 60°S–60°N zonal band is followed by a Bounds Check filter rejecting temperature readings above 350 K or below 250 K. The observations rejected by the Bounds Check filter are additionally marked with the ExtremeCheck diagnostic flag. The ignore: rejected observations option passed to the set action ensures that observations that fail the criteria of the Bounds Check filter, but have already been rejected by the Domain Check filter, are not marked with the ExtremeCheck flag.

- filter: Domain Check
  where:
  - variable:
      name: latitude@MetaData
    minvalue: -60
    maxvalue:  60
- filter: Bounds Check
  filter variables:
  - name: air_temperature
  minvalue: 250
  maxvalue: 350
  # Reject particularly hot and cold observations
  # and mark them with the ExtremeCheck diagnostic flag
  actions:
  - name: set
    flag: ExtremeCheck
    ignore: rejected observations
  - name: reject

Example 5 - setting a diagnostic flag at observations accepted by a filter

In this example, observations taken in the zonal band 30°S–30°N that have previously been rejected for a reason other than a missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator are re-accepted and additionally marked with the Tropics diagnostic flag. The ignore: defective observations option passed to the set action ensures that the diagnostic flag is not assigned to observations that are not accepted because of their previous rejection for one of the reasons listed above.

- filter: AcceptList
  where:
  - variable:
      name: latitude@MetaData
    minvalue: -30
    maxvalue: 30
  actions:
  - name: set
    flag: Tropics
    ignore: defective observations
  - name: accept

Outer Loop Iterations

By default, filters are applied only before the first iteration of the outer loop of the data assimilation process. Use the apply at iterations parameter to customize the set of iterations after which a particular filter is applied. In the example below, the Background Check filter will be run before the outer loop starts (“after the zeroth iteration”) and after the first iteration:

- filter: Background Check
  apply at iterations: 0,1
  threshold: 0.25