Additional QC Filter Options

Where Statement

By default, filters are applied to all observations of the variables specified in the filter variables list (or if this list is not present, all simulated variables). The where keyword can be used to apply a filter only to observations meeting certain conditions.

Consider the following set of observations:

Obs. index - latitude - longitude - airTemperature (K)

0 - 0 - 50 - 300

1 - 20 - 60 - 200

2 - 40 - 70 - 290

3 - 60 - 80 - 260

4 - 80 - 90 - 220

and suppose that we want to reject air temperature observations below 230 K taken in the tropical zone (between 30°S and 30°N). We could do this using the Bounds Check filter with a where statement:

- filter: Bounds Check
  filter variables: airTemperature
  minvalue: 230
  action:
    name: reject # this is the default action, specified explicitly for clarity
  where:
  - variable:
      name: MetaData/latitude
    minvalue: -30
    maxvalue:  30

This would cause the filter to be applied only to air temperature observations selected by the where statement, i.e. meeting the specified condition -30 <= MetaData/latitude <= 30. Please note this does not mean all these observations would be rejected; rather, it means the Bounds Check filter would inspect only these observations and apply its usual criteria (in this case, “is the air temperature below the minimum allowed value of 230 K?”) to decide whether any of them should be rejected. In our example, only observation 1 would be rejected, since this is the only observation (a) taken in the range of latitudes selected by the where statement and (b) with a value lying below the minimum value passed to the Bounds Check filter.

The list passed to the where keyword can contain more than one item, each representing a separate condition imposed on a particular variable. The filter is applied only to observations meeting all of these conditions. The following kinds of conditions are accepted:

  • minvalue and/or maxvalue: filter applied only to observations for which the condition variable lies in the specified range. The upper and lower bounds can be floating-point numbers or datetimes in the ISO 8601 format. If any date/time components are set to *, they are disregarded. See Example 2 below on where this can be useful. Each of these strings must be 20 characters long so defining ‘any year’ would be indicated by ****.

  • value: is_valid: filter applied only to observations for which the condition variable has a valid value (not a missing data indicator).

  • value: is_not_valid: filter applied only to observations for which the condition variable is set to a missing data indicator.

  • is_in: filter applied only to observations for which the condition variable is set to a value belonging to the given whitelist, which may contain either integers or strings. Integers can be given as single values or as ranges with or without brackets (e.g. 1-5 or [1-5]), or as comma separated lists without brackets (e.g. 1,2,3,4,5). Strings may be given as single values or as comma separated lists and must always be bracketed. Strings may be quoted or unquoted (e.g. [a,b,c,d,e] or ['a','b','c','d','e']).

  • is_close_to_any_of: filter applied only to observations for which the condition variable (a float) is close to any of the variables in the given reference list. List is specified as a bracketed list of floats (e.g. [0.0,1.0,2.0]). To specify a single value, pass a list with one element (e.g. [0.0]). Two variables are defined as close if they differ by less than a provided tolerance. The tolerance must be provided and can either be absolute (absolute_tolerance) or relative (relative_tolerance).

  • is_not_in: filter applied only to observations for which the condition variable is set to a value not belonging to the given blacklist. Variables are specified in the same way as for the is_in filter.

  • is_not_close_to_any_of: filter applied only to observations for which the condition variable (a float) is not close to any of the variables in the given reference list. Comparison variable(s) are specified in the same way as is_close_to_any_of. Two variables are defined as close if they differ by less than a provided tolerance. The tolerance must be provided and can either be absolute (absolute_tolerance) or relative (relative_tolerance).

  • value: is_true: filter applied only to observations for which the condition variable (normally a diagnostic flag) is set to true.

  • value: is_false: filter applied only to observations for which the condition variable (normally a diagnostic flag) is set to false.

  • any_bit_set_of: filter applied only to observations for which the condition variable is an integer with at least one of the bits with specified indices set.

  • any_bit_unset_of: filter applied only to observations for which the condition variable is an integer with at least one of the bits with specified indices unset (i.e. zero).

  • matches_regex: filter applied only to observations for which the condition variable is a string that matches the specified regular expression or an integer whose decimal representation matches that expression. The regular expression should conform to the ECMAScript syntax described at http://www.cplusplus.com/reference/regex/ECMAScript.

  • matches_wildcard: filter applied only to observations for which the condition variable is a string that matches the specified wildcard pattern or an integer whose decimal representation matches that pattern. The following wildcards are recognized: * (matching any number of characters, including zero) and ? (matching any single character).

  • matches_any_wildcard: filter applied only to observations for which the condition variable is a string that matches at least one of the specified wildcard patterns, or an integer whose decimal representation matches at least one of these patterns. The same wildcards are recognized as for matches_wildcard.

The elements of both whitelists and blacklists can be strings, non-negative integers or ranges of non-negative integers. Bits are numbered from zero starting from the least significant bit.

By default, if multiple conditions are used in a where statement then the logical and of the results is used to determine which locations are selected by the statement. The logical operator used to combine the results can be chosen explicitly with the where operator parameter; the permitted operators are and and or. The use of the or operator is illustrated in Example 11. Note that it is possible to use the where operator option without the where statement. The option has no impact in that case.

The following examples illustrate the use of these conditions.

Example 1

where:
- variable:
    name: GeoVaLs/sea_surface_temperature
  minvalue: 200
  maxvalue: 300
- variable:
    name: MetaData/latitude
  maxvalue: 60.
- variable:
    name: MetaData/height
  value: is_valid
- variable:
    name: MetaData/stationIdentification
  is_in: 3, 6, 11-120

In this example, the filter will be applied only to observations for which all of the following four criteria are met:

  • the sea surface temperature is within the range of [200, 300] K,

  • the latitude is <= than 60°N,

  • the observation location’s altitude has a valid value (is not set to a missing data indicator), and

  • the station id is one of the ids in the whitelist.

Example 2

where:
- variable:
    name: MetaData/datetime
  minvalue: "****-01-01T00:00:00Z"
  maxvalue: "****-25-05T00:00:00Z"
- variable:
    name: MetaData/datetime
  minvalue: "****-**-**T09:00:00Z"
  maxvalue: "****-**-**T18:00:00Z"

In this example, the filter will be applied only to observations taken between 09:00:00 and 18:00:00, between 1st January and 25th May of every year (end inclusive). Note that datetime components are not yet ‘loop aware’. That is, a where clause between May and February for example would require two filters: one covering the Jan-Feb period and a second to cover the May-Dec period.

Example 3

where:
- variable:
    name: PreQC/chlorophyllMassConcentration
  any_bit_set_of: 0, 1

In this example, the filter will be applied only to observations for which the PreQC/chlorophyllMassConcentration variable is an integer whose binary representation has a 1 at position 0 and/or position 1. (Position 0 denotes the least significant bit – in other words, bits are numbered “from right to left”.)

Example 4

where:
- variable:
    name: PreQC/chlorophyllMassConcentration
  any_bit_set_of: 4
- variable:
    name: PreQC/chlorophyllMassConcentration
  any_bit_unset_of: 10-12

In this example, the filter will be applied only to observations for which the PreQC/chlorophyllMassConcentration variable is an integer whose binary representation has a 1 at position 4 and a 0 at any of positions 10 to 12.

Example 5

where:
- variable:
    name: MetaData/stationIdentification
  matches_regex: 'EUR[A-Z]*'

In this example, the filter will be applied only to observations taken by stations whose IDs match the regular expression EUR[A-Z]*, i.e. consist of the string EUR followed by any number of capital letters.

Example 6

where:
- variable:
    name: MetaData/stationIdentification
  matches_wildcard: 'EUR??TEST*'

In this example, the filter will be applied only to observations taken by stations whose IDs match the wildcard pattern EUR??TEST*, i.e. consist of the string EUR followed by two arbitrary characters, the string TEST and any number of arbitrary characters.

Example 7

where:
- variable:
    name: MetaData/observationTypeNum
  matches_any_wildcard: ['102*', '103*']

In this example, assuming that MetaData/observationTypeNum is an integer variable, the filter will be applied only to observations whose types have decimal representations starting with 102 or 103.

Example 8

where:
- variable:
    name: GeoVaLs/model_elevation
  is_close_to_any_of: [0.0, 1.0]
  absolute_tolerance: 1.0e-12

In this example, assuming that GeoVaLs/model_elevation is a float variable, the filter will be applied only to observations whose model_elevation is within 1.0e-12 of either 0.0 or 1.0.

Example 9

where:
- variable:
    name: GeoVaLs/model_elevation
  is_not_close_to_any_of: [100.0, 200.0]
  relative_tolerance: 0.1

In this example, assuming that GeoVaLs/model_elevation is a float variable, the filter will be applied only to observations whose model_elevation is not within 10 % of either 100.0 or 200.0.

Example 10

where:
- variable:
    name: DiagnosticFlags/ExtremeValue/airTemperature
  value: is_true
- variable:
    name: DiagnosticFlags/ExtremeValue/relativeHumidity
  value: is_false

In this example, the filter will be applied only to observations with the ExtremeValue diagnostic flag set for the air temperature, but not for the relative humidity.

Example 11

where:
- variable:
    name: MetaData/latitude
  minvalue: 60.
- variable:
    name: MetaData/latitude
  maxvalue: -60.
where operator: or

In this example, the filter will be applied only to observations for which either of the following criteria are met:

  • the latitude is further north than 60°N,

  • the latitude is further south than 60°S.

ObsFunction and ObsDiagnostic Suffixes

In addition to, e.g., GeoVaLs/, MetaData/, ObsValue/, HofX/, there are two new suffixes that can be used.

  • ObsFunction/ indicates that a particular variable should be a registered ObsFunction (ObsFunction classes are defined in the ufo/src/ufo/filters/obsfunctions folder). One example of an ObsFunction is ObsFunction/Velocity, which uses the 2 wind components to produce wind speed and can be used as follows:

- filter: Domain Check
  filter variables:
  - name: windEastward
  - name: windNorthward
  where:
  - variable: ObsFunction/Velocity
    maxvalue: 20.0

Warning: ObsFunctions are evaluated for all observations, including those that have been unselected by previous elements of the where list or rejected by filters run earlier. This can lead to problems if these ObsFunctions incorrectly assume they will always be given valid inputs.

  • ObsDiagnostic/ will be used to store non-H(x) diagnostic values from the simulateObs function in individual ObsOperator classes. The ObsDiagnostics interface class in OOPS is used to pass those diagnostics to the ObsFilters. Because the diagnostics are provided by simulateObs, they can only be used in filters that implement the postFilter function (currently only Background Check and Met Office Buddy Check). The simulateObs interface to ObsDiagnostics will be first demonstrated in CRTM.

  • In order to set up ObsDiagnostics for use in a filter, the following changes need to be made:

    • In the constructor of the filter, ensure that the diagnostic is added to the allvars_ variable. For instance: allvars_ += Variable("ObsDiag/refractivity");. This step informs the code to set up the object, ready for use in the operator.

    • In the observation operator, make sure that the ObsDiagnostics object is received, check that this contains the variables that you are expecting to save, and save the variables. An example of this (in Fortran) is in Met Office GNSS-RO operator

    • Use the variable in the filter via the data_.get() routine. For instance add:

      Variable refractivityVariable = Variable("ObsDiag/refractivity");
      data_.get(refractivityVariable, iLevel, inputData);
      

      in the main filter body

Filter Actions

The action taken on observations flagged by the filter can be adjusted using the action option recognized by each filter. The following actions are available:

  • reject: observations flagged by the filter are marked as rejected.

  • accept: observations flagged by the filter are marked as accepted if they have previously been rejected for any reason other than missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator.

  • passivate: observations flagged by the filter are marked as passive.

  • inflate error: the error estimates of observations flagged by the filter are multiplied by a factor. This can be either a constant (specified using the inflation factor option) or a variable (specified using the inflation variable option).

  • assign error: the error estimates of observations flagged by the filter are set to a specified value. Again, this can be either a constant (specified using the error parameter option) or a variable (specified using the error function option).

  • set and unset: the diagnostic flag indicated by the flag option will be set to true or false, respectively, at observations flagged by the filter. These actions recognize a further optional keyword ignore, which can be set to:

    • rejected observations if the diagnostic flag should not be changed at observations that have previously been rejected or

    • defective observations if the diagnostic flag should not be changed at observations that have previously been rejected because of a missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator.

  • set flag bit: the bit of the bitmap diagnostic flag indicated by the bit option will be set at observations flagged by filter.

  • flag original and average profiles: rejects any observations in the original profiles that have been flagged by the filter, and also rejects all observations in any averaged profile whose corresponding original profile contains at least one flagged observation. See the example below for further details.

To perform multiple actions, replace the action option, which takes a single action, by actions, which takes a list of actions. This list may contain at most one action altering quality control flags, namely reject, accept and passivate; if present, such an action must be the last in the list. The action and actions options are mutually exclusive.

The default action for almost all filters (taken when both the action and actions keywords are omitted) is reject. There are two exceptions: the default action of the AcceptList filter is accept and the Perform Action filter has no default action (either the action or actions keyword must be present).

Example 1 - rejection, error inflation and assignment

- filter: Background Check
  filter variables:
  - name: airTemperature
  threshold: 2.0
  absolute threshold: 1.0
  action:
    name: reject
- filter: Background Check
  filter variables:
  - name: windEastward
  - name: windNorthward
  threshold: 2.0
  where:
  - variable: MetaData/latitude
    minvalue: -60.0
    maxvalue: 60.0
  action:
    name: inflate error
    inflation: 2.0
- filter: BlackList
  filter variables:
  - name: brightnessTemperature
  channels: *all_channels
  action:
    name: assign error
    error function:
      name: ObsFunction/ObsErrorModelRamp
      channels: *all_channels
      options:
        channels: *all_channels
        xvar:
          name: ObsFunction/CLWRetSymmetricMW
          options:
            clwret_ch238: 1
            clwret_ch314: 2
            clwret_types: [ObsValue, HofX]
        x0:    [ 0.050,  0.030,  0.030,  0.020,  0.000,
                0.100,  0.000,  0.000,  0.000,  0.000,
                0.000,  0.000,  0.000,  0.000,  0.030]
        x1:    [ 0.600,  0.450,  0.400,  0.450,  1.000,
                1.500,  0.000,  0.000,  0.000,  0.000,
                0.000,  0.000,  0.000,  0.000,  0.200]
        err0:  [ 2.500,  2.200,  2.000,  0.550,  0.300,
                0.230,  0.230,  0.250,  0.250,  0.350,
                0.400,  0.550,  0.800,  3.000,  3.500]
        err1:  [20.000, 18.000, 12.000,  3.000,  0.500,
                0.300,  0.230,  0.250,  0.250,  0.350,
                0.400,  0.550,  0.800,  3.000, 18.000]

Example 2 - error assignment using ObsFunction/DrawObsErrorFromFile

Next we demonstrate deriving the observation error from a NetCDF file which defines the variance/covariance:

- filter: Perform Action
  filter variables:
  - name: airTemperature
  action:
    name: assign error
    error function:
      name: ObsFunction/DrawObsErrorFromFile
      options:
        file: <filepath>
        interpolation:
        - name: MetaData/satelliteIdentifier
          method: exact
        - name: MetaData/dataProviderOrigin
          method: exact
        - name: MetaData/pressure
          method: linear

Example 3 - setting and unsetting a diagnostic flag

- filter: Bounds Check
  filter variables:
  - name: airTemperature
  min value: 250
  max value: 350
  # Set the ExtremeValue diagnostic flag at particularly
  # hot and cold observations, but do not reject them
  action:
    name: set
    flag: ExtremeValue
- filter: Perform Action
  filter variables:
  - name: airTemperature
  where:
  - variable:
      name: MetaData/latitude
    maxvalue: -60
  - variable:
      name: ObsValue/airTemperature
    maxvalue: 250
  # Unset the ExtremeValue diagnostic flag at cold observations
  # in the Antarctic
  action:
    name: unset
    flag: ExtremeValue

Example 4 - setting a diagnostic flag at observations rejected by a filter

In this example, a Domain Check filter rejecting observations outside the 60°S–60°N zonal band is followed by a Bounds Check filter rejecting temperature readings above 350 K and below 250 K. The observations rejected by the Bounds Check filter are additionally marked with the ExtremeCheck diagnostic flag. The ignore: rejected observations option passed to the set action ensures that observations that fail the criteria of the Bounds Check filter, but have already been rejected by the Domain Check filter, are not marked with the ExtremeCheck flag.

- filter: Domain Check
  where:
  - variable:
      name: MetaData/latitude
    minvalue: -60
    maxvalue:  60
- filter: Bounds Check
  filter variables:
  - name: airTemperature
  min value: 250
  max value: 350
  # Reject particularly hot and cold observations
  # and mark them with the ExtremeValue diagnostic flag
  actions:
  - name: set
    flag: ExtremeCheck
    ignore: rejected observations
  - name: reject

Example 5 - setting a diagnostic flag at observations accepted by a filter

In this example, observations taken in the zonal band 30°S–30°N that have previously been rejected for a reason other a missing observation value, a pre-processing flag indicating rejection, or failure of the observation operator are re-accepted and additionally marked with the Tropics diagnostic flag. The ignore: defective observations option passed to the set action ensures that the diagnostic flag is not assigned to observations that are not accepted because of their previous rejection for one of the reasons listed above.

- filter: AcceptList
  where:
  - variable:
      name: MetaData/latitude
    minvalue: -30
    maxvalue: 30
  actions:
  - name: set
    flag: Tropics
    ignore: defective observations
  - name: accept

Example 6: flag original and average profiles action

The flag original and averaged profiles action should only be used for data sets that satisfy two criteria:

  1. They have been grouped into records (profiles).

  2. They have an extended section of the ObsSpace that consists of profiles that have been averaged onto model levels.

The action rejects any observations in the original profiles that have been flagged by the filter. It also rejects all observations in any averaged profile whose corresponding original profile contains at least one flagged observation.

This action should therefore be used for filters that are run only on the original profiles; it enables the corresponding averaged profiles to be flagged without running the filter on them. Doing this reduces the chance of configuration errors occurring. It may also be desirable if the filter relies on properties of the original profiles that are not shared by the averaged profiles or if the filter is expensive to run. If rejecting observations in the averaged profile is not required, the standard reject action can be used instead.

NB the ObsSpace extension is produced with the following yaml options:

extension:
  allocate companion records with length: N

where N is an integer equal to the number of levels per averaged profile.

For example, to run the Gaussian Thinning filter on all of the original profiles and automatically flag the equivalent averaged profiles, the following yaml block can be used:

- filter: Gaussian Thinning
  where:
  - variable:
      name: MetaData/extendedObsSpace
    is_in: 0
  action:
    name: flag original and averaged profiles

If any location in an original profile is flagged (with the thinned flag in this case), all of the locations in the corresponding average profile are also flagged with thinned.

Example 7: Copy Flags From Extended To Original Space filter

Technically this is a UFO filter and not a filter action. The intention is that after applying a filter on only the extended section of the ObsSpace (only the model levels - see example yaml snippet below), any flags thus set can be copied back to the corresponding observation levels (in original space) by calling Copy Flags From Extended To Original Space. This is useful in cases where some filters are applied on model levels only and others on observation levels only, then consistency can be maintained throughout.

- filter: Bayesian Background Check
  where:
  - variable:
      name: MetaData/extendedObsSpace
    is_in: 1
  ...

This filter should only be used for data sets that satisfy two criteria:

  1. They have been grouped into records (profiles).

  2. They have an extended section of the ObsSpace that consists of profiles that have been averaged onto model levels.

See Example 6 above.

These yaml parameters are required:

  • filter variables: flags to copy from extended to original space. MUST be in the DiagnosticFlags group, otherwise filter throws an error.

  • observation vertical coordinate: variable containing the observation levels (e.g. air pressure, ocean depth) in its original space.

  • model vertical coordinate: variable containing the model levels (e.g. air pressure, ocean depth) in its extended space. (One way this can be achieved is by applying the ProfileAverage obsOperator on the extended space, in combination with another obsOperator such as VertInterp on the original space.)

Note that any diagnostic flags in the original space that are already set remain unchanged by this filter, regardless of whether the flag on the corresponding model level is set or unset; the filter can only set flags in original space that are currently unset. The filter also leaves flags unchanged in extended space. The filter does not alter any QC flags (only diagnostic flags specified by the user), nor the rejection status of any observation location. If the rejection status is also required to be set, then a subsequent Perform Action filter should be applied, with a where statement conditional on the diagnostic flag(s) copied across by Copy Flags From Extended to Original Space.

- filter: Copy Flags From Extended To Original Space
  where:
    - variable:
        name: ObsValue/waterPotentialTemperature
      value: is_valid
  filter variables:
    - name: DiagnosticFlags/BayBgCheckReject/salinity
    - name: DiagnosticFlags/BayBgCheckReject/waterPotentialTemperature
  observation vertical coordinate: DerivedObsValue/depthBelowWaterSurface
  model vertical coordinate: HofX/depthBelowWaterSurface

The example above matches up each observation level in the original space of DerivedObsValue/depthBelowWaterSurface with its corresponding model level in the extended space of HofX/depthBelowWaterSurface; for every unset observation-level flag in DiagnosticFlags/BayBgCheckReject/salinity and DiagnosticFlags/BayBgCheckReject/waterPotentialTemperature, for which ObsValue/waterPotentialTemperature is non-missing (due to the ‘where’ statement), the flag value at the corresponding model-level overwrites it. Be wary when using ‘where’ statements with this filter, because the ‘where’ statement covers all the filter variables listed - any where-excluded locations’ flag values remain unchanged.

Outer Loop Iterations

By default, filters are applied only before the first iteration of the outer loop of the data assimilation process. Use the apply at iterations parameter to customize the set of iterations after which a particular filter is applied. In the example below, the Background Check filter will be run before the outer loop starts (“after the zeroth iteration”) and after the first iteration:

- filter: Background Check
  apply at iterations: 0,1
  threshold: 0.25