ObsFunctionSelectStatistic

The output is all 0’s except for a 1 at the location in each record that holds (or is closest to) the minimum, maximum, median or mean of the given input variable, depending on which options are selected. Only float-type input is supported, and the output is always int-type.

Required input parameters:

variable

The input variable. May be a multi-channel variable, in which case both the name of the variable and the channels must be given. Only one variable should be specified, otherwise the filter will stop with an error.

Optional input parameters:

select minimum

If true, output will contain 1 in one location per record where the input variable is minimum within that record. Default: false.

select maximum

If true, output will contain 1 in one location per record where the input variable is maximum within that record. Default: false.

select median

If true, output will contain 1 in one location per record where the input variable is closest to the median of all values of the input variable within that record. Default: false.

select mean

If true, output will contain 1 in one location per record where the input variable is closest to the mean computed from values of the input variable within that record. Default: false.

force select

If true, a record for which all values of the input variable are missing will still have a 1 at its first location. Default: false, in which case the output is all 0’s at every location of such a record.

Note that any or all of the select ... options can be set to true - the output could then contain multiple 1’s per record. The 1’s do not add: for example, if selecting both mean and median, and they happen to be in the same location in one record, that location is still 1 in the output. If none of the select ... options are true, the output is all 0’s.
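As a minimal sketch of the note above, the configuration below selects the minimum, median and mean together. The variable MetaData/ocean_depth is borrowed from the examples in this section, and the values in the comments are hypothetical, chosen only to show that coinciding selections do not add.

```yaml
obs function:
  name: IntObsFunction/SelectStatistic
  options:
    variable:
    - name: MetaData/ocean_depth
    select minimum: true
    select median: true
    select mean: true
# Hypothetical single record with three values: 1.0, 2.0, 6.0
#   minimum = 1.0                        -> first location
#   median  = 2.0                        -> second location
#   mean    = 3.0, closest value is 2.0  -> second location again
# Output for this record: 1, 1, 0 (the second location is 1, not 2)
```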

The “where” option is supported. By default, this ObsFunction ignores only those locations where the input variable itself is missing. To exclude other locations from the calculation of the statistic(s) to be selected, for example where the ObsValue is missing or a QC flag is set, this must be made explicit in a where clause.

Example configuration:

An example with a multi-channel variable:

obs function:
  name: IntObsFunction/SelectStatistic
  options:
    where:
    - variable:
        name: ObsValue/var1
      is_defined:
    - variable:
        name: QCflagsData/var2
      is_in: 0
    variable:
    - name: MetaData/input_data
      channels: 1-3
    select minimum: true
    select maximum: true

Assuming the observations have been grouped into records (an example of grouping can be found in Profile consistency checks), this will return a 3-channel output with 1 in the locations of the minimum and maximum values of MetaData/input_data in each channel of each record. Locations with missing values of ObsValue/var1, and locations where QCflagsData/var2 is not 0 (i.e. var2 has not passed QC so far), are excluded. That is, 1 is written only to the locations where MetaData/input_data is lowest and highest among the subset, in each channel of each record, that satisfies both “where” conditions.

The variable assigned an output from this ObsFunction can then be used in a variety of ways:

E.g.1. as a priority variable in a Gaussian Thinning filter;
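A rough sketch of E.g.1, assuming the output is first stored by a Variable Assignment filter in a variable with the hypothetical name DerivedMetaData/thinning_priority. The horizontal_mesh and priority_variable option names, and the mesh value, are assumptions about the Gaussian Thinning filter and should be checked against that filter's own documentation.

```yaml
- filter: Variable Assignment  # 1 at the shallowest level of each record
  assignments:
  - name: DerivedMetaData/thinning_priority  # hypothetical variable name
    type: int
    function:
      name: IntObsFunction/SelectStatistic
      options:
        variable:
        - name: MetaData/ocean_depth
        select minimum: true
- filter: Gaussian Thinning
  horizontal_mesh: 100  # illustrative value only
  priority_variable:    # assumed option name, see lead-in
    name: DerivedMetaData/thinning_priority
```

With this arrangement the selected location in each record carries priority 1 and all other locations priority 0, so the selected location would be preferred when the thinning filter chooses which observations to retain.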

E.g.2. in a “where” clause to pick out a particular value, such as the deepest depth in each ocean profile:

- filter: Variable Assignment  # mask to select bottom level
  assignments:
  - name: DerivedMetaData/bottom_level
    type: int
    function:
      name: IntObsFunction/SelectStatistic
      options:
        variable:
        - name: MetaData/ocean_depth
        select maximum: true
        force select: true
- filter: Variable Assignment   # bottom depth
  where:
  - variable:
      name: DerivedMetaData/bottom_level
    is_in: 1
  assignments:
  - name: DerivedMetaData/bottom_depth
    type: float
    source variable: MetaData/ocean_depth
- filter: Variable Assignment   # bottom depth zeroes
  where:
  - variable:
      name: DerivedMetaData/bottom_level
    is_in: 1
  - variable:
      name: DerivedMetaData/number_of_levels
    is_in: 0
  assignments:
  - name: DerivedMetaData/bottom_depth
    type: float
    value: 0

This produces DerivedMetaData/bottom_depth, which is missing everywhere except at the location of the largest value of MetaData/ocean_depth in each record. For records with all depths missing, 0 is written to the first level instead (assuming DerivedMetaData/number_of_levels has been produced previously using ProfileLevelCount). This relies on force select: true; without it, DerivedMetaData/bottom_depth would be entirely missing in records with all depths missing.

E.g.3. to un-flag certain locations after a QC filter has been applied:

- filter: Variable Assignment # un-flag surface level in each profile
  assignments:
  - name: DerivedMetaData/surface_level
    type: int
    function:
      name: IntObsFunction/SelectStatistic
      options:
        variable:
        - name: MetaData/ocean_depth
        select minimum: true
- filter: Perform Action
  where:
  - variable:
      name: DerivedMetaData/surface_level
    is_in: 1
  actions:
  - name: unset
    flag: LevelSubsampleReject
  - name: accept

So if a previous filter had set the Diagnostic Flag LevelSubsampleReject and rejected the surface level in any profiles, the surface level is reinstated in those profiles and everything else is left untouched.