DrawValueFromFile

The purpose of this obsfunction is to produce values by interpolating an array loaded from a file, indexed by coordinates whose names correspond to ObsSpace variables.

Values produced by this ObsFunction must be float, int or std::string, though coordinates themselves can be defined as float, int, std::string or util::DateTime.

Note that the return type of the obsfunction is specified in the group name:

ObsFunction/DrawValueFromFile -> Float return
IntObsFunction/DrawValueFromFile -> Integer return
StringObsFunction/DrawValueFromFile -> String return

Example 1 (minimal)

Here is an illustrative example where we derive some new variable in ObsSpace by extracting data from a CSV file column identified by the group option (DerivedObsValue is chosen in the examples below).

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue   # group with the payload variable
        interpolation:
        - name: MetaData/stationIdentification
          method: exact

and the CSV file, located at <path-to-input>, might look like this:

MetaData/stationIdentification,DerivedObsValue/airTemperature
string,float
ABC,0.1
DEF,0.2
GHI,0.3

The input file is loaded and at each location, the value of DerivedObsValue/airTemperature is extracted by

  • selecting the row of the CSV file in which the value in the MetaData/stationIdentification column matches exactly the value of the MetaData/stationIdentification ObsSpace variable at that location and

  • taking the value of the DerivedObsValue/airTemperature column from the selected row.
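
The two steps above can be sketched in Python as follows. This is only an illustration of the exact-match semantics, not the UFO implementation; the helper name draw_exact and the in-memory copy of the CSV file are assumptions made for the example.

```python
import csv
import io

# The example CSV file from above, held in memory for illustration.
CSV_TEXT = """MetaData/stationIdentification,DerivedObsValue/airTemperature
string,float
ABC,0.1
DEF,0.2
GHI,0.3
"""

def draw_exact(csv_text, coord_name, payload_name, obs_values):
    """Return the payload value at each location by exact match (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names, data = rows[0], rows[2:]  # rows[1] holds the column data types
    coord_col = names.index(coord_name)
    payload_col = names.index(payload_name)
    table = {row[coord_col]: float(row[payload_col]) for row in data}
    # A KeyError here stands in for the error reported when no row matches.
    return [table[value] for value in obs_values]

# Values of MetaData/stationIdentification at three locations (made up).
station_ids = ["DEF", "ABC", "DEF"]
result = draw_exact(CSV_TEXT, "MetaData/stationIdentification",
                    "DerivedObsValue/airTemperature", station_ids)
print(result)  # [0.2, 0.1, 0.2]
```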

It is possible to customize this process in several ways by

  • making the DerivedObsValue/airTemperature dependent on more than one variable (see Example 2 (multi-channel) below).

  • using other interpolation methods than exact match (for example nearest-neighbor match or linear interpolation).

  • using a NetCDF rather than a CSV input file via the file option.

Options

  • group: Group identifying the payload array (the array being extracted/interpolated). A single file can therefore hold more than one payload, as long as each payload belongs to a different group.

  • channels: (Optional) List of channel numbers to match from our payload variable. See Example 2 (multi-channel) below.

  • file: Path to an input NetCDF or CSV file. The input file formats are described in more detail below.

  • interpolation: A list of one or more elements indicating how to map specific ObsSpace variables to slices of arrays loaded from the input file. This list is described in more detail below.

Input file formats

Supported file formats (backends) are NetCDF and CSV. Both are described in detail below.

CSV

An input CSV file should have the following structure:

  • First line: comma-separated column names in ioda-v1 style (var@Group) or ioda-v2 style (Group/var)

  • Second line: comma-separated column data types (datetime, float, int or string)

  • Further lines: comma-separated data entries.

The number of entries in each line should be the same. The column order does not matter. One of the columns should belong to the group specified in the group option, indicating the payload array. Its data type should be a float, int, or std::string. The values from the other columns (sometimes called coordinates below) are compared against ObsSpace variables with the same names to determine the row or rows from which the payload is extracted at each location. The details of this comparison (e.g. whether an exact match is required, the nearest match is used, or piecewise linear interpolation is performed) depend on the interpolation option described below.

Notes:

  • A column containing channel numbers (which aren’t stored in a separate ObsSpace variable) should be labelled sensorChannelNumber@MetaData or MetaData/sensorChannelNumber.

  • Single underscores serve as placeholders for missing values; for example, the following row

    ABC,_,_
    

    contains missing values in the second and third columns.
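
As a rough sketch of how such a file could be read, the following Python snippet parses the two header lines and converts single underscores to a missing value (here None). The function name read_typed_csv, the MetaData/pressure column in the sample, and the datetime format string are assumptions made for illustration; they are not taken from the UFO source.

```python
import csv
import io
from datetime import datetime

# Hypothetical converters for the four documented column data types.
CONVERTERS = {
    "float": float,
    "int": int,
    "string": str,
    # Assumed ISO-8601 layout, e.g. 2001-01-01T00:00:00Z.
    "datetime": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"),
}

def read_typed_csv(text):
    """Parse column names, column types and data lines; '_' becomes None."""
    rows = list(csv.reader(io.StringIO(text)))
    names, types, data = rows[0], rows[1], rows[2:]
    columns = {name: [] for name in names}
    for line in data:
        if len(line) != len(names):
            raise ValueError("every line must have the same number of entries")
        for name, type_name, entry in zip(names, types, line):
            value = None if entry == "_" else CONVERTERS[type_name](entry)
            columns[name].append(value)
    return columns

# Made-up sample: the first data row has missing values in columns 2 and 3.
sample = """MetaData/stationIdentification,MetaData/pressure,DerivedObsValue/airTemperature
string,float,float
ABC,_,_
DEF,50000,0.2
"""
columns = read_typed_csv(sample)
print(columns["MetaData/pressure"])  # [None, 50000.0]
```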

NetCDF

ioda-v1 and ioda-v2-style NetCDF files are supported. ioda-v1-style files should have the following structure:

  • They contain a 1D, 2D or 3D payload array of type float, int or std::string with a unique group name (that is, a name beginning with <groupname>/).

  • Each dimension of this array should be indexed by at least one 1D coordinate array. Coordinates can be of type float, int or string. Datetimes should be represented as ISO-8601 strings (e.g. “2001-01-01T00:00:00Z”). Coordinate names should correspond to names of ObsSpace variables. Use the name MetaData/sensorChannelNumber for channel numbers (for which there is no dedicated ObsSpace variable).

ioda-v2-style files are similar except that

  • The payload array should be placed in the <groupname> group (rather than having a name with a <groupname>/ prefix).

  • Coordinate variables should be placed in appropriate groups, e.g. MetaData. Because of the limitations of the NetCDF file format, these variables can only be used as auxiliary coordinates of the payload variable (listed in its coordinates attribute).

The interpolation option

This list indicates which ObsSpace variables, and in which order, will be used as criteria for the extract step.

Each element of this list should have the following attributes:

  • name: Name of an ObsSpace variable (and of a coordinate present in the input CSV or NetCDF file).

  • method: Method used to map values of this variable at individual locations to matching slices of the payload array loaded from the input file. This can be one of:

    • exact: Selects slices where the coordinate matches exactly the value of the specified ObsSpace variable.

      If no match is found, an error is reported unless there are slices where the indexing coordinate is set to the missing value placeholder; in this case these slices are selected instead. This can be used to define a fallback value (used if there is no exact match).

      This is the only method that can be used for variables of type string.

    • nearest: Selects slices where the coordinate is closest to the value of the specified ObsSpace variable.

      In case of a tie (e.g. if the value of the ObsSpace variable is 3 and the coordinate contains values 2 and 4, but not 3), the smaller of the candidate coordinate values is used (in this example, 2). This behaviour is arbitrarily chosen.

    • least upper bound: Select slices corresponding to the least value of the coordinate greater than or equal to the value of the specified ObsSpace variable.

    • greatest lower bound: Select slices corresponding to the greatest value of the coordinate less than or equal to the value of the specified ObsSpace variable.

    • linear: Performs a piecewise linear interpolation along the dimension indexed by the specified ObsSpace variable.

      This method is supported only for the obs function producing a float (not an int or a string). It can only be used for the final indexing variable, since it does not select slices, but produces the final result (a single value).

    • bilinear: Performs a bilinear interpolation along two dimensions indexed by the ObsSpace variables.

      This method is supported only for the obs function producing a float (not an int or a string). It can only be used for the final two indexing variables, since it does not select slices, but produces the final result (a single value).

    • trilinear: Performs a trilinear interpolation along three dimensions indexed by the ObsSpace variables.

      This method is supported only for the obs function producing a float (not an int or a string). The three interpolation variables must also be floats.

      It is possible to specify log-linear interpolation along each dimension with the option coordinate transformation: loglinear. For further context see Example 5 (trilinear interpolation) below.

  • extrapolation mode: Behaviour chosen when an extraction step would require extrapolation, that is, when the point being extracted lies beyond the coordinate value range for the chosen interpolation algorithm. By default (i.e. where no extrapolation mode is specified), no extrapolation is performed and an exception is thrown. This can be one of:

    • error: Throw an exception. This is the default behaviour when extrapolation mode is undefined.

    • nearest: Pick the nearest index.

    • missing: Return a missing value indicator. Any subsequent extraction stages are then ignored.
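
The slice-selecting methods can be illustrated with a few small Python functions operating on a list of coordinate values. This is a sketch of the behaviour described above under simplifying assumptions (None stands in for the missing value placeholder, and an out-of-range value always raises, as in the default extrapolation mode); the function names are hypothetical and do not correspond to the UFO source.

```python
MISSING = None  # stand-in for the missing value placeholder

def indices_of(coords, target):
    """Indices of all slices whose coordinate equals `target`."""
    return [i for i, c in enumerate(coords) if c == target]

def select_exact(coords, value):
    """Exact match, falling back to slices with a missing coordinate."""
    hits = indices_of(coords, value) or indices_of(coords, MISSING)
    if not hits:
        raise ValueError(f"no match found for {value!r}")
    return hits

def select_nearest(coords, value):
    """Closest coordinate; a tie is resolved in favour of the smaller value."""
    best = min(coords, key=lambda c: (abs(c - value), c))
    return indices_of(coords, best)

def select_bound_above(coords, value):
    """Least coordinate value greater than or equal to `value`."""
    candidates = [c for c in coords if c >= value]
    if not candidates:
        raise ValueError("value lies above the coordinate range")
    return indices_of(coords, min(candidates))

def select_bound_below(coords, value):
    """Greatest coordinate value less than or equal to `value`."""
    candidates = [c for c in coords if c <= value]
    if not candidates:
        raise ValueError("value lies below the coordinate range")
    return indices_of(coords, max(candidates))

print(select_exact(["ABC", MISSING, "DEF"], "XYZ"))  # [1] (fallback slice)
print(select_nearest([2, 4], 3))                     # [0] (tie, smaller value 2)
print(select_bound_above([2, 4, 6], 3))              # [1] (coordinate 4)
print(select_bound_below([2, 4, 6], 3))              # [0] (coordinate 2)
```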

At each location the criterion variables specified in the interpolation list are inspected in order, successively restricting the range of selected slices. An error is reported if the end result is an empty range of slices or (unless linear interpolation is used for the last criterion variable) a range containing more than one slice.
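
The successive restriction can be sketched as a loop over the criterion variables. The snippet below handles exact matching only and uses made-up rows; the function name extract and the data values are assumptions for illustration.

```python
def extract(rows, payload_name, criteria):
    """Restrict `rows` by each (name, value) criterion in order (exact match only)."""
    selected = rows
    for name, value in criteria:
        selected = [row for row in selected if row[name] == value]
        if not selected:
            raise ValueError(f"no slice left after matching {name}")
    if len(selected) > 1:
        raise ValueError("criteria do not identify a unique slice")
    return selected[0][payload_name]

# Made-up payload table with two coordinates.
rows = [
    {"MetaData/stationIdentification": "ABC", "MetaData/pressure": 30000.0, "payload": 0.1},
    {"MetaData/stationIdentification": "ABC", "MetaData/pressure": 60000.0, "payload": 0.2},
    {"MetaData/stationIdentification": "DEF", "MetaData/pressure": 30000.0, "payload": 0.3},
]

value = extract(rows, "payload", [
    ("MetaData/stationIdentification", "ABC"),  # leaves two slices
    ("MetaData/pressure", 60000.0),             # leaves exactly one slice
])
print(value)  # 0.2
```

With only the first criterion, two slices would remain and the sketch would raise an error, mirroring the rule stated above.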

Note: If the channels option has been specified, the channel number is implicitly used as the first criterion variable and needs to match exactly a value from the MetaData/sensorChannelNumber coordinate.

The following examples illustrate more advanced usage of this obsfunction.

Example 2 (multi-channel)

Here we illustrate how we might extend our first example by having multiple channels as well as additional variables over which the payload varies.

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      channels: &all_channels 1-3
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        channels: *all_channels
        group: DerivedObsValue   # group with the payload variable
        interpolation:
        - name: MetaData/satelliteIdentifier
          method: exact
        - name: MetaData/dataProviderOrigin
          method: exact
        - name: MetaData/pressure
          method: linear

Note the channel selection, using standard YAML syntax. Internally, channel number extraction is an ‘exact’ match step, performed before any user-defined interpolation takes place. Since there is no channel number variable in ObsSpace, input data containing channel information is expected to use the name MetaData/sensorChannelNumber, as mentioned in the description of the input file formats above.

This might be described by a CSV similar to:

MetaData/stationIdentification,MetaData/pressure,MetaData/sensorChannelNumber,DerivedObsValue/mydata
string,float,int,float
ABC,30000,0, 0.1
ABC,60000,0, 0.2
...

Our NetCDF might look something like:

netcdf mydata {
dimensions:
    index = 10 ;
variables:
    float DerivedObsValue/mydata(index) ;
    int index(index) ;
    int MetaData/sensorChannelNumber(index) ;
    int MetaData/satelliteIdentifier(index) ;
    float MetaData/pressure(index) ;
...
}
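
To make the order of operations concrete, here is a Python sketch that uses only the two CSV rows shown above: an implicit exact match on the channel number, an exact match on the station identifier, and finally piecewise linear interpolation in pressure. The function name draw_value and the requested pressure of 45000 are assumptions for illustration; the real obsfunction may differ in detail.

```python
# The two data rows from the CSV excerpt above:
# (stationIdentification, pressure, sensorChannelNumber, mydata)
rows = [
    ("ABC", 30000.0, 0, 0.1),
    ("ABC", 60000.0, 0, 0.2),
]

def draw_value(rows, channel, station, pressure):
    # 1. Implicit exact match on the channel number.
    selected = [r for r in rows if r[2] == channel]
    # 2. Exact match on the station identifier.
    selected = [r for r in selected if r[0] == station]
    # 3. Piecewise linear interpolation in pressure over the remaining rows.
    points = sorted((r[1], r[3]) for r in selected)
    for (p0, v0), (p1, v1) in zip(points, points[1:]):
        if p0 <= pressure <= p1:
            weight = (pressure - p0) / (p1 - p0)
            return v0 + weight * (v1 - v0)
    # Mirrors the default extrapolation mode (error).
    raise ValueError("pressure outside the coordinate range")

value = draw_value(rows, channel=0, station="ABC", pressure=45000.0)
print(round(value, 6))  # 0.15
```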

Example 3 (extrapolation)

This time, we demonstrate utilising various extrapolation methods for our extract/interpolation steps:

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue      # group with the payload variable
        interpolation:
        - name: MetaData/satelliteIdentifier
          method: exact
          extrapolation mode: error
        - name: MetaData/longitude
          method: nearest
          extrapolation mode: missing
        - name: MetaData/latitude
          method: nearest
          extrapolation mode: nearest
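
The effect of the three extrapolation modes on a nearest-match step can be sketched as follows. This is an illustrative Python model, not the UFO implementation; the function name and the latitude values are made up.

```python
def nearest_with_extrapolation(coords, value, mode="error"):
    """Index of the nearest coordinate, or None when mode is 'missing'."""
    if value < min(coords) or value > max(coords):  # step would extrapolate
        if mode == "error":
            raise ValueError("value outside the coordinate range")
        if mode == "missing":
            return None  # stand-in for the missing value indicator
        if mode != "nearest":
            raise ValueError(f"unknown extrapolation mode: {mode}")
    # Ties are resolved in favour of the smaller coordinate value.
    return min(range(len(coords)),
               key=lambda i: (abs(coords[i] - value), coords[i]))

lats = [-60.0, 0.0, 60.0]  # made-up MetaData/latitude coordinate
print(nearest_with_extrapolation(lats, 75.0, "nearest"))  # 2
print(nearest_with_extrapolation(lats, 75.0, "missing"))  # None
print(nearest_with_extrapolation(lats, 20.0))             # 1 (in range)
```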

Example 4 (bilinear interpolation)

Next we demonstrate the use of bilinear interpolation of two variables:

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue      # group with the payload variable
        interpolation:
        - name: MetaData/longitude
          method: bilinear
        - name: MetaData/latitude
          method: bilinear
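
For reference, bilinear interpolation on a rectangular grid can be sketched in a few lines of Python. The snippet assumes ascending coordinates and a payload stored as payload[i][j] at (lons[i], lats[j]); the function names and grid values are hypothetical, and points outside the grid raise an error as in the default extrapolation mode.

```python
from bisect import bisect_right

def bracket(coords, x):
    """Index i with coords[i] <= x <= coords[i + 1]; coords must be ascending."""
    if not coords[0] <= x <= coords[-1]:
        raise ValueError("value outside the coordinate range")
    return min(bisect_right(coords, x) - 1, len(coords) - 2)

def bilinear(lons, lats, payload, lon, lat):
    """Interpolate payload[i][j], defined at (lons[i], lats[j]), to (lon, lat)."""
    i, j = bracket(lons, lon), bracket(lats, lat)
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Interpolate in longitude along the two bracketing latitude lines...
    low = payload[i][j] * (1 - tx) + payload[i + 1][j] * tx
    high = payload[i][j + 1] * (1 - tx) + payload[i + 1][j + 1] * tx
    # ...then in latitude between the two intermediate results.
    return low * (1 - ty) + high * ty

lons = [0.0, 10.0]
lats = [0.0, 20.0]
payload = [[1.0, 3.0],
           [2.0, 4.0]]
print(bilinear(lons, lats, payload, 5.0, 10.0))  # 2.5
```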

Example 5 (trilinear interpolation)

The following example shows the use of trilinear interpolation of three variables (latitude, longitude and air pressure). The interpolation is performed log-linearly in pressure. Any out-of-bounds values are set to the value of the relevant bound prior to performing the interpolation.

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: ObsFunction/DrawValueFromFile
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue      # group with the payload variable
        interpolation:
        - name: MetaData/longitude
          method: trilinear
          extrapolation mode: nearest
        - name: MetaData/latitude
          method: trilinear
          extrapolation mode: nearest
        - name: MetaData/pressure
          method: trilinear
          coordinate transformation: loglinear
          extrapolation mode: nearest
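
The treatment of the pressure dimension can be sketched in one dimension. The snippet assumes that log-linear means linear in the logarithm of the coordinate and that extrapolation mode: nearest clamps the requested value to the coordinate range before interpolating, as described above. The function names and numbers are illustrative only; a full trilinear scheme would combine one such weight per dimension.

```python
import math

def loglinear_weight(p0, p1, p):
    """Weight of p between p0 and p1, linear in log(pressure), after clamping."""
    p = min(max(p, min(p0, p1)), max(p0, p1))  # clamp to the coordinate range
    return (math.log(p) - math.log(p0)) / (math.log(p1) - math.log(p0))

def interpolate_loglinear(p0, v0, p1, v1, p):
    """Interpolate between (p0, v0) and (p1, v1) at pressure p."""
    w = loglinear_weight(p0, p1, p)
    return v0 + w * (v1 - v0)

p0, p1 = 10000.0, 100000.0
# The geometric mean of the bounds sits halfway between them in log space.
mid = math.sqrt(p0 * p1)
print(round(loglinear_weight(p0, p1, mid), 6))            # 0.5
# An out-of-bounds pressure is clamped to the upper bound first.
print(interpolate_loglinear(p0, 1.0, p1, 3.0, 200000.0))  # 3.0
```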