DrawValueFromFile

The purpose of this obsfunction is to produce values by interpolating an array loaded from a file, indexed by coordinates whose names correspond to ObsSpace variables.

Values produced by this ObsFunction must be float, int or std::string, though coordinates themselves can be defined as float, int, std::string or util::DateTime.

Note that the return type of the obsfunction is specified in the group name:

DrawValueFromFile@ObsFunction -> Float return
DrawValueFromFile@IntObsFunction -> Integer return
DrawValueFromFile@StringObsFunction -> String return

Example 1 (minimal)

Here is an illustrative example where we derive some new variable in ObsSpace by extracting data from a CSV file column identified by the group option (DerivedObsValue is chosen in the examples below).

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: DrawValueFromFile@ObsFunction
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue   # group with the payload variable
        interpolation:
        - name: satellite_id@MetaData
          method: exact

and the CSV file, located at <path-to-input>, might look like this:

station_id@MetaData,air_temperature@DerivedObsValue
string,float
ABC,0.1
DEF,0.2
GHI,0.3

The input file is loaded and at each location, the value of air_temperature@DerivedObsValue is extracted by

  • selecting the row of the CSV file in which the value in the station_id@MetaData column matches exactly the value of the station_id@MetaData ObsSpace variable at that location and

  • taking the value of the air_temperature@DerivedObsValue column from the selected row.

It is possible to customize this process in several ways by

  • making the air_temperature@DerivedObsValue dependent on more than one variable (see Example 2 (multi-channel) below).

  • using other interpolation methods than exact match (for example nearest-neighbor match or linear interpolation).

  • using a NetCDF rather than a CSV input file via the file option.

options

  • group: Group to identify the payload array (array being extracted/interpolated). This allows us to hold more than 1 payload per file if we were so inclined as long as each belongs to a different group.

  • channels: (Optional) List of channel numbers to match from our payload variable. See Example 2 (multi-channel) below.

  • file: Path to an input NetCDF or CSV file. The input file formats are described in more detail below.

  • interpolation: A list of one or more elements indicating how to map specific ObsSpace variables to slices of arrays loaded from the input file. This list is described in more detail below.

Input file formats

Supported file formats (backends) include NetCDF and CSV. Here we go into a little detail about these formats, described in detail below.

CSV

An input CSV file should have the following structure:

  • First line: comma-separated column names in ioda-v1 style (var@Group) or ioda-v2 style (Group/var)

  • Second line: comma-separated column data types (datetime, float, int or string)

  • Further lines: comma-separated data entries.

The number of entries in each line should be the same. The column order does not matter. One of the columns should belong to the group specified in the group option, indicating the payload array. Its data type should be a float, int, or std::string. The values from the other columns (sometimes called coordinates below) are compared against ObsSpace variables with the same names to determine the row or rows from which the payload is extracted at each location. The details of this comparison (e.g. whether an exact match is required, the nearest match is used, or piecewise linear interpolation is performed) depend on the interpolation option described below.

Notes:

  • A column containing channel numbers (which aren’t stored in a separate ObsSpace variable) should be labelled channel_number@MetaData or MetaData/channel_number.

  • Single underscores serve as placeholders for missing values; for example, the following row

    ABC,_,_
    

    contains missing values in the second and third columns.

NetCDF

ioda-v1 and ioda-v2-style NetCDF files are supported. ioda-v1-style files should have the following structure:

  • They contain a 1D, 2D or 3D payload array of type float or int or std::string with unique group name (that is, a name ending with @<groupname>).

  • Each dimension of this array should be indexed by at least one 1D coordinate array. Coordinates can be of type float, int or string. Datetimes should be represented as ISO-8601 strings (e.g. “2001-01-01T00:00:00Z”). Coordinate names should correspond to names of ObsSpace variables. Use the name channel_number@MetaData for channel numbers (for which there is no dedicated ObsSpace variable).

ioda-v2-style files are similar except that

  • Our payload array should be placed in the <groupname> group (rather than with a @<groupname> suffix).

  • Coordinate variables should be placed in appropriate groups, e.g. MetaData. Because of the limitations of the NetCDF file format, these variables can only be used as auxiliary coordinates of the payload variable (listed in its coordinates attribute).

The interpolation option

This list indicates which ObsSpace variables, and in which order, will be used as criteria for the extract step.

Each element of this list should have the following attributes:

  • name: Name of an ObsSpace variable (and of a coordinate present in the input CSV or NetCDF file).

  • method: Method used to map values of this variable at individual location to matching slices of the payload array loaded from the input file. This can be one of:

    • exact: Selects slices where the coordinate matches exactly the value of the specified ObsSpace variable.

      If no match is found, an error is reported unless there are slices where the indexing coordinate is set to the missing value placeholder; in this case these slices are selected instead. This can be used to define a fallback value (used if there is no exact match).

      This is the only method that can be used for variables of type string.

    • nearest: Selects slices where the coordinate is closest to the value of the specified ObsSpace variable.

      In case of a tie (e.g. if the value of the ObsSpace variable is 3 and the coordinate contains values 2 and 4, but not 3), the smaller of the candidate coordinate values is used (in this example, 2). This behaviour is arbitrarily chosen.

    • least upper bound: Select slices corresponding to the least value of the coordinate greater than or equal to the value of the specified ObsSpace variable.

    • greatest upper bound: Select slices corresponding to the greatest value of the coordinate less than or equal to the value of the specified ObsSpace variable.

    • linear: Performs a piecewise linear interpolation along the dimension indexed by the specified ObsSpace variable.

      This method is supported only by the obs function producing a float (not an int or a string). It can only be used for the final indexing variable, since it does not select slices, but produces the final result (a single value).

    • bilinear: Performs a bilinear interpolation along two dimensions indexed by the ObsSpace variables.

      This method is supported only by the obs function producing a float (not an int or a string). It can only be used for the final two indexing variables, since it does not select slices, but produces the final result (a single value).

    • extrapolation mode: Chosen behaviour in the case where an extraction step leads to extrapolation.

      By default (i.e. where no extrapolation is specified), no extrapolation is performed. That is, an exception is thrown where the point being extracted lies beyond the coordinate value range for the chosen interpolation algorithm. Various extrapolation modes are available, detailed below.

      • error: Throw an exception. This is the default behaviour when extrapolation mode is undefined.

      • nearest: Pick nearest index.

      • missing: Return a missing value indicator. Any subsequent extraction stages are then ignored.

At each location the criterion variables specified in the interpolation list are inspected in order, successively restricting the range of selected slices. An error is reported if the end result is an empty range of slices or (unless linear interpolation is used for the last criterion variable) a range containing more than one slice.

Note: If the channels option has been specified, the channel number is implicitly used as the first criterion variable and needs to match exactly a value from the channel_number@MetaData coordinate.

The following examples illustrate more advanced usage of this obsfunction.

Example 2 (multi-channel)

Here we illustrate how we might extend our first example by having multiple channels as well as additional variables over which the payload varies.

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: DrawValueFromFile@ObsFunction
      channels: &all_channels 1-3
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        channels: *all_channels
        group: DerivedObsValue   # group with the payload variable
        interpolation:
        - name: satellite_id@MetaData
          method: exact
        - name: processing_center@MetaData
          method: exact
        - name: air_pressure@MetaData
          method: linear

Note the channel selection, using standard yaml syntax. Internally, channel number extraction is an ‘exact’ match step, done before any user defined interpolation takes place. Since there is no channel number variable in ObsSpace, we instead expect input data containing channel information to be described by the name channel_number@MetaData as mentioned in here.

This might be described by a CSV similar to:

station_id@MetaData,air_pressure@MetaData,channel_number@MetaData,mydata@DerivedObsValue
string,float,int,float
ABC,30000,0, 0.1
ABC,60000,0, 0.2
...

Our NetCDF might look something like:

netcdf mydata {
dimensions:
    index = 10 ;
variables:
    float mydata@DerivedObsValue(index) ;
    int index(index) ;
    int channel_number@MetaData(index) ;
    int satellite_id@MetaData(index) ;
    float air_pressure@MetaData(index) ;
...
}

Example 3 (extrapolation)

This time, we demonstrate utilising various extrapolation methods for our extract/interpolation steps:

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: DrawValueFromFile@ObsFunction
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue      # group with the payload variable
        interpolation:
        - name: satellite_id@MetaData
          method: exact
          extrapolation mode: error
        - name: longitude@MetaData
          method: nearest
          extrapolation mode: missing
        - name: latitude@MetaData
          method: nearest
          extrapolation mode: nearest

Example 4 (bilinear interpolation)

Next we demonstrate the use of bilinear interpolation of two variables:

- filter: Variable Assignment
  assignments:
  - name: <some-new-variable-name>
    function:
      name: DrawValueFromFile@ObsFunction
      options:
        file: <path-to-input>    # path to the CSV/NetCDF file
        group: DerivedObsValue      # group with the payload variable
        interpolation:
        - name: longitude@MetaData
          method: bilinear
        - name: latitude@MetaData
          method: bilinear