Variational Bias Correction in UFO¶

Note

Summarized from :

[ZDC+14]

[Dee04]

Linear combination formulation¶

\[\vec{y} = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x}) + \tilde{\vec{e}_o}\]

with scalar coefficients \(\beta_i, i = 0, . . . ,N\) . The selection of predictors \(p_i(\vec{x}), i = 1, . . . ,N\) is flexible and depends on the instrument and channel.

Augmented control variable¶

Define the augmented control vector

\[\vec{z}^T = \lbrack \vec{x}^T \vec{\beta}^T \rbrack\]

Therefore, the cost function to be minimized

\[J(\vec{z}) = \frac{1}{2} (\vec{z}_b - \vec{z})^T \textbf{Z}^{-1} (\vec{z}_b - \vec{z}) + \frac{1}{2} (\vec{y} - \tilde{H}(\vec{z}))^T \textbf{R}^{-1} (\vec{y} - \tilde{H}(\vec{z}))\]

where

(17)¶\[\tilde{H}(\vec{z}) = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x})\]

Adjoint of the bias model¶

In the incremental formulation of the variational analysis, nonlinear observation operators are linearized about the latest outer-loop estimate \(\overline{\vec{x}}\) of \(\vec{x}\) . Similarly, for the modified operator we use

\[\begin{split}H(\vec{x}, \beta) \approx H(\overline{\vec{x}}, \beta) & = H(\overline{\vec{x}}) + \sum_{i=0}^{N} \beta_i p_i(\overline{\vec{x}}) \\ & = H(\overline{\vec{x}}) + \mathcal{P}(\overline{\vec{x}}) \cdot \vec{\beta}\end{split}\]

where \(\mathcal{P}(\overline{\vec{x}})\) is a \(m × n\) predictor matrix consisting of \(n\) predictors evaluated on \(m\) observation locations.

The modification to \(H(\vec{x})\) is therefore additive and linear in the bias parameters, and its adjoint with respect to these additional control parameters is trivial to implement.

For the linear predictor model (17), the derivatives with respect to the parameters are simply the values of the predictors at the observation locations

\[\frac{\partial H }{\partial \vec{\beta}} \Bigg \vert_{\vec{\beta} = \vec{\beta}_b} = \mathcal{P}(\overline{\vec{x}})\]

Background error covariance¶

In general the parameter estimation errors will be correlated with the state estimation errors, because they depend on the same data. We know of no practical way to account for this statistical dependence, and therefore take

\[\begin{split}\textbf{Z} = \begin{bmatrix} \textbf{B}_x & 0 \\ 0 & \textbf{B}_{\beta} \end{bmatrix}\end{split}\]

where \(\textbf{B}_x\) denotes the usual (state) background error covariance, and \(\textbf{B}_\beta\) the parameter background error covariance.

We take \(\textbf{B}_\beta\) diagonal:

\[\begin{split}\textbf{B}_\beta & = diag(\sigma_{\beta_1}^2, ...., \sigma_{\beta_n}^2) \\ & = \begin{bmatrix} \sigma_{\beta_1}^2 & & \\ & \ddots & \\ & & \sigma_{\beta_n}^2 \end{bmatrix} \\ & = \begin{bmatrix} \frac{\sigma_{o_1}^2}{N_1} & & \\ & \ddots & \\ & & \frac{\sigma_{o_n}^2}{N_j} \end{bmatrix}\end{split}\]

Here \(\beta_j\) denotes the \(j^{th}\) bias parameter, \(\sigma_{o_j}\) is the error standard deviation of the observations associated with \(\beta_j\), and \(N_j\) is a positive integer represents the number of observations.

Note

For example, taking \(N_j = 10,000\) for all parameters, the system will adapt quickly to changes in the bias for a clean channel generating thousands of radiances per analysis cycle.

On the other hand, it will respond slowly to a cloudy channel that generates only a few hundreds of data per cycle.

Note

When the \(N_j\) are sufficiently large (say, \(N_j >> 100\) ), the effect of neglecting off-diagonal elements of the parameter background error covariance matrix should be insignificant. This is because \(\mathcal{O}(N_j)\) observations are used to estimate just a few bias parameters; the estimation errors will be small even when the estimation is suboptimal.

The situation is, of course, very different for the state estimation, which can be extremely sensitive to the specification of the background error covariances, especially in data-sparse areas.

VarBC example¶

To use the bias correction in an observation operator, add the obs bias section as shown in the highlighted lines below.

observations:
- obs space:
    name: AMSUA-NOAA19
    ...
    simulated variables: [brightness_temperature]
    channels: &channels 1-15
  obs operator:
    name: CRTM
    obs options:
      Sensor_ID: &Sensor_ID amsua_n19
    ...
  obs bias:
    input file: Data/obs/satbias_crtm_in_amsua_n19.nc4
    variational bc:
      predictors:
      - name: constant
      - name: emissivity
      - name: scan_angle
        order: 4
      - name: scan_angle
        order: 3
      - name: scan_angle
        order: 2
      - name: scan_angle

Here is the detailed explanation:

Defines the predictors (required)
Here, we defined 6 predictors to be used for VarBC, which are constant, emissivity, and 1st, 2nd, 3rd, 4th order scan_angle, respectively. To find what predictor functions are available, please refer to directory ufo/src/ufo/predictors/.
variational bc:
  predictors:
  - name: constant
  - name: emissivity
  - name: scan_angle
    order: 4
  - name: scan_angle
    order: 3
  - name: scan_angle
    order: 2
  - name: scan_angle
Defines the input file for the bias coefficients prior (optional)

Usually, the prior is coming from the previous data assimilation cycle. If it is not available, all coefficients will start with zero.
input file: Data/obs/satbias_crtm_in_amsua_n19.nc4

Static Bias Correction in UFO¶

Static bias correction is handled very similarly to variational bias correction. Mathematically, the only difference is that the coefficients \(\beta_i\) of predictors used for static bias correction are kept constant and equal to 1. These predictors are defined in the obs bias.static bc YAML section, whose syntax is identical to obs bias.variational bc. For example,

static bc:
  predictors:
  - name: interpolate_data_from_file
    corrected variables:
    - name: air_temperature
      file: air_temperature_static_bc.csv
      interpolation:
      - name: station_id@MetaData
        method: exact
    - name: relative_humidity
      file: relative_humidity_static_bc.csv
      interpolation:
      - name: station_id@MetaData
        method: exact
      - name: air_pressure@MetaData
        method: least upper bound

See the interpolate_data_from_file section for more information about the predictor used above, which was written specifically with static bias correction in mind.

Available Predictors¶

cloud_liquid_water¶

Cloud liquid water.

The following options are supported:

satellite: Satellite reference name such as SSMIS; this lets the predictor know which which channels to expect. At present SSMIS is the only supported satellite.
varGroup: (Optional) Name of the ObsSpace group from which brightness temperatures will be loaded. By default, ObsValue.
ch...: Satellite-dependent channel numbers used for cloud liquid water calculation. For SSMIS the following channel numbers need to be specified: ch19h, ch19v, ch22v, ch37h, ch37v, ch91h and ch91v:.

Example¶

name: cloud_liquid_water
satellite: SSMIS
ch19h: 12
ch19v: 13
ch22v: 14
ch37h: 15
ch37v: 16
ch91v: 17
ch91h: 18

constant¶

A predictor equal to one at all locations.

cosine_of_latitude_times_orbit_node¶

Cosine of the observation latitude multiplied by the sensor azimuth angle.

emissivity¶

Emissivity.

interpolate_data_from_file¶

A predictor drawing values from an input CSV or NetCDf file depending on the values of specified ObsSpace variables. Typically used for static bias correction.

Example 1 (minimal)¶

Consider a simple example first and suppose this predictor is configured as follows:

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: myfolder/example_1.csv
  interpolation:
  - name: station_id@MetaData
    method: exact

and the example_1.csv file looks like this:

station_id@MetaData,air_temperature@ObsBias
string,float
ABC,0.1
DEF,0.2
GHI,0.3

The predictor will load this file and at each location compute the bias correction of air temperature by

selecting the row of the CSV file in which the value in the station_id@MetaData column matches exactly the value of the station_id@MetaData ObsSpace variable at that location and
taking the value of the air_temperature@ObsBias column from the selected row.

It is possible to customize this process in several ways by

correcting more than one variable
making the bias correction dependent on more than one variable
using other interpolation methods than exact match (for example nearest-neighbor match or linear interpolation)
using a NetCDF rather than a CSV input file.

This is explained in more detail below.

The `corrected variables` option¶

Each element of the corrected variables list specifies how to generate bias corrections for a particular bias-corrected variable and should have the following attributes:

name: Name of a bias-corrected variable.
channels: (Optional) List of channel numbers of the bias-corrected variable.
file: Path to an input NetCDF or CSV file. The input file formats are described in more detail below.
interpolation A list of one or more elements indicating how to map specific ObsSpace variables to slices of arrays loaded from the input file. This list is described in more detail below.

The predictor produces zeros for all bias-corrected variables missing from the corrected variables list.

Input file formats¶

CSV¶

An input CSV file should have the following structure:

First line: comma-separated column names in ioda-v1 style (var@Group) or ioda-v2 style (Group/var)
Second line: comma-separated column data types (datetime, float, int or string)
Further lines: comma-separated data entries.

The number of entries in each line should be the same. The column order does not matter. One of the columns should belong to the ObsBias group and contain the bias corrections to use in specific circumstances. Its data type should be either float or int. The values from the other columns (sometimes called coordinates below) are compared against ObsSpace variables with the same names to determine the row or rows from which the bias correction is extracted at each location. The details of this comparison (e.g. whether an exact match is required, the nearest match is used, or piecewise linear interpolation is performed) depend on the interpolation option described below.

Notes:

A column containing channel numbers (which aren’t stored in a separate ObsSpace variable) should be labelled channel_number@MetaData or MetaData/channel_number, as shown in Example 3 (multichannel variables) below.
Single underscores serve as placeholders for missing values; for example, the following row
```
ABC,_,_
```
contains missing values in the second and third columns.

NetCDF¶

ioda-v1 and ioda-v2-style NetCDF files are supported. ioda-v1-style files should have the following structure:

They should contain exactly one 1D or 2D array of type float with a name ending with @ObsBias and containing bias corrections.
Each dimension of this array should be indexed by at least one 1D coordinate array. Coordinates can be of type float, int or string. Datetimes should be represented as ISO-8601 strings. Coordinate names should correspond to names of ObsSpace variables. Use the name channel_number@MetaData for channel numbers (for which there is no dedicated ObsSpace variable).

ioda-v2-style files are similar except that

Bias corrections should be stored in an array placed in the ObsBias group (rather than with a @ObsBias suffix).
Coordinate variables should be placed in appropriate groups, e.g. MetaData. Because of the limitations of the NetCDF file format, these variables can only be used as auxiliary coordinates of the payload variable (listed in its coordinates attribute).

The `interpolation` option¶

This list indicates which ObsSpace variables, and in which order, will be used as criteria determining the value produced by the predictor.

Each element of this list should have the following attributes:

name: Name of an ObsSpace variable (and of a coordinate present in the input CSV or NetCDF file).
method: Method used to map values of this variable at individual location to matching slices of the bias correction array loaded from the input file. This can be one of:
- exact: Selects slices where the coordinate matches exactly the value of the specified ObsSpace variable.
  
  If no match is found, an error is reported unless there are slices where the indexing coordinate is set to the missing value placeholder; in this case these slices are selected instead. This can be used to define a fallback value (used if there is no exact match), as shown in Example 4 (fallback values, ranges) below.
  
  This is the only method that can be used for variables of type string.
- nearest: Selects slices where the coordinate is closest to the value of the specified ObsSpace variable.
  
  In case of a tie (e.g. if the value of the ObsSpace variable is 3 and the coordinate contains values 2 and 4, but not 3), the smaller of the candidate coordinate values is used (in this example, 2).
- least upper bound: Select slices corresponding to the least value of the coordinate greater than or equal to the value of the specified ObsSpace variable.
- greatest upper bound: Select slices corresponding to the greatest value of the coordinate less than or equal to the value of the specified ObsSpace variable.
- linear: Performs a piecewise linear interpolation along the dimension indexed by the specified ObsSpace variable.
  
  This method can only be used in the last element of the interpolation list.

At each location the criterion variables specified in the interpolation list are inspected in order, successively restricting the range of selected slices. An error is reported if the end result is an empty range of slices or (unless linear interpolation is used for the last criterion variable) a range containing more than one slice.

Note: If the channels option has been specified, the channel number is implicitly used as the first criterion variable and needs to match exactly a value from the channel_number@MetaData coordinate.

The following examples illustrate more advanced applications of this predictor.

Example 2 (multiple criterion variables, linear interpolation)¶

To make the air-temperature bias correction depend not only on the station ID, but also on the air pressure, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: example_2.csv
  interpolation:
  - name: station_id@MetaData
    method: exact
  - name: air_pressure@MetaData
    method: linear

and CSV file:

station_id@MetaData,air_pressure@MetaData,air_temperature@ObsBias
string,float,float
ABC,30000,0.1
ABC,60000,0.2
ABC,90000,0.3
XYZ,40000,0.4
XYZ,80000,0.5

For an observation taken by station XYZ at pressure 60000 the bias correction would be evaluated in the following way:

First, find all rows in the CSV file with a value of XYZ in the station_id@MetaData column.
Then take the values of the air_pressure@MetaData and air_temperature@ObsBias columns in these rows and use them to construct a piecewise linear interpolant. Evaluate this interpolant at pressure 60000. This produces the value of 0.45.

Example 3 (multichannel variables)¶

To make the brightness-temperature bias correction vary with the channel number and scan position, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: brightness_temperature
  channels: 1-2, 4-6
  file: example_3.csv
  interpolation:
  - name: scan_position@MetaData
    method: nearest

and CSV file:

channel_number@MetaData,scan_position@MetaData,brightness_temperature@ObsBias
int,int,float
1,25,0.01
2,25,0.02
4,25,0.04
5,25,0.05
6,25,0.06
1,75,0.11
2,75,0.12
4,75,0.14
5,75,0.15
6,75,0.16

This would produce, for example, a bias correction of 0.12 for an observation from channel 2 taken at scan position 60.

Example 4 (fallback values, ranges)¶

To apply a bias correction of 1.0 to observations taken by station XYZ in the Northern hemisphere and 0.0 to all other observations, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: example_4.csv
  interpolation:
  - name: station_id@MetaData
    method: exact
  - name: latitude@MetaData
    method: least upper bound

and CSV file:

station_id@MetaData,latitude@MetaData,air_temperature@ObsBias
string,float,float
_,_,0
XYZ,0,0
XYZ,90,1

Above, the first row of the data block (_,_,0) encodes the bias correction to apply to observations taken by stations other than XYZ; the second row, to observations taken by station XYZ in the Southern hemisphere or on the equator (latitude ≤ 0); and the third row, to observations taken by station XYZ in the Northern hemisphere.

lapse_rate¶

nth power of the lapse rate.

The following options are supported:

order (Optional) Power to which to raise the lapse rate. By default, 1.

Legendre¶

The Legendre polynomial \(P_n(x)\) where n is the value of the order option,

x = -1 + 2 * (scan_position - 1) / (n_scan_positions - 1),

n_scan_positions is the value of the number of scan positions option and scan_position is the sensor scan position loaded from the scan_position@MetaData variable (assumed to range from 1 to n_scan_positions).

The following options are supported:

number of scan positions The number of scan positions.
order (Optional) Order of the Legendre polynomial. By default, 1.

Example¶

name: Legendre
number of scan positions: 32
order: 2

orbital_angle¶

A term of the Fourier series of the orbital angle \(\theta\) (loaded from the satellite_orbital_angle@MetaData variable), i.e. \(\sin(n\theta)\) or \(\cos(n\theta)\).

The following options are supported:

component: Either sin or cos.
order (Optional) Order of the term to be calculated (\(n\) in the formulas above). By default, 1.

Example¶

name: orbital_angle
component: cos
order: 2

scan_angle¶

nth power of the scan angle.

The following options are supported:

var_name: (Optional) Name of the ObsSpace variable (from the MetaData group) storing the scan angle (in degrees). By default, sensor_view_angle.
order (Optional) Power to which to raise the scan angle. By default, 1.

Example¶

name: scan_angle
var_name: scan_position
order: 2

sine_of_latitude¶

Sine of the observation latitude.

thickness¶

Thickness (in km) of a specified pressure level interval, calculated as the difference between the geopotential heights at two pressure levels and normalized to zero mean and unit variance.

The following options are required:

layer top: Pressure value (in Pa) at the top of the required thickness layer.
layer base: Pressure value (in Pa) at the bottom of the required thickness layer.
mean: Climatological mean of the predictor.
standard deviation: Climatological standard deviation of the predictor.

Example¶

name: thickness
layer top: 30000
layer base: 85000
mean: 7.6
standard deviation: 0.4

Variational Bias Correction in UFO¶

Linear combination formulation¶

Augmented control variable¶

Adjoint of the bias model¶

Background error covariance¶

VarBC example¶

Static Bias Correction in UFO¶

Available Predictors¶

cloud_liquid_water¶

Example¶

constant¶

cosine_of_latitude_times_orbit_node¶

emissivity¶

interpolate_data_from_file¶

Example 1 (minimal)¶

The corrected variables option¶

Input file formats¶

CSV¶

NetCDF¶

The interpolation option¶

Example 2 (multiple criterion variables, linear interpolation)¶

Example 3 (multichannel variables)¶

Example 4 (fallback values, ranges)¶

lapse_rate¶

Legendre¶

Example¶

orbital_angle¶

Example¶

scan_angle¶

Example¶

sine_of_latitude¶

thickness¶

Example¶

The `corrected variables` option¶

The `interpolation` option¶