Variational Bias Correction in UFO

Note

Summarized from :

Linear combination formulation

\[\vec{y} = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x}) + \tilde{\vec{e}_o}\]

with scalar coefficients \(\beta_i, i = 0, . . . ,N\) . The selection of predictors \(p_i(\vec{x}), i = 1, . . . ,N\) is flexible and depends on the instrument and channel.

Augmented control variable

Define the augmented control vector

\[\vec{z}^T = \lbrack \vec{x}^T \vec{\beta}^T \rbrack\]

Therefore, the cost function to be minimized

\[J(\vec{z}) = \frac{1}{2} (\vec{z}_b - \vec{z})^T \textbf{Z}^{-1} (\vec{z}_b - \vec{z}) + \frac{1}{2} (\vec{y} - \tilde{H}(\vec{z}))^T \textbf{R}^{-1} (\vec{y} - \tilde{H}(\vec{z}))\]

where

(17)\[\tilde{H}(\vec{z}) = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x})\]

Adjoint of the bias model

In the incremental formulation of the variational analysis, nonlinear observation operators are linearized about the latest outer-loop estimate \(\overline{\vec{x}}\) of \(\vec{x}\) . Similarly, for the modified operator we use

\[\begin{split}H(\vec{x}, \beta) \approx H(\overline{\vec{x}}, \beta) & = H(\overline{\vec{x}}) + \sum_{i=0}^{N} \beta_i p_i(\overline{\vec{x}}) \\ & = H(\overline{\vec{x}}) + \mathcal{P}(\overline{\vec{x}}) \cdot \vec{\beta}\end{split}\]

where \(\mathcal{P}(\overline{\vec{x}})\) is a \(m × n\) predictor matrix consisting of \(n\) predictors evaluated on \(m\) observation locations.

The modification to \(H(\vec{x})\) is therefore additive and linear in the bias parameters, and its adjoint with respect to these additional control parameters is trivial to implement.

For the linear predictor model (17), the derivatives with respect to the parameters are simply the values of the predictors at the observation locations

\[\frac{\partial H }{\partial \vec{\beta}} \Bigg \vert_{\vec{\beta} = \vec{\beta}_b} = \mathcal{P}(\overline{\vec{x}})\]

Background error covariance

In general the parameter estimation errors will be correlated with the state estimation errors, because they depend on the same data. We know of no practical way to account for this statistical dependence, and therefore take

\[\begin{split}\textbf{Z} = \begin{bmatrix} \textbf{B}_x & 0 \\ 0 & \textbf{B}_{\beta} \end{bmatrix}\end{split}\]

where \(\textbf{B}_x\) denotes the usual (state) background error covariance, and \(\textbf{B}_\beta\) the parameter background error covariance.

We take \(\textbf{B}_\beta\) diagonal:

\[\begin{split}\textbf{B}_\beta & = diag(\sigma_{\beta_1}^2, ...., \sigma_{\beta_n}^2) \\ & = \begin{bmatrix} \sigma_{\beta_1}^2 & & \\ & \ddots & \\ & & \sigma_{\beta_n}^2 \end{bmatrix} \\ & = \begin{bmatrix} \frac{\sigma_{o_1}^2}{N_1} & & \\ & \ddots & \\ & & \frac{\sigma_{o_n}^2}{N_j} \end{bmatrix}\end{split}\]

Here \(\beta_j\) denotes the \(j^{th}\) bias parameter, \(\sigma_{o_j}\) is the error standard deviation of the observations associated with \(\beta_j\), and \(N_j\) is a positive integer represents the number of observations.

Note

  • For example, taking \(N_j = 10,000\) for all parameters, the system will adapt quickly to changes in the bias for a clean channel generating thousands of radiances per analysis cycle.

  • On the other hand, it will respond slowly to a cloudy channel that generates only a few hundreds of data per cycle.

Note

  • When the \(N_j\) are sufficiently large (say, \(N_j >> 100\) ), the effect of neglecting off-diagonal elements of the parameter background error covariance matrix should be insignificant. This is because \(\mathcal{O}(N_j)\) observations are used to estimate just a few bias parameters; the estimation errors will be small even when the estimation is suboptimal.

  • The situation is, of course, very different for the state estimation, which can be extremely sensitive to the specification of the background error covariances, especially in data-sparse areas.

VarBC example

To use the bias correction in an observation operator, add the obs bias section as shown in the highlighted lines below.

:linenos:
:emphasize-lines: 12-24

observations:
- obs space:
    name: AMSUA-NOAA19
    ...
    simulated variables: [brightness_temperature]
    channels: &channels 1-15
  obs operator:
    name: CRTM
    obs options:
      Sensor_ID: &Sensor_ID amsua_n19
    ...
  obs bias:
    input file: Data/obs/satbias_crtm_in_amsua_n19.nc4
    variational bc:
      predictors:
      - name: constant
      - name: emissivity
      - name: scan_angle
        order: 4
      - name: scan_angle
        order: 3
      - name: scan_angle
        order: 2
      - name: scan_angle

Here is the detailed explanation:

  1. Defines the predictors (required)

Here, we defined 6 predictors to be used for VarBC, which are constant, emissivity, and 1st, 2nd, 3rd, 4th order scan_angle, respectively. To find what predictor functions are available, please refer to directory ufo/src/ufo/predictors/.

variational bc:
  predictors:
  - name: constant
  - name: emissivity
  - name: scan_angle
    order: 4
  - name: scan_angle
    order: 3
  - name: scan_angle
    order: 2
  - name: scan_angle
  1. Defines the input file for the bias coefficients prior (optional)

    Usually, the prior is coming from the previous data assimilation cycle. If it is not available, all coefficients will start with zero.

input file: Data/obs/satbias_crtm_in_amsua_n19.nc4

Static Bias Correction in UFO

Static bias correction is handled very similarly to variational bias correction. Mathematically, the only difference is that the coefficients \(\beta_i\) of predictors used for static bias correction are kept constant and equal to 1. These predictors are defined in the obs bias.static bc YAML section, whose syntax is identical to obs bias.variational bc. For example,

static bc:
  predictors:
  - name: interpolate_data_from_file
    corrected variables:
    - name: air_temperature
      file: air_temperature_static_bc.csv
      interpolation:
      - name: station_id@MetaData
        method: exact
    - name: relative_humidity
      file: relative_humidity_static_bc.csv
      interpolation:
      - name: station_id@MetaData
        method: exact
      - name: air_pressure@MetaData
        method: least upper bound

See the interpolate_data_from_file section for more information about the predictor used above, which was written specifically with static bias correction in mind.

Available Predictors

cloud_liquid_water

Cloud liquid water.

The following options are supported:

  • satellite: Satellite reference name such as SSMIS; this lets the predictor know which which channels to expect. At present SSMIS is the only supported satellite.

  • varGroup: (Optional) Name of the ObsSpace group from which brightness temperatures will be loaded. By default, ObsValue.

  • ch...: Satellite-dependent channel numbers used for cloud liquid water calculation. For SSMIS the following channel numbers need to be specified: ch19h, ch19v, ch22v, ch37h, ch37v, ch91h and ch91v:.

Example

name: cloud_liquid_water
satellite: SSMIS
ch19h: 12
ch19v: 13
ch22v: 14
ch37h: 15
ch37v: 16
ch91v: 17
ch91h: 18

constant

A predictor equal to one at all locations.

cosine_of_latitude_times_orbit_node

Cosine of the observation latitude multiplied by the sensor azimuth angle.

emissivity

Emissivity.

interpolate_data_from_file

A predictor drawing values from an input CSV or NetCDf file depending on the values of specified ObsSpace variables. Typically used for static bias correction.

Example 1 (minimal)

Consider a simple example first and suppose this predictor is configured as follows:

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: myfolder/example_1.csv
  interpolation:
  - name: station_id@MetaData
    method: exact

and the example_1.csv file looks like this:

station_id@MetaData,air_temperature@ObsBias
string,float
ABC,0.1
DEF,0.2
GHI,0.3

The predictor will load this file and at each location compute the bias correction of air temperature by

  • selecting the row of the CSV file in which the value in the station_id@MetaData column matches exactly the value of the station_id@MetaData ObsSpace variable at that location and

  • taking the value of the air_temperature@ObsBias column from the selected row.

It is possible to customize this process in several ways by

  • correcting more than one variable

  • making the bias correction dependent on more than one variable

  • using other interpolation methods than exact match (for example nearest-neighbor match or linear interpolation)

  • using a NetCDF rather than a CSV input file.

This is explained in more detail below.

The corrected variables option

Each element of the corrected variables list specifies how to generate bias corrections for a particular bias-corrected variable and should have the following attributes:

  • name: Name of a bias-corrected variable.

  • channels: (Optional) List of channel numbers of the bias-corrected variable.

  • file: Path to an input NetCDF or CSV file. See here for supported file formats. However, note that unlike DrawValueFromFile, we don’t specify the group name corresponding to our payload array. We expect it to be ObsBias.

  • interpolation: A list of one or more elements indicating how to map specific ObsSpace variables to slices of arrays loaded from the input file. See here for further details.

The predictor produces zeros for all bias-corrected variables missing from the corrected variables list.

The following examples illustrate more advanced applications of this predictor.

Example 2 (multiple criterion variables, linear interpolation)

To make the air-temperature bias correction depend not only on the station ID, but also on the air pressure, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: example_2.csv
  interpolation:
  - name: station_id@MetaData
    method: exact
  - name: air_pressure@MetaData
    method: linear

and CSV file:

station_id@MetaData,air_pressure@MetaData,air_temperature@ObsBias
string,float,float
ABC,30000,0.1
ABC,60000,0.2
ABC,90000,0.3
XYZ,40000,0.4
XYZ,80000,0.5

For an observation taken by station XYZ at pressure 60000 the bias correction would be evaluated in the following way:

  • First, find all rows in the CSV file with a value of XYZ in the station_id@MetaData column.

  • Then take the values of the air_pressure@MetaData and air_temperature@ObsBias columns in these rows and use them to construct a piecewise linear interpolant. Evaluate this interpolant at pressure 60000. This produces the value of 0.45.

Example 3 (multichannel variables)

To make the brightness-temperature bias correction vary with the channel number and scan position, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: brightness_temperature
  channels: 1-2, 4-6
  file: example_3.csv
  interpolation:
  - name: scan_position@MetaData
    method: nearest

and CSV file:

channel_number@MetaData,scan_position@MetaData,brightness_temperature@ObsBias
int,int,float
1,25,0.01
2,25,0.02
4,25,0.04
5,25,0.05
6,25,0.06
1,75,0.11
2,75,0.12
4,75,0.14
5,75,0.15
6,75,0.16

This would produce, for example, a bias correction of 0.12 for an observation from channel 2 taken at scan position 60.

Example 4 (fallback values, ranges)

To apply a bias correction of 1.0 to observations taken by station XYZ in the Northern hemisphere and 0.0 to all other observations, we could use the following YAML snippet

name: interpolate_data_from_file
corrected variables:
- name: air_temperature
  file: example_4.csv
  interpolation:
  - name: station_id@MetaData
    method: exact
  - name: latitude@MetaData
    method: least upper bound

and CSV file:

station_id@MetaData,latitude@MetaData,air_temperature@ObsBias
string,float,float
_,_,0
XYZ,0,0
XYZ,90,1

Above, the first row of the data block (_,_,0) encodes the bias correction to apply to observations taken by stations other than XYZ; the second row, to observations taken by station XYZ in the Southern hemisphere or on the equator (latitude ≤ 0); and the third row, to observations taken by station XYZ in the Northern hemisphere.

lapse_rate

nth power of the lapse rate.

The following options are supported:

  • order (Optional) Power to which to raise the lapse rate. By default, 1.

Legendre

The Legendre polynomial \(P_n(x)\) where n is the value of the order option,

x = -1 + 2 * (scan_position - 1) / (n_scan_positions - 1),

n_scan_positions is the value of the number of scan positions option and scan_position is the sensor scan position loaded from the scan_position@MetaData variable (assumed to range from 1 to n_scan_positions).

The following options are supported:

  • number of scan positions The number of scan positions.

  • order (Optional) Order of the Legendre polynomial. By default, 1.

Example

name: Legendre
number of scan positions: 32
order: 2

orbital_angle

A term of the Fourier series of the orbital angle \(\theta\) (loaded from the satellite_orbital_angle@MetaData variable), i.e. \(\sin(n\theta)\) or \(\cos(n\theta)\).

The following options are supported:

  • component: Either sin or cos.

  • order (Optional) Order of the term to be calculated (\(n\) in the formulas above). By default, 1.

Example

name: orbital_angle
component: cos
order: 2

scan_angle

nth power of the scan angle.

The following options are supported:

  • var_name: (Optional) Name of the ObsSpace variable (from the MetaData group) storing the scan angle (in degrees). By default, sensor_view_angle.

  • order (Optional) Power to which to raise the scan angle. By default, 1.

Example

name: scan_angle
var_name: scan_position
order: 2

sine_of_latitude

Sine of the observation latitude.

thickness

Thickness (in km) of a specified pressure level interval, calculated as the difference between the geopotential heights at two pressure levels and normalized to zero mean and unit variance.

The following options are required:

  • layer top: Pressure value (in Pa) at the top of the required thickness layer.

  • layer base: Pressure value (in Pa) at the bottom of the required thickness layer.

  • mean: Climatological mean of the predictor.

  • standard deviation: Climatological standard deviation of the predictor.

Example

name: thickness
layer top: 30000
layer base: 85000
mean: 7.6
standard deviation: 0.4