Variational Bias Correction in UFO¶
Linear combination formulation¶
\[\vec{y} = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x}) + \tilde{\vec{e}_o}\]with scalar coefficients \(\beta_i,\ i = 0, \ldots, N\). The selection of predictors \(p_i(\vec{x}),\ i = 1, \ldots, N\) is flexible and depends on the instrument and channel.
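For instance, for a radiance instrument bias-corrected with a constant predictor and the first two powers of the scan angle (an illustrative choice; the predictor set is configured per instrument, as in the VarBC example below), the bias model would read
\[\vec{y} = H(\vec{x}) + \beta_0 + \beta_1 \theta_{\mathrm{scan}} + \beta_2 \theta_{\mathrm{scan}}^2 + \tilde{\vec{e}_o}\]where \(\theta_{\mathrm{scan}}\) denotes the scan angle.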
Augmented control variable¶
Define the augmented control vector
\[\vec{z}^T = \lbrack \vec{x}^T \vec{\beta}^T \rbrack\]The cost function to be minimized is then
\[J(\vec{z}) = \frac{1}{2} (\vec{z}_b - \vec{z})^T \textbf{Z}^{-1} (\vec{z}_b - \vec{z}) + \frac{1}{2} (\vec{y} - \tilde{H}(\vec{z}))^T \textbf{R}^{-1} (\vec{y} - \tilde{H}(\vec{z}))\]where
(17) \[\tilde{H}(\vec{z}) = H(\vec{x}) + \sum_{i=0}^{N} \beta_i p_i(\vec{x})\]
Adjoint of the bias model¶
In the incremental formulation of the variational analysis, nonlinear observation operators are linearized about the latest outer-loop estimate \(\overline{\vec{x}}\) of \(\vec{x}\) . Similarly, for the modified operator we use
\[\begin{split}H(\vec{x}, \vec{\beta}) \approx H(\overline{\vec{x}}, \vec{\beta}) & = H(\overline{\vec{x}}) + \sum_{i=0}^{N} \beta_i p_i(\overline{\vec{x}}) \\ & = H(\overline{\vec{x}}) + \mathcal{P}(\overline{\vec{x}}) \cdot \vec{\beta}\end{split}\]where \(\mathcal{P}(\overline{\vec{x}})\) is an \(m \times n\) predictor matrix consisting of \(n\) predictors evaluated at \(m\) observation locations.
The modification to \(H(\vec{x})\) is therefore additive and linear in the bias parameters, and its adjoint with respect to these additional control parameters is trivial to implement.
For the linear predictor model (17), the derivatives with respect to the parameters are simply the values of the predictors at the observation locations
\[\frac{\partial H }{\partial \vec{\beta}} \Bigg \vert_{\vec{\beta} = \vec{\beta}_b} = \mathcal{P}(\overline{\vec{x}})\]
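Written out explicitly, the predictor matrix simply stacks the predictor values over observation locations (an illustrative layout, with \(p_i^{(k)}\) denoting the \(i^{th}\) predictor evaluated at the \(k^{th}\) observation location):
\[\begin{split}\mathcal{P}(\overline{\vec{x}}) = \begin{bmatrix} p_1^{(1)} & \cdots & p_n^{(1)} \\ \vdots & \ddots & \vdots \\ p_1^{(m)} & \cdots & p_n^{(m)} \end{bmatrix}\end{split}\]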
Background error covariance¶
In general the parameter estimation errors will be correlated with the state estimation errors, because they depend on the same data. We know of no practical way to account for this statistical dependence, and therefore take
\[\begin{split}\textbf{Z} = \begin{bmatrix} \textbf{B}_x & 0 \\ 0 & \textbf{B}_{\beta} \end{bmatrix}\end{split}\]where \(\textbf{B}_x\) denotes the usual (state) background error covariance, and \(\textbf{B}_\beta\) the parameter background error covariance.
We take \(\textbf{B}_\beta\) diagonal:
\[\begin{split}\textbf{B}_\beta & = \mathrm{diag}(\sigma_{\beta_1}^2, \ldots, \sigma_{\beta_n}^2) \\ & = \begin{bmatrix} \sigma_{\beta_1}^2 & & \\ & \ddots & \\ & & \sigma_{\beta_n}^2 \end{bmatrix} \\ & = \begin{bmatrix} \frac{\sigma_{o_1}^2}{N_1} & & \\ & \ddots & \\ & & \frac{\sigma_{o_n}^2}{N_n} \end{bmatrix}\end{split}\]Here \(\beta_j\) denotes the \(j^{th}\) bias parameter, \(\sigma_{o_j}\) is the error standard deviation of the observations associated with \(\beta_j\), and \(N_j\) is a positive integer representing the number of observations.
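Equivalently, \(\sigma_{\beta_j} = \sigma_{o_j} / \sqrt{N_j}\). As a worked illustration (the numbers here are arbitrary), taking \(\sigma_{o_j} = 0.25\ \mathrm{K}\) and \(N_j = 10{,}000\) gives
\[\sigma_{\beta_j} = \frac{0.25\ \mathrm{K}}{\sqrt{10\,000}} = 0.0025\ \mathrm{K}.\]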
Note
For example, taking \(N_j = 10,000\) for all parameters, the system will adapt quickly to changes in the bias for a clean channel generating thousands of radiances per analysis cycle.
On the other hand, it will respond slowly to a cloudy channel that generates only a few hundred observations per cycle.
Note
When the \(N_j\) are sufficiently large (say, \(N_j \gg 100\)), the effect of neglecting off-diagonal elements of the parameter background error covariance matrix should be insignificant. This is because \(\mathcal{O}(N_j)\) observations are used to estimate just a few bias parameters; the estimation errors will be small even when the estimation is suboptimal.
The situation is, of course, very different for the state estimation, which can be extremely sensitive to the specification of the background error covariances, especially in data-sparse areas.
VarBC example¶
To use the bias correction in an observation operator, add the obs bias
section as shown in the example below.
observations:
- obs space:
    name: AMSUA-NOAA19
    ...
    simulated variables: [brightness_temperature]
    channels: &channels 1-15
  obs operator:
    name: CRTM
    obs options:
      Sensor_ID: &Sensor_ID amsua_n19
      ...
  obs bias:
    input file: Data/obs/satbias_crtm_in_amsua_n19.nc4
    variational bc:
      predictors:
      - name: constant
      - name: emissivity
      - name: scan_angle
        options:
          order: 4
      - name: scan_angle
        options:
          order: 3
      - name: scan_angle
        options:
          order: 2
      - name: scan_angle
Here is the detailed explanation:
Defines the predictors (required)
Here, we defined 6 predictors to be used for VarBC, which are constant, emissivity, and the 1st, 2nd, 3rd and 4th order scan_angle, respectively. To find what predictor functions are available, please refer to the directory ufo/src/ufo/predictors/.
variational bc:
  predictors:
  - name: constant
  - name: emissivity
  - name: scan_angle
    options:
      order: 4
  - name: scan_angle
    options:
      order: 3
  - name: scan_angle
    options:
      order: 2
  - name: scan_angle
Defines the input file for the bias coefficients prior (optional)
Usually, the prior comes from the previous data assimilation cycle. If it is not available, all coefficients start from zero.
input file: Data/obs/satbias_crtm_in_amsua_n19.nc4
Static Bias Correction in UFO¶
Static bias correction is handled very similarly to variational bias correction. Mathematically, the only difference is that the coefficients \(\beta_i\) of predictors used for static bias correction are kept constant and equal to 1. These predictors are defined in the obs bias.static bc YAML section, whose syntax is identical to that of obs bias.variational bc. For example,
static bc:
  predictors:
  - name: interpolate_data_from_file
    options:
      corrected variables:
      - name: air_temperature
        file: air_temperature_static_bc.csv
        interpolation:
        - name: station_id@MetaData
          method: exact
      - name: relative_humidity
        file: relative_humidity_static_bc.csv
        interpolation:
        - name: station_id@MetaData
          method: exact
        - name: air_pressure@MetaData
          method: least upper bound
See the interpolate_data_from_file section for more information about the predictor used above, which was written specifically with static bias correction in mind.
Available Predictors¶
Unless stated otherwise, all configuration parameters of predictors are placed in the options group in the YAML file.
cloud_liquid_water¶
Cloud liquid water.
The following options are supported:
satellite: Satellite reference name such as SSMIS; this lets the predictor know which channels to expect. At present SSMIS is the only supported satellite.
varGroup: (Optional) Name of the ObsSpace group from which brightness temperatures will be loaded. By default, ObsValue.
ch...: Satellite-dependent channel numbers used for cloud liquid water calculation. For SSMIS the following channel numbers need to be specified: ch19h, ch19v, ch22v, ch37h, ch37v, ch91h and ch91v.
Example¶
name: cloud_liquid_water
options:
  satellite: SSMIS
  ch19h: 12
  ch19v: 13
  ch22v: 14
  ch37h: 15
  ch37v: 16
  ch91v: 17
  ch91h: 18
constant¶
A predictor equal to one at all locations.
cosine_of_latitude_times_orbit_node¶
Cosine of the observation latitude multiplied by the sensor azimuth angle.
emissivity¶
Emissivity.
interpolate_data_from_file¶
A predictor drawing values from an input CSV or NetCDF file depending on the values of specified ObsSpace variables. Typically used for static bias correction.
Example 1 (minimal)¶
Consider a simple example first and suppose this predictor is configured as follows:
name: interpolate_data_from_file
options:
  corrected variables:
  - name: air_temperature
    file: myfolder/example_1.csv
    interpolation:
    - name: station_id@MetaData
      method: exact
and the example_1.csv
file looks like this:
station_id@MetaData,air_temperature@ObsBias
string,float
ABC,0.1
DEF,0.2
GHI,0.3
The predictor will load this file and at each location compute the bias correction of air temperature by
selecting the row of the CSV file in which the value in the station_id@MetaData column matches exactly the value of the station_id@MetaData ObsSpace variable at that location, and
taking the value of the air_temperature@ObsBias column from the selected row.
It is possible to customize this process in several ways by
correcting more than one variable
making the bias correction dependent on more than one variable
using interpolation methods other than exact match (for example nearest-neighbor match or linear interpolation)
using a NetCDF rather than a CSV input file.
This is explained in more detail below.
The corrected variables option¶
Each element of the corrected variables
list specifies how to generate bias corrections for
a particular bias-corrected variable and should have the following attributes:
name: Name of a bias-corrected variable.
channels: (Optional) List of channel numbers of the bias-corrected variable.
file: Path to an input NetCDF or CSV file. The input file formats are described in more detail below.
interpolation: A list of one or more elements indicating how to map specific ObsSpace variables to slices of arrays loaded from the input file. This list is described in more detail below.
The predictor produces zeros for all bias-corrected variables missing from the corrected variables list.
Input file formats¶
CSV¶
An input CSV file should have the following structure:
First line: comma-separated column names in ioda-v1 style (var@Group) or ioda-v2 style (Group/var)
Second line: comma-separated column data types (datetime, float, int or string)
Further lines: comma-separated data entries.
The number of entries in each line should be the same. The column order does not matter. One of the columns should belong to the ObsBias group and contain the bias corrections to use in specific circumstances. Its data type should be either float or int. The values from the other columns (sometimes called coordinates below) are compared against ObsSpace variables with the same names to determine the row or rows from which the bias correction is extracted at each location. The details of this comparison (e.g. whether an exact match is required, the nearest match is used, or piecewise linear interpolation is performed) depend on the interpolation option described below.
Notes:
A column containing channel numbers (which aren’t stored in a separate ObsSpace variable) should be labelled channel_number@MetaData or MetaData/channel_number, as shown in Example 3 (multichannel variables) below.
Single underscores serve as placeholders for missing values; for example, the following row
ABC,_,_
contains missing values in the second and third columns.
NetCDF¶
ioda-v1 and ioda-v2-style NetCDF files are supported. ioda-v1-style files should have the following structure:
They should contain exactly one 1D or 2D array of type float with a name ending with @ObsBias and containing bias corrections.
Each dimension of this array should be indexed by at least one 1D coordinate array. Coordinates can be of type float, int or string. Datetimes should be represented as ISO-8601 strings. Coordinate names should correspond to names of ObsSpace variables. Use the name channel_number@MetaData for channel numbers (for which there is no dedicated ObsSpace variable).
ioda-v2-style files are similar except that
Bias corrections should be stored in an array placed in the ObsBias group (rather than with an @ObsBias suffix).
Coordinate variables should be placed in appropriate groups, e.g. MetaData. Because of the limitations of the NetCDF file format, these variables can only be used as auxiliary coordinates of the payload variable (listed in its coordinates attribute).
The interpolation option¶
This list indicates which ObsSpace variables, and in which order, will be used as criteria determining the value produced by the predictor.
Each element of this list should have the following attributes:
name: Name of an ObsSpace variable (and of a coordinate present in the input CSV or NetCDF file).
method: Method used to map values of this variable at individual locations to matching slices of the bias correction array loaded from the input file. This can be one of:
exact: Selects slices where the coordinate matches exactly the value of the specified ObsSpace variable. If no match is found, an error is reported unless there are slices where the indexing coordinate is set to the missing value placeholder; in this case these slices are selected instead. This can be used to define a fallback value (used if there is no exact match), as shown in Example 4 (fallback values, ranges) below. This is the only method that can be used for variables of type string.
nearest: Selects slices where the coordinate is closest to the value of the specified ObsSpace variable. In case of a tie (e.g. if the value of the ObsSpace variable is 3 and the coordinate contains values 2 and 4, but not 3), the smaller of the candidate coordinate values is used (in this example, 2).
least upper bound: Selects slices corresponding to the least value of the coordinate greater than or equal to the value of the specified ObsSpace variable.
greatest lower bound: Selects slices corresponding to the greatest value of the coordinate less than or equal to the value of the specified ObsSpace variable.
linear: Performs a piecewise linear interpolation along the dimension indexed by the specified ObsSpace variable. This method can only be used in the last element of the interpolation list.
At each location the criterion variables specified in the interpolation
list are inspected
in order, successively restricting the range of selected slices. An error is reported if the end
result is an empty range of slices or (unless linear interpolation is used for the last criterion
variable) a range containing more than one slice.
Note: If the channels
option has been specified, the channel number is implicitly used as the
first criterion variable and needs to match exactly a value from the channel_number@MetaData
coordinate.
The following examples illustrate more advanced applications of this predictor.
Example 2 (multiple criterion variables, linear interpolation)¶
To make the air-temperature bias correction depend not only on the station ID, but also on the air pressure, we could use the following YAML snippet
name: interpolate_data_from_file
options:
  corrected variables:
  - name: air_temperature
    file: example_2.csv
    interpolation:
    - name: station_id@MetaData
      method: exact
    - name: air_pressure@MetaData
      method: linear
and CSV file:
station_id@MetaData,air_pressure@MetaData,air_temperature@ObsBias
string,float,float
ABC,30000,0.1
ABC,60000,0.2
ABC,90000,0.3
XYZ,40000,0.4
XYZ,80000,0.5
For an observation taken by station XYZ at pressure 60000 the bias correction would be evaluated in the following way:
First, find all rows in the CSV file with a value of XYZ in the station_id@MetaData column.
Then take the values of the air_pressure@MetaData and air_temperature@ObsBias columns in these rows and use them to construct a piecewise linear interpolant. Evaluate this interpolant at pressure 60000. This produces the value of 0.45.
Example 3 (multichannel variables)¶
To make the brightness-temperature bias correction vary with the channel number and scan position, we could use the following YAML snippet
name: interpolate_data_from_file
options:
  corrected variables:
  - name: brightness_temperature
    channels: 1-2, 4-6
    file: example_3.csv
    interpolation:
    - name: scan_position@MetaData
      method: nearest
and CSV file:
channel_number@MetaData,scan_position@MetaData,brightness_temperature@ObsBias
int,int,float
1,25,0.01
2,25,0.02
4,25,0.04
5,25,0.05
6,25,0.06
1,75,0.11
2,75,0.12
4,75,0.14
5,75,0.15
6,75,0.16
This would produce, for example, a bias correction of 0.12 for an observation from channel 2 taken at scan position 60 (scan position 60 is nearer to 75 than to 25, so the rows with scan_position@MetaData equal to 75 are selected).
Example 4 (fallback values, ranges)¶
To apply a bias correction of 1.0 to observations taken by station XYZ in the Northern hemisphere and 0.0 to all other observations, we could use the following YAML snippet
name: interpolate_data_from_file
options:
  corrected variables:
  - name: air_temperature
    file: example_4.csv
    interpolation:
    - name: station_id@MetaData
      method: exact
    - name: latitude@MetaData
      method: least upper bound
and CSV file:
station_id@MetaData,latitude@MetaData,air_temperature@ObsBias
string,float,float
_,_,0
XYZ,0,0
XYZ,90,1
Above, the first row of the data block (_,_,0) encodes the bias correction to apply to observations taken by stations other than XYZ; the second row, to observations taken by station XYZ in the Southern hemisphere or on the equator (latitude ≤ 0); and the third row, to observations taken by station XYZ in the Northern hemisphere.
lapse_rate¶
nth power of the lapse rate.
The following options are supported:
order: (Optional) Power to which to raise the lapse rate. By default, 1.
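For example, a configuration using the square of the lapse rate as the predictor might look like this (an illustrative sketch; the order value is arbitrary):
name: lapse_rate
options:
  order: 2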
Legendre¶
The Legendre polynomial \(P_n(x)\), where n is the value of the order option,
x = -1 + 2 * (scan_position - 1) / (n_scan_positions - 1),
n_scan_positions is the value of the number of scan positions option, and scan_position is the sensor scan position loaded from the scan_position@MetaData variable (assumed to range from 1 to n_scan_positions).
The following options are supported:
number of scan positions: The number of scan positions. Note: this parameter should not be inside the options group.
order: (Optional) Order of the Legendre polynomial. By default, 1.
Example¶
name: Legendre
number of scan positions: 32
options:
  order: 2
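As an illustration of the mapping above (a worked example, using the standard form \(P_2(x) = (3x^2 - 1)/2\)): with 32 scan positions, the first scan position maps to
\[x = -1 + 2\,\frac{1 - 1}{32 - 1} = -1, \qquad P_2(-1) = \frac{3(-1)^2 - 1}{2} = 1,\]while scan positions near the centre of the swath map to \(x \approx 0\) and give predictor values close to \(-0.5\).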
orbital_angle¶
A term of the Fourier series of the orbital angle \(\theta\) (loaded from the satellite_orbital_angle@MetaData
variable), i.e. \(\sin(n\theta)\) or \(\cos(n\theta)\).
The following options are supported:
component: Either sin or cos.
order: (Optional) Order of the term to be calculated (\(n\) in the formulas above). By default, 1.
Example¶
name: orbital_angle
options:
  component: cos
  order: 2
scan_angle¶
nth power of the scan angle.
The following options are supported:
var_name: (Optional) Name of the ObsSpace variable (from the MetaData group) storing the scan angle (in degrees). By default, sensor_view_angle.
order: (Optional) Power to which to raise the scan angle. By default, 1.
Example¶
name: scan_angle
options:
  var_name: scan_position
  order: 2
sine_of_latitude¶
Sine of the observation latitude.
thickness¶
Thickness (in km) of a specified pressure level interval, calculated as the difference between the geopotential heights at two pressure levels and normalized to zero mean and unit variance.
The following options are required:
layer top: Pressure value (in Pa) at the top of the required thickness layer.
layer base: Pressure value (in Pa) at the bottom of the required thickness layer.
mean: Climatological mean of the predictor.
standard deviation: Climatological standard deviation of the predictor.
Example¶
name: thickness
options:
  layer top: 30000
  layer base: 85000
  mean: 7.6
  standard deviation: 0.4
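Assuming the standardization implied by the description above, the predictor value at each location would be
\[p = \frac{\Delta z - \mathrm{mean}}{\mathrm{standard\ deviation}},\]where \(\Delta z\) is the thickness (in km) of the layer between layer base and layer top. With the example values above, a computed thickness of 7.8 km would give \(p = (7.8 - 7.6)/0.4 = 0.5\).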