Observation error covariance with correlations for observations within one record
The observation error covariance can be set up to use correlations for observations within one record. To use this capability, the obsgrouping feature of ObsSpace needs to be used to group observations by records (see e.g. IODA Interfaces).
Correlations are computed using either a Gaspari-Cohn, Gaussian, or Markov function with the lengthscale specified in yaml. The same correlations are applied to all assimilated variables in one record. Correlations between locations in different records are considered to be zero.
The full observation error covariance matrix is \(R = D^{1/2} * C * D^{1/2}\) where \(D^{1/2}\) is a diagonal matrix with the observation error standard deviations (ObsError group) on the diagonal, and \(C\) is the correlation matrix.
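As a minimal sketch of this formula (not the UFO implementation), the element-wise form \(R_{ij} = \sigma_i C_{ij} \sigma_j\) can be written in plain Python. The function name build_covariance and the sample values are hypothetical and chosen only for illustration.

```python
def build_covariance(stddevs, corr):
    """Return R = D^{1/2} C D^{1/2} as a list of rows.

    stddevs: observation error standard deviations (diagonal of D^{1/2})
    corr:    correlation matrix C as a list of rows
    """
    n = len(stddevs)
    return [[stddevs[i] * corr[i][j] * stddevs[j] for j in range(n)]
            for i in range(n)]


# Illustrative values only: two observations in one record.
sigma = [1.0, 2.0]
C = [[1.0, 0.5],
     [0.5, 1.0]]
R = build_covariance(sigma, C)
```

The diagonal of R holds the error variances, and the off-diagonal entries scale the correlations by the two standard deviations involved.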
This type of observation error covariance is set up using the following options:
correlation function (Parameter, type: string, default value: gc99): the correlation function to use for computing the correlation matrix. Currently a Gaspari-Cohn correlation profile (gc99), a Gaussian correlation profile (gaussian), and a Markov method (markov) are the available options.
correlation lengthscale (RequiredParameter, type: float): the lengthscale which normalizes the distance between observations (distance / correlation lengthscale). Correlations are set to zero at and beyond this value for the gc99 correlation function.
correlation variable names (OptionalParameter, type: vector of strings): variables in the MetaData group used as coordinate variables in the distance calculation. This is not needed when using the haversine distance function because latitude and longitude are the only variables needed.
distance function (Parameter, type: string, default value: linear): the name of the function used to calculate the distance between observations. Currently only linear and haversine are supported. If linear is used, the correlation variable names must be defined. For the haversine function only the latitude and longitude are needed, and the correlation lengthscale needs to be specified in meters.
lengthscale factor for markov correlation limit (Parameter, type: double, default value: 1.0): the lengthscale factor is multiplied by the lengthscale to provide the limit within which correlations are evaluated; beyond this distance correlation values are set to zero. This is only used when markov is selected as the correlation function.
apply basic reconditioning (Parameter, type: boolean, default value: false): this parameter should only be used with the gaussian correlation function. It applies the same basic ridge regression reconditioning that is applied to the markov correlation function.
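The two distance options can be sketched as follows. This is an illustration under stated assumptions, not the UFO code: the function names are hypothetical, the linear case is shown for a single coordinate variable, and the Earth radius constant is an assumed mean value that may differ from the one used in UFO.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius; UFO's constant may differ


def linear_distance(a, b):
    """Absolute difference of a single coordinate (e.g. pressure)."""
    return abs(a - b)


def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def normalized_distance(distance, lengthscale):
    """Distance divided by the correlation lengthscale."""
    return distance / lengthscale
```

Because the haversine result is in meters, the lengthscale must also be in meters for the normalized distance to be dimensionless. With the linear function the lengthscale takes the units of the chosen coordinate variable.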
For testing and diagnostic purposes, the ufo_obserrorcov_diags.x application is available. It saves the following diagnostics for one specified record to a NetCDF file:
coordinate used for computing correlations (e.g. pressure in the example above),
correlation matrix \(C\),
random vector \(x\) for the specified record,
result of \(C * x\).
A Python script for plotting is provided in ufo/tools/plots/plot_obserrorwithingroupcorr_diags.py.
Gaspari-Cohn correlation function
When using the Gaspari-Cohn correlation function (the default), the user must set a correlation
length (the correlation lengthscale parameter), which sets the cutoff point where the
Gaspari-Cohn function becomes zero. This length is approximately 3.57 times the standard deviation
of the Gaussian function that best fits the Gaspari-Cohn function. An example yaml configuration is below:
observations:
- obs space:
    name: Sondes (within group covariances for one variable)
    obsdatain:
      ... # input/output files and other options
      obsgrouping:
        group variables: [sequenceNumber]
        sort variable: pressure
        sort order: ascending
    simulated variables: [airTemperature]
  obs error:
    covariance model: within group covariances
    correlation function: gc99 # optional b/c gc99 is the default
    correlation lengthscale: 15000. # in the units of the correlation variable (here pressure)
    correlation variable names: [pressure]
    distance function: linear
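For reference, the standard Gaspari-Cohn (1999) fifth-order piecewise rational function can be sketched in plain Python. The sketch assumes, as described above, that the lengthscale is the cutoff distance at which the function reaches zero; the function name gaspari_cohn and the sample values are hypothetical, and this is not the UFO implementation.

```python
def gaspari_cohn(distance, lengthscale):
    """Gaspari-Cohn correlation, zero for distance >= lengthscale."""
    # Rescale so that the compact support [0, lengthscale] maps to z in [0, 2].
    z = 2.0 * abs(distance) / lengthscale
    if z >= 2.0:
        return 0.0
    if z <= 1.0:
        return (-0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3
                - (5.0 / 3.0) * z**2 + 1.0)
    return (z**5 / 12.0 - 0.5 * z**4 + 0.625 * z**3 + (5.0 / 3.0) * z**2
            - 5.0 * z + 4.0 - 2.0 / (3.0 * z))


# Illustrative coordinate values for one record, with a linear distance.
coords = [100000.0, 92500.0, 85000.0, 70000.0]
corr = [[gaspari_cohn(a - b, 15000.0) for b in coords] for a in coords]
```

Entries of corr for pairs separated by the lengthscale or more are exactly zero, which makes the correlation matrix banded when the observations are sorted by the coordinate.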
Markov correlation function
The Markov correlation function sets up a correlation profile following a decaying exponential:
\(e^{-|x|/l}\), where \(x\) is the distance between two observation locations and \(l\) is the
correlation lengthscale. The lengthscale factor for markov correlation limit, \(f\), sets the distance
beyond which the function evaluates to zero (similar to the lengthscale for the Gaspari-Cohn
correlation function). If \(|x|\) is greater than
\(f*l\), the correlation is set to zero. An example yaml configuration is below:
observations:
- obs space:
    name: Sondes (within group covariances with markov correlation)
    obsdatain:
      ... # input/output files and other options
      obsgrouping:
        group variables: [sequenceNumber]
        sort variable: pressure
        sort order: ascending
    simulated variables: [airTemperature, windEastward, windNorthward]
  obs error:
    covariance model: within group covariances
    distance function: haversine
    correlation function: markov
    lengthscale factor for markov correlation limit: 2.0 # default is 1.0
    correlation lengthscale: 13000. # length in meters
The Markov correlation function is not guaranteed to produce a positive definite \(R\) matrix, so a basic ridge regression reconditioning is applied, in the form of adding 10% of the smallest eigenvalue to the diagonal of the Markov correlation matrix.
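The truncated exponential profile for the Markov option can be sketched as below. This is a minimal illustration, not the UFO implementation; the function name markov_correlation and its argument names are hypothetical, and the reconditioning step is not shown.

```python
import math


def markov_correlation(distance, lengthscale, limit_factor=1.0):
    """exp(-|d|/l) for |d| <= f*l, and zero for |d| > f*l."""
    d = abs(distance)
    if d > limit_factor * lengthscale:
        return 0.0
    return math.exp(-d / lengthscale)


# With the values from the yaml example (l = 13000 m, f = 2.0):
inside = markov_correlation(13000.0, 13000.0, limit_factor=2.0)
outside = markov_correlation(30000.0, 13000.0, limit_factor=2.0)
```

Here inside equals \(e^{-1}\) because the distance is one lengthscale, and outside is zero because 30000 m exceeds the limit of 26000 m.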
Gaussian correlation function
The gaussian correlation function creates a Gaussian correlation profile:
\(e^{-\frac{(x/l)^2}{2}}\), where \(x\) is the distance between observations and
\(l\) is the correlation lengthscale, which represents the standard deviation of
the Gaussian profile.
The gaussian correlation function (like the Markov function) is not guaranteed to produce
a positive definite \(R\) matrix, and has several reconditioning options. The recommended
option is to use the same reconditioning that is applied to the Markov correlation function.
This is enabled by setting apply basic reconditioning: true in the yaml configuration.
There is also a more advanced reconditioning option available which is documented in
Obs Error Reconditioning; some tuning by the user is recommended before using this option
scientifically.
observations:
- obs space:
    name: Sondes (Gaussian correlation function w/ reconditioner)
    obsdatain:
      ... # input/output files and other options
      obsgrouping:
        group variables: [sequenceNumber]
        sort variable: pressure
        sort order: ascending
    simulated variables: [airTemperature, windEastward, windNorthward]
  obs error:
    covariance model: within group covariances
    distance function: haversine
    correlation function: gaussian
    correlation lengthscale: 5000. # in meters
    apply basic reconditioning: true # default is false
    # to apply more advanced reconditioning use config below
    # reconditioning:
    #   recondition method: Ridge Regression
    #   fraction: 0.9
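The Gaussian profile defined above can be sketched in plain Python as follows. The function names and the sample positions are hypothetical, the distances are taken along a line for simplicity rather than computed with the haversine function, and reconditioning is not shown.

```python
import math


def gaussian_correlation(distance, lengthscale):
    """exp(-(d/l)^2 / 2), with l acting as the standard deviation."""
    r = distance / lengthscale
    return math.exp(-0.5 * r * r)


def correlation_matrix(positions, lengthscale):
    """Pairwise Gaussian correlations for 1-D positions (same units as l)."""
    return [[gaussian_correlation(a - b, lengthscale) for b in positions]
            for a in positions]


# Illustrative positions in meters along a line, with l = 5000 m.
positions = [0.0, 2500.0, 5000.0, 20000.0]
corr = correlation_matrix(positions, 5000.0)
```

Unlike the Gaspari-Cohn and truncated Markov profiles, this profile never reaches exactly zero, so corr is a dense matrix with small but nonzero entries for distant pairs.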