Fit a straight line to X-Y data with correlated errors
tony
Calculate the best fit line, in a least-squares sense, taking into account correlated errors in x and y.
Output includes the reduced chi-square statistic, covariance matrix for fit coefficients, and confidence bands.
Please let me know if you find any errors. A more general solution for fitting with correlated errors would be welcome (e.g., Moniot 2009, doi:10.1016/j.apnum.2008.01.001), but that's beyond my maths skills.
// Use an LSE method to find best fit line y = a + bx
// For normally distributed and possibly correlated errors in x and y.
// When correlation coefficients in Rwave are all 0,
// YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, confidence=0.95, CB=1, studentize=1)
// is the same as
// CurveFit/ODR=2/X=1 line Ywave /X=Xwave /W=sigmaY /I=1 /XW=sigmaX /F={0.95, 5}
// Writes to W_coef, W_sigma, V_chisq, V_q, V_MSWD, M_covar, and
// W_ParamConfidenceInterval
// Uses method of York (1969) Least squares fitting of a line with
// correlated errors. Earth and Planetary Science Letters, 5: 320-324
// Maximum likelihood errors after York et al. (2004)
// Unified equations for the slope, intercept, and standard errors of the
// best straight line. American Journal of Physics, 72: 367-375.
// optional parameters for YorkFit
// errors defines the type of errors in Yerror and Xerror waves. Set
// errors to 1 (default) for se, 2 for 2*se, 0: for reciprocal errors.
// confidence supplies the confidence level for calculating confidence
// interval for fit coefficients and confidence bands.
// CB determines whether to append confidence bands to plot. CB = 1
// (default): plot confidence bands, CB = 2: calculate and plot Ludwig
// (1980) style confidence bands. These are 'symmetrized', such that
// confidence bands conicide with those for a regression of x against y.
// When CB = 0, confidence bands are recalculated but not added to the
// top graph.
// Set studentize to 1 (default) to always use data scatter to calculate
// studentized confidence interval. Otherwise, if p(chisq) > studentize
// or studentize = 0, a priori errors are used to calculate confidence
// interval. This option is provided as method for adjusting the
// confidence bands for fits to underdispersed data.
function YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, [errors, confidence, CB, studentize])
wave Ywave, Yerror, Xwave, Xerror, Rwave
variable errors, confidence, CB, studentize
errors = ParamIsDefault(errors) ? 1 : errors
confidence = ParamIsDefault(confidence) ? 0.95 : confidence
CB = ParamIsDefault(CB) ? 1 : CB
studentize = ParamIsDefault(studentize) ? 1 : studentize
// the fit coefficients
variable a, b
variable oldb, Xbar, Ybar, s2_a, s2_b
Make /D/free/N=(numpnts(Xwave)) sigmaX, sigmaY, weightX, weightY, W, w_alpha, w_beta, w_U, w_V, w1, w2, Xcorr, Ycorr
sigmaX = errors > 0 ? Xerror/errors : 1/Xerror
sigmaY = errors > 0 ? Yerror/errors : 1/Yerror
// first get an estimate of b
CurveFit /Q/X=1 line Ywave /X=Xwave /W=sigmaY /D/I=1
wave wfit = $"fit_"+NameOfWave(Ywave)
wave /D w_coef
b=w_coef[1]
weightX = 1/sigmaX^2
weightY = 1/sigmaY^2
w_alpha=sqrt(weightX*weightY)
do
oldb=b
W=weightX[p]*weightY[p]/(weightX[p]+b^2*weightY[p]-2*b*Rwave*w_alpha)
w1=W*Xwave
Xbar=sum(w1)/sum(W)
w1=W*Ywave
Ybar=sum(w1)/sum(W)
w_U=Xwave-Xbar
w_V=Ywave-Ybar
w_beta=W*(w_U/weightY + b*w_V/weightX - (b*w_U + w_V)*Rwave/w_alpha)
w1=W*w_beta*w_V
w2=W*w_beta*w_U
b=sum(w1)/sum(w2)
while(abs((b - oldb)/oldb) > 1e-14)
Make /D/O w_coef
a=Ybar-b*Xbar
w_coef={a, b}
wfit=a+b*x
printf "%s = %g%+g*x\r", GetWavesDataFolder(wfit, 4), a, b
// calculate chi-squared and MSWD
Duplicate /free W w_chiSq
w_chiSq=W*(Ywave-b*Xwave-a)^2
// use same global variables as Igor's CurveFit
Execute /Q "variable /G V_chisq, V_q"
NVAR chi2=V_chisq
NVAR pChi=V_q
chi2=sum(w_chiSq)
variable DoF=numpnts(Xwave)-2
variable /G V_MSWD=chi2/DoF
// calculate uncertainties in fit coefficients
Xcorr=Xbar+w_beta
w1=W*Xcorr
xbar=sum(w1)/sum(W) // xbar is now calculated for corrected points, small-xbar in York et al 2004
w_U=Xcorr-xbar // this is small u in York et al 2004
w1=W*w_U^2
s2_b=1/sum(w1)
s2_a=1/sum(W)+xbar^2*s2_b
variable sigma_a=sqrt(s2_a)
variable sigma_b=sqrt(S2_b)
Make /D/O w_sigma={sigma_a,sigma_b}
// these are LSE calculated for points adjusted to maximum liklihood positions,
// and are equal to maximum likelihood errors
// calculate p value for chi-squared
pChi = 1 - StatsChiCDF(chi2, numpnts(Xwave)-2)
if(studentize==0 || pChi > studentize)
printf "Calculating %g%% confidence from a priori errors\r", 100*confidence
endif
// in the case of underdispersion, optionally use tValue for 'infinite' points
DoF = studentize==0 || pChi > studentize ? 1e10 : DoF
variable tValue = StatsInvStudentCDF(1-(1-confidence)/2, DoF)
// calculate confidence interval for fit coefficients
printf "Coefficient values ± %g%% Confidence Interval\r", 100*confidence
printf "a = %g ± %g\r", a, tValue*w_sigma[0]
printf "b = %g ± %g\r", b, tValue*w_sigma[1]
Make /D/O W_ParamConfidenceInterval={tValue*w_sigma[0],tValue*w_sigma[1],confidence}
// calculate covariance matrix
variable cov = -xbar*s2_b // cov(a,b)
Make /D/O/N=(2,2) M_Covar
M_Covar = { {s2_a, cov}, {cov, s2_b} }
Print w_coef
Print w_sigma
printf "M_covar = {{%g,%g},{%g,%g}}\r", s2_a, cov, cov, s2_b
printf "χ2 = %g, MSWD = %g, p(χ2) = %g\r", chi2, V_MSWD, pChi
// create confidence band waves
Make /O/N=200 $"UC_"+NameOfWave(Ywave) /wave=UC, $"LC_"+NameOfWave(Ywave) /wave=LC
SetScale /I x, leftx(wfit), pnt2x(wfit,numpnts(wfit)-1), UC, LC
if(CB==2)
// calculate confidence interval, following Ludwig (1980)
// Calculation of uncertainties of U-Pb isotope data
// Earth and Planetary Science Letters, 46: 212-220
// switch to using variable names consistent with Ludwig reference:
// i = intercept, s = slope
// xbar is for corrected points. do the same for ybar:
Ycorr = a + xCorr * b
w1=W*Ycorr
ybar=sum(w1)/sum(W)
variable deltaTheta = tValue*sigma_b*cos(atan(b))^2
variable sSym = (tan(atan(b)+deltaTheta) + tan(atan(b)-deltaTheta))/2
variable iSym = ybar - sSym*xbar
variable deltaS = (tan(atan(b) + deltaTheta) - tan(atan(b)-deltaTheta) )/2
variable deltaI = tValue*sigma_a + (deltaS - tValue*sigma_b) * xbar
UC = iSym + sSym*x + sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
LC = iSym + sSym*x - sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
printf "Symmetric confidence bands: %s, %s\r", PossiblyQuoteName(NameOfWave(UC)), PossiblyQuoteName(NameOfWave(LC))
else
// the 'normal' form for confidence interval, non-symmetrized.
UC = a + b*x + sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
LC = a + b*x - sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
endif
CheckDisplayed Ywave, wfit, UC, LC
if(!(V_flag&0x01))
return 1
endif
if(!(V_flag&0x02))
AppendToGraph wfit
endif
if(CB && !(V_flag&0x04))
AppendToGraph UC
endif
if(CB && !(V_flag&0x08))
AppendToGraph LC
endif
end
// For normally distributed and possibly correlated errors in x and y.
// When correlation coefficients in Rwave are all 0,
// YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, confidence=0.95, CB=1, studentize=1)
// is the same as
// CurveFit/ODR=2/X=1 line Ywave /X=Xwave /W=sigmaY /I=1 /XW=sigmaX /F={0.95, 5}
// Writes to W_coef, W_sigma, V_chisq, V_q, V_MSWD, M_covar, and
// W_ParamConfidenceInterval
// Uses method of York (1969) Least squares fitting of a line with
// correlated errors. Earth and Planetary Science Letters, 5: 320-324
// Maximum likelihood errors after York et al. (2004)
// Unified equations for the slope, intercept, and standard errors of the
// best straight line. American Journal of Physics, 72: 367-375.
// optional parameters for YorkFit
// errors defines the type of errors in Yerror and Xerror waves. Set
// errors to 1 (default) for se, 2 for 2*se, 0: for reciprocal errors.
// confidence supplies the confidence level for calculating confidence
// interval for fit coefficients and confidence bands.
// CB determines whether to append confidence bands to plot. CB = 1
// (default): plot confidence bands, CB = 2: calculate and plot Ludwig
// (1980) style confidence bands. These are 'symmetrized', such that
// confidence bands conicide with those for a regression of x against y.
// When CB = 0, confidence bands are recalculated but not added to the
// top graph.
// Set studentize to 1 (default) to always use data scatter to calculate
// studentized confidence interval. Otherwise, if p(chisq) > studentize
// or studentize = 0, a priori errors are used to calculate confidence
// interval. This option is provided as method for adjusting the
// confidence bands for fits to underdispersed data.
function YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, [errors, confidence, CB, studentize])
wave Ywave, Yerror, Xwave, Xerror, Rwave
variable errors, confidence, CB, studentize
errors = ParamIsDefault(errors) ? 1 : errors
confidence = ParamIsDefault(confidence) ? 0.95 : confidence
CB = ParamIsDefault(CB) ? 1 : CB
studentize = ParamIsDefault(studentize) ? 1 : studentize
// the fit coefficients
variable a, b
variable oldb, Xbar, Ybar, s2_a, s2_b
Make /D/free/N=(numpnts(Xwave)) sigmaX, sigmaY, weightX, weightY, W, w_alpha, w_beta, w_U, w_V, w1, w2, Xcorr, Ycorr
sigmaX = errors > 0 ? Xerror/errors : 1/Xerror
sigmaY = errors > 0 ? Yerror/errors : 1/Yerror
// first get an estimate of b
CurveFit /Q/X=1 line Ywave /X=Xwave /W=sigmaY /D/I=1
wave wfit = $"fit_"+NameOfWave(Ywave)
wave /D w_coef
b=w_coef[1]
weightX = 1/sigmaX^2
weightY = 1/sigmaY^2
w_alpha=sqrt(weightX*weightY)
do
oldb=b
W=weightX[p]*weightY[p]/(weightX[p]+b^2*weightY[p]-2*b*Rwave*w_alpha)
w1=W*Xwave
Xbar=sum(w1)/sum(W)
w1=W*Ywave
Ybar=sum(w1)/sum(W)
w_U=Xwave-Xbar
w_V=Ywave-Ybar
w_beta=W*(w_U/weightY + b*w_V/weightX - (b*w_U + w_V)*Rwave/w_alpha)
w1=W*w_beta*w_V
w2=W*w_beta*w_U
b=sum(w1)/sum(w2)
while(abs((b - oldb)/oldb) > 1e-14)
Make /D/O w_coef
a=Ybar-b*Xbar
w_coef={a, b}
wfit=a+b*x
printf "%s = %g%+g*x\r", GetWavesDataFolder(wfit, 4), a, b
// calculate chi-squared and MSWD
Duplicate /free W w_chiSq
w_chiSq=W*(Ywave-b*Xwave-a)^2
// use same global variables as Igor's CurveFit
Execute /Q "variable /G V_chisq, V_q"
NVAR chi2=V_chisq
NVAR pChi=V_q
chi2=sum(w_chiSq)
variable DoF=numpnts(Xwave)-2
variable /G V_MSWD=chi2/DoF
// calculate uncertainties in fit coefficients
Xcorr=Xbar+w_beta
w1=W*Xcorr
xbar=sum(w1)/sum(W) // xbar is now calculated for corrected points, small-xbar in York et al 2004
w_U=Xcorr-xbar // this is small u in York et al 2004
w1=W*w_U^2
s2_b=1/sum(w1)
s2_a=1/sum(W)+xbar^2*s2_b
variable sigma_a=sqrt(s2_a)
variable sigma_b=sqrt(S2_b)
Make /D/O w_sigma={sigma_a,sigma_b}
// these are LSE calculated for points adjusted to maximum liklihood positions,
// and are equal to maximum likelihood errors
// calculate p value for chi-squared
pChi = 1 - StatsChiCDF(chi2, numpnts(Xwave)-2)
if(studentize==0 || pChi > studentize)
printf "Calculating %g%% confidence from a priori errors\r", 100*confidence
endif
// in the case of underdispersion, optionally use tValue for 'infinite' points
DoF = studentize==0 || pChi > studentize ? 1e10 : DoF
variable tValue = StatsInvStudentCDF(1-(1-confidence)/2, DoF)
// calculate confidence interval for fit coefficients
printf "Coefficient values ± %g%% Confidence Interval\r", 100*confidence
printf "a = %g ± %g\r", a, tValue*w_sigma[0]
printf "b = %g ± %g\r", b, tValue*w_sigma[1]
Make /D/O W_ParamConfidenceInterval={tValue*w_sigma[0],tValue*w_sigma[1],confidence}
// calculate covariance matrix
variable cov = -xbar*s2_b // cov(a,b)
Make /D/O/N=(2,2) M_Covar
M_Covar = { {s2_a, cov}, {cov, s2_b} }
Print w_coef
Print w_sigma
printf "M_covar = {{%g,%g},{%g,%g}}\r", s2_a, cov, cov, s2_b
printf "χ2 = %g, MSWD = %g, p(χ2) = %g\r", chi2, V_MSWD, pChi
// create confidence band waves
Make /O/N=200 $"UC_"+NameOfWave(Ywave) /wave=UC, $"LC_"+NameOfWave(Ywave) /wave=LC
SetScale /I x, leftx(wfit), pnt2x(wfit,numpnts(wfit)-1), UC, LC
if(CB==2)
// calculate confidence interval, following Ludwig (1980)
// Calculation of uncertainties of U-Pb isotope data
// Earth and Planetary Science Letters, 46: 212-220
// switch to using variable names consistent with Ludwig reference:
// i = intercept, s = slope
// xbar is for corrected points. do the same for ybar:
Ycorr = a + xCorr * b
w1=W*Ycorr
ybar=sum(w1)/sum(W)
variable deltaTheta = tValue*sigma_b*cos(atan(b))^2
variable sSym = (tan(atan(b)+deltaTheta) + tan(atan(b)-deltaTheta))/2
variable iSym = ybar - sSym*xbar
variable deltaS = (tan(atan(b) + deltaTheta) - tan(atan(b)-deltaTheta) )/2
variable deltaI = tValue*sigma_a + (deltaS - tValue*sigma_b) * xbar
UC = iSym + sSym*x + sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
LC = iSym + sSym*x - sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar))
printf "Symmetric confidence bands: %s, %s\r", PossiblyQuoteName(NameOfWave(UC)), PossiblyQuoteName(NameOfWave(LC))
else
// the 'normal' form for confidence interval, non-symmetrized.
UC = a + b*x + sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
LC = a + b*x - sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar))
endif
CheckDisplayed Ywave, wfit, UC, LC
if(!(V_flag&0x01))
return 1
endif
if(!(V_flag&0x02))
AppendToGraph wfit
endif
if(CB && !(V_flag&0x04))
AppendToGraph UC
endif
if(CB && !(V_flag&0x08))
AppendToGraph LC
endif
end
Forum
Support
Gallery
Igor Pro 9
Learn More
Igor XOP Toolkit
Learn More
Igor NIDAQ Tools MX
Learn More
A few years ago I was working on line-fitting in three dimensions. I found the following references useful: (the first is a paper-back book)
S. Van Huffel and J. Vandewalle, “The total least squares problem,” Society for Industrial and Applied Mathematics (SIAM), (Philadelphia, 1991) ISBN 0-89871-275-0.
G. H. Golub and C. F. Van Loan, “An analysis of the total least squares problem,” SIAM J. Numer. Anal., 17, no. 6, December 1981, pp. 883-893.
S. Van Huffel and J. Vandewalle, “Analyses and properties of the generalized total least squares problem AX≈B when some or all columns in A are subject to error,” SIAM J. Matrix. Anal. Appl., 10, no. 3, July 1989, pp. 294-315.
T. J. Abatzoglou and J. M. Mendel, “Constrained total least squares,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’87), 12, 1987, pp. 1485-1488.
August 10, 2020 at 04:40 am - Permalink
Thanks, Stephen, those are useful references.
August 11, 2020 at 09:42 am - Permalink
You're welcome, Tony. I would also suggest looking at IgorPro's Singular Value Decomposition operation
DisplayHelpTopic "MatrixSVD"
That, in conjunction with diagonalization of the covariance matrix, might be useful. SVD can be handy for doing line fits through 2- and 3-D point clouds. Finding confidence levels of the resulting coefficients is beyond my skill set.
August 11, 2020 at 10:53 am - Permalink
For anyone following this thread, I have edited the code snippet to add some additional options for calculating confidence bands.
October 11, 2020 at 02:43 am - Permalink