Fit a straight line to X-Y data with correlated errors

// Use an LSE method to find best fit line y = a + bx // For normally distributed and possibly correlated errors in x and y. // When correlation coefficients in Rwave are all 0, // YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, confidence=0.95, CB=1, studentize=1) // is the same as // CurveFit/ODR=2/X=1 line Ywave /X=Xwave /W=sigmaY /I=1 /XW=sigmaX /F={0.95, 5} // Writes to W_coef, W_sigma, V_chisq, V_q, V_MSWD, M_covar, and // W_ParamConfidenceInterval // Uses method of York (1969) Least squares fitting of a line with // correlated errors. Earth and Planetary Science Letters, 5: 320-324 // Maximum likelihood errors after York et al. (2004) // Unified equations for the slope, intercept, and standard errors of the // best straight line. American Journal of Physics, 72: 367-375. // optional parameters for YorkFit // errors defines the type of errors in Yerror and Xerror waves. Set // errors to 1 (default) for se, 2 for 2*se, 0: for reciprocal errors. // confidence supplies the confidence level for calculating confidence // interval for fit coefficients and confidence bands. // CB determines whether to append confidence bands to plot. CB = 1 // (default): plot confidence bands, CB = 2: calculate and plot Ludwig // (1980) style confidence bands. These are 'symmetrized', such that // confidence bands conicide with those for a regression of x against y. // When CB = 0, confidence bands are recalculated but not added to the // top graph. // Set studentize to 1 (default) to always use data scatter to calculate // studentized confidence interval. Otherwise, if p(chisq) > studentize // or studentize = 0, a priori errors are used to calculate confidence // interval. This option is provided as method for adjusting the // confidence bands for fits to underdispersed data. function YorkFit(Ywave, Yerror, Xwave, Xerror, Rwave, [errors, confidence, CB, studentize]) wave Ywave, Yerror, Xwave, Xerror, Rwave variable errors, confidence, CB, studentize errors = ParamIsDefault(errors) ? 1 : errors confidence = ParamIsDefault(confidence) ? 0.95 : confidence CB = ParamIsDefault(CB) ? 1 : CB studentize = ParamIsDefault(studentize) ? 1 : studentize // the fit coefficients variable a, b variable oldb, Xbar, Ybar, s2_a, s2_b Make /D/free/N=(numpnts(Xwave)) sigmaX, sigmaY, weightX, weightY, W, w_alpha, w_beta, w_U, w_V, w1, w2, Xcorr, Ycorr sigmaX = errors > 0 ? Xerror/errors : 1/Xerror sigmaY = errors > 0 ? Yerror/errors : 1/Yerror // first get an estimate of b CurveFit /Q/X=1 line Ywave /X=Xwave /W=sigmaY /D/I=1 wave wfit = $"fit_"+NameOfWave(Ywave) wave /D w_coef b=w_coef[1] weightX = 1/sigmaX^2 weightY = 1/sigmaY^2 w_alpha=sqrt(weightX*weightY) do oldb=b W=weightX[p]*weightY[p]/(weightX[p]+b^2*weightY[p]-2*b*Rwave*w_alpha) w1=W*Xwave Xbar=sum(w1)/sum(W) w1=W*Ywave Ybar=sum(w1)/sum(W) w_U=Xwave-Xbar w_V=Ywave-Ybar w_beta=W*(w_U/weightY + b*w_V/weightX - (b*w_U + w_V)*Rwave/w_alpha) w1=W*w_beta*w_V w2=W*w_beta*w_U b=sum(w1)/sum(w2) while(abs((b - oldb)/oldb) > 1e-14) Make /D/O w_coef a=Ybar-b*Xbar w_coef={a, b} wfit=a+b*x printf "%s = %g%+g*x\r", GetWavesDataFolder(wfit, 4), a, b // calculate chi-squared and MSWD Duplicate /free W w_chiSq w_chiSq=W*(Ywave-b*Xwave-a)^2 // use same global variables as Igor's CurveFit Execute /Q "variable /G V_chisq, V_q" NVAR chi2=V_chisq NVAR pChi=V_q chi2=sum(w_chiSq) variable DoF=numpnts(Xwave)-2 variable /G V_MSWD=chi2/DoF // calculate uncertainties in fit coefficients Xcorr=Xbar+w_beta w1=W*Xcorr xbar=sum(w1)/sum(W) // xbar is now calculated for corrected points, small-xbar in York et al 2004 w_U=Xcorr-xbar // this is small u in York et al 2004 w1=W*w_U^2 s2_b=1/sum(w1) s2_a=1/sum(W)+xbar^2*s2_b variable sigma_a=sqrt(s2_a) variable sigma_b=sqrt(S2_b) Make /D/O w_sigma={sigma_a,sigma_b} // these are LSE calculated for points adjusted to maximum liklihood positions, // and are equal to maximum likelihood errors // calculate p value for chi-squared pChi = 1 - StatsChiCDF(chi2, numpnts(Xwave)-2) if(studentize==0 || pChi > studentize) printf "Calculating %g%% confidence from a priori errors\r", 100*confidence endif // in the case of underdispersion, optionally use tValue for 'infinite' points DoF = studentize==0 || pChi > studentize ? 1e10 : DoF variable tValue = StatsInvStudentCDF(1-(1-confidence)/2, DoF) // calculate confidence interval for fit coefficients printf "Coefficient values ± %g%% Confidence Interval\r", 100*confidence printf "a = %g ± %g\r", a, tValue*w_sigma[0] printf "b = %g ± %g\r", b, tValue*w_sigma[1] Make /D/O W_ParamConfidenceInterval={tValue*w_sigma[0],tValue*w_sigma[1],confidence} // calculate covariance matrix variable cov = -xbar*s2_b // cov(a,b) Make /D/O/N=(2,2) M_Covar M_Covar = { {s2_a, cov}, {cov, s2_b} } Print w_coef Print w_sigma printf "M_covar = {{%g,%g},{%g,%g}}\r", s2_a, cov, cov, s2_b printf "χ2 = %g, MSWD = %g, p(χ2) = %g\r", chi2, V_MSWD, pChi // create confidence band waves Make /O/N=200 $"UC_"+NameOfWave(Ywave) /wave=UC, $"LC_"+NameOfWave(Ywave) /wave=LC SetScale /I x, leftx(wfit), pnt2x(wfit,numpnts(wfit)-1), UC, LC if(CB==2) // calculate confidence interval, following Ludwig (1980) // Calculation of uncertainties of U-Pb isotope data // Earth and Planetary Science Letters, 46: 212-220 // switch to using variable names consistent with Ludwig reference: // i = intercept, s = slope // xbar is for corrected points. do the same for ybar: Ycorr = a + xCorr * b w1=W*Ycorr ybar=sum(w1)/sum(W) variable deltaTheta = tValue*sigma_b*cos(atan(b))^2 variable sSym = (tan(atan(b)+deltaTheta) + tan(atan(b)-deltaTheta))/2 variable iSym = ybar - sSym*xbar variable deltaS = (tan(atan(b) + deltaTheta) - tan(atan(b)-deltaTheta) )/2 variable deltaI = tValue*sigma_a + (deltaS - tValue*sigma_b) * xbar UC = iSym + sSym*x + sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar)) LC = iSym + sSym*x - sqrt(deltaI^2 + deltaS^2*x*(x-2*xbar)) printf "Symmetric confidence bands: %s, %s\r", PossiblyQuoteName(NameOfWave(UC)), PossiblyQuoteName(NameOfWave(LC)) else // the 'normal' form for confidence interval, non-symmetrized. UC = a + b*x + sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar)) LC = a + b*x - sqrt(tValue^2*s2_a + tValue^2*s2_b*x*(x-2*xbar)) endif CheckDisplayed Ywave, wfit, UC, LC if(!(V_flag&0x01)) return 1 endif if(!(V_flag&0x02)) AppendToGraph wfit endif if(CB && !(V_flag&0x04)) AppendToGraph UC endif if(CB && !(V_flag&0x08)) AppendToGraph LC endif end

s.r.chinn

A few years ago I was working on line-fitting in three dimensions. I found the following references useful: (the first is a paper-back book)

S. Van Huffel and J. Vandewalle, “The total least squares problem,” Society for Industrial and Applied Mathematics (SIAM), (Philadelphia, 1991) ISBN 0-89871-275-0.

G. H. Golub and C. F. Van Loan, “An analysis of the total least squares problem,” SIAM J. Numer. Anal., 17, no. 6, December 1981, pp. 883-893.

S. Van Huffel and J. Vandewalle, “Analyses and properties of the generalized total least squares problem AX≈B when some or all columns in A are subject to error,” SIAM J. Matrix. Anal. Appl., 10, no. 3, July 1989, pp. 294-315.

T. J. Abatzoglou and J. M. Mendel, “Constrained total least squares,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’87), 12, 1987, pp. 1485-1488.

August 10, 2020 at 04:40 am - Permalink

tony

Thanks, Stephen, those are useful references.

August 11, 2020 at 09:42 am - Permalink

s.r.chinn

You're welcome, Tony. I would also suggest looking at IgorPro's Singular Value Decomposition operation

DisplayHelpTopic "MatrixSVD"

That, in conjunction with diagonalization of the covariance matrix, might be useful. SVD can be handy for doing line fits through 2- and 3-D point clouds. Finding confidence levels of the resulting coefficients is beyond my skill set.

August 11, 2020 at 10:53 am - Permalink

tony

For anyone following this thread, I have edited the code snippet to add some additional options for calculating confidence bands.

October 11, 2020 at 02:43 am - Permalink

Fit a straight line to X-Y data with correlated errors

Igor Pro 9

Igor XOP Toolkit

Igor NIDAQ Tools MX