Lognormal Distributions
rhjpires
When I look at the fitting parameters I get X0 and "Width", which I suppose are related to the mean and the standard deviation.
Can anyone please tell me what the relationship is between X0 and the mean, and between "Width" and the standard deviation of the data?
The "width" appears to be exceedingly small: my data range is 200 and I get a width of 0.8. How does this translate to the standard deviation of the data?
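What I've pieced together so far: if Igor's LogNormal fit function has the form y0 + A*exp(-(ln(x/x0)/width)^2), it is Gaussian in ln(x), and matching it term by term to the standard lognormal density gives sigma = width/sqrt(2) and mu = ln(x0) + width^2/2 for ln(data). A minimal Python sketch of that conversion (the x0 value here is made up for illustration; only the width matches my fit):

```python
import numpy as np

# Assuming Igor's LogNormal fit function y0 + A*exp(-(ln(x/x0)/width)^2),
# matching it to the lognormal density (completing the square in ln(x)) gives:
#   sigma = width/sqrt(2)          (sd of ln(data))
#   mu    = ln(x0) + width**2/2    (mean of ln(data))
# Illustrative numbers only; x0 is made up.
x0, width = 50.0, 0.8

sigma = width / np.sqrt(2)
mu = np.log(x0) + width**2 / 2

mean_data = np.exp(mu + sigma**2 / 2)
sd_data = mean_data * np.sqrt(np.exp(sigma**2) - 1)

# Sanity check against a large lognormal sample
rng = np.random.default_rng(0)
sample = rng.lognormal(mu, sigma, 1_000_000)
print(mean_data, sample.mean())
print(sd_data, sample.std())
```

If that's right, a width of 0.8 corresponds to sigma of about 0.57 in log units, i.e. a multiplicative spread, which is why it looks tiny next to a data range of 200.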
Cheers and thanks for the help!
A side note: I can see that there is an offset of the fitted curve relative to the histogram data, an unfortunate consequence of the way histogram data is represented in Igor. Check out these help topics:
DisplayHelpTopic "Guided Tour 3 - Histograms and Curve Fitting"
DisplayHelpTopic "Graphing Histogram Results"
DisplayHelpTopic "Curve Fitting to a Histogram"
To view the topics, copy a command, paste it into Igor's command line and press Enter.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
November 16, 2012 at 10:01 am - Permalink
Still, this does suggest that in their revisions to Igor, the good folks at WaveMetrics might want to add "bin-centered" options to bars and cityscape modes for IP7.
November 17, 2012 at 11:14 am - Permalink
I will check these later with care and will come back in case I run into anything major!
Thank you again!
Cheers,
R.
November 21, 2012 at 07:18 am - Permalink
The help file posted here describes the calculation of the lognormal peak area, B, as:
B = A*x0*width*sqrt(pi)*exp(width^2/4)
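As a quick numerical sanity check, integrating the peak shape A*exp(-(ln(x/x0)/width)^2) (with y0 omitted, and illustrative parameter values) does reproduce this closed form:

```python
import numpy as np

# Numerical check of B = A*x0*width*sqrt(pi)*exp(width^2/4) for the peak
# shape A*exp(-(ln(x/x0)/width)^2); parameter values are illustrative.
A, x0, width = 2.0, 5.0, 0.8

x = np.linspace(1e-6, 500.0, 2_000_000)
f = A * np.exp(-(np.log(x / x0) / width) ** 2)

# Trapezoid-rule integral over a range wide enough that the tails vanish
numeric = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
closed = A * x0 * width * np.sqrt(np.pi) * np.exp(width**2 / 4)
print(numeric, closed)
```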
Using simple error propagation, I believe it then follows that the uncertainty in B, sigmaB, is:
sigmaB = sqrt( (sigmaA/A)^2 + (sigmax0/x0)^2 + sigma(exp(width^2/4))^2 ) * B
...using
sigma(exp(width^2/4)) = sigma(width) / 2
Unless I've made an algebraic error, this is way off from what MPF2 reports. I dug a bit into the MPF2 code and found that LogNormalPeakParams() in Peakfunctions2.ipf is used to calculate the lognormal uncertainties. I pasted the code used for the area below. I believe cw = coefficient wave and sw = sigma wave (?). To be honest, I haven't 100% decoded what sw means, but I can already conclude that the approach here is quite different from my approach above.
So... has the MPF code changed to make the equation for B above invalid, or am I making a big mistake somewhere?
Thanks a lot for any help!
j
outWave[2][0] = cw[0]*cw[1]*cw[2]*sqrt(pi*exp(cw[1]*cw[1]/2))
outWave[2][1] = sw[0][0]^2*LogNormalDAreaDw0(cw)^2
outWave[2][1] += sw[1][1]^2*LogNormalDAreaDw1(cw)^2
outWave[2][1] += sw[2][2]^2*LogNormalDAreaDw2(cw)^2
outWave[2][1] += 2*sw[0][1]^2*LogNormalDAreaDw0(cw)*LogNormalDAreaDw1(cw)
outWave[2][1] += 2*sw[0][2]^2*LogNormalDAreaDw0(cw)*LogNormalDAreaDw2(cw)
outWave[2][1] += 2*sw[1][2]^2*LogNormalDAreaDw1(cw)*LogNormalDAreaDw2(cw)
outWave[2][1] = sqrt(outWave[2][1])
// helper functions
Function LogNormalDAreaDw0(w)
	Wave w
	return SqrtPi*w[1]*w[2]*sqrt(exp((w[1]^2)/2))
end

Function LogNormalDAreaDw1(w)
	Wave w
	return 0.5*SqrtPi*w[0]*w[2]*(2+w[1]^2)*sqrt(exp((w[1]^2)/2))
end

Function LogNormalDAreaDw2(w)
	Wave w
	return sqrt(exp((w[1]^2)/2))*SqrtPi*w[0]*w[1]
end

Function LogNormalDFWHMDw0(w)
	Wave w
	return exp(w[1]*SqrtLn2) - exp(-w[1]*SqrtLn2)
end

Function LogNormalDFWHMDw1(w)
	Wave w
	return SqrtLn2 * w[0] * (exp(-w[1]*SqrtLn2) + exp(w[1]*SqrtLn2))
end
December 15, 2012 at 03:30 am - Permalink
http://en.wikipedia.org/wiki/Cholesky_decomposition#Monte_Carlo_simulat…
You need the covariance matrix for this, which is M_Covar. The following code uses these parameters:
coefs - the fit parameters
M_covar - the covariance matrix
holdstring - which parameters were held, which were varied.
howmany - how many sample vectors you want. Make it a large number, say 1000.
Basically, the code makes a matrix filled with Gaussian noise, all with standard deviation 1. You then multiply it by the Cholesky decomposition of the covariance matrix and take the transpose. This gives a matrix filled with correlated noise, to which you add the fitted parameters. The result is a matrix of sample vectors. These sample vectors have the following properties:
1) The mean of the sample vectors is equal to the vector containing the fitted parameters.
2) The standard deviation of a given parameter in this matrix is equal to the uncertainty in the parameter (that's what the covariance matrix encodes).
3) In addition, the covariance between the parameters is the same as in the covariance matrix.
What you can then do is calculate the area under the lognormal distribution for each of the sample vectors, using your equation above. The mean and standard deviation of all these values give the mean area of the curve and the uncertainty in the area. This is quite a robust method of doing the calculation you require.
Function MonteCarloSample(coefs, M_covar, holdstring, howmany)	// declaration line added; the original post omitted it, and the name is mine
	wave coefs, m_covar
	string holdstring
	variable howmany

	variable varying = 0, ii, whichcol

	duplicate/free coefs, tempcoefs
	duplicate/free M_covar, tempM_covar

	// drop held parameters from the coefficient and covariance copies
	varying = strlen(holdstring)
	for(ii = dimsize(coefs, 0)-1 ; ii >= 0 ; ii-=1)
		if(str2num(holdstring[ii]))
			deletepoints/M=0 ii, 1, tempcoefs, tempM_covar
			deletepoints/M=1 ii, 1, tempM_covar
			varying -= 1
		endif
	endfor

	// create some Gaussian noise
	make/free/d/n=(varying, howmany) noises = gnoise(1., 2)

	// and create the correlated noise from the covariance matrix
	matrixop/free correlatedNoises = (chol(tempm_covar) x noises)^t

	make/n=(howmany, dimsize(coefs, 0))/d/o M_montecarlo
	Multithread M_montecarlo[][] = coefs[q]

	// now add the correlated noise back to the varying parameters
	whichcol = dimsize(correlatedNoises, 1) - 1
	for(ii = dimsize(coefs, 0)-1 ; ii >= 0 ; ii-=1)
		if(!str2num(holdstring[ii]))
			M_montecarlo[][ii] += correlatedNoises[p][whichCol]
			whichCol -= 1
		endif
	endfor
End
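In case it helps, the same idea sketched in Python (the function name, coefficient ordering, and all numbers below are illustrative stand-ins; in Igor the inputs would be W_coef and M_Covar):

```python
import numpy as np

# Cholesky/Monte Carlo sampling of correlated fit coefficients.
def sample_coefficients(coefs, covar, howmany, seed=0):
    """Draw `howmany` coefficient vectors whose mean is `coefs` and whose
    covariance matches `covar`."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(covar)                 # covar = L @ L.T
    noise = rng.standard_normal((len(coefs), howmany))
    return (coefs[:, None] + L @ noise).T         # shape (howmany, nparams)

# Propagate into the lognormal area B = A*x0*w*sqrt(pi)*exp(w^2/4)
coefs = np.array([2.0, 5.0, 0.8])                 # A, x0, width (illustrative)
covar = np.array([[1e-2, 1e-3, 0.0],
                  [1e-3, 4e-2, 0.0],
                  [0.0,  0.0,  4e-3]])
samples = sample_coefficients(coefs, covar, 100_000)
A, x0, w = samples[:, 0], samples[:, 1], samples[:, 2]
B = A * x0 * w * np.sqrt(np.pi) * np.exp(w**2 / 4)
print(B.mean(), B.std())                          # area and its uncertainty
```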
December 15, 2012 at 06:22 pm - Permalink
sigma_B^2 = sigmaA^2 * (dB/dA)^2
          + sigmax0^2 * (dB/dx0)^2
          + sigmaw^2 * (dB/dw)^2
          + COVARIANCETERMS

COVARIANCETERMS = 2*(dB/dA)*(dB/dx0)*cov_A_x0
                + 2*(dB/dA)*(dB/dw)*cov_A_w
                + 2*(dB/dx0)*(dB/dw)*cov_x0_w
where dB/dA is the partial derivative of B with respect to A, etc.
dB/dA = x0 * w * sqrt(Pi) * exp((w^2)/4) = B/A
dB/dx0 = A * w * sqrt(Pi) * exp((w^2)/4) = B/x0
dB/dw = sqrt(Pi) * A * x0 * exp((w^2)/4) * (1 + 0.5 * w^2) = B/w * (1 + 0.5*w^2)
Thus:
sigma_B^2 = sigmaA^2 * B^2/A^2 + sigmax0^2 * B^2/x0^2 + sigmaw^2 * (sqrt(Pi) * A * x0 * exp((w^2)/4) * (1 + 0.5 * w^2))^2 + COVARIANCETERMS
          = sigmaA^2 * B^2/A^2 + sigmax0^2 * B^2/x0^2 + sigmaw^2 * B^2/w^2 * (1 + 0.5 * w^2)^2 + COVARIANCETERMS
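Written as a quadratic form, the whole expression (covariance terms included) is just grad^T C grad with C the covariance matrix. A Python sketch, where the sigma and covariance values are illustrative rather than from a real M_Covar:

```python
import numpy as np

# Analytic error propagation for B = A*x0*w*sqrt(pi)*exp(w^2/4).
# All numbers below are illustrative stand-ins for fit results.
A, x0, w = 2.0, 5.0, 0.8
B = A * x0 * w * np.sqrt(np.pi) * np.exp(w**2 / 4)

grad = np.array([B / A,                        # dB/dA
                 B / x0,                       # dB/dx0
                 (B / w) * (1 + 0.5 * w**2)])  # dB/dw

covar = np.array([[1e-2, 5e-4, 1e-3],          # order: A, x0, w
                  [5e-4, 4e-2, 2e-3],
                  [1e-3, 2e-3, 4e-3]])

# sigma_B^2 = grad @ covar @ grad expands to exactly the sum above,
# including the three covariance cross terms.
sigma_B = np.sqrt(grad @ covar @ grad)
print(B, sigma_B)
```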
I would definitely plug in the covariance terms. You should have these in your covariance matrix.
You can see why it's easier to do a numerical calculation, as per the previous post.
A.
December 15, 2012 at 09:29 pm - Permalink
As at the moment I am doing a kind-of-exploratory calculation, I'd like to avoid the thoroughness of the Cholesky Monte Carlo approach for now. But your post points out that I am ignoring the covariance of variables when I propagate my uncertainty -- so that's likely why my approach was different from (and less accurate than) the MPF code.
December 16, 2012 at 04:27 am - Permalink
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
December 17, 2012 at 09:26 am - Permalink
As a side note, you can compute the Cholesky decomposition with the chol keyword in the MatrixOP operation.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
December 17, 2012 at 09:32 am - Permalink
I'm not sure if that's quite right, John. I would expect the results from the numerical method and from propagation of error to be the same, as they are both calculated directly from the covariance matrix.
Whilst the error propagation route works well, my fear is that a slight mathematical error would sink the ship. People also forget to include covariance, which is a big factor.
In comparison, the numerical way is pretty much guaranteed to work if the covariance matrix is "nice" (well conditioned, etc.). It also has the advantage of working for all problems and doesn't have to be amended.
It's not clear from your message if you actually understand the technique. What it boils down to is the synthesis of many trial fit waves. If you took the mean of those fit waves it would be the same as the best-fit coefficients. In addition, the standard deviation of a given fit parameter over the fit waves would be the same as the parameter uncertainty, and the covariance between the parameters is correct.
These trial waves are synthesized from the covariance matrix itself (and a whole lot of Gaussian noise); it's well worth your time to read the code more carefully.
Then what you do is work out the derived parameter from each of the trial waves, working out the mean and sd at the end.
A.
December 17, 2012 at 02:59 pm - Permalink
Well, see, that's the part where I think the methods differ. What you're telling me is that the method is a slick way to generate a bunch of coefficient vectors with statistics that reflect the estimated covariance matrix from the fit. But then you pass those coefficient vectors through whatever (possibly nonlinear) computation you are interested in (like computing area from width and height of a Voigt), so the output of the computation includes the full nonlinearity of the derived parameter. Standard propagation of errors applies a first derivative of the transformation and calls it good. It's a Taylor series truncated at one term.
So really your method is a hybrid: it accepts the covariance matrix from the fit as the basic coefficient error. For a nonlinear fit function that's just an approximation, sometimes not a very good one. But at least from that point on you use exact math, and you could find asymmetric error bars if you wanted to.
Also your method has the distinct advantage of working even on computations that don't have an easy derivative. I might steal it some day to apply to the things in MPF2 that use numerical techniques to get derived parameters.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
December 17, 2012 at 04:00 pm - Permalink