Creating synthetic data sets by randomizing the function parameters

Hello, 

I have data in a waveform that I fitted with a 5-parameter function. Unfortunately, it is not possible to analytically deduce the maximum from the fit function. I was wondering if it's possible to generate, for example, 100 synthetic data sets by randomizing the fit parameters using gnoise and deduce the uncertainty numerically. Can anyone advise me on this matter? I must admit that I am not well versed in Igor programming. Any help would be greatly appreciated.

Please see below the fit function:

Function HEMG11(w,x) : FitFunc
    Wave w
    Variable x

    Variable temp1,temp2
    temp1=(1/2/w[1])* exp(((x-w[3])/w[1])+((w[0]^2)/(2*w[1]^2)))*erfc((1/sqrt(2))*(((x-w[3])/w[0])+(w[0]/w[1])))
    temp2=(1/2/w[2])* exp(((w[3]-x)/w[2])+((w[0]^2)/(2*w[2]^2)))*erfc((1/sqrt(2))*(((w[3]-x)/w[0])+(w[0]/w[2])))

    return (w[4])*((1-0.5)*(temp1)+(0.5)*(temp2))+w[5]
End

Greetings, 

Siva

Here is an experiment that plots the function with noise and then allows you to do a curve fit back to the noisy data.

Clarify what you mean by "... analytically deduce the maximum ...". You cannot take the analytical derivative to find the maximum? Or you are not sure how to find the maximum after the curve fit? Igor Pro cannot do the former (try Mathematica or Maple). Igor Pro can do the latter.

Clarify also what you mean by "... deduce the uncertainty numerically ...". Igor Pro returns the uncertainties on all six input parameters. What more do you want?

Test HEMG.pxp (15.63 KB)


Thank you very much for looking into my problem and creating the example file. 

The HEMG PDF has no closed-form mode, and the first-derivative test does not yield one either. I already tried it with Mathematica.

My current method is to find the maximum by running WaveStats on the fit curve and printing V_maxLoc.

My problem is how to assign an uncertainty to this value? 

I was thinking of generating "n" number of distributions (or waves) by randomizing the fit parameters 'n' times. 

For example, suppose the following are the results of a HEMG fit to experimental data:

ww_0    =3.1403 ± 0.503
ww_1    =2.81 ± 1.06
ww_2    =2.7278 ± 1.12
ww_3    =8.0757 ± 0.0674
ww_4    =1.974 ± 0.271
ww_5    =0.0011702 ± 0.00925

Then generate 'n' values of each parameter by randomizing them 'n' times (ww_0 + gnoise(0.503), ww_1 + gnoise(1.06), etc.). This would create 'n' synthetic distributions. Fit them again with HEMG and then use the standard deviation of the maximum (from V_maxLoc) as the uncertainty.

Hope I am not being completely clueless. Would this be the same as creating 'n' synthetic distributions by adding noise to the initial curve 'n' times?
Thank you again for looking into this.
PS: In the end, what I need to find is the uncertainty in the so-called R80, i.e. the x-value of the 80% of the maximum in the distal falloff.
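The scheme above can be sketched outside Igor as well. Below is a rough Python mock-up using the fit values quoted above. Note that perturbing each parameter independently with gnoise-style noise ignores the correlations between the parameters (which the covariance matrix from the fit, M_Covar in Igor, would capture), so treat the resulting spread as approximate:

```python
import numpy as np
from math import erfc, exp, sqrt

# HEMG11 from the fit: a 50/50 mixture of two exponentially modified
# Gaussians sharing the Gaussian width w[0] and center w[3].
def hemg(x, w):
    t1 = (0.5 / w[1]) * exp((x - w[3]) / w[1] + w[0]**2 / (2 * w[1]**2)) \
         * erfc(sqrt(0.5) * ((x - w[3]) / w[0] + w[0] / w[1]))
    t2 = (0.5 / w[2]) * exp((w[3] - x) / w[2] + w[0]**2 / (2 * w[2]**2)) \
         * erfc(sqrt(0.5) * ((w[3] - x) / w[0] + w[0] / w[2]))
    return w[4] * 0.5 * (t1 + t2) + w[5]

best  = np.array([3.1403, 2.81, 2.7278, 8.0757, 1.974, 0.0011702])  # fit values
sigma = np.array([0.503, 1.06, 1.12, 0.0674, 0.271, 0.00925])       # fit errors

rng = np.random.default_rng(0)
xs = np.linspace(0, 16, 2001)                # WaveStats-style grid
peaks = []
while len(peaks) < 200:
    trial = best + rng.normal(0, sigma)      # gnoise() analog, uncorrelated!
    if min(trial[0], trial[1], trial[2]) <= 0.5:
        continue                             # discard unphysical widths/decays
    ys = [hemg(x, trial) for x in xs]
    peaks.append(xs[int(np.argmax(ys))])     # V_maxLoc analog

print(f"peak location = {np.mean(peaks):.3f} +/- {np.std(peaks):.3f}")
```

Draws that produce unphysically small widths or decay constants are simply discarded here; in Igor the same loop would be a series of gnoise() calls followed by WaveStats.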


 

I guess your function is a mixture of two exponentially modified Gaussians? You may want to give a bit more detail on the origin of this function; someone might know something if we are aware of what we are dealing with. That said, I have no idea how to give a useful error even for a single EMG. The common solution is to just give up, or to use the errors of the position and skew as a rough estimate. For example, ww_3 = 8.0757 ± 0.0674 may tell you the right ballpark for the error of the position if the skew is not extreme.

But just a note on how to find the maximum in the first place. Using WaveStats is only as precise as the current fidelity of your fit output, which can be quite coarse. You might want to run Optimize to find the precise extremum, e.g.:

Optimize/A=1/L=(leftXval)/H=(rightXval) HEMG11, W_Coef
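For readers without Igor at hand, the same two-step idea (a coarse grid scan followed by refinement of the extremum) can be sketched in Python. The coefficient values are the ones quoted earlier in the thread, and a hand-rolled golden-section search stands in for Optimize:

```python
import numpy as np
from math import erfc, exp, sqrt

def hemg(x, w):   # the fit function from the original post
    t1 = (0.5 / w[1]) * exp((x - w[3]) / w[1] + w[0]**2 / (2 * w[1]**2)) \
         * erfc(sqrt(0.5) * ((x - w[3]) / w[0] + w[0] / w[1]))
    t2 = (0.5 / w[2]) * exp((w[3] - x) / w[2] + w[0]**2 / (2 * w[2]**2)) \
         * erfc(sqrt(0.5) * ((w[3] - x) / w[0] + w[0] / w[2]))
    return w[4] * 0.5 * (t1 + t2) + w[5]

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximum of f on [a, b]."""
    invphi = (sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - invphi * (b - a), a + invphi * (b - a)
        if f(c) > f(d):
            b = d        # maximum lies in [a, d]
        else:
            a = c        # maximum lies in [c, b]
    return (a + b) / 2

w = [3.1403, 2.81, 2.7278, 8.0757, 1.974, 0.0011702]   # coefficients from the thread
xs = np.linspace(0, 16, 401)                           # deliberately coarse grid
coarse = xs[int(np.argmax([hemg(x, w) for x in xs]))]  # WaveStats-level precision
fine = golden_max(lambda x: hemg(x, w), coarse - 0.1, coarse + 0.1)
print(coarse, fine)
```

The grid estimate is only good to one grid spacing (0.04 here), while the refined value is good to the search tolerance, which is the point of running Optimize after WaveStats.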

 

Thank you for the info. Very interesting. It seems that one of the EMGs may or may not dominate the position of the maximum. This probably makes a guess of the error via the position coefficient quite unreliable. Maybe someone has an idea on how to simulate the error here. I would be interested as well, for the sake of finding errors for a single EMG peak.

I wonder if it's possible (or even allowed) to use the confidence band to define the uncertainty of the R80.  I tried two methods. 

(1) Simply take the width of the confidence band at R80, and (2) use the maximum extent of the confidence band at the maximum and define the corresponding uncertainty at R80. I used WaveStats (V_maxLoc) and FindLevel to deduce the values. I compared against the standard deviation of R80 from 20 independent experimental data sets. It seems that the first method underestimates the uncertainty and the second grossly overestimates it.

Please see the attached explanation. Please let me know your thoughts on this.

 

Error_from_confidence_band.pdf (398.06 KB)

> I would be interested as well, for the sake of finding errors for a single EMG peak.

As near as I can see, the single EMG peak lends itself to an analytical solution for the position of its maximum. The combination causes problems.

HEMG Problem.pdf (70.56 KB)

For functions having no analytic derivatives, our Multipeak Fit package finds maxima/minima using Optimize and level info like the half-max in FWHM using FindRoots.

Getting uncertainties is much more difficult! The best methods may be bootstrap or jackknife methods (which are closely related) but difficult to apply correctly to curve fits. Our support for such methods is limited, consisting primarily of the StatsResample operation.
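To illustrate the residual-bootstrap idea outside Igor, here is a hedged Python sketch with a simple parabola standing in for the HEMG model (a full curve-fit bootstrap is more involved): resample the fit residuals, refit, and take the spread of the recomputed peak position as its uncertainty.

```python
import numpy as np

# Toy model: a parabola stands in for HEMG; the "derived quantity" is the
# peak position, which depends nonlinearly on the fitted coefficients.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 51)
y = -(x - 6.0)**2 + 5.0 + rng.normal(0, 0.5, x.size)   # synthetic "measurement"

c = np.polyfit(x, y, 2)                    # initial fit
fit = np.polyval(c, x)
resid = y - fit
peak_hat = -c[1] / (2 * c[0])              # peak position from the fit

boot = []
for _ in range(500):
    yb = fit + rng.choice(resid, resid.size, replace=True)  # resample residuals
    cb = np.polyfit(x, yb, 2)                               # refit
    boot.append(-cb[1] / (2 * cb[0]))                       # recompute peak

print(f"peak = {peak_hat:.3f} +/- {np.std(boot):.3f}")
```

The same recipe applies to the HEMG fit by replacing polyfit with the nonlinear fit and the parabola vertex with the numerically located maximum.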

Jeffrey, are you sure you have a solution for the position of the EMG? It seems to me as if the software you used to derive the result did not treat erfc() as the complementary error function but rather a constant, right? I would be happy to hear that you can find a solution to this problem. Maybe I am overlooking something here.

I treated it as a constant because it has no dependence on x. I will be embarrassed if I missed the lesson in my 10th grade calculus class that warned me about this obvious type of mistaken assumption.

I am not sure I understand (sorry if I missed a joke here). The error function takes an expression including x as its argument and needs to be included in the derivative. The derivative of erfc(x) alone is -2/sqrt(pi)*exp(-x^2); see, e.g., https://mathworld.wolfram.com/Erfc.html . So in my calculations (by hand) I get a rather complex expression which is impossible (for me) to solve for x when set to zero. I never tried to run a solver over the whole thing, so again I might be overlooking something.

I was reminiscing on my 10th grade calculus lessons.

I see that I mis-parsed the expression given by the OP.

temp1=(1/2/w[1])* exp(((x-w[3])/w[1])+((w[0]^2)/(2*w[1]^2)))*erfc((1/sqrt(2))*(((x-w[3])/w[0])+(w[0]/w[1])))

I read the expression as erfc{ [ 1/sqrt(2) ] }. I see, however, that you are right ... the erfc has a dependence on x.

Ugh!

@chozo

Function dExpModGauss(w,x) : FitFunc            // ST: 210626 - this function is exposed to the user to be able to fit with curve fit dialogue
    Wave w
    Variable x
    //CurveFitDialog/ Equation:
    //CurveFitDialog/ f(x) = A*w/abs(tau)*sqrt(2*pi)/2 * exp((x0-x)/tau + 0.5*(w/abs(tau))^2) * erfc(sqrt(1/2)*(sign(tau)*(x0-x)/w + w/abs(tau))) + y0
    //CurveFitDialog/ End of Equation
    //CurveFitDialog/ Independent Variables 1
    //CurveFitDialog/ x
    //CurveFitDialog/ Coefficients 5
    //CurveFitDialog/ w[0] = A
    //CurveFitDialog/ w[1] = x0
    //CurveFitDialog/ w[2] = y0
    //CurveFitDialog/ w[3] = w
    //CurveFitDialog/ w[4] = tau
//  w[0]*w[3]/abs(w[4])*sqrt(2*pi)/2 * exp((w[1]-x)/w[4] + 0.5*(w[3]/abs(w[4]))^2) * erfc(sqrt(1/2)*(sign(w[4])*(w[1]-x)/w[3] + w[3]/abs(w[4]))) + w[2]

    variable A, B, C, D, E
    A = w[0]*w[3]/abs(w[4])*sqrt(2*pi)/2
   
    B = -1/w[4]
    C = w[1]/w[4] + 0.5*(w[3]/abs(w[4]))^2
   
    D = sqrt(1/2)*sign(w[4])*(-1)/w[3]
    E = sqrt(1/2)*(sign(w[4])*w[1]/w[3] + w[3]/abs(w[4]))
   
//  EMG = A * exp(Bx+C) * erfc(Dx+E)
   
    // f(x) = exp(B*x+C), g(x) = erfc(D*x+E)
    // f'(x) = B*exp(B*x+C), g'(x) = -2*D/sqrt(pi) * exp(-(D*x+E)^2)
       
    variable f = exp(B*x + C)
    variable dfdx = B * exp(B*x + C)
    variable g = erfc(D*x + E)
    variable dgdx = -2*D/sqrt(pi) * exp(-(D*x+E)^2)

    return A * (f * dgdx + g * dfdx)
End

Still not something you can solve analytically, so no advantage over Optimize.
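As a sanity check on the derivative above, here is a Python port compared against a central finite difference at an arbitrary point (the coefficient values below are made up for the test):

```python
from math import erfc, exp, sqrt, pi, copysign

def emg(x, w):
    """Single EMG as in dExpModGauss; w = [A, x0, y0, width, tau]."""
    A, x0, y0, width, tau = w
    return (A * width / abs(tau) * sqrt(2 * pi) / 2
            * exp((x0 - x) / tau + 0.5 * (width / abs(tau))**2)
            * erfc(sqrt(0.5) * (copysign(1, tau) * (x0 - x) / width
                                + width / abs(tau)))
            + y0)

def demg(x, w):
    """Analytic derivative, following the A, B, C, D, E decomposition above."""
    A0, x0, y0, width, tau = w
    s = copysign(1, tau)
    A = A0 * width / abs(tau) * sqrt(2 * pi) / 2
    B = -1 / tau
    C = x0 / tau + 0.5 * (width / abs(tau))**2
    D = -sqrt(0.5) * s / width
    E = sqrt(0.5) * (s * x0 / width + width / abs(tau))
    f, dfdx = exp(B * x + C), B * exp(B * x + C)
    g, dgdx = erfc(D * x + E), -2 * D / sqrt(pi) * exp(-(D * x + E)**2)
    return A * (f * dgdx + g * dfdx)

w = [1.0, 2.0, 0.0, 0.8, 1.5]   # hypothetical A, x0, y0, w, tau
x, h = 2.5, 1e-6
numeric = (emg(x + h, w) - emg(x - h, w)) / (2 * h)
print(demg(x, w), numeric)
```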

Tony, thanks for summarizing the derivative (and it seems you have found my code snippet in the Global Fit package ;). I have looked at this problem quite a few times over the last year. I guess a bunch of people who are much better at mathematics should have looked at this by now, but all searching for papers etc. turned up empty. Anyway, giving it another try for the heck of it: if I get this right, setting dEMG = 0 gives:

2*D/(B*sqrt(pi)) = erfc(D*x+E)*exp((D*x+E)^2)

There is a bound listed at https://mathworld.wolfram.com/Erfc.html which may serve as a not-too-bad approximation of erfc():

erfc(x) ~ 2/sqrt(pi)* exp(-(x)^2) / (x+sqrt(x^2+a))

with a between 4/pi and 2. If I get this right (it is already very late here), then the solution might be:

x0 = (B^2/D^2 - a)/(2*B) - E/D

Still, the range for a is not too small. I may check tomorrow whether this makes any sense and how far off it is. But then, I have thought quite a few times before that things were close ...
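For what it's worth, a quick numerical check of the closed form (in Python, with hypothetical single-EMG coefficients chosen so that the root falls at a positive erfc argument, where the MathWorld bound actually applies):

```python
from math import erfc, exp, sqrt, pi

def h(u):                       # the right-hand side erfc(u)*exp(u^2)
    return erfc(u) * exp(u * u)

# hypothetical single-EMG coefficients (tau, width, center) with tau/w small
# enough that 2*D/(B*sqrt(pi)) < 1, i.e. the root sits at u > 0
tau, w_, x0_ = 0.8, 1.0, 2.0
B = -1 / tau
D = -sqrt(0.5) / w_
E = sqrt(0.5) * (x0_ / w_ + w_ / tau)
rhs = 2 * D / (B * sqrt(pi))

# h(u) is monotone decreasing with h(0) = 1 > rhs > h(2), so bisect
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if h(mid) > rhs:
        lo = mid
    else:
        hi = mid
u_exact = (lo + hi) / 2
x_exact = (u_exact - E) / D     # exact maximum position

# the closed form x0 = (B^2/D^2 - a)/(2*B) - E/D at the two limiting a values
approx = [((B / D)**2 - a) / (2 * B) - E / D for a in (4 / pi, 2.0)]
print(x_exact, approx)
```

In this regime the exact maximum lands between the two approximate values, so a in [4/pi, 2] does bracket it. For parameter sets where the root falls at a negative erfc argument, the MathWorld bound no longer holds and the closed form can be noticeably off.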