lognormal fitting
pjfd
The problem: I have an XY data set that looks like a nice lognormal distribution. However, when I fit it, I get a ridiculously high chi-square value, which does not mean anything because I do not have a weighting wave, nor will I have one.
The question (a general one about fitting): how can I evaluate goodness of fit without a weighting wave? Is there something like a Kolmogorov test that I can do (or anything else)? How? The XY data set comes from two different waves, so it cannot be compared directly to the fitted function, which is just a single wave. If I try to interpolate the XY set to a single wave, I get meaningless waves. What am I doing wrong?
cheers
P
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

In the absence of a weighting wave, the value of chi-square reported by Igor is simply the sum of the squared residuals. So if you have lots of points, or if the residuals are large, you will get a large chi-square.

As for evaluating goodness of fit: according to Numerical Recipes, if you don't have a weighting wave (which gives the expected distribution of residuals), you can't really assess goodness of fit, because you can't tell whether the residuals are unexpectedly large (which would happen if you fit a model that doesn't represent the underlying data). Igor reports the estimated errors for the fit coefficients; is that useful to you? You can also compute a reduced chi-square, V_chisq/(n-m), where n is the number of points in the data set and m is the number of fit coefficients. That will at least give you an idea of the size of the residuals.

To compare the XY data with the fit: you can get a wave with a model value for each point in the input data. In the Curve Fit dialog, select the Output Options tab, choose _New Wave_ in the Destination menu, and fill in a name for the wave. The generated command will put a model value into each destination point, computed at the X value for that point.

As for what you are doing wrong with the interpolation: that's hard to say without knowing more about your data. But if the X values are not sorted, interpolation will give pretty weird results. Also, if you have NaNs (Not a Number, or a blank cell) in your data, you will get local patches of NaNs within the interpolated data. I believe the cubic spline interpolator (Interpolate... at the bottom of the Analysis menu) will fail completely with NaNs under some circumstances.
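For illustration, here is a minimal Python sketch (not Igor code) of the quantities described above. It assumes that Igor's built-in LogNormal function has the four-coefficient form y0 + A*exp(-(ln(x/x0)/width)^2); check that against the Igor documentation. The helper names, coefficient values and data are made up. It evaluates the model at each X value (what the _New Wave_ destination gives you), takes the residuals, sums their squares (the unweighted chi-square) and divides by n-m.

```python
import math

def lognormal(x, y0, A, x0, width):
    # ASSUMED form of Igor's built-in LogNormal fit function (four coefficients)
    return y0 + A * math.exp(-(math.log(x / x0) / width) ** 2)

def fit_statistics(xdata, ydata, coefs):
    """Hypothetical helper: model values, residuals, chi-square, reduced chi-square."""
    # One model value per data X, like a _New Wave_ fit destination
    model = [lognormal(x, *coefs) for x in xdata]
    # Residual = data minus model at the same X
    residuals = [y - f for y, f in zip(ydata, model)]
    # No weighting wave: chi-square is just the sum of squared residuals
    chisq = sum(r * r for r in residuals)
    n, m = len(ydata), len(coefs)
    reduced = chisq / (n - m)  # V_chisq/(n-m)
    return model, residuals, chisq, reduced

if __name__ == "__main__":
    coefs = (0.0, 10.0, 2.0, 0.5)  # made-up y0, A, x0, width
    xdata = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0]
    noise = [0.1, -0.2, 0.05, 0.0, 0.1, -0.1, 0.2, -0.05]  # made-up scatter
    ydata = [lognormal(x, *coefs) + e for x, e in zip(xdata, noise)]
    _, _, chisq, reduced = fit_statistics(xdata, ydata, coefs)
    print(f"chi-square = {chisq:.4f}, reduced chi-square = {reduced:.4f}")
```

Without weights, the reduced value is essentially the mean squared residual, in the units of Y squared, so it tells you the typical size of the residuals rather than whether they are "too large".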
August 23, 2010 at 03:54 pm
The Interpolate2 operation (Analysis->Interpolate) automatically sorts input data and removes NaNs before doing the interpolation.
The failure with NaNs happens only in the "X Coords From Dest Wave" mode, when the destination is an XY pair and the X destination wave contains NaNs. This is very rare. The next release of Interpolate2 will tolerate the NaNs in this situation.
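The two preprocessing steps mentioned here (remove NaNs, sort by X) can be sketched in Python as below, followed by simple linear interpolation. This is an illustration of the idea only, not Interpolate2's implementation and not a cubic spline; the helper names are hypothetical, and clamping to the end values outside the data range is a choice made for this sketch.

```python
import bisect
import math

def clean_and_sort(xs, ys):
    """Drop any (x, y) pair containing a NaN, then sort the pairs by x."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    pairs.sort(key=lambda p: p[0])
    return [p[0] for p in pairs], [p[1] for p in pairs]

def linear_interp(xs, ys, x):
    """Linear interpolation; xs must be sorted ascending (use clean_and_sort first)."""
    if x <= xs[0]:
        return ys[0]   # clamp below the data range (a choice made for this sketch)
    if x >= xs[-1]:
        return ys[-1]  # clamp above the data range
    i = bisect.bisect_right(xs, x)  # now xs[i-1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

if __name__ == "__main__":
    # Unsorted X with a NaN, as might come from a table with a blank cell
    xs, ys = clean_and_sort([3.0, 1.0, float("nan"), 2.0], [30.0, 10.0, 5.0, 20.0])
    print(xs, ys, linear_interp(xs, ys, 2.5))
```

Skipping the sort would make the bracketing search meaningless, which is one way unsorted X values produce the "pretty weird results" described earlier in the thread.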
August 23, 2010 at 07:18 pm
Once you have the wave of model values, plot it against the Y wave of data values; this is called a Q-Q plot (for Quantile-Quantile). The closer the result is to a straight line with slope 1, the better the fit. That gives you a visual assessment of fit. A Kolmogorov-Smirnov test (StatsKSTest) of the two waves (model values and data values) will assess this statistically.
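To show what the Kolmogorov-Smirnov comparison computes, here is a plain-Python sketch. It is not StatsKSTest: it is the textbook two-sample statistic D (the largest vertical gap between the empirical cumulative distributions of the two sets of values) plus the asymptotic p-value approximation given in Numerical Recipes, and Igor may compute its p-value differently. The function names are hypothetical. Because the test compares distributions of values, the order of the points does not enter.

```python
import math

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest gap between the
    empirical cumulative distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples so ties are handled
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d

def ks_pvalue(d, n1, n2):
    """Asymptotic p-value from the Kolmogorov distribution series, using the
    effective-sample-size approximation from Numerical Recipes."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    a2 = -2.0 * lam * lam
    total, sign, prev = 0.0, 2.0, 0.0
    for j in range(1, 101):
        term = sign * math.exp(a2 * j * j)
        total += term
        if abs(term) <= 0.001 * prev or abs(term) <= 1e-8 * total:
            return total
        sign = -sign
        prev = abs(term)
    return 1.0  # series did not converge (very small D): p is close to 1

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]   # made-up data values
    model = [3.0, 4.0, 5.0, 6.0]  # made-up model values
    d = ks_two_sample(data, model)
    print(d, ks_pvalue(d, len(data), len(model)))
```

With only a handful of points the asymptotic p-value is a rough guide at best; with the 1000+ points mentioned later in this thread it is much more reasonable.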
August 23, 2010 at 09:16 pm
Thanks for the help. The New Wave output and the Kolmogorov-Smirnov test solved the problem. This forum is very helpful.
I am still troubled by the weighting wave. How can I get one? Actually, I do not even know what the residuals are.
The reduced chi-square looks OK, but how do I get the degrees of freedom? It is not going to be N-2 if my data set has over 1000 points.
P
August 24, 2010 at 06:56 am
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

A weighting wave contains one point for each Y data point. It holds the standard deviation of the estimated measurement errors for each point. These are often the same for every point. You might get this information by taking multiple samples at each point and computing the average (Y data) and standard deviation (weight) of those samples. You might know it from knowledge of the measurement process, and there are other ways to get it. Or you might not have that information.

Residuals are the differences between the Y data and the fitted model: what's left over that wasn't accounted for by the model.

As for the degrees of freedom: no, it will be N-4, because the built-in LogNormal fitting function has four coefficients.
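If you can take repeated samples at each X, the recipe above can be sketched in Python as below (hypothetical helper names, made-up numbers, not Igor code). It averages the replicates to get the Y data, takes their sample standard deviation as the weight, forms chi-square with each residual divided by its standard deviation, and divides by N-4 for the four-coefficient LogNormal. It assumes at least two replicates per point, since a standard deviation needs two values.

```python
import statistics

def mean_and_sd(replicates):
    """For each X, average the repeated samples (Y data) and take their sample
    standard deviation (weighting value). Needs at least 2 samples per point."""
    ydata = [statistics.mean(r) for r in replicates]
    sigma = [statistics.stdev(r) for r in replicates]
    return ydata, sigma

def weighted_chisq(ydata, model, sigma):
    """Chi-square with each residual (data minus model) scaled by its standard deviation."""
    return sum(((y - f) / s) ** 2 for y, f, s in zip(ydata, model, sigma))

def reduced_chisq(chisq, n_points, n_coefs=4):
    """Divide by the degrees of freedom, N - M; M = 4 for a four-coefficient LogNormal."""
    return chisq / (n_points - n_coefs)

if __name__ == "__main__":
    # Made-up replicates: three repeated measurements at each of six X values
    reps = [[1.0, 1.2, 0.8], [2.0, 2.2, 1.8], [3.0, 3.2, 2.8],
            [4.0, 4.2, 3.8], [5.0, 5.2, 4.8], [6.0, 6.2, 5.8]]
    ydata, sigma = mean_and_sd(reps)
    model = [y + 0.2 for y in ydata]  # made-up model values, one sigma off each point
    chi = weighted_chisq(ydata, model, sigma)
    print(chi, reduced_chisq(chi, len(ydata)))
```

With correct weights and an adequate model, the weighted reduced chi-square should come out near 1; that is the comparison Numerical Recipes relies on and the one that is not possible without a weighting wave.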
August 24, 2010 at 04:26 pm