Advice for fitting functions with a random component
dtadams
I have a function, generateSpectrum([many inputs]), that simulates the results of an experiment and generates a wave. I already have a wave of actual data from a real experiment of this type. The gist of what I'm looking to do is use FuncFit (or a similar function) to try many input combinations in generateSpectrum() and fit its output to my existing data.
I've used FuncFit successfully in the past, but only on simple functions with a small number of inputs. Here are my concerns with this one in particular:
1) The function generateSpectrum() has a very strong random component. Effectively, a lot of random noise will be added to each result.
2) In order to reduce this noise, generateSpectrum() requires much more time to run -- upwards of 60 seconds per call for greatest accuracy. There is a chance I could reduce this, but it would take considerable effort.
Before embarking on this, I'm looking for some advice to see if I can avoid getting bad results from FuncFit or overloading Igor completely. Specifically:
1) On a function with (say) 10 inputs, how many times does FuncFit need to run the function to find a fit? Is it in the 10s, 100s, or 1000s?
2) Are there ways that I can reduce the number of times FuncFit runs my function to get a faster, rougher fit?
3) How well does FuncFit handle randomness in its functions? Is its algorithm stable enough to handle a lot of random noise?
Hope this helps,
Kurt
January 8, 2015 at 12:07 am
Igor has to compute numerical approximations of the derivatives for the Hessian matrix. That means that at each iteration the fit function must be run once with the current set of fit coefficients and again for each fit coefficient as they are perturbed one at a time. That has to be done at each value of the independent variable, so the number of function evaluations per iteration will be N*(M+1), where N is the number of input data points and M is the number of non-held fit coefficients.
If you use a standard format fit function Igor has to call it for every data point, so you get Igor's function call overhead for every data point and every fit coefficient. If you use an all-at-once fit function you get that overhead only for each coefficient because the fit function computes all data points in a single call. If your function is quick to evaluate, that can be significantly faster. An expensive fit function won't benefit as much because the function call overhead will be a smaller proportion of the overall time.
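As a rough worked example of the counts above, here is a small Python sketch (Python rather than Igor code, used purely for the arithmetic). The function names and the 500-point data length are hypothetical; the 10 coefficients and 60 seconds per call are taken from the original question. It counts only the evaluations needed for the derivatives at each iteration and ignores any extra evaluations the fitting algorithm may make, so treat it as an estimate, not an exact figure.

```python
def evaluations_per_iteration(n_points, n_free_coefs):
    """Point evaluations per iteration: N * (M + 1)."""
    return n_points * (n_free_coefs + 1)


def calls_per_iteration(n_points, n_free_coefs, all_at_once):
    """How many times the fit function itself is called per iteration.

    An all-at-once function computes every data point in one call, so it
    is called once per coefficient set: M + 1 times. A standard-format
    function is called once per data point per coefficient set.
    """
    if all_at_once:
        return n_free_coefs + 1
    return evaluations_per_iteration(n_points, n_free_coefs)


def seconds_per_iteration(n_free_coefs, seconds_per_call):
    """Time per iteration when each call produces the whole wave."""
    return (n_free_coefs + 1) * seconds_per_call


if __name__ == "__main__":
    n_points = 500   # hypothetical length of the data wave
    n_coefs = 10     # "10 inputs" from the question, none held
    print(calls_per_iteration(n_points, n_coefs, all_at_once=False))
    print(calls_per_iteration(n_points, n_coefs, all_at_once=True))
    print(seconds_per_iteration(n_coefs, 60.0) / 60.0, "minutes per iteration")
```

Under these assumptions, a simulation that produces a whole wave per call would be called 11 times per iteration, which is about 11 minutes per iteration at 60 seconds per call. Holding coefficients reduces M and therefore the cost of each iteration.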
Hm.... The V_FitTol variable might be of help. See this help topic for more info:
DisplayHelpTopic "Special Variables for Curve Fitting"
If you find that a fit goes many iterations, setting that to a larger number might truncate the number of iterations. I can't really tell you if the result will be a good one as I haven't really tried using that variable much.
Hard to say. If there's so much noise that it obscures features needed to constrain fit coefficients (think here of noise that's similar in magnitude to peak heights) then it will be a problem. Unfortunately this kind of question can't be answered in a straightforward way. To some extent you just need to experiment to figure out the trade-offs in your system.
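One concrete way to experiment is to look at what randomness does to derivatives that are estimated by perturbing a coefficient and re-running the function. The Python sketch below is only an illustration under stated assumptions: the linear toy model, the noise level sigma, and the step size h are all hypothetical, and nothing here reflects the step size Igor actually uses. If each call carries independent noise of standard deviation sigma, a forward-difference estimate has a scatter of about sqrt(2)*sigma/h, which can easily exceed the true derivative.

```python
import math
import random
import statistics


def noisy_model(a, x, sigma, rng):
    """Toy stand-in for a stochastic fit function: a*x plus Gaussian noise."""
    return a * x + rng.gauss(0.0, sigma)


def forward_difference(a, x, h, sigma, rng):
    """Estimate d(model)/da by perturbing the coefficient a by h."""
    perturbed = noisy_model(a + h, x, sigma, rng)
    baseline = noisy_model(a, x, sigma, rng)
    return (perturbed - baseline) / h


def derivative_scatter(sigma, h, trials=2000, seed=1):
    """Mean and standard deviation of repeated derivative estimates."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    estimates = [forward_difference(2.0, 3.0, h, sigma, rng)
                 for _ in range(trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    h = 1e-3  # hypothetical perturbation size
    for sigma in (0.0, 0.001, 0.01):
        mean, spread = derivative_scatter(sigma, h)
        predicted = math.sqrt(2.0) * sigma / h
        print(f"sigma={sigma}: mean={mean:.3f} spread={spread:.3f} "
              f"predicted spread={predicted:.3f}")
```

The true derivative of this toy model with respect to a is x = 3. Comparing the printed spread with that value for your own noise level gives a feel for whether the derivative information survives the noise.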
I take it that generateSpectrum() is computing and summing many simulations that each have a lot of noise, in order to reduce the noise. Possibly we could help you optimize the code. Some general observations:
1) Igor's compiler is primitive: it doesn't perform optimizations such as pulling out common subexpressions. So if you have constants that are computed, make sure you don't compute them inside a loop. If you have an expression that uses something like "2*x" in several places, pre-compute it.
2) Wave assignments are much faster than loops, and MatrixOP is faster (in general) than wave assignments. I've seen speed-ups of more than a factor of ten replacing loops and wave assignments.
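On the trade-off between run time and noise: if the simulations being combined are independent, averaging n of them reduces the noise by roughly a factor of sqrt(n), so cutting the run time by a factor of four only doubles the noise. The Python sketch below checks that scaling numerically. It is an illustration only; the unit "true value", the Gaussian noise, and the function names are assumptions and not a model of generateSpectrum().

```python
import math
import random
import statistics


def averaged_run(n_sims, sigma, rng):
    """Average n_sims noisy simulations of a quantity whose true value is 1.0."""
    total = sum(1.0 + rng.gauss(0.0, sigma) for _ in range(n_sims))
    return total / n_sims


def residual_noise(n_sims, sigma=1.0, repeats=2000, seed=7):
    """Standard deviation of the averaged result over many repeats."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    results = [averaged_run(n_sims, sigma, rng) for _ in range(repeats)]
    return statistics.stdev(results)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        measured = residual_noise(n)
        predicted = 1.0 / math.sqrt(n)
        print(f"n={n}: measured noise={measured:.3f} predicted={predicted:.3f}")
```

If the real noise follows this scaling, it may be worth running a faster, noisier version of the simulation for early exploratory fits and saving the slow, accurate version for the final fit.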
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
January 8, 2015 at 09:47 am
johnweeks -- You're right, a lot of my problems with speed are because I am using loops instead of wave operations, and I've known this for a while. I didn't know the speedups could be so dramatic though, so I'll move that fix to a higher priority. Thanks so much for the thorough explanations! All of this will come in handy as I move forward.
January 9, 2015 at 02:42 pm