
fast linear fit for large number of small data sets

fabrizio
// phasediff: 256xN matrix // maskwv, weightwv: 256 pt wave // delaysT, sigmaDelayst: N pt wave for (i=0; i < DimSize(Phasediff, 1) ; i +=1) matrixop/o pt=col(phasediff, i); copyscales phasediff, pt CurveFit/NTHR=0/Q line pt /W=weightwv /I=0 /M=maskwv delaysT[i] = w_coef[1]/(2*pi) sigmaDelaysT[i] = V_chisq/V_npnts endfor
This obviously takes some time (~5 minutes/1e6 datasets) and I would like to speed it up. It seems like the global fit package would be useful, but before spending too much time on it, I wanted to know your opinion on how to best speed up this code.
Thanks,
Fabrizio
If you are truly fitting 1e6 datasets using threading will be a good start. Going from 1 fit at a time to NCPUS per time will give you an NCPUS speedup.
Perhaps you can just use http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regre…. This needs you to work out S.D's and means, but may be faster than curvefit.
July 25, 2012 at 11:33 am - Permalink
July 25, 2012 at 11:47 am - Permalink
CurveFit should be plenty fast for doing something like this, especially since fitting a line doesn't require iteration.
The first step in optimizing CurveFit is to suppress updates and particularly the progress window. To do this, make sure that you are running a version of Igor >= 6.21, and add /W=2 to your command before the fit type:
CurveFit/NTHR=0/Q /W=2 line pt /W=weightwv /I=0 /M=maskwv
This should give you a pretty serious speedup.
Also, to make things a bit more efficient, avoid the explicit MatrixOP and SetScale by using subrange syntax (as used in the multithreaded example below).
Next, make sure that you're not hit by this bug, which might have caused slower fitting, but I'm not sure if it even applies when fitting a line. Note that it is very unlikely that your Igor will have this bug unless you download nightly builds or are using 6.30b01.
The next step would be to use multithreading. Here's a solution:
Note that the multithreaded version is more complex and difficult to understand for beginning or even intermediate programmers. Note also that I didn't test the code. To use this function, replace the entire for loop in your example with
RunMyFit(phasediff, weightwv, maskwv, delaysT, sigmaDelaysT)
.July 26, 2012 at 12:10 am - Permalink
What is the default value for the
/NTHR
flag?If it is by default running multithreaded one can try turning that off explicitly.
Doing one line fit (which is just a linear regression) itself on multiple cores should not be very performant, at least for 256 points. And you want to do the multi threading for the 10e6 datasets.
You can also try the additonal flags
/N/M=0
and see if that makes it even faster.To do the point masking on all data before and then fit without maskwave might be worthwhile too.
July 26, 2012 at 02:20 am - Permalink
You might also look at the operation StatsLinearRegression. It will do multiple simultaneous linear regressions (I don't know about threading) and might do it faster than running the CurveFit overhead for every fit. It does lots more than you need, but does what you want on collections of waves. Oh, wait- I see you're using a mask wave and a weight wave. I guess you really do need CurveFit.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
July 26, 2012 at 09:33 am - Permalink
July 26, 2012 at 10:45 am - Permalink
I played around with simple linear curve fitting (following numerical recipes in xxx). To get the same results as curvefit ... /w=.../m=... I multiply the weights (1/sigma^2) with the mask. This speeds up my code immensely as I replace ~100'000 256 points curvefits in a for loop with a handful of matrixops on a 100'000x256 matrix.
However, I have a numerical discrepancy between slopes obtained with matrixop vs curvefit methods which is low (<10%) but still higher than what I would like to see, especially as the mean of the differences for a large dataset is non-zero. Let me know if it would be helpful for me to post an example experiment (not with 100's of millions of points ;) demonstrating this. Or maybe this is to be expected...?
This is the curvefitting code:
And the matrixop code:
The resulting slopes for a given subset of data and the difference between both are displayed in the attached images.
I'm using Igor 64 6.3.0.1 build 14102 on windows 7
Thanks again,
Fabrizio
July 27, 2012 at 05:03 pm - Permalink