Calculating Theil-Sen Estimator
william.collins
I would like to calculate the Theil-Sen Estimator in an effort to analyze trends in electromyographic (EMG) activity recorded from rat muscles. The Theil-Sen Estimator is a robust linear regression approach that is insensitive outliers (see https://en.wikipedia.org/wiki/Theil–Sen_estimator). I would appreciate any advice about how to implement this in Igor Pro 8. The data are recorded in a one-dimensional Wave (single float 32 bit). I realize there are python and R packages that perform this calculation. However, I would like to incorporate this into a data analysis program I have developed in Igor Pro. Thank you in advance for your help.
Here is some code that I did not test. Note that it does not handle the case where you have multiple y values for the same x value. I hope this helps as a starting point.
Variable numPoints=numpnts(inX)
if(numPnts(inY)!=numPoints)
doAlert 0,"Wave length mismatch."
return 0
endif
Make/Free/D/N=(numPoints*(numPoints-1)) slopesWave=nan
Variable i,j,count=0,dx
for(i=0;i<numPoints;i+=1)
for(j=i+1;j<numPoints;j+=1)
dx=inX[j]-inX[i]
if(dx!=0)
slopesWave[count]=(inY[j]-inY[i])/dx
count+=1
endif
endfor
endfor
return median(slopesWave)
End
December 2, 2020 at 11:47 am - Permalink
In reply to Here is some code that I did… by Igor
Thank you!!! At first glance, this looks like it will do the trick! I'm looking forward to giving it a try!
December 2, 2020 at 01:10 pm - Permalink
Gosh AG ... Where's the MatrixOP version. :-)
I *think* (perhaps maybe possibly hopefully) that this might do the trick. My knowledge of MatrixOP in practice is ... sorely limited. And I did not test this (it is "off the top of my head" so to speak).
I imagine that the calculations could be sped up with ThreadSafe and by operating only on the diagonal elements. Finally, I have to wonder if the approach to creating a matrix would open the door to being able to calculate advanced statistics (uncertainties) more easily.
// no consistency check!!!
variable npts = numpnts(xarray)
variable meanslope
make/O/FREE/D/N=(npts,npts) dxmatrix, dymatrix
dxmatrix = calc_matrix(xarray, p, q)
dymatrix = calc_matrix(yarray, p, q)
MatrixOP/FREE/O meanslope = mean(replace((replace(dymatrix,0,NaN)/replace(dxmatrix,0,NaN)),NaN,0))
return (meanslope)
end
Function calc_matrix(wave ww, variable pp, variable qq)
variable vtrn
vrtn = ww[pp] - ww[qq]
return vrtn
end
December 2, 2020 at 04:13 pm - Permalink
In reply to Gosh AG ... Where's the… by jjweimer
Now that you mention it, I do need information in addition to the slope including the the y-intercept (the 'b' in y= mx + b), confidence intervals, and residuals. I am particularly interested in identifying outliers in the data that likely reflect different sources of excitation to the muscle. Unfortunately, my linear algebra is quite rusty. Therefore, any help with this would be much appreciated.
December 2, 2020 at 07:09 pm - Permalink
I would hold on my proposed code. My excitement to imagine streamlining the two for-endfor loops into a (p,q) matrix is fraught with presumptions. After off-line discussion with AG, I see that it needs major revisions to correct mistakes in principles and implementation.
As for getting other statistical values, here are some thoughts:
* To obtain the mean intercept <b>, you will have to determine yi - <m>*xi for each point-pair and take the mean of the resulting array.
* Suppose that you have <m> and <b>. How would you look for "outliers"? My crude first-guess approach would be as follows:
- Subtract (<m>*xi + <b>) from all yi values to obtain yi*.
- Plot the data yi* in a histogram. This will reveal visually the distribution and skew in the data points from the proposed regression fit line.
- Obtain the mean <mu> and standard error <sigma> of the histogram values.
- Decide a cutoff in <sigma> to denote points as "outliers".
December 3, 2020 at 07:51 am - Permalink
In reply to I would hold on my proposed… by jjweimer
Thank you for your follow up. This is very helpful. I agree that the distribution of the yi-<m>*xi values will yield <b>. However, again I would like to minimize the influence of outliers so I am thinking of using the median of the resulting array rather than the mean. And yes, calculating all of the yi* values is a requisite step in identifying outliers. I have an advantage in that the input data wave is a time series with a fixed delta-x (original sample rate 2Khz, data resampled to 100 hz). Thus, one criterion for a physiologically relevant outlier could be successive yi* values greater than the cutoff for a minimum period of time. In any case, this has given me lots of ideas for how to proceed. I very much appreciate your input.
December 3, 2020 at 09:59 am - Permalink
Please contact me through support@wavemetrics.com and I will help you with the implementation.
December 3, 2020 at 10:08 am - Permalink