Judging randomness
thomas_braun
I'm currently diving a bit into statistics.
The reason is I want to create a noise distribution with some frequencies removed.
But for that I would like to compare the randomness of the noise before and after filtering, so that I can be sure I'm not messing anything up.
The following code tries to do that:
Function GetUserCDF(inX) : CDFFunc
	Variable inX
	// standard normal CDF as the reference distribution for the KS test
	return StatsNormalCDF(inX, 0, 1)
End

Function NewStuff()
	Make/O/N=1000 data

	SetRandomSeed/BETR=1 4711
	data = gnoise(1, 1)	// Gaussian noise, sdev 1, RNG method 1
	StatsKSTest/CDFF=GetUserCDF data
	print "#############################"

	SetRandomSeed/BETR=1 4711
	data = gnoise(1, 2)	// same seed, RNG method 2
	StatsKSTest/CDFF=GetUserCDF data
	print "#############################"

	// data has 200 kHz resolution; cut off above 30 Hz
	FilterFIR/LO={30/200e3, 30/200e3, 999} data
	StatsKSTest/CDFF=GetUserCDF data
	print "#############################"
End
This gives me
•Newstuff()
alpha = 0.05
N = 1000
D = 0.0259096
Critical = 0.0427766
PValue = 0.256714
#############################
alpha = 0.05
N = 1000
D = 0.0305389
Critical = 0.0427766
PValue = 0.15174
#############################
alpha = 0.05
N = 1000
D = 0.500762
Critical = 0.0427766
PValue = 3.63349e-265
#############################
From http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm I would say that, because D is larger than Critical after the filtering, the hypothesis that the numbers come from a Gaussian distribution is now rejected.
Does that make any sense?
Can I expect the randomness to survive the filtering?
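For readers more at home in Python, here is a rough SciPy sketch of the same check. `firwin`/`lfilter` are only a hypothetical stand-in for Igor's FilterFIR, so the exact numbers will differ, but the qualitative outcome is the same: the low-pass filter throws away almost all of the power, so the filtered values bunch up near zero and no longer match a unit-variance normal.

```python
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(4711)
data = rng.standard_normal(1000)  # N(0, 1) white noise

# KS test against the standard normal CDF: should not reject
d_raw, p_raw = stats.kstest(data, "norm")

# Low-pass FIR: 999 taps, 30 Hz cutoff at 200 kHz sampling (Nyquist 100 kHz)
taps = signal.firwin(999, 30 / 100e3)
filtered = signal.lfilter(taps, 1.0, data)

# Nearly all the power is removed, so the filtered values cluster near
# zero; tested against N(0, 1), the KS statistic D becomes huge
d_filt, p_filt = stats.kstest(filtered, "norm")
```

Note that this rejection says the filtered data is no longer standard normal, not that it is no longer Gaussian.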
Let me propose a different approach, if your goal is to achieve filtered Gaussian noise. I have done something roughly similar, but for specific cases of white noise filtered by various causal impulse responses (which act as smoothing, or low-pass, filters).

If you can cast your frequency removal in the form of a causal impulse response, then find its auto-correlation function. Use the auto-correlation function to calculate a large, symmetric covariance matrix, which can be used as described in an Exchange posting to generate correlated Gaussian variates (http://www.igorexchange.com/node/5141). That example was for a bivariate distribution, but I extended the concept to a much larger multivariate dimension, matching the row (or column) dimension of the covariance matrix. Finally, feed in uncorrelated white Gaussian noise and use Bech's Cholesky decomposition procedure with the large covariance matrix to generate the filtered Gaussian noise.
There may be problems involving the size of the matrix you require, the ease of finding the Cholesky decomposition, and possible edge effects. On the other hand, if you can show your frequency filter is realizable, any such linear filtering (if properly implemented!) must preserve Gaussian randomness.
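A minimal Python sketch of this covariance/Cholesky route, assuming (purely for illustration) an exponentially decaying auto-correlation; `scipy.linalg` stands in here for Bech's decomposition procedure:

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

rng = np.random.default_rng(0)
n = 500

# Assumed target auto-correlation r(k) = 0.9**k (AR(1)-like decay)
acf = 0.9 ** np.arange(n)

# Symmetric Toeplitz covariance matrix built from the auto-correlation
cov = toeplitz(acf)

# Lower-triangular Cholesky factor L, with L @ L.T == cov
L = cholesky(cov, lower=True)

# Multiplying uncorrelated white Gaussian noise by L yields Gaussian
# variates with the desired correlation structure
white = rng.standard_normal(n)
correlated = L @ white
```

Since `correlated` is a linear transform of Gaussian inputs, each sample is still normally distributed; only the correlation between samples has changed.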
December 22, 2016 at 10:45 am - Permalink
If you are looking for Gaussian distributions, you may want to use StatsJBTest. In general I would stay away from the KS tests, because there are complications in figuring out the proper critical values, etc. Read the Stephens reference for more information, or contact me directly for recent practical examples.
A.G.
WaveMetrics, Inc.
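For illustration, a SciPy analogue of this suggestion: `scipy.stats.jarque_bera` plays the role of StatsJBTest, testing the sample skewness and kurtosis against their Gaussian values (the uniform wave is just an assumed non-Gaussian control):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4711)
gauss = rng.standard_normal(1000)   # Gaussian sample
unif = rng.uniform(-1, 1, 1000)     # clearly non-Gaussian control

# Jarque-Bera: large p-value for the Gaussian sample, tiny for the
# uniform one (its kurtosis is far from the Gaussian value)
jb_gauss = stats.jarque_bera(gauss)
jb_unif = stats.jarque_bera(unif)
```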
December 22, 2016 at 10:50 am - Permalink
@Stephen: I'm not sure I can follow your approach 100%. I need to filter large waves (>10^5 points) with one or multiple low/high-pass filters. I guess the symmetric covariance matrix is the square of the number of points in size?
@AG: Thanks I'll look into that.
December 27, 2016 at 01:09 pm - Permalink
Yes, that is correct. For such large waves the method I suggested would be impractical. The covariance matrix requires only upper (or lower) triangular storage because of its symmetry, but even so the Cholesky computation would probably run into numerical difficulties. I repeat my earlier claim: if you can prove your filtering is a realizable linear operation on the original wave, the filtered wave statistics must remain Gaussian. The single-point probability distribution function will be normal. However, the correlation properties will change after filtering.
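A small Python sketch of that claim, under assumed filter parameters: FIR-filtered Gaussian white noise stays pointwise Gaussian, just with a reduced variance and with sample-to-sample correlation. Because the KS test assumes independent samples, one has to subsample past the filter length before testing; after rescaling to unit variance, the subsampled data passes the normality check again.

```python
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(1)
white = rng.standard_normal(100_000)

taps = signal.firwin(201, 0.01)                     # low-pass FIR, 201 taps
filtered = signal.lfilter(taps, 1.0, white)[201:]   # drop the startup transient

# The output variance shrinks to sum(taps**2) times the input variance.
# A 201-tap FIR decorrelates samples beyond lag 200, so taking every
# 250th point yields independent Gaussian values.
indep = filtered[::250]
rescaled = (indep - indep.mean()) / indep.std()

# KS test against N(0, 1) on the rescaled, decorrelated samples:
# normality is retained
d, p = stats.kstest(rescaled, "norm")
```

This is the flip side of the rejection seen in the original question: the KS test there failed mainly because the variance collapsed, not because the filtering destroyed the Gaussian character of the noise.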
December 28, 2016 at 02:43 am - Permalink