How to plot pdf function

Hi,
I have a set of data from which I wish to create the PDF and CDF. The data sets are NOT normally distributed. I ran the KS test (StatsKStest/ALPH/T=0 srcwavename,distwavename), which does not depend on the distribution, and it was fine. I would like to create the corresponding PDF and CDF as well. But something is wrong: the PDF doesn't start from 0 (on the Y axis), and accordingly the CDF doesn't reach 1. I used the commands below:
Make W_Hist
Histogram/P/B=4 srcwavename,W_Hist
Integrate/Meth=1 W_Hist/D=W_Int
Display W_Hist
Display W_Int

Thanks,
It would be helpful if you attached the actual data and results, and pointed out the specific problem you have.

I tried it with my own synthetic data set:
make/n=1000 junk=gnoise(1)
Make W_Hist
Histogram/P/B=4 junk,W_Hist
Integrate/Meth=1 W_Hist/D=W_Int
Display W_Hist
Display W_Int
SetAxis/A/E=1 left
edit W_Hist,W_Int

The Histogram command with /B=4 chooses bins such that the first and last bins contain the minimum and maximum values of the input data and therefore have non-zero counts, so the output is pretty much guaranteed not to start at zero.

The integration ends with 0.999, which seems wrong but is a numerical artifact. Your Integrate command uses /METH=1, which chooses trapezoidal integration. While this is commonly the best choice for real data that represents some underlying smooth curve, I think it could be argued that it is not the best choice for a histogram, which represents the counts between the bin edge values. As such, each point in the histogram really represents a rectangular area, and /METH=0 or 2 would be better. Using /METH=0 gives a final value of 1.000 in my test case, but a non-zero starting value. /METH=2 adds one extra point to the output, has zero as the starting value and 1 as the ending value.
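
For readers who want to check the arithmetic outside Igor, here is a minimal Python sketch of the three accumulation schemes. The function cumulative() and its method names are hypothetical; they only mimic the behaviour described in this thread for /METH=0, 1 and 2 and are not Igor's implementation. The five-bin density is made-up data, normalized so that its sum times the bin width equals 1.

```python
def cumulative(density, dx, method):
    """Running integral of a binned density (illustrative, not Igor's code)."""
    out, total = [], 0.0
    if method == "rect_same":      # assumed analogue of /METH=0: N points
        for d in density:
            total += d * dx
            out.append(total)
    elif method == "trapezoid":    # assumed analogue of /METH=1: N points
        out.append(0.0)
        for a, b in zip(density, density[1:]):
            total += 0.5 * (a + b) * dx
            out.append(total)
    elif method == "rect_next":    # assumed analogue of /METH=2: N+1 points
        out.append(0.0)
        for d in density:
            total += d * dx
            out.append(total)
    else:
        raise ValueError("unknown method: " + method)
    return out

dx = 0.5                                 # bin width (made-up)
density = [0.16, 0.40, 0.80, 0.40, 0.24] # sum(density) * dx is 1

rect_same = cumulative(density, dx, "rect_same")
trapezoid = cumulative(density, dx, "trapezoid")
rect_next = cumulative(density, dx, "rect_next")

print(rect_same[0], rect_same[-1])   # starts above zero, ends at about 1
print(trapezoid[0], trapezoid[-1])   # starts at zero, ends short of 1
print(rect_next[0], rect_next[-1])   # starts at zero, ends at about 1
```

With these numbers the trapezoidal result ends near 0.9 because half of the first bin and half of the last bin are left out, while both rectangular variants end at 1.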

To understand these differences, read the documentation for the Integrate command, especially the Details section. Take with a grain of salt the statement there that says, "Trapezoidal integration is a more accurate method of computing the integral than rectangular integration." As I said above, the best method depends on what the data actually represent.

I have posted an Igor experiment file with my example.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
HistogramTest.pxp (50.46 KB)
Dear John,

Thank you very much for your reply, it was a great help.
I have now tried creating the PDF in different ways, and it seems the best is to set /B=1 instead of 4. I understand that you say the output shouldn't start at zero. But if I use the /P flag, doesn't that mean that I normalize the histogram? "Normalizes the histogram as a probability distribution function, and shifts wave scaling so that data correspond to the bin centers", so I expect the data to start from zero and end at zero, meaning that I probably have no values smaller or larger than those points.
It also says that I should use /METH=0 or 1 but not 2 in the next step: "When using the results with Integrate, you must use /METH=0 or 1". If I use /METH=0 instead of 1, the data do indeed reach 1 but do not start from zero. The cumulative probability function should be between 0 and 1, shouldn't it?
I have attached the corresponding Igor file. I would like to compare the two ISI distributions (wave0, wave1) with the KS test, and also to compare the two corresponding PDFs and CDFs. I have also attached graphs of the PDF and CDF that were created in MATLAB from the same data set.

Thank you very much for your help,
HistogramTest2.pxp (173.77 KB) matlabcdf.pdf (11.11 KB) matlabpdf.pdf (10.27 KB)
Marti wrote:
I understand that you say the output shouldn't start at zero. But if I use the /P flag, doesn't that mean that I normalize the histogram? "Normalizes the histogram as a probability distribution function, and shifts wave scaling so that data correspond to the bin centers", so I expect the data to start from zero and end at zero, meaning that I probably have no values smaller or larger than those points.

The /P flag means only that the numeric integral (when performed using rectangular integration) will be equal to 1.0. It makes no statement about the initial value. An alternate way to express that integral is that sum(histwave)*deltaX(histwave)=1. Here I have done that in your posted experiment file using one of your histogram waves:
print sum(W_Hist04)*deltax(W_Hist04)
  1
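
The same normalization check can be sketched in Python as an illustration. The helper density_histogram() is hypothetical; it uses simple equal-width bins from the data minimum to the data maximum, which is an assumption and not necessarily how Histogram/B=4 picks its bins. The Gaussian test data are synthetic.

```python
import random

def density_histogram(data, nbins):
    """Equal-width bins from min to max, scaled so sum(density) * dx is 1."""
    lo, hi = min(data), max(data)
    dx = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        # Clamp so the maximum value lands in the last bin.
        i = min(int((x - lo) / dx), nbins - 1)
        counts[i] += 1
    n = len(data)
    return [c / (n * dx) for c in counts], dx

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
density, dx = density_histogram(data, 20)

area = sum(density) * dx
print(abs(area - 1.0) < 1e-9)               # normalization holds
print(density[0] > 0 and density[-1] > 0)   # end bins are not empty
```

The normalization constrains only the total area. The first and last bins hold the minimum and maximum data values, so their densities are non-zero.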

Quote:
It also says that I should use /METH=0 or 1 but not 2 in the next step: "When using the results with Integrate, you must use /METH=0 or 1". If I use /METH=0 instead of 1, the data do indeed reach 1 but do not start from zero. The cumulative probability function should be between 0 and 1, shouldn't it?

The exact quote from the documentation for the Histogram operation is "When using the results with Integrate, you must use /METH=0 or /METH=1 because the trapezoidal approximation will give an error whose magnitude depends on the distribution function starting value." This must be an error in our documentation, because /METH=1 chooses trapezoidal integration (which this quote says you should not use). It should say that you should use either /METH=0 or /METH=2. My apologies for the error in our documentation; I have entered a report in our documentation database.
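
To illustrate why the trapezoidal error depends on the starting value, here is a small Python sketch with two made-up densities, both normalized to unit area under rectangular summation. The helper names are hypothetical and this is plain arithmetic, not Igor code.

```python
def rectangle_total(density, dx):
    """Sum of bin areas: every bin counted in full."""
    return sum(d * dx for d in density)

def trapezoid_total(density, dx):
    """Trapezoid rule over the same points: end bins are only half counted."""
    return sum(0.5 * (a + b) * dx for a, b in zip(density, density[1:]))

dx = 0.5
small_ends = [0.02, 0.48, 1.00, 0.48, 0.02]  # density near zero at both ends
large_ends = [0.40, 0.40, 0.40, 0.40, 0.40]  # flat density, large end values

for name, dens in (("small_ends", small_ends), ("large_ends", large_ends)):
    shortfall = rectangle_total(dens, dx) - trapezoid_total(dens, dx)
    # The shortfall equals dx * (first + last) / 2.
    print(name, shortfall, dx * (dens[0] + dens[-1]) / 2)
```

In this example the trapezoidal total falls short of 1 by about 0.01 for the first density and about 0.2 for the second, so the size of the error tracks the first and last bin values.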
Quote:
I have attached the corresponding Igor file. I would like to compare the two ISI distributions (wave0, wave1) with the KS test, and also to compare the two corresponding PDFs and CDFs. I have also attached graphs of the PDF and CDF that were created in MATLAB from the same data set.

I see in Graph3 a CDF that is clearly not correct, but looking through your history it appears that it was generated using Integrate/METH=1, which we now realize is incorrect for a histogram. The differences between /METH=0 and /METH=2 are the number of result points (N+1 for /METH=2) and the starting value (zero in the first point for /METH=2). I believe that /METH=2 is more correct mathematically; /METH=0 is retained for backward compatibility and for the convenience of a result with N points instead of N+1 points.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
Hi John,

I see, thanks for the correction!
I've tried the final version on other pairs of data sets and it seems to work well.
Thank you very much for your help!
Marti