Q: How to normalize the histogram by the total count?
yuja
I have a set of data that I wish to analyze using the Histogram function, and I would like the output to be the probability (i.e. the number of counts in each bin divided by the total number of points in the data set) instead of raw counts. I've tried the "Normalize Result to Probability Density" option, but apparently that option normalizes the area underneath the histogram to 1.
Can someone show me the easiest way to normalize the histogram to probability in Igor?
Thanks,
Yu-Ja
A probability density function has an area of 1, so if that's not what you want, I think you want something other than probability. Please tell us exactly what you want.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
February 21, 2012 at 09:29 am - Permalink
I would like the sum of the probabilities to be 1 instead of the area.
For example, if I have the data set [0.1 0.1 0.1 0.2] and analyze it with the Histogram function using 2 bins, I would like the output to be 0.75 (3/4) for the first bin and 0.25 (1/4) for the second bin.
I hope that clarifies the confusion.
Thanks again!
Yu-Ja
February 21, 2012 at 09:40 am - Permalink
Make/O/N=10000 wave0
wave0 = gnoise(1) // fill with 10,000 samples of Gaussian-distributed noise, std. dev. = 1
Make/O/N=200 wave0_Hist
Histogram/C/B={-4,0.04,200} wave0, wave0_Hist
Display wave0_Hist
print sum(wave0_Hist)
The result is 10000, the number of "counts" (points in the wave). Now do the same Histogram, but select "Normalize Result to Probability Density".
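From the command line, that dialog option corresponds to the /P flag; a minimal sketch, reusing the waves from above:
Histogram/C/P/B={-4,0.04,200} wave0, wave0_Hist // /P normalizes to a probability density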
print area(wave0_Hist)
The result is close to the expected unity result for integrating a Probability Density Function.
Remember that the horizontal (x) axis of the histogram should span the range of the (y) values in the original data set. Part of the confusion is that your "hand-made" normalization neglects the x scaling of the histogram wave. To do your own normalization properly (without using the /P flag) you have to divide the histogram wave by the number of points, but also multiply by the number of bins per unit amplitude (i.e., divide by the bin width), as sketched below. This is a complicated way to get the same result as using the /P (probability density) flag.
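A minimal sketch of that hand normalization (my addition, assuming the wave0 and wave0_Hist waves created in the earlier post):
Histogram/C/B={-4,0.04,200} wave0, wave0_Hist // raw counts again
wave0_Hist /= numpnts(wave0)*deltax(wave0_Hist) // divide by N and by the bin width (= multiply by bins per unit amplitude)
print area(wave0_Hist) // ~1, matching the /P result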
February 21, 2012 at 10:22 am - Permalink
wave0_Hist /= sum(wave0_Hist)
February 22, 2012 at 05:56 am - Permalink
(1) Doing the division as you indicated caused some strange effects for me (IP 6.22A, Windows Vista). Finding the sum first and then dividing worked OK, but it does not give the correct p.d.f., for the reasons I stated above involving the x-axis scaling and bin density.
(2) I did the /P histogram and plotted it on the same graph (against the right-hand axis, in blue). Its area is close to one: 0.9999.
The left-hand-axis plot (red), made with the sum division, is not properly normalized for a p.d.f.
February 22, 2012 at 07:28 am - Permalink
In retrospect, this behavior is not strange. In the assignment
Wave0_Hist /= sum(Wave0_Hist)
each point on the left-hand side is divided by a new and different value on the right-hand side, because sum() is re-evaluated as Wave0_Hist changes.
February 22, 2012 at 08:32 am - Permalink
Instead, compute the sum once, then divide:
Variable temp = sum(wave0_Hist)
wave0_Hist /= temp
As pointed out by s.r.chinn, that is because if you write
wave0_hist /= sum(wave0_hist)
then the right side is doing a sum of a constantly changing left side.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
February 22, 2012 at 09:19 am - Permalink
With regard to the calculation, it seemed that the original poster wanted to express each bin as a fraction of the total counts. That should be the end result of my suggestion ... when implemented correctly.
February 22, 2012 at 10:26 am - Permalink
make/O/N=4 wtest = {0.1, 0.1, 0.1, 0.2}
make/O/N=2 whist
histogram/B={0.05, 0.1, 2} wtest, whist
whist /= numpnts(wtest) // each point is now the probability of that bin
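Checking this against the example in the original question:
print whist[0], whist[1] // prints 0.75 and 0.25, i.e. 3/4 and 1/4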
The customer is always right, but the request to "normalize the histogram to probability" had an ambiguous meaning. I think this horse is dead now.
February 22, 2012 at 11:28 am - Permalink