Averaging data according to categories
jcor
An example scenario: you have measured 10 different calibration solutions, with concentrations of 10, 20, 30, ..., 100 mM, several times each, and you end up with waves like:
Category   Measurement
10         10.1
10         9.8
10         10.2
...        ...
20         19.6
20         19.8
20         20.4
...        ...
Now you want the average and standard error for each solution, i.e. an output like:
Category   Avg    StErr
10         10.1   0.2
20         19.9   0.1
...        ...    ...
100        99.8   2.1
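For reference, the Avg and StErr columns are the usual per-category sample mean and standard error of the mean. The notation below ($n_k$ measurements $x_{k,i}$ in category $k$) is introduced here for illustration, and the worked numbers use only the three 10 mM values listed above, not the full data behind the table.

```latex
\bar{x}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_{k,i}, \qquad
s_k = \sqrt{\frac{1}{n_k - 1}\sum_{i=1}^{n_k}\left(x_{k,i} - \bar{x}_k\right)^2}, \qquad
\mathrm{SE}_k = \frac{s_k}{\sqrt{n_k}}

% Worked example with 10.1, 9.8, 10.2 only:
\bar{x} = \frac{10.1 + 9.8 + 10.2}{3} \approx 10.033, \qquad
s = \sqrt{\frac{0.78/9}{2}} \approx 0.208, \qquad
\mathrm{SE} = \frac{0.208}{\sqrt{3}} \approx 0.120
```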
So far I've found two solutions, which I'll post, but one is very slow (it uses Extract) and the other has memory problems (it uses a temporary matrix to sort the data and then averages each column).
I am thinking of avoiding the memory problems by limiting the size of the temporary matrix, or by making it an 8-bit unsigned integer wave and multiplying it by the data wave to isolate each set of data to average. But I'm curious whether anyone has a better solution.
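The mask idea can be sketched as follows. This is a minimal sketch, assuming waves named Measurement and Category in the current data folder; the function name MaskedCategoryStats and the helper waves catMask and masked are hypothetical. Because the mask is 1 for the chosen category and 0 elsewhere, sum(mask × data) / sum(mask) is the category mean, and only one 8-bit wave plus one double-precision wave of the data's length are needed, rather than a matrix with one column per category.

```igor
Function MaskedCategoryStats(cat)
	Variable cat                        // category value to summarize, e.g. 10
	Wave Measurement, Category          // assumed to exist in the current data folder
	Variable npts = numpnts(Measurement)
	// 8-bit unsigned mask: 1 where the category matches, 0 elsewhere
	Make/O/B/U/N=(npts) catMask
	catMask = (Category[p] == cat)
	Variable n = sum(catMask)           // number of points in this category
	if (n < 2)
		Printf "Category %g has fewer than 2 points\r", cat
		return NaN
	endif
	// Multiplying by the mask zeroes the measurements of all other categories
	Make/O/D/N=(npts) masked
	masked = catMask[p] * Measurement[p]
	Variable avg = sum(masked) / n
	// Reuse the same wave for the squares to keep memory use down
	masked = masked[p]^2
	Variable sdev = sqrt((sum(masked) - n * avg^2) / (n - 1))
	Printf "Category %g: avg = %g, sterr = %g\r", cat, avg, sdev / sqrt(n)
	return avg
End
```

Call it once per category, e.g. MaskedCategoryStats(10). The variance uses the one-pass formula (sum of squares minus n times the squared mean), which can lose precision when the values are large compared to their spread, and each call still scans the whole wave, so the run time grows with the number of categories.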
I would recommend using separate waves, 2D waves, or maybe even data folders for the different categories.
If the number of measurements per category is constant and the data are sorted by category, you can load your data as a 1D wave and Redimension it into a 2D wave with one column per category. In this case you can use the ImageStats operation to get the statistics of each column.
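A sketch of the Redimension/ImageStats route, assuming the 1D wave is already sorted by category, every category has the same number of points (nPerCat), and there are no NaNs. The function ColumnStats and the output waves catAvg and catSterr are hypothetical names.

```igor
Function ColumnStats(data, nPerCat)
	Wave data          // 1D wave, sorted by category, nPerCat points per category
	Variable nPerCat   // measurements per category (assumed constant)
	if (nPerCat < 2 || mod(numpnts(data), nPerCat) != 0)
		Abort "numpnts(data) must be a multiple of nPerCat, and nPerCat must be at least 2"
	endif
	Variable nCats = numpnts(data) / nPerCat
	Duplicate/O data, data2D           // work on a copy; the 1D wave is unchanged
	// Igor stores data column by column, so points 0..nPerCat-1 become column 0, etc.
	Redimension/N=(nPerCat, nCats) data2D
	Make/O/D/N=(nCats) catAvg, catSterr
	Variable col
	for (col = 0; col < nCats; col += 1)
		// /G={startP, endP, startQ, endQ} restricts the statistics to one column
		ImageStats/G={0, nPerCat - 1, col, col} data2D
		catAvg[col] = V_avg
		catSterr[col] = V_sdev / sqrt(nPerCat)
	endfor
End
```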
If this is not possible, you could run a loop over your categories (either you know them in advance, or you collect them with a loop over your first column that adds each new value to a category wave) and mask out all points that belong to other categories:
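Function AverageByCategory()
	Wave Measurement, Category		// waves in the current data folder
	Variable CatValue
	Duplicate/O Measurement, Dummy	// creates Dummy and a wave reference to it
	for (CatValue = 10; CatValue <= 100; CatValue += 10)
		// keep only the points of the current category; WaveStats ignores the NaNs
		Dummy = (Category == CatValue) ? Measurement : NaN
		Print "Category: " + num2str(CatValue)
		WaveStats Dummy
	endfor
End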
The "Dummy = ..." assignment can even be combined with the MultiThread keyword in a procedure, which may speed it up for large(!) data sets.
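A sketch of the multithreaded variant for a single category, assuming the same Measurement and Category waves in the current data folder; the function name MaskCategoryMT is hypothetical, and whether MultiThread actually pays off depends on the wave size and the number of processor cores.

```igor
Function MaskCategoryMT(cat)
	Variable cat                        // category value to summarize
	Wave Measurement, Category          // assumed to exist in the current data folder
	Duplicate/O Measurement, Dummy
	// Each point is computed independently, so the assignment can be split across threads
	MultiThread Dummy = (Category[p] == cat) ? Measurement[p] : NaN
	WaveStats/Q Dummy                   // /Q suppresses the printout; NaN points are skipped
	Printf "Category %g: avg = %g, sterr = %g (n = %d)\r", cat, V_avg, V_sdev / sqrt(V_npnts), V_npnts
	return V_avg
End
```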
In my opinion, good data management is half of the data treatment.
If you are unfamiliar with any of these commands, please have a look at the manual.
Looking forward to other solutions,
HJ
January 22, 2015 at 12:11 pm
http://www.igorexchange.com/node/6258
... conceptually similar to HJDrescher's, but it accepts an unspecified number of categories as input.
I tried an alternative algorithm designed to avoid memory overflow errors, but it was an order of magnitude slower than this one. So this version only includes a simple check for memory overflow; otherwise, the user will have to use fewer data points or fewer keys (categories).
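One way to handle an unspecified number of categories without Extract or a per-category matrix is to sort copies of the two waves by category and then walk through the sorted data run by run. This is a sketch of that alternative, not the code behind the link above; the function CategoryStats and the output waves catValue, catAvg, and catSterr are hypothetical names, and it assumes the measurements contain no NaNs.

```igor
Function CategoryStats(meas, cats)
	Wave meas, cats                     // measurements and their category values (no NaNs assumed)
	// Free copies, so the original waves keep their order
	Duplicate/FREE cats, key
	Duplicate/FREE meas, vals
	Sort key, key, vals                 // sort both copies by category value
	Variable n = numpnts(key)
	Make/O/D/N=0 catValue, catAvg, catSterr
	Variable i, j, cnt, avg, ssq, runEnds
	Variable start = 0, k = 0
	for (i = 1; i <= n; i += 1)
		// A run of equal keys ends at the last point or where the key changes
		if (i == n)
			runEnds = 1
		else
			runEnds = (key[i] != key[start])
		endif
		if (runEnds)
			cnt = i - start
			avg = 0
			for (j = start; j < i; j += 1)
				avg += vals[j]
			endfor
			avg = avg / cnt
			ssq = 0
			for (j = start; j < i; j += 1)
				ssq += (vals[j] - avg)^2
			endfor
			Redimension/N=(k + 1) catValue, catAvg, catSterr
			catValue[k] = key[start]
			catAvg[k] = avg
			// Standard error = sample standard deviation / sqrt(count); NaN for a single point
			catSterr[k] = (cnt > 1) ? sqrt(ssq / (cnt - 1) / cnt) : NaN
			k += 1
			start = i
		endif
	endfor
End
```

For example, CategoryStats(Measurement, Category) fills catValue with the distinct category values in ascending order, and catAvg and catSterr with the matching statistics. The sort dominates the run time, and the extra memory is two copies of the input waves regardless of how many categories there are.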
January 23, 2015 at 04:43 am