Sort with incomplete lists
d_lenz
Hello!
I have datasets that I would like to average. Unfortunately, they are not all the same length; how many points each one has depends on the measurement quality (electrophysiology).
So I may have V_rev_day1, V_rev_day2, and so on, and I have text waves W_cond_day1, W_cond_day2, ..., containing text about the solutions applied. These lists are sorted alphabetically, but it may be that on day1 I could measure cond1, cond2, and cond4, while on day2 I could measure only cond2 and cond4.
What is the nicest way to sort the waves so that, on the one hand (for display), conditions that were not measured give a NaN in V_rev and, on the other hand, I can average them properly?
Yours, Dominik
I believe the Waves Average package (see Analysis->Packages->Average Waves) will handle waves that have NaNs in them. By "handle" I mean that if you have 5 waves and one has NaN in a row, the average for that row will ignore the NaN and give you the average of 4 values.
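As a rough illustration of what "ignoring the NaN" means, here is a minimal sketch of a row-by-row average that skips NaNs. It is not the Average Waves package itself: the function name AverageSkippingNaN and its parameters are hypothetical, and it assumes all input waves are 1D numeric waves of the same length in the current data folder.

```igor
// Hypothetical helper, not part of the Average Waves package.
// Averages 1D numeric waves of equal length row by row, skipping NaNs.
Function AverageSkippingNaN(nameList, outName)
	String nameList		// semicolon-separated wave names, e.g. "V_rev_day1;V_rev_day2;"
	String outName		// name of the wave that receives the averages

	Variable nWaves = ItemsInList(nameList)
	Wave first = $StringFromList(0, nameList)
	Variable nRows = numpnts(first)

	Make/O/D/N=(nRows) $outName
	Wave avg = $outName
	avg = 0
	Make/FREE/D/N=(nRows) counts
	counts = 0

	Variable i
	for(i = 0; i < nWaves; i += 1)
		Wave w = $StringFromList(i, nameList)
		// numtype() is 0 for normal numbers and 2 for NaN
		avg += (numtype(w[p]) == 0) ? w[p] : 0
		counts += (numtype(w[p]) == 0)
	endfor

	// rows where no wave had a valid value stay NaN
	avg = (counts[p] > 0) ? avg[p] / counts[p] : NaN
End
```

With five input waves and one NaN in a given row, that row of the output would be the mean of the remaining four values.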
September 11, 2018 at 03:40 pm - Permalink
Is v_rev_day1 a wave storing cond1, cond2, etc. along the columns or rows? If so, the best thing would be to set dimension labels with the respective conditions and then use finddimlabel or getdimlabel to check whether a particular label exists as you average iteratively over that condition.
•setdimlabel 1, 0, cond1, w_test
•setdimlabel 1, 1, cond2, w_test
•print finddimlabel(w_test, 1, "cond1")
0
•print finddimlabel(w_test, 1, "cond2")
1
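To show how the label lookup could drive the averaging, here is a minimal sketch. It assumes each day's wave is a 1D numeric wave whose rows (dimension 0) carry the condition names as labels; the function name MeanForCondition is hypothetical. It relies on FindDimLabel returning -2 when a label is not found.

```igor
// Hypothetical helper: mean of one condition over several day waves.
// Days on which the condition is missing (or is NaN) are skipped.
Function MeanForCondition(nameList, cond)
	String nameList		// semicolon-separated wave names, e.g. "V_rev_day1;V_rev_day2;"
	String cond			// condition label to look up, e.g. "cond2"

	Variable i, row
	Variable total = 0, n = 0
	for(i = 0; i < ItemsInList(nameList); i += 1)
		Wave w = $StringFromList(i, nameList)
		row = FindDimLabel(w, 0, cond)	// -2 if this wave has no such row label
		if(row >= 0)
			if(numtype(w[row]) == 0)	// skip NaN and Inf
				total += w[row]
				n += 1
			endif
		endif
	endfor
	return (n > 0) ? total / n : NaN
End
```

It could be called from the command line as, for example, print MeanForCondition("V_rev_day1;V_rev_day2;", "cond2").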
best,
_sk
September 12, 2018 at 01:03 am - Permalink
I attach an example file in two versions, each with two datasets, where one dataset is more complete than the other; in the second version I manually inserted points where they were missing. Maybe that makes the problem clearer. I would have a W_Sol_[date] wave which contains all possible conditions, so I could sort along this wave; let's call it W_Sol_basis. But I want to have NaN in the less complete W_Sol_07_12_06 and in the respective W_Vrev_[date] and W_Vzero_[date] rows if an entry of the W_Sol_basis wave does not exist in the W_Sol_[date] wave.
So I want to get from "Ex_data_before.pxp" to "Ex_data_after.pxp" automatically.
(I don't really understand what the dimlabel actually does.)
Yours, Dominik
September 13, 2018 at 12:41 am - Permalink
The setdimlabel was for data regularization. That being said, I would use a regularized standard for the date, i.e. ISO 8601: 20180913 (YYYYMMDD), where single-digit numbers are always preceded by a zero.
If the row or column that should hold the value for (in your example) ex_0c does not exist, then there cannot be a NaN or any other value there, because no storage location for it exists in the memory holding your wave. This again points to data regularization.
So if I understand you correctly, I recommend that you create a solution wave, w_sol_YYYYMMDD, which has all fields already accounted for, regardless of whether a particular experiment was carried out on that day, and initialize the wave to NaN upon creation, something like this:
make/o/n=(6,2) w_sol_20071206 = nan
You can choose to label the rows and columns for easy addressing, or remember which index corresponds to which experiment/condition:
setdimlabel 0, 0, acetat, w_sol_20071206
setdimlabel 0, 1, cid100, w_sol_20071206
setdimlabel 0, 2, ex0a, w_sol_20071206
setdimlabel 0, 3, ex0b, w_sol_20071206
setdimlabel 0, 4, ex0c, w_sol_20071206
setdimlabel 0, 5, sulfat, w_sol_20071206
setdimlabel 1, 0, rev, w_sol_20071206
setdimlabel 1, 1, zero, w_sol_20071206
Then address fields like so:
•print w_sol_20071206[%acetat][%rev]
-10.0742
// equivalent to
•print w_sol_20071206[0][0]
-10.0742
// trying an empty field
•print w_sol_20071206[%cid100][%zero]
NaN
Once your data is in shape, it is all about what you want to do: conditional statements, sorting, summation, etc.
edit:
By the way, if you create a base wave, as you suggested, and set the index labels to the corresponding conditions, any duplicate of the base wave will carry over the index labels as well. In other words:
setdimlabel 0, 0, acetat, w_sol_base
setdimlabel 0, 1, cid100, w_sol_base
setdimlabel 0, 2, ex0a, w_sol_base
setdimlabel 0, 3, ex0b, w_sol_base
setdimlabel 0, 4, ex0c, w_sol_base
setdimlabel 0, 5, sulfat, w_sol_base
setdimlabel 1, 0, rev, w_sol_base
setdimlabel 1, 1, zero, w_sol_base
duplicate/o w_sol_base, w_sol_20180913
print w_sol_20180913[%acetat][%rev]
NaN
best,
_sk
September 13, 2018 at 01:56 am - Permalink
Dear _sk,
I'm halfway done, and I'm sure it will work this way. :) Just hoping that Pareto will shut up this time. ;)
Yours, Dominik
Edit: Yes, it worked, see screenshot :)
Key parts are:
sol_str = T_sol[dim]
setdimlabel 0, dim, $sol_str, T_sol, fill_rev, fill_zero
endfor
and
sol_str = dum_sol[dim]
// print sol_str
setdimlabel 0, dim, $sol_str, $trans_str
dim_str = getdimlabel($trans_str, 0, dim)
dim_set = finddimlabel(T_sol, 0, dim_str)
dim_find = finddimlabel($trans_str, 0, dim_str)
// print dim_str, dim_set
// print dum_rev
fill_rev[dim_set] = dum_rev[dim_find]
endfor
(Ignore the 0 in the first and the 1 in the second snippet; that's just due to the input wave format.)
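For anyone reading along, the same idea could be written as one self-contained function. This is only a sketch under assumptions, not the code from the snippets: the name AlignToBasis and all its parameters are hypothetical, the waves are assumed to be 1D, and every condition name is assumed to be usable as a dimension label.

```igor
// Hypothetical helper: spreads one day's values over the full list of
// conditions, leaving NaN wherever a condition was not measured.
Function AlignToBasis(basisNames, dayNames, dayValues, outName)
	Wave/T basisNames	// all possible conditions, e.g. W_Sol_basis
	Wave/T dayNames		// conditions actually measured on one day
	Wave dayValues		// values for that day, same length as dayNames
	String outName		// name of the NaN-padded output wave

	Variable nBasis = numpnts(basisNames)
	Make/O/D/N=(nBasis) $outName
	Wave out = $outName
	out = NaN

	// label each output row with its condition name
	Variable i, row
	String name
	for(i = 0; i < nBasis; i += 1)
		name = basisNames[i]
		SetDimLabel 0, i, $name, out
	endfor

	// copy each measured value into the row carrying the same label
	for(i = 0; i < numpnts(dayNames); i += 1)
		row = FindDimLabel(out, 0, dayNames[i])	// -2 if not in the basis
		if(row >= 0)
			out[row] = dayValues[i]
		endif
	endfor
End
```

A call might look like AlignToBasis(W_Sol_basis, W_Sol_07_12_06, W_Vrev_07_12_06, "W_Vrev_07_12_06_full"), where the last two names are made up for illustration. Because every output wave has the same rows in the same order, the results can then be averaged row by row.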
Big thank you! :D
September 13, 2018 at 05:18 am - Permalink
I am glad you managed to make it work.
I must say that I don't understand your code, but as long as _you_ know what it does, it's okay.
best,
_sk
September 13, 2018 at 07:28 am - Permalink