Sort with incomplete lists

Hello!

I have datasets which I would like to average. Unfortunately, they are not all the same length; it always depends on the measurement quality (electrophysiology).

So I may have V_rev_day1, V_rev_day2 and so on, and I have text waves W_cond_day1, W_cond_day2, ..., containing text about the solutions applied. These lists are sorted alphabetically, but it may be that on day1 I could measure cond1, cond2 and cond4, and on day2 only cond2 and cond4.

What is the nicest way to sort the waves so that, on the one hand (for displaying), conditions that were not measured give a NaN in V_rev, and on the other hand I can average them properly?

Yours, Dominik

I believe the Average Waves package (see Analysis->Packages->Average Waves) will handle waves that have NaNs in them. By "handle" I mean that if you have 5 waves and one has NaN in a row, the average for that row will ignore the NaN and give you the average of the remaining 4 values.
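
In case it helps to see the idea spelled out, here is a minimal hand-written sketch of a NaN-skipping row average. It is not the package's own code. It assumes the days have already been collected into one 2D wave (rows = conditions, columns = days); the function name RowMeanIgnoringNaN and the outName parameter are made up for illustration.

```igor
// Hypothetical helper: row-wise mean of a 2D wave, skipping NaN entries.
Function RowMeanIgnoringNaN(m, outName)
	Wave m			// rows = conditions, columns = days (assumed layout)
	String outName		// name of the 1D output wave to create

	Variable nRows = DimSize(m, 0)
	Variable nCols = DimSize(m, 1)
	Make/O/D/N=(nRows) $outName = NaN
	Wave wOut = $outName

	Variable i, j, total, count
	for(i = 0; i < nRows; i += 1)
		total = 0
		count = 0
		for(j = 0; j < nCols; j += 1)
			if(numtype(m[i][j]) == 0)	// 0 means a normal number (not NaN or Inf)
				total += m[i][j]
				count += 1
			endif
		endfor
		if(count > 0)
			wOut[i] = total / count	// rows without any valid value stay NaN
		endif
	endfor
End
```

For a wave with the rows (1, 3, NaN) and (NaN, NaN, NaN), the first row should average to 2 and the second should stay NaN.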

Is v_rev_day1 a wave that stores cond1, cond2, etc. along its rows or columns?

If so, the best thing would be to set dimension labels with the respective conditions and then use FindDimLabel or GetDimLabel to check whether a particular label exists as you average iteratively over each condition.

make/o/n=(3,2) w_test = p+q
setdimlabel 1, 0, cond1, w_test
setdimlabel 1, 1, cond2, w_test
print finddimlabel(w_test, 1, "cond1")
  0
print finddimlabel(w_test, 1, "cond2")
  1
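
A label that does not exist gives -2 from FindDimLabel, so a small existence check could look like the sketch below. The name LabelExists is just for illustration.

```igor
// Hypothetical helper: returns 1 if the label exists in the given dimension, else 0.
Function LabelExists(w, dim, lbl)
	Wave w
	Variable dim		// 0 = rows, 1 = columns
	String lbl

	// FindDimLabel returns -2 when no element in that dimension carries the label
	return FindDimLabel(w, dim, lbl) >= 0
End
```

With w_test from above, LabelExists(w_test, 1, "cond1") gives 1 and LabelExists(w_test, 1, "cond3") gives 0.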

best,

_sk

I attach an example file in two versions, with two datasets where one dataset is more complete than the other; in the second version I manually inserted points where they were missing. Maybe the problem becomes more obvious then. I would have a W_Sol_[date] wave which contains all possible conditions, so I could sort along this wave; let's call it W_Sol_basis. But I want to have NaN in the less complete W_Sol_07_12_06 and in the respective W_Vrev_[date] and W_Vzero_[date] rows if an entry of the W_Sol_basis wave does not exist in the W_Sol_[date] wave.

So, I want to come from "Ex_data_before.pxp" to "Ex_data_after.pxp" automatically.

(I don't really understand what the dimlabel actually does.)

Yours, Dominik

The setdimlabel was for data regularization. That being said, I would also use a standardized format for the date, e.g. the ISO 8601 basic format: 20180913 (YYYYMMDD), where single-digit months and days are always preceded by a zero.

If the wave has no row or column for a condition (ex_0c in your example), then there cannot be a NaN or any other value there, because no storage for that point exists in the memory holding your wave. This again points to data regularization.

So if I understand you correctly, I recommend you create a solution wave, w_sol_YYYYMMDD, which has all fields already accounted for, regardless of whether a particular experiment was carried out on that day, and initialize the wave to NaN upon creation, something like this:

make/o/n=(6,2) w_sol_20071206 = nan
make/o/n=(6,2) w_sol_20080703 = nan

You can choose to label the rows and columns for easy addressing, or remember which index corresponds to which experiment/condition:

setdimlabel 0, 0, acetat, w_sol_20071206
setdimlabel 0, 1, cid100, w_sol_20071206
setdimlabel 0, 2, ex0a, w_sol_20071206
setdimlabel 0, 3, ex0b, w_sol_20071206
setdimlabel 0, 4, ex0c, w_sol_20071206
setdimlabel 0, 5, sulfat, w_sol_20071206
setdimlabel 1, 0, rev, w_sol_20071206
setdimlabel 1, 1, zero, w_sol_20071206

Then address fields like so:

print w_sol_20071206[%acetat][%rev]
  -10.0742
// equivalent to
print w_sol_20071206[0][0]
  -10.0742
// trying an empty field
print w_sol_20071206[%cid100][%zero]
  NaN

Once your data is in shape, it is all about what you want to do: conditional statements, sorting, summation, etc.

edit:

btw, if you create a base wave, like you suggested, and set the dimension labels to the corresponding conditions, any duplicate of the base wave will carry over the labels as well, in other words:

make/o/n=(6,2) w_sol_base = nan

setdimlabel 0, 0, acetat, w_sol_base
setdimlabel 0, 1, cid100, w_sol_base
setdimlabel 0, 2, ex0a, w_sol_base
setdimlabel 0, 3, ex0b, w_sol_base
setdimlabel 0, 4, ex0c, w_sol_base
setdimlabel 0, 5, sulfat, w_sol_base
setdimlabel 1, 0, rev, w_sol_base
setdimlabel 1, 1, zero, w_sol_base

duplicate/o w_sol_base, w_sol_20180913

print w_sol_20180913[%acetat][%rev]
  NaN
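
To get from a day's text wave plus value wave to such a NaN-padded wave automatically, something along these lines might work. This is only a sketch under assumptions: the basis is a 1D text wave listing every possible condition, the day's values are a 1D wave in the same order as the day's text wave, and the names AlignToBasis and outName are hypothetical.

```igor
// Hypothetical helper: copy one day's values into a NaN-filled wave laid out like the basis.
Function AlignToBasis(wBasis, wDaySol, wDayVal, outName)
	Wave/T wBasis		// text wave listing every possible condition
	Wave/T wDaySol		// text wave with the conditions measured that day
	Wave wDayVal		// values measured that day, same order as wDaySol
	String outName		// name for the aligned output wave

	Variable nBasis = numpnts(wBasis)
	Variable nDay = numpnts(wDaySol)
	Make/O/D/N=(nBasis) $outName = NaN
	Wave wOut = $outName

	Variable i, row
	String lbl
	// Label each output row with its condition name
	for(i = 0; i < nBasis; i += 1)
		lbl = wBasis[i]
		SetDimLabel 0, i, $lbl, wOut
	endfor
	// Look up each measured condition; FindDimLabel returns -2 if it is absent
	for(i = 0; i < nDay; i += 1)
		lbl = wDaySol[i]
		row = FindDimLabel(wOut, 0, lbl)
		if(row >= 0)
			wOut[row] = wDayVal[i]
		endif
	endfor
End
```

A call could then look like AlignToBasis(W_Sol_basis, W_Sol_07_12_06, W_Vrev_07_12_06, "W_Vrev_07_12_06_aligned"), where the output name is made up. Conditions missing on that day simply keep their initial NaN.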

best,

_sk

Dear _sk,

I'm halfway done, and I'm sure it will work this way. :) Just hoping that Pareto will shut up this time. ;)

Yours, Dominik

Edit: Yes, it worked, see screenshot :)

Key parts are:

    // give row dim of T_sol, fill_rev and fill_zero the condition name as its label
    for(dim=0;dim<=(exp_max-1);dim+=1)
        sol_str=T_sol[dim]
        setdimlabel 0,dim, $sol_str, T_sol,fill_rev, fill_zero
    endfor

and

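        for(dim=1;dim<=(exp_max-1);dim+=1)
            sol_str=dum_sol[dim]
//          print sol_str
            // label row dim of the wave named by trans_str with its condition name
            setdimlabel 0,dim, $sol_str, $trans_str
            dim_str= getdimlabel ($trans_str, 0, dim)
            // row carrying this label in T_sol and in the wave named by trans_str
            dim_set=finddimlabel(T_sol, 0,dim_str)
            dim_find=finddimlabel($trans_str, 0,dim_str)
//          print dim_str, dim_set
//          print dum_rev
            // copy the value into the row that carries the same label in T_sol
            fill_rev[dim_set]=dum_rev[dim_find]
        endfor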

(Ignore that the loop starts at 0 in the first snippet and at 1 in the second; that's just due to the input wave format.)

Big thank you! :D

screen_1.PNG (19.13 KB)