working on subsets of repeating data
epiphenom
Hi all, I would welcome some help with the following. I have one wave containing subjects and another containing measures. Each subject has one or more corresponding measures. On a per-subject basis, I would like to calculate the max and min measures and highlight which subjects have a min-max range greater than 400%.
This is what I have so far:
function screen()
	variable i = 0 /// traverse main subjects wave
	variable z = 0 ///traverse within each subject
	wave measures
	wave subjects
	do
		if (subjects[i] == subjects [i+1])
			Make/O tempwave
			tempwave[z] = measures[i]
			z+=1
			i+=1
		else
			wavestats/Q/R= [0,z] tempwave
			if (((V_max - V_min)/V_min) > 4)
				print subjects[i]
				z+=0
				i+=1
			else
				z = 0
				i+=1
			endif
		endif
	while (i < numpnts(subjects))
end
You could use Sort for this as well. Your subjects seem to be already sorted in ascending order, but in any case, you can sort for both subject and measures, then easily grab the min and max values at the beginning and end of each sequence. First, do something like this:
Sort {subjects,measures},subjects,measures // untested
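To find the 'seams' where the subject number changes (the positions where subjects_dif != 0, with subjects_dif presumably holding the point-to-point difference of the sorted subjects wave, e.g. from Differentiate), do: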
Extract/INDX subjects_dif, subjects_position, subjects_dif>0
// insert points for the first min and last max:
InsertPoints 0,1, subjects_position
subjects_position[numpnts(subjects_position)] = {numpnts(subjects)-1}
subjects_position should then give you all the wave indices where the min/max values are stored in the order min_subject1, max_subject1, min_subject2, max_subject2 ...
This might not work so well if a subject only has one value, i.e., where the min and max are at the same position, I don't know. If you provide some test data we could play around more. You did not tell us what you would like to know, though. Is there a problem with your approach?
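For what it's worth, the sort idea can also be written as a plain loop over the sorted data, which sidesteps the single-value question. The following is only an untested sketch: the function name ScreenSorted and the ratio parameter are made up for illustration, and it assumes numeric waves of equal length with positive measures.

```igor
// Hypothetical helper, not from the original post: flags subjects whose
// max/min ratio exceeds 'ratio'. Assumes positive measures.
Function ScreenSorted(WAVE measures, WAVE subjects, Variable ratio)
	// Work on free copies so the caller's waves keep their order
	Duplicate/FREE subjects, subj
	Duplicate/FREE measures, meas
	// Sort by subject first, then by measure within each subject
	Sort {subj, meas}, subj, meas

	Variable n = numpnts(subj)
	Variable first = 0	// index of the smallest measure of the current subject
	Variable i, endOfRun
	for (i = 0; i < n; i += 1)
		// A run ends at the last point or where the next subject differs
		endOfRun = 1
		if (i < n - 1)
			endOfRun = (subj[i+1] != subj[i])
		endif
		if (endOfRun)
			// After sorting, meas[first] is the min and meas[i] is the max
			if (meas[i] / meas[first] > ratio)
				Print subj[i], meas[first], meas[i]
			endif
			first = i + 1
		endif
	endfor
End
```

Calling ScreenSorted(measures, subjects, 4) would then print the subject, min, and max for each flagged subject. A subject with a single measure has first == i, so its ratio is 1 and it is never flagged.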
July 31, 2023 at 10:48 pm - Permalink
Consider using FindDuplicates to create a wave that contains one instance of each subject.
Use Extract to create a free wave of measures that correspond to a given subject.
Use WaveStats/M=1 to find maximum and minimum values from the free measures wave.
August 1, 2023 at 01:28 am - Permalink
Thank you for the responses. I'm having trouble getting the code to work for all instances (some subjects contain one measure while most contain more than one). I have shared the waves with data.
August 1, 2023 at 06:44 am - Permalink
Here is the approach mentioned by Tony:
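function screen(wave measures, wave subjects)
	// one instance of each subject
	FindDuplicates/Free/RN=subj_clean subjects
	int i
	for (i=0; i<numpnts(subj_clean); i++)
		// measures belonging to the current subject
		Extract/FREE measures, meas_temp, subjects == subj_clean[i]
		WaveStats/M=1/Q meas_temp
		if (((V_max - V_min)/V_min) > 4)
			print subj_clean[i] // index the de-duplicated wave, not subjects
		endif
	endfor
end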
Run via:
screen(measures, subjects)
August 1, 2023 at 09:08 am - Permalink
Thank you for the response, I am still working on it but it appears not to work. It's picking up subjects that don't have the 4X difference between min and max values.
August 1, 2023 at 10:28 am - Permalink
For a 4x difference, the condition should likely be (V_max/V_min) > 4, not ((V_max - V_min)/V_min) > 4.
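To see how far apart the two conditions are: for a positive minimum, (V_max - V_min)/V_min > 4 is the same as V_max/V_min > 5, so the relative-range test only flags subjects whose max is more than five times the min. A small sketch to compare both numbers for a given pair (the function name CompareCriteria and the example values are made up for illustration):

```igor
// Hypothetical helper: prints both criteria for one min/max pair
Function CompareCriteria(Variable vMin, Variable vMax)
	Variable relRange = (vMax - vMin) / vMin	// range relative to the minimum
	Variable ratio = vMax / vMin	// max as a multiple of the minimum
	Printf "min=%g max=%g: (max-min)/min=%g, max/min=%g\r", vMin, vMax, relRange, ratio
End
```

For example, CompareCriteria(10, 45) gives a relative range of 3.5 and a ratio of 4.5, so that subject passes the max/min > 4 test but not the relative-range test.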
August 1, 2023 at 11:20 am - Permalink