function speed-up: extracting point numbers
ChrLie
Now I want to find the point/index numbers of AA where the values of BB occur. Finding means I duplicate AA and set a point to 1 if the number is within BB or 0 if not.
The function looks like:
function FindIt(aa, bb)
wave aa, bb
variable timer = startMStimer
Make/O/N=(numPnts(aa)) cc=0
variable i
for(i=0; i<numPnts(bb); i+=1)
Multithread cc += aa == bb[i] ? 1 : 0
endfor
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
end
wave aa, bb
variable timer = startMStimer
Make/O/N=(numPnts(aa)) cc=0
variable i
for(i=0; i<numPnts(bb); i+=1)
Multithread cc += aa == bb[i] ? 1 : 0
endfor
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
end
However it is annoyingly slow. For the wave lengths given above it takes on my work computer about 25s.
I'm convinced there is a better way but I just don't see it at the moment!
Any ideas are highly appreciated!
At least your code does not extract all bb indizes of the contents of aa.
So I can only guess what you want to do, so here is my try:
SetRandomSeed 123456
Make/U/I/O/N=60000 aa=NaN
Make/U/I/O/N=8000 bb=NaN
aa = abs(enoise(DimSize(bb,0)-1))
bb = DimSize(bb,0)-p
variable timer
timer = startMStimer
FindIt(aa,bb)
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
timer = startMStimer
FindIt_tb(aa,bb)
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
End
function FindIt(aa, bb)
wave aa, bb
Make/O/N=(numPnts(aa)) cc=0
variable i
for(i=0; i<numPnts(bb); i+=1)
Multithread cc += aa == bb[i] ? 1 : 0
endfor
end
// Assumes -1 is not a valid entry in bb
function FindIt_tb(aa, bb)
wave aa, bb
Duplicate/O aa,cc
variable i
variable size = DimSize(aa,0)
for(i=0; i<size; i+=1)
FindValue/I=(aa[i]) bb
cc[i] = V_value
endfor
end
Numbers from my machine are 8.7s for your solution and 2.4s for my try.
In case this is still too slow you would need to give more info about the situation. Is bb constant? If yes, you could try to sort bb and use a binary search[1].
Do you need the contents of aa afterwards? If no, you can just overwrite the contents of aa and skip creating cc.
[1]: http://www.igorexchange.com/node/2775
November 1, 2013 at 11:20 am - Permalink
thanks for your reply!
I have attached an experiment with real world data to make it more clear.
Wave AA and BB are sorted and are positive integers. Wave AA can contain multiple entries of e.g. BB[3] - I think this is why I can't use FindValue, because it will only find the first entry of BB[3] (?). If you execute FindIt(aa, bb) then e.g. cc[0] and cc[211] will be 1 because bb[0] and bb[1] are present in aa at these point numbers.
This is what I want to get out, or a wave which contains the actual point numbers, e.g. {0, 211, ...}
November 1, 2013 at 01:15 pm - Permalink
And as I see you are not interested in the index but just if zero I tried with
wave aa, bb
Duplicate/O aa,cc_tb
variable timer = startMStimer
variable i
variable size = DimSize(aa,0)
for(i=0; i<size; i+=1)
FindValue/I=(aa[i]) bb
cc_tb[i] = V_Value
endfor
// convert to 1 and 0
cc_tb = (cc_tb[p] == -1 ? 0 : 1)
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
end
November 1, 2013 at 01:53 pm - Permalink
I can also use a wave with the point numbers as output and use something like (I still iterate over bb):
wave aa, bb
variable timer = startMStimer
Make/I/U/O/N=1 cc=0 //cc[0]=0 is always true for my aa and bb
variable size = numpnts(bb)
Variable start = 1
variable i=0, j=1, tmp
for(i=0; i<size; i+=1)
FindValue/S=(start) /I=(bb[i]) aa
if (V_value != -1)
InsertPoints j, 1, cc
cc[j] = V_Value
j+=1
i-=1
Start = V_Value+1
endif
endfor
Printf "Time required: %f seconds\r", (stopMStimer(timer)/1e6)
end
I need to check if this gives me correct data at the boundaries but I get a speedup compared to my initial version by a factor 10. This is very nice! Thanks!
November 1, 2013 at 02:58 pm - Permalink
Suppose you wrote the following expression
cc=0
...
Variable b1,b2,b3
b1=bb[i]
b2=bb[i+1]
b3=bb[i+2]
...
MatrixOp/O cc=cc+equal(aa,b1)+equal(aa,b2)+equal(aa,b3)...
and executed this in a loop.
Note also that your loop should ideally use a fixed variable rather than calling a function. For N equal() calls I'd use something like
for(i=0;i<NP;i+=N)
Hope this helps,
AG
November 1, 2013 at 04:36 pm - Permalink
Perhaps you can create the output wave at full size, initialize it to NaN, and then zap any NaNs at the end using WaveTransform zapNaNs.
November 1, 2013 at 06:39 pm - Permalink
Thomas, I should have written this at the beginning: Note that WaveMin and WaveMax of aa and bb are identical. This is why iterating over bb is sufficient. What I do is finding the values of bb in an image M_aa. Only a small proportion of M_aa falls within WaveMin(bb) and WaveMax(bb). This is why I redimension M_aa into aa, duplicate and create a point number wave pn=p and then Sort aa aa,pn. Then I can clip aa to the values of interest. Using the output from above and the clipped pn, I can reconstruct the pixel coordinates. This should be more efficient than ImageThreshold, since only a small proportion of the image needs attention.
My last version version still had some issues, but now it's running fine. I also got rid of InsertPoints, but it only made it marginally faster.
Thanks for all suggestions and pointing me to FindValue!
November 3, 2013 at 11:35 pm - Permalink
Ah that's the missing information which clarifies the differences between our approaches.
November 4, 2013 at 01:49 am - Permalink