Binary search on PRE-SORTED text waves
jamie
- I'd appreciate feedback on correctness/efficiency.
- If there is an Igor function that will do this already, I don't know about it. Maybe it could be a wish list item?
- FindLastTextOccurrence and FindFirstTextOccurrence could be combined, with another input parameter to select for first or last occurrence.
- As it stands, the match is case-insensitive. Another input parameter for case sensitivity of the search could be added to the function and passed to cmpStr; the user would have to ensure that the text wave was sorted the same way, case-insensitive or not.
//******************************************************************************************************
// Binary Searches for sorted text waves. Not exactly math utilities, but sorting is mathematical
// THESE ASSUME THAT THE TEXT WAVE IS ALREADY ALPHABETICALLY SORTED
// A function to calculate base 2 logarithms
STATIC CONSTANT log2 = 0.301029995664
Function logBase2 (theValue)
variable theValue
return log (theValue)/log2
end
//******************************************************************************************************
// finds the first occurrence of a text within a range of points in a SORTED text wave
// returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found
// Last Modified Feb 03 2012 by Jamie Boyd
function FindFirstTextOccurrence (theWave, thetext, startPos, endPos)
wave/T theWave // alpabetically sorted text wave.
string thetext // the text to find in this wave, matching the entire contents of the wave at that point
variable startPos, endPos // range over which to search
// limit start and end positions to possible values
startPos = max (0, startPos)
endPos = min (numPnts (theWave)-1, endPos)
variable iPos // the point to be compared
variable theCmp // the result of the comparison
variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made
variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g.
for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1)
iPos = trunc ((firstPt + lastPt)/2)
theCmp = cmpStr (thetext, theWave [iPos])
if (theCmp ==0) //thetext is the same as theWave [iPos]
if ((iPos ==startPos) || (cmpStr (theText, theWave [iPos -1]) == 1)) // then iPos is the first occurence of thetext in theWave from startPos to endPos
return iPos
else // there are more copies of theText in theWave before iPos
lastPt = iPos-1
endif
elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos]
firstPt = iPos +1
else // thetext is alphabetically before theWave [iPos]
lastPt =iPos -1
endif
endfor
if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen
return NaN
else
if (cmpStr (thetext, theWave [iPos]) == -1)
return -(iPos-1) // After this position is where theText would fit alphabetically
else
return -iPos
endif
endif
end
//******************************************************************************************************
// finds the last occurence of a text within a range of points in a SORTED text wave
// returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found
// Last Modified Feb 03 2012 by Jamie Boyd
function FindLastTextOccurrence(theWave, thetext, startPos, endPos)
wave/T theWave // alpabetically sorted text wave.
string thetext // the text to find in this wave, matching the entire contents of the wave at that point
variable startPos, endPos // range over which to search.
// limit start and end positions to possible values
startPos = max (0, startPos)
endPos = min (numPnts (theWave)-1, endPos)
variable iPos // the point to be compared
variable theCmp // the result of the comparison
variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made
variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g.
for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1)
iPos = trunc ((firstPt + lastPt)/2)
theCmp = cmpStr (thetext, theWave [iPos])
if (theCmp ==0) //thetext is the same as theWave [iPos]
if ((iPos ==endPos) || (cmpStr (theText, theWave [iPos +1]) == -1)) // then iPos is the last occurence of thetext in theWave from startPos to endPos
return iPos
else // there are more copies of theText in theWave after iPos
firstPt = iPos +1
endif
elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos]
firstPt = iPos +1
else // thetext is alphabetically before theWave [iPos]
lastPt =iPos -1
endif
endfor
if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen
return NaN
else
if (cmpStr (thetext, theWave [iPos]) == -1)
return -(iPos-1) // After this position is where theText would fit alphabetically
else
return -iPos
endif
endif
end
//******************************************************************************************************
// A test of the FindFirstTextOccurrence and FindLastTextOccurrence functions, using FindValue as a baseline
// both accuracy and speed are tested
// Last Modified Feb 03 2012 by Jamie Boyd
function testFindFirstLastText (maxN)
variable maxN
variable iPt, nPts = floor (logBase2 (maxN))
// output waves
make/o/n=(nPts) FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out
setScale d 0,0 ,"s" FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out
make/o/n=(nPts) FindValueFirstPos_out, FindValueLastPos_out, FFTOpos_out, FLTOpos_out
// make data wave
variable iN, iVal, valToN
make/o/T/n =(maxN) testTextWave
WAVE/T testTextWave
for (iN =0, iVal =0;iN < maxN; iVal +=1)
for (valToN = iN + ceil (logBase2 (iVal +2)); iN < valToN; iN +=1)
testTextWave [iN] = num2str (iVal)
endfor
endfor
// sort it, because CmpStr does not do alphanumeric comparison
Sort testTextWave, testTextWave
// varables for finding and timing
string toFind
variable fPos, myTimer, elapsedTime
for (iPt =0; iPt < nPts; ipt += 1)
// find increasingly further values in the wave
toFind = testTextWave [2^ (iPt + 2)]
// find first occurrence
// use find value
myTimer = startmstimer
FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave
elapsedTime = stopMSTimer(myTimer )/1e06
FindValueFirstTime_out [iPt] = elapsedTime
FindValueFirstPos_out [iPt] = V_Value
// use FindFirstTextOccurrence
myTimer = startmstimer
fPos = FindFirstTextOccurrence (testTextWave, toFind, 0, maxN)
elapsedTime = stopMSTimer(myTimer )
FFTOtime_out [iPt] = elapsedTime/1e06
FFTOpos_out [iPt] = fPos
// find last occurrence
// with findValue followed by a loop
myTimer = startmstimer
FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave
for (iN= V_Value +1;cmpStr (testTextWave [iN], toFind) ==0 && iN < maxN; iN +=1)
endfor
elapsedTime = stopMSTimer(myTimer )/1e06
FindValueLastTime_out [iPt] = elapsedTime
FindValueLastPos_out [iPt] = iN-1
// with FIndLastTextOccurrence
myTimer = startmstimer
fPos = FindLastTextOccurrence (testTextWave, toFind, 0, maxN)
elapsedTime = stopMSTimer(myTimer )
FLTOtime_out [iPt] = elapsedTime/1e06
FLTOpos_out [iPt] = fPos
endfor
display FFTOtime_out, FLTOtime_out vs FindValueFirstPos_out
modifygraph rgb = (0,0,0)
appendtoGraph FindValueFirstTime_out, FindValueLastTime_out vs FindValueLastPos_out
modifygraph mode =4
label bottom "Points in Text Wave"
label left "Time to Find Element (\\U)"
Legend/C/N=text0/F=0/B=1/A=MT
ModifyGraph log(bottom)=1
edit FindValueFirstPos_out, FFTOpos_out, FindValueLastPos_out, FLTOpos_out
end
// Binary Searches for sorted text waves. Not exactly math utilities, but sorting is mathematical
// THESE ASSUME THAT THE TEXT WAVE IS ALREADY ALPHABETICALLY SORTED
// A function to calculate base 2 logarithms
STATIC CONSTANT log2 = 0.301029995664
Function logBase2 (theValue)
variable theValue
return log (theValue)/log2
end
//******************************************************************************************************
// finds the first occurrence of a text within a range of points in a SORTED text wave
// returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found
// Last Modified Feb 03 2012 by Jamie Boyd
function FindFirstTextOccurrence (theWave, thetext, startPos, endPos)
wave/T theWave // alpabetically sorted text wave.
string thetext // the text to find in this wave, matching the entire contents of the wave at that point
variable startPos, endPos // range over which to search
// limit start and end positions to possible values
startPos = max (0, startPos)
endPos = min (numPnts (theWave)-1, endPos)
variable iPos // the point to be compared
variable theCmp // the result of the comparison
variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made
variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g.
for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1)
iPos = trunc ((firstPt + lastPt)/2)
theCmp = cmpStr (thetext, theWave [iPos])
if (theCmp ==0) //thetext is the same as theWave [iPos]
if ((iPos ==startPos) || (cmpStr (theText, theWave [iPos -1]) == 1)) // then iPos is the first occurence of thetext in theWave from startPos to endPos
return iPos
else // there are more copies of theText in theWave before iPos
lastPt = iPos-1
endif
elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos]
firstPt = iPos +1
else // thetext is alphabetically before theWave [iPos]
lastPt =iPos -1
endif
endfor
if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen
return NaN
else
if (cmpStr (thetext, theWave [iPos]) == -1)
return -(iPos-1) // After this position is where theText would fit alphabetically
else
return -iPos
endif
endif
end
//******************************************************************************************************
// finds the last occurence of a text within a range of points in a SORTED text wave
// returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found
// Last Modified Feb 03 2012 by Jamie Boyd
function FindLastTextOccurrence(theWave, thetext, startPos, endPos)
wave/T theWave // alpabetically sorted text wave.
string thetext // the text to find in this wave, matching the entire contents of the wave at that point
variable startPos, endPos // range over which to search.
// limit start and end positions to possible values
startPos = max (0, startPos)
endPos = min (numPnts (theWave)-1, endPos)
variable iPos // the point to be compared
variable theCmp // the result of the comparison
variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made
variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g.
for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1)
iPos = trunc ((firstPt + lastPt)/2)
theCmp = cmpStr (thetext, theWave [iPos])
if (theCmp ==0) //thetext is the same as theWave [iPos]
if ((iPos ==endPos) || (cmpStr (theText, theWave [iPos +1]) == -1)) // then iPos is the last occurence of thetext in theWave from startPos to endPos
return iPos
else // there are more copies of theText in theWave after iPos
firstPt = iPos +1
endif
elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos]
firstPt = iPos +1
else // thetext is alphabetically before theWave [iPos]
lastPt =iPos -1
endif
endfor
if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen
return NaN
else
if (cmpStr (thetext, theWave [iPos]) == -1)
return -(iPos-1) // After this position is where theText would fit alphabetically
else
return -iPos
endif
endif
end
//******************************************************************************************************
// A test of the FindFirstTextOccurrence and FindLastTextOccurrence functions, using FindValue as a baseline
// both accuracy and speed are tested
// Last Modified Feb 03 2012 by Jamie Boyd
function testFindFirstLastText (maxN)
variable maxN
variable iPt, nPts = floor (logBase2 (maxN))
// output waves
make/o/n=(nPts) FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out
setScale d 0,0 ,"s" FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out
make/o/n=(nPts) FindValueFirstPos_out, FindValueLastPos_out, FFTOpos_out, FLTOpos_out
// make data wave
variable iN, iVal, valToN
make/o/T/n =(maxN) testTextWave
WAVE/T testTextWave
for (iN =0, iVal =0;iN < maxN; iVal +=1)
for (valToN = iN + ceil (logBase2 (iVal +2)); iN < valToN; iN +=1)
testTextWave [iN] = num2str (iVal)
endfor
endfor
// sort it, because CmpStr does not do alphanumeric comparison
Sort testTextWave, testTextWave
// varables for finding and timing
string toFind
variable fPos, myTimer, elapsedTime
for (iPt =0; iPt < nPts; ipt += 1)
// find increasingly further values in the wave
toFind = testTextWave [2^ (iPt + 2)]
// find first occurrence
// use find value
myTimer = startmstimer
FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave
elapsedTime = stopMSTimer(myTimer )/1e06
FindValueFirstTime_out [iPt] = elapsedTime
FindValueFirstPos_out [iPt] = V_Value
// use FindFirstTextOccurrence
myTimer = startmstimer
fPos = FindFirstTextOccurrence (testTextWave, toFind, 0, maxN)
elapsedTime = stopMSTimer(myTimer )
FFTOtime_out [iPt] = elapsedTime/1e06
FFTOpos_out [iPt] = fPos
// find last occurrence
// with findValue followed by a loop
myTimer = startmstimer
FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave
for (iN= V_Value +1;cmpStr (testTextWave [iN], toFind) ==0 && iN < maxN; iN +=1)
endfor
elapsedTime = stopMSTimer(myTimer )/1e06
FindValueLastTime_out [iPt] = elapsedTime
FindValueLastPos_out [iPt] = iN-1
// with FIndLastTextOccurrence
myTimer = startmstimer
fPos = FindLastTextOccurrence (testTextWave, toFind, 0, maxN)
elapsedTime = stopMSTimer(myTimer )
FLTOtime_out [iPt] = elapsedTime/1e06
FLTOpos_out [iPt] = fPos
endfor
display FFTOtime_out, FLTOtime_out vs FindValueFirstPos_out
modifygraph rgb = (0,0,0)
appendtoGraph FindValueFirstTime_out, FindValueLastTime_out vs FindValueLastPos_out
modifygraph mode =4
label bottom "Points in Text Wave"
label left "Time to Find Element (\\U)"
Legend/C/N=text0/F=0/B=1/A=MT
ModifyGraph log(bottom)=1
edit FindValueFirstPos_out, FFTOpos_out, FindValueLastPos_out, FLTOpos_out
end
Forum
Support
Gallery
Igor Pro 9
Learn More
Igor XOP Toolkit
Learn More
Igor NIDAQ Tools MX
Learn More
FindValue
operationFebruary 2, 2012 at 06:14 pm - Permalink
The FindValue operation does not assume that the wave is sorted. As such, it must iterate through each point in the wave, i.e., no binary search goodness. With large enough N (probably far larger than I will ever use, but for the sake of argument, remember that Igor can make ginormous waves), even an operation built in compiled code (like FindValue) will take longer than a more efficient algorithm in slower-running Igor script.
Also, the FindValue operation does not give an easy way to find the last occurrence of a value in a wave. One could find the first matching value with FindValue and then iterate from there point by point with cmpStr, but that will get slower than a binary search pretty quickly.
Try this with maxN = 65536
variable maxN
make/o/n=(ln (maxN)) FindValueTime_out, FFTOtime_out, FindValuePos_out, FFTOpos_out, N_out
setScale d 0,0 ,"s", FFTOtime_out, FindValueTime_out
variable iN, N, iVal, nVal, stopVal, fPos
string toFind
variable myTimer, elapsedTime
// examine a geometric progression of N
for (N=2; N < maxN +1; N *= 2)
// make a text wave with increasingly longer series of the same number string
make/o/T/n =(N) testTextWave
for (iN =0, iVal =1;iN < N; iVal +=1)
for (stopVal = round (iN + 1 + ln (iVal)); iN < stopVal; iN +=1)
testTextWave [iN] = num2str (iVal)
endfor
endfor
// sort it, because CmpStr does not do alphanumeric comparison
Sort testTextWave, testTextWave
// find last value in wae
toFind = testTextWave [numPnts (testTextWave) -1]
// use find value
myTimer = startmstimer
FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave
elapsedTime = stopMSTimer(myTimer )/1e06
fPos = V_Value
FindValueTime_out [ln (N)] = elapsedTime
FindValuePos_out [ln (N)] = fPos
// use FindFirstTextOccurrence
myTimer = startmstimer
fPos = FindFirstTextOccurrence (testTextWave, toFind, 0)
elapsedTime = stopMSTimer(myTimer )
FFTOtime_out [ln (N)] = elapsedTime/1e06
FFTOpos_out [ln (N)] = fPos
N_out [ln (N)] = N
endfor
display FFTOtime_out vs N_out
modifygraph rgb = (0,0,0)
appendtoGraph FindValueTime_out vs N_out
modifygraph mode =4
label bottom "Points in Text Wave"
label left "Time to Find Element (\\U)"
ModifyGraph log(bottom)=1
Legend/C/N=text0/A=MT/J "\\s(FFTOtime_out) Find First Text Occurrence\r\\s(FindValueTime_out) Find Value"
end
On my computer (old and busted G4 powerbook) the 2 methods approach equality at N = 4096. At bigger N, FindFirstTextOccurrence is faster.
Dr. Jamie Boyd, Ph.D.
UBC In Vivo Imaging Core
February 2, 2012 at 10:35 pm - Permalink
This is a visualization of testfval with 2^18. This function tests best case performance for FindFirstTextOccurence.
The line
needs to be changed so that it compiles *and* it requires rtGlobals=1.
September 21, 2019 at 01:12 pm - Permalink