
Binary search on PRE-SORTED text waves

jamie
- I'd appreciate feedback on correctness/efficiency.
- If there is an Igor function that will do this already, I don't know about it. Maybe it could be a wish list item?
- FindLastTextOccurrence and FindFirstTextOccurrence could be combined, with another input parameter to select for first or last occurrence.
- As it stands, the match is case-insensitive. Another input parameter for case sensitivity of the search could be added to the function and passed to cmpStr; the user would have to ensure that the text wave was sorted the same way, case-insensitive or not.
//****************************************************************************************************** // Binary Searches for sorted text waves. Not exactly math utilities, but sorting is mathematical // THESE ASSUME THAT THE TEXT WAVE IS ALREADY ALPHABETICALLY SORTED // A function to calculate base 2 logarithms STATIC CONSTANT log2 = 0.301029995664 Function logBase2 (theValue) variable theValue return log (theValue)/log2 end //****************************************************************************************************** // finds the first occurrence of a text within a range of points in a SORTED text wave // returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found // Last Modified Feb 03 2012 by Jamie Boyd function FindFirstTextOccurrence (theWave, thetext, startPos, endPos) wave/T theWave // alpabetically sorted text wave. string thetext // the text to find in this wave, matching the entire contents of the wave at that point variable startPos, endPos // range over which to search // limit start and end positions to possible values startPos = max (0, startPos) endPos = min (numPnts (theWave)-1, endPos) variable iPos // the point to be compared variable theCmp // the result of the comparison variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g. for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1) iPos = trunc ((firstPt + lastPt)/2) theCmp = cmpStr (thetext, theWave [iPos]) if (theCmp ==0) //thetext is the same as theWave [iPos] if ((iPos ==startPos) || (cmpStr (theText, theWave [iPos -1]) == 1)) // then iPos is the first occurence of thetext in theWave from startPos to endPos return iPos else // there are more copies of theText in theWave before iPos lastPt = iPos-1 endif elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos] firstPt = iPos +1 else // thetext is alphabetically before theWave [iPos] lastPt =iPos -1 endif endfor if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen return NaN else if (cmpStr (thetext, theWave [iPos]) == -1) return -(iPos-1) // After this position is where theText would fit alphabetically else return -iPos endif endif end //****************************************************************************************************** // finds the last occurence of a text within a range of points in a SORTED text wave // returns the point number of the last occurrence, or -1 times the position where thetext should be inserted if the text was not found // Last Modified Feb 03 2012 by Jamie Boyd function FindLastTextOccurrence(theWave, thetext, startPos, endPos) wave/T theWave // alpabetically sorted text wave. string thetext // the text to find in this wave, matching the entire contents of the wave at that point variable startPos, endPos // range over which to search. // limit start and end positions to possible values startPos = max (0, startPos) endPos = min (numPnts (theWave)-1, endPos) variable iPos // the point to be compared variable theCmp // the result of the comparison variable firstPt =startPos , lastPt = endPos // variables the define the range oer which comparisons will be made variable iComp, maxComp = ceil (logBase2 (endPos - StartPos + 1)) // sanity check to limit number of comparisons to theoretical max. In case wave is NOT sorted, e.g. for (iComp =0; (firstPt <= lastPt) && (iComp <= maxComp); iComp +=1) iPos = trunc ((firstPt + lastPt)/2) theCmp = cmpStr (thetext, theWave [iPos]) if (theCmp ==0) //thetext is the same as theWave [iPos] if ((iPos ==endPos) || (cmpStr (theText, theWave [iPos +1]) == -1)) // then iPos is the last occurence of thetext in theWave from startPos to endPos return iPos else // there are more copies of theText in theWave after iPos firstPt = iPos +1 endif elseif (theCmp == 1) //thetext is alphabetically after theWave [iPos] firstPt = iPos +1 else // thetext is alphabetically before theWave [iPos] lastPt =iPos -1 endif endfor if (iComp > maxComp)// max number of comparisons performed. Don't think this will ever happen return NaN else if (cmpStr (thetext, theWave [iPos]) == -1) return -(iPos-1) // After this position is where theText would fit alphabetically else return -iPos endif endif end //****************************************************************************************************** // A test of the FindFirstTextOccurrence and FindLastTextOccurrence functions, using FindValue as a baseline // both accuracy and speed are tested // Last Modified Feb 03 2012 by Jamie Boyd function testFindFirstLastText (maxN) variable maxN variable iPt, nPts = floor (logBase2 (maxN)) // output waves make/o/n=(nPts) FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out setScale d 0,0 ,"s" FindValueFirstTime_out, FindValueLastTime_out, FFTOtime_out, FLTOtime_out make/o/n=(nPts) FindValueFirstPos_out, FindValueLastPos_out, FFTOpos_out, FLTOpos_out // make data wave variable iN, iVal, valToN make/o/T/n =(maxN) testTextWave WAVE/T testTextWave for (iN =0, iVal =0;iN < maxN; iVal +=1) for (valToN = iN + ceil (logBase2 (iVal +2)); iN < valToN; iN +=1) testTextWave [iN] = num2str (iVal) endfor endfor // sort it, because CmpStr does not do alphanumeric comparison Sort testTextWave, testTextWave // varables for finding and timing string toFind variable fPos, myTimer, elapsedTime for (iPt =0; iPt < nPts; ipt += 1) // find increasingly further values in the wave toFind = testTextWave [2^ (iPt + 2)] // find first occurrence // use find value myTimer = startmstimer FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave elapsedTime = stopMSTimer(myTimer )/1e06 FindValueFirstTime_out [iPt] = elapsedTime FindValueFirstPos_out [iPt] = V_Value // use FindFirstTextOccurrence myTimer = startmstimer fPos = FindFirstTextOccurrence (testTextWave, toFind, 0, maxN) elapsedTime = stopMSTimer(myTimer ) FFTOtime_out [iPt] = elapsedTime/1e06 FFTOpos_out [iPt] = fPos // find last occurrence // with findValue followed by a loop myTimer = startmstimer FindValue/S=0 /TEXT=toFind/TXOP=4 testTextWave for (iN= V_Value +1;cmpStr (testTextWave [iN], toFind) ==0 && iN < maxN; iN +=1) endfor elapsedTime = stopMSTimer(myTimer )/1e06 FindValueLastTime_out [iPt] = elapsedTime FindValueLastPos_out [iPt] = iN-1 // with FIndLastTextOccurrence myTimer = startmstimer fPos = FindLastTextOccurrence (testTextWave, toFind, 0, maxN) elapsedTime = stopMSTimer(myTimer ) FLTOtime_out [iPt] = elapsedTime/1e06 FLTOpos_out [iPt] = fPos endfor display FFTOtime_out, FLTOtime_out vs FindValueFirstPos_out modifygraph rgb = (0,0,0) appendtoGraph FindValueFirstTime_out, FindValueLastTime_out vs FindValueLastPos_out modifygraph mode =4 label bottom "Points in Text Wave" label left "Time to Find Element (\\U)" Legend/C/N=text0/F=0/B=1/A=MT ModifyGraph log(bottom)=1 edit FindValueFirstPos_out, FFTOpos_out, FindValueLastPos_out, FLTOpos_out end

Forum

Support

Gallery
Igor Pro 9
Learn More
Igor XOP Toolkit
Learn More
Igor NIDAQ Tools MX
Learn More
FindValue
operationFebruary 2, 2012 at 06:14 pm - Permalink
The FindValue operation does not assume that the wave is sorted. As such, it must iterate through each point in the wave, i.e., no binary search goodness. With large enough N (probably far larger than I will ever use, but for the sake of argument, remember that Igor can make ginormous waves), even an operation built in compiled code (like FindValue) will take longer than a more efficient algorithm in slower-running Igor script.
Also, the FindValue operation does not give an easy way to find the last occurrence of a value in a wave. One could find the first matching value with FindValue and then iterate from there point by point with cmpStr, but that will get slower than a binary search pretty quickly.
Try this with maxN = 65536
On my computer (old and busted G4 powerbook) the 2 methods approach equality at N = 4096. At bigger N, FindFirstTextOccurrence is faster.
Dr. Jamie Boyd, Ph.D.
UBC In Vivo Imaging Core
February 2, 2012 at 10:35 pm - Permalink
This is a visualization of testfval with 2^18. This function tests best case performance for FindFirstTextOccurence.
The line
needs to be changed so that it compiles *and* it requires rtGlobals=1.
September 21, 2019 at 01:12 pm - Permalink