Identifying first/last occurrence and making flexible dimension waves

Hi, 

These probably have trivial solutions, but I'd be thankful if you could help me get to them.

1. I have a 1D text wave that has let's say 300 rows. Now strings in some cells are repeated and these might not be continuous, i.e. any random indices of rows may contain the same string value. What I want to do is to identify the index of the first occurrence and the last occurrence of any particular string. I was going the messy route of running a for loop, identifying the indices where a string value occurs, storing them in a temp 1D wave, and then picking the maximum and minimum from the temp wave. However, I'm sure there must be a much more elegant way to do this. Kindly advise. 

2. How can I make a 1D or 2D wave that has flexible dimensions? For example, I am running some calculations and the results have to load into a new 1D or 2D results wave. However, I don't know yet what would be the size of the results wave and I can't anticipate it either. So instead of predefining the dimensions, is there a way to use the make syntax that allows for producing waves that grow in size as values get loaded into them until the final value is loaded? 

Thanks a ton in advance for all your help! 

Sincerely, 

Peeyush

1. This depends a bit on whether you already know the repeated string or not (i.e., whether the repeated string only reveals itself by being repeated in the wave). If you know the string, you can just run FindValue/TEXT=theString twice, once without and once with the /R (reverse) flag, to find the first and the last occurrence.
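A minimal sketch of the two FindValue calls, assuming a 1D text wave. The function name FirstLastIndex and its parameters are made up for illustration. The /TXOP=4 flag is, as far as I understand it, what restricts the search to whole-element matches (without it, a substring match also counts), so check it against the FindValue help for your Igor version.

```igor
Function FirstLastIndex(Wave/T tw, String str)
    // Forward search: V_value is the row of the first match, or -1 if none.
    FindValue/TEXT=str/TXOP=4 tw
    Variable first = V_value

    // Reverse search (/R): V_value is the row of the last match.
    FindValue/R/TEXT=str/TXOP=4 tw
    Variable last = V_value

    Print first, last
End
```

If the string is not in the wave at all, both values come back as -1.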

2. Yes, that is actually very easy (but not many users seem to know). The following syntax extends the wave by one and puts in the new value (10 in this example):

myWave[DimSize(myWave,0)] = {10}

This way the wave grows naturally by appending new values. You can use the same approach for any dimension. If you want to build the wave from scratch, just start with zero points (i.e., Make/N=0). I don't know if this method is particularly memory- or CPU-friendly, since the memory allocation for the wave has to be adjusted constantly. Another approach would be to build a bigger wave first and then trim all unused cells once at the end.

I think problem 1 can be solved by FindDuplicates with some smart flags, like /INDX. If not, FindDuplicates will at least return a list of duplicates, and Grep will then return, for every duplicate, a list of indexes where that string occurs. Take the lowest and largest index for that string and you have what you need.

For 2, if you only need to extend the wave a few times, this is fine. If you need to do it many times, it may be better to extend it by some large number (100, 1000?) and use indexes until you run out. Then extend again. And at the end, trim off what was not used.
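The extend-in-chunks idea could look roughly like this. It is only a sketch: the wave name results, the chunk size of 1000, and the dummy loop standing in for the real calculation are all assumptions.

```igor
Function CollectResults()
    Variable chunk = 1000               // assumed chunk size, tune as needed
    Make/O/D/N=(chunk) results          // hypothetical output wave
    Variable used = 0                   // number of rows actually filled
    Variable i

    for (i = 0; i < 2500; i += 1)       // stand-in for the real calculation loop
        if (used >= numpnts(results))
            // Out of room: grow by one more chunk
            Redimension/N=(numpnts(results) + chunk) results
        endif
        results[used] = i^2             // dummy result value
        used += 1
    endfor

    Redimension/N=(used) results        // trim the unused tail once at the end
End
```

For a 2D results wave, the same pattern should work with Redimension/N=(newRows, -1), where -1 leaves the number of columns unchanged.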

1. If you just want to know the first and last occurrence, FindValue/TEXT=theString works well enough. If you want to know more, such as where all the occurrences of a particular string are or how many occurrences there are, some programming is needed. You can try the following code to see whether it meets your needs.

BTW, I also don't like loops, so I avoid them as far as possible. That is why you don't see any loops in the procedure :)

function FindRepeatsIndex(wave/T wt,string s0,variable flag)
    variable n0=numpnts(wt)
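    duplicate/O/T/free wt,wtmp              // working copy of the text wave
    make/O/N=(n0)/free windex=x             // original row numbers
    make/O/N=(n0+2)/free wmask              // match mask, padded with a zero at each end
    sort wtmp,windex,wtmp                   // sort the strings, carrying the row numbers along
    sort {wtmp,windex} windex               // within equal strings, put row numbers in ascending order
    wmask[1,n0]=!abs(cmpstr(s0,wtmp[p-1]))  // 1 where the sorted string equals s0 (cmpstr ignores case by default)
    findlevels/Q wmask,0.99                 // locate the two edges of the block of 1s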
    wave w_FindLevels
    variable result=-1
    if(numpnts(w_FindLevels)==0)
        return result
    endif
    switch(flag)
        case 0: //return startIndex of string s0
            result= windex[round(w_FindLevels[0])-1]
            break
        case 1: //return endIndex of string s0
            result= windex[round(w_FindLevels[1])-1]
            break
        case 2: //return repeats number of string s0
            result= sum(wmask)
            break
        case 3: //create wave named w_index stores index number of s0
            make/O/N=(sum(wmask)) w_index
            w_index=windex[p+(round(w_FindLevels[0])-1)]
            result=sum(wmask)
            break
    endswitch
    return result
end

2. Alternatively, you could temporarily use a string variable instead of a wave.

Well, that is cool and smart code. But would

Grep /E=StringToFind/INDX/Q txtWave

and a simple analysis of W_Index (min value, max value, and number of points) be easier?

Grep help: /INDX Creates in the current data folder an output wave W_Index containing the line numbers (or row numbers) where matching lines were found. If this is the only output you need, also use the /Q flag.
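A rough sketch of that Grep route, with a hypothetical function name GrepFirstLast. One thing to keep in mind is that /E takes a regular expression, so the sketch anchors the pattern with ^ and $ and wraps the string in \Q...\E to quote special characters. That quoting assumes the string itself does not contain "\E", and the whole thing is untested against any particular Igor version.

```igor
Function GrepFirstLast(Wave/T tw, String str)
    // Anchor the pattern so only whole-element matches count;
    // \Q...\E quotes regex metacharacters inside str.
    String regExp = "^\\Q" + str + "\\E$"

    Grep/E=regExp/INDX/Q tw             // creates W_Index in the current data folder
    Wave W_Index

    if (numpnts(W_Index) == 0)
        return -1                       // no match found
    endif

    // First occurrence, last occurrence, number of occurrences
    Print WaveMin(W_Index), WaveMax(W_Index), numpnts(W_Index)
    return 0
End
```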

Hi All, 

Thank you so much for your advice and suggestions! I have learned a lot from your comments and highly appreciate them. Unfortunately, I haven't been able to solve my problem yet, so I figured I'd share the wave that has the repeated string values. Kindly see the attached pxp file.

The strings repeating in this wave are continuous, i.e. the same string values fall in continuous rows wherever they do. Now I need to process this wave such that only the highest index duplicate value (last occurrence) survives while the rest get deleted, and the remaining wave remains intact (although it would shorten in dimension and that's fine). These string values are actually rownames for a 2D wave. So I also need to delete corresponding row numbers from that 2D wave as well. 

Kindly advise how I could achieve this.

Thanks a ton in advance! 

Sincerely, 

Peeyush 

Rownames.pxp (9.9 KB)

FindDuplicates/INDX is your friend here. Since you want to keep the last entries, you need to reverse the list first. Here is some code:

Function DeleteDupRows(Wave/T w)
    Variable i, pnts = DimSize(w,0)
    Duplicate/free w, temp
    Make/free/N=(pnts) sortw = p
    Sort/R sortw, temp
    FindDuplicates/free/INDX=DupIndex temp
    DupIndex = (pnts-1-DupIndex[p])
    for (i=0; i<numpnts(DupIndex); i++)
        DeletePoints/M=0 DupIndex[i], 1, w
    endfor
End

The loop over DupIndex is there so that the same rows can be deleted from both your rownames wave and the 2D wave. As written, the function only deletes from w, so you would add the 2D wave to the DeletePoints call as well. Note that DupIndex refers to rows in the non-reversed list here. If it were all just one wave, you could use FindDuplicates to extract your unique elements directly without the loop.
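To make the two-wave case concrete, here is a sketch that follows the function above but also takes the 2D data wave. The name DeleteDupRowsBoth and the parameter names are made up, and it assumes both waves have the same number of rows. The extra Sort/R is a precaution so that rows are always deleted from the bottom up and earlier deletions cannot shift the remaining indices.

```igor
Function DeleteDupRowsBoth(Wave/T names, Wave data)
    Variable i, pnts = DimSize(names, 0)

    Duplicate/FREE names, temp
    Make/FREE/N=(pnts) sortw = p
    Sort/R sortw, temp                      // reverse the copy so the last entry is seen first

    FindDuplicates/FREE/INDX=DupIndex temp  // rows (in the reversed copy) that repeat an earlier entry
    DupIndex = pnts - 1 - DupIndex[p]       // convert back to rows of the original wave
    Sort/R DupIndex, DupIndex               // make sure we delete from the highest row down

    for (i = 0; i < numpnts(DupIndex); i += 1)
        // Remove the same row from the rownames wave and the 2D wave
        DeletePoints/M=0 DupIndex[i], 1, names, data
    endfor
End
```

It is probably wise to try this on duplicates of the two waves first, since the deletion happens in place.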