Identifying first/last occurrence and making flexible dimension waves

1. This depends a bit on whether you already know the repeated string or not (i.e., whether the repeated string only reveals itself by being repeated in the wave). If you know the string, you could just run FindValue/TEXT=theString twice, once without and once with the /R (reverse) flag, to find the first and the last occurrence.
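As a minimal sketch of the two FindValue calls, assuming a version of Igor Pro whose FindValue accepts the /R flag described above: the function name FirstAndLastDemo and its parameters are made up for illustration, and /TXOP=4 is added so that only whole wave elements match rather than substrings.

```igor
// Hypothetical helper: prints the first and last row where target occurs.
// Assumes FindValue supports /R (reverse search) in your Igor version.
Function FirstAndLastDemo(Wave/T names, String target)
    // Forward search: V_value is the first matching point, or -1 if none.
    FindValue/TEXT=target/TXOP=4 names
    Variable first = V_value

    // Reverse search: V_value is the last matching point, or -1 if none.
    FindValue/R/TEXT=target/TXOP=4 names
    Variable last = V_value

    Print "first:", first, "last:", last
End
```

If both values come back as -1, the string is not in the wave at all.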

2. Yes, that is actually very easy (though not many users seem to know it). The following syntax extends the wave by one point and puts in the new value (10 in this example):

myWave[DimSize(myWave,0)] = {10}

This way the wave grows naturally by appending new values. You can use the same approach for any dimension. If you want to build the wave from scratch, just start with zero points (i.e., Make/N=0). I don't know if this method is particularly memory- or CPU-friendly, since the memory allocation for the wave has to be adjusted constantly. Another approach would be to build a bigger wave first and then trim all unused cells once at the end.
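A small sketch of building a wave from zero points with this syntax. The function name AppendDemo, the wave name growWave, and the values are only placeholders for illustration.

```igor
// Hypothetical demo: grow a wave one point at a time from zero points.
Function AppendDemo()
    Make/O/N=0 growWave                 // start empty
    Variable i
    for (i = 0; i < 5; i += 1)
        // Assigning a one-element list to the row just past the end
        // extends the wave by one point and stores the value there.
        growWave[DimSize(growWave, 0)] = {i * 10}
    endfor
    Print numpnts(growWave)             // number of points after appending
End
```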

September 4, 2021 at 05:56 pm - Permalink

ilavsky

I think problem 1 can be solved entirely by FindDuplicates with some smart flags, like /INDX etc. If not, FindDuplicates will at least return a list of the duplicates, and then Grep will return, for every duplicate, a list of the indexes where that string occurs. Take the lowest and highest index for that string and you have what you need.

For 2, if you need to extend the wave only a few times, this is fine. If you need to do it many times, it may be better to extend it by some large number (100, 1000?) and fill in the new rows by index until you run out. Then extend again. And at the end, trim off what was not used.
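A rough sketch of that chunked approach, under the assumption that a chunk size of 1000 suits the data. The names ChunkedAppendDemo, bigWave, chunk, and used are invented for this example.

```igor
// Hypothetical demo: grow in chunks, track the fill level, trim at the end.
Function ChunkedAppendDemo(Variable nValues)
    Variable chunk = 1000               // assumed chunk size, tune as needed
    Make/O/N=(chunk) bigWave
    Variable i, used = 0                // used = number of rows filled so far
    for (i = 0; i < nValues; i += 1)
        if (used >= DimSize(bigWave, 0))
            // Out of room: extend by another chunk.
            Redimension/N=(DimSize(bigWave, 0) + chunk) bigWave
        endif
        bigWave[used] = i               // store the next value (here just i)
        used += 1
    endfor
    Redimension/N=(used) bigWave        // trim off the unused rows
End
```

This way the wave is reallocated once per chunk instead of once per appended value.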

September 5, 2021 at 03:36 am - Permalink

wings

1. If you just want to know the first and last occurrence, FindValue/TEXT=theString works well enough. If you want to know more, such as where the particular strings are or how many occurrences there are, some programming is needed. You can try the following code to see whether it meets your needs.

BTW, I also don't like loops, so I avoid them as far as possible; that is why you don't see any loops in the procedure :)

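// Locate string s0 in text wave wt without explicit loops.
// flag selects the result: 0 = first index, 1 = last index,
// 2 = number of occurrences, 3 = as 2, and also create wave w_index with all indices.
// Returns -1 if s0 does not occur in wt.
function FindRepeatsIndex(wave/T wt,string s0,variable flag)
    variable n0=numpnts(wt)
    duplicate/O/T/free wt,wtmp
    make/O/N=(n0)/free windex=x     // original row numbers
    make/O/N=(n0+2)/free wmask      // mask with one extra point at each end
    sort wtmp,windex,wtmp           // sort the strings, carrying the row numbers along
    sort {wtmp,windex} windex       // order the row numbers within groups of equal strings
    wmask[1,n0]=!abs(cmpstr(s0,wtmp[p-1]))  // 1 where the sorted string equals s0
    findlevels/Q wmask,0.99         // edges of the block of ones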
    wave w_FindLevels
    variable result=-1
    if(numpnts(w_FindLevels)==0)
        return result
    endif
    switch(flag)
        case 0: //return startIndex of string s0
            result= windex[round(w_FindLevels[0])-1]
            break
        case 1: //return endIndex of string s0
            result= windex[round(w_FindLevels[1])-1]
            break
        case 2: //return repeats number of string s0
            result= sum(wmask)
            break
        case 3: //create wave named w_index stores index number of s0
            make/O/N=(sum(wmask)) w_index
            w_index=windex[p+(round(w_FindLevels[0])-1)]
            result=sum(wmask)
            break
    endswitch
    return result
end

2. As another option, you could temporarily use a string variable instead of a wave.
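One way to read that suggestion, sketched here as an assumption rather than as the exact method meant above: collect the values in a semicolon-separated string list and convert it to a wave once at the end. The names ListThenWaveDemo and fromList are hypothetical.

```igor
// Hypothetical demo: accumulate values in a string list, then make the wave once.
Function ListThenWaveDemo()
    String list = ""
    Variable i
    for (i = 0; i < 5; i += 1)
        // Inf as the item number appends the new item at the end of the list.
        list = AddListItem(num2str(i * 10), list, ";", Inf)
    endfor
    Variable n = ItemsInList(list)
    // Create the wave at its final size and fill it from the list.
    Make/O/N=(n) fromList = str2num(StringFromList(p, list))
End
```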

September 5, 2021 at 06:26 am - Permalink

ilavsky

Well, that is cool and smart code. But would

Grep /E=StringToFind/INDX/Q txtWave

followed by a simple analysis of W_Index (minimum value, maximum value, and number of points) be easier?

Grep help: /INDX Creates in the current data folder an output wave W_Index containing the line numbers (or row numbers) where matching lines were found. If this is the only output you need, also use the /Q flag.
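A sketch of how that could look inside a function. The name GrepFirstLast is hypothetical. Since /E takes a regular expression, the pattern is anchored with ^ and $ here so that only whole elements match; this assumes the search string contains no regular-expression metacharacters.

```igor
// Hypothetical helper: first index, last index, and count via Grep/INDX.
// Returns the number of matches, or 0 if there are none.
Function GrepFirstLast(Wave/T txtWave, String stringToFind)
    // Assumption: stringToFind has no regex metacharacters.
    String regex = "^" + stringToFind + "$"
    Grep/E=regex/INDX/Q txtWave
    Wave/Z W_Index                      // created by /INDX in the current data folder
    if (!WaveExists(W_Index) || numpnts(W_Index) == 0)
        return 0
    endif
    Print "first:", WaveMin(W_Index), "last:", WaveMax(W_Index), "count:", numpnts(W_Index)
    return numpnts(W_Index)
End
```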

September 5, 2021 at 09:36 am - Permalink

wings

Yes, Grep is indeed easier and clearer, thanks.

September 5, 2021 at 03:47 pm - Permalink

Peeyush Khare

Hi All,

Thank you so much for your advice and suggestions! I have learned a lot from your comments and highly appreciate them. Unfortunately, I haven't been able to solve my problem yet, so I figured I would share the wave that has the repeated string values with you. Kindly see the attached pxp file.

The repeated strings in this wave are contiguous, i.e., wherever a string value repeats, the identical values fall in consecutive rows. Now I need to process this wave such that only the highest-index duplicate (the last occurrence) survives while the rest get deleted, and the remainder of the wave stays intact (it will get shorter, and that's fine). These string values are actually row names for a 2D wave, so I also need to delete the corresponding rows from that 2D wave.

Kindly advise how I could achieve this.

Thanks a ton in advance!

Sincerely,

Peeyush

Attachments Rownames.pxp (9.9 KB)

September 7, 2021 at 08:38 am - Permalink

chozo

FindDuplicates/INDX is your friend here. Since you want to keep the last entries, you need to reverse the list first. Here is some code:

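Function DeleteDupRows(Wave/T w)
    Variable i, pnts = DimSize(w,0)
    Duplicate/free w, temp                  // work on a copy of the row names
    Make/free/N=(pnts) sortw = p
    Sort/R sortw, temp                      // reverse the copy so last occurrences come first
    FindDuplicates/free/INDX=DupIndex temp  // indices of the duplicates in the reversed copy
    DupIndex = (pnts-1-DupIndex[p])         // convert to row numbers of the original wave
    for (i=0; i<numpnts(DupIndex); i++)
        DeletePoints/M=0 DupIndex[i], 1, w
    endfor
End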

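The loop over DupIndex is there so that the same rows can be deleted from both your rownames wave and the 2D wave (as posted, the function deletes only from w; the 2D wave needs to be added to the DeletePoints call inside the loop). Note that DupIndex refers to rows in the non-reversed list here. If it were all just one wave, you could use FindDuplicates to extract your unique elements directly without the loop.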
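A sketch of the two-wave variant, assuming the 2D wave has the same number of rows as the rownames wave. The name DeleteDupRowsBoth and the parameter data2D are hypothetical, and the extra Sort/R on DupIndex is a safeguard added here so that rows are always deleted from the highest index down.

```igor
// Hypothetical variant: keep the last occurrence of each name and delete
// the matching rows from both the names wave and the 2D data wave.
Function DeleteDupRowsBoth(Wave/T names, Wave data2D)
    Variable i, pnts = DimSize(names, 0)
    if (DimSize(data2D, 0) != pnts)
        return -1                           // row counts must agree
    endif
    Duplicate/FREE names, temp
    Make/FREE/N=(pnts) sortw = p
    Sort/R sortw, temp                      // reversed copy of the names
    FindDuplicates/FREE/INDX=DupIndex temp
    DupIndex = pnts - 1 - DupIndex[p]       // back to original row numbers
    Sort/R DupIndex, DupIndex               // highest row first, so earlier
                                            // deletions do not shift later ones
    for (i = 0; i < numpnts(DupIndex); i += 1)
        DeletePoints/M=0 DupIndex[i], 1, names, data2D
    endfor
    return 0
End
```

Try it on copies of the waves first, since DeletePoints modifies them in place.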

September 7, 2021 at 09:17 am - Permalink

Peeyush Khare

This did it! Thanks a lot chozo, and also for explaining it! I really appreciate it!

September 9, 2021 at 03:12 am - Permalink

chozo

Great that it worked.

September 9, 2021 at 04:37 am - Permalink