Grep match issue
Hello,
I am using Grep to match chemical formulas between two 1D text waves and to align two corresponding numeric waves accordingly. Kindly see the attached table (TablePK) for the text waves. I am noticing an odd behavior: Grep matches most formulas but somehow skips some of them. For example, it won't match CO even though CO exists in both waves.
My syntax:
for(i=0;i<dimsize(text2match,0);i+=1)
	Grep/INDX/Q/E = text2match[i] referencetextwave
	wave W_Index
	MatchedData[W_Index[0]] = data2match[i]
endfor
The MatchedData wave ends up with only NaN values in the row corresponding to "CO" in the text wave, and the same happens in a few other rows. I am not sure why. Please advise.
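A minimal Python sketch (not Igor code) that reproduces this symptom, assuming the /E pattern in Grep behaves like Python's re.search, i.e. it matches wherever the pattern occurs inside an element. The lists are hypothetical stand-ins for referencetextwave, text2match and data2match, not the actual TablePK contents:

```python
import math
import re

# Hypothetical stand-ins for the waves in the question (not the real TablePK data).
reference_text = ["C", "CO2plus2", "N", "CO"]   # plays the role of referencetextwave
text2match = ["CO", "N"]                        # formulas to look up
data2match = [11.0, 22.0]                       # numeric values to align

def grep_indices(pattern, texts):
    """Mimic Grep/INDX with an unanchored regex: indices of every element
    in which the pattern occurs anywhere in the string."""
    return [k for k, s in enumerate(texts) if re.search(pattern, s)]

matched = [math.nan] * len(reference_text)
for i, formula in enumerate(text2match):
    w_index = grep_indices(formula, reference_text)
    matched[w_index[0]] = data2match[i]   # keeps only the first hit, as in the question's loop

print(matched)   # [nan, 11.0, 22.0, nan]
```

Because "CO" is also found inside "CO2plus2", which comes first, the value for CO lands in that row and the real CO row is never written.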
Sincerely,
Peeyush
If you step through with the debugger, you will find that Grep matches, e.g., 'CO2plus2' for the input 'CO' instead of 'CO' itself. I am not fully sure why. But since you only use the first match, and it seems to me that you actually want an exact match, why not use FindValue instead? Here is a start:
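Function FindExactMatches(matchThis, withThis)	// function name is illustrative
	Wave/T matchThis	// formulas to look up
	Wave/T withThis	// text wave to search
	int i
	for(i=0;i<DimSize(matchThis,0);i+=1)
		// /TXOP=4 requests that the text match the entire wave element
		FindValue/TEXT=(matchThis[i])/TXOP=4 withThis
		if (V_value > -1 && V_value < numpnts(withThis))
			print i, withThis[V_value]
		endif
	endfor
End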
Sorry if I misunderstood the question. Could you give a more complete picture of the task you want to do, including the MatchedData and data2match waves?
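The same idea as a hedged Python analogue of FindValue with a whole-element text match: an exact equality test instead of a regular-expression search, returning -1 when nothing is found (loosely mimicking V_value). The function name and data are hypothetical:

```python
def find_exact(value, texts):
    """Return the index of the first element exactly equal to `value`,
    or -1 if there is none (a rough analogue of V_value after FindValue)."""
    for k, s in enumerate(texts):
        if s == value:
            return k
    return -1

reference_text = ["C", "CO2plus2", "N", "CO"]   # hypothetical data
for formula in ["CO", "N", "Ar"]:
    k = find_exact(formula, reference_text)
    if k > -1:
        print(formula, "->", k)   # prints "CO -> 3" and "N -> 2"; "Ar" is not found
```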
March 1, 2024 at 02:29 am
Hi chozo,
Thanks a lot for the advice! Sure, I'll switch to FindValue/TEXT to accomplish the task. Thanks also for the code!
Meanwhile, since I am indeed looking for exact matches, I am still a little perplexed why Grep doesn't match CO in one text wave to CO in the other. I think the same thing happens in a few other cases where the formula string contains only two letters.
It would be good to hear some thoughts on this as well, since I use Grep a lot and the waves are sometimes big enough that such an error can go undetected.
Sincerely,
Peeyush
March 1, 2024 at 02:52 am
Grep should work as well. Try anchoring the pattern so that it has to match the entire element, and see how that goes:
Grep/INDX/Q/E=("^"+text2match[i]+"$") referencetextwave
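A short Python illustration of why anchoring helps, assuming Igor's Grep treats these simple patterns the way Python's re.search does. It also shows a related caveat: if a formula ever contains regex metacharacters such as + or (, the raw text no longer matches itself, so it should be escaped (here with Python's re.escape) before adding ^ and $. The formula list is hypothetical:

```python
import re

# Hypothetical formula list standing in for referencetextwave.
reference_text = ["C", "CO2plus2", "N", "CO", "C2H5", "C2H55", "C2H5+"]

def grep_indices(pattern, texts):
    """Indices of the elements in which the regex `pattern` finds a match."""
    return [k for k, s in enumerate(texts) if re.search(pattern, s)]

# Unanchored: "CO" is found inside "CO2plus2" as well as in "CO".
print(grep_indices("CO", reference_text))        # [1, 3]
# Anchored to the whole element: only the exact "CO".
print(grep_indices("^CO$", reference_text))      # [3]
# "+" is a regex quantifier, so the raw formula does not match itself...
print(grep_indices("^C2H5+$", reference_text))   # [4, 5]
# ...but escaping the literal text first restores an exact match.
print(grep_indices("^" + re.escape("C2H5+") + "$", reference_text))   # [6]
```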
March 1, 2024 at 03:19 am
As chozo wrote, Grep is matching all wave points that contain CO. You are saving only the first of those matches.
Here is a useful function that returns how many points in a text wave w match a regular expression:
Function IsInWave(w, RegEx)
	Wave/T w
	String RegEx
	Grep/E=RegEx/Q w
	return V_value
End
1
•print IsInWave(SpeciesMassText_Acet, "CO")
6
You can then use this function, for instance, to extract matches:
•extract SpeciesMassText_Acet, matches, IsInWave(toMatch, "^"+SpeciesMassText_Acet+"$")
•print matches
matches[0]= {"C","N","CO"}
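Roughly the same pair of steps in Python, as a sketch: is_in_wave is a hypothetical port of Tony's IsInWave that counts the elements matched by a regex, and the list comprehension plays the role of Extract by keeping each element of one list for which the anchored test against the other list is nonzero. The waves are replaced by hypothetical lists, and re.escape is an addition not present in the Igor command:

```python
import re

def is_in_wave(texts, regex):
    """Count the elements of `texts` matched by `regex`
    (the value IsInWave returns via V_value)."""
    return sum(1 for s in texts if re.search(regex, s))

species = ["C", "CO2plus2", "N", "CO", "COplus"]   # hypothetical stand-in for SpeciesMassText_Acet
to_match = ["C", "N", "CO", "Ar"]                  # hypothetical stand-in for toMatch

print(is_in_wave(species, "CO"))     # 3
print(is_in_wave(species, "^CO$"))   # 1

# Counterpart of: extract SpeciesMassText_Acet, matches, IsInWave(toMatch, "^"+SpeciesMassText_Acet+"$")
matches = [s for s in species if is_in_wave(to_match, "^" + re.escape(s) + "$")]
print(matches)   # ['C', 'N', 'CO']
```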
March 1, 2024 at 04:53 am
Using the regex with Grep did the trick!
Thanks a lot for the advice and all the very helpful info, chozo and Tony. I highly appreciate it. :)
Sincerely,
Peeyush
March 11, 2024 at 06:27 am
Good.
In case it wasn't clear, you can avoid loops completely by using Extract (with the /INDX flag if you need index values) together with the IsInWave function to find matches in the text2match (not data2match) wave:
extract/INDX data2match, matches, IsInWave(text2match, "^"+data2match+"$")
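For the original alignment task, a hedged Python sketch of a loop-free alternative uses a dictionary lookup instead of regular expressions (a swapped-in technique, not what the Igor command does): each text2match formula is paired with its data2match value, and every row of the reference list picks up the value for its exact formula, or NaN if there is none. The function name and data are hypothetical:

```python
import math

def align(reference_text, text2match, data2match):
    """Return a list as long as reference_text in which row k holds the
    data2match value whose text2match formula equals reference_text[k]
    exactly; rows without a partner stay NaN."""
    lookup = dict(zip(text2match, data2match))   # if a formula repeats, the last value wins
    return [lookup.get(formula, math.nan) for formula in reference_text]

matched_data = align(["C", "CO2plus2", "N", "CO"], ["CO", "N", "Ar"], [11.0, 22.0, 33.0])
print(matched_data)   # [nan, nan, 22.0, 11.0]
```

Formulas present in text2match but absent from the reference (here "Ar") are simply skipped, and "CO2plus2" no longer captures the value meant for "CO".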
March 11, 2024 at 06:39 am
Great, thanks for the advice, Tony. I hadn't used "IsInWave" until now, but it appears to be very useful. I'll stitch it into my code moving forward.
Sincerely,
Peeyush
March 11, 2024 at 06:45 am
IsInWave is not a built-in function; it's the little wrapper for Grep that I posted above.
March 11, 2024 at 06:48 am
Right, right. Sorry, I meant I hadn't used anything akin to your IsInWave function until now, but it is pretty useful. I'll include a function to this effect moving forward, especially since I really enjoy avoiding loops wherever I can.
March 11, 2024 at 06:59 am