Regex advice
Peeyush Khare
Hello,
I am trying to filter a 1D text wave, wave1, that looks something like this: 100.0787, C4H7O2, C7H8NO3, 230.0045, C8H8NS, ....
I am trying to find a regular expression that extracts only the CHO-containing entries from this wave.
I tried Grep/INDX/Q/E={"[NS]",1} wave1. This removes all N- and S-containing compounds but leaves the numeric entries in. I need an output wave that contains only CHO species.
Please advise.
Sincerely,
Peeyush
You can use multiple regular expressions with Grep, so try these:
must have C, H, or O, can have numbers:
must have C, H and O:
December 2, 2023 at 02:16 am
Hi Tony,
Thanks for the advice. I tried these but I'm still struggling. I've attached the 1D wave I'm trying to filter.
When "Na" does not appear at the beginning of a formula, it is an adduct ion, so I remove it with ReplaceString before proceeding. I do this because I'm applying several filters, one of which selects CHN species (compounds containing C, H and N and no other elements). Otherwise the "N" in "Na" interferes with extracting the reduced nitrogen-containing species.
For the CHO species, however, my attempts still don't return only the formulas made purely of C, H and O.
I'd be grateful for your advice.
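A minimal sketch of the "Na" cleanup described above, assuming each entry holds a single formula and that a leading "Na" belongs to the formula and should be kept. StripNaAdduct is a hypothetical helper name, not an Igor built-in; it relies only on GrepString and ReplaceString.

```igor
// Hypothetical helper: removes "Na" adduct labels that do not start the formula.
// Assumption: a leading "Na" is part of the formula itself and is kept.
Function/S StripNaAdduct(string formula)
	if (GrepString(formula, "^Na"))
		// keep the leading "Na", strip any later occurrences
		return "Na" + ReplaceString("Na", formula[2, strlen(formula) - 1], "")
	endif
	// no leading "Na": strip every occurrence
	return ReplaceString("Na", formula, "")
End
```

Applied to every entry (for example `wave1 = StripNaAdduct(wave1)` inside a function), this leaves "C6H12O6Na" as "C6H12O6" while "NaC2H3O2" is unchanged.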
Sincerely,
Peeyush
December 2, 2023 at 06:25 am
Perhaps this will work.
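// input: text wave; returns a semicolon-separated string list
Function/S grep_removestuff(wave/T strwave, [variable keepdigits, string ignorelttrs])
	string rstr
	// write every entry of the wave into rstr as a list item
	wfprintf rstr, "%s;", strwave
	// unless keepdigits is nonzero, drop items that don't start with a letter (the numeric masses)
	if (ParamIsDefault(keepdigits) || !keepdigits)
		rstr = GrepList(rstr, "^[[:alpha:]]")
	endif
	// drop items containing any of the letters in ignorelttrs (case-insensitive)
	if (!ParamIsDefault(ignorelttrs))
		string mtchstr = "(?i)[" + ignorelttrs + "]+"
		rstr = GrepList(rstr, mtchstr, 1)
	endif
	return rstr
End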
December 2, 2023 at 09:10 am
This one
grep/E="^[CHO0-9]*$"/E="[CHO]"
ensures that every character falls within the character class of the first regular expression, and that at least one character matches the class in the second. That excludes Na. The second expression also rejects entries that are purely digits, and the decimal point in the numeric masses already fails the first. If you also want the formulae with a trailing +, you have to add + to the first expression.
Try:
grep/E="^[CHO0-9+]*$"/E="[CHO]" speciesmasstext_out
I don't *think* you have to escape the + in the character class.
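To see the two-expression logic on the sample entries from the original question, here is a sketch of a hypothetical helper, KeepMatching (not part of Igor), that keeps only the entries of a text wave matching both patterns. It uses GrepString in a loop as a stand-in for Grep's AND-ed /E flags.

```igor
// Hypothetical helper: returns a free text wave holding only the entries of
// srcWave that match both pat1 and pat2 (logical AND, like two /E flags in Grep)
Function/WAVE KeepMatching(WAVE/T srcWave, string pat1, string pat2)
	Make/FREE/T/N=0 out
	variable i, n = numpnts(srcWave)
	for (i = 0; i < n; i += 1)
		// an entry is kept only if it matches both patterns
		if (GrepString(srcWave[i], pat1) && GrepString(srcWave[i], pat2))
			Redimension/N=(numpnts(out) + 1) out
			out[numpnts(out) - 1] = srcWave[i]
		endif
	endfor
	return out
End
```

Called from another function as `KeepMatching(wave1, "^[CHO0-9+]*$", "[CHO]")`, the entries 100.0787, C7H8NO3, 230.0045 and C8H8NS are dropped and C4H7O2 is kept.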
December 2, 2023 at 09:48 am
If the output must have all three elements, use
or simply (perhaps less simply?)
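A single pattern can also demand that C, H and O all be present, by prefixing one lookahead per element. This is a sketch assuming Igor's PCRE-based regular expressions, which support the (?=...) lookahead syntax; IsStrictCHO is a hypothetical helper name.

```igor
// Hypothetical helper: returns 1 if formula contains at least one C, one H and
// one O and nothing but those letters, digits and "+"; otherwise returns 0
Function IsStrictCHO(string formula)
	// each (?=.*X) lookahead requires an X somewhere in the entry;
	// the character class then restricts the whole entry to C, H, O, digits and "+"
	return GrepString(formula, "^(?=.*C)(?=.*H)(?=.*O)[CHO0-9+]+$")
End
```

The same pattern string can be passed to Grep's /E flag instead of writing one /E per element.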
December 2, 2023 at 09:53 am