Help parsing ugly string
ilavsky
Hello,
Data file formats... Love them. A new (for me) image format has a header that I need to parse into numbers. The header contains a few lines (with \n\r as separators). Here is the header:
OD SAPPHIRE 4.0
COMPRESSION=TY6( 25.4%)
NX= 487 NY= 407 OI= 7037 OL= 0
NHEADER= 6576 NG= 512 NS= 768 NK= 1024 NS= 512 NH= 2048
NSUPPLEMENT= 0
TIME=Fri Oct 25 17:11:57 2024
I need to pull out NX, NY, and NHEADER. I could read this byte by byte and work it out, but that is really ugly. Does anyone have a smart suggestion for how to split each line into parts so I can evaluate each one? I have seen some cool miracle code here recently ;-)
Thanks in advance!
Hi
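If the header format is stable, you could use sscanf.
Function ReadNXNY(string input)	// function name is illustrative; input is the single header line that starts with NX=
	NVAR nx, ny	// assumes global variables nx and ny already exist in the current data folder
	sscanf input, "NX= %d NY= %d", nx, ny	// literal text must match; each %d reads one integer
End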
and then a similar one for NHEADER.
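The sscanf call above assumes you have already isolated the line that starts with NX=. A minimal sketch of the whole round trip follows, assuming CR/LF line endings as in the test strings later in this thread. ScanHeaderLines is a hypothetical name, and returning the results through pass-by-reference parameters is just one more way to avoid both globals and multiple return values.

```igor
// Hypothetical helper: ScanHeaderLines is an illustrative name, not from the thread.
// Splits the raw header into lines and runs sscanf on the NX= and NHEADER= lines.
// Results are returned through pass-by-reference parameters (no globals needed).
Function ScanHeaderLines(string header, variable &nx, variable &ny, variable &nh)
	string lines, line
	variable i, numLines
	variable vx = NaN, vy = NaN, vh = NaN	// stay NaN if a line is not found

	// Assumes CR/LF line endings: drop the LFs so CR alone separates the lines
	lines = ReplaceString("\n", header, "")
	numLines = ItemsInList(lines, "\r")
	for(i = 0; i < numLines; i += 1)
		line = TrimString(StringFromList(i, lines, "\r"))
		if(strsearch(line, "NX=", 0) == 0)
			sscanf line, "NX= %d NY= %d", vx, vy
		elseif(strsearch(line, "NHEADER=", 0) == 0)
			sscanf line, "NHEADER= %d", vh
		endif
	endfor
	nx = vx
	ny = vy
	nh = vh
End

Function testScan()
	string header = " OD SAPPHIRE 4.0\r\n\r\nNX= 487 NY= 407 OI= 7037 OL= 0\r\nNHEADER= 6576 NG= 512\r\n"
	variable nx, ny, nh
	ScanHeaderLines(header, nx, ny, nh)
	print nx, ny, nh
End
```

This still depends on the header layout being stable, which is the same assumption Andy's version makes.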
Andy
October 27, 2024 at 02:58 pm
@Andy: Did you mean to write 'variable' instead of 'nvar'?
Since I have been playing around with regex recently, here is a SplitString option as well:
Function [variable nx, variable ny, variable nh] parseHeader(string in)	// Igor Pro 8+ multiple-return syntax
	string regex = "", nx_str, ny_str, nh_str
	regex += "NX="
	regex += "(?: )*([0-9]*)"	// skip spaces, capture the digits
	regex += "(?:\r|\n|.)*?"	// lazily skip anything, including line breaks, up to the next key
	regex += "NY="
	regex += "(?: )*([0-9]*)"
	regex += "(?:\r|\n|.)*?"
	regex += "NHEADER="
	regex += "(?: )*([0-9]*)"
	SplitString/E=(regex) in, nx_str, ny_str, nh_str	// one output string per capture group
	return [str2num(nx_str), str2num(ny_str), str2num(nh_str)]
End

Function testRegex()
	string header = " OD SAPPHIRE 4.0\r\n\r\nCOMPRESSION=TY6( 25.4%)\r\n\r\nNX= 487 NY= 407 OI= 7037 OL= 0\r\nfgjhgjnbnn\r\nNHEADER= 6576 NG= 512 NS= 768 NK= 1024 NS= 512 NH= 2048\r\n\r\nNSUPPLEMENT= 0\r\n\r\nTIME=Fri Oct 25 17:11:57 2024 "
	variable nx, ny, nh
	[nx, ny, nh] = parseHeader(header)
	print nx, ny, nh
End
You could also build a custom function which pulls the value from whichever key you request:
Function parseHeader(string in, string key)
	string str
	SplitString/E=(key+"=(?: )*([0-9]*)") in, str	// capture the digits that follow "key="
	return str2num(str)
End

Function testRegex()
	string header = " OD SAPPHIRE 4.0\r\n\r\nCOMPRESSION=TY6( 25.4%)\r\n\r\nNX= 487 NY= 407 OI= 7037 OL= 0\r\nfgjhgjnbnn\r\nNHEADER= 6576 NG= 512 NS= 768 NK= 1024 NS= 512 NH= 2048\r\n\r\nNSUPPLEMENT= 0\r\n\r\nTIME=Fri Oct 25 17:11:57 2024 "
	print parseHeader(header, "NX")
	print parseHeader(header, "NY")
	print parseHeader(header, "NHEADER")
End
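A possible refinement of the key-based function, sketched under a few assumptions: GetHeaderNumber is a hypothetical name, the \b word boundary keeps a short key such as "X" from matching inside "NX=", the number pattern also accepts a sign and a decimal point, and the V_flag check (SplitString sets V_flag to the number of subpatterns matched) makes the missing-key case explicit instead of relying on str2num("") returning NaN.

```igor
// Hypothetical variant of parseHeader(in, key); the name GetHeaderNumber is illustrative.
// Returns NaN when the key is not found.
Function GetHeaderNumber(string in, string key)
	string str = ""
	// "\\b" becomes \b in the regex: the key must start at a word boundary.
	// [-+]?[0-9]*\\.?[0-9]+ : optional sign, digits, optional decimal part
	SplitString/E=("\\b" + key + "= *([-+]?[0-9]*\\.?[0-9]+)") in, str
	if(V_flag < 1)	// no match: key absent or not followed by a number
		return NaN
	endif
	return str2num(str)
End
```

Usage is the same as above, e.g. print GetHeaderNumber(header, "NH").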
October 27, 2024 at 11:23 pm
In reply to @Andy: Did you mean to write… by chozo
Actually, I used NVAR intentionally, since I wanted to avoid the complexity of multiple return values.
Andy
October 28, 2024 at 05:25 am
Ah, I see. I didn't realize you expected nx and ny to already exist as global variables. That is certainly one way to avoid multiple return values.
October 28, 2024 at 05:36 am
Thanks! sscanf is elegant, but I really like the regex approach; it seems more useful to me. The parseHeader function in particular is an excellent suggestion. I hoped there was a solution like that, but regex and I are not good friends.
Thank you!
October 28, 2024 at 10:53 am
sscanf and regex are great solutions. Here is my lazier approach, if you are only interested in NX, NY, and NHEADER.
It looks like the fields are separated by varying amounts of white space and line returns. We can first use the TrimString function to collapse those runs into single spaces and then use a space as the list separator. Then NumberByKey gets the values, using "= " (an equals sign followed by a space) as the keySepStr.
Function GetValueByKey(string strInput, string keyName)	// function name is illustrative
	// TrimString(str, 1) also replaces internal runs of white space (including CR/LF) with one space
	return NumberByKey(keyName, TrimString(strInput, 1), "= ", " ", 1)
End
This does not work for TIME and COMPRESSION, because there is no space after the equals sign and the values themselves contain spaces.
November 14, 2024 at 07:14 am
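For the text-valued keys (TIME, COMPRESSION), a regex in the style of the key-based parseHeader can capture the rest of the line instead of a number. A minimal sketch, assuming each value ends at the next CR or LF; GetHeaderString is a hypothetical name.

```igor
// Hypothetical helper for text-valued keys such as TIME and COMPRESSION.
// Assumption: a value runs from just after "KEY=" to the end of its line.
Function/S GetHeaderString(string in, string key)
	string str = ""
	// [^\r\n]* : everything up to (not including) the next CR or LF
	SplitString/E=("\\b" + key + "=([^\r\n]*)") in, str
	return TrimString(str)	// drop leading/trailing spaces, e.g. after the TIME value
End
```

For example, print GetHeaderString(header, "COMPRESSION") should give the text inside the header, TY6( 25.4%), as a string rather than a number.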