Help parsing ugly string

Hello,

Data file formats... love them. A new (for me) image format has a header which I need to parse into numbers. The header contains a few lines (with \n\r as separators). Here is the header:

  OD SAPPHIRE  4.0

COMPRESSION=TY6( 25.4%)

NX= 487 NY= 407 OI=   7037 OL=      0

NHEADER=   6576 NG=    512 NS=    768 NK=   1024 NS=    512 NH=   2048

NSUPPLEMENT=      0

TIME=Fri Oct 25 17:11:57 2024                 

I need to pull out NX, NY, and NHEADER. I can read this byte by byte and figure it out, but that is really ugly. Does anyone have a smart suggestion for how to split each line into parts so I can evaluate each one, please? I have seen some cool miracle code here recently ;-)

Thanks ahead!

Hi

If the headers are stable, then you could use sscanf.

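function use_sscanf(string input)
    // nvar references global variables nx and ny, which must already exist
    nvar nx, ny
    // input is expected to be the header line that starts with "NX="
    sscanf input, "NX= %d NY= %d OI=   1 OL=      1", nx, ny
end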

and then a similar one for NHEADER.
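
For illustration, the NHEADER version might look like the sketch below. The function name read_nheader is made up for this example, it uses a local variable and a return value instead of globals, and it assumes that input is the single header line starting with "NHEADER=".

```igor
function read_nheader(string input)
    variable nheader = NaN      // stays NaN if the line does not match
    // only the first field is converted; the rest of the line is ignored
    sscanf input, "NHEADER= %d", nheader
    return nheader
end
```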

Andy

@Andy: Did you mean to write 'variable' instead of 'nvar'?

Since I have been playing around with regex recently, here is also a SplitString option:

Function [variable nx, variable ny, variable nh] parseHeader(string in)
    string regex = "", nx_str, ny_str, nh_str
    regex += "NX="
    regex += "(?: )*([0-9]*)"       // skip space, get numbers
    regex += "(?:\r|\n|.)*?"        // skip stuff until next
    regex += "NY="
    regex += "(?: )*([0-9]*)"
    regex += "(?:\r|\n|.)*?"
    regex += "NHEADER="
    regex += "(?: )*([0-9]*)"
    SplitString/E=(regex) in, nx_str, ny_str, nh_str
    return [str2num(nx_str), str2num(ny_str), str2num(nh_str)]
End

Function testRegex()
    string header = "  OD SAPPHIRE  4.0\r\n\r\nCOMPRESSION=TY6( 25.4%)\r\n\r\nNX= 487 NY= 407 OI=   7037 OL=      0\r\nfgjhgjnbnn\r\nNHEADER=   6576 NG=    512 NS=    768 NK=   1024 NS=    512 NH=   2048\r\n\r\nNSUPPLEMENT=      0\r\n\r\nTIME=Fri Oct 25 17:11:57 2024        "
    variable nx, ny, nh
    [nx, ny, nh] = parseHeader(header)
    print nx, ny, nh
End

You could also build a custom function which pulls the value from whichever key you request:

Function parseHeader(string in, string key)
    string str
    SplitString/E=(key+"=(?: )*([0-9]*)") in, str
    return str2num(str)
End

Function testRegex()
    string header = "  OD SAPPHIRE  4.0\r\n\r\nCOMPRESSION=TY6( 25.4%)\r\n\r\nNX= 487 NY= 407 OI=   7037 OL=      0\r\nfgjhgjnbnn\r\nNHEADER=   6576 NG=    512 NS=    768 NK=   1024 NS=    512 NH=   2048\r\n\r\nNSUPPLEMENT=      0\r\n\r\nTIME=Fri Oct 25 17:11:57 2024        "
    print parseHeader(header, "NX")
    print parseHeader(header, "NY")
    print parseHeader(header, "NHEADER")
End
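
One caveat with the key lookup above: a short key such as "X" would also match inside "NX=", and a missing key is not reported explicitly. A slightly more defensive variant could look like the sketch below. The name parseHeaderChecked is made up for this example, and the V_flag check assumes that SplitString reports the number of matched subpatterns there, so please check the help file before relying on it.

```igor
Function parseHeaderChecked(string in, string key)
    string str
    // "\\b" is a regex word boundary, so key "X" does not match inside "NX"
    SplitString/E=("\\b" + key + "=(?: )*([0-9]+)") in, str
    if (V_flag != 1)    // assumption: V_flag holds the number of matched subpatterns
        return NaN      // key not found, or not followed by digits
    endif
    return str2num(str)
End
```

Note that keys which appear more than once in the header (NS in the example above) still return only the first occurrence.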

 

Ah, I see. I didn't realize you expected nx and ny to already exist as global variables. This is certainly a way to avoid multiple return values.

Thanks! sscanf is elegant, but I really like the regex approach, and it seems more useful to me. The parseHeader function especially is an excellent suggestion. I had hoped there was a solution like that, but regex and I are not good friends.

Thank you!

sscanf and regex are great solutions. Here is my lazier approach, if you are only interested in NX, NY, and NHEADER.

It looks like the pieces of information are separated by varying amounts of white space and line breaks. We can first use the TrimString function to collapse these runs into single spaces and then use the space as the list separator. NumberByKey then gets the values using "= " (equals sign followed by a space) as the keySepStr.

Function ReturnKeyValue(String strInput, String keyName)
    return NumberByKey(keyName, TrimString(strInput, 1), "= ", " ", 1)
end

This does not work for TIME and COMPRESSION, because there is no space after the equals sign and the values themselves contain spaces.
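
If TIME or COMPRESSION are needed after all, one option is to borrow the SplitString idea from earlier in the thread and grab everything up to the end of the line as a string. This is only a sketch: the name ReturnKeyString is made up for this example, and it assumes the value never continues past a line break.

```igor
Function/S ReturnKeyString(String strInput, String keyName)
    String str
    // capture everything after "KEY=" up to the next carriage return or line feed
    SplitString/E=(keyName + "=([^\r\n]*)") strInput, str
    return TrimString(str)      // drop leading and trailing white space
end
```

It would be called with the untrimmed header, for example ReturnKeyString(header, "TIME"), because the line breaks are what mark the end of each value.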