Remove non-readable characters from a string
maru
Hi,
I am loading a header of a tiff image. The header was successfully loaded as a string but it contains some non-readable characters.
Is it possible to remove these non-readable characters from the string?
Function/S GetHeader(path, endline)
    String path // file path
    Variable endline // terminal line for the header; line numbering starts from zero
    String buffer
    Variable refNum
    Open/R refNum as path
    String header = ""
    Variable i = 0
    Do
        FReadLine refNum, buffer // read one line per iteration
        if (strlen(buffer) == 0) // FReadLine returns an empty string at end of file
            break
        endif
        header += buffer
        i++
    while(i < endline) // stop at endline to avoid loading the whole file; increase endline if your header is longer
    Close refNum // release the file once the header has been read
    return header
End
I think you could use CleanupName to do that if the lines are <255 characters long e.g.:
FReadLine refNum, buffer //Read one line each until find the specified string
header += CleanupName(buffer,1) // nix non-ASCII characters
i++
while(i<endline) // to avoid loading all lines. if your header is larger than 100 line, increase the number.
Otherwise, you might have to add a sub-loop for longer buffers.
July 20, 2021 at 07:31 am - Permalink
First, you might consider using Igor's ImageLoad operation to load your TIFF file. It creates an S_info output variable. While that isn't currently documented (I have asked the author to document it), it contains the "# Pixel_size..." string that I think you are trying to get. ImageLoad also has flags that allow you to read the tags from the image, and you can read just the metadata without reading in the actual image (/RTIO flag).
If you want to eliminate the non-printable characters (most of which are due to null bytes in the string), you could use this ConvertTextEncoding command:
String converted = ConvertTextEncoding(header, 1, 1, 3, 2)
That will treat the input string as UTF-8 (ASCII is valid UTF-8) and convert to UTF-8. Any invalid UTF-8 sequences will be silently dropped.
July 20, 2021 at 07:47 am - Permalink
Thank you very much for the help. ImageLoad/RTIO works perfectly.
I have also tried CleanupName and ConvertTextEncoding, but neither removed the non-readable characters.
July 20, 2021 at 05:12 pm - Permalink
I have done this before using brute force. Use char2num on each character in the string and compare it with the acceptable ranges.
0-9 is 48-57, A=65, Z=90, a=97, z=122, and then you need to add whatever other characters you want to allow (=, ., ;, etc.). Then brute-force through the string, taking one character at a time from the input string and appending only the acceptable ones to the output string. I was looking at our Python implementation of a similar function and it does something similar. It will not be pretty or fast, but these headers cannot be too long anyway.
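For anyone landing here later, the brute-force approach described above might look roughly like the following. This is only a sketch: the function name KeepPrintableASCII is made up for illustration, and the choice to keep all printable ASCII (codes 32-126) plus tab, line feed, and carriage return is an assumption, so narrow or widen the ranges to suit your header.

```igor
// Hypothetical helper: returns a copy of str containing only the accepted characters.
Function/S KeepPrintableASCII(str)
    String str

    String out = ""
    Variable n = strlen(str)
    Variable i, code, printable, whitespace
    for(i = 0; i < n; i += 1)
        code = char2num(str[i]) // numeric code of the i-th byte
        printable = (code >= 32) && (code <= 126) // space through tilde
        whitespace = (code == 9) || (code == 10) || (code == 13) // tab, LF, CR (assumed wanted)
        if (printable || whitespace)
            out += str[i] // keep accepted bytes, drop everything else
        endif
    endfor
    return out
End
```

It could then be applied to the result of the GetHeader function from the original post, for example String clean = KeepPrintableASCII(GetHeader(path, 100)), where path and the line count are whatever you already use. Null bytes and other codes outside the accepted ranges are simply skipped.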
An alternative is to switch your Pilatus to writing Nexus files, if your software supports it (I have heard that newer versions do). Igor can read Nexus (=HDF5) natively in IP9, and with an XOP in IP8 and earlier.
July 20, 2021 at 08:12 pm - Permalink
In reply to I have done this before… by ilavsky
Thank you for the suggestion. The brute-force method will completely solve my problem. I will try that. Unfortunately, HDF5 is currently not supported by the instruments I am using.
July 21, 2021 at 06:49 pm - Permalink
I have created a code snippet that addresses the original question: https://www.wavemetrics.com/code-snippet/remove-non-printable-character…
Note that when loading a TIFF image you will probably be better off using ImageLoad directly to load the image. If you need the information in the header, you can use the /RAT flag (or /LTMD flag if you use /BIGT=1). That will store information that is in the header in waves in a Tagn data folder. If you don't care about the image data and just want the metadata, you can use the /RTIO flag.
July 28, 2021 at 08:02 am - Permalink