read in data (txt file) with special time format
zek
I try to read in a file like this
Datum Zeit Tmp_ak WGms WR DatumZeit
17.09.2016 09.50.00.02 15.8 1.4 248 17.09.2016 09.50.00.02
17.09.2016 09.50.00.12 15.7 1.3 253.6 17.09.2016 09.50.00.12
17.09.2016 09.50.00.22 15.7 1.3 254.8 17.09.2016 09.50.00.22
17.09.2016 09.50.00.32 15.7 1.4 255.1 17.09.2016 09.50.00.32
17.09.2016 09.50.00.42 15.7 1.3 254.9 17.09.2016 09.50.00.42
17.09.2016 09.50.00.52 15.7 1.3 253.2 17.09.2016 09.50.00.52
17.09.2016 09.50.00.62 15.7 1.3 253.5 17.09.2016 09.50.00.62
17.09.2016 09.50.00.72 15.7 1.3 250.9 17.09.2016 09.50.00.72
17.09.2016 09.50.00.82 15.7 1.3 250.4 17.09.2016 09.50.00.82
17.09.2016 09.50.00.92 15.7 1.2 250.7 17.09.2016 09.50.00.92
First line is the column name; first and second column gives date and time, which I want to merge in one column. The column time has a very special time format. The other columns are temperature, wind speed and wind direction and DateTime.
I tried to load this file with the Data-Loader from Igor Pro, but I has problems with this special time format. Does anyone has a solution or has somebody already some code, which is useful for the read-in of such type of files?
Best regards
Kerstin
I recommend saving this as an Igor Procedure file so that it will be automatically loaded when you launch Igor. Execute this for details:
DisplayHelpTopic "Special Folders"
If this does not work with your file then it is probably due to formatting subtleties. In that case, attach a zip archive of your file so I can see exactly what the file contains.
// dd.mm.yyyy<space>hh.mm.ss.ff<space> ...
// where hh.mm.ss.ff is a time of day with fractional seconds. The procedure changes the
// dots in the time to colons so that it works with LoadWave/J, writes a temporary file containing
// the massaged text, and loads the data from the temporary file.
// There is a DeleteFile call at the end of the procedure to delete the temporary file. If you want to
// inspect the temporary file, comment out the DeleteFile call.
// In addition to the issue of the formatting of the time, the file has another peculiarity as posted
// at http://www.igorexchange.com/node/7831. It looks like this:
// Datum Zeit Tmp_ak WGms WR DatumZeit
// 17.09.2016 09.50.00.02 15.8 1.4 248 17.09.2016 09.50.00.02
// The column names for the first date/time have a space between them but the column name for the last
// date/time has no space. To sidestep this issue, I am ignoring the column names altogether and
// using LoadWave /B to set the column names.
// There is another problem. The use of space to separate the date and time as well as to separate one column
// from the next creates problems with LoadWave. Therefore the procedure replaces the space used to separate
// one column from the next with a tab.
Menu "Load Waves"
"Load Kerstin File...", LoadKerstinFile("", "")
End
static Function/S FixText(textIn)
String textIn
String textOut = ""
Variable numBytes = strlen(textIn)
Variable bytesLeft = numBytes
Variable offset = 0
for(offset=0; offset<numBytes; )
String ch = textIn[offset]
Variable isSpace = CmpStr(ch," ")==0
if (isSpace && bytesLeft>=12) // Space and sufficient remaining bytes to be hh.mm.ss.ff?
String section = textIn[offset+1,offset+11]
String regExp = "[[:digit:]]{2}\.[[:digit:]]{2}\.[[:digit:]]{2}\.[[:digit:]]{2}" // hh.mm.ss.ff
if (GrepString(section,regExp))
section = ReplaceString(".", section, ":")
textOut += ch + section
Variable sectionLength = strlen(section)
offset += 1 + sectionLength // 1 for ch (space)
bytesLeft -= 1 + sectionLength
continue
endif
endif
if (isSpace)
ch = "\t" // Replace space separating columns with tab
endif
textOut += ch
offset += 1
bytesLeft -= 1
endfor
return textOut
End
// LoadKerstinFile(pathName, fileName)
// A data file has unwanted line breaks for lines longer than 80 characters
// This routine creates a temporary version of the file without the bad line breaks
// and loads data from the temporary file.
Function LoadKerstinFile(pathName, fileName)
String pathName // Name of an Igor symbolic path or "".
String fileName // Name of file or full path to file.
Variable refNum
// First get a valid reference to a file.
if ((strlen(pathName)==0) || (strlen(fileName)==0))
// Display dialog looking for file.
// Replace /T="????" with, for example, /T=".dat" if your files are .dat files.
Open /D /R /P=$pathName /T="????" refNum as fileName
fileName = S_fileName // S_fileName is set by Open/D
if (strlen(fileName) == 0) // User cancelled?
return -1
endif
endif
// Open source file and read the raw text from it into a string variable
Open/Z=1/R/P=$pathName refNum as fileName
if (V_flag != 0)
return -1 // Error of some kind
endif
FStatus refNum // Sets V_logEOF
Variable numBytesInFile = V_logEOF
String text = PadString("", numBytesInFile, 0x20)
FBinRead refNum, text // Read entire file into variable.
Close refNum
// Fix the text
text = FixText(text) // Remove bad line breaks
// Write the fixed text to a temporary file
String tempFileName = fileName + ".noindex" // Use of .noindex prevents Spotlight from indexing the file. Otherwise we get an error when we try to delete the file because Spotlight has it open.
Open refNum as tempFileName
FBinWrite refNum, text
Close refNum
String columnInfoStr = ""
columnInfoStr += "N=DatumZeitA,F=8;" // <date><space><time>
columnInfoStr += "N=TmpAk,F=0;" // <number>
columnInfoStr += "N=WGms,F=0;" // <number>
columnInfoStr += "N=WR,F=0;" // <number>
columnInfoStr += "N=DatumZeitB,F=8;" // <date><space><time>
// Load the temporary file
// The /L flag causes LoadWave to load the data starting from line 1 (zero-based), skipping the name line which is problematic.
// The /B flag specifies the column names and formats.
// The /R flag specifies the date format as dd.mm.yy.
LoadWave/J/D/P=$pathName/E=1/L={0,1,0,0,0}/B=columnInfoStr/K=0/R={English,2,2,2,2,"DayOfMonth.Month.Year",40} tempFileName
if (V_flag == 0)
Printf "An error occurred while loading data from \"%s\"\r", S_fileName
else
Printf "Loaded data from \"%s\"\r", S_fileName
endif
// Make table bigger
MoveWindow 10, 50, 800, 400
// Set date format to show fractional seconds
ModifyTable showFracSeconds[1]=1,digits[1]=2, width[1]=175
ModifyTable showFracSeconds[5]=1,digits[5]=2, width[5]=175
// Delete the temporary file
DeleteFile /P=$pathName tempFileName
return 0
End
July 10, 2017 at 10:02 am - Permalink
The procedure works, :-).
Not very fast, probably of the big data file.
I attached one of my data file.
Best regards
Kerstin
July 14, 2017 at 02:32 am - Permalink
Why do you have duplicate columns in the data file? (Datum, Zeit) in the beginning, (Datum, Zeit) at the end? Each line begins and ends with this tuple. For the file that you posted if you remove the last two columns, the reduction is substantial: 70173015 bytes > 42524260 bytes (~67MB > ~41MB).
Under a linux or cygwin environment you'd do something like:
cat wind_test_file.txt | awk '{print $1,$2,$3,$4,$5}' > wind_test_file_out
Another recommendation is to change the delimiter to a comma. Then, use the
Data > Load Waves > Load delimited text...
menu, to load the data keeping the time as a text. Parse it after loading.Under linux or cygwin in vim you'd issue the following to substitute space for comma:
:1,$s/ /,/g
Another recommendation is to combine the date and time into one variable ala julian date, unix time, etc. This will provide further savings on the file size, but will increase processing upon file loading.
edit: As a matter of fact, you don't need the date. It's a waste of bytes. Tell the person generating these files to name the filenames with the corresponding date, or put it in the beginning of each file as a header or something, and dump data only for this particular day in this particular file. I would also use fractional time. Even though a delta t would be enough (which btw is not precisely 100ms, just a tiny bit less, ~99.99(93)ms)
best,
_sk
July 14, 2017 at 06:24 am - Permalink