Load data in lines
neutron
I'm working on a project where data is generated for many different systems and while I have been able to manage with most of the files, this one format (shown below) has me stumped. I wasn't involved in the format schemes and have no control over them. I would be extremely grateful if someone here would help me figure out how to read in this data!
- There are 5 header lines I would like to ignore, except that I do want to take the run number (line 3) from it to use as a prefix for the wavenames
- Each line after the header is the equivalent of one wave, starting with
-- the date/time in the format yyyy-mm-dd_hh:mm:ss.ddd_UTC
-- the "unit" number, i.e. 0, 1 etc
-- the actual data values
# AcqX
# 2011-04-22_12:36:13.390_UTC
# Run 33
# Frequency data (New software)
# Date_Time Iteration Data (4000 points at 100.000000Hz)... Before Precession = 52 s. Precession duration = 40 s.
2011-04-26_19:34:39.342_UTC 0 33162 32345 31586 31055 30877 31092 31653 32426 33239 33899 34257 34231 33826 33135 32319 31563 31045 30879 31107 31677 .....
2011-04-26_19:36:15.130_UTC 1 34111 34747 34889 34499 33670 32589 31510 30681 30289 30428 31062 32047 33155 34131 34749 34871 34471 33633 32559 31490 .....
Many thanks!
Here's one solution:
- Open the file with the Open operation.
- Set up a for loop that calls FReadLine refNum, lineContents and breaks when FReadLine returns the empty string (lineContents is the name of a string variable that you declare). Do nothing with the first 5 lines.
- Inside the for loop, get the number of entries in each line by calling ItemsInList(lineContents, " "). Subtract 2 from this to avoid counting the two header entries (or you can parse them if you like).
- Allocate a wave with the appropriate number of points.
- Then set up another for loop inside the first one that loops over all the items. Use StringFromList(j, lineContents, " "), where 'j' is the second loop counter. Note that this is not very efficient but it should get the job done. For increased speed you can try looping over the string directly, though whether it's faster or not is hard to predict. For each item call str2num and store it in the wave.
- Repeat this for all the lines in the file, each time making a new wave. Don't forget to close the file at the end using Close.
For my own convenience I'm assuming that you have some experience with Igor programming, and that these instructions make sense. If not then just let us know.
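If you do want to parse the first header entry rather than skip it, here is a minimal sketch of one way to turn the date/time text into an Igor date/time value. The function name ParseUTCStamp and its local variable names are made up for illustration, and it assumes the entry always looks like 2011-04-26_19:34:39.342_UTC as in the sample lines.

```igor
// Hypothetical helper (not part of any posted loader): converts text such as
// "2011-04-26_19:34:39.342_UTC" into Igor date/time, i.e. seconds since
// 1904-01-01. Returns NaN if the text does not match the expected pattern.
Function ParseUTCStamp(stamp)
	String stamp

	Variable vYear, vMonth, vDay, vHour, vMin, vSec
	sscanf stamp, "%d-%d-%d_%d:%d:%f", vYear, vMonth, vDay, vHour, vMin, vSec
	if (V_flag != 6)	// sscanf sets V_flag to the number of values it converted
		return NaN
	endif

	// date2secs gives midnight of that day; add the time of day in seconds
	return date2secs(vYear, vMonth, vDay) + 3600*vHour + 60*vMin + vSec
End
```

Inside the line loop this could be called as ParseUTCStamp(StringFromList(0, lineContents, " ")). Store the result in a double-precision wave or variable, otherwise the fractional seconds are lost.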
April 27, 2011 at 09:21 am - Permalink
NOTE: You will need to set the variable numColumns to the right number. It needs to be set to the number of data columns plus five. If this number varies from file to file then the function will need to be modified to count the number of data columns.
// # AcqX
// # 2011-04-22_12:36:13.390_UTC
// # Run 33
// # Frequency data (New software)
// # Date_Time Iteration Data (4000 points at 100.000000Hz)... Before Precession = 52 s. Precession duration = 40 s.
// 2011-04-26_19:34:39.342_UTC 0 33162 32345 31586 ...
// 2011-04-26_19:36:15.130_UTC 1 34111 34747 34889 ...
// The run number is loaded into a global variable named runNumber in the current data folder.
Function LoadNeutronData(pathName, fileName)
String pathName // Name of Igor symbolic path or "" for dialog
String fileName // File name or "" for dialog
Variable firstRow = 5
Variable numColumns = 25
Variable defaultFieldWidth = 6
String columnInfoStr = ""
columnInfoStr += "N=UTCDate,F=6,W=10;" // The date
columnInfoStr += "N='_skip_',F=-2,W=1;" // Skip underscore after the date
columnInfoStr += "N=UTCTime,F=7,W=12;" // The time
columnInfoStr += "N='_skip_',F=-2,W=5;" // Skip "_UTC" and space after the time
columnInfoStr += "N=unit,F=0,W=2;" // Unit number
columnInfoStr += "C=100,F=0,W=6;" // Handles the remaining columns
// Print columnInfoStr // For debugging only
LoadWave /F={numColumns,defaultFieldWidth,0} /R={English,2,2,2,2,"Year-Month-DayOfMonth",40} /O /L={0, firstRow, 0, 0, 0} /B=columnInfoStr /A=data /E=1 /P=$pathName fileName
// Combine date and time into date/time
Wave UTCDate, UTCTime
UTCDate += UTCTime
RemoveFromTable UTCTime
KillWaves /Z UTCTime // No longer needed
ModifyTable format(UTCDate)=8, showFracSeconds(UTCDate)=1, width(UTCDate)=180
// Load the run number
String filePath = S_path + S_fileName // Set by LoadWave
String text
Variable refNum
Open /R refNum as filePath
FReadLine refNum, text // Skip first line
FReadLine refNum, text // Skip second line
FReadLine refNum, text
Variable/G runNumber // Creates global variable
sscanf text, "# Run %d", runNumber // Sets global variable
Close refNum
End
Function LoadNeutronDataDialog()
LoadNeutronData("", "")
End
April 27, 2011 at 11:07 am - Permalink
I just tested the code from you, hrodstein, and I've realized that I didn't describe the situation properly.
All the points in an individual line (about 4000 of them in this case) make up one wave. In the extract I showed, there are then 2 waves, each with 20 points in the excerpt (4000 in reality). For example, wave 0 started at 2011-04-26_19:34:39.342_UTC, and has data points
33162
32345
31586
..
..
My apologies for not being clearer. Ideally I would try 741's suggestion on my own, but I'm a little pressed for time. :-(
April 28, 2011 at 02:34 am - Permalink
Menu "Macros" // opening line missing from the post; the menu title is an assumption
"Load Neutron", /Q, LoadNeutron()
End
constant kNeutronHeaderLength = 5 // number of lines to skip at the beginning of the file
constant knEntriesToSkip = 2 // number of entries to skip at the beginning of a line
// entries are separated by spaces, so "2011-04-26_19:34:39.342_UTC"
// is a single entry
constant knTrailingSpaces = 0 // IMPORTANT: set this to the number of trailing spaces at the end of the line
Function LoadNeutron()
variable refNum
// this selects the file but does not open it
Open /R/D /F="All Files:.*;" refNum
if (strlen(S_fileName) == 0)
// user cancel
return 0
endif
// open the file
Open /R refNum as S_fileName
string lineContent
variable i
// the first lines are header only, skip these
for (i = 0; i < kNeutronHeaderLength; i+=1)
FReadLine refNum, lineContent
endfor
// the main loop
// read all lines separately. Each line is a full wave
variable nValues, j
string outputWaveName, strValue
for (i = 0; ; i += 1)
FReadLine refNum, lineContent
if (strlen(lineContent) == 0)
// no more data to be read; leave the loop so that the file gets closed below
break
endif
nValues = ItemsInList(lineContent, " ") - knEntriesToSkip - knTrailingSpaces
outputWaveName = "W_Neutron" + num2str(i)
Make /O/N=(nValues) /D $outputWaveName
wave output = $outputWaveName
// get out the numbers
for (j = 0; j < nValues; j+=1)
strValue = StringFromList(j + knEntriesToSkip, lineContent, " ")
output[j] = str2num(strValue)
endfor
endfor
Close refNum
End
There are a few gotchas involved since this is a very impromptu parsing approach.
- I'm ignoring the first two entries on a line, are those important for you?
- If there are trailing spaces after the last value of each line, they will mess up the counting of the number of values. To address this, check the number of trailing spaces on the data lines in a real file (likely zero or one) and change constant knTrailingSpaces = 0 to the correct value.
- The function does not check if there are any previously loaded W_Neutron waves, it just makes the required number and overwrites any that may exist. This is important not only because data could be lost, but also because the following could happen: say you load a file with 10 datasets, and then you load one with 5 datasets. The first load will create W_Neutron0 through W_Neutron9, and the second will overwrite W_Neutron0 through W_Neutron4, but leave W_Neutron5 through W_Neutron9 untouched. This might fool you into thinking that the second file contained 10 datasets.
All of this can be addressed by making the function a bit more clever, but this is just the bare minimum.
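As one example of "a bit more clever", the leftover-wave problem could be handled by clearing out old W_Neutron waves before each load. This is only a sketch: the function name KillOldNeutronWaves is made up, and it assumes the waves live in the current data folder and that you really do want the old data gone.

```igor
// Hypothetical helper (not part of the loader above): tries to kill every
// wave named W_Neutron* in the current data folder, so that waves left over
// from an earlier, larger file cannot be mistaken for freshly loaded data.
// Returns the number of matching waves that were found.
Function KillOldNeutronWaves()
	String list = WaveList("W_Neutron*", ";", "")
	Variable n = ItemsInList(list)
	Variable i
	String name
	for (i = 0; i < n; i += 1)
		name = StringFromList(i, list)
		// /Z: do not raise an error if the wave cannot be killed,
		// for example because it is displayed in a graph or table
		KillWaves /Z $name
	endfor
	return n
End
```

Calling KillOldNeutronWaves() in LoadNeutron() right after the user-cancel check would do it. Waves that are still shown in a graph or table survive the call, so close those windows first if you want a clean slate.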
April 28, 2011 at 08:59 am - Permalink
This skips the date/time data and loads the rest of each row into a wave.
Menu "Macros" // opening line missing from the post; the menu title is an assumption
"Load Neutron Data . . .", LoadNeutronData("", "")
End
static Function ExtractRowDataInto1DWaves(mat, baseName)
Wave mat // Matrix containing row-oriented data
String baseName // Base name of output waves
Variable numRows, numColumns
Variable row
String name
numRows = DimSize(mat, 0)
numColumns = DimSize(mat, 1)
row = 0
do
name = baseName + num2istr(row)
Make/O/D/N=(numColumns) $name
// Store matrix data in the output wave.
Wave w = $name
w = mat[row][p]
row += 1
while (row <= numRows-1)
End
Function LoadNeutronData(pathName, fileName)
String pathName // Name of path or "" for dialog.
String fileName // Name of file or "" for dialog.
Variable linesToSkip = 5
Variable linesToLoad = 0 // Number of lines to load or 0 for auto (load all lines).
Variable columnsToSkip = 2 // Number of columns to skip. Skips date/time data and unit number.
Variable columnsToLoad = 0 // Number of columns to load or 0 for auto (load all columns).
String baseName ="data" // Base name to use for new waves.
Variable makeTable = 1 // 1 == make a table showing new waves.
LoadWave/Q/J/M/D/A=tempLoadRowDataMatrix/P=$pathName/K=0/L={0,linesToSkip,linesToLoad,columnsToSkip,columnsToLoad} /V={" ", "", 0, 1} fileName
if (V_flag == 0)
return -1 // Probably user cancelled
endif
ExtractRowDataInto1DWaves(tempLoadRowDataMatrix0, baseName)
Variable numRows = DimSize(tempLoadRowDataMatrix0, 0)
KillWaves tempLoadRowDataMatrix0
if ((makeTable==1) && (numRows>0))
Variable row = 0
String name
Edit
do
name = baseName + num2istr(row)
AppendToTable $name
row += 1
while (row <= numRows-1)
endif
return 0
End
April 28, 2011 at 01:33 pm - Permalink
One thing is that the delimiter is actually a tab rather than a space. Is that a problem?
I've checked the trailing spaces: it is zero.
hrodstein: With your version, I get the error message shown in the -0 image.
741: Your version goes through the motions, but it doesn't load the data in the file (-1 image).
For now, I'm going to take a break. I'm using emacs and sed to convert the files to formats I can read in with Igor, and I'll return to this problem later. Thank you two for your efforts -- I'll report back if I make any progress.
April 29, 2011 at 06:06 am - Permalink
Possibly. The example you posted in your first post loaded fine with my code, but it had spaces. Howard's code might be failing for the same reason.
This should deal with tabs:
Menu "Macros" // opening line missing from the post; the menu title is an assumption
"Load Neutron", /Q, LoadNeutron()
End
constant kNeutronHeaderLength = 5 // number of lines to skip at the beginning of the file
constant knEntriesToSkip = 2 // number of entries to skip at the beginning of a line
// entries are separated by tabs, so "2011-04-26_19:34:39.342_UTC"
// is a single entry
constant knTrailingSpaces = 0 // IMPORTANT: set this to the number of trailing tabs at the end of the line
Function LoadNeutron()
variable refNum
// this selects the file but does not open it
Open /R/D /F="All Files:.*;" refNum
if (strlen(S_fileName) == 0)
// user cancel
return 0
endif
// open the file
Open /R refNum as S_fileName
string lineContent
variable i
// the first lines are header only, skip these
for (i = 0; i < kNeutronHeaderLength; i+=1)
FReadLine refNum, lineContent
endfor
// the main loop
// read all lines separately. Each line is a full wave
variable nValues, j
string outputWaveName, strValue
for (i = 0; ; i += 1)
FReadLine refNum, lineContent
if (strlen(lineContent) == 0)
// no more data to be read; leave the loop so that the file gets closed below
break
endif
nValues = ItemsInList(lineContent, "\t") - knEntriesToSkip - knTrailingSpaces
outputWaveName = "W_Neutron" + num2str(i)
Make /O/N=(nValues) /D $outputWaveName
wave output = $outputWaveName
// get out the numbers
for (j = 0; j < nValues; j+=1)
strValue = StringFromList(j + knEntriesToSkip, lineContent, "\t")
output[j] = str2num(strValue)
endfor
endfor
Close refNum
End
Notice that all I've done is change " " to "\t" in the calls to StringFromList and ItemsInList.
If this doesn't work either then you might want to include an actual data file.
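The original question also asked for the run number from header line 3 to be used as a prefix for the wave names, which the loader above does not do. A rough sketch of how that could be bolted on follows. The function name ReadRunNumber is made up, and it assumes the third line always has the form "# Run 33".

```igor
// Hypothetical helper (not part of the loader above): opens the file,
// reads the third header line and returns the run number.
// Returns NaN if the file cannot be opened or the line does not match.
Function ReadRunNumber(filePath)
	String filePath		// full path to the data file

	Variable refNum
	Open /R /Z refNum as filePath
	if (V_flag != 0)
		return NaN		// file could not be opened
	endif

	String text
	FReadLine refNum, text		// line 1: "# AcqX"
	FReadLine refNum, text		// line 2: acquisition date/time
	FReadLine refNum, text		// line 3: "# Run 33"
	Close refNum

	Variable runNumber = NaN
	sscanf text, "# Run %d", runNumber
	if (V_flag != 1)		// sscanf sets V_flag to the number of values it converted
		return NaN
	endif
	return runNumber
End
```

In LoadNeutron() you could then call ReadRunNumber(S_fileName) once before the main loop and build the wave name as something like "Run" + num2istr(runNumber) + "_" + num2istr(i) instead of "W_Neutron" + num2str(i). The exact naming scheme is of course up to you.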
April 29, 2011 at 07:07 am - Permalink
And we have a winner... :-)
I can't explain why the tabs got changed to spaces when I pasted the extract into my original post.
So cool, thanks a lot!
Have a nice weekend!
April 29, 2011 at 08:45 am - Permalink
I did assume that the delimiter was a space since that is what was in your original post.
If you want to send a sample file to support@wavemetrics.com, I will see what is going on. If so, please zip it to prevent line wrapping by the email process.
April 30, 2011 at 10:59 am - Permalink
Menu "Macros" // opening line missing from the post; the menu title is an assumption
"Load Neutron Data . . .", LoadNeutronData("", "")
End
static Function ExtractRowDataInto1DWaves(mat, baseName)
Wave mat // Matrix containing row-oriented data
String baseName // Base name of output waves
Variable numRows, numColumns
Variable row
String name
numRows = DimSize(mat, 0)
numColumns = DimSize(mat, 1)
row = 0
do
name = baseName + num2istr(row)
Make/O/D/N=(numColumns) $name
// Store matrix data in the output wave.
Wave w = $name
w = mat[row][p]
row += 1
while (row <= numRows-1)
End
Function LoadNeutronData(pathName, fileName)
String pathName // Name of path or "" for dialog.
String fileName // Name of file or "" for dialog.
Variable linesToSkip = 5
Variable linesToLoad = 0 // Number of lines to load or 0 for auto (load all lines).
Variable columnsToSkip = 2 // Number of columns to skip. Skips date/time data and unit number.
Variable columnsToLoad = 0 // Number of columns to load or 0 for auto (load all columns).
String baseName ="data" // Base name to use for new waves.
Variable makeTable = 1 // 1 == make a table showing new waves.
LoadWave/Q/J/M/D/A=tempLoadRowDataMatrix/P=$pathName/K=0/L={0,linesToSkip,linesToLoad,columnsToSkip,columnsToLoad} fileName
if (V_flag == 0)
return -1 // Probably user cancelled
endif
ExtractRowDataInto1DWaves(tempLoadRowDataMatrix0, baseName)
Variable numRows = DimSize(tempLoadRowDataMatrix0, 0)
KillWaves tempLoadRowDataMatrix0
if ((makeTable==1) && (numRows>0))
Variable row = 0
String name
Edit
do
name = baseName + num2istr(row)
AppendToTable $name
row += 1
while (row <= numRows-1)
endif
return 0
End
May 2, 2011 at 08:06 am - Permalink