
Load data in lines

neutron
I'm working on a project where data is generated for many different systems and while I have been able to manage with most of the files, this one format (shown below) has me stumped. I wasn't involved in the format schemes and have no control over them. I would be extremely grateful if someone here would help me figure out how to read in this data!
- There are 5 header lines I would like to ignore, except that I do want to take the run number (line 3) from it to use as a prefix for the wavenames
- Each line after the header is the equivalent of one wave, starting with
-- the date/time in the format yyyy-mm-dd_hh:mm:ss.ddd_UTC
-- the "unit" number, i.e. 0, 1 etc
-- the actual data values
# AcqX
# 2011-04-22_12:36:13.390_UTC
# Run 33
# Frequency data (New software)
# Date_Time Iteration Data (4000 points at 100.000000Hz)... Before Precession = 52 s. Precession duration = 40 s.
2011-04-26_19:34:39.342_UTC 0 33162 32345 31586 31055 30877 31092 31653 32426 33239 33899 34257 34231 33826 33135 32319 31563 31045 30879 31107 31677 .....
2011-04-26_19:36:15.130_UTC 1 34111 34747 34889 34499 33670 32589 31510 30681 30289 30428 31062 32047 33155 34131 34749 34871 34471 33633 32559 31490 .....
Many thanks!
Here's one solution:
- Open the file with Open.
- Set up a for loop that calls FReadLine refNum, lineContents and breaks when FReadLine returns the empty string (lineContents is the name of a string variable that you declare). Do nothing with the first 5 lines.
- Inside the for loop, get the number of entries in each line by calling ItemsInList(lineContents, " "). Subtract 2 from this to avoid counting the two header entries (or you can parse them if you like).
- Allocate a wave with the appropriate number of points.
- Then set up another for loop inside the first one that loops over all the items. Use StringFromList(j, lineContents, " "), where 'j' is the second loop counter. Note that this is not very efficient but it should get the job done. For increased speed you can try looping over the string directly, though whether it's faster or not is hard to predict. For each item call str2num and store it in the wave.
- Repeat this for all the lines in the file, each time making a new wave. Don't forget to close the file at the end using Close.

For my own convenience I'm assuming that you have some experience with Igor programming, and that these instructions make sense. If not then just let us know.
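
The steps above can be sketched outside Igor as well. The following is a minimal illustration in Python, not the Igor code discussed in this thread: it skips the five header lines, takes the run number from header line 3, and turns each remaining line into one list of values. The function name parse_lines, the "run33_0" naming scheme, and the choice to discard the timestamp are assumptions made for the example.

```python
import re

HEADER_LINES = 5      # the sample file starts with five '#' lines
RUN_LINE_INDEX = 2    # "# Run 33" is the third header line (0-based index 2)

def parse_lines(lines, delimiter=" "):
    """Return (run_number, waves); waves maps a name to one line's values.

    Hypothetical naming scheme: "run<run number>_<unit number>".
    """
    run_number = None
    waves = {}
    for line_index, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if line_index < HEADER_LINES:
            if line_index == RUN_LINE_INDEX:
                match = re.search(r"Run\s+(\d+)", line)
                if match:
                    run_number = int(match.group(1))
            continue
        # Split on the delimiter and drop empty items (e.g. from a
        # trailing delimiter at the end of the line).
        items = [item for item in line.split(delimiter) if item]
        if len(items) < 3:
            continue  # blank line or no data values
        unit = items[1]       # items[0] is the date/time, ignored here
        waves[f"run{run_number}_{unit}"] = [float(v) for v in items[2:]]
    return run_number, waves
```

With a real file this could be called as `with open(path) as f: run, waves = parse_lines(f)`. The "....." at the end of the data lines in the excerpt marks omitted values and would not convert to a number, so the sketch assumes complete lines.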
April 27, 2011 at 09:21 am - Permalink
NOTE: You will need to set the variable numColumns to the right number. It needs to be set to the number of data columns plus five. If this number varies from file to file then the function will need to be modified to count the number of data columns.
April 27, 2011 at 11:07 am - Permalink
I just tested your code, hrodstein, and realized that I didn't describe the situation properly.
All the points in an individual line (about 4000 of them in this case) make up one wave. In the extract I showed, there are then 2 waves, each with 20 points in the excerpt (4000 in reality). For example, wave 0 started at 2011-04-26_19:34:39.342_UTC, and has data points
33162
32345
31586
..
..
My apologies for not being clearer. Ideally I would try 741's suggestion on my own, but I'm a little pressed for time. :-(
April 28, 2011 at 02:34 am - Permalink
There are a few gotchas involved, since this is a very impromptu parsing approach.
- I'm ignoring the first two entries on a line, are those important for you?
- If there are trailing spaces after the last value of each line, they will mess up the counting of the number of values. To address this, check the number of trailing spaces on the data lines in a real file (likely zero or one) and change constant knTrailingSpaces = 0 to the correct value.
- The function does not check if there are any previously loaded W_Neutron waves, it just makes the required number and overwrites any that may exist. This is important not only because data could be lost, but also because the following could happen: say you load a file with 10 datasets, and then you load one with 5 datasets. The first load will create W_Neutron1 through 10, and the second will overwrite W_Neutron1 through 5, but leave 6 through 10 untouched. This might fool you into thinking that the second file contained 10 datasets.
All of this can be addressed by making the function a bit more clever, but this is just the bare minimum.
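
One way to avoid the leftover-wave problem is to remove every previously loaded dataset with the same prefix before storing the new ones. The sketch below shows the idea in Python, with a plain dictionary standing in for Igor's current data folder. It is an illustration under that assumption, not the Igor function from this post, and the name store_datasets is hypothetical.

```python
def store_datasets(store, prefix, datasets):
    """Replace all entries named prefix+<number> in store with new datasets.

    store is a dict standing in for the place where loaded waves live
    (an assumption for this sketch). Returns the number of old entries
    that were removed.
    """
    # Collect old entries such as W_Neutron1 ... W_Neutron10 first,
    # so the dict is not modified while it is being iterated over.
    stale = [
        name for name in store
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    ]
    for name in stale:
        del store[name]
    # Number the new datasets from 1, as in W_Neutron1, W_Neutron2, ...
    for index, values in enumerate(datasets, start=1):
        store[f"{prefix}{index}"] = list(values)
    return len(stale)
```

Loading 10 datasets and then 5 with this approach leaves exactly 5 entries, so the second file can no longer appear to contain 10 datasets.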
April 28, 2011 at 08:59 am - Permalink
This skips the date/time data and loads the rest of each row into a wave.
April 28, 2011 at 01:33 pm - Permalink
One thing is that the delimiter is actually a tab rather than a space. Is that a problem?
I've checked the trailing spaces: it is zero.
hrodstein: With your version, I get the error message shown in the -0 image.
741: Your version goes through the motions, but it doesn't load the data in the file (-1 image).
For now, I'm going to take a break. I'm using emacs and sed to convert the files to formats I can read in with Igor, and I'll return to this problem later. Thank you two for your efforts -- I'll report back if I make any progress.
April 29, 2011 at 06:06 am - Permalink
Possibly. The example you posted in your first post loaded fine with my code, but it had spaces. Howard's code might be failing for the same reason.
This should deal with tabs:
Notice that all I've done is change " " to "\t" in the calls to StringFromList and ItemsInList.
If this doesn't work either then you might want to include an actual data file.
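
To see why the delimiter matters, here is a small sketch in Python (an illustration only, not the Igor code from this thread). Splitting a tab-delimited line on spaces yields a single item, which would explain a loader that runs without error but finds no data. The helper names count_items and detect_delimiter are hypothetical.

```python
def count_items(line, delimiter):
    """Count the non-empty items obtained by splitting line on delimiter."""
    return len([item for item in line.split(delimiter) if item])

def detect_delimiter(line):
    """Guess the delimiter: a tab if the line contains one, else a space."""
    return "\t" if "\t" in line else " "

# A tab-delimited data line in the layout described in the first post.
line = "2011-04-26_19:34:39.342_UTC\t0\t33162\t32345\t31586"

items_with_space = count_items(line, " ")                   # whole line is one item
items_with_tab = count_items(line, "\t")                    # date/time, unit, 3 values
items_detected = count_items(line, detect_delimiter(line))  # same as with the tab
```

Checking the first data line for a tab before choosing the delimiter, as detect_delimiter does, is one way to handle both kinds of file with the same loader.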
April 29, 2011 at 07:07 am - Permalink
And we have a winner... :-)
I can't explain why the tabs got changed to spaces when I pasted the extract into my original post.
So cool, thanks a lot!
Have a nice weekend!
April 29, 2011 at 08:45 am - Permalink
I did assume that the delimiter was a space since that is what was in your original post.
If you want to send a sample file to support@wavemetrics.com, I will see what is going on. If so, please zip it to prevent line wrapping by the email process.
April 30, 2011 at 10:59 am - Permalink
May 2, 2011 at 08:06 am - Permalink