extracting data from a wave to make a new wave

I recently noticed a loading problem with data from a .txt file due to inconsistent spacing in the file. By loading the .txt data into one wave, I was able to load all the data correctly. My problem now is getting the data I want out of this wave. The .txt file contains 4 columns of data, and I need the fourth column of numbers only. I am loading this file as a text wave. A lab mate suggested I use stringbykey, but I haven't figured out how this operator can be incorporated into a useful function. Any ideas?

Here are the functions I'm writing in total. The first loads peak areas from several chromatograms and the second loads the concentrations used for instrument calibration purposes. The idea is to match the correct peak areas with concentration values for each ion by wave name and ultimately generate linear calibration curves for them all - of which i am only interested in the fitted curve's equation.

Thanks in advance for any help.
#pragma rtGlobals=3     // Use modern global access method and strict wave access.
Function load(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNum as filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/O/K=0/A/B="N='_skip_'; N='_skip_'; N='_skip_'; N='Sig1';"/P=$pathName filePath
    wave/T Sig1
    make/o/N=(numPnts(Sig1)) PeakArea
    PeakArea = str2num(Sig1[p])
    MatrixOp/o PeakArea = replace(PeakArea, NaN, 0)
    edit PeakArea
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index = 0; index<12; index+=1)
        name = "PA" + num2str(index)
        extract/o PeakArea, tempwave, p>=14*index && p<=14*index + 13
        duplicate tempwave, $name
    endfor
   
    killwaves tempwave
   
    make/o/N=10 PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
    appendtotable PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
   
    load2("","")
   
End

Function load2(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNumas filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/O/K=1/A/B="N='Concentration';"/P=$pathName filePath
    wave Concentration
    appendtotable Concentration
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index=0; index<12; index+=1)
        name = "Conc" + num2str(index)
        extract/o Concentration, tempwave, p>=10*index && p<= 10*index + 9
        duplicate tempwave, $name
        appendtotable $name
    endfor
   
    killwaves tempwave
   
End
I think we'll need to have you upload your data file in order to figure out what you're up against.

Also, bracket your code with <igor> and </igor> tags.

--Jim Prouty
Software Engineer, WaveMetrics, Inc.
The data looks like this:

1 water n.a. n.a. 2 water n.a. n.a. 3 A1 4.794 0.0080 4 A2 4.797 0.0308 5 A3 4.800 0.0620 6 A4 4.800 0.1190 7 A5 4.800 0.1454 8 A6 4.800 0.5873 9 A7 4.797 1.2083 10 A8 4.797 2.5246 11 snow 4.800 0.3649 12 snow 4.800 0.3662 13 water n.a. n.a. 14 water n.a. n.a. 1 water n.a. n.a. 2 water n.a. n.a. etc...

Loading the wave as text is reasonable given the unusual file formatting.
Loading only column 4 seems to work fine:
    LoadWave/J/D/O/K=0/A/B="N='_skip_'; N='_skip_'; N='_skip_'; N='Sig1';"/P=$pathName filePath


You then replace the n.a. values with 0 (which is a little questionable).

These questions:

A lab mate suggested I use stringbykey, but I haven't figured out how this operator can be incorporated into a useful function. Any ideas?

Here are the functions I'm writing in total. The first loads peak areas from several chromatograms and the second loads the concentrations used for instrument calibration purposes. The idea is to match the correct peak areas with concentration values for each ion by wave name and ultimately generate linear calibration curves for them all - of which i am only interested in the fitted curve's equation.


do not make any sense to me. I may need more coffee.

--Jim Prouty
Software Engineer, WaveMetrics, Inc.
yes, that was my thought originally ;)
however, while playing with the data in a table, i realized that some of the values were not being loaded. for example,
10 C8 9.940 15.7039
In that line of data, the fourth column remains blank when loaded by my program.
Since i can't be sure if the spacing in the txt files will be uniform and can't double check for errors when the deluge of data arrives, this problem seems to be a serious concern.
I did some backtracking by loading the waves in a different manner, and found that loading the txt into one wave works. all the data is there.
That's why I'm asking how to pull a wave from a wave. Does that make my problem clearer?
Thanks!
appaloosahotdog wrote:

That's why I'm asking how to pull a wave from a wave. Does that make my problem clearer?
Thanks!


If you are asking how to extract the 4th column from a 2D text wave into a 1-column ("1-D") numeric wave, then that I can answer:


Function Demo()
    Make/O/N=(20,4)/T tw= num2str(enoise(1))
    Wave fourthColumn =  ExtractTextColumnAsNumericWave(tw, 3, "FourthColumnWave")
End
Function/WAVE ExtractTextColumnAsNumericWave(twoDWave, column, usingName)
    WAVE/T twoDWave
    Variable column
    String usingName

    Variable rows= DimSize(twoDWave,0)
    Make/O/N=(rows) $usingName/WAVE=w
    w= str2num(twoDWave[p][column]) // this is a "Waveform Assignment" statement
    return w
End


--Jim Prouty
Software Engineer, WaveMetrics, Inc.
appaloosahotdog wrote:
yes, that was my thought originally ;)
however, while playing with the data in a table, i realized that some of the values were not being loaded. for example,
10 C8 9.940 15.7039
In that line of data, the fourth column remains blank when loaded by my program.
Since i can't be sure if the spacing in the txt files will be uniform and can't double check for errors when the deluge of data arrives, this problem seems to be a serious concern.
I did some backtracking by loading the waves in a different manner, and found that loading the txt into one wave works. all the data is there.
That's why I'm asking how to pull a wave from a wave. Does that make my problem clearer?
Thanks!


The 'problem' with your data is that some lines contain a fourth tab character, making these lines essentially 5 columns in width. The loading command interprets the data as four columns (probably concluding from the structure of the first line) and then never looks again and just loads the "emptiness" after the 1st tab character, ignoring the data which comes afterwards. That's why loading into one column works (I guess you use something like LoadWave/J/O/K=2/A/V={"", "",0,0}). If you come that far, all that's left to do is to get rid of the additional tab character and reinterpret the data, like this:
Function Extract4thColumn(w)
    wave/T w
    w = ReplaceString("\t\t",w[p], "\t")        // kill 2nd tab
    make/d/o/n=(DimSize(w,0)) Fourthcolumn
    Fourthcolumn = str2num(StringFromList(3, w[p], "\t"))       // get the data
End

Hope that helps.
The "random" occurrence of the 4th tab does make this a problem. While this may not be the most efficient method, you could read 1 line at a time, taking the last entry (treating each row as a list with a tab separator).

function test()

    variable refnum
    open/R  refnum  //modify to take path if you don't want the dialog pop-up
    string str
    variable i=0
    make/t/o/n=(400000) wave0   //make wave extra large to start to ensure enough entries.
                                                    //alternative is to check number of rows first and make the wave to the proper size
    do
        freadline refnum, str
        wave0[i]= stringfromlist(itemsinlist(str, "\t")-1, str, "\t")   //extract last entry for row i
        fstatus refnum
        i+=1
    while (v_filepos<v_logeof)      //stop when we reach end of file
    close refnum
   
    redimension/n=(i) wave0      //delete extra entries from initially large array
    //add code here to change "n.a" from text file to NaN
    //modify as needed to accept file path and name wave properly
end
Hi guys,

I've succeeded in loading the txt file into a single wave with this code:

LoadWave/J/D/E=1/K=2/A/V={""," $",0,0}/L={0,0,0,0,1}/P=$pathName filePath
    wave wave0


But neither of the functions you've suggested extract the fourth column of data from wave0.

attached is the full code using jim's suggestion:

Function load(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNum as filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/E=1/K=2/A/V={""," $",0,0}/L={0,0,0,0,1}/P=$pathName filePath
    wave wave0
    ExtractTextColumnAsNumericWave(wave0, 4, "Sig1")
   
    wave/T Sig1
    make/o/N=(numPnts(Sig1)) PeakArea
    PeakArea = str2num(Sig1[p])
    MatrixOp/o PeakArea = replace(PeakArea, NaN, 0)
    edit PeakArea
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index = 0; index<12; index+=1)
        name = "PA" + num2str(index)
        extract/o PeakArea, tempwave, p>=14*index && p<=14*index + 13
        duplicate tempwave, $name
    endfor
   
    killwaves tempwave
   
    make/o/N=10 PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
    appendtotable PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
   
    load2("","")
   
End

Function load2(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNumas filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/O/K=1/A/B="N='Concentration';"/P=$pathName filePath
    wave Concentration
    appendtotable Concentration
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index=0; index<12; index+=1)
        name = "Conc" + num2str(index)
        extract/o Concentration, tempwave, p>=10*index && p<= 10*index + 9
        duplicate tempwave, $name
        appendtotable $name
    endfor
   
    killwaves tempwave
   
End



Function/WAVE ExtractTextColumnAsNumericWave(twoDWave, column, usingName)
    WAVE/T twoDWave
    Variable column
    String usingName
 
    Variable rows= DimSize(twoDWave,0)
    Make/O/N=(rows) $usingName/WAVE=w
    w= str2num(twoDWave[p][column]) // this is a "Waveform Assignment" statement
    return w
End
Chozo, I've tried your suggestion as well and been unable to make it work:

#pragma rtGlobals=3     // Use modern global access method and strict wave access.
Function load(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNum as filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/E=1/K=2/A/V={""," $",0,0}/L={0,0,0,0,1}/P=$pathName filePath
    wave wave0
    Extract4thColumn(wave0)
   
    wave/T Sig1
    make/o/N=(numPnts(Sig1)) PeakArea
    PeakArea = str2num(Sig1[p])
    MatrixOp/o PeakArea = replace(PeakArea, NaN, 0)
    edit PeakArea
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index = 0; index<12; index+=1)
        name = "PA" + num2str(index)
        extract/o PeakArea, tempwave, p>=14*index && p<=14*index + 13
        duplicate tempwave, $name
    endfor
   
    killwaves tempwave
   
    make/o/N=10 PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
    appendtotable PA0, PA1, PA2, PA3, PA4, PA5, PA6, PA7, PA8, PA9, PA10, PA11
   
    load2("","")
   
End

Function load2(pathName, filePath)
    string pathName, filePath
   
    if (strlen(filePath) == 0)
        Open/D/R/P=$pathName refNumas filePath
        if (strlen(S_fileName) == 0)
            return -1
        endif
        filePath = S_fileName
    endif
   
    LoadWave/J/D/O/K=1/A/B="N='Concentration';"/P=$pathName filePath
    wave Concentration
    appendtotable Concentration
   
    variable index
    string name
    make/o/N=10 tempwave
   
    for(index=0; index<12; index+=1)
        name = "Conc" + num2str(index)
        extract/o Concentration, tempwave, p>=10*index && p<= 10*index + 9
        duplicate tempwave, $name
        appendtotable $name
    endfor
   
    killwaves tempwave
   
End



Function Extract4thColumn(w)
    wave/T w
    w = ReplaceString("\t\t",w[p], "\t")
    make/d/o/n=(DimSize(w,0)) Sig1
    Sig1 = str2num(StringFromList(3, w[p], "\t"))
End
I think there is a much easier way to do this. Tell LoadWave that the fourth column (column 3 in Igor-zero-based terms) is numeric:
Function Test2()
    String columnInfo = "N='_skip_';N='_skip_';N='_skip_';N='Sig1',T=4;N='_skip_';"
    LoadWave/J/D/O/A/B=columnInfo /P=$pathName filePath
End


I had to remove the /K flag and add ",T=4" to the /B flag.

I also added an extra "skip" at the end so LoadWave will ignore the extra tabs found on some lines.

This gives you a numeric output wave.

The "n.a" entries load as NaN. If you want to replace NaN with zero, add this:
    String name = StringFromList(0, S_waveNames)        // Name of wave
    Wave w = $name                      // Wave reference
    w = NumType(w)==2 ? 0:w         // Replace NaN with zero

appaloosahotdog wrote:
hrodstein, this method still loads point 163 as value 0 instead of 15.7039.

That's because that line of text in the file has two tabs before column 3 instead of 1. I thought the extra tabs were all at the end of a given line.

In this case I would create a filtered version of the file by replacing tab-tab with tab. For an example of creating a filtered file and loading the filtered file, see:
http://www.igorexchange.com/node/1221

Your FixText routine would look like this:
//  FixText(textIn)
//  Returns cleaned up version of text by replacing <tab><tab> with <tab>
static Function/S FixText(textIn)
    String textIn
 
    String textOut = ReplaceString("\t\t", textIn, "\t")
    return textOut
End


While you are at it you might as well remove the stray tabs at the end of some lines:
//  FixText(textIn)
//  Returns cleaned up version of text by replacing <tab><tab> with <tab>.
//  Also removes a single stray tab at the end of any line.
static Function/S FixText(textIn)
    String textIn
 
    // Replace tab-tab with tab
    String textOut = ReplaceString("\t\t", textIn, "\t")
   
    // Get rid of stray tab at the end of a line
    textOut = ReplaceString("\t\r", textOut, "\r")      // Handles tab-CR
    textOut = ReplaceString("\t\n", textOut, "\n")      // Handles tab-LF
   
    return textOut
End


appaloosahotdog wrote:
Chozo, I've tried your suggestion as well and been unable to make it work:
...


I took your data and used the loading command, then applied my function and it is working just fine. I guess there is something not working with the implementation. For example, here:
...
    LoadWave/J/D/E=1/K=2/A/V={""," $",0,0}/L={0,0,0,0,1}/P=$pathName filePath
    wave/T wave0    // you loaded a text wave, so you need the /T flag here. Also my function eats only text waves
    Extract4thColumn(wave0)
 
    wave Sig1   // the output is a numerical wave, so no need for a /T flag here
    //make/o/N=(numPnts(Sig1)) PeakArea // you don't need this part,  ...
    //PeakArea = str2num(Sig1[p])   //... because it was already done in the function Extract4thColumn
    // (you can work with Sig1 directly from here on)
...


Note also that the 'n.a.' rows turn out as NaNs in the numerical wave obviously. You can convert these later to zeros if you wish.