Importing Data from DFT Calculations

#pragma TextEncoding = "UTF-8"
#pragma rtGlobals=3     // Use modern global access method and strict wave access.

Function LoadDFT(DFTdata,pathName)
    String DFTdata      // Desired name of file
    String pathName     // Symbolic path where desired file is present

    String DFTFolder=GetDataFolder(1)
    String foldername= "root:"+RemoveEnding(DFTData,".out")    // Names folder by taking the file name and removing the .out, necessary for proper file parsing
    String DFTData2=RemoveEnding(DFTData,".out")    // Names the files by taking the file name and removing the .out ending

    String columnInfoStr = " "      // Contains set of names for each column in the .out file
    columnInfoStr += "C=1,F=0,W=3,N='_skip_';"
    columnInfoStr += "C=1,F=0,W=11,N=EnergyH_"+DFTdata2+";"
    columnInfoStr += "C=1,F=0,W=11,N=OS_"+DFTdata2+";"
    columnInfoStr += "C=1,F=0,W=11,N=TDMx_"+DFTdata2+";"
    columnInfoStr += "C=1,F=0,W=11,N=TDMy_"+DFTdata2+";"
    columnInfoStr += "C=1,F=0,W=11,N=TDMz_"+DFTdata2+";"
    columnInfoStr += "C=1,F=0,W=18,N='_skip_';"
    columnInfoStr += "C=1,F=0,W=18,N='_skip_';"

    NewDataFolder/O/S $foldername   // Makes a data folder based
    print DFTData                   // prints the loaded files
    LoadWave/J/B=columnInfoStr/D/W/E=0/K=0/V={"\t, "," $",1,1}/F={6,1,0}/N/O/Q/P=$pathName DFTData
End

hrodstein

Here is an example of how you can determine the first line containing data:
http://www.igorexchange.com/node/4856

June 1, 2018 at 08:36 am - Permalink

thomas_braun

Do you know the exact line where loading should start? If yes, you can use the /L flag of LoadWave. If no, you can try to find it via Open/FReadLine etc. If FReadLine is too slow and the file fits in memory, you can also use FBinRead to read it in one shot.
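
A minimal sketch of the one-shot approach, in case it helps. The function name FindMarkerLineInOneShot and its parameters are made up for illustration. It assumes the file fits in memory and uses LF or CRLF line endings, and it finds the marker anywhere in a line, not only at the start:

```igor
// Hypothetical helper: returns the zero-based number of the first line
// that contains marker, or -1 if the file can't be opened or has no match.
Function FindMarkerLineInOneShot(pathName, fileName, marker)
    String pathName     // Name of an existing symbolic path
    String fileName     // File name relative to the symbolic path
    String marker       // Text to search for

    Variable refNum
    Open/R/Z/P=$pathName refNum as fileName
    if (V_flag != 0)
        return -1                               // File could not be opened
    endif

    FStatus refNum                              // Sets V_logEOF to the file size in bytes
    String contents = PadString("", V_logEOF, 0x20)
    FBinRead refNum, contents                   // Reads strlen(contents) bytes, i.e. the whole file
    Close refNum

    Variable pos = strsearch(contents, marker, 0)
    if (pos < 0)
        return -1                               // Marker not found
    endif

    // Count the linefeeds that precede the match to get the line number.
    // Assumption: lines end in LF or CRLF (a CR-only file would need "\r" here).
    Variable line = 0, start = 0, nl
    do
        nl = strsearch(contents, "\n", start)
        if (nl < 0 || nl >= pos)
            break
        endif
        line += 1
        start = nl + 1
    while(1)

    return line
End
```

The returned value can then be used as the firstLine parameter of LoadWave's /L flag.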

June 1, 2018 at 08:41 am - Permalink

tony

So what you need to know is the start (and perhaps end) position of the data in the file.

You could use Open to open the file, FReadline in a loop with a counter to find the start position by text comparison, Close to close the file, then construct a LoadWave command to skip the unneeded lines and load the data.

Alternatively, Grep can be used to find a text marker in the file and return the line through V_startParagraph.
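
A rough sketch of the Grep route, assuming Grep is run with no destination and with /Q so that it only sets its output variables. The function name FindFirstMatchingLine is hypothetical, and the V_value check reflects my understanding that V_value holds the number of matching lines:

```igor
// Hypothetical helper: returns the zero-based number of the first line
// matching regExp, or -1 on error or if nothing matched.
Function FindFirstMatchingLine(pathName, fileName, regExp)
    String pathName     // Name of an existing symbolic path
    String fileName     // File name relative to the symbolic path
    String regExp       // Regular expression, e.g. "^ #   1"

    Grep/Q/P=$pathName/E=regExp fileName    // No destination: only output variables are set
    if (V_flag != 0 || V_value == 0)
        return -1                           // Error, or no line matched
    endif

    return V_startParagraph                 // Line number of the first match
End
```

For example, Print FindFirstMatchingLine("home", "Sample.txt", "^ #   1") should print the line number of the first line that begins with that marker, if the file contains one.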

June 1, 2018 at 08:46 am - Permalink

vmmr5596

Hmmm, so I tried to modify hrodstein's example for my data set, but I keep getting the message "No data found in file". For some reason, it's not finding the keyword I specify. I've tried using different keywords but none are being registered/found. I think the problem might be due to the buffer or the text strings. Any thoughts?

#pragma rtGlobals=3     // Use modern global access method and strict wave access.
 
Function FindFirstDataLine(pathName, filePath)
    String pathName     // Name of symbolic path or ""
    String filePath         // Name of file or partial path relative to symbolic path.
 
    Variable refNum
 
    Open/R/P=$pathName refNum as filePath
    
    String buffer, text
    Variable line = 0
 
    do
        FReadLine refNum, buffer
        if (strlen(buffer) == 0)
            Close refNum
            //print "Can't find keyword"
            return -1                       // The expected keyword was not found in the file
        endif
        text = buffer[0,1]
        if (CmpStr(text," # ") == 0)        
            Close refNum
            return line + 1                 // Success: The next line is the first data line.
            print "Success!"
        endif
        line += 1
    while(1)
 
    return -1       // We will never get here
End
 
Function LoadDataFile(pathName, filePath, extension)
    String pathName     // Name of symbolic path or "" to display dialog.
    String filePath         // Name of file or "" to display dialog. Can also be full or partial path relative to symbolic path.
    String extension            // e.g., ".dat" for .dat files. "????" for all files.
 
    Variable refNum
 
    // Possibly display Open File dialog.
    if ((strlen(pathName)==0) || (strlen(filePath)==0))
        Open /D /R /P=$pathName /T=(extension) refNum as filePath
        filePath = S_fileName           // S_fileName is set by Open/D
        if (strlen(filePath) == 0)      // User cancelled?
            return -1
        endif
        // filePath is now a full path to the file.
    endif
 
    Variable firstDataLine = FindFirstDataLine(pathName, filePath)
 
    if (firstDataLine < 0)
        Printf "No data found in file %s\r", filePath
        return -1
    endif
 
    LoadWave /J /D /O /E=1 /K=0 /L={0,firstDataLine,1000,2,5} /P=$pathName filePath
 
    return 0
End

June 1, 2018 at 04:49 pm - Permalink

hrodstein

The first problem is that this:

text = buffer[0,1]

needs to be changed to this:

text = buffer[0,2]

since you are comparing to " # " (your target string) which is three bytes.

The next problem is that your target string appears in this line (line 2021, zero-based), which is before your data:

 Core hole found (by occ.) in alpha space, orbital #   6

I changed your FindFirstDataLine function to add this:

String targetString = " #   1"
Variable targetStringLength = strlen(targetString)

Then I changed this:

return line+1                   // Success: The next line is the first data line.

to this:

// Print line   // For debugging only
return line                 // Success: This is the first data line.

Now it prints the correct line number: 2032 (zero-based)

The next problem is that your file is space-delimited and the LoadWave operation defaults to comma and tab as delimiters. I fixed this by adding a /V flag:

LoadWave /J /D /O /E=1 /K=0 /L={0,firstDataLine,1000,2,5} /V={" ", "", 0, 0} /P=$pathName filePath

With that, it seems to do the right thing. That is, it loads this:


1	287.0055	0.007305	0.014191	0.02894
...

Your next task is to change FindFirstDataLine to FindFirstAndLastDataLines.

But first, there is another problem. Lines 999 and 1000 of the data look like this:


 # 999   377.5523  0.000006  -0.000367   0.000177  -0.000682      0.0000      337.5637
 #1000   377.7301  0.000005  -0.000327   0.000165  -0.000623      0.0000      151.3746

Because space is a delimiter, and there is no space after the # character in line 1000, that line appears to LoadWave to have one fewer column than line 999. This causes the wrong data to be loaded starting at line 1000. I will give some thought to how to fix this.
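
To see the mismatch concretely, here is a small sketch that counts the space-separated fields in a line. The function name CountSpaceSeparatedFields is made up for illustration, and this is only a simple field count, not LoadWave's actual parsing logic:

```igor
// Hypothetical helper: counts runs of non-space characters in lineText.
Function CountSpaceSeparatedFields(lineText)
    String lineText

    Variable count = 0, inField = 0, i
    Variable numChars = strlen(lineText)
    for (i = 0; i < numChars; i += 1)
        if (CmpStr(lineText[i], " ") == 0)
            inField = 0             // A space ends the current field
        elseif (!inField)
            inField = 1             // First character of a new field
            count += 1
        endif
    endfor

    return count
End
```

Called on the two lines above, it returns 9 for line 999 ("#" and "999" are separate fields) and 8 for line 1000 ("#1000" is a single field).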

June 1, 2018 at 07:13 pm - Permalink

hrodstein

Your file is actually a FORTRAN-style fixed field file and so can be loaded using LoadWave/F instead of LoadWave/J.

I made this change. It also requires using the /B flag to specify the width in bytes of each column of data. /B also lets you name each column and skip whatever columns you don't want to load.
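
As an illustration of why fixed widths avoid the "#1000" problem, here is a sketch that slices one line using the same field widths as the columnInfoStr in the code in this post. The function name PrintFixedFields is hypothetical; it only prints the slices and does not load anything:

```igor
// Hypothetical helper: prints each fixed-width field of one data line.
Function PrintFixedFields(lineText)
    String lineText

    // Field widths in bytes, matching the W= values passed to the /B flag
    String widths = "2;4;11;10;11;11;11;12;14;"

    Variable numFields = ItemsInList(widths)
    Variable i, width, start = 0
    for (i = 0; i < numFields; i += 1)
        width = str2num(StringFromList(i, widths))
        Print i, lineText[start, start + width - 1]    // Bytes start..start+width-1
        start += width
    endfor
End
```

Because the second field is always bytes 2 through 5, it comes out as " 999" for line 999 and "1000" for line 1000, regardless of whether a space follows the # character.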

I also morphed FindFirstDataLine into FindFirstLineAndNumLines.

The result successfully loads your example file:

#pragma TextEncoding = "UTF-8"
#pragma rtGlobals=3     // Use modern global access method and strict wave access.
 
Function FindFirstLineAndNumLines(pathName, filePath, firstDataLine, numDataLines)
    String pathName     // Name of symbolic path or ""
    String filePath         // Name of file or partial path relative to symbolic path
    Variable &firstDataLine // Pass-by-reference output
    Variable &numDataLines  // Pass-by-reference output
 
    firstDataLine = -1
    numDataLines = -1
 
    Variable refNum
 
    Open/R/P=$pathName refNum as filePath
 
    String buffer, text
    Variable line = 0
    
    String targetString = " #   1"
    Variable targetStringLength = strlen(targetString)
    
    // Find first line
    do
        FReadLine refNum, buffer
        if (strlen(buffer) == 0)
            Close refNum
            return -1                       // The expected keyword was not found in the file
        endif
        text = buffer[0,targetStringLength-1]
        if (CmpStr(text,targetString) == 0) 
            firstDataLine = line
            break                           // This is the first data line
        endif
        line += 1
    while(1)
    
    // Find last line
    targetString = " #"
    targetStringLength = strlen(targetString)
    do
        FReadLine refNum, buffer
        if (strlen(buffer) == 0)
            // Ran out of lines - the previous line was the last line of data
            break
        endif
        text = buffer[0,targetStringLength-1]
        if (CmpStr(text,targetString) != 0) // Line does not start with "<space>#"?
            // This is the line after the last data line
            break   
        endif
        line += 1
    while(1)
    
    numDataLines = line - firstDataLine + 1
    
    // Print firstDataLine, numDataLines        // For debugging only
 
    Close refNum
 
    return 0        // Success
End
 
Function LoadDataFile(pathName, filePath, extension)
    String pathName     // Name of symbolic path or "" to display dialog.
    String filePath         // Name of file or "" to display dialog. Can also be full or partial path relative to symbolic path.
    String extension            // e.g., ".dat" for .dat files. "????" for all files.
 
    Variable refNum
 
    // Possibly display Open File dialog.
    if ((strlen(pathName)==0) || (strlen(filePath)==0))
        Open /D /R /P=$pathName /T=(extension) refNum as filePath
        filePath = S_fileName           // S_fileName is set by Open/D
        if (strlen(filePath) == 0)      // User cancelled?
            return -1
        endif
        // filePath is now a full path to the file.
    endif
 
    Variable firstDataLine, numLines 
    Variable result = FindFirstLineAndNumLines(pathName, filePath, firstDataLine, numLines)
    if (result != 0)
        Printf "No data found in file %s\r", filePath
        return -1
    endif
 
    // Example Data:
    // #   1   287.0055  0.007305   0.014191   0.028940   0.000089      0.0000       72.7191
    // # 999   377.5523  0.000006  -0.000367   0.000177  -0.000682      0.0000      337.5637
    // #1000   377.7301  0.000005  -0.000327   0.000165  -0.000623      0.0000      151.3746
    
    String columnInfoStr = ""       // Prepare parameter for /B flag
    columnInfoStr += "N='_skip_',W=2;"
    columnInfoStr += "N='Column1',W=4;"
    columnInfoStr += "N='Column2',W=11;"
    columnInfoStr += "N='Column3',W=10;"
    columnInfoStr += "N='Column4',W=11;"
    columnInfoStr += "N='Column5',W=11;"
    columnInfoStr += "N='_skip_',W=11;"
    columnInfoStr += "N='_skip_',W=12;"
    columnInfoStr += "N='_skip_',W=14;"
    
    LoadWave /F={9, 11, 0} /B=columnInfoStr /D /O /E=1 /K=0 /L={0,firstDataLine,numLines,0,0} /P=$pathName filePath
 
    return 0
End
 
Function Test()
    LoadDataFile("home", "Sample.txt", ".txt")
End

June 1, 2018 at 09:38 pm - Permalink

vmmr5596

That worked beautifully! Thank you very much hrodstein for the detailed explanation.

Best wishes,

Vic

June 4, 2018 at 03:29 pm - Permalink