Importing Data from DFT Calculations
vmmr5596
I'm trying to import data from DFT calculations I've been running. The problem is that the data I want is not always located at the same line, and the files are massive. The beginning of the data I want can be identified by a couple of keywords (e.g., Rigid spectral shift, NEXAFS, etc.), as shown in this excerpt from the original file:
Xray absorption (XAS, NEXAFS) calculation
Core hole found (by occ.) in alpha space, orbital # 6
Core hole located at center C1
Orbital energy core hole = -10.69511 H ( -291.03103 eV)
Rigid spectral shift = 0.00000 eV
Ionization potential = 291.03103 eV
Core -> unocc. excitations (X-ray absorption), dipole only :
E (eV) OSCL oslx osly oslz osc(r2)
---------------------------------------------------------------------------------------
# 1 287.0055 0.007305 0.014191 0.028940 0.000089 0.0000 72.7191
# 2 287.9669 0.000221 0.002439 0.005033 0.000011 0.0000 126.2999
# 3 288.4817 0.004180 -0.010701 -0.021838 -0.000099 0.0000 147.8532
# 4 289.1079 0.000177 0.002519 0.004267 -0.000622 0.0000 128.5743
My current workaround involves copying and pasting the portion of the data I want (everything in the first 6 columns below the dotted line) into a separate text file and importing that through IGOR. This is tedious, however, since the original files are extremely large (they can run to millions of lines), and I would like to automate the process if possible (example below):
E (eV) OSCL oslx osly oslz osc(r2)
---------------------------------------------------------------------------------------
1 287.0055 0.007305 0.014191 0.028940 0.000089 0.0000 72.7191
2 287.9669 0.000221 0.002439 0.005033 0.000011 0.0000 126.2999
3 288.4817 0.004180 -0.010701 -0.021838 -0.000099 0.0000 147.8532
4 289.1079 0.000177 0.002519 0.004267 -0.000622 0.0000 128.5743
Out of those millions of lines, I only need about 1000 that contain my data. Here is the current import procedure I'm using for my workaround:
#pragma TextEncoding = "UTF-8"
#pragma rtGlobals=3 // Use modern global access method and strict wave access.
Function LoadDFT(DFTdata,pathName)
	String DFTdata //Desired name of file
	String pathName //Symbolic path where desired file is present
	String DFTFolder=GetDataFolder(1)
	String foldername= "root:"+RemoveEnding(DFTData,".out") //Names folder by taking the file name and removing the .out, necessary for proper file parsing
	String DFTData2=RemoveEnding(DFTData,".out") //Names the waves by taking the file name and removing the .out ending
	String columnInfoStr = " " //Contains the name and width of each column in the .out file
	columnInfoStr += "C=1,F=0,W=3,N='_skip_';"
	columnInfoStr += "C=1,F=0,W=11,N=EnergyH_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=OS_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMx_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMy_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMz_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=18,N='_skip_';"
	columnInfoStr += "C=1,F=0,W=18,N='_skip_';"
	NewDataFolder/O/S $foldername //Makes a data folder named after the file and makes it the current folder
	print DFTData //Prints the name of the loaded file
	LoadWave/J/B=columnInfoStr/D/W/E=0/K=0/V={"\t, "," $",1,1}/F={6,1,0}/N/O/Q/P=$pathName DFTData
End
I've attached a much smaller version of the DFT output that I'm trying to import. I'm using Igor 8 in case that helps. Any suggestions? Thanks for your help in advance!
Best,
Vic
http://www.igorexchange.com/node/4856
June 1, 2018 at 08:36 am - Permalink
June 1, 2018 at 08:41 am - Permalink
You could use Open to open the file, FReadLine in a loop with a counter to find the start position by text comparison, Close to close the file, then construct a LoadWave command to skip the unneeded lines and load the data.
Alternatively, Grep can be used to find a text marker in the file and return the line through V_startParagraph.
June 1, 2018 at 08:46 am - Permalink
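The Grep route mentioned above might look roughly like the sketch below. The function name FindLineWithGrep and its parameters are made up for illustration, and the use of /Q (no output to the history area), /Z (no error abort), V_flag and V_value is written from memory of the Grep reference, so it is worth checking against the documentation before relying on it.

```igor
Function FindLineWithGrep(pathName, fileName, regExpr)
	String pathName	// Name of an existing symbolic path
	String fileName	// File name relative to the symbolic path
	String regExpr	// Regular expression that identifies the marker line

	// Scan the file without printing matches (/Q) or aborting on error (/Z)
	Grep/P=$pathName/Q/Z/E=regExpr fileName
	if (V_flag != 0 || V_value == 0)
		return -1	// Error, or no line matched
	endif
	return V_startParagraph	// Zero-based number of the first matching line
End
```

A call such as FindLineWithGrep("home", "Sample.txt", "^\\s*#\\s+1\\s") would look for the first line that starts with "#", whitespace and "1". That pattern is only a guess at what the first data row looks like. If the file contains more than one block of this kind, only the first match is reported.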
Function FindFirstDataLine(pathName, filePath)
	String pathName // Name of symbolic path or ""
	String filePath // Name of file or partial path relative to symbolic path.
	Variable refNum
	Open/R/P=$pathName refNum as filePath
	String buffer, text
	Variable line = 0
	do
		FReadLine refNum, buffer
		if (strlen(buffer) == 0)
			Close refNum
			//print "Can't find keyword"
			return -1 // The expected keyword was not found in the file
		endif
		text = buffer[0,1]
		if (CmpStr(text," # ") == 0)
			Close refNum
			return line + 1 // Success: The next line is the first data line.
			print "Success!"
		endif
		line += 1
	while(1)
	return -1 // We will never get here
End
Function LoadDataFile(pathName, filePath, extension)
	String pathName // Name of symbolic path or "" to display dialog.
	String filePath // Name of file or "" to display dialog. Can also be full or partial path relative to symbolic path.
	String extension // e.g., ".dat" for .dat files. "????" for all files.
	Variable refNum
	// Possibly display Open File dialog.
	if ((strlen(pathName)==0) || (strlen(filePath)==0))
		Open /D /R /P=$pathName /T=(extension) refNum as filePath
		filePath = S_fileName // S_fileName is set by Open/D
		if (strlen(filePath) == 0) // User cancelled?
			return -1
		endif
		// filePath is now a full path to the file.
	endif
	Variable firstDataLine = FindFirstDataLine(pathName, filePath)
	if (firstDataLine < 0)
		Printf "No data found in file %s\r", filePath
		return -1
	endif
	LoadWave /J /D /O /E=1 /K=0 /L={0,firstDataLine,1000,2,5} /P=$pathName filePath
	return 0
End
June 1, 2018 at 04:49 pm - Permalink
In FindFirstDataLine, this line:
	text = buffer[0,1]
needs to be changed to this:
	text = buffer[0,2]
since you are comparing to " # " (your target string), which is three bytes.
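A quick way to see the length mismatch is the small check below. It is only an illustration: the function name and the sample buffer text are made up, and the slice length is taken from strlen so it always matches the target.

```igor
Function DemoPrefixCompare()
	String buffer = " # 1 287.0055"	// Hypothetical start of a data line
	String target = " # "
	Variable n = strlen(target)	// 3 bytes

	Print CmpStr(buffer[0,1], target) == 0	// 0: a 2-byte slice never equals a 3-byte target
	Print CmpStr(buffer[0,n-1], target) == 0	// 1: the slice now has the same length as the target
End
```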
The next problem is that your target string appears in this line (line 2021, zero-based), which is before your data:
I changed your FindFirstDataLine function to add this:
Variable targetStringLength = strlen(targetString)
Then I changed this:
	return line + 1 // Success: The next line is the first data line.
to this:
	return line // Success: This is the first data line.
Now it prints the correct line number: 2032 (zero-based)
The next problem is that your file is space-delimited and the LoadWave operation defaults to comma and tab as delimiters. I fixed this by adding a /V flag:
With that, it seems to do the right thing. That is, it loads this:
1 287.0055 0.007305 0.014191 0.02894 ...
Your next task is to change FindFirstDataLine to FindFirstAndLastDataLines.
But first, there is another problem. Lines 999 and 1000 of the data look like this:
# 999 377.5523 0.000006 -0.000367 0.000177 -0.000682 0.0000 337.5637
#1000 377.7301 0.000005 -0.000327 0.000165 -0.000623 0.0000 151.3746
Because space is a delimiter, and there is no space after the # character in line 1000, that line appears to LoadWave to have one fewer column than line 999. This causes the wrong data to be loaded starting at line 1000. I will give some thought to how to fix this.
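One way around this is to read the fields by byte position, which does not depend on where the spaces fall. The sketch below is only an illustration. The two sample strings are hand-padded stand-ins for the real lines, and the field widths (2 bytes for " #", 4 for the index, 11 for the energy) are assumptions taken from the /B widths used in the loader posted later in this thread.

```igor
Function DemoFixedWidthFields()
	// Hypothetical lines, padded to the assumed field widths of 2, 4 and 11 bytes
	String line999  = " #" + " 999" + "   377.5523"
	String line1000 = " #" + "1000" + "   377.7301"

	// Bytes 2-5 hold the index and bytes 6-16 hold the energy in both lines,
	// whether or not a space follows the # character
	Printf "%d  %.4f\r", str2num(line999[2,5]), str2num(line999[6,16])
	Printf "%d  %.4f\r", str2num(line1000[2,5]), str2num(line1000[6,16])
End
```

Both lines yield an index and an energy from the same byte ranges, which is what a fixed-field load relies on.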
June 1, 2018 at 07:13 pm - Permalink
I made this change: LoadWave now loads the file as fixed-field text (the /F flag) instead of delimited text. This also requires using the /B flag to specify the width in bytes of each column of data. /B also lets you name each column and skip whatever columns you don't want to load.
I also morphed FindFirstDataLine into FindFirstLineAndNumLines.
The result successfully loads your example file:
#pragma rtGlobals=3 // Use modern global access method and strict wave access.
Function FindFirstLineAndNumLines(pathName, filePath, firstDataLine, numDataLines)
	String pathName // Name of symbolic path or ""
	String filePath // Name of file or partial path relative to symbolic path
	Variable &firstDataLine // Pass-by-reference output
	Variable &numDataLines // Pass-by-reference output
	firstDataLine = -1
	numDataLines = -1
	Variable refNum
	Open/R/P=$pathName refNum as filePath
	String buffer, text
	Variable line = 0
	String targetString = " # 1"
	Variable targetStringLength = strlen(targetString)
	// Find first line
	do
		FReadLine refNum, buffer
		if (strlen(buffer) == 0)
			Close refNum
			return -1 // The expected keyword was not found in the file
		endif
		text = buffer[0,targetStringLength-1]
		if (CmpStr(text,targetString) == 0)
			firstDataLine = line
			break // This is the first data line
		endif
		line += 1
	while(1)
	// Find last line. From here on, line is the number of the last line confirmed to be data.
	targetString = " #"
	targetStringLength = strlen(targetString)
	do
		FReadLine refNum, buffer
		if (strlen(buffer) == 0)
			// Ran out of lines - the previous line was the last line of data,
			// so line must not be incremented here
			break
		endif
		text = buffer[0,targetStringLength-1]
		if (CmpStr(text,targetString) != 0) // Line does not start with "<space>#"?
			// This is the line after the last data line
			break
		endif
		line += 1
	while(1)
	numDataLines = line - firstDataLine + 1
	// Print firstDataLine, numDataLines // For debugging only
	Close refNum
	return 0 // Success
End
Function LoadDataFile(pathName, filePath, extension)
	String pathName // Name of symbolic path or "" to display dialog.
	String filePath // Name of file or "" to display dialog. Can also be full or partial path relative to symbolic path.
	String extension // e.g., ".dat" for .dat files. "????" for all files.
	Variable refNum
	// Possibly display Open File dialog.
	if ((strlen(pathName)==0) || (strlen(filePath)==0))
		Open /D /R /P=$pathName /T=(extension) refNum as filePath
		filePath = S_fileName // S_fileName is set by Open/D
		if (strlen(filePath) == 0) // User cancelled?
			return -1
		endif
		// filePath is now a full path to the file.
	endif
	Variable firstDataLine, numLines
	Variable result = FindFirstLineAndNumLines(pathName, filePath, firstDataLine, numLines)
	if (result != 0)
		Printf "No data found in file %s\r", filePath
		return -1
	endif
	// Example Data:
	// # 1 287.0055 0.007305 0.014191 0.028940 0.000089 0.0000 72.7191
	// # 999 377.5523 0.000006 -0.000367 0.000177 -0.000682 0.0000 337.5637
	// #1000 377.7301 0.000005 -0.000327 0.000165 -0.000623 0.0000 151.3746
	String columnInfoStr = "" // Prepare parameter for /B flag
	columnInfoStr += "N='_skip_',W=2;"
	columnInfoStr += "N='Column1',W=4;"
	columnInfoStr += "N='Column2',W=11;"
	columnInfoStr += "N='Column3',W=10;"
	columnInfoStr += "N='Column4',W=11;"
	columnInfoStr += "N='Column5',W=11;"
	columnInfoStr += "N='_skip_',W=11;"
	columnInfoStr += "N='_skip_',W=12;"
	columnInfoStr += "N='_skip_',W=14;"
	LoadWave /F={9, 11, 0} /B=columnInfoStr /D /O /E=1 /K=0 /L={0,firstDataLine,numLines,0,0} /P=$pathName filePath
	return 0
End
Function Test()
	LoadDataFile("home", "Sample.txt", ".txt")
End
June 1, 2018 at 09:38 pm - Permalink
Best wishes,
Vic
June 4, 2018 at 03:29 pm - Permalink