
Importing Data from DFT Calculations
vmmr5596
I'm trying to import data from DFT calculations I've been running. The problem is that the data I want is not always on the same line, and the files are massive. The beginning of the data I want can be identified by a couple of keywords (e.g. Rigid spectral shift, NEXAFS, etc.), as shown below in an excerpt from the original file:
Xray absorption (XAS, NEXAFS) calculation
Core hole found (by occ.) in alpha space, orbital # 6
Core hole located at center C1
Orbital energy core hole = -10.69511 H ( -291.03103 eV)
Rigid spectral shift = 0.00000 eV
Ionization potential = 291.03103 eV
Core -> unocc. excitations (X-ray absorption), dipole only :
E (eV) OSCL oslx osly oslz osc(r2)
---------------------------------------------------------------------------------------
# 1 287.0055 0.007305 0.014191 0.028940 0.000089 0.0000 72.7191
# 2 287.9669 0.000221 0.002439 0.005033 0.000011 0.0000 126.2999
# 3 288.4817 0.004180 -0.010701 -0.021838 -0.000099 0.0000 147.8532
# 4 289.1079 0.000177 0.002519 0.004267 -0.000622 0.0000 128.5743
My current workaround is to copy and paste the portion of the data I want (everything in the first 6 columns below the dashed line) into a separate text file and import that into Igor. This is tedious, however, because the original files are extremely large (they can run to millions of lines), and I would like to automate the process if possible. An example of the pasted file is below:
E (eV) OSCL oslx osly oslz osc(r2)
---------------------------------------------------------------------------------------
1 287.0055 0.007305 0.014191 0.028940 0.000089 0.0000 72.7191
2 287.9669 0.000221 0.002439 0.005033 0.000011 0.0000 126.2999
3 288.4817 0.004180 -0.010701 -0.021838 -0.000099 0.0000 147.8532
4 289.1079 0.000177 0.002519 0.004267 -0.000622 0.0000 128.5743
Out of those millions of lines, I only need about 1000 that contain my data. Here is the current import procedure I'm using for my workaround:
#pragma TextEncoding = "UTF-8"
#pragma rtGlobals=3 // Use modern global access method and strict wave access.

Function LoadDFT(DFTdata,pathName)
	String DFTdata //Desired name of file
	String pathName //Symbolic path where desired file is present
	String DFTFolder=GetDataFolder(1)
	String foldername= "root:"+RemoveEnding(DFTData,".out") //Names folder by taking the file name and removing the .out, necessary for proper file parsing
	String DFTData2=RemoveEnding(DFTData,".out") //Names the files by taking the file name and removing the .out ending
	String columnInfoStr = " " //Contains set of names for each column in the .out file
	columnInfoStr += "C=1,F=0,W=3,N='_skip_';"
	columnInfoStr += "C=1,F=0,W=11,N=EnergyH_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=OS_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMx_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMy_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=11,N=TDMz_"+DFTdata2+";"
	columnInfoStr += "C=1,F=0,W=18,N='_skip_';"
	columnInfoStr += "C=1,F=0,W=18,N='_skip_';"
	NewDataFolder/O/S $foldername //Makes a data folder based
	print DFTData //prints the loaded files
	LoadWave/J/B=columnInfoStr/D/W/E=0/K=0/V={"\t, "," $",1,1}/F={6,1,0}/N/O/Q/P=$pathName DFTData
End
I've attached a much smaller version of the DFT output that I'm trying to import. I'm using Igor 8 in case that helps. Any suggestions? Thanks for your help in advance!
Best,
Vic
http://www.igorexchange.com/node/4856
June 1, 2018 at 08:36 am - Permalink
June 1, 2018 at 08:41 am - Permalink
You could use Open to open the file, FReadLine in a loop with a counter to find the start position by text comparison, and Close to close the file, then construct a LoadWave command to skip the unneeded lines and load the data.
Alternatively, Grep can be used to find a text marker in the file and return the line through V_startParagraph.
June 1, 2018 at 08:46 am - Permalink
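The Open / FReadLine / Close approach suggested above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name FindFirstDataLineSketch is hypothetical, and it assumes the data block always follows the "Core -> unocc. excitations" header and a dashed separator line, as in the excerpt in the question.

```igor
// Hypothetical helper (not from the thread): returns the zero-based line
// number of the first data row, or -1 if the marker text is never found.
Function FindFirstDataLineSketch(pathName, fileName)
	String pathName		// Name of an existing symbolic path
	String fileName		// Name of the DFT output file

	Variable refNum
	Open/R/P=$pathName refNum as fileName	// Open the file read-only

	String buffer
	Variable lineNumber = 0
	Variable foundHeader = 0
	Variable result = -1

	do
		FReadLine refNum, buffer
		if (strlen(buffer) == 0)
			break				// Zero-length string means end of file
		endif
		if (!foundHeader)
			// Assumed marker: the header that precedes the excitation table
			if (strsearch(buffer, "Core -> unocc. excitations", 0) >= 0)
				foundHeader = 1
			endif
		elseif (strsearch(buffer, "-----", 0) >= 0)
			result = lineNumber + 1		// Data starts on the line after the dashes
			break
		endif
		lineNumber += 1
	while (1)

	Close refNum
	return result
End
```

The returned line number could then be passed as the firstLine element of LoadWave's /L flag. Searching for the table header before looking for the dashed line is a design choice intended to avoid matching similar text earlier in the file.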
June 1, 2018 at 04:49 pm - Permalink
needs to be changed to this:
since you are comparing to " # " (your target string) which is three bytes.
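To illustrate the byte-count point, a prefix comparison that takes its length from the target string might look like the sketch below. The helper name LineStartsWithTarget is hypothetical and this is not the snippet from the original reply.

```igor
// Hypothetical helper (not from the thread): returns 1 if the line begins
// with the three-byte target " # ", otherwise 0.
Function LineStartsWithTarget(buffer)
	String buffer		// One line of text, e.g. as read by FReadLine

	String target = " # "
	// buffer[0, strlen(target)-1] extracts exactly as many bytes as the
	// target contains, so the two strings being compared have equal length.
	return CmpStr(buffer[0, strlen(target) - 1], target) == 0
End
```

Deriving the range from strlen(target) rather than hard-coding it keeps the comparison correct if the target string is changed later.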
The next problem is that your target string appears in this line (line 2021, zero-based), which is before your data:
I changed your FindFirstDataLine function to add this:
Then I changed this:
to this:
Now it prints the correct line number: 2032 (zero-based)
The next problem is that your file is space-delimited and the LoadWave operation defaults to comma and tab as delimiters. I fixed this by adding a /V flag:
With that, it seems to do the right thing. That is, it loads this:
1 287.0055 0.007305 0.014191 0.02894 ...
Your next task is to change FindFirstDataLine to FindFirstAndLastDataLines.
But first, there is another problem. Lines 999 and 1000 of the data look like this:
# 999 377.5523 0.000006 -0.000367 0.000177 -0.000682 0.0000 337.5637
#1000 377.7301 0.000005 -0.000327 0.000165 -0.000623 0.0000 151.3746
Because space is a delimiter, and there is no space after the # character in line 1000, that line appears to LoadWave to have one fewer column than line 999. This causes the wrong data to be loaded starting at line 1000. I will give some thought to how to fix this.
June 1, 2018 at 07:13 pm - Permalink
I made this change. It also requires using the /B flag to specify the width in bytes of each column of data. /B also lets you name each column and skip whatever columns you don't want to load.
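For illustration only, a fixed-field load using /F together with /B might look like the sketch below. It is not the code from this reply. The function name, the wave-name prefixes, and above all the field widths are assumptions: the widths are placeholders and must be measured, in bytes, against the actual file. It also assumes the file name minus ".out" is usable inside a standard wave name.

```igor
// Hypothetical sketch (not from the thread): fixed-field load of the data block.
// Every W= width below is a placeholder, not a value taken from the real file.
Function LoadDFTFixedFieldSketch(pathName, fileName, firstLine, numLines)
	String pathName		// Name of an existing symbolic path
	String fileName		// Name of the DFT output file
	Variable firstLine	// Zero-based line number of the first data row
	Variable numLines	// Number of data rows to load

	String suffix = RemoveEnding(fileName, ".out")

	String columnInfoStr = ""
	columnInfoStr += "C=1,F=0,W=6,N='_skip_';"			// "#" plus the row index
	columnInfoStr += "C=1,F=0,W=11,N=Energy_" + suffix + ";"	// E (eV)
	columnInfoStr += "C=1,F=0,W=11,N=OS_" + suffix + ";"		// OSCL
	columnInfoStr += "C=1,F=0,W=11,N=TDMx_" + suffix + ";"	// oslx
	columnInfoStr += "C=1,F=0,W=11,N=TDMy_" + suffix + ";"	// osly
	columnInfoStr += "C=1,F=0,W=11,N=TDMz_" + suffix + ";"	// oslz
	columnInfoStr += "C=1,F=0,W=11,N='_skip_';"			// Unwanted column
	columnInfoStr += "C=1,F=0,W=11,N='_skip_';"			// Unwanted column

	// /F={numColumns, defaultFieldWidth, flags} selects fixed-field loading.
	// /L={nameLine, firstLine, numLines, firstColumn, numColumns} limits the
	// load to the data block located beforehand.
	LoadWave/F={8, 11, 0}/B=columnInfoStr/D/N/O/Q/L={0, firstLine, numLines, 0, 0}/P=$pathName fileName
	return V_flag		// Number of waves loaded
End
```

Because fields are split by byte position rather than by spaces, a row such as "#1000", which has no space after the # character, is divided the same way as the rows before it.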
I also morphed FindFirstDataLine into FindFirstLineAndNumLines.
The result successfully loads your example file:
June 1, 2018 at 09:38 pm - Permalink
Best wishes,
Vic
June 4, 2018 at 03:29 pm - Permalink