Loading DPC files in Igor
AskAl
I have been trying recently to load a DPC file (Dynamic photon counting file) on Igor. However I have been having some difficulty with this.
The DPC file format according to the manual of the acquisition program is something like the following:
Header:
Bytes Content
0-1 Characters IM
2-3 Comment length in bytes ( ComLen)
4-5 Width of the image in pixels (iDX)
6-7 Height of the image in lines (iDY)
8-9 X-Offset (iX)
10-11 Y-Offset (iY)
12-13 File type: 2=16 Bit
14-64 Reserved
64-nnn Comment area can contain any information. It is used by the program to store the status string and the Calibration tables (if any)
nnn+1-End Data Area (Starts at address 64+ ComLen)
The Data Area looks like n times the following diagram, where there is one set of such data
for every frame. The first entry is the time stamp relative to the origin in ms, followed by a
the coordinates of all photons counted within this frame. The end of the data for the first
frame is indicated by a delimiter (0xFFFFFFFF). This is repeated for all frames recorded.
Byte Coordinates Remark
00-03 Timestamp Time in ms from first image (long int)
04-05 X0 Photon 0, X-Coord.
06-07 Y0 Photon 0, Y-Coord.
08-09 X1 Photon 1, X-Coord.
10-11 Y1 Photon 1, Y-Coord.
Etc. Etc. Etc.
Etc. Etc. Etc.
20-24 0xFFFFFFFF Delimiter (Long int, all bits set to 1)
I have been trying to load one of these files in with the in built Igor functions but it has not been possible.
Any advice on which load procedure(or command) would better suit this file format? I am really not interested in loading the header, I just want to load the information from the data area in a 2 column wave. Any help would be greatly appreciated.
Alexis
You can read the header, excluding the comment, into and Igor structure. This will allow you to determine where the comment ends and where the data begins.
This is fairly difficult programming and will require at least intermediate Igor programming skills.
Here is an example that illustrates these techniques:
http://www.igorexchange.com/node/5328
I can't tell, from your description, what the layout of the data for a frame is. Specifically, I can't tell how to determine how many XY photon pairs there are (how to determine what Etc. Etc. Etc. means, precisely).
Also, you don't say if the data is big-endian or little-endian.
It would be good if you can find documentation for the file format and post a link to it.
Also, attach a sample file and include an explanation of what you want to get out of it, such as how many waves and what dimensionality.
May 24, 2016 at 03:46 pm - Permalink
My guess is that you read that from the first two bytes, which look similar to the TIFF standard (where the first two bytes contain either 'II' or 'MM').
May 25, 2016 at 10:14 am - Permalink
And my guess is that you meant "either 'IM' or 'MI'".
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
May 25, 2016 at 12:11 pm - Permalink
The code I used (the part that does the actual reading of the photon variables anyway, the full reading function is attached) looks something like this:
Fbinread/U/F=2 refnum, bytereadX
Fbinread/U/F=2 refnum, bytereadY ////======read photons
Fstatus refnum
if(bytereadX==65535 && bytereadY==65535) // if you read a delimeter value
Fsetpos refnum, V_filepos-4 // reset the file position
if(jj!=numofframes-1)
Fbinread/F=5 refnum, delimeter // read out the delimeter and move on except if it is the last frame
fpoint[jj]=ii
jj+=1
ii-=1
else
print/D "End of frames reached breaking: Fileposition=",V_filepos,V_logEoF//,jj,ii
break
endif
else
X0[ii]=bytereadX;Y0[ii]=bytereadY
endif
bytereadY=0;bytereadX=0
endfor
I am trying to speed this up though as the files are of the order of 10MB and even more sometimes and reading them into igor really takes a long time.
Is there some way to parallelize the reading from the file so that I can speed up the loading, or any other tips that may speed up the function? Typically 5X10e4< numofframes < 2X10e5 and ii goes up to 10e^6 and more.
Also here is a link to one of the files if anyone is interested: https://www.dropbox.com/s/34n5d3qtkftqvzo/TR4_GAIN63_I3d.dpt?dl=0
Cheers
AA
October 4, 2016 at 03:37 am - Permalink
I first tried running your function with Igor 7 on Macintosh and it seemed like it was taking forever. In order to see what was being so slow, I did the following:
1. Added
#include <FunctionProfiling>
to the top of the procedure file and then recompiled.2. Added a
BeginFunctionProfiling()
call to your code just before theMake/O/N=2 IM
line3. Added a
EndFunctionProfiling()
call to your code just after yourClose refNum
line.4. Because my initial attempt to load the file seemed to never finish, I added the following code at the bottom of your for loop, just before the endfor:
break
endif
5. Added
return 0
after the call toEndFunctionProfiling()
I then executed your function. It executed quickly (because of the early exit) and then the function profiling report was displayed. The report indicated that about 99% of the execution time was spent on the
Fstatus refnum
line within the for loop.Igor 7 has a new operation
FGetPos
that can be used in place of Fstatus when you only need to have the V_filepos variable set. In this particular call, that seems to be all you're using. ReplacingFstatus refnum
withFGetPos refnum
makes your code much faster. After removing the changes I made above (keeping only the FGetPos instead of Fstatus change), I can now load the test file you made available in about 5 seconds.I suspect that you could still improve the performance of your code even further but maybe this change will make it fast enough for your purposes.
October 4, 2016 at 08:43 am - Permalink
Many thanks for looking into this. Your suggestions did help a lot. Initially the file you said was taking forever was taking about 40s to load now it is down to 2-3s. Yeah I forgot to comment out a couple of the references to other functions, I realized just after uploading the file.
The profiling functions really helped as I wasn't aware of their existence thanks for pointing them out (they will come in handy on the analysis/corellation of the data which is also quite slow atm). Your first straightforward trick to replace the Fstatus with Fgetpos sped up things substantially. I also added some more multithreading references where I could and made some minor changes (reading both X and Y at the same time) and now at least it is workable for the larger files so a 10MB file takes about 30s which is more or less ok (there are a couple of 100MB files that would take about 5min to load but still much better than the previous hour or so that I had to wait).
With the latest alterations now 80% of the time is spent actually reading out from the file ie in the Fbinread command. However there is still a 10% of the time that is used to parse the read out values to the waves that store them in Igor. Is there a way to actually read out directly into some predefined position of a wave/array?
I tried something like
Fbinread/U/F=2 refnum, Y[ii]
but Igor wouldn't have it. Is there any work around for this.AA
October 5, 2016 at 03:14 am - Permalink
Are you mixing the commands to read one frame with those to advance frame-by-frame. If so, I wonder if this approach is compounding the overhead. What would happen if you make a loop just to read a single frame, code that in to a threadsafe function call, and then call it within a loop to read frame-by-frame?
As to the request to speed up the storage ... could this be done by reading one frame into a structure instead of a wave? Or can the system be tricked to read each frame as an XY image instead of reading in to X and Y waves?
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
October 5, 2016 at 04:52 am - Permalink
The data that are read out, that I need for the analysis are the X & Y coordinates and and how many of them are in each shot/frame. So I use two waves for X and Y and one that stores progressively how many points were read before the frame ends. The catch 22, is that you don't know a-priori how many coordinates are in each frame, it varies from frame to frame (with a poissonian distribution) that's why I have to test every time if the frame/shot has ended so that I can then store the value of where this happened as well. So I think it wouldn't be possible to use threadsafe for this type of data readout?!
I tried reading in a [1,2] matrix the XY values instead of single X,Y variables but that didn't really speed up the process.
Since you don't know the number of the points in the frame before you read it out, I don't see how this could be done. There is a process to extract the individual frames into images. However that really doesn't help. The reason the data are stored like this is for compression reasons. You only store the locations of a 2d array where you record a count in each shot. typically this is pretty low so you save a huge amount of space. I once tried to extract the frames into images with a standard tiff compression and a ~1mb dpc file ended up in something like ~2GB of images.
October 5, 2016 at 05:45 am - Permalink
I also wonder if reading the X, Y data in to a string list and then dumping that to a wave after the frame is read would be faster? I think MatrixOP or something? IOW, I wonder if the overhead to sequence the data in to the wave on a point-by-point basis could be reduced by doing the read to a faster storage method and then reprocessing back to a wave _en mass_.
In any case, it sounds as though you have a workable answer. Everything now is just mucking about on the details. :-)
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
October 6, 2016 at 05:44 am - Permalink
Since I was curious to see how much faster this would be, I implemented it (correctly even, I think).
String Setname ="LL"
Prompt SETNAME, "Name of set"
DoPrompt "Input name for Set", setname
if(V_flag)
print "Cancelled by user"
return -1
endif
NewDataFolder/O/S root:$"PS_"+setname
variable dpt=150000,numofFrames,refNum,pos,dptdif=0,ComLen,Nframes
String pathName,str,fullPath,fileFilters = "Data Files (*.dpt,*.dpc):.dpt,.dpc;"
String/G fileName,expdetails=""
Variable/G photonN, timestamp
Open/F=filefilters/R/Z=2 refNum //as fileName // Store results from Open in a safe place.
fullPath = S_fileName
filename=S_filename
if (V_flag == -1)
Print "DemoOpen cancelled by user."
return -1
endif
if (V_flag != 0)
DoAlert 0, "Error in DemoOpen"
return V_flag
endif
Variable start = StopMSTimer(-2)
Make/O/N=2 IM //Store 2 first Characters
Make/O/N=31 header=0 //Make Wave to store header values
Fbinread/U/F=1 refnum, IM //First 2 characters should be IM (73 77)
Fbinread/F=2 refnum, header //read the first 62 bytes of the header
variable ii,jj,BytereadX,BytereadY,numofphotons,delimeter
Make/O/T HContent
Hcontent= {"comment length","width of image","height of image","x offset","y offset","file type 2=16bit","reserved"} // explanation of the header values read for future reference
Fstatus refnum
/// read the first 11 lines (the comment section) into a string
FReadLine refNum, str // Read first line into string variable
expdetails=str
for(ii=0;ii<10;ii+=1) //read following 10 lines and append to expdetails
FReadLine refNum, str
expdetails+=str
endfor
//now get number of frames/exposures from the expdetails
Nframes=str2num(expdetails[strsearch(expdetails,"NrExposure=",0,2)+11,strsearch(expdetails,",",strsearch(expdetails,"NrExposure=",0,2),2)-1])
variable/G threshold=str2num(expdetails[strsearch(expdetails,"threshold=",0,2)+10,strsearch(expdetails,"[",strsearch(expdetails,"threshold=",0,2),2)-1])
numofFrames=Nframes
Variable/G frames=numofFrames
Fsetpos refnum, 64+header[0] // go to the start of the data
Fstatus refnum
// Determine the number of bytes from the start of the data to the end of the file
Variable numDataBytes = V_logEOF-V_filepos
// Each frame has one timestamp (32-bit int = 4 bytes) and one delimiter (32-bit int = 4 bytes) for a total of 8 bytes plus the photon coordinate data
// If we assume that the NrExposure value in the comments is accurate, then we can calculate the expected number
// of photons following this formula:
// numPhotons = (numDataBytes - (# bytes for all non-photon frame data)) / 4
// since each photon coordinate is 2 16-bit (2 byte) integers = 4 bytes
Variable numPhotons = (numDataBytes - (8*numofFrames))/4
Variable intWavePoints = numDataBytes / 4 // 4 bytes per integer
Make/O/N=(intWavePoints)/I/U intWave
Make/O/N=(numPhotons) X0=0,Y0=0
Make/O/N=(numofFrames)/I/U Fpoint
Make/O/N=(numofFrames)/I/U timestamps
// Read in all of the data into an unsigned 32-bit integer wave.
Fbinread refnum, intWave
// Now go through intWave and extract the timestamps and photon coordinates.
Variable intCounter = 0
Variable frameCounter = 0
Variable needTimestamp = 1 // Whether we expect the next integer read from the wave to be a timestamp.
Variable photonCounter = 0
int thisInt, photonXPos, photonYPos
For (intCounter = 0; intCounter < intWavePoints; intCounter += 1)
if (intWave[intCounter] == 0xFFFFFFFF) // This is a frame delimiter
frameCounter += 1
needTimestamp = 1
continue
endif
if (needTimestamp)
timestamps[frameCounter] = intWave[intCounter]
needTimestamp = 0
Fpoint[frameCounter] = intCounter
else
// Extract photon coordinates.
thisInt = intWave[intCounter]
// This logic assumes that the data read from the file is in
// little Endian byte order. That assumption is made elsewhere,
// so it seems appropriate to make it here as well.
photonXPos = thisInt & 0xFFFF
photonYPos = thisInt >> 16
X0[photonCounter] = photonXPos
Y0[photonCounter] = photonYPos
photonCounter += 1
endif
EndFor
Close refNum //close file
printf "File read took %g seconds.\r", (StopMSTimer(-2) - start) / 1e6
fpoint[jj]=ii
Duplicate/O fpoint PhPFr
PhPFr[1,*]=Fpoint[p]-Fpoint[p-1]
wavestats/Q PhPFr
photonN=V_sum
if(numpnts(X0)>photonN)
redimension/N=(photonN) X0,Y0
endif
//Y0-=1
wavestats/Q Y0
variable y1=V_max-V_min+1
wavestats/Q X0
variable X1=V_max-V_min+1
// print x1,y1
//=======================Make 2D image if more than 1 line/streak is read
if(x1>1)
Make/O/D/N=(X1,Y1)/D XYZimageCountRate
Make/O/D/N=(X1,Y1)/D XYZimagecount
Duplicate/O X0 ZZ
wave Im2d=XYZimagecountRate
wave Im2Dc=XYZimagecount
IM2d=0;IM2Dc=0;zz=1 ////zz=1 1 photon for every set of coordinates
ImageFromXYZ /AS {x0,y0,zz}, IM2D, IM2dC
IM2d/=numofframes
endif
//=========================
KillWaves/Z Fpoint,ZZ//,XYZimagecount
#if 0
///test to see if file loaded ok and print relevant values
if(IM[0]==73 && IM [1]==77 && jj==numofframes-1 && V_filepos==V_logEoF) ////add other controls as well
printf "File appears to have loaded fine\r Number of frames read=%g // Photon Coordinates read=%g \r",numofFrames,ii
if(ii!=PhotonN)
printf "read %g more photons than anticipated will try to read the file again \r" -(ii-photonN)
// LoadPhotonCountingDif(-(ii-PhotonN),filename)
else
print/D "Average photons per frame=",ii/numofFrames
print/D "Average photons per line (only valid for DPT files) =",(ii/numofFrames)/(X1)
endif
endif
KillWaves/Z IM
Y0-=1 //get rid of zero pixel that is never occupied and causes nan values
string filetype=filename[strlen(filename)-3,strlen(filename)]
if(cmpstr("dpt",filetype)==0 && X1>1)
print "DPT filetype detected, proceeding to bin data in single line"
// BinInSinglecolumn() //do binning debugging
place0Framesatend()
else
Duplicate/O PhPFr PHline
MakeYavg()
place0Framesatend()
endif
// and place the null frames at the end (speeds up the G calculation algorithm by moving the null values at the end and breaking when the first zero is detected)
#endif
End
This version took about 0.7 seconds to load your test file versus about 2.6 seconds using your original function.
While playing around with this, I think I found an error in your code. You have this:
but I don't think that's right. That formula gives more photons than seem to actually be in the file. If I understand the file format correctly, I think my formula for number of photons is correct.
For what it's worth, the X0 and Y0 waves generated by your version and my version seem to be identical except that your version has (-5, -5) for quite a few rows at the end. I think this is because you're calculating more photons than are actually present in the file.
By the way, you'll need Igor Pro 7 to run my version of the function. Hopefully you're already using IP7.
October 6, 2016 at 09:20 am - Permalink
I have tried the function you posted and I must say that it worked like a charm. I had to spend some time to figure out what the bitwise operations were actually doing as I haven't used any bitwise functions in the past. There was only a small issue with the values stored in fpoint that was giving always 2 counts more than the actual coordinates in the individual frames plus 9 more in the first frame. So I just replaced that part and then everything worked perfectly.
Variable intWavePoints = numDataBytes / 4 // 4 bytes per integer
Make/O/N=(intWavePoints)/I/U intWave
Make/O/N=(numPhotons) X0=0,Y0=0
//Make/O/N=(Frames)/I/U Fpoint
Make/O/N=(Frames)/I/U timestamps
Make/O/N=(Frames) PhPfr //photons per frame
// Read in all of the data into an unsigned 32-bit integer wave.
Fbinread refnum, intWave
Close refNum //close file
// Now go through intWave and extract the timestamps and photon coordinates.
Variable intCounter = 0
Variable frameCounter = 0
Variable needTimestamp = 1 // Whether we expect the next integer read from the wave to be a timestamp.
Variable photonCounter = 0
variable framephotonCounter=0
int thisInt, photonXPos, photonYPos
For (intCounter = 0; intCounter < intWavePoints; intCounter += 1)
if (intWave[intCounter] == 0xFFFFFFFF) // This is a frame delimiter
PhPfr[frameCounter]=framephotonCounter
framephotonCounter=0
frameCounter += 1
needTimestamp = 1
continue
endif
if (needTimestamp)
timestamps[frameCounter] = intWave[intCounter]
needTimestamp = 0
else
// Extract photon coordinates.
thisInt = intWave[intCounter]
// This logic assumes that the data read from the file is in
// little Endian byte order. That assumption is made elsewhere,
// so it seems appropriate to make it here as well.
photonXPos = thisInt & 0xFFFF
photonYPos = thisInt >> 16
X0[photonCounter] = photonXPos
Y0[photonCounter] = photonYPos
photonCounter += 1
framephotonCounter+=1
endif
EndFor
The speed that it loads the files is also unbelievably fast. With your code the biggest file that I have that is about 370MB took less than 1 min which is amazing and I doubt I will have to import files much bigger than that.
Regarding the photon number I had there, you are right it wasn't correct. From the file format listed above I was making out that the delimeter is 5 bytes but obviously that is not correct (?!) and the delimeter is 4 bytes. Also I had mistakenly assumed that there is a single time stamp in the beginning of the data which apparently isn't correct and there is a timestamp in every frame. Due to these mistakes my original photonN estimation was wrong so I had just set it to something I was sure was bigger than the actual number and just resized the coordinate array afterwards, which of course is not good coding and also computationally more expensive.
The -5 values were left over from some debugging initialization I had previously used.
jjweimer ,
I think that the multi-threading the way you suggest it would probably work but since aclight's version works now so well I think it might just be an overkill. The second method that you mention is similar I guess to the implementation of aclight and that seems to work very good. Maybe for files on the GB order the multi-threading would make sense, although as I said this is not on the horizon for the near future, although I know of other users (in other scientific groups) of the instrument that outputs these files and they are integrating over multiple days that would result in several tens of millions of frames since the frame acquisition rate is 100Hz.
Thank you both for your input.
I will upload the loading function when I have finished mucking up the details. :) and then what is left is to time optimize the analysis functions.
October 11, 2016 at 03:16 am - Permalink