Reading custom binary file - really slow
Hi there,
I have to import an unusual file type into Igor Pro from some electrophysiology hardware. The data file format is a little odd: it has 1024 bytes of ASCII header info, followed by binary data made up of multiple records, each laid out as follows:
- One little-endian int64 timestamp (actually a sample number; this can be converted to seconds using the sampleRate variable in the header)
- One little-endian uint16 number (N) indicating the samples per record (always 1024, at least for now)
- One little-endian uint16 recording number (version 0.2 and higher)
- 1024 big-endian int16 samples
- 10-byte record marker (0 1 2 3 4 5 6 7 8 255)
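As a cross-check of this layout, the field sizes can be summed in a few lines of Python (used here only for illustration, not Igor). This is a minimal sketch based solely on the list above: the names `HEADER`, `SAMPLES`, `MARKER` and `parse_record` are hypothetical and are not taken from the vendor's code. It shows that one record occupies 2070 bytes, that is 1035 int16 points.

```python
import struct

# Layout of one record, taken from the field list above.
HEADER = struct.Struct("<qHH")      # int64 sample number, uint16 N, uint16 recording number
SAMPLES_PER_RECORD = 1024           # "always 1024, at least for now"
SAMPLES = struct.Struct(">%dh" % SAMPLES_PER_RECORD)  # big-endian int16 samples
MARKER = bytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 255])

RECORD_BYTES = HEADER.size + SAMPLES.size + len(MARKER)


def parse_record(raw, sample_rate):
    """Split one record into (start time in seconds, recording number, samples)."""
    if len(raw) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(raw)))
    sample_number, n, recording = HEADER.unpack_from(raw, 0)
    if n != SAMPLES_PER_RECORD:
        raise ValueError("unexpected samples-per-record value: %d" % n)
    samples = SAMPLES.unpack_from(raw, HEADER.size)
    if raw[HEADER.size + SAMPLES.size:] != MARKER:
        raise ValueError("record marker not found")
    # The timestamp is a sample number; dividing by the sample rate gives seconds.
    return sample_number / sample_rate, recording, samples


print(HEADER.size, SAMPLES.size, len(MARKER), RECORD_BYTES)  # 12 2048 10 2070
```

The 12 bytes before the samples and the 10 bytes after them are the per-record overhead that any loader has to skip.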
I've been using FBinRead to bring the data into 64-bit Igor 7. I have tried two strategies: loading the actual data in chunks of 1024 samples into separate waves, or importing the entire file and then deleting the points that don't contain data (code below). Both methods are painfully slow: each takes around 7.5 minutes to load ~900 seconds of data (30 kHz sampling). Loading the whole binary file (including unwanted info) takes around one second, so the lost time is the massive number of repeated wave operations. Is there a clever way of speeding this up? If it were only 7.5 minutes per experiment, it would be fine, but that is for a single channel and my experiment records 64 channels at a time, so I'm looking at about 8 hours to import an experiment...
Any advice to optimise this would be gratefully received! The hardware manufacturers provide Python and Matlab code that can import a file in less than a minute, so I assume there must be a better way...
Loading 1024-sample chunks:
Make /o /n=1 $(NewName)
Wave ImportedWave = $(NewName)
variable read_point
make /o /n=1024 read_buffer
wave read_buffer
Variable maximum_byte
Fstatus reference
maximum_byte =V_logEOF
start_byte +=12
//start_byte=maximum_byte
Variable bitVs = bitVolts[0]
Do
	if(start_byte<maximum_byte)
		FSetPos reference, start_byte
		make /o /n=1024 $("read_buffer"+num2str(counter))
		wave read_buffer = $("read_buffer"+num2str(counter))
		FBinRead /B=2 /F=2 reference, read_buffer
		read_buffer *= bitVs
		counter+=1
		start_byte+=2070
	endif
while(start_byte<maximum_byte)
Concatenate /kill /o /NP Wavelist("read_buffer*",";",""), ImportedWave
variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
Setscale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6 // convert from µV to V
Deleting unwanted points:
Fstatus reference
Variable maximum_byte
maximum_byte =V_logEOF
Make /o /n=((maximum_byte-1024)/2) $(NewName) // convert size from bytes to int16
Wave ImportedWave = $(NewName)
variable read_point
Variable bitVs = bitVolts[0]
FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
ImportedWave *= bitVs // scale wave
Variable first_point = 0
variable last_point = numpnts(ImportedWave)
// trim out unnecessary points: 12 bytes before and 10 bytes after each block of 1024 samples
// (22 bytes of overhead per record). Loaded binary as int16, so 2 bytes = 1 point
Do
	deletepoints first_point, 6, ImportedWave
	deletepoints (first_point+1024), 5, ImportedWave
	first_point +=1024
while(first_point<last_point)
variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
Setscale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6 // convert from µV to V
The DeletePoints operation is likely the issue. Off the top of my head, I wonder whether you could instead do this ...
* Split the source data wave into separate "chunks" with something like
Duplicate /O /R=[pstart2, pend2] source, chunk2
...
(a cleverer method would be to do this using a matrix and a for loop)
* Concatenate the chunks back to a contiguous wave.
November 4, 2018 at 05:54 am - Permalink
I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.
Instead, create another wave, call it OutputWave, of the correct final size. Then copy the wanted points from ImportedWave to OutputWave. Then kill ImportedWave.
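The idea of allocating the output once and copying only the wanted points can be sketched in Python (for illustration only, not Igor). This assumes the record layout from the original question (12 header bytes, 2048 sample bytes, 10 marker bytes); `extract_samples` and the constant names are hypothetical.

```python
import sys
from array import array

HEADER_BYTES = 12      # int64 sample number + two uint16 fields
SAMPLE_BYTES = 2048    # 1024 int16 samples
MARKER_BYTES = 10
RECORD_BYTES = HEADER_BYTES + SAMPLE_BYTES + MARKER_BYTES  # 2070


def extract_samples(payload):
    """Return the int16 samples from `payload` (file contents after the 1024-byte header)."""
    n_records = len(payload) // RECORD_BYTES       # ignore a trailing partial record
    out = bytearray(n_records * SAMPLE_BYTES)      # allocate the final size once
    src = memoryview(payload)
    for k in range(n_records):
        start = k * RECORD_BYTES + HEADER_BYTES    # first sample byte of record k
        out[k * SAMPLE_BYTES:(k + 1) * SAMPLE_BYTES] = src[start:start + SAMPLE_BYTES]
    samples = array("h")
    samples.frombytes(out)
    if sys.byteorder == "little":
        samples.byteswap()                         # samples are stored big-endian
    return samples
```

Each sample is written exactly once, so the work grows linearly with the file size instead of with its square.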
November 4, 2018 at 09:30 am - Permalink
Thank you both - the concatenate method cut the run time down to about 4 minutes for the same test data, and the duplicating and copying points method from hrodstein got it down to around 6 seconds. Happy days! :)
November 4, 2018 at 12:37 pm - Permalink
In reply to I agree that calling… by hrodstein
Is there a performance issue if you kill points from the "bottom" end of the wave?
November 5, 2018 at 08:54 am - Permalink
Is there a performance issue if you kill points from the "bottom" end of the wave?
Deleting from the end would be better but still not good.
Anytime you delete points, any points after the deleted points must be moved in memory. When done in a loop, this will be slow.
In some cases you can use the Extract operation to create a wave with deleted points. Extract handles the speed issue internally.
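To put rough numbers on this, the following Python sketch (illustration only) counts how many points get shifted when the header and marker points are deleted record by record, compared with copying the samples once. It rests on a simplifying assumption: every deletion moves all points that follow the gap, as described above. Igor's internals may differ in detail, and the function names are hypothetical.

```python
POINTS_BEFORE = 6     # 12-byte record header, in int16 points
POINTS_KEPT = 1024    # samples per record
POINTS_AFTER = 5      # 10-byte record marker, in int16 points
RECORD_POINTS = POINTS_BEFORE + POINTS_KEPT + POINTS_AFTER  # 1035


def moves_when_deleting(n_records):
    """Count points shifted by deleting header and marker points record by record."""
    length = n_records * RECORD_POINTS
    first_point = 0
    moved = 0
    for _ in range(n_records):
        for start, count in ((first_point, POINTS_BEFORE),
                             (first_point + POINTS_KEPT, POINTS_AFTER)):
            moved += length - (start + count)   # everything after the gap slides down
            length -= count
        first_point += POINTS_KEPT
    return moved


def moves_when_copying(n_records):
    """Count points written when copying the samples into a pre-sized output."""
    return n_records * POINTS_KEPT


n = 26367   # roughly 900 s at 30 kHz: 27,000,000 samples / 1024 per record
print(moves_when_deleting(n), moves_when_copying(n))
```

Under this model the deletion loop shifts 1035·n² − 6·n points for n records, which is about 7 × 10^11 for the 900-second example, against about 2.7 × 10^7 for a single copy.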
November 5, 2018 at 09:36 am - Permalink
In reply to Is there a performance issue… by hrodstein
Another approach, which might or might not be faster, is to set the value of points you want to delete to NaN (not a number) within the loop. Then, after the loop has executed, call WaveTransform zapNaNs. This assumes that NaN is an invalid value in your particular waves. If you expect any NaNs, then obviously you wouldn't want to do this.
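The mark-then-remove idea can be sketched in Python as follows (illustration only; `mark_unwanted` and `zap_nans` are hypothetical names, and `zap_nans` merely stands in for what WaveTransform zapNaNs does in Igor). The default point counts assume the record layout from the original question.

```python
import math


def mark_unwanted(values, before=6, kept=1024, after=5):
    """Overwrite the header and marker points of every record with NaN, in place."""
    record = before + kept + after
    for start in range(0, len(values), record):
        for i in range(start, min(start + before, len(values))):
            values[i] = math.nan
        for i in range(start + before + kept, min(start + record, len(values))):
            values[i] = math.nan
    return values


def zap_nans(values):
    """Return a new list without the NaN entries (one pass, no repeated shifting)."""
    return [v for v in values if not math.isnan(v)]
```

For example, with `before=2, kept=6, after=2`, the values 0 to 19 reduce to 2 to 7 followed by 12 to 17. As noted above, this only works if NaN cannot occur in the real data.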
November 5, 2018 at 02:38 pm - Permalink
Is that an ABF v2 file? There exists a loader using the vendor ABF DLL. See https://www.wavemetrics.com/project/bpc_ReadAbf.
November 8, 2018 at 12:41 pm - Permalink
In reply to Is that an ABF v2 file?… by thomas_braun
Nope, it's the OpenEphys data format (http://www.open-ephys.org/), which uses the Intan RHD system. Thankfully, I've solved the problem thanks to one of the previous suggestions.
November 8, 2018 at 12:59 pm - Permalink
In reply to Nope, it's the OpenEphys… by mick6116
Please let the reader from the future know how you solved this problem. They'll be grateful.
November 9, 2018 at 02:29 pm - Permalink
Ah yes, good point. Here we go:
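Function LoadOpenEphysChannel(FileName, NewName) // function name is a placeholder; the original declaration was not included in the post
string FileName, NewName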
print "Start:", time()
SetDataFolder root:
variable start_byte = 1024 // Skip 1024 byte header
variable Reference // file reference
open /R /P=DataFolder Reference as filename // path already defined as "datafolder"
// read header
string buffer
variable buffer2
make /o /n=1 /T format,description,date_created
make /o /n=1 version,header_bytes,channel,channelType,sampleRate,blockLength,bufferSize,bitVolts
Freadline /T=";" reference, buffer
format = buffer
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.version = %f;", buffer2
version = buffer2
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.header_bytes = %f;", buffer2
header_bytes = buffer2
Freadline /T=";" reference, buffer
description = buffer
Freadline /T=";" reference, buffer
date_created = buffer
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.channel = %f;", buffer2
channel = buffer2
Freadline /T=";" reference, buffer
// channelType = buffer2
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.sampleRate = %f;", buffer2
sampleRate = buffer2
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.blockLength = %f;", buffer2
blockLength = buffer2
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.bufferSize = %f;", buffer2
bufferSize = buffer2
Freadline /T=";" reference, buffer
sscanf buffer, " \nheader.bitVolts = %f;", buffer2
bitVolts = buffer2
Fgetpos reference
FSetPos reference, start_byte // Skip header
Fstatus reference
Variable maximum_byte
maximum_byte =V_logEOF
Make /o /n=((maximum_byte-1024)/2) ImportedWave // convert size from bytes to int16 points
variable num_records=floor(numpnts(ImportedWave)/1035) // 2070 bytes = 1035 int16 points per record; ignore any partial record at the end
Make /o /n=(1024*num_records) $(NewName)
Wave OutputWave = $(NewName)
Variable bitVs = bitVolts[0]
FBinRead /B=2 /F=2 reference, ImportedWave // import whole file minus header
ImportedWave *= bitVs // scale wave
// input_left is the offset added to the output index p to reach the matching input point.
// It starts at 6 (the 12-byte record header) and grows by 11 points per record
// (6 header points + 5 marker points of overhead).
Variable input_left=6
variable output_left=0
variable output_right=1023
// copy only the samples: skip 12 bytes before and 10 bytes after each block of 1024 samples.
// Loaded binary as int16, so 2 bytes = 1 point
variable counter
for(counter=0; counter<num_records; counter+=1)
	OutputWave[output_left,output_right]=ImportedWave[p+input_left]
	input_left+=11
	output_left+=1024
	output_right+=1024
endfor
variable samplingF = 1/samplerate[0] // sample interval in seconds
SetScale /P x 0, samplingF, "s", OutputWave
Setscale /P y, 0, 0, "V", OutputWave
OutputWave /= 1e6 // convert from µV to V
close reference
killwaves importedwave
print "End:",time(),"; loaded "+nameofwave(OutputWave)
End
November 10, 2018 at 11:18 am - Permalink