
Reading custom binary file - really slow

Hi there,
I have to import an unusual filetype into Igor Pro, from some electrophysiological hardware. The file format is a little odd - it has 1024 bytes of header info in ASCII, followed by binary data arranged as repeated records, each made up of the following chunks:
- One little-endian int64 timestamp (actually a sample number; this can be converted to seconds using the sampleRate variable in the header)
- One little-endian uint16 number (N) indicating the samples per record (always 1024, at least for now)
- One little-endian uint16 recording number (version 0.2 and higher)
- 1024 big-endian int16 samples
- 10-byte record marker (0 1 2 3 4 5 6 7 8 255)
I've been using FBinRead to bring the data into Igor 64 (in Igor 7). My two strategies have been to load the data into separate 1024-point waves and concatenate them, or to import the entire file and then delete the points that aren't data (code below). Both methods are painfully slow - the former takes around 7.5 minutes to load ~900 seconds of data (30 kHz sampling), and the latter takes about the same. Loading the whole binary file (including the unwanted info) takes around one second, so the lost time is down to the massive number of repeated wave operations. Is there a clever way of speeding this up? If it were only 7.5 minutes per experiment it would be fine, but that is for a single channel, and my experiment records 64 channels at a time, so I'm looking at about 7 hours to import an experiment...
Any advice on optimising this would be gratefully received! The hardware manufacturers provide Python and Matlab code that can import a file in less than a minute, so I assume there must be a better way...
Loading 1024-byte chunks:
FSetPos reference, start_byte                 // skip the 1024-byte ASCII header
Make /o /n=1 $(NewName)
Wave ImportedWave = $(NewName)
Variable maximum_byte
FStatus reference
maximum_byte = V_logEOF
Variable bitVs = bitVolts[0]
Variable counter = 0
start_byte += 12                              // skip timestamp (8 bytes) + N (2) + recording number (2)
do
	if (start_byte < maximum_byte)
		FSetPos reference, start_byte
		Make /o /n=1024 $("read_buffer" + num2str(counter))
		Wave read_buffer = $("read_buffer" + num2str(counter))
		FBinRead /B=2 /F=2 reference, read_buffer     // 1024 big-endian int16 samples
		read_buffer *= bitVs
		counter += 1
		start_byte += 2070                    // advance one full record (12 + 2048 + 10 bytes)
	endif
while (start_byte < maximum_byte)
Concatenate /kill /o /NP WaveList("read_buffer*", ";", ""), ImportedWave
Variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6                           // convert from µV to V
Deleting unwanted points:
FSetPos reference, start_byte                 // skip the 1024-byte ASCII header
Variable maximum_byte
FStatus reference
maximum_byte = V_logEOF
Make /o /n=((maximum_byte - 1024)/2) $(NewName)    // size in int16 points, not bytes
Wave ImportedWave = $(NewName)
Variable bitVs = bitVolts[0]
FBinRead /B=2 /F=2 reference, ImportedWave    // import whole file minus header in one read
ImportedWave *= bitVs                         // scale wave
Variable first_point = 0
Variable last_point = numpnts(ImportedWave)
// Trim the unnecessary points: 12 bytes before and 10 bytes after each record's samples.
// The binary was loaded as int16, so 2 bytes = 1 point.
do
	DeletePoints first_point, 6, ImportedWave             // timestamp + N + recording number
	DeletePoints (first_point + 1024), 5, ImportedWave    // 10-byte record marker
	first_point += 1024
while (first_point < last_point)
Variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6                           // convert from µV to V
The deletepoints operation is likely the issue. Off the top of my head, I have to wonder whether you could instead do this ...
* Split the source data wave into separate "chunks" with something like the sketch below (a cleverer method would be to do this using a matrix and a for loop).
* Concatenate the chunks back to a contiguous wave.
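Untested, but something along these lines, assuming the whole file body has already been read into ImportedWave as in your second method (each 2070-byte record becomes 1035 int16 points, with the samples at points 6 through 1029 of the record):

Variable numRecords = floor(numpnts(ImportedWave)/1035)
Variable i
String chunkName, chunkList = ""
for (i = 0; i < numRecords; i += 1)
	chunkName = "chunk" + num2str(i)
	Duplicate /O /R=[i*1035 + 6, i*1035 + 1029] ImportedWave, $chunkName    // samples only
	chunkList += chunkName + ";"
endfor
Concatenate /O /NP /KILL chunkList, OutputWave    // contiguous result; chunks are killed

Building chunkList explicitly in loop order avoids relying on the ordering of WaveList.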
November 4, 2018 at 05:54 am - Permalink
I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.
Instead, create another wave, call it OutputWave, of the correct final size. Then copy the wanted points from ImportedWave to OutputWave. Then kill ImportedWave.
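Untested, but with your record layout (1035 int16 points per record, of which points 6 through 1029 are samples) it would look something like this:

Variable numRecords = floor(numpnts(ImportedWave)/1035)
Make /O /N=(numRecords*1024) OutputWave
Variable i
for (i = 0; i < numRecords; i += 1)
	// copy the 1024 samples of record i, skipping 6 header points before and 5 marker points after
	OutputWave[i*1024, i*1024 + 1023] = ImportedWave[p + 11*i + 6]
endfor
KillWaves ImportedWave

Each iteration is a single block assignment, so nothing ever has to be shuffled point-by-point.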
November 4, 2018 at 09:30 am - Permalink
Thank you both - the concatenate method cut the run time down to about 4 minutes for the same test data, and hrodstein's duplicate-and-copy-points method got it down to around 6 seconds. Happy days! :)
November 4, 2018 at 12:37 pm - Permalink
In reply to I agree that calling… by hrodstein
Is there a performance issue if you kill points from the "bottom" end of the wave?
November 5, 2018 at 08:54 am - Permalink
Is there a performance issue if you kill points from the "bottom" end of the wave?
Deleting from the end would be better but still not good.
Anytime you delete points, any points after the deleted points must be moved in memory. When done in a loop, this will be slow.
In some cases you can use the Extract operation to create a wave with deleted points. Extract handles the speed issue internally.
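For example, with the raw int16 stream from the second method (1035 points per record), something like this (untested):

Extract /O ImportedWave, OutputWave, mod(p, 1035) >= 6 && mod(p, 1035) <= 1029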
November 5, 2018 at 09:36 am - Permalink
In reply to Is there a performance issue… by hrodstein
Another approach, which might or might not be faster, is to set the value of points you want to delete to NaN (not a number) within the loop. Then, after the loop has executed, call WaveTransform zapNaNs. This assumes that NaN is an invalid value in your particular waves. If you expect any NaNs, then obviously you wouldn't want to do this.
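For instance, with the 1035-points-per-record layout a single wave assignment can do the marking, no loop required (this assumes ImportedWave is a floating-point wave, since integer waves cannot hold NaN):

ImportedWave = (mod(p, 1035) < 6 || mod(p, 1035) > 1029) ? NaN : ImportedWave[p]
WaveTransform zapNaNs, ImportedWave    // remove the NaN-marked points in one pass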
November 5, 2018 at 02:38 pm - Permalink
Is that an ABF v2 file? There exists a loader using the vendor ABF DLL. See https://www.wavemetrics.com/project/bpc_ReadAbf.
November 8, 2018 at 12:41 pm - Permalink
In reply to Is that an ABF v2 file?… by thomas_braun
Nope, it's the OpenEphys data format (http://www.open-ephys.org/), which uses the Intan RHD system. Thankfully, I've solved the problem thanks to one of the previous suggestions.
November 8, 2018 at 12:59 pm - Permalink
In reply to Nope, it's the OpenEphys… by mick6116
Please let the reader from the future know how you solved this problem. They'll be grateful.
November 9, 2018 at 02:29 pm - Permalink
Ah yes, good point. Here we go:
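Following hrodstein's suggestion, the core of it is: one FBinRead for the whole file body, then a block copy of the samples from each record. Simplified to a single channel (reference, start_byte, NewName, bitVolts and samplerate come from my header-parsing code, as in the snippets above; RawStream is just a scratch wave):

FStatus reference
Variable maximum_byte = V_logEOF
FSetPos reference, start_byte                     // skip the 1024-byte ASCII header
Make /O /N=((maximum_byte - 1024)/2) RawStream    // whole file body as int16 points
FBinRead /B=2 /F=2 reference, RawStream           // single fast read, ~1 second
Variable numRecords = floor(numpnts(RawStream)/1035)    // 2070 bytes per record = 1035 points
Make /O /N=(numRecords*1024) $(NewName)
Wave ImportedWave = $(NewName)
Variable i
for (i = 0; i < numRecords; i += 1)
	// keep the 1024 samples; skip 6 header points before and 5 marker points after
	ImportedWave[i*1024, i*1024 + 1023] = RawStream[p + 11*i + 6]
endfor
KillWaves RawStream
ImportedWave *= bitVolts[0]/1e6                   // scale and convert from µV to V
SetScale /P x 0, 1/samplerate[0], "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave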
November 10, 2018 at 11:18 am - Permalink