
Reading custom binary file - really slow

Hi there,
I have to import an unusual filetype into Igor Pro, from some electrophysiological hardware. The file format is a little odd - it has 1024 bytes of header info in ASCII, followed by binary data arranged as repeated records, each made up of the following chunks:
- One little-endian int64 timestamp (actually a sample number; this can be converted to seconds using the sampleRate variable in the header)
- One little-endian uint16 number (N) indicating the samples per record (always 1024, at least for now)
- One little-endian uint16 recording number (version 0.2 and higher)
- 1024 big-endian int16 samples
- 10-byte record marker (0 1 2 3 4 5 6 7 8 255)
I've been using FBinRead to bring the data into Igor 64 (in Igor 7). My two strategies have been to load the data into separate 1024-point waves and concatenate them, or to import the entire file and then delete the points that aren't data (code below). Both methods are painfully slow - the former takes around 7.5 minutes to load ~900 seconds of data (30 kHz sampling), and the latter takes about the same. Loading the whole binary file (including the unwanted info) takes around one second, so the lost time is down to the massive number of repeated wave operations. Is there a clever way of speeding this up? If it were only 7.5 minutes per experiment it would be fine, but that is for a single channel, and my experiment records 64 channels at a time, so I'm looking at about 7 hours to import an experiment...
Any advice on optimising this would be gratefully received! The hardware manufacturers provide Python and Matlab code that can import a file in less than a minute, so I assume there must be a better way...
Loading 1024-byte chunks:
FSetPos reference, start_byte                 // skip the 1024-byte ASCII header
Make /o /n=1 $(NewName)
Wave ImportedWave = $(NewName)
Variable maximum_byte
FStatus reference
maximum_byte = V_logEOF
Variable bitVs = bitVolts[0]
Variable counter = 0
start_byte += 12                              // skip timestamp (8 bytes) + N (2) + recording number (2)
do
	if (start_byte < maximum_byte)
		FSetPos reference, start_byte
		Make /o /n=1024 $("read_buffer" + num2str(counter))
		Wave read_buffer = $("read_buffer" + num2str(counter))
		FBinRead /B=2 /F=2 reference, read_buffer     // 1024 big-endian int16 samples
		read_buffer *= bitVs
		counter += 1
		start_byte += 2070                    // advance one full record (12 + 2048 + 10 bytes)
	endif
while (start_byte < maximum_byte)
Concatenate /kill /o /NP WaveList("read_buffer*", ";", ""), ImportedWave
Variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6                           // convert from µV to V
Deleting unwanted points:
FSetPos reference, start_byte                 // skip the 1024-byte ASCII header
Variable maximum_byte
FStatus reference
maximum_byte = V_logEOF
Make /o /n=((maximum_byte - 1024)/2) $(NewName)    // size in int16 points, not bytes
Wave ImportedWave = $(NewName)
Variable bitVs = bitVolts[0]
FBinRead /B=2 /F=2 reference, ImportedWave    // import whole file minus header in one read
ImportedWave *= bitVs                         // scale wave
Variable first_point = 0
Variable last_point = numpnts(ImportedWave)
// Trim the unnecessary points: 12 bytes before and 10 bytes after each record's samples.
// The binary was loaded as int16, so 2 bytes = 1 point.
do
	DeletePoints first_point, 6, ImportedWave             // timestamp + N + recording number
	DeletePoints (first_point + 1024), 5, ImportedWave    // 10-byte record marker
	first_point += 1024
while (first_point < last_point)
Variable samplingF = 1/samplerate[0]
SetScale /P x 0, samplingF, "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave
ImportedWave /= 1e6                           // convert from µV to V
The deletepoints operation is likely the issue. Off the top of my head, I have to wonder whether you could instead do this ...
* Split the source data wave into separate "chunks" with something like the sketch below (a cleverer method would be to do this using a matrix and a for loop).
* Concatenate the chunks back to a contiguous wave.
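Untested, but something along these lines, assuming the whole file body has already been read into ImportedWave as in your second method (each 2070-byte record becomes 1035 int16 points, with the samples at points 6 through 1029 of the record):

Variable numRecords = floor(numpnts(ImportedWave)/1035)
Variable i
String chunkName, chunkList = ""
for (i = 0; i < numRecords; i += 1)
	chunkName = "chunk" + num2str(i)
	Duplicate /O /R=[i*1035 + 6, i*1035 + 1029] ImportedWave, $chunkName    // samples only
	chunkList += chunkName + ";"
endfor
Concatenate /O /NP /KILL chunkList, OutputWave    // contiguous result; chunks are killed

Building chunkList explicitly in loop order avoids relying on the ordering of WaveList.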
November 4, 2018 at 05:54 am - Permalink
I agree that calling DeletePoints in a loop will be slow. Each time you delete a point, all subsequent points in the wave have to be moved in memory.
Instead, create another wave, call it OutputWave, of the correct final size. Then copy the wanted points from ImportedWave to OutputWave. Then kill ImportedWave.
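Untested, but with your record layout (1035 int16 points per record, of which points 6 through 1029 are samples) it would look something like this:

Variable numRecords = floor(numpnts(ImportedWave)/1035)
Make /O /N=(numRecords*1024) OutputWave
Variable i
for (i = 0; i < numRecords; i += 1)
	// copy the 1024 samples of record i, skipping 6 header points before and 5 marker points after
	OutputWave[i*1024, i*1024 + 1023] = ImportedWave[p + 11*i + 6]
endfor
KillWaves ImportedWave

Each iteration is a single block assignment, so nothing ever has to be shuffled point-by-point.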
November 4, 2018 at 09:30 am - Permalink
Thank you both - the concatenate method cut the run time down to about 4 minutes for the same test data, and hrodstein's duplicate-and-copy-points method got it down to around 6 seconds. Happy days! :)
November 4, 2018 at 12:37 pm - Permalink
In reply to I agree that calling… by hrodstein
Is there a performance issue if you kill points from the "bottom" end of the wave?
November 5, 2018 at 08:54 am - Permalink
Is there a performance issue if you kill points from the "bottom" end of the wave?
Deleting from the end would be better but still not good.
Anytime you delete points, any points after the deleted points must be moved in memory. When done in a loop, this will be slow.
In some cases you can use the Extract operation to create a wave with deleted points. Extract handles the speed issue internally.
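For example, with the raw int16 stream from the second method (1035 points per record), something like this (untested):

Extract /O ImportedWave, OutputWave, mod(p, 1035) >= 6 && mod(p, 1035) <= 1029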
November 5, 2018 at 09:36 am - Permalink
In reply to Is there a performance issue… by hrodstein
Another approach, which might or might not be faster, is to set the value of points you want to delete to NaN (not a number) within the loop. Then, after the loop has executed, call WaveTransform zapNaNs. This assumes that NaN is an invalid value in your particular waves. If you expect any NaNs, then obviously you wouldn't want to do this.
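For instance, with the 1035-points-per-record layout a single wave assignment can do the marking, no loop required (this assumes ImportedWave is a floating-point wave, since integer waves cannot hold NaN):

ImportedWave = (mod(p, 1035) < 6 || mod(p, 1035) > 1029) ? NaN : ImportedWave[p]
WaveTransform zapNaNs, ImportedWave    // remove the NaN-marked points in one pass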
November 5, 2018 at 02:38 pm - Permalink
Is that an ABF v2 file? There exists a loader using the vendor ABF DLL. See https://www.wavemetrics.com/project/bpc_ReadAbf.
November 8, 2018 at 12:41 pm - Permalink
In reply to Is that an ABF v2 file?… by thomas_braun
Nope, it's the OpenEphys data format (http://www.open-ephys.org/), which uses the Intan RHD system. Thankfully, I've solved the problem thanks to one of the previous suggestions.
November 8, 2018 at 12:59 pm - Permalink
In reply to Nope, it's the OpenEphys… by mick6116
Please let the reader from the future know how you solved this problem. They'll be grateful.
November 9, 2018 at 02:29 pm - Permalink
Ah yes, good point. Here we go:
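Following hrodstein's suggestion, the core of it is: one FBinRead for the whole file body, then a block copy of the samples from each record. Simplified to a single channel (reference, start_byte, NewName, bitVolts and samplerate come from my header-parsing code, as in the snippets above; RawStream is just a scratch wave):

FStatus reference
Variable maximum_byte = V_logEOF
FSetPos reference, start_byte                     // skip the 1024-byte ASCII header
Make /O /N=((maximum_byte - 1024)/2) RawStream    // whole file body as int16 points
FBinRead /B=2 /F=2 reference, RawStream           // single fast read, ~1 second
Variable numRecords = floor(numpnts(RawStream)/1035)    // 2070 bytes per record = 1035 points
Make /O /N=(numRecords*1024) $(NewName)
Wave ImportedWave = $(NewName)
Variable i
for (i = 0; i < numRecords; i += 1)
	// keep the 1024 samples; skip 6 header points before and 5 marker points after
	ImportedWave[i*1024, i*1024 + 1023] = RawStream[p + 11*i + 6]
endfor
KillWaves RawStream
ImportedWave *= bitVolts[0]/1e6                   // scale and convert from µV to V
SetScale /P x 0, 1/samplerate[0], "s", ImportedWave
SetScale /P y, 0, 0, "V", ImportedWave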
November 10, 2018 at 11:18 am - Permalink