How would speed this up?
Hi,
I am working on some image analysis specifically looking at textures within an image. I can't share an image because they are proprietary, but think of them as a materials TEM (transmission Electron Microscopy) image with different particles and regions. The first order request is to quantify the areas of the different phases, though I think there is benefit in making the quantification more explicit and to that end I am using the ImageGCLM function.
The basic idea I have an image 4096x4096 pixels and I am looking at different regions and in general I am binning the image into 64x64 pixel regions and running the ImageGCLM function. This function works and seems to produce the results anticipated. The downside is that it takes ~90+ seconds to run on Mac with M1 and ~85 sec on Mac 3.8GHz i7 Intel CPU. This function gets run which each image loaded so the user is sitting twiddling thumbs for a while.
Are there ways to speed this up perhaps with multi-threading? I have not used multi-threading so I am completely unfamiliar withers use. Other ideas?
variable index,maxindex,starttime
starttime = datetime
maxindex=(dimsize(input,0)/binsize)^2
Make/O /n=(maxindex,16) GCLM
setlabels("x;y;cluster;angular;contrast;correlation;squares;inv_Diff;Sum_Ave;sum_var;sum_entropy;entropy;dif_var;dif_entropy;info_cor1;info_cor2",GCLM,1)
Make/FREE /N=(binsize,binsize) tempimage
GCLM[][0]= floor(p/binsize)
GCLM[][1]= mod(p,binsize)
variable rs,re,cs,ce
for(index=0;index<maxindex;index+=1)
rs = gclm[index][0]*binsize
re = rs+binsize-1
cs = gclm[index][1]*binsize
ce = cs+binsize-1
matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
GCLM[index][3,]= tempGCLM[q-3]
endfor
print (datetime-starttime)
end
Andy
From the top of my head and untested:
variable rs,re,cs,ce
rs = gclm[index][0]*binsize
re = rs+binsize-1
cs = gclm[index][1]*binsize
ce = cs+binsize-1
matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
GCLM[index][3,]= tempGCLM[q-3]
End
Make/FREE/N=(maxindex) indexHelper
MultiThread indexHelper[] = DoAnalysis(input, gclm, p)
It is a bit nasty that DoAnalysis writes into the shared GCLM wave, but as it is using different row indizes it is safe. An alternative would be to let DoAnalysis return tempGCLM, store it into a wave ref wave and extract the parameters from there.
April 17, 2023 at 08:35 am - Permalink
Hi Thomas,
It looks like a speed up of 6X from 94 seconds to 16 seconds.
Here is what I have per your guidance. Basically replacing the for loop with the multithread assignment.
variable index,maxindex,starttime
starttime = datetime
maxindex=(dimsize(input,0)/binsize)^2
Make/O /n=(maxindex,16) GCLM
setlabels("x;y;cluster;angular;contrast;correlation;squares;inv_Diff;Sum_Ave;sum_var;sum_entropy;entropy;dif_var;dif_entropy;info_cor1;info_cor2",GCLM,1)
Make/FREE /N=(binsize,binsize) tempimage
GCLM[][0]= floor(p/binsize)
GCLM[][1]= mod(p,binsize)
variable rs,re,cs,ce
Make/FREE/N=(maxindex) indexHelper
MultiThread indexHelper[] = DoAnalysis(input, gclm, binsize, p)
print (datetime-starttime)
end
threadsafe Function DoAnalysis(WAVE input, WAVE gclm, variable bin size, variable index)
variable rs,re,cs,ce
rs = gclm[index][0]*binsize
re = rs+binsize-1
cs = gclm[index][1]*binsize
ce = cs+binsize-1
matrixop/O/Free subimage =subrange(input,rs,re,cs,ce)
imageGLCM/FREE /DETP=tempGCLM/HTFP subimage
GCLM[index][3,]= tempGCLM[q-3]
End
Thank you very much.
Andy
April 17, 2023 at 09:16 am - Permalink
Nice! Another thing to look out for is if the image is needlessly in FP64 where FP32 would be enough.
April 17, 2023 at 10:43 am - Permalink
Hi,
In this case the image is an unsigned Byte 8 bit - gray scale image transformed from a jpg, but it is good to keep that in mind.
Also I tested the new code on my iMac with 3.8GHz i7 Intel chip and the improvement was even greater from 91 sec to 10 sec.
Andy
April 17, 2023 at 11:52 am - Permalink
Try using Redimension/RMD instead of MatrixOp subrange. The two use different memory allocation strategies and one might be faster than the other in certain situations.
April 17, 2023 at 11:57 am - Permalink
Hi,
Compared:
To
duplicate/free /RMD=[rs,re][cs,ce] input, subimage
To create to subimage in the multithreaded function and the results were identical. I think it is the imageGCLM that is the main limiter in the overall time.
Andy
April 17, 2023 at 12:19 pm - Permalink
Hi,
Buttressed with a wee bit knowledge, I have tried to apply multithreading to a different problem, obviously without success or I wouldn't be posting.
Here is what I have been running.
from the command line:
where:
wave zero_point, temperature, r_time_dif
//add a point that reflects the end of data
duplicate/free zero_point, zp
insertPoints /v=(numpnts(temperature)) numpnts(temperature),1,zp
variable index,maxindex
maxindex = numpnts(zero_point)
duplicate/Free r_time_dif, time_interval
time_interval = r_time_dif*(temperature>min_temp)
duplicate/FREE zero_point,time_at_temp
for(index=0;index<maxindex;index+=1)
time_at_temp[index]=sum(time_interval,zp[index],(zp[index+1]-1))
endfor
return time_at_temp
end
This seems to work, but is on the slow side taking 75 secs.
My attempts to speed this up include a base function to replace the command line call
variable starttime=datetime
make/FREE/N=(dimsize(input,1)) indexhelper
multithread indexhelper[] = time_temperature(input,p)
print datetime-starttime
end
and a new function that writes the results in the function
variable min_temp
min_temp =indexToScale(input,coltemp,1)
wave zero_point, temperature, r_time_dif
//add a point that reflects the end of data
duplicate/FREE zero_point, zp
insertPoints /v=(numpnts(temperature)) numpnts(temperature),1,zp
variable index,maxindex
maxindex = numpnts(zero_point)
duplicate/FREE r_time_dif, time_interval
time_interval = r_time_dif*(temperature>min_temp)
duplicate/FREE zero_point,time_at_temp
for(index=0;index<maxindex;index+=1)
input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
endfor
end
The error is in the duplicate where it cannot find wave. So how do I do this correctly?
Andy
July 10, 2023 at 07:55 am - Permalink
Hi,
Simplified the thread safe function.
variable min_temp
min_temp =indexToScale(input,coltemp,1)
wave zp, temperature, r_time_dif
variable index,maxindex
maxindex = numpnts(zp)-1
duplicate/FREE r_time_dif, time_interval
time_interval = r_time_dif*(temperature>min_temp)
for(index=0;index<maxindex;index+=1)
input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
endfor
end
Still getting error: While executing Duplicate, the following error occurred: expected wave name. I checked and duplicate is shown as thread safe, but not automatically multithreaded. Do I need to do something here and if so what?
Andy
July 10, 2023 at 08:53 am - Permalink
If you are executing code in preemptive threads, this is what multithread normally does, you can't access the main threads datafolder hierarchy. So duplicate fails because r_time_dif is a null wave reference.
July 10, 2023 at 08:58 am - Permalink
In reply to If you are executing code in… by thomas_braun
Thanks Thomas,
With that insight I refactored the code to put the wave references into the calling function and pass it to the thread safe function.
wave tr =r_time_dif
wave zp,temperature
variable starttime=datetime
make/FREE/N=(dimsize(input,1)) indexhelper
multithread indexhelper[] = time_temperature(input,tr,zp,temperature,p)
print datetime-starttime
doupdate
end
threadsafe function time_temperature(wave input,wave time_ref,wave zp,wave temperature, variable coltemp)
variable min_temp
min_temp =indexToScale(input,coltemp,1)
variable index,maxindex
maxindex = numpnts(zp)-1
duplicate/FREE time_ref, time_interval
time_interval = time_ref*(temperature>min_temp)
for(index=0;index<maxindex;index+=1)
input[index][coltemp]=sum(time_interval,zp[index],(zp[index+1]-1))
endfor
end
it seems to work almost insanely fast going from 75 sec to 0.1 sec.
However I am now seeing some what strange behavior.
Prior to running the function I set the the receiving wave, summary_100, to zero. I want make sure some previous step didn't put data in so I can see a change.
I have a table open with summary_100 showing and it is all zeros.
I run the multithreaded function and the table still shows all zeros and as I am on a Mac that table is not the front most window because I needed to use the command window to execute the command. If I now bring that window to the front just by clicking on the title bar all the values in the table go from zero to their calculated values. Is this expected behavior and if so is there a way to automatically refresh. Some of the graphs I have created with summary_100 also are not refreshed.
Andy
July 10, 2023 at 09:24 am - Permalink
There is no summary_100 in your code in the last post, but I guess this is input? Writing into waves from preemptive threads might skip some updating logic in order to speed it up. I would add a `input[0][0] += 0` in the non-threadsafe function to force an update.
The for loop can also be replaced with p,q,r,s indexing in time_temperature.
July 10, 2023 at 09:31 am - Permalink