If any faster solution to this nested loop
shuvamsarkarhere
Hi everyone,
I was wondering if anyone can help me with this piece of nested loop posted below. I am working with 4d matrices and this loop is taking about 50 secs to calculate the result and this part of the whole program needs to run through many iterations, which is taking a large time to get me the final results.
If there is a better solution/algorithm which can run faster would be appreciated.
for(d=0;d<d3;d+=1) for(m=0;m<f3;m+=1) for(l=0;l<f3;l+=1) s=0 for(j=0;j<ysize3;j+=1) for(i=0;i<xsize3;i+=1) for(k=0;k<zsize;k+=1) s+=res2[i+l][j+m][k][d] endfor endfor endfor for(n=0;n<n3;n+=1) grd_L_ker3[l][m][d][n]=s endfor endfor endfor endfor grd_L_ker3*=err
Thanks in advance.
Shuvam
Hi,
Not sure I fully understand the code to suggest exactly what to do. It looks like you are summing groups of voxels and storing the result in grd_L_ker3. So the j, I and k loops could be cut out and written as a function where your l and m offsets are arguments and the function extracts the relevant voxels from the 4D wave and returns its sum. In fact, the chunk specified by n would be the same for all chunks, if I'm not mistaken, so that loop is redundant too. You can probably do the whole thing as a direct assignment to grd_L_ker3 with the right expression (one-liner)
January 5, 2021 at 02:06 pm - Permalink
It would have been helpful to have some background for this application. Otherwise (as noted by the previous comment), the code does not make much sense when the same sum is set for all chunks. Off hand this looks like some kind of a convolution with a constant kernel or a strange 3D boxcar averaging.
The fact that the index d appears in the sum in the last dimension and then in the grd_L_ker3 in the third dimension combined with the fact that the code does not show any indication of how it handles the boundaries of the matrix (possibly in d3, f3) makes me wonder if it performs its intended task. If it turns out that the code is correct as shown here then I would suggest breaking the sum loops using MatrixOP to extract chunk d of the wave res2, then replace the sum with MatrixFilter with the keyword avg (you get the actual sum by multiplying the result with factor=xsize3*ysize3*zsize3 which you would hopefully compute once outside the loop. Note that MatrixFilter does have code to handle the boundaries of the array so you may want to make sure that it works for your application.
AG
January 5, 2021 at 03:37 pm - Permalink
In reply to It would have been helpful… by Igor
Dear AG,
Your assumptions are right, I am working with kernels and its responses (here res2) and the loss function (of Convolutional Neural Network). In grd_L_ker3, I am calculating the gradients of the loss functions wrt the kernel weights.
The res2's last dimension and grd_L_ker3's 3rd dimension no.s are the same, so I put that way.
Thanks for your suggestion, this would be useful. I will try this.
BTW: I am also calculating the convolutions using loops as I found that the default convolve operation in MatrixOP handles the convolution with either zero-padding/reflection padding. But in this case, I actually do not want any kind of padding which means the response will be in smaller in size than the input wave. Is there any option in Igor so that I can do this convolution without any padding?
January 6, 2021 at 12:04 am - Permalink
In reply to Hi, Not sure I fully… by sjr51
Hi sjr51,
Thanks for pointing out that redundant loop I was using. I will remove that.
January 6, 2021 at 12:07 am - Permalink
Hello shuvam,
I don't think MatrixOP convolution is useful here because it operates on a layer by layer basis and this is probably not what you need.
Depending on the size of your kernel you should consider implementing the 3D convolution using the FFT operation.
If you must use explicit loops then try to multithread your code. That should give you an extra factor of ~n in speed (n is the number of available threads).
AG
January 6, 2021 at 09:58 am - Permalink