
ThreadSafe Function

Erik_Kran
Hi, below is a program that uses ThreadSafe functions. I am running Igor Pro 8.04 on a MacBook Pro with a 12-core Apple M3 Pro, and I use 12 parallel threads.
There are two loops, and this message is returned: "While executing user function optimized, the following error occurred: Invalid Thread Group ID or index."
I suspect it is caused by calculations that are still running, so I added a test with ThreadGroupWait, but that does not work either. The error message is "While executing a wave read, the following error occurred: Attempt to operate on a null (missing) wave".
Thanks for your help.
ThreadSafe Function Transmittance_ComputeOne(index)
    Variable index

    Wave/C delta1, delta3
    Wave/C P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w
    Wave T_FWF1, T_FWF2, T_FWF

    Variable/C ci = sqrt(-1)
    Variable/C d1 = delta1[index], d3 = delta3[index]
    Variable/C a, b, c, d

    P1_3D[0][0][index] = exp(ci * d1)
    P1_3D[0][1][index] = 0
    P1_3D[1][0][index] = 0
    P1_3D[1][1][index] = exp(-ci * d1)

    P3_3D[0][0][index] = exp(ci * d3)
    P3_3D[0][1][index] = 0
    P3_3D[1][0][index] = 0
    P3_3D[1][1][index] = exp(-ci * d3)

    // M_TOT1 = M01 * P1 * M12
    a = M01_3D[0][0][index]*P1_3D[0][0][index] + M01_3D[0][1][index]*P1_3D[1][0][index]
    b = M01_3D[0][0][index]*P1_3D[0][1][index] + M01_3D[0][1][index]*P1_3D[1][1][index]
    c = M01_3D[1][0][index]*P1_3D[0][0][index] + M01_3D[1][1][index]*P1_3D[1][0][index]
    d = M01_3D[1][0][index]*P1_3D[0][1][index] + M01_3D[1][1][index]*P1_3D[1][1][index]
    M_TOT1_3D[0][0][index] = a * M12_3D[0][0][index] + b * M12_3D[1][0][index]
    M_TOT1_3D[0][1][index] = a * M12_3D[0][1][index] + b * M12_3D[1][1][index]
    M_TOT1_3D[1][0][index] = c * M12_3D[0][0][index] + d * M12_3D[1][0][index]
    M_TOT1_3D[1][1][index] = c * M12_3D[0][1][index] + d * M12_3D[1][1][index]

    // M_TOT2 = M23 * P3 * M30
    a = M23_3D[0][0][index]*P3_3D[0][0][index] + M23_3D[0][1][index]*P3_3D[1][0][index]
    b = M23_3D[0][0][index]*P3_3D[0][1][index] + M23_3D[0][1][index]*P3_3D[1][1][index]
    c = M23_3D[1][0][index]*P3_3D[0][0][index] + M23_3D[1][1][index]*P3_3D[1][0][index]
    d = M23_3D[1][0][index]*P3_3D[0][1][index] + M23_3D[1][1][index]*P3_3D[1][1][index]
    M_TOT2_3D[0][0][index] = a * M30_3D[0][0][index] + b * M30_3D[1][0][index]
    M_TOT2_3D[0][1][index] = a * M30_3D[0][1][index] + b * M30_3D[1][1][index]
    M_TOT2_3D[1][0][index] = c * M30_3D[0][0][index] + d * M30_3D[1][0][index]
    M_TOT2_3D[1][1][index] = c * M30_3D[0][1][index] + d * M30_3D[1][1][index]

    T_FWF1[index] = real(n_w[index]) * (1 / cabs(M_TOT1_3D[1][1][index]))^2
    T_FWF2[index] = (1 / real(n_w[index])) * (1 / cabs(M_TOT2_3D[1][1][index]))^2
    T_FWF[index] = T_FWF1[index] * T_FWF2[index]
End

Function optimized(e_f1, e_f2)
    Variable e_f1, e_f2    // µm

    Variable/G e_f1_global, e_f2_global, e_w_global
    e_f1_global = e_f1
    e_f2_global = e_f2
    e_w_global = 3000

    Wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
    Wave/C n_w, n_v, m

    Variable timerrefnum = StartMSTimer
    Variable/C ci = sqrt(-1)
    Variable Npts = numpnts(wavenumber)

    Make/O/C/N=(Npts) delta1, delta2, delta3
    delta1 = (2 * pi / lambda) * m * e_f1
    delta3 = (2 * pi / lambda) * m * e_f2

    Make/O/C/N=(Npts) r01, r12, r23, r30, rw0
    r01 = (n_v - m) / (n_v + m)
    r12 = (m - n_w) / (m + n_w)
    r23 = (n_w - m) / (n_w + m)
    r30 = (m - n_v) / (m + n_v)
    rw0 = (n_w - n_v) / (n_w + n_v)

    Make/O/C/N=(Npts) t01, t12, t23, t30, tw0, t0w
    t01 = 2 * n_v / (n_v + m)
    t12 = 2 * m / (m + n_w)
    t23 = 2 * n_w / (n_w + m)
    t30 = 2 * m / (m + n_v)
    tw0 = 2 * n_w / (n_w + n_v)
    t0w = 2 / (n_w + n_v)

    Make/O/C/N=(2,2,Npts) P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D
    Make/O/N=(Npts) T_FWF1, T_FWF2, T_FWF, T_BACK, TRANSMIT_2FILM

    // Fill the 3D matrices: one 2x2 matrix per wavenumber point (Npts layers)
    Variable i
    for(i = 0; i < Npts; i += 1)
        M01_3D[0][0][i] = 1 / t01[i];       M01_3D[0][1][i] = r01[i] / t01[i]
        M01_3D[1][0][i] = r01[i] / t01[i];  M01_3D[1][1][i] = 1 / t01[i]
        M12_3D[0][0][i] = 1 / t12[i];       M12_3D[0][1][i] = r12[i] / t12[i]
        M12_3D[1][0][i] = r12[i] / t12[i];  M12_3D[1][1][i] = 1 / t12[i]
        M23_3D[0][0][i] = 1 / t23[i];       M23_3D[0][1][i] = r23[i] / t23[i]
        M23_3D[1][0][i] = r23[i] / t23[i];  M23_3D[1][1][i] = 1 / t23[i]
        M30_3D[0][0][i] = 1 / t30[i];       M30_3D[0][1][i] = r30[i] / t30[i]
        M30_3D[1][0][i] = r30[i] / t30[i];  M30_3D[1][1][i] = 1 / t30[i]
    endfor

    // 12 parallel cores
    Variable nThreads = ThreadProcessorCount                // number of cores = 12
    Variable n_loop_b12 = floor(Npts / nThreads)            // number of full blocks of nThreads points
    Variable n_loop_reste = Npts - n_loop_b12 * nThreads    // remaining points
    Variable t
    Variable tgID = ThreadGroupCreate(nThreads)
    print "tgID =", tgID, " n_loop_b12 =", n_loop_b12, " n_loop_reste =", n_loop_reste, " nThreads-1 =", nThreads-1

    Variable j, threadGroupStatus
    for(i = 0; i < n_loop_b12; i += 1)
        for(j = 0; j < nThreads; j += 1)
            ThreadStart tgID, j, Transmittance_ComputeOne(j + i*nThreads)
        endfor
        do
            threadGroupStatus = ThreadGroupWait(tgID, 100)
        while (threadGroupStatus != 0)
    endfor

    // last few points
    for(i = 0; i < n_loop_reste; i += 1)
        ThreadStart tgID, i, Transmittance_ComputeOne(Npts - n_loop_b12*nThreads + i)
    endfor

    T_BACK = ((4 * real(n_w)) / (1 + real(n_w))^2)^2
    TRANSMIT_2FILM = T_FWF / T_BACK

    Variable microseconds = StopMSTimer(timerrefnum)
    Print microseconds / 10000, "ms"
End
Here is cleaner code, but the problem is not fixed.
March 29, 2025 at 07:17 am - Permalink
I have not looked at your function in detail but I did notice that there is no call to ThreadGroupRelease.
The "ThreadSafe Functions and Multitasking" help topic says "Once you are finished with a given thread group, call ThreadGroupRelease.".
Also, so that others can try debugging it, you might consider trying to simplify the code as much as possible and then posting an experiment file that contains the function and the waves needed to run it.
March 29, 2025 at 08:31 am - Permalink
Attached is the Igor 8.04 file
It's quite messy, so I've tried to tidy it up a little. Here are the functions:
Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2) - it calculates the transmission of two thin films with thickness e_f1 and e_f2 (typically, 2.45 and 0.2 mm). The function calls generate_wave().
I need to lower the execution time, so I tried to use ThreadSafe functions.
Thanks in advance.
March 29, 2025 at 09:42 am - Permalink
Translate certain explicit for-loop constructions to implicit wave arithmetic. Here is one example.
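For example, the matrix-filling loop in optimized() could be rewritten with implicit indexing (a sketch using the wave names above, untested):

    // the row and column are given explicitly; the empty third bracket means
    // the assignment runs over every layer, with r as the implicit layer index
    M01_3D[0][0][] = 1 / t01[r]
    M01_3D[0][1][] = r01[r] / t01[r]
    M01_3D[1][0][] = r01[r] / t01[r]
    M01_3D[1][1][] = 1 / t01[r]
    // ...and the same pattern for M12_3D, M23_3D and M30_3D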
March 29, 2025 at 11:22 am - Permalink
Using your experiment file, I executed this:
I got no errors and it printed this:
1403.06 ms
March 29, 2025 at 02:31 pm - Permalink
This is the initial function that needs to be optimized. It works well but is slow. I've tried to optimize it through the optimized(e_f1, e_f2) function, using threading. With that function, only 100 loops of the ThreadSafe function Transmittance_ComputeOne lead to printing. It does not work. There is no error message, so I guess there's an issue with NaN values or something similar.
Following advice from jjweimer, I have written new code (below) using implicit wave arithmetic and 3D waves for the matrix multiplications. It works well and is faster (~170 ms).
The code is below. Maybe there is some way to improve it further through multithreading?
Is it worth moving to Igor Pro 9 (I currently have Igor 8.04) for multithreading?
Thanks.
March 30, 2025 at 04:49 am - Permalink
There are places where you calculate the same wave expressions multiple times. This is somewhat obscured by representing the same sum in two different ways, for example "n_v+m" and "m+n_v". So I would regularize this by replacing this:
with this:
Now you might gain some speed by using free waves to store intermediate results, like this:
You should check my changes to make sure I didn't mix something up.
You could do something similar with "2*pi/lambda" and "real(n_w)", each of which appears twice.
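As a sketch of the idea (this is not the exact snippet referred to above; the free-wave names are made up):

    // compute each shared denominator once into a free wave, then reuse it
    Make/FREE/C/N=(Npts) n_v_plus_m = n_v + m
    Make/FREE/C/N=(Npts) m_plus_n_w = m + n_w
    r01 = (n_v - m) / n_v_plus_m
    t01 = 2 * n_v / n_v_plus_m
    r12 = (m - n_w) / m_plus_n_w
    t12 = 2 * m / m_plus_n_w
    // likewise "2*pi/lambda" can be computed once
    Make/FREE/N=(Npts) k0 = 2 * pi / lambda
    delta1 = k0 * m * e_f1
    delta3 = k0 * m * e_f2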
March 30, 2025 at 07:24 am - Permalink
You may be able to gain some efficiency using FastOp in expressions like
FastOp n_v_plus_m += m
Execute DisplayHelpTopic "FastOp" for help.
March 31, 2025 at 12:55 am - Permalink
Looks like you also need to fix the indexing here (and elsewhere) - at the moment you are calculating only the ith value of T_FWF:
most likely [i] should be replaced by [p] in a few places
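As an illustration, using T_FWF as the example:

    T_FWF[i] = T_FWF1[i] * T_FWF2[i]    // sets only point i
    T_FWF = T_FWF1[p] * T_FWF2[p]       // sets every point; p is the built-in point index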
March 31, 2025 at 02:11 am - Permalink
Hi Tony, thanks for the hint. However, the code does not compile, for
it seems it does not support the variables V1 and V2. I've not found any insight into this in the help topic. The error message is "Syntax does not conform to FastOp requirements."
March 31, 2025 at 02:30 am - Permalink
it looks to me like you need to change [q] to [r] everywhere and remove [i] (or replace with [p] if you prefer)
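In an assignment to a 2x2xNpts wave, p and q are the row and column indices and r is the layer index, so per-wavenumber quantities should be indexed with r, e.g. (sketch):

    P1_3D[0][0][] = exp(ci * delta1[r])     // r runs over the layer (wavenumber) dimension
    P1_3D[1][1][] = exp(-ci * delta1[r])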
March 31, 2025 at 02:34 am - Permalink
In reply to Hi Tony, Thanks for the hint… by Erik_Kran
try
FastOp T_FWF1 = (V1) * n_w_real
You need the parentheses. I just typed the code here without checking anything; there are probably other errors.
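For reference, the general pattern is a parenthesized scalar coefficient in front of each wave; all names below are placeholders:

    FastOp destWave = (c1) * waveA + (c2) * waveB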
March 31, 2025 at 02:40 am - Permalink
Ah yes, it compiles; the variable needs parentheses and has to come first.
The execution time of the initial code is 284 ms. With FastOp it is 3515 ms :-(
Not so fast after all ;-)
March 31, 2025 at 02:52 am - Permalink
In reply to Ah yes it compiles, the… by Erik_Kran
are you comparing your original version that computes one point with the FastOp version that operates on the whole wave?
it's unlikely that FastOp is significantly slower than the same wave assignment calculated without optimization.
In other words: remove all instances of [i] from your code, then compare speed with addition of FastOp.
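That is, the fair comparison is between the two whole-wave forms, e.g.:

    T_FWF = T_FWF1 * T_FWF2             // plain whole-wave assignment, no explicit loop
    FastOp T_FWF = T_FWF1 * T_FWF2      // same operation, FastOp-optimized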
March 31, 2025 at 02:58 am - Permalink
Below are the two codes: the original and the one with FastOp.
And the second code, which includes FastOp:
March 31, 2025 at 03:02 am - Permalink
There's a for-loop in your code that was not in the function that you posted here
https://www.wavemetrics.com/comment/25693#comment-25693
Add the FastOp part to the code without the loop and it should run faster. It isn't supposed to go inside the explicit loop.
March 31, 2025 at 03:19 am - Permalink
What you probably mean to do (in the code with 3D matrices) is something like this:
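The snippet itself is not reproduced above; roughly, the idea might look like this sketch, where n_w_real, V1 and V2 are hypothetical helper waves and everything is untested:

    // after M_TOT1_3D and M_TOT2_3D have been filled for all layers
    Make/FREE/N=(Npts) n_w_real = real(n_w)
    Make/FREE/N=(Npts) inv_n_w_real = 1 / n_w_real
    Make/FREE/N=(Npts) V1 = (1 / cabs(M_TOT1_3D[1][1][p]))^2
    Make/FREE/N=(Npts) V2 = (1 / cabs(M_TOT2_3D[1][1][p]))^2
    FastOp T_FWF1 = n_w_real * V1
    FastOp T_FWF2 = inv_n_w_real * V2
    FastOp T_FWF = T_FWF1 * T_FWF2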
March 31, 2025 at 03:39 am - Permalink
I did it but did not see any improvement. I've tried with files of 10,000 and 1e6 lines, but no change.
March 31, 2025 at 10:18 am - Permalink
Make sure that your different code versions achieve the same result before comparing execution speed. Clear any saved data from intermediate steps before testing the code to make sure that it actually works. If most of the processing time is spent on matrix multiplication, optimizing the 1D wave assignments using FastOp is not going to make much difference to the total processing time, but those steps should be far more efficient than the non-optimized equivalents.
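One simple way to check this is to keep a copy of a known-good result and compare against it, for example (T_FWF_ref is just a made-up name):

    Duplicate/O T_FWF, T_FWF_ref            // after running the reference version
    // ...run the optimized version, which overwrites T_FWF...
    Print EqualWaves(T_FWF, T_FWF_ref, 1)   // prints 1 if the wave data match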
April 1, 2025 at 02:16 am - Permalink