
ThreadSafe Function

Erik_Kran
Hi, below is a program that uses ThreadSafe functions. I am running Igor Pro 8.04 on a MacBook Pro with a 12-core Apple M3 Pro, and I use 12 parallel threads.
There are two loops, and this message is returned: "While executing user function optimized, the following error occurred: Invalid Thread Group ID or index."
I suspect it is caused by calculations that are still running, so I added a test with ThreadGroupWait, but that does not work either. The error message is "While executing a wave read, the following error occurred: Attempt to operate on a null (missing) wave".
Thanks for your help.
ThreadSafe Function Transmittance_ComputeOne(index)
    Variable index

    Wave/C delta1, delta3
    Wave/C P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w
    Wave T_FWF1, T_FWF2, T_FWF

    Variable/C ci = sqrt(-1)
    Variable/C d1 = delta1[index], d3 = delta3[index]
    Variable/C a, b, c, d

    P1_3D[0][0][index] = exp(ci * d1)
    P1_3D[0][1][index] = 0
    P1_3D[1][0][index] = 0
    P1_3D[1][1][index] = exp(-ci * d1)

    P3_3D[0][0][index] = exp(ci * d3)
    P3_3D[0][1][index] = 0
    P3_3D[1][0][index] = 0
    P3_3D[1][1][index] = exp(-ci * d3)

    // M_TOT1 = M01 * P1 * M12
    a = M01_3D[0][0][index]*P1_3D[0][0][index] + M01_3D[0][1][index]*P1_3D[1][0][index]
    b = M01_3D[0][0][index]*P1_3D[0][1][index] + M01_3D[0][1][index]*P1_3D[1][1][index]
    c = M01_3D[1][0][index]*P1_3D[0][0][index] + M01_3D[1][1][index]*P1_3D[1][0][index]
    d = M01_3D[1][0][index]*P1_3D[0][1][index] + M01_3D[1][1][index]*P1_3D[1][1][index]
    M_TOT1_3D[0][0][index] = a * M12_3D[0][0][index] + b * M12_3D[1][0][index]
    M_TOT1_3D[0][1][index] = a * M12_3D[0][1][index] + b * M12_3D[1][1][index]
    M_TOT1_3D[1][0][index] = c * M12_3D[0][0][index] + d * M12_3D[1][0][index]
    M_TOT1_3D[1][1][index] = c * M12_3D[0][1][index] + d * M12_3D[1][1][index]

    // M_TOT2 = M23 * P3 * M30
    a = M23_3D[0][0][index]*P3_3D[0][0][index] + M23_3D[0][1][index]*P3_3D[1][0][index]
    b = M23_3D[0][0][index]*P3_3D[0][1][index] + M23_3D[0][1][index]*P3_3D[1][1][index]
    c = M23_3D[1][0][index]*P3_3D[0][0][index] + M23_3D[1][1][index]*P3_3D[1][0][index]
    d = M23_3D[1][0][index]*P3_3D[0][1][index] + M23_3D[1][1][index]*P3_3D[1][1][index]
    M_TOT2_3D[0][0][index] = a * M30_3D[0][0][index] + b * M30_3D[1][0][index]
    M_TOT2_3D[0][1][index] = a * M30_3D[0][1][index] + b * M30_3D[1][1][index]
    M_TOT2_3D[1][0][index] = c * M30_3D[0][0][index] + d * M30_3D[1][0][index]
    M_TOT2_3D[1][1][index] = c * M30_3D[0][1][index] + d * M30_3D[1][1][index]

    T_FWF1[index] = real(n_w[index]) * (1 / cabs(M_TOT1_3D[1][1][index]))^2
    T_FWF2[index] = (1 / real(n_w[index])) * (1 / cabs(M_TOT2_3D[1][1][index]))^2
    T_FWF[index] = T_FWF1[index] * T_FWF2[index]
End

Function optimized(e_f1, e_f2)
    Variable e_f1, e_f2    // µm

    Variable/G e_f1_global, e_f2_global, e_w_global
    e_f1_global = e_f1
    e_f2_global = e_f2
    e_w_global = 3000

    Wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
    Wave/C n_w, n_v, m

    Variable timerrefnum = StartMSTimer
    Variable/C ci = sqrt(-1)
    Variable Npts = numpnts(wavenumber)

    Make/O/C/N=(Npts) delta1, delta2, delta3
    delta1 = (2 * pi / lambda) * m * e_f1
    delta3 = (2 * pi / lambda) * m * e_f2

    Make/O/C/N=(Npts) r01, r12, r23, r30, rw0
    r01 = (n_v - m) / (n_v + m)
    r12 = (m - n_w) / (m + n_w)
    r23 = (n_w - m) / (n_w + m)
    r30 = (m - n_v) / (m + n_v)
    rw0 = (n_w - n_v) / (n_w + n_v)

    Make/O/C/N=(Npts) t01, t12, t23, t30, tw0, t0w
    t01 = 2 * n_v / (n_v + m)
    t12 = 2 * m / (m + n_w)
    t23 = 2 * n_w / (n_w + m)
    t30 = 2 * m / (m + n_v)
    tw0 = 2 * n_w / (n_w + n_v)
    t0w = 2 / (n_w + n_v)

    Make/O/C/N=(2,2,Npts) P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D
    Make/O/N=(Npts) T_FWF1, T_FWF2, T_FWF, T_BACK, TRANSMIT_2FILM

    // Fill the 3D matrices: one 2x2 matrix per wavenumber point (Npts layers)
    Variable i
    for(i = 0; i < Npts; i += 1)
        M01_3D[0][0][i] = 1 / t01[i];       M01_3D[0][1][i] = r01[i] / t01[i]
        M01_3D[1][0][i] = r01[i] / t01[i];  M01_3D[1][1][i] = 1 / t01[i]
        M12_3D[0][0][i] = 1 / t12[i];       M12_3D[0][1][i] = r12[i] / t12[i]
        M12_3D[1][0][i] = r12[i] / t12[i];  M12_3D[1][1][i] = 1 / t12[i]
        M23_3D[0][0][i] = 1 / t23[i];       M23_3D[0][1][i] = r23[i] / t23[i]
        M23_3D[1][0][i] = r23[i] / t23[i];  M23_3D[1][1][i] = 1 / t23[i]
        M30_3D[0][0][i] = 1 / t30[i];       M30_3D[0][1][i] = r30[i] / t30[i]
        M30_3D[1][0][i] = r30[i] / t30[i];  M30_3D[1][1][i] = 1 / t30[i]
    endfor

    // 12 parallel cores
    Variable nThreads = ThreadProcessorCount                // number of cores = 12
    Variable n_loop_b12 = floor(Npts / nThreads)            // number of full blocks of nThreads points
    Variable n_loop_reste = Npts - n_loop_b12 * nThreads    // remaining points
    Variable t
    Variable tgID = ThreadGroupCreate(nThreads)
    print "tgID =", tgID, " n_loop_b12 =", n_loop_b12, " n_loop_reste =", n_loop_reste, " nThreads-1 =", nThreads-1

    Variable j, threadGroupStatus
    for(i = 0; i < n_loop_b12; i += 1)
        for(j = 0; j < nThreads; j += 1)
            ThreadStart tgID, j, Transmittance_ComputeOne(j + i*nThreads)
        endfor
        do
            threadGroupStatus = ThreadGroupWait(tgID, 100)
        while (threadGroupStatus != 0)
    endfor

    // last few points
    for(i = 0; i < n_loop_reste; i += 1)
        ThreadStart tgID, i, Transmittance_ComputeOne(Npts - n_loop_b12*nThreads + i)
    endfor

    T_BACK = ((4 * real(n_w)) / (1 + real(n_w))^2)^2
    TRANSMIT_2FILM = T_FWF / T_BACK

    Variable microseconds = StopMSTimer(timerrefnum)
    Print microseconds / 10000, "ms"
End
Here is cleaner code, but the problem is not fixed.
March 29, 2025 at 07:17 am - Permalink
I have not looked at your function in detail but I did notice that there is no call to ThreadGroupRelease.
The "ThreadSafe Functions and Multitasking" help topic says "Once you are finished with a given thread group, call ThreadGroupRelease.".
Also, so that others can try debugging it, you might consider trying to simplify the code as much as possible and then posting an experiment file that contains the function and the waves needed to run it.
March 29, 2025 at 08:31 am - Permalink
Attached is the Igor 8.04 file
It's quite messy, so I've tried to tidy it up a little. Here are the functions:
Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2) - it calculates the transmission of two thin films with thickness e_f1 and e_f2 (typically, 2.45 and 0.2 mm). The function calls generate_wave().
I need to lower the execution time, so I tried to use ThreadSafe functions.
Thanks in advance.
March 29, 2025 at 09:42 am - Permalink
Translate certain explicit for-loop constructions to implicit wave arithmetic. Here is one example.
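For example, the matrix-filling loop in optimized() could be rewritten with implicit indexing (a sketch using the wave names above, untested):

    // the row and column are given explicitly; the empty third bracket means
    // the assignment runs over every layer, with r as the implicit layer index
    M01_3D[0][0][] = 1 / t01[r]
    M01_3D[0][1][] = r01[r] / t01[r]
    M01_3D[1][0][] = r01[r] / t01[r]
    M01_3D[1][1][] = 1 / t01[r]
    // ...and the same pattern for M12_3D, M23_3D and M30_3D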
March 29, 2025 at 11:22 am - Permalink
Using your experiment file, I executed this:
I got no errors and it printed this:
1403.06 ms
March 29, 2025 at 02:31 pm - Permalink
This is the initial function that needs to be optimized. It works well but is slow. I've tried to optimize it through the optimized(e_f1, e_f2) function, using threading. With that function, only 100 loops of the ThreadSafe function Transmittance_ComputeOne lead to printing. It does not work. There is no error message, so I guess there's an issue with NaN values or something similar.
Following advice from jjweimer, I have written new code (below) using implicit wave arithmetic and 3D waves for the matrix multiplications. It works well and is faster (~170 ms).
The code is below. Maybe there is some way to improve it further through multithreading?
Is it worth moving to Igor Pro 9 (I currently have Igor 8.04) for multithreading?
Thanks.
March 30, 2025 at 04:49 am - Permalink
There are places where you calculate the same wave expressions multiple times. This is somewhat obscured by representing the same sum in two different ways, for example "n_v+m" and "m+n_v". So I would regularize this by replacing this:
with this:
Now you might gain some speed by using free waves to store intermediate results, like this:
You should check my changes to make sure I didn't mix something up.
You could do something similar with "2*pi/lambda" and "real(n_w)", each of which appears twice.
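As a sketch of the idea (this is not the exact snippet referred to above; the free-wave names are made up):

    // compute each shared denominator once into a free wave, then reuse it
    Make/FREE/C/N=(Npts) n_v_plus_m = n_v + m
    Make/FREE/C/N=(Npts) m_plus_n_w = m + n_w
    r01 = (n_v - m) / n_v_plus_m
    t01 = 2 * n_v / n_v_plus_m
    r12 = (m - n_w) / m_plus_n_w
    t12 = 2 * m / m_plus_n_w
    // likewise "2*pi/lambda" can be computed once
    Make/FREE/N=(Npts) k0 = 2 * pi / lambda
    delta1 = k0 * m * e_f1
    delta3 = k0 * m * e_f2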
March 30, 2025 at 07:24 am - Permalink
You may be able to gain some efficiency using FastOp in expressions like
FastOp n_v_plus_m += m
Execute DisplayHelpTopic "FastOp" for help.
March 31, 2025 at 12:55 am - Permalink
Looks like you also need to fix the indexing here (and elsewhere) - at the moment you are calculating only the ith value of T_FWF:
most likely [i] should be replaced by [p] in a few places
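As an illustration, using T_FWF as the example:

    T_FWF[i] = T_FWF1[i] * T_FWF2[i]    // sets only point i
    T_FWF = T_FWF1[p] * T_FWF2[p]       // sets every point; p is the built-in point index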
March 31, 2025 at 02:11 am - Permalink
Hi Tony, thanks for the hint. However, the code does not compile, for
it seems it does not support the variables V1 and V2. I've not found any insight into this in the help topic. The error message is "Syntax does not conform to FastOp requirements."
March 31, 2025 at 02:30 am - Permalink
it looks to me like you need to change [q] to [r] everywhere and remove [i] (or replace with [p] if you prefer)
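In an assignment to a 2x2xNpts wave, p and q are the row and column indices and r is the layer index, so per-wavenumber quantities should be indexed with r, e.g. (sketch):

    P1_3D[0][0][] = exp(ci * delta1[r])     // r runs over the layer (wavenumber) dimension
    P1_3D[1][1][] = exp(-ci * delta1[r])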
March 31, 2025 at 02:34 am - Permalink
In reply to Hi Tony, Thanks for the hint… by Erik_Kran
try
FastOp T_FWF1 = (V1) * n_w_real
You need the parentheses. I just typed the code here without checking anything; there are probably other errors.
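For reference, the general pattern is a parenthesized scalar coefficient in front of each wave; all names below are placeholders:

    FastOp destWave = (c1) * waveA + (c2) * waveB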
March 31, 2025 at 02:40 am - Permalink
Ah yes, it compiles; the variable needs parentheses and has to come first.
The execution time of the initial code is 284 ms. With FastOp it is 3515 ms :-(
Not so fast after all ;-)
March 31, 2025 at 02:52 am - Permalink
In reply to Ah yes it compiles, the… by Erik_Kran
are you comparing your original version that computes one point with the FastOp version that operates on the whole wave?
it's unlikely that FastOp is significantly slower than the same wave assignment calculated without optimization.
In other words: remove all instances of [i] from your code, then compare speed with addition of FastOp.
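That is, the fair comparison is between the two whole-wave forms, e.g.:

    T_FWF = T_FWF1 * T_FWF2             // plain whole-wave assignment, no explicit loop
    FastOp T_FWF = T_FWF1 * T_FWF2      // same operation, FastOp-optimized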
March 31, 2025 at 02:58 am - Permalink
Below are the two codes: the original and the one with FastOp.
And the second code, which includes FastOp:
March 31, 2025 at 03:02 am - Permalink
There's a for-loop in your code that was not in the function that you posted here
https://www.wavemetrics.com/comment/25693#comment-25693
Add the FastOp part to the code without the loop and it should run faster. It isn't supposed to go inside the explicit loop.
March 31, 2025 at 03:19 am - Permalink
What you probably mean to do (in the code with 3D matrices) is something like this:
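The snippet itself is not reproduced above; roughly, the idea might look like this sketch, where n_w_real, V1 and V2 are hypothetical helper waves and everything is untested:

    // after M_TOT1_3D and M_TOT2_3D have been filled for all layers
    Make/FREE/N=(Npts) n_w_real = real(n_w)
    Make/FREE/N=(Npts) inv_n_w_real = 1 / n_w_real
    Make/FREE/N=(Npts) V1 = (1 / cabs(M_TOT1_3D[1][1][p]))^2
    Make/FREE/N=(Npts) V2 = (1 / cabs(M_TOT2_3D[1][1][p]))^2
    FastOp T_FWF1 = n_w_real * V1
    FastOp T_FWF2 = inv_n_w_real * V2
    FastOp T_FWF = T_FWF1 * T_FWF2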
March 31, 2025 at 03:39 am - Permalink
I did it but did not see any improvement. I've tried with files of 10,000 and 1e6 lines, but no change.
March 31, 2025 at 10:18 am - Permalink
Make sure that your different code versions achieve the same result before comparing execution speed. Clear any saved data from intermediate steps before testing the code to make sure that it actually works. If most of the processing time is spent on matrix multiplication, optimizing the 1D wave assignments using FastOp is not going to make much difference to the total processing time, but those steps should be far more efficient than the non-optimized equivalents.
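One simple way to check this is to keep a copy of a known-good result and compare against it, for example (T_FWF_ref is just a made-up name):

    Duplicate/O T_FWF, T_FWF_ref            // after running the reference version
    // ...run the optimized version, which overwrites T_FWF...
    Print EqualWaves(T_FWF, T_FWF_ref, 1)   // prints 1 if the wave data match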
April 1, 2025 at 02:16 am - Permalink