ThreadSafeFunction

Hi, below is a program that uses Threadsafe. Powerbook 12 cores Apple M3 pro. igor Pro 8.04 I use 12 parallel cores.

There are two loops, and a message is returned as "While executing user function optimized, the following error occurred: Invalid Thread Group ID or index."

I suspect it is due to the still running calculations, so I have added a test with ThreadGroupWait., which does not work. Error message = "While executing a wave read, the following error occurred: Attempt to operate on a null (missing) wave"

Thanks for your help.

 

ThreadSafe Function Transmittance_ComputeOne(index)
    Variable index
 
    Wave/C delta1, delta3
    Wave/C P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w
    Wave T_FWF1, T_FWF2, T_FWF
 
    Variable/C ci = sqrt(-1)
    Variable/C d1 = delta1[index], d3 = delta3[index]
    Variable/C a, b, c, d
 
    P1_3D[0][0][index] = exp(ci * d1)
    P1_3D[0][1][index] = 0
    P1_3D[1][0][index] = 0
    P1_3D[1][1][index] = exp(-ci * d1)
 
    P3_3D[0][0][index] = exp(ci * d3)
    P3_3D[0][1][index] = 0
    P3_3D[1][0][index] = 0
    P3_3D[1][1][index] = exp(-ci * d3)
 
    // M_TOT1 = M01 * P1 * M12
    a = M01_3D[0][0][index]*P1_3D[0][0][index] + M01_3D[0][1][index]*P1_3D[1][0][index]
    b = M01_3D[0][0][index]*P1_3D[0][1][index] + M01_3D[0][1][index]*P1_3D[1][1][index]
    c = M01_3D[1][0][index]*P1_3D[0][0][index] + M01_3D[1][1][index]*P1_3D[1][0][index]
    d = M01_3D[1][0][index]*P1_3D[0][1][index] + M01_3D[1][1][index]*P1_3D[1][1][index]
 
    M_TOT1_3D[0][0][index] = a * M12_3D[0][0][index] + b * M12_3D[1][0][index]
    M_TOT1_3D[0][1][index] = a * M12_3D[0][1][index] + b * M12_3D[1][1][index]
    M_TOT1_3D[1][0][index] = c * M12_3D[0][0][index] + d * M12_3D[1][0][index]
    M_TOT1_3D[1][1][index] = c * M12_3D[0][1][index] + d * M12_3D[1][1][index]
 
    // M_TOT2 = M23 * P3 * M30
    a = M23_3D[0][0][index]*P3_3D[0][0][index] + M23_3D[0][1][index]*P3_3D[1][0][index]
    b = M23_3D[0][0][index]*P3_3D[0][1][index] + M23_3D[0][1][index]*P3_3D[1][1][index]
    c = M23_3D[1][0][index]*P3_3D[0][0][index] + M23_3D[1][1][index]*P3_3D[1][0][index]
    d = M23_3D[1][0][index]*P3_3D[0][1][index] + M23_3D[1][1][index]*P3_3D[1][1][index]
 
    M_TOT2_3D[0][0][index] = a * M30_3D[0][0][index] + b * M30_3D[1][0][index]
    M_TOT2_3D[0][1][index] = a * M30_3D[0][1][index] + b * M30_3D[1][1][index]
    M_TOT2_3D[1][0][index] = c * M30_3D[0][0][index] + d * M30_3D[1][0][index]
    M_TOT2_3D[1][1][index] = c * M30_3D[0][1][index] + d * M30_3D[1][1][index]
 
    T_FWF1[index] = real(n_w[index]) * (1 / cabs(M_TOT1_3D[1][1][index]))^2
    T_FWF2[index] = (1 / real(n_w[index])) * (1 / cabs(M_TOT2_3D[1][1][index]))^2
    T_FWF[index] = T_FWF1[index] * T_FWF2[index]
End
 
 
 
 
Function optimized(e_f1, e_f2)
    Variable e_f1, e_f2  // µm
    Variable/G e_f1_global, e_f2_global, e_w_global
 
    e_f1_global = e_f1
    e_f2_global = e_f2
    e_w_global = 3000
 
    Wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
    Wave/C n_w, n_v, m
 
    Variable timerrefnum = StartMSTimer
    Variable/C ci = sqrt(-1)
    Variable Npts = numpnts(wavenumber)
 
    Make/O/C/N=(Npts) delta1, delta2, delta3
    delta1 = (2 * pi / lambda) * m * e_f1
    delta3 = (2 * pi / lambda) * m * e_f2
 
    Make/O/C/N=(Npts) r01, r12, r23, r30, rw0
    r01 = (n_v - m) / (n_v + m)
    r12 = (m - n_w) / (m + n_w)
    r23 = (n_w - m) / (n_w + m)
    r30 = (m - n_v) / (m + n_v)
    rw0 = (n_w - n_v) / (n_w + n_v)
 
    Make/O/C/N=(Npts) t01, t12, t23, t30, tw0, t0w
    t01 = 2 * n_v / (n_v + m)
    t12 = 2 * m / (m + n_w)
    t23 = 2 * n_w / (n_w + m)
    t30 = 2 * m / (m + n_v)
    tw0 = 2 * n_w / (n_w + n_v)
    t0w = 2 / (n_w + n_v)
 
    Make/O/C/N=(2,2,Npts) P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D
    Make/O/N=(Npts) T_FWF1, T_FWF2, T_FWF, T_BACK, TRANSMIT_2FILM
 
 
//  Remplissage des matrices 3D, dimensions Npts pour chaque wavenumber
    Variable i
    for(i = 0; i < Npts; i += 1)
        M01_3D[0][0][i] = 1 / t01[i]; M01_3D[0][1][i] = r01[i] / t01[i]
        M01_3D[1][0][i] = r01[i] / t01[i]; M01_3D[1][1][i] = 1 / t01[i]
 
        M12_3D[0][0][i] = 1 / t12[i]; M12_3D[0][1][i] = r12[i] / t12[i]
        M12_3D[1][0][i] = r12[i] / t12[i]; M12_3D[1][1][i] = 1 / t12[i]
 
        M23_3D[0][0][i] = 1 / t23[i]; M23_3D[0][1][i] = r23[i] / t23[i]
        M23_3D[1][0][i] = r23[i] / t23[i]; M23_3D[1][1][i] = 1 / t23[i]
 
        M30_3D[0][0][i] = 1 / t30[i]; M30_3D[0][1][i] = r30[i] / t30[i]
        M30_3D[1][0][i] = r30[i] / t30[i]; M30_3D[1][1][i] = 1 / t30[i]
    endfor
 
 
// 12 cores paralel
    Variable nThreads = ThreadProcessorCount // core number = 12
    Variable n_loop_b12 = floor(Npts / nThreads) // number of loops, each bloc contains 12
    Variable n_loop_reste=Npts-n_loop_b12*nThreads
 
    Variable t
    Variable tgID = ThreadGroupCreate(nThreads)  
 
    print "tgID =", tgID , "   n_loop_b12 =", n_loop_b12 , "   n_loop_reste=", n_loop_reste , "   nthreads-1=", nThreads-1
 
 
    variable j, threadGroupStatus
 
   for(i = 0; i < n_loop_b12; i += 1)
            for(j = 0; j < nThreads; j += 1)
                ThreadStart tgID, j, Transmittance_ComputeOne(j+i*nThreads)
        endfor
             do
            threadGroupStatus = ThreadGroupWait(tgID,100)
                while (threadGroupStatus != 0)
    endfor
 
    /// last few
    for(i = 0; i < n_loop_reste; i += 1)
        ThreadStart tgID, i, Transmittance_ComputeOne(Npts-n_loop_b12*nThreads+i)
    endfor
 
 
 
 
 
    T_BACK = ((4 * real(n_w)) / (1 + real(n_w))^2)^2
    TRANSMIT_2FILM = T_FWF / T_BACK
 
    Variable microseconds = StopMSTimer(timerrefnum)
    Print microseconds / 10000, "ms"
End

 

 

 

 

Here is a cleaner code, but pb not fixed

 

ThreadSafe Function Transmittance_ComputeOne(index)
    Variable index
    Wave/C delta1, delta3
    Wave/C P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w
    Wave T_FWF1, T_FWF2, T_FWF
 
    Variable/C ci = sqrt(-1)
    Variable/C d1 = delta1[index], d3 = delta3[index]
    Variable/C a, b, c, d
 
    P1_3D[0][0][index] = exp(ci * d1)
    P1_3D[0][1][index] = 0
    P1_3D[1][0][index] = 0
    P1_3D[1][1][index] = exp(-ci * d1)
 
    P3_3D[0][0][index] = exp(ci * d3)
    P3_3D[0][1][index] = 0
    P3_3D[1][0][index] = 0
    P3_3D[1][1][index] = exp(-ci * d3)
 
    // M_TOT1 = M01 * P1 * M12
    a = M01_3D[0][0][index]*P1_3D[0][0][index] + M01_3D[0][1][index]*P1_3D[1][0][index]
    b = M01_3D[0][0][index]*P1_3D[0][1][index] + M01_3D[0][1][index]*P1_3D[1][1][index]
    c = M01_3D[1][0][index]*P1_3D[0][0][index] + M01_3D[1][1][index]*P1_3D[1][0][index]
    d = M01_3D[1][0][index]*P1_3D[0][1][index] + M01_3D[1][1][index]*P1_3D[1][1][index]
 
    M_TOT1_3D[0][0][index] = a * M12_3D[0][0][index] + b * M12_3D[1][0][index]
    M_TOT1_3D[0][1][index] = a * M12_3D[0][1][index] + b * M12_3D[1][1][index]
    M_TOT1_3D[1][0][index] = c * M12_3D[0][0][index] + d * M12_3D[1][0][index]
    M_TOT1_3D[1][1][index] = c * M12_3D[0][1][index] + d * M12_3D[1][1][index]
 
    // M_TOT2 = M23 * P3 * M30
    a = M23_3D[0][0][index]*P3_3D[0][0][index] + M23_3D[0][1][index]*P3_3D[1][0][index]
    b = M23_3D[0][0][index]*P3_3D[0][1][index] + M23_3D[0][1][index]*P3_3D[1][1][index]
    c = M23_3D[1][0][index]*P3_3D[0][0][index] + M23_3D[1][1][index]*P3_3D[1][0][index]
    d = M23_3D[1][0][index]*P3_3D[0][1][index] + M23_3D[1][1][index]*P3_3D[1][1][index]
 
    M_TOT2_3D[0][0][index] = a * M30_3D[0][0][index] + b * M30_3D[1][0][index]
    M_TOT2_3D[0][1][index] = a * M30_3D[0][1][index] + b * M30_3D[1][1][index]
    M_TOT2_3D[1][0][index] = c * M30_3D[0][0][index] + d * M30_3D[1][0][index]
    M_TOT2_3D[1][1][index] = c * M30_3D[0][1][index] + d * M30_3D[1][1][index]
 
    T_FWF1[index] = real(n_w[index]) * (1 / cabs(M_TOT1_3D[1][1][index]))^2
    T_FWF2[index] = (1 / real(n_w[index])) * (1 / cabs(M_TOT2_3D[1][1][index]))^2
    T_FWF[index] = T_FWF1[index] * T_FWF2[index]
    
    return stopMSTimer(-2)
    
End
 
 
 
 
Function optimized(e_f1, e_f2)
    Variable e_f1, e_f2  // µm
    Wave n, k, wavenumber, lambda, T_FWF, T_BACK, TRANSMIT_2FILM
    Wave/C n_w, n_v, m
 
    Variable timerrefnum = StartMSTimer
    Variable/C ci = sqrt(-1)
    Variable Npts = numpnts(wavenumber)
 
    Make/O/C/N=(Npts) delta1, delta3
    delta1 = (2 * pi / lambda) * m * e_f1
    delta3 = (2 * pi / lambda) * m * e_f2
 
    Make/O/C/N=(Npts) r01, r12, r23, r30, rw0
    r01 = (n_v - m) / (n_v + m)
    r12 = (m - n_w) / (m + n_w)
    r23 = (n_w - m) / (n_w + m)
    r30 = (m - n_v) / (m + n_v)
    rw0 = (n_w - n_v) / (n_w + n_v)
 
    Make/O/C/N=(Npts) t01, t12, t23, t30, tw0, t0w
    t01 = 2 * n_v / (n_v + m)
    t12 = 2 * m / (m + n_w)
    t23 = 2 * n_w / (n_w + m)
    t30 = 2 * m / (m + n_v)
    tw0 = 2 * n_w / (n_w + n_v)
    t0w = 2 / (n_w + n_v)
 
    Make/O/C/N=(2,2,Npts) P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D
    Make/O/N=(Npts) T_FWF1, T_FWF2, T_FWF, T_BACK, TRANSMIT_2FILM
 
 
//  Remplissage des matrices 3D, dimensions Npts pour chaque wavenumber
    Variable i
    for(i = 0; i < Npts; i += 1)
        M01_3D[0][0][i] = 1 / t01[i]; M01_3D[0][1][i] = r01[i] / t01[i]
        M01_3D[1][0][i] = r01[i] / t01[i]; M01_3D[1][1][i] = 1 / t01[i]
 
        M12_3D[0][0][i] = 1 / t12[i]; M12_3D[0][1][i] = r12[i] / t12[i]
        M12_3D[1][0][i] = r12[i] / t12[i]; M12_3D[1][1][i] = 1 / t12[i]
 
        M23_3D[0][0][i] = 1 / t23[i]; M23_3D[0][1][i] = r23[i] / t23[i]
        M23_3D[1][0][i] = r23[i] / t23[i]; M23_3D[1][1][i] = 1 / t23[i]
 
        M30_3D[0][0][i] = 1 / t30[i]; M30_3D[0][1][i] = r30[i] / t30[i]
        M30_3D[1][0][i] = r30[i] / t30[i]; M30_3D[1][1][i] = 1 / t30[i]
    endfor
 
 
// 12 cores paralel
    Variable nThreads = ThreadProcessorCount // core number = 12
    Variable n_loop_b12 = floor(Npts / nThreads) // number of loops, each bloc contains 12
    Variable n_loop_reste=Npts-n_loop_b12*nThreads
    Variable tgID = ThreadGroupCreate(nThreads)  
    
    print "tgID =", tgID , "   n_loop_b12 =", n_loop_b12 , "   n_loop_reste=", n_loop_reste , "   nthreads-1=", nThreads-1
    
    variable j
 
   for(i = 0; i < n_loop_b12; i += 1)
            for(j = 0; j < nThreads; j += 1)
                ThreadStart tgID, j, Transmittance_ComputeOne(10000)//(j+i*nThreads)
        endfor
            do
            variable threadGroupStatus = ThreadGroupWait(tgID,100)
                while (threadGroupStatus != 0)
    endfor
    
    /// last few
    
    for(i = 0; i < n_loop_reste; i += 1)
        ThreadStart tgID, i, Transmittance_ComputeOne(Npts-n_loop_b12*nThreads+i)
    endfor
 
   
    T_BACK = ((4 * real(n_w)) / (1 + real(n_w))^2)^2
    TRANSMIT_2FILM = T_FWF / T_BACK
 
    Variable microseconds = StopMSTimer(timerrefnum)
    Print microseconds / 10000, "ms"
End

 

I have not looked at your function in detail but I did notice that there is no call to ThreadGroupRelease.

The "ThreadSafe Functions and Multitasking" help topic says "Once you are finished with a given thread group, call ThreadGroupRelease.".

Also, so that others can try debugging it, you might consider trying to simplify the code as much as possible and then posting an experiment file that contains the function and the waves needed to run it.

Attached is the Igor 8.04 file

It's quite messy so I've tried to tidy a little bit. Here are the functions :

Function Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2) + Function generate_wave()

Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2) - it calculates the transmission of two thin films with thickness e_f1 and e_f2 (typically, 2.45 and 0.2 mm). The function calls generate_wave().

I need to lower the execution time, so tried to use Theashade functions. 

ThreadSafe Function Transmittance_ComputeOne(index, etc....)
Function optimized(e_f1, e_f2)

Thanks in advance.

ThreadSafe Function Transmittance_ComputeOne(index, T_FWF1, T_FWF2, T_FWF, P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w, tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1)
    Variable index
    wave T_FWF1, T_FWF2, T_FWF
    wave/C P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w, tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1
    
    variable/C a, b, c, d 
    Variable/C ci=sqrt(-1) 
 
//    Make/O/C/N=(2,2) tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1
    tM01[][]=M01_3D[p][q][index]
     tP1[][]=P1_3D[p][q][index]
      tM12[][]=M12_3D[p][q][index]
       tM_tot2[][]=M_tot2_3D[p][q][index]
        tM23[][]=M23_3D[p][q][index]
         tP3[][]=P3_3D[p][q][index]
          tM30[][]=M30_3D[p][q][index]
          
    MatrixOp /C /O tM_tot1= tM01 x tP1 x tM12
    MatrixOp /C /O tM_tot2= tM23 x tP3 x tM30
    
    MatrixOp /C /O tM_tot1= tM01 x tP1 x tM12
    MatrixOp /C /O tM_tot2= tM23 x tP3 x tM30
 
    T_FWF1[index] = real(n_w[index]) * (1 / cabs(tM_TOT1[1][1]))^2
    T_FWF2[index] = (1 / real(n_w[index])) * (1 / cabs(tM_TOT2[1][1]))^2
    T_FWF[index] = T_FWF1[index] * T_FWF2[index]
   
   print "index=", index
 
End
 
 
 
 
Function optimized(e_f1, e_f2)
    Variable e_f1, e_f2  // µm
    Wave n, k, wavenumber, lambda, T_FWF, T_BACK, TRANSMIT_2FILM, T_FWF1, T_FWF2
    Wave/C n_w, n_v, m
    wave/C   P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1
 
 
    Variable timerrefnum = StartMSTimer
    Variable/C ci = sqrt(-1)
    Variable Npts = numpnts(wavenumber), indix
 
    Make/O/C/N=(Npts) delta1, delta3
    delta1 = (2 * pi / lambda) * m * e_f1
    delta3 = (2 * pi / lambda) * m * e_f2
 
    Make/O/C/N=(Npts) r01, r12, r23, r30, rw0
    r01 = (n_v - m) / (n_v + m)
    r12 = (m - n_w) / (m + n_w)
    r23 = (n_w - m) / (n_w + m)
    r30 = (m - n_v) / (m + n_v)
    rw0 = (n_w - n_v) / (n_w + n_v)
 
    Make/O/C/N=(Npts) t01, t12, t23, t30, tw0, t0w
    t01 = 2 * n_v / (n_v + m)
    t12 = 2 * m / (m + n_w)
    t23 = 2 * n_w / (n_w + m)
    t30 = 2 * m / (m + n_v)
    tw0 = 2 * n_w / (n_w + n_v)
    t0w = 2 / (n_w + n_v)
 
 
//  Remplissage des matrices 3D, dimensions Npts pour chaque wavenumber
 
   
    Variable/C d1, d3
 
    Variable i
    for(i = 0; i < Npts; i += 1)
            d1 = delta1[i]; d3 = delta3[i]
            M01_3D[0][0][i] = 1 / t01[i]; M01_3D[0][1][i] = r01[i] / t01[i]
            M01_3D[1][0][i] = r01[i] / t01[i]; M01_3D[1][1][i] = 1 / t01[i]
 
            M12_3D[0][0][i] = 1 / t12[i]; M12_3D[0][1][i] = r12[i] / t12[i]
            M12_3D[1][0][i] = r12[i] / t12[i]; M12_3D[1][1][i] = 1 / t12[i]
 
            M23_3D[0][0][i] = 1 / t23[i]; M23_3D[0][1][i] = r23[i] / t23[i]
            M23_3D[1][0][i] = r23[i] / t23[i]; M23_3D[1][1][i] = 1 / t23[i]
 
            M30_3D[0][0][i] = 1 / t30[i]; M30_3D[0][1][i] = r30[i] / t30[i]
            M30_3D[1][0][i] = r30[i] / t30[i]; M30_3D[1][1][i] = 1 / t30[i]
        
            P1_3D[0][0][i] = exp(ci * d1)
            P1_3D[0][1][i] = 0
            P1_3D[1][0][i] = 0
            P1_3D[1][1][i] = exp(-ci * d1)
 
            P3_3D[0][0][i] = exp(ci * d3)
            P3_3D[0][1][i] = 0
            P3_3D[1][0][i] = 0
            P3_3D[1][1][i] = exp(-ci * d3)
    
    endfor
 
 
// 12 cores paralel
    Variable nThreads = ThreadProcessorCount // core number = 12
    Variable n_loop_b12 = floor(Npts / nThreads) // number of loops, each bloc contains 12
    Variable n_loop_reste=Npts-n_loop_b12*nThreads
    Variable tgID = ThreadGroupCreate(nThreads)  
    
    print "tgID =", tgID , "   n_loop_b12 =", n_loop_b12 , "   n_loop_reste=", n_loop_reste , "   nthreads-1=", nThreads-1
    
    variable j
 
   for(i = 0; i < n_loop_b12; i += 1)
            for(j = 0; j < nThreads; j += 1)
            indix=nthreads*i+j
            print "indix=", indix
                ThreadStart tgID, j, Transmittance_ComputeOne(indix,P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w, T_FWF1, T_FWF2, T_FWF, tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1)
        endfor
 
            do
            variable threadGroupStatus = ThreadGroupWait(tgID,100)
                while (threadGroupStatus != 0)
    endfor
    
    /// last few
    
    for(i = 0; i < n_loop_reste; i += 1)
        ThreadStart tgID, i, Transmittance_ComputeOne(i,P1_3D, P3_3D, M01_3D, M12_3D, M23_3D, M30_3D, M_TOT1_3D, M_TOT2_3D, n_w, T_FWF1, T_FWF2, T_FWF, tM01, tP1, tM12, tM_tot2, tM23, tP3, tM30, tM_tot1)
    endfor
 
   variable killallrunning=ThreadGroupRelease(-2)
   
    T_BACK = ((4 * real(n_w)) / (1 + real(n_w))^2)^2
    TRANSMIT_2FILM = T_FWF / T_BACK
 
    Variable microseconds = StopMSTimer(timerrefnum)
    Print microseconds / 10000, "ms"
End

 

For_TEST_Wametrics_Forum.pxp (17.96 MB)

Translate certain explicit for-loop constructions to implicit wave arithmetic. Here is one example.

    for(i = 0; i < Npts; i += 1)
            ...
            M01_3D[0][0][i] = 1 / t01[i]; M01_3D[0][1][i] = r01[i] / t01[i]
            ...
    endfor
 
 
M01_3D[0][0][] = (1/t01[q]); M01_3D[0][1][q] = (r01[q] / t01[q])

 

Using your experiment file, I executed this:

Calculate_Transmittance_FilmWindowFilm(2.45, 0.2)

I got no errors and it printed this:

1403.06  ms

 

This function is the initial function that needs to be optimized. It works well, but is slow. I've tried to optimize through the optimize(eF1,eF2) function, using threading. For this function only 100 loops of the "ThreadSafe Function Transmittance_ComputeOne" lead to printing. It does not work. Tehre is no error message, I guess there's an issue with NaN or other.

Following advice from jjweimer I have edited a new code (below) by means of implicit and using 3D wave for matrix multiplication. It works well and faster (~ 170 ms).

The code is below. Maybe there is some way to improve it further through multithreading?

Is it worth moving to Igor Pro 9 (now I have Igor 8.04) for multithreading ?

Thanks.

 

Function OPTIMIZED_VECTORIZED_Transmittance_FWF(e_f1, e_f2)
variable e_f1, e_f2 // µm
 
wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
wave /C n_w, n_v, m, r01, r12, r23, r30, rw0, t01, t12, r23, t23, r30, t30, rw0, tw0, t0w, P1_3D, P3_3D, M01_3D, M12_3D, M30_3D, M23_3D, Mw0_3D
wave /C P1,P2, P3, M01, M12, M23, M30, Mw0, M_TOT1, M_TOT2, delta1, delta3
variable i, timerrefnum, microseconds
variable /c ci
variable Npnts=numpnts(wavenumber)
 
timerRefNum = StartMSTimer
 
ci=sqrt(-1)
    r01=(n_v-m)/(n_v+m) // vacuum / film 1
    r12=(m-n_w)/(m+n_w) // film1/window
    r23=(n_w-m)/(n_w+m) // window / film2
    r30=(m-n_v)/(m+n_v) // film2/vide
    rw0=(n_w-n_v)/(n_w+n_v) // substrat vide
 
    t01=2*n_v/(n_v+m)       // vacuum / film 1
    t12=2*m/(m+n_w)         // film1/window
    t23=2*n_w/(n_w+m)       // window / film2
    t30=2*m/(m+n_v)         // film2/vide
    tw0=2*n_w/(n_w+n_v) // substrat vide
    t0w=2/(n_w+n_v)
 
 
delta1=(2*pi/lambda)*m[i]*e_f1      // film principal
delta3=(2*pi/lambda)*m[i]*e_f2      // film parasite
 
// remplissage des matrices
// matrice de phase dans le film de glace
 
 
P1_3D[0][0][]=exp(ci*delta1[q])
P1_3D[0][1][]=0
P1_3D[1][0][]=0
P1_3D[1][1][]=exp(-ci*delta1[q])
 
 
// matrice de phase dans le film arrière
P3_3D[0,0][]=exp(ci*delta3[q])
P3_3D[0][1][]=0
P3_3D[1][0][]=0
P3_3D[1][1][]=exp(-ci*delta3[q])
 
/// vide - film m=n+ik
 
// M_3D[1][1][] = R[q]  
 
M01_3D[0][0][]=1
M01_3D[0][1][]=r01[q]
M01_3D[1][0][]=r01[q]
M01_3D[1][1][]=1
 
M01_3D=M01_3D/t01[q]
 
/// Matrice film - substrat window
M12_3D[0][0][]=1            
M12_3D[0][1][]=r12[q]
M12_3D[1][0][]=r12[q]
M12_3D[1][1][]=1
 
M12_3D=M12_3D/t12[i]
 
M23_3D[0][0][]=1
M23_3D[0][1][]=r23[q]
M23_3D[1][0][]=r23[q]
M23_3D[1][1][]=1
 
M23_3D=M23_3D/t23[q]
 
M30_3D[0][0][]=1
M30_3D[0][1][]=r30[q]
M30_3D[1][0][]=r30[q]
M30_3D[1][1][]=1
 
M30_3D=M30_3D/t30[q]
 
/// Matrice substrat window - vide
 
Mw0_3D[0][0][]=1
Mw0_3D[0][1][]=rw0[q]
Mw0_3D[1][0][]=rw0[q]
Mw0_3D[1][1][]=1
 
Mw0_3D=Mw0_3D/tw0[q]
 
MatrixOp /C /O M_tot1_3D= M01_3D x P1_3D x M12_3D
MatrixOp /C /O M_tot2_3D= M23_3D x P3_3D x M30_3D
 
/// Transmission total SYSTEME FILM+SUBSTRAT+FILM ARRIERE
T_FWF1[i]=(real(n_w[i]/1))*(1/cabs(M_TOT1[1][1]))^2 // du vide vers le substrat, on divise par n du milieu de sortie divisé par milieu d'entrée
T_FWF2[i]=(1/real(n_w[i]))*(1/cabs(M_TOT2[1][1]))^2  // du substrat vers le vide
T_FWF[i]=T_FWF1[i]*T_FWF2[i]
 
/// TRANSMITTANCE SUBSTRAT SEUL (SPECTRE DE BACKGROUND)
T_back=((4*real(n_w))/(1+real(n_w))^2)^2 // Cas vide - substrat - vide, SANS interférences
 
/// CALCUL DU SPECTRE DE TRANSMITTANCE, DIRECTEMENT COMPARABLE A LA MESURE
TRANSMIT_2FILM=T_FWF/T_BACK
 
    microSeconds = StopMSTimer(timerRefNum)
    Print microSeconds/1000, "ms"
 
End

 

There are places where you calculate the same wave expressions multiple times. This is somewhat obscured by representing the same sum in two different ways, for example "n_v+m" and "m+n_v". So I would regularize this by changing this:

    r01=(n_v-m)/(n_v+m)     // vacuum / film 1
    r12=(m-n_w)/(m+n_w)     // film1/window (n_w+m)
    r23=(n_w-m)/(n_w+m)     // window / film2
    r30=(m-n_v)/(m+n_v      // film2/vide (n_v+m)
    rw0=(n_w-n_v)/(n_w+n_v)     // substrat vide
 
    t01=2*n_v/(n_v+m)       // vacuum / film 1
    t12=2*m/(m+n_w)         // film1/window
    t23=2*n_w/(n_w+m)       // window / film2
    t30=2*m/(m+n_v)         // film2/vide (n_v+m)
    tw0=2*n_w/(n_w+n_v)     // substrat vide
    t0w=2/(n_w+n_v)

with this:

    // Replace m+n_w with n_w+m
    // Replace m+n_v with n_v+m
    
    r01=(n_v-m)/(n_v+m)     // vacuum / film 1
    r12=(m-n_w)/(n_w+m)     // film1/window
    r23=(n_w-m)/(n_w+m)     // window / film2
    r30=(m-n_v)/(n_v+m)     // film2/vide
    rw0=(n_w-n_v)/(n_w+n_v) // substrat vide
 
    t01=2*n_v/(n_v+m)       // vacuum / film 1
    t12=2*m/(n_w+m)         // film1/window
    t23=2*n_w/(n_w+m)       // window / film2
    t30=2*m/(n_v+m)         // film2/vide
    tw0=2*n_w/(n_w+n_v)     // substrat vide
    t0w=2/(n_w+n_v)

Now you might gain some speed by using free waves to store intermediate waves, like this:

    // Now use free waves to avoid calculating the same vectors multiple times
 
    Duplicate/FREE n_v, n_v_plus_m; n_v_plus_m += m // n_v+m
    Duplicate/FREE n_w, n_w_plus_m; n_w_plus_m += m // n_w+m
    Duplicate/FREE n_w, n_w_plus_n_v; n_w_plus_n_v += n_v   // n_w+n_v
    Duplicate/FREE m, two_m; two_m *= 2 // 2*m
    Duplicate/FREE two_n_w, n_w; two_n_w *= 2   // 2*n_w
    
    r01=(n_v-m)/(n_v_plus_m)        // vacuum / film 1
    r12=(m-n_w)/(n_w_plus_m)        // film1/window
    r23=(n_w-m)/(n_w_plus_m)        // window / film2
    r30=(m-n_v)/(n_v_plus_m)        // film2/vide
    rw0=(n_w-n_v)/(n_w_plus_n_v)    // substrat vide
 
    t01=2*n_v/(n_v_plus_m)          // vacuum / film 1
    t12=two_m/(n_w_plus_m)          // film1/window
    t23=two_n_w/(n_w_plus_m)        // window / film2
    t30=two_m/(n_v_plus_m)          // film2/vide
    tw0=two_n_w/(n_w_plus_n_v)      // substrat vide
    t0w=2/(n_w_plus_n_v)

You should check my changes to make sure I didn't mix something up.

You could do something similar with "2*pi/lambda" and "real(n_w)" each of which appears twice.

 

 

 

 

You may be able to gain some efficiency using FastOp in expressions like

FastOp n_v_plus_m += m

execute DisplayHelpTopic "FastOP" for help

Looks like you also need to fix the indexing here (and elsewhere) - at the moment you are calculating only the ith value of T_FWF:

T_FWF1[i]=(real(n_w[i]/1))*(1/cabs(M_TOT1[1][1]))^2 // du vide vers le substrat, on divise par n du milieu de sortie divisé par milieu d'entrée
T_FWF2[i]=(1/real(n_w[i]))*(1/cabs(M_TOT2[1][1]))^2  // du substrat vers le vide
T_FWF[i]=T_FWF1[i]*T_FWF2[i]
 
// should probably be something like
 
duplicate/free T_FWF n_w_real
n_w_real = real(n_w)
variable V1 = (1/cabs(M_TOT1[1][1]))^2
variable V2 = (1/cabs(M_TOT2[1][1]))^2
 
FastOp T_FWF1 = n_w_real * V1
FastOp T_FWF2 = 1 / n_w_real
FastOp T_FWF2 = T_FWF2 * V2 
FastOp T_FWF = T_FWF1 * T_FWF2

most likely [i] should be replaced by [p] in a few places

Hi Tony, Thanks for the hint. However the code does not compile, for

FastOp T_FWF1 = n_w_real*V1
FastOp T_FWF2 = n_w_real*V2

It seems it does no support the variables V1 and V2. I've not found insights into this in the Display Topic. Error message is "Syntax does not conform to FastOp requirements."

P1_3D[0][0][]=exp(ci*delta1[q])
P1_3D[0][1][]=0
P1_3D[1][0][]=0
P1_3D[1][1][]=exp(-ci*delta1[q])
 
// is the same as
 
variable/C C1 = exp(ci * delta1[0])
variable/C C2 = exp(-ci * delta1[1])
 
P1_3D[0][0][]=C1
P1_3D[0][1][]=0
P1_3D[1][0][]=0
P1_3D[1][1][]=C2
 
// but maybe you mean to do this
 
P1_3D[0][0][]=exp(ci*delta1[r])
P1_3D[0][1][]=0
P1_3D[1][0][]=0
P1_3D[1][1][]=exp(-ci*delta1[r])

it looks to me like you need to change [q] to [r] everywhere and remove [i] (or replace with [p] if you prefer)

In reply to by Erik_Kran

Erik_Kran wrote:

Hi Tony, Thanks for the hint. However the code does not compile, for

FastOp T_FWF1 = n_w_real*V1
FastOp T_FWF2 = n_w_real*V2

It seems it does no support the variables V1 and V2. I've not found insights into this in the Display Topic. Error message is "Syntax does not conform to FastOp requirements."

try 

FastOp T_FWF1 = (V1) * n_w_real

you need the parentheses. i just typed the code here without checking anything, probably there are other errors.

Ah yes it compiles, the variable needs parentheses and to be first.

The execution time of the the initial code is 284 ms. With FASTOP it is 3515 ms :-(

not so fast finally  ;-)

In reply to by Erik_Kran

Erik_Kran wrote:

Ah yes it compiles, the variable needs parentheses and to be first.

The execution time of the the initial code is 284 ms. With FASTOP it is 3515 ms :-(

not so fast finally  ;-)

are you comparing your original version that computes one point with the FastOp version that operates on the whole wave?

it's unlikely that FastOp is significantly slower than the same wave assignment calculated without optimization.

 

In other words: remove all instances of [i] from your code, then compare speed with addition of FastOp.

below the two codes : origin and with FastOp

Function Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2)
variable e_f1, e_f2 // µm
 
wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
wave /C n_w, n_v, m, r01, r12, r23, r30, rw0, t01, t12, r23, t23, r30, t30, rw0, tw0, t0w
wave /C P1,P2, P3, M01, M12, M23, M30, Mw0, M_TOT1, M_TOT2
variable i, timerrefnum, microseconds, e_w=3000
variable /c ci, delta1, delta2, delta3
 
timerRefNum = StartMSTimer
 
ci=sqrt(-1)
    r01=(n_v-m)/(n_v+m) // vacuum / film 1
    r12=(m-n_w)/(m+n_w) // film1/window
    r23=(n_w-m)/(n_w+m) // window / film2
    r30=(m-n_v)/(m+n_v) // film2/vide
    rw0=(n_w-n_v)/(n_w+n_v) // substrat vide
 
    t01=2*n_v/(n_v+m)       // vacuum / film 1
    t12=2*m/(m+n_w)         // film1/window
    t23=2*n_w/(n_w+m)       // window / film2
    t30=2*m/(m+n_v)         // film2/vide
    tw0=2*n_w/(n_w+n_v) // substrat vide
    t0w=2/(n_w+n_v)
 
// début de la boucle
 
Make /O/N=(numpnts(wavenumber)) AAA_TEMPORAIRE
 
for(i=0;i<numpnts(wavenumber);i+=1)
 
delta1=(2*pi/lambda[i])*m[i]*e_f1       // film principal
delta2=(2*pi/lambda[i])*n_w[i]*e_w  // épaisseur fenêtre
delta3=(2*pi/lambda[i])*m[i]*e_f2       // film parasite
 
// remplissage des matrices
// matrice de phase dans le film de glace
P1[0][0]=exp(ci*delta1)
P1[0][1]=0
P1[1][0]=0
P1[1][1]=exp(-ci*delta1)
 
// remplissage des matrices
// matrice de phase dans le substrat (en général pas utilisé)
P2[0][0]=exp(ci*delta2)
P2[0][1]=0
P2[1][0]=0
P2[1][1]=exp(-ci*delta2)
 
// matrice de phase dans le film arrière
P3[0,0]=exp(ci*delta3)
P3[0][1]=0
P3[1][0]=0
P3[1][1]=exp(-ci*delta3)
 
/// vide - film m=n+ik
 
M01[0][0]=1
M01[0][1]=r01[i]
M01[1][0]=r01[i]
M01[1][1]=1
 
M01=M01/t01[i]
 
/// Matrice film - substrat window
M12[0][0]=1         
M12[0][1]=r12[i]
M12[1][0]=r12[i]
M12[1][1]=1
 
M12=M12/t12[i]
 
M23[0][0]=1
M23[0][1]=r23[i]
M23[1][0]=r23[i]
M23[1][1]=1
 
M23=M23/t23[i]
 
M30[0][0]=1
M30[0][1]=r30[i]
M30[1][0]=r30[i]
M30[1][1]=1
 
M30=M30/t30[i]
 
/// Matrice substrat window - vide
 
Mw0[0][0]=1
Mw0[0][1]=rw0[i]
Mw0[1][0]=rw0[i]
Mw0[1][1]=1
 
Mw0=Mw0/tw0[i]
 
/// matrice du système optique complet FILM+SUBSTRAT+FILM ARRIERE, ET AVEC INTERFERENCE DANS SUBSTRAT
// MatrixOp /C /O M_tot= M01 x P1 x M12 x P2 x M23 x P3 x M30
 
/// matrice du système optique complet FILM+SUBSTRAT+FILM ARRIERE, ET SANS INTERFERENCE DANS SUBSTRAT
// MatrixOp /C /O M_tot= M01 x P1 x M12 x M23 x P3 x M30
 
// Matrice sans film arrière, pas interférence dans substrat
// 0 = vide, milieu de départ
// 1 = film 1 (le plus épais)
// 2 = substrat (pas de P2 on néglihe les ininterférneces dans le substrat)
// 3 = film parasite
 
MatrixOp /C /O M_tot1= M01 x P1 x M12
MatrixOp /C /O M_tot2= M23 x P3 x M30
 
 
/// Transmission total SYSTEME FILM+SUBSTRAT+FILM ARRIERE
T_FWF1[i]=(real(n_w[i]/1))*(1/cabs(M_TOT1[1][1]))^2 // du vide vers le substrat, on divise par n du milieu de sortie divisé par milieu d'entrée
T_FWF2[i]=(1/real(n_w[i]))*(1/cabs(M_TOT2[1][1]))^2  // du substrat vers le vide
T_FWF[i]=T_FWF1[i]*T_FWF2[i]
 
//duplicate/free T_FWF n_w_real
//n_w_real = real(n_w)
//variable V1 = (1/cabs(M_TOT1[1][1]))^2
//variable V2 = (1/cabs(M_TOT2[1][1]))^2
 
//FastOp T_FWF1 = (V1)*n_w_real
//FastOp T_FWF2 = 1 / n_w_real
//FastOp T_FWF2 = (V2)*T_FWF2
//FastOp T_FWF = T_FWF1 * T_FWF2
 
endfor
 
 
 
/// TRANSMITTANCE SUBSTRAT SEUL (SPECTRE DE BACKGROUND)
T_back=((4*real(n_w))/(1+real(n_w))^2)^2 // Cas vide - substrat - vide, SANS interférences
 
 
/// CALCUL DU SPECTRE DE TRANSMITTANCE, DIRECTEMENT COMPARABLE A LA MESURE
TRANSMIT_2FILM=T_FWF/T_BACK
 
 
    microSeconds = StopMSTimer(timerRefNum)
    Print microSeconds/1000, "ms"
 
End

 

 

And the second codes that includes FastOp :

Function Calculate_Transmittance_FilmWindowFilm(e_f1, e_f2)
variable e_f1, e_f2 // µm
 
wave n, k, wavenumber, lambda, T_FWF, T_FWF1, T_FWF2, T_BACK, TRANSMIT_2FILM
wave /C n_w, n_v, m, r01, r12, r23, r30, rw0, t01, t12, r23, t23, r30, t30, rw0, tw0, t0w
wave /C P1,P2, P3, M01, M12, M23, M30, Mw0, M_TOT1, M_TOT2
variable i, timerrefnum, microseconds, e_w=3000
variable /c ci, delta1, delta2, delta3
 
timerRefNum = StartMSTimer
 
ci=sqrt(-1)
    r01=(n_v-m)/(n_v+m) // vacuum / film 1
    r12=(m-n_w)/(m+n_w) // film1/window
    r23=(n_w-m)/(n_w+m) // window / film2
    r30=(m-n_v)/(m+n_v) // film2/vide
    rw0=(n_w-n_v)/(n_w+n_v) // substrat vide
 
    t01=2*n_v/(n_v+m)       // vacuum / film 1
    t12=2*m/(m+n_w)         // film1/window
    t23=2*n_w/(n_w+m)       // window / film2
    t30=2*m/(m+n_v)         // film2/vide
    tw0=2*n_w/(n_w+n_v) // substrat vide
    t0w=2/(n_w+n_v)
 
// début de la boucle
 
Make /O/N=(numpnts(wavenumber)) AAA_TEMPORAIRE
 
for(i=0;i<numpnts(wavenumber);i+=1)
 
delta1=(2*pi/lambda[i])*m[i]*e_f1       // film principal
delta2=(2*pi/lambda[i])*n_w[i]*e_w  // épaisseur fenêtre
delta3=(2*pi/lambda[i])*m[i]*e_f2       // film parasite
 
// remplissage des matrices
// matrice de phase dans le film de glace
P1[0][0]=exp(ci*delta1)
P1[0][1]=0
P1[1][0]=0
P1[1][1]=exp(-ci*delta1)
 
// remplissage des matrices
// matrice de phase dans le substrat (en général pas utilisé)
P2[0][0]=exp(ci*delta2)
P2[0][1]=0
P2[1][0]=0
P2[1][1]=exp(-ci*delta2)
 
// matrice de phase dans le film arrière
P3[0,0]=exp(ci*delta3)
P3[0][1]=0
P3[1][0]=0
P3[1][1]=exp(-ci*delta3)
 
/// vide - film m=n+ik
 
M01[0][0]=1
M01[0][1]=r01[i]
M01[1][0]=r01[i]
M01[1][1]=1
 
M01=M01/t01[i]
 
/// Matrice film - substrat window
M12[0][0]=1         
M12[0][1]=r12[i]
M12[1][0]=r12[i]
M12[1][1]=1
 
M12=M12/t12[i]
 
M23[0][0]=1
M23[0][1]=r23[i]
M23[1][0]=r23[i]
M23[1][1]=1
 
M23=M23/t23[i]
 
M30[0][0]=1
M30[0][1]=r30[i]
M30[1][0]=r30[i]
M30[1][1]=1
 
M30=M30/t30[i]
 
/// Matrice substrat window - vide
 
Mw0[0][0]=1
Mw0[0][1]=rw0[i]
Mw0[1][0]=rw0[i]
Mw0[1][1]=1
 
Mw0=Mw0/tw0[i]
 
/// matrice du système optique complet FILM+SUBSTRAT+FILM ARRIERE, ET AVEC INTERFERENCE DANS SUBSTRAT
// MatrixOp /C /O M_tot= M01 x P1 x M12 x P2 x M23 x P3 x M30
 
/// matrice du système optique complet FILM+SUBSTRAT+FILM ARRIERE, ET SANS INTERFERENCE DANS SUBSTRAT
// MatrixOp /C /O M_tot= M01 x P1 x M12 x M23 x P3 x M30
 
// Matrice sans film arrière, pas interférence dans substrat
// 0 = vide, milieu de départ
// 1 = film 1 (le plus épais)
// 2 = substrat (pas de P2 on néglihe les ininterférneces dans le substrat)
// 3 = film parasite
 
MatrixOp /C /O M_tot1= M01 x P1 x M12
MatrixOp /C /O M_tot2= M23 x P3 x M30
 
 
/// Transmission total SYSTEME FILM+SUBSTRAT+FILM ARRIERE
//T_FWF1[i]=(real(n_w[i]/1))*(1/cabs(M_TOT1[1][1]))^2 // du vide vers le substrat, on divise par n du milieu de sortie divisé par milieu d'entrée
//T_FWF2[i]=(1/real(n_w[i]))*(1/cabs(M_TOT2[1][1]))^2  // du substrat vers le vide
//T_FWF[i]=T_FWF1[i]*T_FWF2[i]
 
duplicate/free T_FWF n_w_real
n_w_real = real(n_w)
variable V1 = (1/cabs(M_TOT1[1][1]))^2
variable V2 = (1/cabs(M_TOT2[1][1]))^2
 
FastOp T_FWF1 = (V1)*n_w_real
FastOp T_FWF2 = 1 / n_w_real
FastOp T_FWF2 = (V2)*T_FWF2
FastOp T_FWF = T_FWF1 * T_FWF2
 
endfor
 
 
 
/// TRANSMITTANCE SUBSTRAT SEUL (SPECTRE DE BACKGROUND)
T_back=((4*real(n_w))/(1+real(n_w))^2)^2 // Cas vide - substrat - vide, SANS interférences
 
 
/// CALCUL DU SPECTRE DE TRANSMITTANCE, DIRECTEMENT COMPARABLE A LA MESURE
TRANSMIT_2FILM=T_FWF/T_BACK
 
 
    microSeconds = StopMSTimer(timerRefNum)
    Print microSeconds/1000, "ms"
 
End

 

What you probably mean to do (in the code with 3D matrices) is something like this:

duplicate/free T_FWF n_w_real
n_w_real = real(n_w)
 
T_FWF1= n_w_real * (1/cabs(M_tot1_3D[1][1][p]))^2
T_FWF2=(1/n_w_real)*(1/cabs(M_tot2_3D[1][1][p]))^2
FastOp T_FWF = T_FWF1 * T_FWF2

 

I did it but did not see any improvment. I've tried for files with 10000 and 1E6 lines, but no change.

Make sure that your different code versions achieve the same result before comparing execution speed. Clear any saved data from intermediate steps before testing the code to make sure that it actually works. If most of the processing time is spent on matrix multiplication, optimizing the 1D wave assignments using FastOp is not going to make much difference to the total processing time, but those steps should be far more efficient than the non-optimized equivalents.