I just noticed that adding the threadsafe keyword to a user function adds about 20% to its execution time, even when the function is never called from multiple threads. Is this expected? I had assumed threadsafe was only for the compiler: flag an error if I try to call a non-threadsafe function or operation with multithreading or from within another threadsafe function. But it seems to affect execution even when the function is only called in the main thread.
If you're having trouble observing this, I should concede that I only checked with one function, so it's not a comprehensive study, although I did check a few combinations of calculation sizes, including small and large numbers of points in the implicit loop caused by using a wave assignment statement. You may notice that the function I based this on is a direct cut and paste from Peak Functions.ipf, written to go with the Multipeak Fitting package. I started looking at this because I was actually hoping to be able to call the XOP peak functions from MultiPeak Fit with the "multithread" statement. I use them in the HITRAN procedures package, often to calculate a simulated spectrum of thousands of peaks at thousands of output wavelengths. As you'll see from the benchmark (and no real surprise), worrying about a few extra processors on the user-defined function is pointless if one has an equivalent XOP function to call. However, I'm assuming the XOP could similarly be sped up by multithreading the wave assignment statement if it were threadsafe, which it is not. I would assume MultiPeak fitting would similarly gain from making the peak functions threadsafe and multithreading the fit, but only if FuncFit can be multithreaded, and I'm not sure if or how one would do that. Alternatively, if threadsafe were just a compiler directive, then I could whine about why the XOP versions of the peak functions aren't already marked threadsafe. But if it's going to add 20% to the calculation time and not help speed up MultiPeak Fit, then I can see why the XOP peak functions are not marked as threadsafe even though they could be.
Function fLorentzianFit(w,x)
	Wave w; Variable x
	Variable r= w[0]
	variable npts= numpnts(w),i=1
	do
		if( i>=npts )
			break
		endif
		r += w[i]/((x-w[i+1])^2+w[i+2])
		i+=3
	while(1)
	return r
End

Function BenchMarkWrapper(destsize, numpeaks)
	variable destsize, numpeaks
	variable functime, tsfunctime, mtfunctime, xoptime
	BenchMarkit(destsize, numpeaks, functime, tsfunctime, mtfunctime, xoptime)
	print "basic function completed in ", functime, "s"
	print "Thread Safe function completed in ", tsfunctime, "s"
	print "Multithreaded function completed in ", mtfunctime, "s"
	print "XOP function completed in ", xoptime, "s"
End
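For readers skimming the code, here is a sketch of what the loop in fLorentzianFit appears to compute, read directly off the do-loop. It assumes the coefficient wave w holds 3N + 1 entries: one offset plus three values per peak. The symbols N (number of peaks) and k (peak index) are my own labels and do not appear in the original procedure.

```latex
f(x) = w_0 + \sum_{k=0}^{N-1} \frac{w_{3k+1}}{\left(x - w_{3k+2}\right)^2 + w_{3k+3}}
```

Under that reading, w_{3k+1} scales each peak, w_{3k+2} is its center, and w_{3k+3} sets its width. Each point of the destination wave costs one function call that loops over all N peaks.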
From the command line, the results I get (on a 2-processor netbook running Windows 7) are:
•BenchMarkWrapper(100000,10)
basic function completed in 1.21606s
Thread Safe function completed in 1.50094s
Multithreaded function completed in 0.791856s
XOP function completed in 0.0787004s
•BenchMarkWrapper(10,100000)
basic function completed in 1.13298s
Thread Safe function completed in 1.39786s
Multithreaded function completed in 0.771549s
XOP function completed in 0.0189796s
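As a quick check on the "about 20%" figure, the ratios can be worked out from the timings just listed. The labels t_basic, t_ts, t_mt and t_xop are only shorthand for the four printed times, and the results are rounded to three significant figures. The first line of each pair is the (100000,10) run and the second is the (10,100000) run.

```latex
\begin{aligned}
\frac{t_{\mathrm{ts}}}{t_{\mathrm{basic}}} &= \frac{1.50094}{1.21606} \approx 1.23, &
\frac{t_{\mathrm{ts}}}{t_{\mathrm{basic}}} &= \frac{1.39786}{1.13298} \approx 1.23 \\
\frac{t_{\mathrm{basic}}}{t_{\mathrm{mt}}} &= \frac{1.21606}{0.791856} \approx 1.54, &
\frac{t_{\mathrm{basic}}}{t_{\mathrm{mt}}} &= \frac{1.13298}{0.771549} \approx 1.47 \\
\frac{t_{\mathrm{basic}}}{t_{\mathrm{xop}}} &= \frac{1.21606}{0.0787004} \approx 15.5, &
\frac{t_{\mathrm{basic}}}{t_{\mathrm{xop}}} &= \frac{1.13298}{0.0189796} \approx 59.7
\end{aligned}
```

So on this machine the threadsafe version is roughly 23% slower than the plain one, multithreading on two processors recovers about a factor of 1.5 over the plain function, and the XOP is faster by a factor of roughly 15 to 60.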
More of an aside: it's nearly always a _lot_ faster to write an all-at-once fitting function. You remove the overhead of calling the function many, many times, because it is called just once for the whole wave. See the following (not tested for correctness).
threadsafe Function ARJN_TS_fLorentzianFit(w, yy, xx): fitfunc
	Wave w, yy, xx
	multithread yy = w[0]
	variable nreps= (numpnts(w) - 1)/3
	variable ii
	for(ii = 0 ; ii < nreps ; ii += 1)
		multithread yy[] += w[3 * ii + 1]/((xx[p] - w[3 * ii + 2])^2+w[3 * ii + 3])
	endfor
End
Thanks for the code, ikonen. On my machine the relative slowdown is also about 20%.
•BenchMarkWrapper(100000,10)
basic function completed in 0.280375 s
Thread Safe function completed in 0.340006 s
Multithreaded function completed in 0.0754768 s
XOP function completed in 0.0195716 s
•BenchMarkWrapper(10,100000)
basic function completed in 0.251737 s
Thread Safe function completed in 0.298765 s
Multithreaded function completed in 0.0569062 s
XOP function completed in 0.00619713 s
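The same arithmetic for this second machine, again using t_basic, t_ts and t_mt purely as shorthand for the printed times and rounding to three significant figures. The first line of each pair is the (100000,10) run and the second is the (10,100000) run.

```latex
\begin{aligned}
\frac{t_{\mathrm{ts}}}{t_{\mathrm{basic}}} &= \frac{0.340006}{0.280375} \approx 1.21, &
\frac{t_{\mathrm{ts}}}{t_{\mathrm{basic}}} &= \frac{0.298765}{0.251737} \approx 1.19 \\
\frac{t_{\mathrm{basic}}}{t_{\mathrm{mt}}} &= \frac{0.280375}{0.0754768} \approx 3.71, &
\frac{t_{\mathrm{basic}}}{t_{\mathrm{mt}}} &= \frac{0.251737}{0.0569062} \approx 4.42
\end{aligned}
```

That is a threadsafe overhead of roughly 19 to 21%, with a multithreaded speedup of about 3.7 to 4.4 over the plain function. The number of cores on this machine is not stated in the thread.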
I also thought that the "threadsafe" keyword was just a hint for the compiler.
August 27, 2013 at 04:18 am - Permalink
Threadsafe Function TS_fLorentzianFit(w,x)
Wave w; Variable x
Variable r= w[0]
variable npts= numpnts(w),i=1
do
if( i>=npts )
break
endif
r += w[i]/((x-w[i+1])^2+w[i+2])
i+=3
while(1)
return r
End
Function BenchMarkit(destsize, numpeaks, functime, tsfunctime, mtfunctime, xoptime)
variable destsize, numpeaks
variable &functime, &tsfunctime, &mtfunctime, &xoptime
make /free /d /n = (destsize) outputwave
make /free /d /n = (3*numpeaks + 1) coefs
coefs[0] = 0
coefs[1,3*numpeaks;3] = enoise(1)
coefs[2,3*numpeaks;3] = enoise(destsize / 2) + destsize / 2
coefs[3,3*numpeaks;3] = exp(gnoise(1) + 1)
variable timerref = StartMSTimer
outputwave = fLorentzianFit(coefs,x)
functime = StopMSTimer(timerref)* 1e-6
timerref = StartMSTimer
outputwave = TS_fLorentzianFit(coefs,x)
tsfunctime = StopMSTimer(timerref)* 1e-6
timerref = StartMSTimer
multithread outputwave = TS_fLorentzianFit(coefs,x)
mtfunctime = StopMSTimer(timerref)* 1e-6
timerref = StartMSTimer
outputwave = LorentzianFit(coefs,x)
xoptime = StopMSTimer(timerref) * 1e-6
End
August 27, 2013 at 08:23 pm - Permalink
August 28, 2013 at 05:30 pm - Permalink
August 28, 2013 at 12:24 pm - Permalink
August 28, 2013 at 05:42 pm - Permalink
August 28, 2013 at 07:43 pm - Permalink