Threadsafe overhead

Could you post a small example, so that I can try it out myself?
I also thought that the "threadsafe" keyword is just a hint for the compiler.


August 27, 2013 at 04:18 am - Permalink

ikonen

Sure:

If you're having trouble observing this, I should concede that I only checked one function, so it's not a comprehensive study. I did check a few combinations of calculation sizes, including small and large numbers of points in the implicit loop created by the wave assignment statement. You may notice that the function I based this on is a direct cut and paste from Peak Functions.ipf, written to go with the Multipeak Fitting package.

I started looking at this because I was hoping to call the XOP peak functions from MultiPeak Fit with the "multithread" statement. I use them in the HITRAN procedures package, often to calculate a simulated spectrum of thousands of peaks at thousands of output wavelengths. As you'll see from the benchmark (and it's no real surprise), worrying about a few extra processors for the user-defined function is pointless if one has an equivalent XOP function to call. However, I'm assuming the XOP could similarly be sped up by multithreading the wave assignment statement if it were threadsafe, which it is not.

I would assume MultiPeak fitting would similarly gain from making the peak functions threadsafe and multithreading the fit, but only if FuncFit can be multithreaded, and I'm not sure whether or how one would do that. Alternatively, if threadsafe were just a compiler directive, then I could whine about why the XOP peak functions aren't already marked threadsafe. But if it's going to add 20% to the calculation time and not help speed up MultiPeak Fit, then I can see why the XOP peak functions are not marked threadsafe even though they could be.

Function fLorentzianFit(w,x)
    Wave w; Variable x
    
    Variable r= w[0]
    variable npts= numpnts(w),i=1
    do
        if( i>=npts )
            break
        endif
        r += w[i]/((x-w[i+1])^2+w[i+2])
        i+=3
    while(1)
    return r
End
 
Threadsafe Function  TS_fLorentzianFit(w,x)
    Wave w; Variable x
    
    Variable r= w[0]
    variable npts= numpnts(w),i=1
    do
        if( i>=npts )
            break
        endif
        r += w[i]/((x-w[i+1])^2+w[i+2])
        i+=3
    while(1)
    return r
End
 
Function BenchMarkit(destsize, numpeaks, functime, tsfunctime, mtfunctime, xoptime)
    variable destsize, numpeaks
    variable &functime, &tsfunctime, &mtfunctime, &xoptime
    make /free /d /n = (destsize) outputwave
    make /free /d /n = (3*numpeaks + 1) coefs
    coefs[0] = 0
    coefs[1,3*numpeaks;3] = enoise(1)
    coefs[2,3*numpeaks;3] =  enoise(destsize / 2) + destsize / 2
    coefs[3,3*numpeaks;3] = exp(gnoise(1) + 1)
    variable timerref = StartMSTimer
    outputwave = fLorentzianFit(coefs,x)
    functime = StopMSTimer(timerref)* 1e-6
    timerref = StartMSTimer
    outputwave = TS_fLorentzianFit(coefs,x)
    tsfunctime = StopMSTimer(timerref)* 1e-6
    timerref = StartMSTimer
    multithread outputwave = TS_fLorentzianFit(coefs,x)
    mtfunctime = StopMSTimer(timerref)* 1e-6
    timerref = StartMSTimer
    outputwave = LorentzianFit(coefs,x)
    xoptime = StopMSTimer(timerref) * 1e-6
End
 
Function BenchMarkWrapper(destsize, numpeaks)
    variable destsize, numpeaks
    variable functime, tsfunctime, mtfunctime, xoptime
    BenchMarkit(destsize, numpeaks, functime, tsfunctime, mtfunctime, xoptime)
    print "basic function completed in ", functime, "s"
    print "Thread Safe function completed in ", tsfunctime, "s"
    print "Multithreaded function completed in ", mtfunctime, "s"
    print "XOP function completed in ", xoptime, "s"
 
End

From the command line, the results I get (on a two-processor netbook running Windows 7) are:

•BenchMarkWrapper(100000,10)
  basic function completed in   1.21606  s
  Thread Safe function completed in   1.50094  s
  Multithreaded function completed in   0.791856  s
  XOP function completed in   0.0787004  s
•BenchMarkWrapper(10,100000)
  basic function completed in   1.13298  s
  Thread Safe function completed in   1.39786  s
  Multithreaded function completed in   0.771549  s
  XOP function completed in   0.0189796  s
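
The relative cost of the keyword can be read off these timings with a small helper. This is only a sketch: PrintOverhead is a hypothetical name and is not part of the benchmark above.

```igor
// Hypothetical helper (not part of the original benchmark): reports the
// relative cost of the Threadsafe keyword and the gain from MultiThread.
Function PrintOverhead(functime, tsfunctime, mtfunctime)
    Variable functime, tsfunctime, mtfunctime

    // Extra time of the threadsafe version, as a percentage of the plain version
    Variable overheadPct = 100 * (tsfunctime / functime - 1)
    // How many times faster the multithreaded assignment is than the plain one
    Variable speedup = functime / mtfunctime

    Print "Threadsafe overhead (%):", overheadPct
    Print "MultiThread speedup (x):", speedup
End
```

Called as PrintOverhead(1.21606, 1.50094, 0.791856) with the first set of timings, this works out to an overhead of roughly 23% and a speedup of roughly 1.5x.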


August 27, 2013 at 08:23 pm - Permalink

andyfaff

More of an aside: it's nearly always a _lot_ faster to write an all-at-once fitting function. The function is called once for the whole wave instead of once per point, so you remove the overhead of calling it many, many times. See the following (not tested for correctness).

threadsafe Function  ARJN_TS_fLorentzianFit(w, yy, xx): fitfunc
    Wave w, yy, xx

    multithread yy = w[0]

    variable nreps= (numpnts(w) - 1) / 3
    variable ii
    for(ii = 0 ; ii < nreps ; ii += 1)
        multithread yy[] += w[3 * ii + 1]/((xx[p] - w[3 * ii + 2])^2+w[3 * ii + 3])
    endfor
End

//edited for correctness
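
To compare the all-at-once version against the timings earlier in the thread, one could time a single direct call in the same way. This is a sketch that has not been run: BenchMarkAllAtOnce is a hypothetical name, the coefficient setup is copied from BenchMarkit, and it assumes the function above compiles as posted.

```igor
// Sketch only, not run: times one direct call of the all-at-once function.
// BenchMarkAllAtOnce is a hypothetical name, not part of the thread's code.
Function BenchMarkAllAtOnce(destsize, numpeaks)
    Variable destsize, numpeaks

    Make /FREE /D /N=(destsize) yy, xx
    xx = p    // same x values the wave assignment sees with default wave scaling

    // Same random coefficients as in BenchMarkit
    Make /FREE /D /N=(3*numpeaks + 1) coefs
    coefs[0] = 0
    coefs[1,3*numpeaks;3] = enoise(1)
    coefs[2,3*numpeaks;3] = enoise(destsize / 2) + destsize / 2
    coefs[3,3*numpeaks;3] = exp(gnoise(1) + 1)

    Variable timerref = StartMSTimer
    ARJN_TS_fLorentzianFit(coefs, yy, xx)    // one call fills the whole wave
    Variable elapsed = StopMSTimer(timerref) * 1e-6
    Print "All-at-once function completed in ", elapsed, "s"
End
```

Running BenchMarkAllAtOnce(100000,10) and BenchMarkAllAtOnce(10,100000) would give numbers directly comparable to the BenchMarkWrapper output.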


August 28, 2013 at 05:30 pm - Permalink

thomas_braun

Thanks for the code, ikonen. On my machine the relative slowdown is also about 20%.


•BenchMarkWrapper(100000,10)
  basic function completed in   0.280375  s
  Thread Safe function completed in   0.340006  s
  Multithreaded function completed in   0.0754768  s
  XOP function completed in   0.0195716  s
•BenchMarkWrapper(10,100000)
  basic function completed in   0.251737  s
  Thread Safe function completed in   0.298765  s
  Multithreaded function completed in   0.0569062  s
  XOP function completed in   0.00619713  s


August 28, 2013 at 12:24 pm - Permalink

andyfaff


August 28, 2013 at 05:42 pm - Permalink

Larry Hutchinson

Threadsafe functions carry extra overhead, and calling a threadsafe function from a non-threadsafe one adds even more.
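
One way to see the two costs separately would be to move the wave assignment into a threadsafe wrapper, so that the per-point calls are made from threadsafe code. The following is a sketch that has not been run: TS_FillLorentzian and BenchMarkTSWrapper are hypothetical names, and TS_fLorentzianFit is the function posted earlier in the thread.

```igor
// Sketch only, not run. TS_FillLorentzian is a hypothetical wrapper: the wave
// assignment executes inside a threadsafe function, so every call to
// TS_fLorentzianFit is threadsafe-to-threadsafe, and non-threadsafe code is
// left only once, when the wrapper itself is called.
Threadsafe Function TS_FillLorentzian(outputwave, coefs)
    Wave outputwave, coefs

    outputwave = TS_fLorentzianFit(coefs, x)
End

Function BenchMarkTSWrapper(destsize, numpeaks)
    Variable destsize, numpeaks

    Make /FREE /D /N=(destsize) outputwave
    Make /FREE /D /N=(3*numpeaks + 1) coefs
    coefs[0] = 0
    coefs[1,3*numpeaks;3] = enoise(1)
    coefs[2,3*numpeaks;3] = enoise(destsize / 2) + destsize / 2
    coefs[3,3*numpeaks;3] = exp(gnoise(1) + 1)

    // Threadsafe function called once per point from non-threadsafe code
    Variable timerref = StartMSTimer
    outputwave = TS_fLorentzianFit(coefs, x)
    Variable directtime = StopMSTimer(timerref) * 1e-6

    // Same assignment, but performed inside the threadsafe wrapper
    timerref = StartMSTimer
    TS_FillLorentzian(outputwave, coefs)
    Variable wrappedtime = StopMSTimer(timerref) * 1e-6

    Print "Threadsafe function called from non-threadsafe code: ", directtime, "s"
    Print "Threadsafe function called from threadsafe wrapper: ", wrappedtime, "s"
End
```

If the wrapped time comes out lower than the direct time, the difference would be the extra cost of crossing from non-threadsafe into threadsafe code on every point.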


August 28, 2013 at 07:43 pm - Permalink