Beginners multithreading in fit functions
nate_church
Apologies for what is probably an elementary question and for subjecting readers to what is probably offensively clumsy code. I have cobbled together a fit function to simultaneously fit surfaces to the 4 layers of 3D wave. These surfaces are defined by signal sources which have 6 parameters each and are a function of 2 independent variables, and the number of these sources can be defined by setting the length of the coef wave fed to the FitFuncMD function. All this to say that it would be nice to be able to use the multithreading abilities of FitFuncMD to speed things up, but I find that if I declare dipolexyz2() to be threadsafe, it runs quickly and without error but gives answers that are clearly different from when I run it as is.
Apart from overly verbose and possibly misguided coding, what am I misunderstanding about multithreading? Am I wrong in thinking that FitFuncMD should be able to use multiple processors if the called fit function uses thread safe operations/functions and doesn't change the waves that it's operating on (broadly speaking)? I don't see that I have broken those commandments. I do use a text wave for the fitting constraints, but it seemed that that was legal as long as the function that called FitFuncMD wasn't threaded. I must be misunderstanding something fundamental, but I don't know where to start looking.
What I believe is the offending code is as follows. There are two more functions that I don't reproduce here called in the primary fit function, but they are identical to dipolex_quad() except that the formula varies.
function dipolexyz2(w,xa,ya,za) : FitFunc
wave w
variable xa,ya,za
variable sourceno, result=0
for(sourceno=0; sourceno<numpnts(w)/6; sourceno+=1)
if (za==0)
result += dipolex_quad(w,xa,ya, sourceno)
endif
if (za==1)
result += dipoley_quad(w,xa,ya, sourceno)
endif
if (za==2)
result += dipolez_quad(w,xa,ya, sourceno)
endif
if (za==3)
result += sqrt((dipolex_quad(w,xa,ya, sourceno))^2+(dipoley_quad(w,xa,ya, sourceno))^2+(dipolez_quad(w,xa,ya, sourceno))^2)
endif
endfor
return result
end
function dipolex_quad(w,xa,ya, sourceno)
wave w
variable xa,ya, sourceno
variable mx=w[sourceno*6],my=w[sourceno*6+1],mz=w[sourceno*6+2],xb=w[sourceno*6+3],yb=w[sourceno*6+4],h=w[sourceno*6+5]
variable xij,yij,zij,rij
variable bx_quad, delx, dely, theta, i1, i2
nvar/z diam=root:diam, source_int=root:source_int, thick=root:thick, sensor=root:sensor
if (!nvar_exists(source_int))
variable/g root:diam=0, root:source_int=1, root:thick=0, root:sensor=0
endif
zij=h
bx_quad=0
switch(source_int)
case 1: // Dipole (point) source
xij=xa-xb
yij=ya-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
// Formula in following line (and equivalents below) will vary for dipoley_quad and dipolez_quad
bx_quad=3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5)
break
case 2: // 7-point quadrature on a disk with radius = diam/2
delx=0
dely=0
xij=(xa-delx)-xb
yij=(ya-dely)-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=0.25*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
for (theta=0; theta<6; theta+=1)
delx=cos(theta*pi/3)*sqrt(2/3)*(diam/2)
dely=sin(theta*pi/3)*sqrt(2/3)*(diam/2)
xij=(xa-delx)-xb
yij=(ya-dely)-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=(0.125)*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
endfor
break
case 3: // 9-point quadrature on a square of size diam or cube with height thick
case 4:
make/o/n=3 weights = {5/9,8/9,5/9}, delta = {-sqrt(3/5), 0, sqrt(3/5)}
weights/=2
delta*=(diam/2)
for (i1=0; i1<3; i1+=1) // Square plane or top layer of cube
for (i2=0; i2<3; i2+=1)
xij=(xa-delta[i1])-xb
yij=(ya-delta[i2])-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=weights[i1]*weights[i2]*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
endfor
endfor
endswitch
end
wave w
variable xa,ya,za
variable sourceno, result=0
for(sourceno=0; sourceno<numpnts(w)/6; sourceno+=1)
if (za==0)
result += dipolex_quad(w,xa,ya, sourceno)
endif
if (za==1)
result += dipoley_quad(w,xa,ya, sourceno)
endif
if (za==2)
result += dipolez_quad(w,xa,ya, sourceno)
endif
if (za==3)
result += sqrt((dipolex_quad(w,xa,ya, sourceno))^2+(dipoley_quad(w,xa,ya, sourceno))^2+(dipolez_quad(w,xa,ya, sourceno))^2)
endif
endfor
return result
end
function dipolex_quad(w,xa,ya, sourceno)
wave w
variable xa,ya, sourceno
variable mx=w[sourceno*6],my=w[sourceno*6+1],mz=w[sourceno*6+2],xb=w[sourceno*6+3],yb=w[sourceno*6+4],h=w[sourceno*6+5]
variable xij,yij,zij,rij
variable bx_quad, delx, dely, theta, i1, i2
nvar/z diam=root:diam, source_int=root:source_int, thick=root:thick, sensor=root:sensor
if (!nvar_exists(source_int))
variable/g root:diam=0, root:source_int=1, root:thick=0, root:sensor=0
endif
zij=h
bx_quad=0
switch(source_int)
case 1: // Dipole (point) source
xij=xa-xb
yij=ya-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
// Formula in following line (and equivalents below) will vary for dipoley_quad and dipolez_quad
bx_quad=3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5)
break
case 2: // 7-point quadrature on a disk with radius = diam/2
delx=0
dely=0
xij=(xa-delx)-xb
yij=(ya-dely)-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=0.25*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
for (theta=0; theta<6; theta+=1)
delx=cos(theta*pi/3)*sqrt(2/3)*(diam/2)
dely=sin(theta*pi/3)*sqrt(2/3)*(diam/2)
xij=(xa-delx)-xb
yij=(ya-dely)-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=(0.125)*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
endfor
break
case 3: // 9-point quadrature on a square of size diam or cube with height thick
case 4:
make/o/n=3 weights = {5/9,8/9,5/9}, delta = {-sqrt(3/5), 0, sqrt(3/5)}
weights/=2
delta*=(diam/2)
for (i1=0; i1<3; i1+=1) // Square plane or top layer of cube
for (i2=0; i2<3; i2+=1)
xij=(xa-delta[i1])-xb
yij=(ya-delta[i2])-yb
rij=sqrt(xij^2 + yij^2 + zij^2)
bx_quad+=weights[i1]*weights[i2]*(3*xij*xij*mx/(4*pi*rij^5) - mx/(4*pi*rij^3) + 3*xij*yij*my/(4*pi*rij^5) + 3*(xij*h)*mz/(4*pi*rij^5))
endfor
endfor
endswitch
end
Thanks for your help!
Nathan
So let get this straight- you have a 3-dimensional wave, which you feed to FuncFitMD. If you add the Threadsafe keyword to the fit function (and the sub-functions, since a threadsafe function can't call something that's not threadsafe) then you get a different answer. What happens if you mark the function Threadsafe, but you use /NTHR=1 with FuncFitMD? Does that change the result? It should work just like the non-threadsafe version.
I see references to global variables in your dipolex_quad() function. Do they get written to during the fit? If so, you should be aware that threaded code doesn't guarantee a particular order of running. Writing and reading global variables can be bad, because you don't know when one thread accesses them in relation to other threads. If you're only reading them, I think it should be OK. You might check to make sure you are getting the expected values out of the globals.
In general, you should avoid global variables. They produce dependencies in your code that can be difficult to track down. From the names I'm guessing that you're trying to fit antenna models to measurements, and the globals are feeding in information about the antenna configuration. That shouldn't change during the fit, so you should be OK, as long as you are getting the right values of the globals inside the functions in the first place.
Are the runs really different, or just a little bit different? Are you careful to use the same initial guesses for both cases?
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
November 15, 2012 at 03:03 pm - Permalink
That said, I do find that the multithreading fits are more stable when fitting multiple sets of parameters (sources). I consistently get singular matrix errors for at least some initial values using either /NTHR=1 or non multithread-enabled functions, while enabling multiple threads does converge to a reasonable answer. The starting conditions are always exactly the same as the initial values of the coef wave are copied from 2 other waves which are not subsequently changed. I don't know if the multithread fit is expected to follow different routes to the minima, but I can consistently reproduce it.
Thanks, too, for the advice on global variables. I suspect I'm not using best practice as far as interacting with the user but implemented them in the hope of preventing errors due to calling unset local variables. I shall keep your suggestions in mind in future.
Thanks again, and sorry for getting you to take the time on what was, in the end, muddle-headed thinking. If you want more details, however, about fits that work with multithread functions and not with single thread, I can provide them.
Nathan
November 18, 2012 at 07:42 am - Permalink
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
November 19, 2012 at 09:29 am - Permalink
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
November 20, 2012 at 11:10 am - Permalink