Multi Threading M1 (Igor 8/9 using Rosetta)
ggermer
Does anyone have experience with multithreading on an M1? If I understood correctly, the CPU cores have different clock rates (some are slower, others faster).
Do programs like Igor have access to all cores, or are the slow ones reserved for macOS background processes?
(If Igor can use all cores:) Does using fast and slow cores together lead to significant time differences? If so, can Igor read the clock speed for each thread to correct for this?
Does anyone have access to an M1 to run some tests?
September 1, 2021 at 05:55 am - Permalink
I have a baseline M1 Mac mini available, with the IP9 release version installed. How do I test this?
September 1, 2021 at 06:19 am - Permalink
I would say run some calculations with Multithread/NT=(x), measure the time, and then plot the time as a function of x. There should be a dip when going from the fast cores to the slow cores.
September 1, 2021 at 06:28 am - Permalink
The number of available CPUs should be displayed with:
print ThreadProcessorCount
For the other things, I'm not sure how to easily figure that out. I don't know of any command that gives you the clock speed per processor in Igor Pro.
And unfortunately I don't have an M1 available to try other things out.
Edit: Dividing the work among several CPUs does not give a linear speedup. But I would also assume that there will be differences depending on the clock rate of each core.
September 1, 2021 at 06:28 am - Permalink
The macOS implementation for Igor's ThreadProcessorCount function is essentially this:
return sysconf( _SC_NPROCESSORS_ONLN );
We don't have any M1 machines so I can't tell you what that returns.
There is also no way to read the clock speed of a thread or investigate any other properties of a thread.
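For anyone who wants to check the value outside Igor, here is a minimal C sketch of the same query. The helper names (online_cpu_count, configured_cpu_count, print_cpu_counts) are made up for illustration and are not part of Igor. Note that sysconf reports a single total and says nothing about which cores are fast or slow.

```c
#include <stdio.h>
#include <unistd.h>

/* Number of logical processors currently online, as reported by sysconf().
   Returns 1 if the query fails (sysconf returns -1 on error). */
long online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : n;
}

/* Number of logical processors configured on the system.
   This can be larger than the online count if some are disabled. */
long configured_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return (n < 1) ? 1 : n;
}

/* Hypothetical helper: print both counts on one line. */
void print_cpu_counts(void)
{
    printf("online: %ld, configured: %ld\n",
           online_cpu_count(), configured_cpu_count());
}
```

Calling print_cpu_counts() from a small main() and comparing the online count with what ThreadProcessorCount prints in Igor on the same machine would show whether the two agree.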
September 1, 2021 at 07:09 am - Permalink
I found the "MultithreadMandelbrot.pxp" demo from WM, which I could easily modify to run the same calculation with different numbers of cores. Looks like this will need some more study by someone smarter...
I did NOT use the MandelbrotPoint function (it was too fast) and only varied the number of cores, changing the zoom between 6 and 6.5 in the demo, so one time is for zooming up and the other for zooming down, over the same image area. It looks like an odd number of cores results in longer calculation times. And yes, it seems reproducible: there is some noise from test to test, but the general differences are real.
1 core - 0.90 / 0.70 sec
2 cores - 0.57 / 0.44 sec
3 cores - 0.64 / 0.45 sec
4 cores - 0.38 / 0.30 sec
5 cores - 0.50 / 0.35 sec
6 cores - 0.33 / 0.25 sec
7 cores - 0.40 / 0.28 sec
8 cores - 0.31 / 0.23 sec
September 1, 2021 at 07:31 am - Permalink
Something like the following maybe?
ThreadSafe Function DoWork()
	variable ref
	Make/N=(1e6)/D/FREE data
	ref = stopmstimer(-2) // timestamp in microseconds
	data[] = p * sin(p) * cos(p) * exp(p)
	return (stopmstimer(-2) - ref)/1e6 // elapsed time in seconds
End
Function RunMe()
	variable i
	variable numRuns = ThreadProcessorCount
	Make/O/N=(numRuns) totals = NaN
	for(i = 1; i < numRuns; i += 1)
		Make/FREE/N=(i) result
		Multithread/NT=(i) result = DoWork()
		// print result
		totals[i] = Mean(result)
		printf "Average: %#.010g [s]\r", totals[i]
	endfor
	Display totals
	ModifyGraph mode=4
End
September 1, 2021 at 07:59 am - Permalink
Sure, here are the results:
Average: 1.408926606 [s]
Average: 1.241815805 [s]
Average: 1.284322858 [s]
Average: 1.288824320 [s]
Average: 1.446949244 [s]
Average: 1.583647490 [s]
Average: 1.691548944 [s]
September 1, 2021 at 08:48 am - Permalink
Here are the results for up to 8 cores:
Average: 1.232660770 [s]
Average: 1.276729107 [s]
Average: 1.283119798 [s]
Average: 1.442630410 [s]
Average: 1.572660923 [s]
Average: 1.688646317 [s]
Average: 1.808453679 [s]
Note: not all cores seem to report 100% utilization in this procedure, even though I increased the number of points in DoWork to 1e7 to make it run longer. Four cores show 100% utilization and four do not. I am not sure which ones, as iStat Menus does not show me the core type.
September 1, 2021 at 08:51 am - Permalink
With 1e7 points I get (IP8 on Windows):
Average: 2.364289761 [s]
Average: 2.256549358 [s]
Average: 2.277353048 [s]
Average: 2.321866274 [s]
Average: 2.450463057 [s]
Average: 2.531247616 [s]
Average: 2.648287058 [s]
Average: 2.708741426 [s]
Average: 2.847256899 [s]
Average: 2.920223236 [s]
It could be that averaging over the i threads in each run already distorts the results.
September 1, 2021 at 09:15 am - Permalink
Thanks everyone, this looks very interesting. Effectively, the performance doesn't seem to differ much between the two types of core.
Given the different clock speeds (the internet says 3.2 GHz and 2 GHz), that is surprising. Maybe Igor can't access the full performance through Rosetta? Who knows.
September 1, 2021 at 02:21 pm - Permalink
Or my test routine is still not working as it should.
September 2, 2021 at 03:13 am - Permalink
@thomas_braun - Is the test routine only ever testing up to n-1 cores?
September 2, 2021 at 03:36 am - Permalink
ThreadSafe Function DoWork()
	variable ref
	Make/N=(1e7)/D/FREE data
	ref = stopmstimer(-2) // timestamp in microseconds
	data[] = p * sin(p) * cos(p) * exp(p)
	return (stopmstimer(-2) - ref)/1e6 // elapsed time in seconds
End
Function RunMe()
	variable i
	variable numRuns = ThreadProcessorCount
	Make/O/N=(numRuns+1) totals = NaN
	for(i = 1; i < (numRuns+1); i += 1)
		Make/FREE/N=(i) result
		Multithread/NT=(i) result = DoWork()
		// print result
		totals[i] = Mean(result)
		printf "Average: %#.010g [s]\r", totals[i]
	endfor
	Display totals
	ModifyGraph mode=4
End
It just needed a few changes; this is how it should work.
However, it seems that you cannot read out the speed of the M1's cores at all. So, unfortunately, these differences cannot be compensated for by distributing the tasks asymmetrically.
Edit: I found someone who ran a longer multi-core computation (many user-defined FuncFits) for me on an M1 MacBook Air as a test. All eight cores finished the calculation (around 8 minutes) simultaneously (± 5 seconds). Maybe the M1 redistributes the assigned tasks itself? Unfortunately, Apple does not seem to expose much information about this.
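As a side note, an uneven split can also be had without knowing any clock speeds: hand out the work in small chunks and let each thread claim the next chunk when it is free, so faster threads simply end up taking more chunks. The following is only a generic C sketch of that idea (pthreads plus C11 atomics), not Igor code and not a statement about what Igor or macOS does internally. All names (work_queue, run_dynamic, process_item) are hypothetical, and process_item is a placeholder for the real per-item work.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

typedef struct {
    atomic_size_t next;   /* start index of the next unclaimed chunk */
    size_t total;         /* total number of items */
    size_t chunk;         /* items handed out per claim */
    double *out;          /* one result slot per item */
} work_queue;

typedef struct {
    work_queue *q;
    size_t items_done;    /* how many items this thread processed */
} worker_arg;

/* Placeholder for the real work on item i (assumption for illustration). */
static double process_item(size_t i) { return (double)i * 0.5; }

static void *worker(void *p)
{
    worker_arg *a = p;
    work_queue *q = a->q;
    for (;;) {
        /* Atomically claim the next chunk; each chunk goes to exactly one thread. */
        size_t start = atomic_fetch_add(&q->next, q->chunk);
        if (start >= q->total) break;
        size_t end = start + q->chunk;
        if (end > q->total) end = q->total;
        for (size_t i = start; i < end; i++) q->out[i] = process_item(i);
        a->items_done += end - start;
    }
    return NULL;
}

/* Processes `total` items with up to `nthreads` threads.
   done[t] receives the number of items thread t handled.
   Returns 0 on success, -1 on bad arguments or if no thread could start. */
int run_dynamic(double *out, size_t total, size_t chunk, int nthreads, size_t *done)
{
    if (nthreads < 1 || nthreads > MAX_THREADS || chunk == 0) return -1;
    work_queue q;
    atomic_init(&q.next, 0);
    q.total = total; q.chunk = chunk; q.out = out;

    pthread_t tid[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        args[t].q = &q;
        args[t].items_done = 0;
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    for (int t = 0; t < nthreads; t++) done[t] = (t < started) ? args[t].items_done : 0;
    return (started > 0 || total == 0) ? 0 : -1;
}
```

After a run, the per-thread counts in done show how unevenly the work ended up being split. The chunk size is a trade-off: smaller chunks balance better but add more synchronization overhead.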
September 6, 2021 at 04:07 am - Permalink
Dear all,
I have a multiprocessing calculation that runs between 45 min and 3 h, depending on the amount of data. Using Igor Pro 8, I ran the experiment on the same data on a trash-can Mac Pro and a MacBook Air. The MacBook Air was about 1.5-2 times faster.
Because the calculations run so long, the old trash-can Mac Pro has to do the job. Nevertheless, even during the long run there was no heat issue; nearly all CPUs ran at maximum output (no throttling).
best regards
Stefan
September 23, 2021 at 12:17 pm - Permalink
To clarify: the MacBook Air has an M1, 16 GB RAM, and a 1 TB SSD.
September 24, 2021 at 01:53 am - Permalink