author | Bernhard Urban-Forster <lewurm@gmail.com> | 2019-11-04 18:53:12 +0300 |
---|---|---|
committer | Alexander Köplinger <alex.koeplinger@outlook.com> | 2019-11-04 18:53:12 +0300 |
commit | c44efe7297f8e11cab157c7e96b998dc936d53b7 (patch) | |
tree | 5817d7dc252547d9a9b3eaadac0a5ea96f9733b3 /src | |
parent | 6e65509a17da898933705899677c22eae437d68a (diff) |
[mono] limit DegreeOfParallelism to 16 (#369)
We started to see the `System.Core-xunit` step on CI hit its 15-minute timeout on Linux/ARM64. That was surprising, because the step used to complete in around two minutes. I wasn't able to reproduce it on my local device (a Jetson board) either; it took around 100s there. We then realized the problem is specific to the new `taishan` CI machines, which are equipped with 64 cores. Hardcoding `mono_cpu_count` to return 16 restored the performance, but that isn't a viable fix.
Limiting `DefaultDegreeOfParallelism` to 16 also fixes it and is less extreme than limiting `mono_cpu_count ()`, though it is still not ideal. The underlying problem seems to be that our non-netcore threadpool implementation doesn't handle a large number of cores well.
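To make the effect of the cap concrete, here is a minimal sketch of how a default degree of parallelism can be derived from the core count and an upper bound. The names `DopSketch` and `ComputeDefaultDop` are hypothetical and exist only for illustration; the real computation lives in PLINQ's `Scheduling.cs` and may differ in detail.

```csharp
using System;

public static class DopSketch {
    // Hypothetical helper (not part of PLINQ): clamp the core count
    // to an upper bound on the degree of parallelism.
    public static int ComputeDefaultDop (int processorCount, int maxSupportedDop) {
        return Math.Min (processorCount, maxSupportedDop);
    }

    public static void Main () {
        // 64 cores, as on the taishan machines, with the cap from this change.
        Console.WriteLine (ComputeDefaultDop (64, 16));
        // The same machine with the previous cap of 512.
        Console.WriteLine (ComputeDefaultDop (64, 512));
    }
}
```

Under these assumptions, a 64-core machine goes from 64 partitions per query to 16, while machines with 16 or fewer cores are unaffected.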
`repro.cs`, extracted from https://github.com/dotnet/corefx/blob/a9b91e205a8794327a028cb4b29953127f0f194c/src/System.Linq.Parallel/tests/QueryOperators/ConcatTests.cs#L145-L154:
```csharp
using System;
using System.Linq;
using System.Collections.Generic;
using System.Threading;

public class Repro {
    public static void Main (string []args) {
        const int ElementCount = 2048;
        // Two parallel queries, each the union of two adjacent quarter ranges.
        ParallelQuery<int> leftQuery = ParallelEnumerable.Range(0, ElementCount / 4).Union(ParallelEnumerable.Range(ElementCount / 4, ElementCount / 4));
        ParallelQuery<int> rightQuery = ParallelEnumerable.Range(2 * ElementCount / 4, ElementCount / 4).Union(ParallelEnumerable.Range(3 * ElementCount / 4, ElementCount / 4));
        // Concatenating both halves should yield every value in [0, ElementCount).
        var results = new HashSet<int>(leftQuery.Concat(rightQuery));
        Console.WriteLine ("results.Count=" + results.Count + ", ElementCount=" + ElementCount);
    }
}
```
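For comparison, the same query can request a bound explicitly through the public `WithDegreeOfParallelism` operator instead of relying on the default. This is only a sketch: the class name `ReproWithExplicitDop` and the `Run` helper are hypothetical, and whether an explicit bound reproduces the timings reported in this commit on an unpatched runtime has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReproWithExplicitDop {
    // Hypothetical helper: runs the repro query with an explicit degree of parallelism.
    public static HashSet<int> Run (int elementCount, int dop) {
        ParallelQuery<int> leftQuery = ParallelEnumerable.Range (0, elementCount / 4)
            .Union (ParallelEnumerable.Range (elementCount / 4, elementCount / 4));
        ParallelQuery<int> rightQuery = ParallelEnumerable.Range (2 * elementCount / 4, elementCount / 4)
            .Union (ParallelEnumerable.Range (3 * elementCount / 4, elementCount / 4));
        // The bound is set once, at the root of the query.
        return new HashSet<int> (leftQuery.Concat (rightQuery).WithDegreeOfParallelism (dop));
    }

    public static void Main (string[] args) {
        var results = Run (2048, 16);
        Console.WriteLine ("results.Count=" + results.Count);
    }
}
```

The bound is applied only once because PLINQ rejects queries that specify the degree of parallelism more than once.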
Before fix:
```console
$ time ./mono/mini/mono-sgen repro.exe
results.Count=2048, ElementCount=2048
real 0m5.846s
user 0m0.344s
sys 0m1.929s
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
net_4_x_System.Core_xunit-test Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 536.005s
```
With this fix:
```console
$ time ./mono/mini/mono-sgen repro.exe
results.Count=2048, ElementCount=2048
real 0m1.247s
user 0m0.206s
sys 0m0.225s
$ make -C mcs/class/System.Core run-xunit-test
[...]
=== TEST EXECUTION SUMMARY ===
net_4_x_System.Core_xunit-test Total: 48774, Errors: 0, Failed: 0, Skipped: 6, Time: 131.143s
```
Diffstat (limited to 'src')
-rw-r--r-- | src/System.Linq.Parallel/src/System/Linq/Parallel/Scheduling/Scheduling.cs | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/src/System.Linq.Parallel/src/System/Linq/Parallel/Scheduling/Scheduling.cs b/src/System.Linq.Parallel/src/System/Linq/Parallel/Scheduling/Scheduling.cs
index 945994e06e..4c9a89c1da 100644
--- a/src/System.Linq.Parallel/src/System/Linq/Parallel/Scheduling/Scheduling.cs
+++ b/src/System.Linq.Parallel/src/System/Linq/Parallel/Scheduling/Scheduling.cs
@@ -47,8 +47,13 @@ namespace System.Linq.Parallel
 // The number of milliseconds before we assume a producer has been zombied.
 internal const int ZOMBIED_PRODUCER_TIMEOUT = Timeout.Infinite;
+#if MONO
+/* limit to degree of 16 to avoid too much contention */
+internal const int MAX_SUPPORTED_DOP = 16;
+#else
 // The largest number of partitions that PLINQ supports.
 internal const int MAX_SUPPORTED_DOP = 512;
+#endif
 //-----------------------------------------------------------------------------------