HashSet Initialisation Speed in C#
Which one is faster?

Option 1 — pass the data to the constructor:

```csharp
var hs = new HashSet<int>(_data);
```

Option 2 — create an empty set and add elements one by one:

```csharp
var hs = new HashSet<int>();
foreach (int i in _data) {
    hs.Add(i);
}
```

Option 3 — create a set sized to the data length, then add elements one by one:

```csharp
var hs = new HashSet<int>(_data.Length);
foreach (int i in _data) {
    hs.Add(i);
}
```
My first thought was that option 1 is definitely the fastest: I'm passing the entire dataset into the constructor, so .NET should be smart enough to size the set up front. Testing it:
| Method | Length | Mean | Error | StdDev | Min | Max | Median | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WithConstructor | 1000000 | 52.07 ms | 44.886 ms | 2.460 ms | 49.47 ms | 54.37 ms | 52.36 ms | 454.5455 | 454.5455 | 454.5455 | 17.74 MB |
| WithAddNoLength | 1000000 | 53.45 ms | 5.679 ms | 0.311 ms | 53.10 ms | 53.70 ms | 53.54 ms | 900.0000 | 900.0000 | 900.0000 | 41.12 MB |
| WithAddWithLength | 1000000 | 37.92 ms | 15.129 ms | 0.829 ms | 37.01 ms | 38.64 ms | 38.10 ms | 500.0000 | 500.0000 | 500.0000 | 17.74 MB |
So option 3 is the fastest! Passing the data length into the constructor ensures the backing storage is allocated once, with no reallocation as the set grows, and adding the elements one by one then fills it up quickly. The allocation column tells the same story: options 1 and 3 both allocate 17.74 MB, while option 2 allocates 41.12 MB, presumably because the unsized set has to resize repeatedly as it grows.
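To see those reallocations directly, here is a small sketch (mine, not part of the benchmark) that watches the backing storage grow. It relies on `HashSet<T>.EnsureCapacity` (available since .NET Core 2.1): calling it with `0` returns the current capacity without changing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CapacityDemo
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 1_000).ToArray();

        // Unsized set: the backing storage is reallocated every time
        // the set outgrows its current capacity.
        var unsized = new HashSet<int>();
        int lastCapacity = -1;
        foreach (int i in data)
        {
            unsized.Add(i);
            int capacity = unsized.EnsureCapacity(0); // current capacity, unchanged
            if (capacity != lastCapacity)
            {
                Console.WriteLine($"resized to capacity {capacity} at count {unsized.Count}");
                lastCapacity = capacity;
            }
        }

        // Presized set (option 3): one allocation up front, no resizes while filling.
        var presized = new HashSet<int>(data.Length);
        foreach (int i in data)
            presized.Add(i);
        Console.WriteLine($"presized capacity: {presized.EnsureCapacity(0)}");
    }
}
```

Running this prints one line per resize of the unsized set, and a single final capacity for the presized one.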
Benchmark Code
```csharp
#LINQPad optimize+

void Main()
{
    Util.AutoScrollResults = true;
    BenchmarkRunner.Run<Enumeration>();
}

[ShortRunJob]
[MinColumn, MaxColumn, MeanColumn, MedianColumn]
[MemoryDiagnoser]
[MarkdownExporter]
public class Enumeration
{
    [Params(1000000)]
    public int Length;

    private int[] _data;
    private static Random random = new Random();

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, Length).Select(i => random.Next()).ToArray();
    }

    [Benchmark]
    public void WithConstructor()
    {
        var hs = new HashSet<int>(_data);
    }

    [Benchmark]
    public void WithAddNoLength()
    {
        var hs = new HashSet<int>();
        foreach (int i in _data) {
            hs.Add(i);
        }
    }

    [Benchmark]
    public void WithAddWithLength()
    {
        var hs = new HashSet<int>(_data.Length);
        foreach (int i in _data) {
            hs.Add(i);
        }
    }
}
```
To contact me, send an email anytime or leave a comment below.