HashSet Initialisation Speed in C#

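Which of these three ways to build a HashSet<int> from an int[] is fastest?

Option 1: pass the whole array to the constructor

var hs = new HashSet<int>(_data);

Option 2: start with an empty set and add the elements one by one

var hs = new HashSet<int>();
foreach (int i in _data) {
    hs.Add(i);
}

Option 3: pre-size the set with the array length, then add the elements one by one

var hs = new HashSet<int>(_data.Length);
foreach (int i in _data) {
    hs.Add(i);
}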

My first thought was that option 1 would definitely be faster: I’m passing the entire dataset into the HashSet at once, so .NET should be able to size and fill it efficiently. Testing it with BenchmarkDotNet on 1,000,000 random ints:

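| Method            | Length  | Mean     | Error     | StdDev   | Min      | Max      | Median   | Gen0     | Gen1     | Gen2     | Allocated |
|-------------------|--------:|---------:|----------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|----------:|
| WithConstructor   | 1000000 | 52.07 ms | 44.886 ms | 2.460 ms | 49.47 ms | 54.37 ms | 52.36 ms | 454.5455 | 454.5455 | 454.5455 | 17.74 MB  |
| WithAddNoLength   | 1000000 | 53.45 ms | 5.679 ms  | 0.311 ms | 53.10 ms | 53.70 ms | 53.54 ms | 900.0000 | 900.0000 | 900.0000 | 41.12 MB  |
| WithAddWithLength | 1000000 | 37.92 ms | 15.129 ms | 0.829 ms | 37.01 ms | 38.64 ms | 38.10 ms | 500.0000 | 500.0000 | 500.0000 | 17.74 MB  |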

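Option 3 is the fastest! Passing the data length into the constructor pre-sizes the set, so it never has to reallocate its internal storage while it is being filled. That is why it allocates 17.74 MB, compared with 41.12 MB for option 2, which keeps growing. Option 1 allocates the same 17.74 MB, so the constructor evidently pre-sizes from the array as well, yet adding the elements one by one still filled the set faster in this run. Bear in mind that ShortRunJob measures only three iterations, and the Error for WithConstructor (44.886 ms) is large relative to its 52.07 ms mean, so a longer run would give a firmer comparison.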
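A back-of-the-envelope check of the 17.74 MB figure, assuming the current .NET layout in which each slot of a HashSet<int> costs one 4-byte bucket index plus a 12-byte entry (hash code, next index, value), the capacity is rounded up to the next prime in .NET's internal prime table (1,162,687 for 1,000,000 elements), and BenchmarkDotNet reports MB as 1024² bytes:

```latex
\underbrace{1\,162\,687}_{\text{slots}} \times \bigl(\underbrace{4}_{\text{bucket}} + \underbrace{3 \times 4}_{\text{entry}}\bigr)\,\text{B}
= 18\,602\,992\,\text{B}
= \frac{18\,602\,992}{1024^2}\,\text{MB} \approx 17.74\,\text{MB}
```

Under those assumptions the result matches both WithConstructor and WithAddWithLength, which is consistent with each of them allocating the storage exactly once; array headers and the HashSet object itself add only a few dozen bytes.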

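To see the reallocation difference directly, here is a minimal sketch, assuming .NET 5 or later with C# 9 top-level statements. It uses HashSet<T>.EnsureCapacity(0) as a probe, since with an argument of 0 it only reports the current capacity and never grows the set. CountResizes is a hypothetical helper written for this illustration, and the data is smaller than in the benchmark (100,000 seeded random ints) so it runs instantly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Smaller, seeded input than the benchmark so the sketch is quick and repeatable.
var random = new Random(42);
int[] data = Enumerable.Range(0, 100_000).Select(_ => random.Next()).ToArray();

// Option 1: the constructor receives the whole array up front.
var withConstructor = new HashSet<int>(data);

// Option 2: empty set that grows as elements are added.
int resizesNoLength = CountResizes(new HashSet<int>(), data);

// Option 3: set pre-sized with the array length.
var presized = new HashSet<int>(data.Length);
int initialPresizedCapacity = presized.EnsureCapacity(0);
int resizesWithLength = CountResizes(presized, data);

Console.WriteLine($"Option 2 capacity changes: {resizesNoLength}");
Console.WriteLine($"Option 3 capacity changes: {resizesWithLength}");
Console.WriteLine($"Option 1 capacity: {withConstructor.EnsureCapacity(0)}, option 3 initial capacity: {initialPresizedCapacity}");

// Hypothetical helper: adds every element and counts how often the capacity
// changes, including the very first allocation of an initially empty set.
static int CountResizes(HashSet<int> set, int[] items)
{
    int changes = 0;
    int capacity = set.EnsureCapacity(0); // 0 never forces growth, it just reports
    foreach (int i in items)
    {
        set.Add(i);
        int current = set.EnsureCapacity(0);
        if (current != capacity)
        {
            changes++;
            capacity = current;
        }
    }
    return changes;
}
```

Option 3 should report zero capacity changes and option 2 many, while option 1 ends up with a capacity at least as large as the array, which fits the identical allocations of options 1 and 3 in the table.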
Benchmark Code

#LINQPad optimize+

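// Requires the BenchmarkDotNet NuGet package.
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

void Main()
{
    Util.AutoScrollResults = true;
    BenchmarkRunner.Run<Enumeration>();
}

[ShortRunJob]
[MinColumn, MaxColumn, MeanColumn, MedianColumn]
[MemoryDiagnoser]
[MarkdownExporter]
public class Enumeration
{
    [Params(1000000)]
    public int Length;

    private int[] _data;

    [GlobalSetup]
    public void Setup()
    {
        // Fixed seed so every run benchmarks the same random input.
        var random = new Random(42);
        _data = Enumerable.Range(0, Length).Select(_ => random.Next()).ToArray();
    }

    // Option 1: let the constructor consume the whole array.
    // Each benchmark returns its set so the JIT cannot discard the work as dead code.
    [Benchmark]
    public HashSet<int> WithConstructor() => new HashSet<int>(_data);

    // Option 2: start empty and let the set grow as elements are added.
    [Benchmark]
    public HashSet<int> WithAddNoLength()
    {
        var hs = new HashSet<int>();
        foreach (int i in _data)
        {
            hs.Add(i);
        }
        return hs;
    }

    // Option 3: pre-size the set with the array length, then add elements.
    [Benchmark]
    public HashSet<int> WithAddWithLength()
    {
        var hs = new HashSet<int>(_data.Length);
        foreach (int i in _data)
        {
            hs.Add(i);
        }
        return hs;
    }
}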

