A popular and quick way to build up an array in PowerShell is to use the += operator:
$array = @()
1..10000 | Foreach-Object {$array += "Adding $_ to the array"}
Let’s time this:
$array = @()
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
1..100000 | Foreach-Object {$array += "Adding $_ to the array"}
$report = '{0} elements collected in {1:n1} seconds'
$report -f $array.Count, $stopwatch.Elapsed.TotalSeconds
Processing these 100,000 records took my system 214.7 seconds. When PowerShell code uses the += operator on an array, it actually creates a new array that’s one element larger than the current array, copies each existing entry into the new array, and then puts the newest entry into the free slot. It has to do this every time the += operator is used, which is 100,000 times in this sample code. The last time it runs, it needs to copy 99,999 existing entries into a new array. Ouch! It’s easy to see why this is so slow.
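To make the copying cost concrete, here is a minimal sketch that imitates the behavior described above and counts how many entries get copied. Add-ByCopy is a hypothetical helper written for illustration only, not what PowerShell runs internally, and the [object[]]::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical helper: imitates the work behind += on a fixed-size array.
function Add-ByCopy {
    param([object[]]$Array, $Item)
    $new = [object[]]::new($Array.Length + 1)    # allocate an array one slot larger
    [Array]::Copy($Array, $new, $Array.Length)   # copy every existing entry across
    $new[$Array.Length] = $Item                  # put the new item into the free slot
    , $new                                       # return the array itself, not its elements
}

$array = @()
$copied = 0
foreach ($i in 1..1000) {
    $copied += $array.Length                     # entries that must be copied this round
    $array = Add-ByCopy -Array $array -Item "Adding $i to the array"
}
'{0} elements, {1} entries copied along the way' -f $array.Count, $copied
```

For 1,000 items this reports 499,500 copied entries, which is n(n-1)/2. Applying the same formula to 100,000 items gives 4,999,950,000 copies, so the work grows with the square of the item count.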
Let’s use a .NET method to do this and see whether it’s any faster:
$array = [System.Collections.ArrayList]@()
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
1..100000 | Foreach-Object { $null = $array.Add("Adding $_ to the array")}
$report = '{0} elements collected in {1:n1} seconds'
$report -f $array.Count, $stopwatch.Elapsed.TotalSeconds
Instead of 214.7 seconds, this takes 0.4 seconds: roughly 537 times faster (a 53,575% speed increase) with next to no extra work.
Technically, I should have used the generic [System.Collections.Generic.List[T]] (for example, [System.Collections.Generic.List[string]]) instead of ArrayList, as that’s what Microsoft recommends.
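As a sketch of that recommended variant, the same test can be run with a generic list of strings. The variable name $list is my own choice, and the ::new() syntax assumes PowerShell 5 or later. I have not timed this version, so no figure is quoted here.

```powershell
# Generic list typed for strings; Add() returns nothing, so no $null assignment is needed
$list = [System.Collections.Generic.List[string]]::new()
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
1..100000 | ForEach-Object { $list.Add("Adding $_ to the array") }
$report = '{0} elements collected in {1:n1} seconds'
$report -f $list.Count, $stopwatch.Elapsed.TotalSeconds
```

Unlike ArrayList.Add(), which returns the index of the new element, List[T].Add() has no return value, so nothing leaks into the pipeline.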
You can do this in pure PowerShell too: let each iteration emit its item and assign the output of the whole pipeline to a variable. Because the pipeline returns more than one element, PowerShell will automatically create an array. This method also took 0.4 seconds.
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$array = 1..100000 | Foreach-Object {"Adding $_ to the array"}
$report = '{0} elements collected in {1:n1} seconds'
$report -f $array.Count, $stopwatch.Elapsed.TotalSeconds
Both the .NET method and PowerShell’s native array creation can add elements without having to recreate and repopulate the whole array each time. That’s why they’re equally fast, and both far faster than +=.
Now, += isn’t always evil. It’s useful for adding numbers. However, once you start using it on arrays, you incur the performance penalty. If you use += on a string, you incur the same penalty: a string is really an immutable sequence of characters, so each += has to build a whole new string. If you’re working with small arrays and the performance hit is acceptable, then there’s no real harm in using it.
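For the string case, one possible way around the penalty is .NET’s System.Text.StringBuilder, which plays a role similar to the list classes above. This is a minimal sketch rather than a measured comparison; the loop size and the "line N;" text are made up for illustration.

```powershell
# StringBuilder grows an internal buffer instead of creating a new string per append
$builder = [System.Text.StringBuilder]::new()
foreach ($i in 1..1000) {
    $null = $builder.Append("line $i;")   # Append() returns the builder, so discard it
}
$text = $builder.ToString()               # materialize the final string once
$text.Length
```

For short strings built from a handful of pieces, plain += remains perfectly acceptable, just as it is for small arrays.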