Consider the following trivial function, xpow, which takes an integer as input and returns its first four powers. A second function, xpow_loop, calls xpow to compute the sum of squares of a large sequence of numbers:
function xpow(x)
    return [x x^2 x^3 x^4]
end

function xpow_loop(n)
    s = 0
    for i = 1:n
        s = s + xpow(i)[2]
    end
    return s
end
Benchmarking xpow_loop with a large input shows that it is quite slow:
julia> using BenchmarkTools
julia> @btime xpow_loop($1000000)
100.594 ms (4999441 allocations: 167.84 MiB)
The clue is in the number of allocations displayed in the preceding output. Every call to xpow allocates a new four-element array, even though the loop uses only one of its elements. These allocations and the subsequent garbage collection take a significant amount of time. The number (and size) of allocations...
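To make the cost of the per-call array concrete, here is a minimal sketch of one possible way to avoid it: returning a tuple instead of an array. This is an illustration only and not necessarily the approach the text goes on to describe. The names xpow_tuple and xpow_tuple_loop are hypothetical and do not appear in the original code.

```julia
# Original version from the text: builds a new array on every call.
function xpow(x)
    return [x x^2 x^3 x^4]
end

# Hypothetical variant: returns a tuple instead of an array.
# A tuple of integers is an immutable, fixed-size value, so the
# compiler usually does not need to allocate it on the heap.
function xpow_tuple(x)
    return (x, x^2, x^3, x^4)
end

# Same loop as xpow_loop, but calling the tuple-returning variant.
function xpow_tuple_loop(n)
    s = 0
    for i = 1:n
        s = s + xpow_tuple(i)[2]
    end
    return s
end

# Both variants expose the square as their second element.
println(xpow(3)[2] == xpow_tuple(3)[2])

# Call once to compile, then measure the bytes allocated by one call.
xpow(3)
println(@allocated xpow(3))
```

Running @btime on xpow_tuple_loop in the same way as on xpow_loop would show how the allocation count changes on a given machine and Julia version. No timings are quoted here because they were not measured.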