To be able to edit code and run cells, you need to run the notebook yourself. Where would you like to run the notebook?

This notebook takes about 3 minutes to run.

In the cloud (experimental)

Binder is a free, open source service that runs scientific notebooks in the cloud! It will take a while, usually 2-7 minutes to get a session.

On your computer

(Recommended if you want to store your changes.)

  1. Copy the notebook URL:
  2. Run Pluto

    (Also see: How to install Julia and Pluto)

  3. Paste URL in the Open box

Frontmatter

If you are publishing this notebook on the web, you can set the parameters below to provide HTML metadata. This is useful for search engines and social media.

Author 1
👀 Reading hidden code
html"<button onclick='present()'>Toggle presentation mode</button>"
117 μs
👀 Reading hidden code
begin
using BenchmarkTools
using LLVM
using LLVM.Interop
using Unitful
using PlutoUI
end
6.1 s

A note on benchmarking

Premature optimization is the root of all evil & If you don't measure you won't improve

Tools

  • BenchmarkTools.jl https://github.com/JuliaCI/BenchmarkTools.jl

  • Profiler https://docs.julialang.org/en/latest/manual/profile/

  • ProfileView.jl https://github.com/timholy/ProfileView.jl

  • PProf.jl https://github.com/JuliaPerf/PProf.jl

Other

  • VTunes/Perf https://docs.julialang.org/en/latest/manual/profile/#External-Profiling-1

  • LIKWID.jl

  • MCAnalyzer.jl

👀 Reading hidden code
827 μs

BenchmarkTools.jl

Solid package that tries to eliminate common pitfalls in performance measurment.

  • @benchmark macro that will repeatedly evaluate your code to gain enough samples

  • Caveat: You probably want to escape $ your input data

👀 Reading hidden code
398 μs

@bind N Slider(1:9)
👀 Reading hidden code
328 ms
data = rand(10^N);
👀 Reading hidden code
18.1 μs
sum (generic function with 1 method)
function sum(X::Vector{Float64})
acc = 0::Int64
for x in X
(acc += x)::Float64
end
acc::Union{Int64, Float64}
end
👀 Reading hidden code
916 μs
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  13.170 ns … 33.679 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.191 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.347 ns ±  1.002 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄                                                          ▁
  ██▄▁▄▁█▆▃▁▁▁▁▁▁▄▄▄▄▄▃▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▃▃▁▁▁▁▄▁▁▁▃▁▁▁▁▁▁▁▃▁▄▆█ █
  13.2 ns      Histogram: log(frequency) by time      20.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum($data)
👀 Reading hidden code
1.3 s

Figuring out what is happening

The stages of the compiler

  • @code_lowered

  • @code_typed & @code_warntype

  • @code_llvm

  • @code_native

Where is a function defined @which & @edit

👀 Reading hidden code
537 μs
CodeInfo(
1 ─       acc = Core.typeassert(0, Main.var"workspace#3".Int64)
│   %2  = X
│         @_3 = Base.iterate(%2)
│   %4  = @_3 === nothing
│   %5  = Base.not_int(%4)
└──       goto #4 if not %5
2 ┄ %7  = @_3
│         x = Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)
│   %10 = acc + x
│         acc = %10
│         Core.typeassert(%10, Main.var"workspace#3".Float64)
│         @_3 = Base.iterate(%2, %9)
│   %14 = @_3 === nothing
│   %15 = Base.not_int(%14)
└──       goto #4 if not %15
3 ─       goto #2
4 ┄ %18 = acc
│   %19 = Core.apply_type(Main.var"workspace#3".Union, Main.var"workspace#3".Int64, Main.var"workspace#3".Float64)
│   %20 = Core.typeassert(%18, %19)
└──       return %20
)
@code_lowered sum(data)
👀 Reading hidden code
63.9 ms

A simple example: counting

👀 Reading hidden code
173 μs
f (generic function with 1 method)
function f(N)
acc = 0
for i in 1:N
acc += 1
end
return acc
end
👀 Reading hidden code
808 μs
100000000
K = 100_000_000
👀 Reading hidden code
10.5 μs
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.785 ns … 21.130 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.795 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.816 ns ±  0.424 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             █                                
  ▇▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▂ ▂
  2.78 ns        Histogram: frequency by time        2.81 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
result = @benchmark f($N)
👀 Reading hidden code
851 ms
👀 Reading hidden code
209 ms

So we are doing 100.0 million additions in 2.785 ns...

That would mean our processor operates at 36 PHz

We wish...

Let's do a basic check, 10x bigger input.

👀 Reading hidden code
579 μs
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.785 ns … 25.678 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.795 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.817 ns ±  0.455 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             █                                
  ▇▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▂ ▂
  2.78 ns        Histogram: frequency by time        2.81 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark f($(Ref(10*K))[])
👀 Reading hidden code
845 ms

Let's explore what code we are actually running.

Using Julia's reflection macros we can see all of the stages of code-generation.

👀 Reading hidden code
260 μs
CodeInfo(
1 ─       acc = 0
│   %2  = 1:N
│         @_3 = Base.iterate(%2)
│   %4  = @_3 === nothing
│   %5  = Base.not_int(%4)
└──       goto #4 if not %5
2 ┄ %7  = @_3
│         i = Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)
│         acc = acc + 1
│         @_3 = Base.iterate(%2, %9)
│   %12 = @_3 === nothing
│   %13 = Base.not_int(%12)
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return acc
)
@code_lowered f(K)
👀 Reading hidden code
2.1 ms
CodeInfo(
1 ─       (acc = 0)::Const(0)
│   %2  = (1:N)::PartialStruct(UnitRange{Int64}, Any[Const(1), Int64])
│         (@_3 = Base.iterate(%2))::Union{Nothing, Tuple{Int64, Int64}}
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))::Int64
│   %9  = Core.getfield(%7, 2)::Int64
│         (acc = acc + 1)::Int64
│         (@_3 = Base.iterate(%2, %9))::Union{Nothing, Tuple{Int64, Int64}}
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return acc
)
Int64
@code_typed optimize=false f(K)
👀 Reading hidden code
2.3 ms
CodeInfo(
1 ── %1  = Base.sle_int(1, N)::Bool
└───       goto #3 if not %1
2 ──       goto #4
3 ──       goto #4
4 ┄─ %5  = φ (#2 => N, #3 => 0)::Int64
└───       goto #5
5 ──       goto #6
6 ── %8  = Base.slt_int(%5, 1)::Bool
└───       goto #8 if not %8
7 ──       goto #9
8 ──       goto #9
9 ┄─ %12 = φ (#7 => true, #8 => false)::Bool
│    %13 = φ (#8 => 1)::Int64
│    %14 = Base.not_int(%12)::Bool
└───       goto #15 if not %14
10 ┄ %16 = φ (#9 => %13, #14 => %24)::Int64
│    %17 = φ (#9 => 0, #14 => %18)::Int64
│    %18 = Base.add_int(%17, 1)::Int64
│    %19 = (%16 === %5)::Bool
└───       goto #12 if not %19
11 ─       goto #13
12 ─ %22 = Base.add_int(%16, 1)::Int64
└───       goto #13
13 ┄ %24 = φ (#12 => %22)::Int64
│    %25 = φ (#11 => true, #12 => false)::Bool
│    %26 = Base.not_int(%25)::Bool
└───       goto #15 if not %26
14 ─       goto #10
15 ┄ %29 = φ (#13 => %18, #9 => 0)::Int64
└───       return %29
)
Int64
@code_typed optimize=true f(K)
👀 Reading hidden code
4.8 ms
Loading more cells...