To be able to edit code and run cells, you need to run the notebook yourself. Where would you like to run the notebook?

This notebook takes about 1 minute to run.

In the cloud (experimental)

Binder is a free, open source service that runs scientific notebooks in the cloud! It will take a while, usually 2-7 minutes to get a session.

On your computer

(Recommended if you want to store your changes.)

  1. Copy the notebook URL:
  2. Run Pluto

    (Also see: How to install Julia and Pluto)

  3. Paste URL in the Open box

Frontmatter

If you are publishing this notebook on the web, you can set the parameters below to provide HTML metadata. This is useful for search engines and social media.

Author 1
👀 Reading hidden code
122 μs

Concurrency in Julia

Julia has a Task based concurrency model, most similar to Go. Users can @spawn work and use channels, atomics and locks to communicate between tasks.

Parallelism

Julia uses a homegrown pthread based task runtime + scheduler. Use julia --threads=auto or julia --threads=4 to start Julia with a number of worker threads.

A side-note on naming (Historical note #1)

Due to a historical glitch Julia put all the task related functionality into a module called Threads. I aim to rectify this eventually, since this causes confusion.

Julia doesn't have threads... It only ever had tasks and threads are an implementations detail.

👀 Reading hidden code
584 μs
Base.Threads
const Tasks = Base.Threads
👀 Reading hidden code
145 μs
using Hwloc
👀 Reading hidden code
1.3 s
topology_info()
👀 Reading hidden code
❔
40.9 ms
2
Tasks.nthreads()
👀 Reading hidden code
2.5 ms
fib (generic function with 1 method)
function fib(x)
if x <= 1
return 1
else
b = Tasks.@spawn fib(x-2)
a = fib(x-1)
return a+(fetch(b)::Int)
end
end
👀 Reading hidden code
14.9 ms
8
fib(5)
👀 Reading hidden code
3.5 ms
let ch = Channel{Int}(Inf) # buffered
@sync begin
for i in 1:10
Tasks.@spawn put!(ch, rand(Int))
end
end
close(ch) # Otherwise collect will wait for more data
collect(ch)
end
👀 Reading hidden code
137 ms
using PlutoUI, ThreadPinning
👀 Reading hidden code
6.2 s
with_terminal() do
pinthreads(:random)
threadinfo()
end
👀 Reading hidden code
825 ms
with_terminal() do
pinthreads(:compact)
threadinfo()
end
👀 Reading hidden code
22.7 ms
with_terminal() do
pinthreads(:numa)
threadinfo()
end
👀 Reading hidden code
23.1 ms

The @threads macro (Historical note #2)

Before Julia's task runtime supported multiple worker thread, we had the @threads macro for "parallel for-loops"

function saxpy!(Z, a, X, Y)
    @threads for i in eachindex(Z, X, Y)
        Z[i] = a*X[i] * Y[i]
    end
    return nothing
end

GPU parallelism

Julia has a full-stack GPU programming environment.

  1. Kernel-level

  2. Library-based

  3. Data-parallel primitives

Kernels

using CUDA

function gpu_add2!(y, x)
    index = threadIdx().x
    stride = blockDim().x
    for i in index:stride:length(y)
        @inbounds y[i] += x[i]
    end
    return nothing
end

@cuda threads=256 gpu_add2!(y, x)

Data-parallel primitives

  • map, reduce, mapreduce

  • broadcasting

map!(+, y, y, x)
y .+= x

Research questions

Auto-offloading?

function saxpy!(Z, a, X, Y)
    for i in eachindex(Z, X, Y)
        Z[i] = a*X[i] * Y[i]
    end
    return nothing
end

N = 1024
Z = CUDA.zeros(N)
X = CUDA.rand(N)
Y = CUDA.rand(N)
saxpy!(Z, 2.0, X, Y) # This runs on CPU

Z .= 2.0 .* X .+ Y # This run on GPU

Language support for data-parallel primitives on the CPU?

Currently map&co are all single-threaded since we would need to prove that f is safe to auto-parallelize, and that array loads/stores are unaliased.

(The latter can be surprisingly tricky, looking at you BitArray)

Scheduler and Runtime improvements

  • Pluggable scheduler?

  • Better insights and instrumentation?

  • Cheaper task-switching?

  • Growable-stack / Currently we don't have a cactus-stack.

  • ...

👀 Reading hidden code
5.4 ms

Julia Compilation

Julia uses a multi-stage compilation pipeline with the upper half being written in Julia, and the lower-stage(s) using LLVM.

The pipeline is introspectable and fairly hackable.

👀 Reading hidden code
295 μs
f (generic function with 1 method)
function f(N)
acc = 0
for i in 1:N
acc += 1
end
return acc
end
👀 Reading hidden code
802 μs
CodeInfo(
1 ─       acc = 0
│   %2  = 1:N
│         @_3 = Base.iterate(%2)
│   %4  = @_3 === nothing
│   %5  = Base.not_int(%4)
└──       goto #4 if not %5
2 ┄ %7  = @_3
│         i = Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)
│         acc = acc + 1
│         @_3 = Base.iterate(%2, %9)
│   %12 = @_3 === nothing
│   %13 = Base.not_int(%12)
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return acc
)
@code_lowered f(10)
👀 Reading hidden code
72.8 ms
CodeInfo(
1 ─       (acc = 0)::Const(0)
│   %2  = (1:N)::PartialStruct(UnitRange{Int64}, Any[Const(1), Int64])
│         (@_3 = Base.iterate(%2))::Union{Nothing, Tuple{Int64, Int64}}
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (i = Core.getfield(%7, 1))::Int64
│   %9  = Core.getfield(%7, 2)::Int64
│         (acc = acc + 1)::Int64
│         (@_3 = Base.iterate(%2, %9))::Union{Nothing, Tuple{Int64, Int64}}
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return acc
)
Int64
@code_typed optimize=false f(10)
👀 Reading hidden code
323 μs
CodeInfo(
1 ── %1  = Base.sle_int(1, N)::Bool
└───       goto #3 if not %1
2 ──       goto #4
3 ──       goto #4
4 ┄─ %5  = φ (#2 => N, #3 => 0)::Int64
└───       goto #5
5 ──       goto #6
6 ── %8  = Base.slt_int(%5, 1)::Bool
└───       goto #8 if not %8
7 ──       goto #9
8 ──       goto #9
9 ┄─ %12 = φ (#7 => true, #8 => false)::Bool
│    %13 = φ (#8 => 1)::Int64
│    %14 = Base.not_int(%12)::Bool
└───       goto #15 if not %14
10 ┄ %16 = φ (#9 => %13, #14 => %24)::Int64
│    %17 = φ (#9 => 0, #14 => %18)::Int64
│    %18 = Base.add_int(%17, 1)::Int64
│    %19 = (%16 === %5)::Bool
└───       goto #12 if not %19
11 ─       goto #13
12 ─ %22 = Base.add_int(%16, 1)::Int64
└───       goto #13
13 ┄ %24 = φ (#12 => %22)::Int64
│    %25 = φ (#11 => true, #12 => false)::Bool
│    %26 = Base.not_int(%25)::Bool
└───       goto #15 if not %26
14 ─       goto #10
15 ┄ %29 = φ (#13 => %18, #9 => 0)::Int64
└───       return %29
)
Int64
@code_typed optimize=true f(10)
👀 Reading hidden code
132 ms
with_terminal() do
@code_llvm optimize=false f(10)
end
👀 Reading hidden code
486 ms
with_terminal() do
@code_llvm optimize=true f(10)
end
👀 Reading hidden code
86.6 ms
Loading more cells...