Concurrency in Julia
Julia has a Task-based concurrency model, most similar to Go. Users can @spawn work and use channels, atomics and locks to communicate between tasks.
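As a minimal sketch of the model described above (the fib function, the channel size and the counter are illustrative assumptions, not taken from the original slides), the following spawns tasks, passes values through a Channel and updates a shared atomic counter:

```julia
using Base.Threads: @spawn

# Hypothetical example function: naive Fibonacci that forks one branch as a task.
function fib(n)
    n <= 1 && return n
    t = @spawn fib(n - 2)          # child task, may run on another worker thread
    return fib(n - 1) + fetch(t)   # fetch waits for the task and returns its value
end

# A Channel passes values from a producer task to the consumer.
ch = Channel{Int}(4)
producer = @spawn begin
    for i in 1:5
        put!(ch, fib(i))
    end
    close(ch)                      # closing ends iteration on the consumer side
end
results = collect(ch)              # consumes until the channel is closed

# An atomic counter is safe to update from many tasks at once.
counter = Threads.Atomic{Int}(0)
@sync for _ in 1:100
    @spawn Threads.atomic_add!(counter, 1)
end
```

The same code runs with one worker thread or many; only the degree of actual parallelism changes.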
Parallelism
Julia uses a homegrown pthread-based task runtime and scheduler. Use julia --threads=auto or julia --threads=4 to start Julia with a given number of worker threads.
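A short sketch for checking the setting from inside a session (the number of spawned tasks is an arbitrary choice for illustration):

```julia
using Base.Threads: nthreads, threadid, @spawn

# Number of worker threads in the default pool, as set by --threads.
println("worker threads: ", nthreads())

# Spawn a few tasks and record which thread each one happened to run on.
ids = fetch.([@spawn threadid() for _ in 1:8])
println("tasks ran on threads: ", sort(unique(ids)))
```

With a single worker thread all tasks report the same thread id; with more threads the scheduler is free to spread them out, and the distribution can differ between runs.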
A side-note on naming (Historical note #1)
Due to a historical glitch, Julia put all the task-related functionality into a module called Threads. I aim to rectify this eventually, since it causes confusion.
Julia doesn't have threads... It only ever had tasks, and threads are an implementation detail.
Base.Threads
2
fib (generic function with 1 method)
8
5623898139965095352
-1723819890708403891
-8603944170663419531
-9207075075160318423
-5437812186776489096
-7502739700500003700
1035726367590185924
-4314674458460265130
-616137577306567529
6894767255460396527
The @threads macro (Historical note #2)
Before Julia's task runtime supported multiple worker threads, we had the @threads macro for "parallel for-loops".
function saxpy!(Z, a, X, Y)
    # saxpy: Z = a*X + Y, with iterations split across worker threads
    @threads for i in eachindex(Z, X, Y)
        Z[i] = a*X[i] + Y[i]
    end
    return nothing
end
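A usage sketch for such a loop, assuming the standard saxpy definition Z = a*X + Y. The name saxpy_threaded!, the array size and the element type are hypothetical choices made so the example is self-contained:

```julia
using Base.Threads: @threads

# Hypothetical name, chosen to avoid clashing with the definition above.
function saxpy_threaded!(Z, a, X, Y)
    @threads for i in eachindex(Z, X, Y)
        @inbounds Z[i] = a * X[i] + Y[i]
    end
    return nothing
end

N = 1_000
X = rand(Float32, N)
Y = rand(Float32, N)
Z = similar(X)

saxpy_threaded!(Z, 2f0, X, Y)

# The threaded loop should agree with the single-threaded broadcast form.
matches = Z ≈ 2f0 .* X .+ Y
```

Each iteration writes to a distinct Z[i], which is what makes this loop safe to split across threads without locks.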
GPU parallelism
Julia has a full-stack GPU programming environment.
Kernel-level
Library-based
Data-parallel primitives
Kernels
using CUDA
function gpu_add2!(y, x)
    index = threadIdx().x
    stride = blockDim().x
    for i in index:stride:length(y)
        @inbounds y[i] += x[i]
    end
    return nothing
end
@cuda threads=256 gpu_add2!(y, x)
Data-parallel primitives
map, reduce, mapreduce
broadcasting
map!(+, y, y, x)
y .+= x
Research questions
Auto-offloading?
function saxpy!(Z, a, X, Y)
    for i in eachindex(Z, X, Y)
        Z[i] = a*X[i] + Y[i]
    end
    return nothing
end
N = 1024
Z = CUDA.zeros(N)
X = CUDA.rand(N)
Y = CUDA.rand(N)
saxpy!(Z, 2.0, X, Y) # This runs on CPU
Z .= 2.0 .* X .+ Y # This runs on GPU
Language support for data-parallel primitives on the CPU?
Currently map & co. are all single-threaded, since we would need to prove that f is safe to auto-parallelize and that array loads/stores are unaliased.
(The latter can be surprisingly tricky, looking at you BitArray.)
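A small sketch of why BitArray is tricky here. It relies on the chunks field, which is an internal implementation detail of BitArray and is used only for illustration:

```julia
# A BitArray packs 64 elements into each UInt64 word.
B = falses(128)
nwords = length(B.chunks)     # internal field: 128 bits fit in 2 words

# Setting two different indices modifies the same underlying word.
B[1] = true
B[2] = true
first_word = B.chunks[1]      # both bits live in word 1

# Consequence: B[i] = v is a read-modify-write of a shared word, so two
# tasks writing B[1] and B[2] concurrently can race even though the
# indices are distinct.
```

For an Array{Bool} the same two writes would touch separate bytes, so distinct indices would imply distinct memory locations.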
Scheduler and Runtime improvements
Pluggable scheduler?
Better insights and instrumentation?
Cheaper task-switching?
Growable-stack / Currently we don't have a cactus-stack.
...
Julia Compilation
Julia uses a multi-stage compilation pipeline, with the upper half written in Julia and the lower stage(s) using LLVM.
The pipeline is introspectable and fairly hackable.
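A sketch of how the individual stages can be inspected. The definition of f is an assumption, reconstructed from the lowered IR shown in this section; the macros come from the InteractiveUtils standard library:

```julia
using InteractiveUtils  # provides the @code_* macros outside the REPL

# Assumed definition, reconstructed from the lowered IR in this section.
function f(N)
    acc = 0
    for i in 1:N
        acc = acc + 1
    end
    return acc
end

lowered = @code_lowered f(10)                # after lowering, before type inference
typed   = @code_typed optimize=false f(10)   # after type inference, unoptimized
opt     = @code_typed f(10)                  # after Julia-level optimization
@code_llvm f(10)                             # prints the LLVM IR handed to the lower stage
```

@code_typed returns a pair of the CodeInfo and the inferred return type, which is where a trailing Int64 in the printed output comes from.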
f (generic function with 1 method)
CodeInfo(
1 ─ acc = 0
│ %2 = 1:N
│ @_3 = Base.iterate(%2)
│ %4 = @_3 === nothing
│ %5 = Base.not_int(%4)
└── goto #4 if not %5
2 ┄ %7 = @_3
│ i = Core.getfield(%7, 1)
│ %9 = Core.getfield(%7, 2)
│ acc = acc + 1
│ @_3 = Base.iterate(%2, %9)
│ %12 = @_3 === nothing
│ %13 = Base.not_int(%12)
└── goto #4 if not %13
3 ─ goto #2
4 ┄ return acc
)
CodeInfo(
1 ─ (acc = 0)::Const(0)
│ %2 = (1:N)::PartialStruct(UnitRange{Int64}, Any[Const(1), Int64])
│ (@_3 = Base.iterate(%2))::Union{Nothing, Tuple{Int64, Int64}}
│ %4 = (@_3 === nothing)::Bool
│ %5 = Base.not_int(%4)::Bool
└── goto #4 if not %5
2 ┄ %7 = @_3::Tuple{Int64, Int64}
│ (i = Core.getfield(%7, 1))::Int64
│ %9 = Core.getfield(%7, 2)::Int64
│ (acc = acc + 1)::Int64
│ (@_3 = Base.iterate(%2, %9))::Union{Nothing, Tuple{Int64, Int64}}
│ %12 = (@_3 === nothing)::Bool
│ %13 = Base.not_int(%12)::Bool
└── goto #4 if not %13
3 ─ goto #2
4 ┄ return acc
)
Int64
CodeInfo(
1 ── %1 = Base.sle_int(1, N)::Bool
└─── goto #3 if not %1
2 ── goto #4
3 ── goto #4
4 ┄─ %5 = φ (#2 => N, #3 => 0)::Int64
└─── goto #5
5 ── goto #6
6 ── %8 = Base.slt_int(%5, 1)::Bool
└─── goto #8 if not %8
7 ── goto #9
8 ── goto #9
9 ┄─ %12 = φ (#7 => true, #8 => false)::Bool
│ %13 = φ (#8 => 1)::Int64
│ %14 = Base.not_int(%12)::Bool
└─── goto #15 if not %14
10 ┄ %16 = φ (#9 => %13, #14 => %24)::Int64
│ %17 = φ (#9 => 0, #14 => %18)::Int64
│ %18 = Base.add_int(%17, 1)::Int64
│ %19 = (%16 === %5)::Bool
└─── goto #12 if not %19
11 ─ goto #13
12 ─ %22 = Base.add_int(%16, 1)::Int64
└─── goto #13
13 ┄ %24 = φ (#12 => %22)::Int64
│ %25 = φ (#11 => true, #12 => false)::Bool
│ %26 = Base.not_int(%25)::Bool
└─── goto #15 if not %26
14 ─ goto #10
15 ┄ %29 = φ (#13 => %18, #9 => 0)::Int64
└─── return %29
)
Int64