# LLVM and performance

## What happens when you compile?
In Parts 1 and 2, you wrote Nyx code, compiled it, and ran it. But what actually happens between your `.nx` file and the running program? Understanding this pipeline helps you write faster code and debug performance issues.
## The compilation pipeline

When you run `make run FILE=program.nx`, four things happen:

```
program.nx  →  Nyx Compiler  →  program.ll  →  Clang/LLVM  →  program (binary)
  (Nyx)        (parser +        (LLVM IR)      (optimizer +     (native
                codegen)                        assembler)       machine code)
```
- **Nyx Compiler** reads your `.nx` file, parses it into an AST (Abstract Syntax Tree), checks types, and generates LLVM IR — an intermediate representation.
- **LLVM** takes the IR, optimizes it aggressively, and generates native machine code for your CPU.
The result is a standalone binary — no interpreter, no VM, no runtime overhead. Just machine instructions.
## What is LLVM IR?
LLVM IR is a low-level language that sits between Nyx and machine code. It looks like assembly but is portable across CPU architectures. Here is what a simple function looks like:

```nyx
fn add(a: int, b: int) -> int {
    return a + b
}
```

Becomes this LLVM IR:

```llvm
define i64 @add(i64 %a, i64 %b) {
entry:
  %result = add i64 %a, %b
  ret i64 %result
}
```
You do not need to write IR — the Nyx compiler generates it. But seeing it helps you understand what your code becomes.
### Viewing the IR

To see the IR for any Nyx program:

```
make compile FILE=program.nx
cat program.ll
```
This is useful for understanding performance. If a function generates hundreds of IR instructions, it might be worth simplifying.
## How LLVM optimizes your code
LLVM applies dozens of optimization passes. Here are the most impactful:
### Dead code elimination

```nyx
fn main() {
    let x: int = 42   // computed but never used
    let y: int = 10
    print(y)          // only y matters
}
```
LLVM removes x entirely. The compiled program never computes 42.
### Constant folding

```nyx
fn main() {
    let result: int = 3 * 4 + 5
    print(result)   // 17
}
```
LLVM computes 17 at compile time. The binary just loads the constant.
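Conceptually, the arithmetic disappears from the IR entirely. Here is a simplified sketch of what constant folding leaves behind (the real output also contains the call to `print` and any runtime setup):

```llvm
; the expression 3 * 4 + 5 no longer exists at runtime
define i64 @main() {
entry:
  ret i64 17
}
```

No `mul` or `add` instructions remain; the binary simply materializes the constant 17.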
### Loop optimization

```nyx
fn sum_to(n: int) -> int {
    var total: int = 0
    var i: int = 0
    while i < n {
        total += i
        i += 1
    }
    return total
}
```
LLVM can unroll small loops, vectorize operations, and sometimes replace the entire loop with a closed-form formula.
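For instance, LLVM's scalar evolution analysis can recognize that `sum_to` adds up 0 + 1 + ... + (n - 1) and replace the loop with the triangular-number formula. The effect is as if you had written (a sketch; the generated code also handles the `n <= 0` case correctly):

```nyx
fn sum_to(n: int) -> int {
    // closed form of 0 + 1 + ... + (n - 1): no loop at all
    return n * (n - 1) / 2
}
```

`sum_to(100000000)` then runs in constant time instead of a hundred million iterations.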
### Inlining
Small functions get inlined — their code is copied directly into the caller, eliminating function call overhead:
```nyx
fn square(x: int) -> int { return x * x }

fn main() {
    let n: int = square(5)   // becomes: let n: int = 5 * 5
}
```
## Nyx performance characteristics
Because Nyx compiles to native code via LLVM, it achieves performance comparable to C:
| Benchmark | Nyx | C | Ratio (Nyx / C) |
|---|---|---|---|
| fibonacci(40) | 166ms | 190ms | 0.87x (Nyx wins) |
| primes(100K) | 3.6ms | 3.6ms | 1.0x |
| loop(100M) | 0μs | 0μs | both optimized away |
| map(100K ops) | 24.6ms | 23ms | 1.07x |
| HTTP requests/s | 73,863 | — | competitive with Go |
## Writing fast Nyx code

### Prefer integers over strings
Integer operations are single CPU instructions. String operations allocate memory and copy bytes:
```nyx
// Fast — integer comparison
if status == 200 { ... }

// Slower — string comparison (compares byte by byte)
if status_text == "OK" { ... }
```
### Minimize allocations in hot loops
Every time you create a string, array, or struct, the garbage collector must track it. In a tight loop, this adds up:
```nyx
// Slow — creates a new string every iteration
var i: int = 0
while i < 1000000 {
    let s: String = "hello" + int_to_string(i)   // allocation!
    i += 1
}

// Fast — avoid unnecessary allocations
var i: int = 0
var total: int = 0
while i < 1000000 {
    total += i   // no allocation
    i += 1
}
```
### Use StringBuilder for string building

When building strings in a loop, use `StringBuilder` instead of concatenation:
```nyx
// Slow — O(n²) because each + creates a new string
var result: String = ""
var i: int = 0
while i < 10000 {
    result = result + "x"
    i += 1
}

// Fast — O(n) with StringBuilder
var sb: StringBuilder = sb_new()
var i: int = 0
while i < 10000 {
    sb_append(sb, "x")
    i += 1
}
let result: String = sb_to_string(sb)
```
### Cache repeated computations
```nyx
// Slow — calls length() every iteration
while i < arr.length() {
    // ...
    i += 1
}

// Fast — cache the length
let len: int = arr.length()
while i < len {
    // ...
    i += 1
}
```
## The garbage collector

Nyx uses the Boehm GC — a conservative garbage collector. It automatically frees memory you no longer use. You never need to call `free()` (unless you opt into unsafe manual memory management).
The GC runs periodically. It scans memory to find objects that are no longer referenced and reclaims them. This introduces small pauses, but Nyx tunes the GC for low latency:
- Incremental collection reduces pause times
- Collection frequency is reduced for server workloads
- Most allocations are short-lived and collected quickly
For the vast majority of programs, the GC is invisible. For extreme performance needs (real-time systems, game engines), Nyx offers a no-GC mode:
```
make run-no-gc FILE=program.nx
```

In no-GC mode, you must manage memory manually with `alloc()` and `free()`.
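A minimal sketch of what no-GC code looks like. The exact signatures of `alloc()` and `free()` are assumptions here (a size in bytes in, a pointer back), so treat this as illustrative rather than as the real API:

```nyx
fn main() {
    // ASSUMED signature: alloc(size_in_bytes) returns a pointer
    let buf = alloc(1024)
    // ... use buf ...
    free(buf)   // without this, the memory leaks: there is no GC to reclaim it
}
```

Every `alloc()` must be paired with exactly one `free()`; forgetting one leaks memory, and freeing twice corrupts it.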
## Self-hosting: the ultimate benchmark
The Nyx compiler is written in Nyx and compiles itself. This is called self-hosting — and it is the ultimate performance test. If the compiler were slow, compiling itself would be painfully slow.
The compilation pipeline achieves a fixed point: compiling the compiler with itself twice produces identical output. This demonstrates that code generation is deterministic and provides strong evidence of the compiler's correctness.
## Exercises

- Write a program that sums numbers 1 to 10,000,000. Compile it with `make compile` and look at the generated `.ll` file. Can you find the loop in the IR?
- Write two versions of string building: one with `+` concatenation in a loop, another with `StringBuilder`. Time both with a large iteration count.
- Write a function that computes fibonacci(40). Compare the time with the same algorithm in Python or another interpreted language.
- Look at the IR for a simple function with an `if` statement. Can you identify the branch instructions?
- Write a program that creates 1 million structs in a loop. Then write one that creates 1 million integers. Compare the performance difference.
## Summary

- Nyx compiles to LLVM IR, then LLVM generates optimized native machine code.
- Use `make compile` to see the generated IR.
- LLVM optimizes: dead code elimination, constant folding, inlining, loop optimization.
- Nyx achieves C-like performance on most benchmarks.
- Write fast code: prefer integers, minimize allocations, use StringBuilder, cache values.
- The Boehm GC handles memory automatically with low-latency pauses.
- No-GC mode is available for manual memory management.
- The Nyx compiler is self-hosting — it compiles itself.
Next chapter: FFI — Calling C code →