# LLVM and performance

## What happens when you compile?
In Parts 1 and 2, you wrote Nyx code, compiled it, and ran it. But what actually happens between your `.nx` file and the running program? Understanding this pipeline helps you write faster code and debug performance issues.
## The compilation pipeline

When you run `make run FILE=program.nx`, four things happen:

```
program.nx  →  Nyx Compiler  →  program.ll  →  Clang/LLVM  →  program (binary)
  (Nyx)        (parser +        (LLVM IR)      (optimizer +     (native
                codegen)                        assembler)       machine code)
```
- **Nyx Compiler** reads your `.nx` file, parses it into an AST (Abstract Syntax Tree), checks types, and generates LLVM IR — an intermediate representation.
- **LLVM** takes the IR, optimizes it aggressively, and generates native machine code for your CPU.
The result is a standalone binary — no interpreter, no VM, no runtime overhead. Just machine instructions.
## What is LLVM IR?
LLVM IR is a low-level language that sits between Nyx and machine code. It looks like assembly but is portable across CPU architectures. Here is what a simple function looks like:

```nyx
fn add(a: int, b: int) -> int {
    return a + b
}
```

Becomes this LLVM IR:

```llvm
define i64 @add(i64 %a, i64 %b) {
entry:
  %result = add i64 %a, %b
  ret i64 %result
}
```
You do not need to write IR — the Nyx compiler generates it. But seeing it helps you understand what your code becomes.
### Viewing the IR

To see the IR for any Nyx program:

```
make compile FILE=program.nx
cat program.ll
```
This is useful for understanding performance. If a function generates hundreds of IR instructions, it might be worth simplifying.
## How LLVM optimizes your code
LLVM applies dozens of optimization passes. Here are the most impactful:
### Dead code elimination

```nyx
fn main() {
    let x: int = 42   // computed but never used
    let y: int = 10
    print(y)          // only y matters
}
```
LLVM removes x entirely. The compiled program never computes 42.
### Constant folding

```nyx
fn main() {
    let result: int = 3 * 4 + 5
    print(result)   // 17
}
```
LLVM computes 17 at compile time. The binary just loads the constant.
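Conceptually, the arithmetic disappears from the IR entirely. Here is a simplified sketch of what constant folding leaves behind (the real output also contains the call to `print` and any runtime setup):

```llvm
; the expression 3 * 4 + 5 no longer exists at runtime
define i64 @main() {
entry:
  ret i64 17
}
```

No `mul` or `add` instructions remain; the binary simply materializes the constant 17.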
### Loop optimization

```nyx
fn sum_to(n: int) -> int {
    var total: int = 0
    var i: int = 0
    while i < n {
        total += i
        i += 1
    }
    return total
}
```
LLVM can unroll small loops, vectorize operations, and sometimes replace the entire loop with a closed-form formula.
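For instance, LLVM's scalar evolution analysis can recognize that `sum_to` adds up 0 + 1 + ... + (n - 1) and replace the loop with the triangular-number formula. The effect is as if you had written (a sketch; the generated code also handles the `n <= 0` case correctly):

```nyx
fn sum_to(n: int) -> int {
    // closed form of 0 + 1 + ... + (n - 1): no loop at all
    return n * (n - 1) / 2
}
```

`sum_to(100000000)` then runs in constant time instead of a hundred million iterations.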
### Inlining
Small functions get inlined — their code is copied directly into the caller, eliminating function call overhead:
```nyx
fn square(x: int) -> int { return x * x }

fn main() {
    let n: int = square(5)   // becomes: let n: int = 5 * 5
}
```
## Nyx performance characteristics
Because Nyx compiles to native code via LLVM, it achieves performance comparable to C:
| Benchmark | Nyx | C | Ratio (Nyx / C) |
|---|---|---|---|
| fibonacci(40) | 166ms | 190ms | 0.87x (Nyx wins) |
| primes(100K) | 3.6ms | 3.6ms | 1.0x |
| loop(100M) | 0μs | 0μs | both optimized away |
| map(100K ops) | 24.6ms | 23ms | 1.07x |
| HTTP requests/s | 73,863 | — | competitive with Go |
## Writing fast Nyx code

### Prefer integers over strings
Integer operations are single CPU instructions. String operations allocate memory and copy bytes:
```nyx
// Fast — integer comparison
if status == 200 { ... }

// Slower — string comparison (compares byte by byte)
if status_text == "OK" { ... }
```
### Minimize allocations in hot loops
Every time you create a string, array, or struct, the garbage collector must track it. In a tight loop, this adds up:
```nyx
// Slow — creates a new string every iteration
var i: int = 0
while i < 1000000 {
    let s: String = "hello" + int_to_string(i)   // allocation!
    i += 1
}

// Fast — avoid unnecessary allocations
var i: int = 0
var total: int = 0
while i < 1000000 {
    total += i   // no allocation
    i += 1
}
```
### Use StringBuilder for string building

When building strings in a loop, use `StringBuilder` instead of concatenation:
```nyx
// Slow — O(n²) because each + creates a new string
var result: String = ""
var i: int = 0
while i < 10000 {
    result = result + "x"
    i += 1
}

// Fast — O(n) with StringBuilder
var sb: StringBuilder = sb_new()
var i: int = 0
while i < 10000 {
    sb_append(sb, "x")
    i += 1
}
let result: String = sb_to_string(sb)
```
### Cache repeated computations
```nyx
// Slow — calls length() every iteration
while i < arr.length() {
    // ...
    i += 1
}

// Fast — cache the length
let len: int = arr.length()
while i < len {
    // ...
    i += 1
}
```
## The garbage collector

Nyx uses the Boehm GC — a conservative garbage collector. It automatically frees memory you no longer use. You never need to call `free()` (unless you opt into unsafe manual memory management).
The GC runs periodically. It scans memory to find objects that are no longer referenced and reclaims them. This introduces small pauses, but Nyx tunes the GC for low latency:
- Incremental collection reduces pause times
- Collection frequency is reduced for server workloads
- Most allocations are short-lived and collected quickly
For the vast majority of programs, the GC is invisible. For extreme performance needs (real-time systems, game engines), Nyx offers a no-GC mode:
```
make run-no-gc FILE=program.nx
```

In no-GC mode, you must manage memory manually with `alloc()` and `free()`.
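A minimal sketch of what no-GC code looks like. The exact signatures of `alloc()` and `free()` are assumptions here (a size in bytes in, a pointer back), so treat this as illustrative rather than as the real API:

```nyx
fn main() {
    // ASSUMED signature: alloc(size_in_bytes) returns a pointer
    let buf = alloc(1024)
    // ... use buf ...
    free(buf)   // without this, the memory leaks: there is no GC to reclaim it
}
```

Every `alloc()` must be paired with exactly one `free()`; forgetting one leaks memory, and freeing twice corrupts it.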
## Self-hosting: the ultimate benchmark
The Nyx compiler is written in Nyx and compiles itself. This is called self-hosting — and it is the ultimate performance test. If the compiler were slow, compiling itself would be painfully slow.
The compilation pipeline achieves a fixed point: compiling the compiler with itself twice produces identical output. This demonstrates that code generation is deterministic and provides strong evidence of the compiler's correctness.
## Exercises

- Write a program that sums numbers 1 to 10,000,000. Compile it with `make compile` and look at the generated `.ll` file. Can you find the loop in the IR?
- Write two versions of string building: one with `+` concatenation in a loop, another with `StringBuilder`. Time both with a large iteration count.
- Write a function that computes fibonacci(40). Compare the time with the same algorithm in Python or another interpreted language.
- Look at the IR for a simple function with an `if` statement. Can you identify the branch instructions?
- Write a program that creates 1 million structs in a loop. Then write one that creates 1 million integers. Compare the performance difference.
## Summary

- Nyx compiles to LLVM IR, then LLVM generates optimized native machine code.
- Use `make compile` to see the generated IR.
- LLVM optimizes: dead code elimination, constant folding, inlining, loop optimization.
- Nyx achieves C-like performance on most benchmarks.
- Write fast code: prefer integers, minimize allocations, use StringBuilder, cache values.
- The Boehm GC handles memory automatically with low-latency pauses.
- No-GC mode is available for manual memory management.
- The Nyx compiler is self-hosting — it compiles itself.
Next chapter: FFI — Calling C code →