Systems Programming · Memory · OS

Memory Management in Modern Systems: From Stack to Heap

How operating systems and runtimes manage memory — virtual memory, allocation, and garbage collection.

Memory management is one of those topics that separate engineers who use systems from engineers who understand them. Whether you're writing Go, C, Rust, or JavaScript, the runtime is constantly making decisions about where to put your data and when to reclaim it.

Every process operates in a virtual address space, an abstraction provided by the OS and hardware (the MMU, or Memory Management Unit). On a typical 64-bit system, each process gets 48 bits of usable virtual address space (256 TB), though physical RAM is far smaller. The page table maps virtual pages to physical frames, and the TLB (Translation Lookaside Buffer) caches recent translations to avoid a full page-table walk on every memory access.
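The translation itself is simple arithmetic. A minimal sketch, assuming the common 4 KB page size (the constants and the function name here are illustrative, not any kernel's real API):

```go
package main

import "fmt"

const (
	pageShift = 12             // 4 KB pages: the low 12 bits are the offset
	pageSize  = 1 << pageShift // 4096 bytes
)

// splitAddr decomposes a virtual address into its virtual page number
// and the byte offset within that page -- the same split the MMU
// performs before consulting the TLB or walking the page table.
func splitAddr(vaddr uint64) (vpn, offset uint64) {
	return vaddr >> pageShift, vaddr & (pageSize - 1)
}

func main() {
	vpn, off := splitAddr(0x7fff_dead_beef)
	fmt.Printf("vpn=%#x offset=%#x\n", vpn, off) // vpn=0x7fffdeadb offset=0xeef
}
```

The page number indexes the page table (or hits the TLB); only the offset passes through unchanged.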

The stack is the simplest form of memory management. Each function call pushes a frame containing local variables, the return address, and saved registers; when the function returns, the frame is popped. Stack allocation is essentially free: just a pointer increment. The stack is typically 1-8 MB per thread and, on most architectures, grows downward in memory. Stack overflow occurs when deep recursion or large local arrays exceed this limit.
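To see why stack allocation is so cheap, here is a toy bump allocator; the names are illustrative, but the mechanics (allocate by incrementing, free by resetting) are exactly what a call stack does:

```go
package main

import "fmt"

// bumpAllocator mimics stack-style allocation: grabbing memory is a
// single pointer increment, and freeing is a single pointer reset.
type bumpAllocator struct {
	buf []byte
	top int // index of the next free byte
}

// alloc reserves n bytes and returns their offset, or -1 when the
// region is exhausted (the analogue of a stack overflow).
func (b *bumpAllocator) alloc(n int) int {
	if b.top+n > len(b.buf) {
		return -1
	}
	off := b.top
	b.top += n
	return off
}

// reset pops everything at once, as a returning function frees its whole frame.
func (b *bumpAllocator) reset(to int) { b.top = to }

func main() {
	a := bumpAllocator{buf: make([]byte, 64)}
	frame := a.top
	fmt.Println(a.alloc(16)) // 0
	fmt.Println(a.alloc(16)) // 16
	a.reset(frame)           // "function returns": both allocations freed
	fmt.Println(a.alloc(8))  // 0 again
}
```

No free lists, no metadata, no search: that is why compilers work hard to keep values on the stack.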

The heap is where dynamically allocated memory lives. In C, malloc() requests memory from the heap; in Go, new() or make() may allocate on the heap (or the stack, if escape analysis determines the value doesn't outlive the function). Heap allocation is expensive — the allocator must find a suitable free block, potentially splitting or coalescing blocks to reduce fragmentation.
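A quick sketch of what escape analysis looks at (the compiler's actual decisions can be inspected with `go build -gcflags='-m'`; the function names here are made up for illustration):

```go
package main

import "fmt"

// stackOnly's value never leaves the function, so escape analysis
// can keep it on the stack; the frame pop reclaims it for free.
func stackOnly() int {
	v := 42
	return v // returned by value: no escape
}

// escapes returns a pointer to a local, so the value must outlive
// the frame and the compiler moves it to the heap ("&v escapes").
func escapes() *int {
	v := 42
	return &v
}

func main() {
	fmt.Println(stackOnly(), *escapes()) // 42 42
}
```

Both functions behave identically to the caller; only the allocation site differs, which is why escape analysis is invisible until you profile.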

Modern allocators like tcmalloc (Google), jemalloc (Facebook/Meta), and mimalloc (Microsoft) use thread-local caches and size-class segregation to reduce contention and fragmentation. Go's memory allocator is modeled on tcmalloc and uses a hierarchy: per-P mcache → mcentral free lists → mheap. Small objects (≤32 KB) are allocated from size-classed spans, while large objects get their own spans directly.
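Size-class segregation reduces a malloc to "round up, pop a free list". A toy version with a handful of classes (real allocators tune roughly 70 classes up to 32 KB; these numbers are illustrative):

```go
package main

import "fmt"

// classes is a toy set of size classes, smallest to largest.
var classes = []int{8, 16, 32, 48, 64, 96, 128, 256, 512, 1024}

// sizeClass rounds a request up to the smallest class that fits,
// trading some internal fragmentation for O(1) free-list lookups.
func sizeClass(n int) int {
	for _, c := range classes {
		if n <= c {
			return c
		}
	}
	return -1 // "large object": would get its own span instead
}

func main() {
	fmt.Println(sizeClass(5), sizeClass(33), sizeClass(100)) // 8 48 128
}
```

The wasted bytes between the request and its class are internal fragmentation, the price paid for never having to search or split blocks on the fast path.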

Garbage collection automates memory reclamation. Go uses a concurrent, tri-color mark-and-sweep collector. It traverses the object graph starting from the roots (stack variables, globals), marking reachable objects as live, then sweeps unreachable objects. The GC runs concurrently with mutator threads, with brief stop-the-world pauses (typically well under a millisecond in modern Go releases) for stack scanning and enabling the write barrier.
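The reachability logic can be sketched in a few lines. This is a stop-the-world simplification: the real collector interleaves marking with the mutator using the tri-color invariant and write barriers, but the grey-worklist traversal is the same idea:

```go
package main

import "fmt"

// object is a node in a toy object graph.
type object struct {
	name string
	refs []*object
	mark bool
}

// markSweep marks everything reachable from roots, then sweeps the
// rest. White = unvisited, grey = on the worklist, black = marked.
func markSweep(roots []*object, heap []*object) (live []*object) {
	// Mark phase: seed the grey worklist with the roots.
	grey := append([]*object(nil), roots...)
	for len(grey) > 0 {
		obj := grey[len(grey)-1]
		grey = grey[:len(grey)-1]
		if obj.mark {
			continue
		}
		obj.mark = true                  // blacken this object
		grey = append(grey, obj.refs...) // shade its children grey
	}
	// Sweep phase: anything still white is unreachable garbage.
	for _, obj := range heap {
		if obj.mark {
			obj.mark = false // reset for the next cycle
			live = append(live, obj)
		}
	}
	return live
}

func main() {
	a, b, c := &object{name: "a"}, &object{name: "b"}, &object{name: "c"}
	a.refs = []*object{b} // nothing references c: it is garbage
	for _, o := range markSweep([]*object{a}, []*object{a, b, c}) {
		fmt.Println(o.name) // a, then b
	}
}
```

The write barrier exists precisely because the mutator can re-point a black object at a white one mid-mark; without it, the concurrent version of this loop would miss live objects.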

Java's G1 (Garbage-First) collector divides the heap into regions and collects the regions with the most garbage first (hence the name). ZGC and Shenandoah push further — sub-millisecond pauses regardless of heap size, using colored pointers and load barriers to relocate objects concurrently.

Rust takes a different approach entirely: ownership and borrowing. Every value has exactly one owner, and when the owner goes out of scope the value is dropped and its memory freed. The borrow checker enforces at compile time that references never outlive their referents and that mutable and immutable references to the same value never coexist. This eliminates use-after-free, double-free, and data races without any runtime overhead. The cost is a steeper learning curve: the borrow checker rejects many programs that would in practice be safe.

Virtual memory also enables memory-mapped files (mmap), copy-on-write (COW) for fork(), and overcommit (allocating more virtual memory than there is physical RAM). Understanding these mechanisms is essential for building high-performance systems. A database engine that memory-maps its data files gets the OS's page cache for free. A container runtime uses COW to share base image layers efficiently.

The practical takeaway: profile your memory usage. Tools like pprof (Go), Valgrind (C/C++), and async-profiler (Java) reveal allocation hotspots, GC pressure, and memory leaks. In my experience, memory-related performance issues are the most common and the most impactful to fix.
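For Go, the zero-setup starting point before reaching for pprof is `runtime.ReadMemStats`. A small sketch (the `snapshot` helper is illustrative, not a pprof API):

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot reads a few allocation counters from the runtime. For real
// profiling you would use runtime/pprof or the net/http/pprof endpoint;
// MemStats is just the quickest way to spot allocation pressure.
func snapshot(label string) runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d KB, NumGC=%d\n", label, m.HeapAlloc/1024, m.NumGC)
	return m
}

func main() {
	before := snapshot("before")
	data := make([][]byte, 0, 1000)
	for i := 0; i < 1000; i++ {
		data = append(data, make([]byte, 4096)) // retain ~4 MB of heap
	}
	after := snapshot("after")
	fmt.Println("heap grew:", after.HeapAlloc > before.HeapAlloc, len(data))
}
```

Watching `HeapAlloc` and `NumGC` move between snapshots is often enough to confirm a suspected allocation hotspot before a full profile.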
