imalloc - High-Performance Lock-Free Memory Allocator

A high-performance, thread-safe memory allocator featuring segregated free lists and lock-free operations. Built from scratch to understand low-level memory management and demonstrate systems programming expertise.

🚀 Performance Highlights

Scenario	imalloc	system malloc	Advantage
Small allocations (32B)	9.06 ns	9.47 ns	4% faster
Medium allocations (512B)	7.82 ns	9.41 ns	17% faster
Bulk allocations (10k blocks)	111 μs	732 μs	85% faster
Fragmented workloads	11.3 μs	22.8 μs	50% faster

Latest benchmarks on 16-core 3GHz CPU

🎯 When to Use imalloc vs malloc

✅ imalloc Excels At:

High-frequency small/medium allocations (16B-8KB)
Bulk allocation patterns (85% faster than malloc)
Fragmentation-heavy workloads (50% faster)

⚠️ malloc Still Better For:

Very large allocations (>8KB) - imalloc uses direct mmap (slower)
Lower memory constraints - imalloc has a lot of internal fragmentation, wasting memory

🔧 Sweet Spot Applications:

Game engines - frequent small object allocation
Web servers - request/response object pools
Data processing - temporary buffer management

✨ Key Features

Lock-free operations using atomic compare-and-swap for thread safety
Segregated free lists with optimized bucket sizes (16B-12KB)
mmap-based heap management for efficient memory usage
Complete malloc API - drop-in replacement with imalloc, ifree, icalloc, irealloc
Comprehensive test suite with 18 unit tests and performance benchmarks

🏗️ Architecture

imalloc uses a multi-step bucket system inspired by jemalloc:

Bucket Sizes:
├── Tier 1: 16-128 bytes (16B increments) - Small objects
├── Tier 2: 128-512 bytes (32B increments) - Medium objects  
└── Tier 3: 512-12KB (128B increments) - Large objects

Core Components:

Lock-free free lists - Atomic operations for thread-safe bucket management
Arena-based allocation - Efficient memory regions with mmap/munmap
Boundary tag system - Block headers for fast coalescing
Size-class optimization - Minimizes internal fragmentation

📦 Quick Start

Prerequisites

C++20 compatible compiler (GCC 10+, Clang 12+)
CMake 3.14+
Linux/Unix system (uses mmap/munmap)

Optional Development Tools:

clang-tidy (for static analysis and code quality checks)

Build

git clone https://github.com/username/imalloc.git
cd imalloc
mkdir build && cd build
cmake ..
make

Build with Static Analysis

# Enable clang-tidy during configuration
cmake .. -DENABLE_CLANG_TIDY=ON
make

# Run static analysis
make tidy

Test

# Run all tests
./test_imalloc

# Run only benchmarks
./test_imalloc --gtest_filter="*Benchmark*"

💻 Usage Examples

Basic Usage

#include "imalloc.h"

// Drop-in replacement for malloc/free
void* ptr = imalloc(1024);
memset(ptr, 0, 1024);
ifree(ptr);

// Zeroed allocation
void* clean_mem = icalloc(100, sizeof(int));
ifree(clean_mem);

// Reallocation
void* small = imalloc(64);
void* large = irealloc(small, 1024); // Efficiently resized
ifree(large);

Integration Example

// Custom allocator for containers
template<typename T>
class IMallocAllocator {
public:
    T* allocate(size_t n) {
        return static_cast<T*>(imalloc(n * sizeof(T)));
    }
    
    void deallocate(T* ptr, size_t) {
        ifree(ptr);
    }
};

// Use with STL containers
std::vector<int, IMallocAllocator<int>> fast_vector;

🧪 Technical Implementation Details

Lock-Free Algorithm Design

Free List Operations:

// Lock-free push using atomic compare-and-swap
do {
    old_head = free_lists[bucket].load(memory_order_acquire);
    block->next_free = old_head;
} while (!free_lists[bucket].compare_exchange_weak(
    old_head, block, memory_order_release, memory_order_relaxed));

Key Design Decisions:

Memory ordering: acquire for loads, release for stores ensures proper synchronization
ABA prevention: Using pointers instead of indices eliminates ABA problem in most cases
Retry loops: CAS failures trigger automatic retry without blocking other threads

Bucket Sizing Mathematics

Three-Tier System Design:

Tier 1 (16-128B):   16B steps  → 8 buckets  → Small objects
Tier 2 (128-512B):  32B steps  → 12 buckets → Medium objects  
Tier 3 (512-12KB):  128B steps → 92 buckets → Large objects
Total: 112 buckets

Size-to-Bucket Mapping:

size_t get_bucket_index(size_t size) {
    if (size <= 128)  return (size - 16 + 15) / 16;        // O(1) division
    if (size <= 512)  return 8 + (size - 128 + 31) / 32;  // Constant offset  
    return 20 + (size - 512 + 127) / 128;                 // Minimal buckets
}

This design helps to lessen the amount of internal fragmentation at smaller sizes while keeping bucket count and initial memory usage manageable.

Memory Layout & Alignment

Block Structure:

[BlockHeader: 16B][Payload: N bytes][Padding: align to 16B]
├─ size: 8B           
├─ next_free: 8B (pointer to next free block or LONE_BLOCK_POINTER)
└─ Payload aligned to max_align_t (16B)

Arena Organization:

[ArenaHeader: 24B][Block₁][Block₂]...[BlockN]
├─ size: 8B     
├─ prev: 8B     
└─ next: 8B

All arenas are page-aligned (4KB) for optimal mmap performance.

Thread Safety Implementation

Lock-Free Components:

Free list operations: Pure atomic CAS-based, no blocking
Bucket initialization: One-time atomic flag with spin-wait
Memory allocation: Fully concurrent for different size classes

Mutex-Protected Components:

Arena list management: Complex doubly-linked list operations
Large allocation tracking: Infrequent but requires consistency

Memory Ordering Guarantees:

// Load with acquire semantics
head = free_lists[bucket].load(memory_order_acquire);

// Store with release semantics  
arena_list.store(new_arena, memory_order_release);

This ensures all memory operations before a release are visible after an acquire.

Performance Engineering

Cache Optimization:

Sequential block layout: New allocations come from the same cache line
Minimal metadata: 16B header
Hot path optimization: Common allocations avoid mmap syscalls as they can be placed on the preallocated arenas

Memory Management Strategy:

Lazy expansion: Only create new arenas when current ones are exhausted
Bulk initialization: Pre-allocate entire arena's worth of blocks

Size Class Design Rationale:

Small (16-128B):  Most common, tight packing crucial
Medium (128-512B): Object-oriented allocations  
Large (512KB+):   Are commonly array allocations, less frequent

🛠️ Development

Running Tests

# All tests (unit + benchmarks)
./test_imalloc

# Unit tests only  
./test_imalloc --gtest_filter="-*Benchmark*"

# Specific test category
./test_imalloc --gtest_filter="IMallocTest.*"

Static Analysis

The project includes comprehensive static analysis using clang-tidy:

# Configure with static analysis enabled
cmake .. -DENABLE_CLANG_TIDY=ON

# Run clang-tidy analysis
make tidy

Enabled Checks:

readability-* - Code readability and style
performance-* - Performance optimization opportunities
modernize-* - Modern C++ best practices
bugprone-* - Potential bug detection
clang-analyzer-* - Deep static analysis

The codebase maintains high quality standards with zero critical static analysis warnings.

🎯 Design Goals

Achieved:

✅ Constant-time bucket lookup
✅ Thread-safe with efficient use of locks
✅ High performance compared system malloc

Technical Challenges Solved:

Lock-free data structures with ABA prevention
Memory coalescing algorithm optimization
Cache-efficient bucket organization
Cross-platform mmap abstraction

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

imalloc - High-Performance Lock-Free Memory Allocator

🚀 Performance Highlights

🎯 When to Use imalloc vs malloc

✅ imalloc Excels At:

⚠️ malloc Still Better For:

🔧 Sweet Spot Applications:

✨ Key Features

🏗️ Architecture

📦 Quick Start

Prerequisites

Build

Build with Static Analysis

Test

💻 Usage Examples

Basic Usage

Integration Example

🧪 Technical Implementation Details

Lock-Free Algorithm Design

Bucket Sizing Mathematics

Memory Layout & Alignment

Thread Safety Implementation

Performance Engineering

🛠️ Development

Running Tests

Static Analysis

🎯 Design Goals

About

Uh oh!

Releases

Packages

Languages

License

IkechiNk/imalloc

Folders and files

Latest commit

History

Repository files navigation

imalloc - High-Performance Lock-Free Memory Allocator

🚀 Performance Highlights

🎯 When to Use imalloc vs malloc

✅ imalloc Excels At:

⚠️ malloc Still Better For:

🔧 Sweet Spot Applications:

✨ Key Features

🏗️ Architecture

📦 Quick Start

Prerequisites

Build

Build with Static Analysis

Test

💻 Usage Examples

Basic Usage

Integration Example

🧪 Technical Implementation Details

Lock-Free Algorithm Design

Bucket Sizing Mathematics

Memory Layout & Alignment

Thread Safety Implementation

Performance Engineering

🛠️ Development

Running Tests

Static Analysis

🎯 Design Goals

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages