
Lightweight CPU Inference Engine for Large Language Models
Run powerful LLMs on any CPU with zero dependencies. A single-file C99 implementation that brings AI capabilities to standard hardware.
lm.c is a research project by NileAGI
Built for accessibility, efficiency, and maximum portability
Single-file C99 implementation runs anywhere without external libraries
Supports all GGML quantization formats, from F32 to IQ1_M, for maximum efficiency
Designed specifically for CPU inference with minimal memory footprint
Works on any system with a C compiler - no GPU required
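Because the entire engine lives in one C99 source file, building it should amount to a single compiler invocation. A hypothetical build line (the file name and flags are illustrative; check the repository for the documented command):

cc -std=c99 -O3 -o lm lm.c -lm

The -lm flag links the C math library; beyond that, nothing external is required.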
A streamlined pipeline from model loading to text generation
Robust, optimized components working together seamlessly
Handles all GGUF metadata types and quantization formats with zero dependencies
Supports 30+ GGML quantization formats from F32 to IQ1_M (see the Q4_0 block sketch after this list)
Optimized transformer execution with minimal memory footprint
Single-file C99 implementation runs anywhere
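To give a feel for what these formats look like, here is a sketch of GGML's Q4_0 block, one of the most common quantized layouts. Field names follow GGML's public conventions; lm.c's internal naming may differ.

#include <stdint.h>

#define QK4_0 32

// Q4_0: 32 weights packed as 4-bit values plus one fp16 scale, i.e.
// 18 bytes per block (~4.5 bits/weight) versus 128 bytes in F32.
typedef struct {
    uint16_t d;             // per-block scale, IEEE 754 half precision
    uint8_t  qs[QK4_0 / 2]; // two 4-bit weights per byte
} block_q4_0;

// Dequantization rule for element j of a block (subtracting 8 re-centers
// the unsigned nibble around zero):
//   w[j]      = d * ((qs[j] & 0x0F) - 8)   for j in [0, 16)
//   w[j + 16] = d * ((qs[j] >>   4) - 8)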
From input text to generated output - a streamlined inference workflow
Efficient storage and loading format for large language models
struct gguf_header_t {
    uint32_t magic;             // "GGUF"
    uint32_t version;           // Format version
    uint64_t tensor_count;      // Number of tensors
    uint64_t metadata_kv_count; // Number of metadata key-value pairs
    gguf_metadata_kv_t metadata_kv[];
};
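To make the layout concrete, here is a minimal sketch of reading and validating this header. The field order follows the struct above, but the function name is hypothetical, not lm.c's actual API; it also assumes a little-endian host, matching the file's byte order.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical loader sketch: read the fixed-size header fields in order.
// The variable-length metadata_kv array is parsed afterwards, key by key.
static int read_gguf_header(FILE *f) {
    uint32_t magic, version;
    uint64_t tensor_count, metadata_kv_count;
    if (fread(&magic,   sizeof magic,   1, f) != 1) return -1;
    if (memcmp(&magic, "GGUF", 4) != 0)             return -1; // validate magic bytes
    if (fread(&version, sizeof version, 1, f) != 1) return -1;
    if (fread(&tensor_count,      sizeof tensor_count,      1, f) != 1) return -1;
    if (fread(&metadata_kv_count, sizeof metadata_kv_count, 1, f) != 1) return -1;
    printf("GGUF v%u: %llu tensors, %llu metadata key-value pairs\n",
           version,
           (unsigned long long)tensor_count,
           (unsigned long long)metadata_kv_count);
    return 0;
}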
Optimized techniques for minimal memory footprint
GGUF Parser
Quantization
Tensor Mapping
Activation Buffers
KV Cache
Token Buffers
SIMD Registers
Thread Pools
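Of these buffers, the KV cache usually dominates runtime memory, since it holds the attention keys and values for every position in the context. A back-of-the-envelope sizing sketch; the parameters are generic transformer dimensions, not lm.c's variable names.

#include <stddef.h>

// Rough KV-cache size: two tensors (K and V) per layer, each holding
// n_ctx vectors of n_kv_heads * head_dim elements.
static size_t kv_cache_bytes(size_t n_layers, size_t n_ctx,
                             size_t n_kv_heads, size_t head_dim,
                             size_t elem_size) {
    return n_layers * 2 * n_ctx * n_kv_heads * head_dim * elem_size;
}

// Example: 32 layers, 4096 context, 8 KV heads, head_dim 128, fp16 (2 bytes)
// -> 32 * 2 * 4096 * 8 * 128 * 2 = 512 MiB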
Ongoing development and planned features
GGUF File Loader: Complete with metadata extraction
Tensor Data Mapping: Memory-mapped tensor access
Quantization Kernels: All 30+ GGML formats
Transformer Layers: CPU-optimized implementation
Tokenization: Byte-pair encoding support
Sampling: Temperature-based token selection (see the sketch after this list)
SIMD Optimization: AVX2/NEON acceleration
Thread Parallelism: Multi-core support
Interactive Mode: Chat interface
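As a reference for the sampling item above, here is a minimal sketch of temperature-based token selection. Function and variable names are illustrative, not lm.c's actual API, and the code assumes temp > 0.

#include <math.h>
#include <stdlib.h>

// Temperature sampling sketch: scale logits by 1/T, apply softmax, then
// draw from the resulting distribution.
static int sample_temperature(const float *logits, int n_vocab, float temp) {
    float *probs = malloc(n_vocab * sizeof *probs);
    if (!probs) return -1;

    float max_logit = logits[0];
    for (int i = 1; i < n_vocab; i++)
        if (logits[i] > max_logit) max_logit = logits[i];

    double sum = 0.0;
    for (int i = 0; i < n_vocab; i++) {
        probs[i] = expf((logits[i] - max_logit) / temp); // max subtracted for stability
        sum += probs[i];
    }

    // inverse-CDF draw from the (unnormalized) cumulative distribution
    double r = (double)rand() / RAND_MAX * sum;
    int token = n_vocab - 1;
    double acc = 0.0;
    for (int i = 0; i < n_vocab; i++) {
        acc += probs[i];
        if (acc >= r) { token = i; break; }
    }
    free(probs);
    return token;
}

Dividing the logits by the temperature before the softmax is what makes the knob work: values above 1 flatten the distribution toward uniform randomness, while values below 1 sharpen it toward the most likely token.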
CPU-specific enhancements for maximum efficiency
Process quantized weights directly, without full dequantization (see the dot-product sketch after this list)
Optimized cache utilization for better memory access patterns
Zero-copy weight access for reduced memory overhead
Layer-wise execution across multiple CPU cores
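As a sketch of what processing quantized weights in place can look like, here is a dot product that consumes GGML Q8_0 blocks directly. The layout follows GGML's Q8_0 format, but the function names are illustrative, not lm.c's actual kernel.

#include <stdint.h>
#include <string.h>

#define QK8_0 32

// GGML's Q8_0 block: one fp16 scale plus 32 signed-byte weights (34 bytes).
typedef struct {
    uint16_t d;         // per-block scale, IEEE 754 half precision
    int8_t   qs[QK8_0]; // quantized weights
} block_q8_0;

// Minimal fp16 -> fp32 conversion (subnormals flushed to zero for brevity).
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;
    uint32_t bits;
    if (exp == 0)       bits = sign;                             // zero/subnormal
    else if (exp == 31) bits = sign | 0x7F800000u | (man << 13); // inf/NaN
    else                bits = sign | ((exp + 112u) << 23) | (man << 13);
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

// Fused dequantize-and-dot: int8 weights are read in place and scaled once
// per 32-element block, so no dequantized copy of the weight matrix is
// ever materialized. n must be a multiple of QK8_0.
static float dot_q8_0_f32(const block_q8_0 *w, const float *x, int n) {
    float sum = 0.0f;
    for (int b = 0; b < n / QK8_0; b++) {
        float partial = 0.0f;
        for (int i = 0; i < QK8_0; i++)
            partial += (float)w[b].qs[i] * x[b * QK8_0 + i];
        sum += fp16_to_fp32(w[b].d) * partial; // one scale multiply per block
    }
    return sum;
}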
Dive into the code, contribute to the project, or learn more about how lm.c is pushing the boundaries of accessible AI.
A research project by NileAGI