
Lightweight CPU Inference Engine for Large Language Models

Run powerful LLMs on any CPU with zero dependencies. A single-file C99 implementation that brings AI capabilities to standard hardware.

LM.C is a research project by NileAGI

Why lm.c?

Built for accessibility, efficiency, and maximum portability

Zero Dependencies

Single-file C99 implementation runs anywhere without external libraries

30+ Quantization Formats

Supports more than 30 GGML quantization formats, from F32 down to IQ1_M, for maximum efficiency

CPU Optimized

Designed specifically for CPU inference with minimal memory footprint

Portable & Lightweight

Works on any system with a C compiler - no GPU required

System Architecture

A streamlined pipeline from model loading to text generation

GGUF File Loading
Header & Metadata Parsing
Tensor Info Loading
Quantization Handling
Transformer Execution
Token Generation
Text Output

Core Components

Robust, optimized components working together seamlessly

GGUF Parser

Handles all GGUF metadata types and quantization formats with zero dependencies
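
For reference, the GGUF specification defines the following metadata value types; the constant names below follow the upstream spec and may not match the identifiers used inside lm.c:

enum gguf_metadata_value_type {
    GGUF_METADATA_VALUE_TYPE_UINT8   = 0,
    GGUF_METADATA_VALUE_TYPE_INT8    = 1,
    GGUF_METADATA_VALUE_TYPE_UINT16  = 2,
    GGUF_METADATA_VALUE_TYPE_INT16   = 3,
    GGUF_METADATA_VALUE_TYPE_UINT32  = 4,
    GGUF_METADATA_VALUE_TYPE_INT32   = 5,
    GGUF_METADATA_VALUE_TYPE_FLOAT32 = 6,
    GGUF_METADATA_VALUE_TYPE_BOOL    = 7,
    GGUF_METADATA_VALUE_TYPE_STRING  = 8,   // uint64 length followed by UTF-8 bytes
    GGUF_METADATA_VALUE_TYPE_ARRAY   = 9,   // element type, count, then packed elements
    GGUF_METADATA_VALUE_TYPE_UINT64  = 10,
    GGUF_METADATA_VALUE_TYPE_INT64   = 11,
    GGUF_METADATA_VALUE_TYPE_FLOAT64 = 12,
};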

Quantization Engine

Supports 30+ GGML quantization formats from F32 to IQ1_M
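
To make the simplest of these concrete: a Q8_0 tensor is stored as blocks of 32 signed 8-bit values that share one half-precision scale. The sketch below follows the GGML block layout; the helper names are illustrative, and lm.c's actual kernels may be organized differently.

#include <stdint.h>
#include <string.h>

#define QK8_0 32

// GGML Q8_0 block: one fp16 scale shared by 32 signed 8-bit values.
typedef struct {
    uint16_t d;           // scale, stored as IEEE 754 half precision
    int8_t   qs[QK8_0];   // quantized values
} block_q8_0;

// Minimal fp16 -> fp32 conversion (handles normals, subnormals, inf/NaN).
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                  // +/- zero
        } else {                                          // subnormal: renormalize
            uint32_t e = 113;                             // 127 - 15 + 1
            while (!(mant & 0x400u)) { mant <<= 1; e--; }
            bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13);         // inf / NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13); // normal value
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Expand n_blocks Q8_0 blocks into n_blocks * QK8_0 floats.
static void dequantize_q8_0(const block_q8_0 *blocks, float *out, size_t n_blocks) {
    for (size_t b = 0; b < n_blocks; b++) {
        float d = fp16_to_fp32(blocks[b].d);
        for (int i = 0; i < QK8_0; i++) {
            out[b * QK8_0 + i] = d * (float)blocks[b].qs[i];
        }
    }
}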

CPU Inference

Optimized transformer execution with minimal memory footprint

Portable Runtime

Single-file C99 implementation runs anywhere

How It Works

From input text to generated output - a streamlined inference workflow

Input Text
Tokenization
Embedding Lookup
Transformer Layers
Layer Norm
Attention
FFN
Residual Add
Final Norm
Output Projection
Sampling
Generated Text
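
Two of the numeric building blocks in that pipeline are small enough to show in full. The sketch below implements RMS normalization (the layer-norm variant used by Llama-family models) and the numerically stable softmax applied to attention scores and output logits; the surrounding attention and FFN code is omitted, and lm.c's implementation may differ in detail.

#include <math.h>
#include <stddef.h>

// RMS normalization: y[i] = w[i] * x[i] / sqrt(mean(x^2) + eps).
// Llama-family models use this in place of classic LayerNorm.
static void rmsnorm(float *out, const float *x, const float *w,
                    size_t n, float eps) {
    float ss = 0.0f;
    for (size_t i = 0; i < n; i++) ss += x[i] * x[i];
    const float scale = 1.0f / sqrtf(ss / (float)n + eps);
    for (size_t i = 0; i < n; i++) out[i] = w[i] * (x[i] * scale);
}

// Numerically stable in-place softmax, used for attention weights
// and for turning output logits into sampling probabilities.
static void softmax(float *x, size_t n) {
    float max = x[0];
    for (size_t i = 1; i < n; i++) if (x[i] > max) max = x[i];
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) { x[i] = expf(x[i] - max); sum += x[i]; }
    for (size_t i = 0; i < n; i++) x[i] /= sum;
}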

GGUF File Structure

Efficient storage and loading format for large language models

struct gguf_header_t {
    uint32_t magic;               // Magic bytes "GGUF"
    uint32_t version;             // Format version
    uint64_t tensor_count;        // Number of tensors
    uint64_t metadata_kv_count;   // Number of metadata key/value pairs
    gguf_metadata_kv_t metadata_kv[];   // Variable-length metadata entries
};
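
Because the metadata entries that follow the fixed fields are variable-length, the header cannot be read in a single pass over this struct. A minimal sketch of reading and validating just the fixed-size portion (little-endian host assumed, error handling trimmed):

#include <stdint.h>
#include <stdio.h>

#define GGUF_MAGIC 0x46554747u   // the bytes 'G','G','U','F' read as a little-endian uint32

// Read and validate the fixed-size portion of a GGUF header.
// Returns 0 on success, -1 on failure. The metadata key/value pairs and
// tensor infos that follow must be parsed field by field.
static int read_gguf_header(FILE *f, uint32_t *version,
                            uint64_t *tensor_count, uint64_t *metadata_kv_count) {
    uint32_t magic = 0;
    if (fread(&magic, sizeof magic, 1, f) != 1 || magic != GGUF_MAGIC) return -1;
    if (fread(version, sizeof *version, 1, f) != 1) return -1;
    if (fread(tensor_count, sizeof *tensor_count, 1, f) != 1) return -1;
    if (fread(metadata_kv_count, sizeof *metadata_kv_count, 1, f) != 1) return -1;
    return 0;
}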

Memory-Efficient Design

Optimized techniques for minimal memory footprint

GGUF Parser
Quantization
Tensor Mapping
Activation Buffers
KV Cache
Token Buffers
SIMD Registers
Thread Pools
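
Of these, the KV cache is usually the dominant runtime allocation, and its size follows directly from the model shape. The formula below is standard; the parameter names are illustrative rather than lm.c identifiers.

#include <stddef.h>

// Bytes needed for the key/value cache:
//   2 (K and V) * layers * context length * KV heads * head dim * element size.
// For example, a Llama-2-7B-shaped model (32 layers, 32 KV heads, head dim 128)
// at a 4096-token context in fp16 needs 2*32*4096*32*128*2 bytes = 2 GiB.
static size_t kv_cache_bytes(size_t n_layers, size_t n_ctx, size_t n_kv_heads,
                             size_t head_dim, size_t bytes_per_elem) {
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem;
}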

Development Roadmap

Ongoing development and planned features

GGUF File Loader: Complete with metadata extraction

Tensor Data Mapping: Memory-mapped tensor access

Quantization Kernels: All 30+ GGML formats

Transformer Layers: CPU-optimized implementation

Tokenization: Byte-pair encoding support

Sampling: Temperature-based token selection (see the sketch after this list)

SIMD Optimization: AVX2/NEON acceleration

Thread Parallelism: Multi-core support

Interactive Mode: Chat interface
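
To make the sampling item concrete: plain temperature sampling scales the logits by 1/temperature, turns them into probabilities with a softmax, and draws a token from that distribution. The sketch below shows that baseline; lm.c's sampler may add top-k or top-p filtering on top of it.

#include <math.h>
#include <stddef.h>

// Pick a token id from raw logits using temperature sampling.
// Lower temperature approaches greedy argmax; higher values flatten the
// distribution. `rng01` must be a uniform random value in [0, 1).
static size_t sample_temperature(float *logits, size_t n_vocab,
                                 float temperature, float rng01) {
    // Scale logits by 1/temperature.
    for (size_t i = 0; i < n_vocab; i++) logits[i] /= temperature;

    // Numerically stable softmax, in place.
    float max = logits[0];
    for (size_t i = 1; i < n_vocab; i++) if (logits[i] > max) max = logits[i];
    float sum = 0.0f;
    for (size_t i = 0; i < n_vocab; i++) { logits[i] = expf(logits[i] - max); sum += logits[i]; }

    // Inverse-CDF sampling over the normalized probabilities.
    float cdf = 0.0f;
    for (size_t i = 0; i < n_vocab; i++) {
        cdf += logits[i] / sum;
        if (rng01 < cdf) return i;
    }
    return n_vocab - 1;   // guard against floating-point round-off
}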

Performance Optimizations

CPU-specific enhancements for maximum efficiency

Quantization Aware Ops

Process quantized weights directly without full dequantization
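
A common way to do this is to keep weights in their quantized blocks and fold each block's scale in once, instead of expanding the whole tensor to fp32 first. The sketch below applies that idea to Q8_0 weights against fp32 activations, reusing the block_q8_0 layout and fp16_to_fp32 helper from the quantization sketch above; production kernels typically also quantize the activations and use integer SIMD instructions.

// Dot product of a row stored as Q8_0 blocks with an fp32 activation vector.
// Uses block_q8_0, QK8_0, and fp16_to_fp32 from the dequantization sketch above.
// The weights are never expanded into a temporary fp32 buffer: each block's
// contribution is accumulated first, then scaled once by its shared scale.
static float dot_q8_0_f32(const block_q8_0 *w, const float *x, size_t n_blocks) {
    float acc = 0.0f;
    for (size_t b = 0; b < n_blocks; b++) {
        float block_acc = 0.0f;
        for (int i = 0; i < QK8_0; i++) {
            block_acc += (float)w[b].qs[i] * x[b * QK8_0 + i];
        }
        acc += fp16_to_fp32(w[b].d) * block_acc;
    }
    return acc;
}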

Block Processing

Optimized cache utilization for better memory access patterns
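
As a generic illustration of the idea (not lm.c's exact loop structure), a matrix-vector product can be computed in column tiles so that each slice of the input vector stays in cache while every row consumes it:

#include <stddef.h>

#define TILE 64   // tile width, chosen so a slice of x stays resident in L1/L2

// y = W * x computed in column tiles: each TILE-wide slice of x is
// reused across all rows before the next slice is touched.
static void matvec_blocked(float *y, const float *W, const float *x,
                           size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) y[r] = 0.0f;
    for (size_t c0 = 0; c0 < cols; c0 += TILE) {
        size_t c1 = c0 + TILE < cols ? c0 + TILE : cols;
        for (size_t r = 0; r < rows; r++) {
            float acc = 0.0f;
            for (size_t c = c0; c < c1; c++) acc += W[r * cols + c] * x[c];
            y[r] += acc;
        }
    }
}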

Memory Mapping

Zero-copy weight access for reduced memory overhead
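
On POSIX systems this can be done with mmap, so tensor data is paged in from the GGUF file on demand instead of being copied into heap buffers; a Windows build would use the file-mapping APIs instead. A minimal sketch:

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map an entire GGUF file read-only. Tensor data can then be addressed
// directly at (base + data_offset) with no copies into malloc'd buffers.
static void *map_model_file(const char *path, size_t *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return NULL;

    *size_out = (size_t)st.st_size;
    return base;
}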

Thread Parallelism

Layer-wise execution across multiple CPU cores
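
Inside each layer, the large matrix-vector products are the natural place to spread work across cores, with each thread handling a contiguous range of output rows. The sketch below uses POSIX threads and plain fp32 weights for clarity; it illustrates the general pattern rather than lm.c's exact scheduler, and error handling is omitted.

#include <pthread.h>
#include <stddef.h>

typedef struct {
    const float *W;      // rows x cols weight matrix
    const float *x;      // input vector (cols)
    float       *y;      // output vector (rows)
    size_t cols, row_begin, row_end;
} matvec_job;

// Each worker computes its assigned range of output rows.
static void *matvec_worker(void *arg) {
    matvec_job *job = (matvec_job *)arg;
    for (size_t r = job->row_begin; r < job->row_end; r++) {
        float acc = 0.0f;
        for (size_t c = 0; c < job->cols; c++) acc += job->W[r * job->cols + c] * job->x[c];
        job->y[r] = acc;
    }
    return NULL;
}

// y = W * x with the rows split evenly across n_threads worker threads.
static void matvec_parallel(float *y, const float *W, const float *x,
                            size_t rows, size_t cols, size_t n_threads) {
    pthread_t  tid[16];
    matvec_job job[16];
    if (n_threads > 16) n_threads = 16;

    size_t chunk = (rows + n_threads - 1) / n_threads;
    for (size_t t = 0; t < n_threads; t++) {
        size_t begin = t * chunk;
        size_t end   = begin + chunk < rows ? begin + chunk : rows;
        job[t] = (matvec_job){ W, x, y, cols, begin, end };
        pthread_create(&tid[t], NULL, matvec_worker, &job[t]);
    }
    for (size_t t = 0; t < n_threads; t++) pthread_join(tid[t], NULL);
}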

Ready to explore lm.c?

Dive into the code, contribute to the project, or learn more about how lm.c is pushing the boundaries of accessible AI.

A research project by NileAGI