A Step-by-Step Guide to Creating Your Own ARM Operating System

1. Introduction

Operating systems are complex pieces of software that manage hardware resources and provide services for applications. Building one from scratch is an excellent way to understand how computers work at a fundamental level. In this comprehensive guide, I’ll walk you through building MeringueOS, a simple but educational operating system for the ARM AArch64 architecture.

There are three ways you can go about building MeringueOS from this guide:

Pull/Clone the repo and analyze the full source code and components yourself. You can even run it locally and potentially extend its capabilities.
Ask an AI agent to work through this guide, build the OS piece by piece, and answer your questions as you go. (I have included troubleshooting prompts that should allow a tool like Codex, GeminiCLI, OpenCode or even Claude Code to get this done autonomously.)
Do it yourself and grind it out. I’ve tried to make this as clear as possible, so feel free to consult your favorite LLM or Google about anything that feels unclear or any knowledge gaps you may have. I will also share relevant resources and readings as we go.

There are different ways to learn, so there’s no shame in choosing whatever works for you. This topic, after all, is not child’s play.

The Basics: Operating System Architecture

An operating system is the software layer that sits between your applications and the computer’s hardware. When you run a program, it doesn’t directly control the CPU, memory, or devices - instead, it asks the OS to do these things on its behalf. The OS manages processes (running programs), allocates memory, handles files and networking, controls input/output devices, and provides the user interface you interact with. The breakdown of the architecture looks like this:

The OS we’ll be building is called MeringueOS, and it runs on the QEMU virt platform, which emulates ARM hardware. By the end of this guide, you’ll have a functional kernel with memory management, exception handling, a basic shell, and standard library functions. More importantly, you’ll understand how these components work and interact with each other.

Prerequisites

To follow along, you should have:

Basic knowledge of C programming
Some familiarity with assembly language (helpful but not required)
Understanding of computer architecture fundamentals
Linux environment (preferred) or macOS/Windows with appropriate tools

While OS development is complex, I’ll break down concepts into manageable parts and provide the necessary background information as we go.

Why Build an OS from Scratch?

Building an operating system teaches you:

How hardware and software interact at a deeper level
Memory management principles
Concurrency and synchronization
Resource allocation and scheduling
System design and architecture

Plus, there’s the satisfaction of creating something fundamental from nothing and watching it come to life. In a previous life, I did a lot of vulnerability research on ARM-based devices, which involved a lot of reverse engineering. Even though I don’t do this much anymore, deep down I always wanted to build an ARM-based OS from scratch to further my knowledge - that’s why this whole project exists.

Brief Introduction to ARM AArch64

ARM AArch64 is the 64-bit execution state of the ARMv8 architecture. Some key points:

31 general-purpose 64-bit registers (X0-X30)
Dedicated stack pointer (SP)
Exception levels (EL0-EL3) for privilege separation
Vector-based exception handling
Memory Management Unit (MMU) for virtual memory

MeringueOS runs at EL1, the kernel privilege level, roughly equivalent to ring 0 (kernel mode) on x86 systems.

Under the Hood: ARM’s Exception Levels

┌───────────────────────────────────────────────────┐
│ EL3: Secure Monitor                               │ Highest Privilege
│     (Secure world management, trusted firmware)   │
├───────────────────────────────────────────────────┤
│ EL2: Hypervisor                                   │
│     (Virtual machine management)                  │
├───────────────────────────────────────────────────┤
│ EL1: OS Kernel                                    │ ← MeringueOS runs here
│     (Privileged operations, hardware access)      │
├───────────────────────────────────────────────────┤
│ EL0: Applications                                 │ Lowest Privilege
│    (Unprivileged code, restricted hardware access)│
└───────────────────────────────────────────────────┘

2. Setting Up the Development Environment

Before coding, let’s set up a development environment.

Required Tools

You’ll need:

An ARM AArch64 cross-compiler (aarch64-linux-gnu-gcc)
QEMU for ARM emulation (qemu-system-aarch64)
Make for build automation
Git (optional, for version control)

On Debian/Ubuntu, install these tools with:

sudo apt update
sudo apt install gcc-aarch64-linux-gnu qemu-system-arm make git

On macOS with Homebrew:

brew install aarch64-elf-gcc qemu make

Troubleshooting Environment Setup

Common issues and their solutions:

Missing cross-compiler errors:

Verify installation with aarch64-linux-gnu-gcc --version
Check PATH environment variable
On some systems, the package name may be different (e.g., gcc-aarch64-linux-gnu)

QEMU command not found:

Install with package manager or from source
Verify installation with qemu-system-aarch64 --version

Project Structure

Let’s create our project directory structure:

MeringueOS/
├── build/        # Compiled object files and binaries
├── src/
│   ├── boot/     # Boot code and kernel entry
│   ├── exceptions/ # Exception handling
│   ├── include/  # Header files
│   ├── lib/      # Standard library implementations
│   ├── memory/   # Memory management
│   ├── shell/    # Shell interface
│   └── ui/       # Text User Interface (TUI)
└── test/         # Test files

Understanding the Makefile

The Makefile automates the build process. Here’s MeringueOS’s Makefile:

# Arm-OS Makefile
# Note: recipe lines must be indented with a tab character, not spaces
# Target architecture
ARCH = aarch64

# Cross compiler settings
# Note: On macOS or some systems, you may need to use aarch64-elf- instead
CROSS_COMPILE = aarch64-linux-gnu-
CC = $(CROSS_COMPILE)gcc
AS = $(CROSS_COMPILE)as
LD = $(CROSS_COMPILE)ld
OBJCOPY = $(CROSS_COMPILE)objcopy
OBJDUMP = $(CROSS_COMPILE)objdump

# Compiler flags
CFLAGS = -Wall -Wextra -ffreestanding -nostdlib -nostartfiles -mcpu=cortex-a72 -I./src/include
ASFLAGS = -mcpu=cortex-a72
LDFLAGS = -nostdlib

# Source directories
SRC_DIR = src
BOOT_DIR = $(SRC_DIR)/boot
MEMORY_DIR = $(SRC_DIR)/memory
EXCEPTIONS_DIR = $(SRC_DIR)/exceptions
UI_DIR = $(SRC_DIR)/ui
LIB_DIR = $(SRC_DIR)/lib
SHELL_DIR = $(SRC_DIR)/shell
INCLUDE_DIR = $(SRC_DIR)/include

# Build directories
BUILD_DIR = build
OBJ_DIR = $(BUILD_DIR)/obj

# Source files
ASM_SRCS = $(wildcard $(BOOT_DIR)/*.S) $(wildcard $(EXCEPTIONS_DIR)/*.S)
C_SRCS = $(wildcard $(BOOT_DIR)/*.c) \
    $(wildcard $(MEMORY_DIR)/*.c) \
    $(wildcard $(EXCEPTIONS_DIR)/*.c) \
    $(wildcard $(UI_DIR)/*.c) \
    $(wildcard $(LIB_DIR)/*.c) \
    $(wildcard $(SHELL_DIR)/*.c)

# Object files
ASM_OBJS = $(patsubst $(SRC_DIR)/%.S, $(OBJ_DIR)/%.o, $(ASM_SRCS))
C_OBJS = $(patsubst $(SRC_DIR)/%.c, $(OBJ_DIR)/%.o, $(C_SRCS))
OBJS = $(ASM_OBJS) $(C_OBJS)

# Output files
KERNEL = $(BUILD_DIR)/kernel8.elf
KERNEL_IMG = $(BUILD_DIR)/kernel8.img

# Targets
.PHONY: all clean qemu debug

all: $(KERNEL_IMG)

# Flatten the ELF into a raw binary image
$(KERNEL_IMG): $(KERNEL)
	$(OBJCOPY) -O binary $< $@

# Link all objects using the linker script
$(KERNEL): $(OBJS) src/linker.ld | $(BUILD_DIR)
	$(LD) $(LDFLAGS) -T src/linker.ld -o $@ $(OBJS)

$(OBJ_DIR)/%.o: $(SRC_DIR)/%.c | $(OBJ_DIR)
	@mkdir -p $(dir $@)
	$(CC) $(CFLAGS) -c $< -o $@

# .S files go through the compiler driver so the C preprocessor runs first
# (plain `as` does not preprocess and does not take -c)
$(OBJ_DIR)/%.o: $(SRC_DIR)/%.S | $(OBJ_DIR)
	@mkdir -p $(dir $@)
	$(CC) $(ASFLAGS) -c $< -o $@

$(BUILD_DIR) $(OBJ_DIR):
	mkdir -p $@
	mkdir -p $(OBJ_DIR)/boot
	mkdir -p $(OBJ_DIR)/memory
	mkdir -p $(OBJ_DIR)/exceptions
	mkdir -p $(OBJ_DIR)/ui
	mkdir -p $(OBJ_DIR)/lib
	mkdir -p $(OBJ_DIR)/shell

qemu: $(KERNEL_IMG)
	qemu-system-aarch64 -M virt -cpu cortex-a72 -m 128M -nographic -kernel $(KERNEL_IMG)

# -S pauses the CPU at startup, -s opens a GDB server on port 1234
debug: $(KERNEL_IMG)
	qemu-system-aarch64 -M virt -cpu cortex-a72 -m 128M -nographic -kernel $(KERNEL_IMG) -S -s

clean:
	rm -rf $(BUILD_DIR)

Design Decision: Makefile Architecture

This Makefile:

Defines compiler and tool variables
Sets compiler flags for freestanding development (-ffreestanding, -nostdlib)
Creates build directories
Compiles C and assembly files
Links the kernel
Generates a binary image
Provides targets for cleaning, running, and debugging

Alternative Approach: A single-step build would be simpler but slower for large projects as it would recompile everything each time.

Tradeoff: We chose the more modular approach for its incremental build capabilities, which saves time during development.

Testing and Debugging

The Makefile includes two targets for testing:

make qemu: Runs the OS in QEMU
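make debug: Starts QEMU paused (-S) with a GDB server listening on port 1234 (-s). QEMU does not launch GDB for you; attach from a second terminal with an AArch64-capable GDB (for example gdb-multiarch build/kernel8.elf, then target remote :1234)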

When debugging, you can use:

Breakpoints: break function_name or break file.c:line
Step execution: step (into functions) or next (over functions)
Examine memory: x/10x $address (show 10 words in hex)
Show registers: info registers

Platform-Specific Setup and Common Issues

Different operating systems and development environments can present unique challenges when building MeringueOS. This section addresses the most common platform-specific issues and their solutions.

macOS Development Setup

macOS users typically use the Homebrew package manager and have slightly different toolchain naming:

# Install ARM cross-compilation toolchain via Homebrew
brew install aarch64-elf-gcc qemu make

# Verify installation
aarch64-elf-gcc --version
qemu-system-aarch64 --version

Makefile Configuration for macOS: Update the cross-compiler setting in your Makefile:

# Cross compiler settings for macOS
CROSS_COMPILE = aarch64-elf-

Common macOS Issues:

Toolchain naming differences:

Problem: make fails with “command not found: aarch64-linux-gnu-gcc”
Solution: Change CROSS_COMPILE = aarch64-linux-gnu- to CROSS_COMPILE = aarch64-elf-
Root cause: macOS Homebrew uses different package naming conventions

Assembly file formatting:

Problem: Assembler warnings about “end of file not at end of a line”
Solution: Ensure assembly files (.S) end with a newline character

Missing timeout command:

Problem: timeout command not found when testing with QEMU
Solution: Use gtimeout (via brew install coreutils) or manual process management
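Alternative: Use backgrounding: (qemu-system-aarch64 ... &); sleep 5; pkill -f qemu-system-aarch64 (-f matches against the full command line, which is needed because the binary name is longer than the process-name length pkill compares by default on Linux)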

Linux Development Setup

Linux development typically uses distribution package managers:

# Ubuntu/Debian
sudo apt update
sudo apt install gcc-aarch64-linux-gnu qemu-system-arm make git

# Fedora/RHEL
sudo dnf install gcc-aarch64-linux-gnu qemu-system-aarch64 make git

# Arch Linux (on newer releases the QEMU package is qemu-system-aarch64 rather than qemu-arch-extra)
sudo pacman -S aarch64-linux-gnu-gcc qemu-arch-extra make git

Makefile Configuration for Linux:

# Cross compiler settings for Linux
CROSS_COMPILE = aarch64-linux-gnu-

Common Linux Issues:

Package naming variations:

Problem: Package not found errors
Ubuntu/Debian: Use gcc-aarch64-linux-gnu
Some distros: May be aarch64-linux-gnu-gcc or cross-gcc-aarch64
Solution: Check your distribution’s package repository

QEMU package differences:

Problem: qemu-system-aarch64 command not found
Solution: Install qemu-system-arm (includes AArch64) or qemu-arch-extra
Verification: which qemu-system-aarch64

Permission issues with QEMU:

Problem: QEMU fails to start with permission errors
Solution: Add user to appropriate groups: sudo usermod -a -G kvm $USER
Note: May require logout/login to take effect

Build System Troubleshooting

These issues can occur on any platform:

Missing Function Declarations:

// Add to src/include/lib/string.h (size_t comes from <stddef.h>)
char* strtok(char *str, const char *delim);
size_t strspn(const char *s, const char *accept);
char* strpbrk(const char *s, const char *accept);

// Add to src/include/lib/stdio.h
char kgetc(void);
char kgetc_blocking(void);

Linker Script Dependencies: Ensure your Makefile properly depends on the linker script:

$(KERNEL): $(OBJS) src/linker.ld | $(BUILD_DIR)
	$(LD) $(LDFLAGS) -T src/linker.ld -o $@ $(OBJS)

Compiler Warning Fixes: To suppress unused parameter warnings in shell commands:

void cmd_help(int argc, char **argv) {
    (void)argc; (void)argv; // Suppress unused warnings
    kprintf("Available commands:\n");
    // ... rest of function
}

Testing and Verification

QEMU Testing Tips:

# Basic test (exit QEMU with Ctrl+A, then X; with -nographic, Ctrl+C is passed to the guest)
make qemu

# Background testing with automatic termination
(make qemu &); sleep 10; pkill -f qemu-system-aarch64

# Debug mode (QEMU waits for GDB to attach on port 1234)
make debug

Expected Boot Output: A successful boot should show:

MeringueOS starting...
Kernel loaded at physical address: 0x0
Memory Sections: [memory layout information]
PMM: Initialization complete. Total: [X] KB, Free: [Y] KB
KHeap: Initialized.
TUI: Initialized (basic mode)
Starting shell...
Welcome to MeringueOS Shell!
meringue> [awaiting input]

Common Boot Issues:

No output from QEMU:

Check linker script entry point matches _start symbol
Verify UART initialization in early boot
Try adding debug prints with direct register access

Memory management failures:

Verify bitmap initialization and frame allocation
Check that linker symbols are properly aligned
Ensure PMM bitmap size calculation is correct
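
To make the bitmap size check concrete, here is a minimal sketch of the arithmetic, assuming the 4KB frame size used in this guide and one bit per frame. The names pmm_frame_count, pmm_bitmap_bytes and pmm_is_frame_aligned are hypothetical helpers for illustration, not functions from the MeringueOS source.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4096u  /* 4KB frames, matching the guide's frame allocator */

/* Number of whole frames in a RAM region (a partial trailing frame is ignored). */
size_t pmm_frame_count(uint64_t ram_bytes) {
    return (size_t)(ram_bytes / FRAME_SIZE);
}

/* Bytes needed for a one-bit-per-frame bitmap, rounded up to a whole byte. */
size_t pmm_bitmap_bytes(size_t frames) {
    return (frames + 7u) / 8u;
}

/* Non-zero when addr sits exactly on a frame boundary. */
int pmm_is_frame_aligned(uint64_t addr) {
    return (addr & (uint64_t)(FRAME_SIZE - 1u)) == 0;
}
```

With the 128M passed to QEMU in the Makefile, this works out to 32768 frames and a 4096-byte bitmap. Forgetting the round-up in pmm_bitmap_bytes is a classic way to lose track of the last few frames.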

Shell not responding:

Verify UART getc functions are implemented
Check that character echoing works
Ensure shell command parsing handles edge cases
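
For the command-parsing edge cases, a small tokenizer along these lines can help. This is only a sketch under my own assumptions: shell_tokenize and SHELL_MAX_ARGS are hypothetical names, and the real shell may well use strtok instead. It splits the line in place on spaces and tabs, and copes with NULL input, empty lines, and repeated or trailing blanks.

```c
#include <stddef.h>

#define SHELL_MAX_ARGS 8  /* hypothetical limit on arguments per command */

/* Split `line` in place on spaces/tabs. Stores up to max_args token
 * pointers in argv and returns the number of tokens found (argc). */
int shell_tokenize(char *line, char *argv[], int max_args) {
    int argc = 0;
    if (line == NULL || argv == NULL || max_args <= 0) {
        return 0;
    }
    char *p = line;
    while (*p != '\0' && argc < max_args) {
        while (*p == ' ' || *p == '\t') {
            p++;                      /* skip blanks before a token */
        }
        if (*p == '\0') {
            break;                    /* only trailing blanks were left */
        }
        argv[argc++] = p;             /* token starts here */
        while (*p != '\0' && *p != ' ' && *p != '\t') {
            p++;                      /* walk to the end of the token */
        }
        if (*p != '\0') {
            *p++ = '\0';              /* terminate the token in place */
        }
    }
    return argc;
}
```

Anything beyond max_args tokens is silently dropped here; whether to report that as an error is a design choice for the shell.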

Integration Best Practices

When implementing from this guide:

Start Simple: Begin with a minimal implementation and add features incrementally
Test Early: Verify each major component before proceeding to the next
Use Debug Output: Liberal use of kprintf helps track initialization progress
Handle Edge Cases: Add validation for NULL pointers and boundary conditions
Follow Conventions: Maintain consistent coding style throughout the codebase

Remember: The goal is learning, not perfection. Focus on understanding each component’s role in the larger system rather than optimizing every detail. Once you understand the basics, you can extend functionality from there; just don’t skip the fundamentals.

Now that our environment is set up, let’s dive into OS development principles.

3. Fundamentals of OS Development

Before writing code, let’s understand key OS concepts.

What is a Kernel?

The kernel is the core of an operating system, responsible for:

Process management: Creating, scheduling, and terminating processes
Memory management: Allocating and tracking memory usage
Device management: Interacting with hardware devices
System calls: Providing services to user applications

MeringueOS implements a monolithic kernel where all OS services run in kernel space (privileged mode).

An OS Under The Hood

The operating system’s job is to coordinate all the hardware components in your computer so they work together effectively. Because multiple programs may need the CPU, memory, and disk at the same time, the OS must schedule and allocate these resources fairly and efficiently. It ensures the CPU switches between programs efficiently, manages which program gets which portion of memory, queues up disk requests in an efficient order, and routes network packets to the right applications. Without this coordination, programs would conflict with each other, corrupting data or even crashing the system. The kernel is the part of the OS that carries out most of these functions.

Kernel Architecture: The Big Picture

┌───────────────────────────────────────────────────────────────┐
│                      MeringueOS Kernel                        │
│                                                               │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐   │
│  │ Boot        │  │ Memory      │  │ Exception Handling   │   │
│  │ Sequence    │──▶ Management  │──▶ (Vector Table)       │   │
│  └─────────────┘  └─────────────┘  └──────────────────────┘   │
│          │                 ▲                   │              │
│          │                 │                   │              │
│          ▼                 │                   ▼              │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐   │
│  │ Kernel      │  │ Standard    │  │ UART/Console I/O     │   │
│  │ Heap        │◀─┤ Library     │◀─┤                      │   │
│  └─────────────┘  └─────────────┘  └──────────────────────┘   │
│          │                 ▲                  │               │
│          │                 │                  │               │
│          └────────────────┐│┌─────────────────┘               │
│                           │││                                 │
│                           ▼▼▼                                 │
│                    ┌─────────────┐                            │
│                    │ Shell       │                            │
│                    │ Interface   │                            │
│                    └─────────────┘                            │
└───────────────────────────────────────────────────────────────┘

Design Decision: Monolithic vs. Microkernel

MeringueOS uses a monolithic kernel design where all OS services run in privileged mode.

Alternative: Microkernel architecture would move most services to user space, communicating via message passing.

Tradeoffs:

Monolithic: Better performance and simpler implementation, but less modular
Microkernel: Better isolation, fault tolerance, and modularity, but potential performance overhead from message passing and context switching

I chose monolithic for educational simplicity and performance, but a microkernel would be more appropriate for a system prioritizing security and reliability, or potentially for something intended for production use.

Execution Privilege Levels in ARM

ARM AArch64 has four Exception Levels (ELs), providing security isolation:

+---------------------------------------+
| EL3: Secure monitor                   | Highest privilege
+---------------------------------------+
| EL2: Hypervisor                       |
+---------------------------------------+
| EL1: OS kernel                        | <- MeringueOS runs here
+---------------------------------------+
| EL0: Applications                     | Lowest privilege
+---------------------------------------+

MeringueOS runs at EL1. The kernel can access all memory and hardware, while future user applications would run at EL0 with limited access.
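
If you want to confirm which level your code is actually running at, the CurrentEL system register reports it in bits [3:2]. The sketch below shows the decoding in plain C; current_el_from_reg and el_name are hypothetical helper names, and the register read itself is only compiled on AArch64 (and only works from EL1 or higher).

```c
#include <stdint.h>

/* CurrentEL keeps the exception level in bits [3:2]. */
unsigned current_el_from_reg(uint64_t currentel) {
    return (unsigned)((currentel >> 2) & 0x3u);
}

/* Human-readable label for an exception level (for debug output). */
const char *el_name(unsigned el) {
    switch (el) {
    case 0:  return "EL0 (applications)";
    case 1:  return "EL1 (OS kernel)";
    case 2:  return "EL2 (hypervisor)";
    case 3:  return "EL3 (secure monitor)";
    default: return "invalid";
    }
}

#if defined(__aarch64__)
/* Raw register read; only meaningful inside the kernel, not at EL0. */
static inline uint64_t read_currentel(void) {
    uint64_t value;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(value));
    return value;
}
#endif
```

In the kernel, something like kprintf("%s\n", el_name(current_el_from_reg(read_currentel()))) would print the level, assuming kprintf supports %s.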

Memory Management Concepts

Memory management involves:

Physical memory management: Tracking and allocating RAM
Virtual memory: Providing each process with its own address space
Memory protection: Preventing processes from accessing each other’s memory

MeringueOS implements physical memory management with:

A frame allocator for 4KB blocks of physical memory
A kernel heap allocator for dynamic memory allocation
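
To give a feel for the frame allocator before we get to the real one, here is a deliberately tiny bitmap allocator. It is a sketch under stated assumptions, not the MeringueOS implementation: the pool size, the base address DEMO_BASE and the demo_ function names are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE   4096u           /* 4KB frames */
#define DEMO_FRAMES  64u             /* hypothetical pool size */
#define DEMO_BASE    0x40200000ull   /* hypothetical start of managed RAM */
#define NO_FRAME     0ull            /* returned when the pool is exhausted */

/* One bit per frame: 1 = in use, 0 = free. */
static uint8_t frame_bitmap[DEMO_FRAMES / 8];

/* Find the first free frame, mark it used, and return its physical address. */
uint64_t demo_frame_alloc(void) {
    for (size_t i = 0; i < DEMO_FRAMES; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));
        if ((frame_bitmap[i / 8] & mask) == 0) {
            frame_bitmap[i / 8] |= mask;
            return DEMO_BASE + (uint64_t)i * FRAME_SIZE;
        }
    }
    return NO_FRAME;
}

/* Mark a frame free again; out-of-range or misaligned addresses are ignored. */
void demo_frame_free(uint64_t addr) {
    if (addr < DEMO_BASE || (addr % FRAME_SIZE) != 0) {
        return;
    }
    uint64_t index = (addr - DEMO_BASE) / FRAME_SIZE;
    if (index >= DEMO_FRAMES) {
        return;
    }
    frame_bitmap[index / 8] &= (uint8_t)~(1u << (index % 8));
}
```

The linear scan is slow for large pools, but it is easy to reason about, which matters more at this stage.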

Memory Management

Memory management gives each program its own private view of memory, even though they all share the same physical RAM. Physical memory consists of actual hardware addresses with fixed locations, while virtual memory provides per-program addresses that are translated to real ones behind the scenes. The OS sets up page tables describing this mapping, and the MMU consults them on every access: when a program accesses virtual address 0x1000, the data might actually live at physical address 0x40521000. This isolation prevents programs from reading or corrupting each other’s data.
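
The translation idea can be shown with a toy, single-level page table. Real AArch64 hardware uses multi-level tables walked by the MMU, so treat this purely as a sketch of the concept; toy_map, toy_translate and the 16-page table are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                    /* 4KB pages */
#define PAGE_SIZE  (1ull << PAGE_SHIFT)
#define TOY_PAGES  16u                    /* hypothetical tiny address space */
#define UNMAPPED   UINT64_MAX             /* returned for unmapped addresses */

/* Physical frame base per virtual page number; 0 means "not mapped". */
static uint64_t toy_table[TOY_PAGES];

/* Map the page containing va to the frame containing pa. */
void toy_map(uint64_t va, uint64_t pa) {
    uint64_t vpn = va >> PAGE_SHIFT;
    if (vpn < TOY_PAGES) {
        toy_table[vpn] = pa & ~(PAGE_SIZE - 1);
    }
}

/* Split va into page number and offset, then swap in the frame base. */
uint64_t toy_translate(uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;
    uint64_t offset = va & (PAGE_SIZE - 1);
    if (vpn >= TOY_PAGES || toy_table[vpn] == 0) {
        return UNMAPPED;
    }
    return toy_table[vpn] | offset;
}
```

After toy_map(0x1000, 0x40521000), an access to 0x1abc lands at 0x40521abc: the low 12 bits pass through untouched and only the page number is replaced.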

Exception Handling Basics

Exceptions are events that require special handling, including:

Synchronous exceptions: Instruction-related (e.g., system calls, errors)
Asynchronous exceptions: External events (e.g., interrupts)

ARM AArch64 uses a vector table with fixed offsets for different exception types. When an exception occurs, the CPU:

Saves some state (the return address in ELR_ELx and PSTATE in SPSR_ELx)
Jumps to the appropriate vector entry
Executes the handler code

MeringueOS implements a full vector table and handlers for all exception types.
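
The "fixed offsets" are regular enough to compute. The AArch64 table has 16 entries of 0x80 bytes each: four groups (where the exception came from) of four types. Here is a sketch of that layout, with hypothetical enum and function names, plus the 2KB alignment check that VBAR_EL1 requires.

```c
#include <stdint.h>

/* Where the exception was taken from (selects a 0x200-byte group). */
enum vec_source {
    VEC_CUR_SP0   = 0,  /* current EL, using SP_EL0 */
    VEC_CUR_SPX   = 1,  /* current EL, using SP_ELx */
    VEC_LOWER_A64 = 2,  /* lower EL running AArch64 */
    VEC_LOWER_A32 = 3   /* lower EL running AArch32 */
};

/* Kind of exception (selects a 0x80-byte slot inside the group). */
enum vec_type {
    VEC_SYNC   = 0,
    VEC_IRQ    = 1,
    VEC_FIQ    = 2,
    VEC_SERROR = 3
};

/* Byte offset of a vector entry relative to the table base in VBAR_EL1. */
uint32_t vector_offset(enum vec_source source, enum vec_type type) {
    return (uint32_t)source * 0x200u + (uint32_t)type * 0x80u;
}

/* The table base must be 2KB aligned (low 11 bits zero). */
int vbar_is_aligned(uint64_t vbar) {
    return (vbar & 0x7FFull) == 0;
}
```

A kernel running at EL1 on its own stack takes its own faults through the VEC_CUR_SPX group, so an IRQ arriving while the shell runs enters at offset 0x280.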

How Exception Handling Works

Exception handling is the CPU’s way of dealing with unexpected events and system calls. When something unusual happens - like executing an undefined instruction, accessing invalid memory, or a program requesting OS services - the CPU immediately stops what it’s doing and jumps to a predetermined handler. (Unlike x86, AArch64 integer division by zero does not raise an exception; the result is simply 0.) The exception vector table acts like a dispatch system: on AArch64 it holds a small block of handler code, rather than just an address, for each type of exception, covering synchronous ones (errors, system calls) and asynchronous ones (hardware interrupts). Each exception type gets routed to its specific handler, which saves the current program state, handles the issue, then resumes execution where it left off.
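
Once a synchronous exception reaches its handler, the handler still has to find out why it happened. On AArch64 that information is in ESR_EL1, whose top six bits (the exception class, EC) name the cause. The sketch below decodes a few common classes; the function names are hypothetical and the list of classes is far from complete.

```c
#include <stdint.h>

/* Exception class lives in ESR_ELx bits [31:26]. */
unsigned esr_ec(uint64_t esr) {
    return (unsigned)((esr >> 26) & 0x3Fu);
}

/* For an SVC, the immediate from the instruction is in ISS bits [15:0]. */
unsigned esr_svc_imm(uint64_t esr) {
    return (unsigned)(esr & 0xFFFFu);
}

/* Names for a handful of exception classes (many more exist). */
const char *esr_ec_name(unsigned ec) {
    switch (ec) {
    case 0x00: return "unknown reason";
    case 0x15: return "SVC from AArch64 (system call)";
    case 0x20: return "instruction abort from lower EL";
    case 0x21: return "instruction abort from same EL";
    case 0x24: return "data abort from lower EL";
    case 0x25: return "data abort from same EL";
    default:   return "other";
    }
}
```

A synchronous handler would typically switch on esr_ec(esr) and fall back to printing the raw value for classes it does not recognize.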

The Boot Sequence

The boot process for MeringueOS follows these steps:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ CPU Reset   │     │ Boot.S      │     │ Clear BSS   │
│ EL2 or EL3  │────▶│ Setup Stack │────▶│ Section     │
│ PC=Reset    │     │ Check Core  │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
       │                                       │
       │                                       ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Initialize  │     │ Set up      │     │ Transition  │
│ Subsystems  │◀─── │ Exceptions  │◀─── │ to EL1      │
│ Start Shell │     │ Vector Table│     │             │
└─────────────┘     └─────────────┘     └─────────────┘

Now let’s look at the actual implementation.

4. The Boot Process

The boot process is the first step in bringing our OS to life.

Additional context about the Boot Process

When a computer is powered on, the CPU starts executing from a fixed address with most features disabled - no memory management, no interrupts, just raw instruction execution. The bootloader’s job is to gradually bring the system to a usable state: first setting up a stack so functions can be called, then preparing RAM by zeroing uninitialized sections and copying data segments to their runtime locations, configuring exception handlers so the CPU knows where to jump when errors occur, and finally enabling hardware features like the MMU and interrupt controller. Only after this foundational setup can the kernel initialize higher-level services like device drivers, file systems, and eventually start the first user process.

Assembly Entry Point

The entry point is in boot.S. Let’s examine it:

/* AArch64 boot code for QEMU virt machine */
.section ".text.boot"

.global _start
_start:
    // Check processor ID is 0 (primary core)
    mrs     x0, mpidr_el1
    and     x0, x0, #0xFF          // Aff0: core number
    cbz     x0, primary_core
    // Secondary cores loop forever
1:  wfe
    b       1b

primary_core:
    // Set stack pointer to _stack_top
    ldr     x0, =_stack_top
    mov     sp, x0

    // Copy .rodata section from load address to execution address
    ldr     x0, =_rodata_start     // Destination address
    ldr     x1, =_rodata_load      // Source address
    ldr     x2, =_rodata_end
    sub     x2, x2, x0             // Size to copy
    bl      copy_data_section

    // Copy .data section from load address to execution address
    ldr     x0, =_data_start       // Destination address
    ldr     x1, =_data_load        // Source address
    ldr     x2, =_data_end
    sub     x2, x2, x0             // Size to copy
    bl      copy_data_section

    // Clear BSS, 8 bytes per iteration
    // (assumes _bss_start and _bss_end are 8-byte aligned)
    ldr     x0, =_bss_start
    ldr     x1, =_bss_end
    sub     x1, x1, x0
    cbz     x1, skip_bss_clear
clear_bss_loop:
    str     xzr, [x0], #8
    subs    x1, x1, #8
    b.gt    clear_bss_loop         // Also terminates if the size is not a multiple of 8
skip_bss_clear:

    // Set up exception vector table
    ldr     x0, =_exception_vector_table
    msr     vbar_el1, x0

    // Set up EL1 (kernel mode)
    mrs     x0, CurrentEL
    lsr     x0, x0, #2             // Exception level is in bits [3:2]
    cmp     x0, #1
    beq     already_in_el1

    // If we're in EL2, configure EL1 and drop to it
    // (entry at EL3 is not handled by this code)
    /* Disable EL1 timer traps */
    mov     x0, #0x3       // CNTHCTL_EL2.EL1PCTEN | CNTHCTL_EL2.EL1PCEN
    msr     cnthctl_el2, x0
    msr     cntvoff_el2, xzr

    /* Set EL1 execution state to AArch64 */
    mov     x0, #(1 << 31)      // HCR_EL2.RW: EL1 is AArch64
    orr     x0, x0, #(1 << 1)   // SWIO hardwired
    msr     hcr_el2, x0

    /* Configure EL1 */
    // Set up SPSR_EL2 for the transition to EL1
    mov     x0, #0x3c5         // DAIF masked + EL1h (SPSel = 1)
    msr     spsr_el2, x0

    /* Set EL1 entry point and switch */
    ldr     x0, =already_in_el1
    msr     elr_el2, x0
    eret

already_in_el1:
    // Enable floating point
    mov     x0, #0x00300000     // CPACR_EL1.FPEN bits [21:20]
    msr     cpacr_el1, x0
    isb

    // Jump to C code, passing boot info pointer
    mov     x0, #0              // For now, pass null pointer as boot info
    bl      kernel_main         // Call our C entry point

    // If kernel_main returns, halt the CPU
1:  wfe
    b       1b

//------------------------------------------------------------------
// Helper function to copy data sections
// x0 = destination address
// x1 = source address
// x2 = size in bytes
//------------------------------------------------------------------
copy_data_section:
    // Return immediately if source and destination are the same
    cmp     x0, x1
    beq     copy_done

    // Return if size is zero
    cbz     x2, copy_done

    // Add debugging output
    // Preserve registers used by the output routine
    stp     x0, x1, [sp, #-16]!
    stp     x2, x30, [sp, #-16]!

    // Call C debug function to print details about the copy.
    // x0 = dest, x1 = src, x2 = size already match
    // boot_debug_copy(dest, src, size), so no argument shuffling is needed.
    bl      boot_debug_copy

    // Restore registers
    ldp     x2, x30, [sp], #16
    ldp     x0, x1, [sp], #16

    // Perform the copy, byte by byte for safety
copy_byte_loop:
    ldrb    w4, [x1], #1     // Load a byte
    strb    w4, [x0], #1     // Store a byte
    sub     x2, x2, #1       // Decrement size
    cbnz    x2, copy_byte_loop

copy_done:
    ret

This assembly code:

Checks if we’re on the primary CPU core
Sets up the stack pointer
Copies initialized data sections
Clears the BSS section (uninitialized global variables)
Sets up the exception vector table
Switches from EL2 to EL1 if needed
Enables floating-point and SIMD instructions
Jumps to kernel_main in C
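
If the assembly loops feel opaque, it may help to see the same two steps (copying a section and zeroing BSS) written in C. This is only an illustration of what the loops in boot.S do, with hypothetical function names; the real boot code does this in assembly before any C code runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `size` bytes from src to dst, skipping the work when the section
 * is already in place or empty (mirrors copy_data_section in boot.S). */
void boot_copy_section(uint8_t *dst, const uint8_t *src, size_t size) {
    if (dst == src || size == 0) {
        return;
    }
    for (size_t i = 0; i < size; i++) {
        dst[i] = src[i];
    }
}

/* Zero every byte in [start, end), which is what C expects of .bss. */
void boot_zero_range(uint8_t *start, uint8_t *end) {
    for (uint8_t *p = start; p < end; p++) {
        *p = 0;
    }
}
```

In the kernel, the arguments would come from the linker symbols (for example _data_start, _data_load and _data_end), just as they do in boot.S.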

Under the Hood: CPU State at Boot

When an ARM processor powers up:

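It starts in the highest implemented exception level (EL3 or EL2 on most real hardware). QEMU’s virt machine, when booted with -kernel and without virtualization=on or secure=on, enters the kernel at EL1, which is why boot.S checks CurrentEL before attempting the drop from EL2
Most features are disabled (caches, MMU, etc.)
The program counter is set to a predetermined value
Often only a single core is running at first; boot.S still parks any secondary cores in case they are released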

Setting Up the Stack

The stack is essential for C function calls. We set it to the address of _stack_top, which is defined in our linker script. The stack grows downward from that address, providing space for function calls and local variables.

State Transitions During Boot

┌───────────────────┐      ┌───────────────────┐      ┌───────────────────┐
│ CPU Reset State   │      │ Initial Setup     │      │ Stack & BSS Setup │
│ PC = Reset Vector │─────▶│ Check Core ID     │─────▶│ Stack initialized │
│ EL = 2 or 3       │      │Disable secondaries│      │ BSS section zeroed│
└───────────────────┘      └───────────────────┘      └────────┬──────────┘
                                                               │
                                                               ▼
┌───────────────────┐      ┌───────────────────┐      ┌───────────────────┐
│ C Environment     │      │ Transition to EL1 │      │ Exception Setup   │
│Jump to kernel_main│◀──── │ Configure SPSRs   │◀──── │Set up vector table│
│ Start OS services │      │ Set return addr   │      │ Enable features   │
└───────────────────┘      └───────────────────┘      └───────────────────┘

Debugging the Boot Process

Common boot issues and how to diagnose them:

No output from QEMU:

Check linker script entry point matches _start symbol
Verify the kernel start address in the linker script (0x40100000 for our setup)
Add early debug prints before UART initialization using direct register access

Crash during initialization:

Add debug prints between major steps
Check that BSS/data section handling matches linker script
Verify exception vector alignment (must be 2KB aligned)

Issues transitioning to EL1:

Verify SPSR_EL2 configuration is correct (check the ARM documentation)
Check that return address (ELR_EL2) is properly set
Ensure stack pointer is valid before transition
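
When checking the SPSR_EL2 configuration, it helps to see where the magic number 0x3c5 in boot.S comes from. The sketch below rebuilds it from its fields; the macro and function names are my own, but the bit positions are the architectural ones (D, A, I, F mask bits at 9 to 6, and the mode field in the low bits).

```c
#include <stdint.h>

#define SPSR_D       (1u << 9)   /* mask debug exceptions */
#define SPSR_A       (1u << 8)   /* mask SError */
#define SPSR_I       (1u << 7)   /* mask IRQ */
#define SPSR_F       (1u << 6)   /* mask FIQ */
#define SPSR_M_EL0T  0x0u        /* return to EL0 using SP_EL0 */
#define SPSR_M_EL1T  0x4u        /* return to EL1 using SP_EL0 */
#define SPSR_M_EL1H  0x5u        /* return to EL1 using SP_EL1 */

#define HCR_EL2_RW   (1ull << 31) /* EL1 executes in AArch64 state */

/* Value boot.S writes to SPSR_EL2: all interrupts masked, target EL1h. */
uint32_t spsr_for_el1h_masked(void) {
    return SPSR_D | SPSR_A | SPSR_I | SPSR_F | SPSR_M_EL1H;
}

/* Target exception level encoded in M[3:2]. */
unsigned spsr_target_el(uint32_t spsr) {
    return (spsr >> 2) & 0x3u;
}

/* M[0] = 1 means the target level uses its own stack pointer (SP_ELx). */
int spsr_uses_sp_elx(uint32_t spsr) {
    return (int)(spsr & 0x1u);
}
```

If eret lands somewhere unexpected, decoding the value you wrote with spsr_target_el is a quick sanity check before digging through the documentation.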

Transition from Assembly to C

Once the basic hardware initialization is done, we jump to kernel_main() to continue in C. This is defined in kernel.c:

#include <stddef.h>
#include <stdint.h>
#include <kernel.h>
#include <lib/stdio.h>

// External functions we'll implement later
extern void frame_alloc_init(const KERNEL_BOOT_PARAMS *params);
extern void kheap_init(void);
extern int tui_init(void);
extern void shell_loop(void);

// Note: the linker symbols used below (_kernel_start, _text_end, ...)
// are expected to be declared in kernel.h

// Debug function called from boot code before UART is initialized
// We need a direct hardware access version
static void early_debug_print(const char *str) {
    // UART direct access (PL011 on the QEMU virt machine)
    volatile uint32_t *uart_dr = (volatile uint32_t*)0x09000000; // Data register
    volatile uint32_t *uart_fr = (volatile uint32_t*)0x09000018; // Flag register

    while (*str) {
        // Wait for FIFO to have space
        while ((*uart_fr) & (1 << 5)); // TXFF bit

        // Send character
        *uart_dr = (uint32_t)*str++;
    }
}

// Debug function for section copying, called from assembly
void boot_debug_copy(void *dest, void *src, size_t size) {
    // Only the size is printed; silence -Wextra unused-parameter warnings
    (void)dest;
    (void)src;

    // Print directly using early UART access
    early_debug_print("[BOOT] Copying section: ");

    // Convert size to string manually (very simple)
    char size_str[16];
    int i = 0;
    int tmp = (int)size;

    // Handle special case of zero
    if (tmp == 0) {
        size_str[0] = '0';
        size_str[1] = '\0';
    } else {
        // Convert integer to string (backwards)
        while (tmp > 0) {
            size_str[i++] = '0' + (tmp % 10);
            tmp /= 10;
        }
        size_str[i] = '\0';

        // Reverse the string
        for (int j = 0; j < i/2; j++) {
            char c = size_str[j];
            size_str[j] = size_str[i-j-1];
            size_str[i-j-1] = c;
        }
    }

    early_debug_print(size_str);
    early_debug_print(" bytes\n");
}

// Kernel entry point
void kernel_main(KERNEL_BOOT_PARAMS *params) {
    // Early initialization - placeholder for UART setup
    // We'll assume kprintf writes to a UART for now

    kprintf("MeringueOS starting...\n");
    // Cast so the argument always matches the %llx format
    kprintf("Kernel loaded at physical address: 0x%llx\n",
            (unsigned long long)(params ? params->kernel_phys_start : 0));

    // Debug section information
    kprintf("Memory Sections:\n");
    kprintf("  .text:   %p to %p\n", &_kernel_start, &_text_end);
    kprintf("  .rodata: %p to %p (load: %p)\n", &_rodata_start, &_rodata_end, &_rodata_load);
    kprintf("  .data:   %p to %p (load: %p)\n", &_data_start, &_data_end, &_data_load);
    kprintf("  .bss:    %p to %p\n", &_bss_start, &_bss_end);

    // Initialize memory management subsystem
    kprintf("Initializing Physical Memory Manager...\n");
    frame_alloc_init(params);

    kprintf("Initializing Kernel Heap Allocator...\n");
    kheap_init();

    // Initialize Text User Interface
    kprintf("Initializing TUI subsystem...\n");
    if (tui_init() != 0) {
        kprintf("Failed to initialize TUI subsystem!\n");
        // Continue without TUI for now
    }

    // Enter the shell
    kprintf("Starting shell...\n");
    shell_loop();

    // If shell returns, halt
    kprintf("Kernel halting.\n");
    while(1) {
        // Sleep until the next interrupt; effectively a halt
        __asm__ volatile("wfi");
    }
}

This C code initializes each subsystem in turn:

Prints debug information about memory sections
Initializes the physical memory manager
Sets up the kernel heap
Initializes the text UI
Finally, starts the shell, which takes over
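
The manual decimal conversion inside boot_debug_copy can also be pulled into a small helper, which makes it easy to test on a host machine before it runs on bare metal. The sketch below is one possible way to do that; the name u64_to_dec and its buffer-size contract are assumptions for illustration and are not part of the MeringueOS source.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not in the MeringueOS source): writes the decimal
// form of `value` into `buf` and returns the number of digits written.
// `buf` must hold at least 21 bytes (20 digits for UINT64_MAX plus NUL).
static size_t u64_to_dec(uint64_t value, char *buf) {
    size_t len = 0;

    // Emit digits least-significant first; do-while also handles value == 0
    do {
        buf[len++] = (char)('0' + (value % 10));
        value /= 10;
    } while (value > 0);
    buf[len] = '\0';

    // Reverse in place so the most significant digit comes first
    for (size_t i = 0; i < len / 2; i++) {
        char c = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = c;
    }
    return len;
}
```

Because the helper has no dependencies beyond the two headers, the same function could be compiled into both the kernel and a host-side unit test.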

Memory Layout

The memory layout is defined in the linker script linker.ld:

/* Linker script for Arm-OS */

/* QEMU virt machine memory layout */
ENTRY(_start)

SECTIONS
{
    /* Kernel starts at 0x40100000 for QEMU virt machine */
    . = 0x40100000;
    _kernel_start = .;

    .text : ALIGN(8)
    {
        *(.text.boot)  /* Boot code first */
        *(.text)       /* All other code */
        *(.text.*)     /* Including subsections */
    }
    _text_end = .;

    .rodata : ALIGN(8)
    {
        _rodata_start = .;
        *(.rodata)
        *(.rodata.*)
        _rodata_end = .;
    }

    .data : ALIGN(8)
    {
        _data_start = .;
        *(.data)
        *(.data.*)
        _data_end = .;
    }

    /* Add symbols for load addresses (where sections are actually loaded by QEMU) */
    _rodata_load = LOADADDR(.rodata);
    _data_load = LOADADDR(.data);

    .bss : ALIGN(8)
    {
        _bss_start = .;
        *(.bss)
        *(.bss.*)
        *(COMMON)
        . = ALIGN(8);
        _bss_end = .;
    }

    .eh_frame :
    {
        *(.eh_frame)
    }

    . = ALIGN(8);

    /* Reserve space for the stack */
    . = ALIGN(16);
    _stack_bottom = .;
    . += 0x10000; /* 64 KB for stack */
    . = ALIGN(16);
    _stack_top = .;

    /* Reserve space for PMM bitmap */
    . = ALIGN(4096);
    _pmm_bitmap_start = .;
    . += 0x20000; /* 128 KB for PMM bitmap (can manage 4GB of RAM) */
    _pmm_bitmap_end = .;

    _kernel_end = .;
}

This script:

Sets the entry point to _start
Places the kernel at physical address 0x40100000
Organizes sections: text (code), rodata (constants), data, and bss
Reserves space for the stack and the PMM bitmap
Provides symbols for section boundaries
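
Boundary symbols such as _bss_start and _bss_end carry no storage of their own; only their addresses are meaningful, which is why the kernel code takes &_bss_start rather than reading the symbol's value. As a rough sketch of how boot code might use such a pair, here is a helper that clears a byte range. The zero_range name is hypothetical, and the actual MeringueOS boot path may clear .bss differently (for example, in assembly).

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: zero every byte in the half-open range [start, end).
// Written as a plain loop so it does not rely on memset being set up
// this early in boot.
static void zero_range(uint8_t *start, uint8_t *end) {
    for (uint8_t *p = start; p < end; p++) {
        *p = 0;
    }
}

// In the kernel, a call might look like this. The extern declarations are
// an assumption about how kernel.h exposes the linker symbols:
//
//     extern uint8_t _bss_start, _bss_end;
//     zero_range(&_bss_start, &_bss_end);
```

The half-open range convention matches how the linker script places the end symbol one byte past the last byte of the section.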

Design Decision: Static vs. Dynamic Section Loading

My implementation uses static, fixed addresses defined in the linker script.

Alternative: A more dynamic approach could parse an ELF header to locate sections.

Tradeoffs:

Static: Simpler implementation, fixed memory layout, easier to debug
Dynamic: More flexible, supports loading arbitrary binaries, but more complex

I chose static loading for educational clarity and deterministic behavior (which makes debugging easier), though production OSes typically use dynamic approaches for flexibility.
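
To give a feel for what the dynamic alternative involves, the sketch below validates the fixed-size header at the start of an ELF64 image and extracts the entry point. It only illustrates the first step of a loader; the function name is hypothetical, nothing like it exists in MeringueOS, and a real loader would go on to walk the program headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_CLASS_64    2    // e_ident[4] value for 64-bit objects
#define ELF_DATA_LSB    1    // e_ident[5] value for little-endian data
#define ELF_EM_AARCH64  183  // e_machine value for AArch64

// Read little-endian fields byte by byte, without assuming alignment
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t read_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

// Hypothetical loader check: returns true and stores the entry point if
// `image` starts with a little-endian ELF64 header for AArch64.
static bool elf64_aarch64_entry(const uint8_t *image, size_t len, uint64_t *entry) {
    if (len < 64) return false;                          // ELF64 header is 64 bytes
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F') return false; // magic number
    if (image[4] != ELF_CLASS_64 || image[5] != ELF_DATA_LSB) return false;
    if (read_le16(image + 18) != ELF_EM_AARCH64) return false; // e_machine
    *entry = read_le64(image + 24);                      // e_entry
    return true;
}
```

Even this small check hints at the extra validation a dynamic loader needs, which is part of why the static layout is easier to reason about in a teaching kernel.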

The memory layout looks like:

    _kernel_start → +-------------------+ 0x40100000
                    | .text (code)      |
                    |                   |
                    +-------------------+
                    | .rodata           |
                    | (constants)       |
                    +-------------------+
                    | .data             |
                    | (initialized vars)|
                    +-------------------+
                    | .bss              |
                    | (zeroed at boot)  |
                    +-------------------+
                    | .eh_frame         |
    _stack_bottom → +-------------------+
                    | Stack (64KB)      |
                    | (grows downward)  |
       _stack_top → +-------------------+
                    | PMM bitmap (128KB)|
                    | (memory tracking) |
      _kernel_end → +-------------------+
                    |                   |
                    | Available memory  |
                    |                   |
                    +-------------------+

Now that our kernel is booting, let’s implement memory management.

5. Memory Management

Memory management is crucial for any OS. MeringueOS implements two levels:

Physical frame allocation
Kernel heap for dynamic memory

Physical Memory Management

The physical memory manager (PMM) tracks 4KB blocks of RAM called frames. It uses a bitmap where each bit represents a frame.
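
The arithmetic behind this scheme is small enough to check by hand. The following sketch, using the same PAGE_SIZE and RAM base as the rest of this guide, shows how an address maps to a frame index and how large the bitmap has to be; the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    4096ULL
#define PMM_RAM_BASE 0x40000000ULL

// Frame index for a physical address inside managed RAM
static uint64_t addr_to_frame(uint64_t addr) {
    return (addr - PMM_RAM_BASE) / PAGE_SIZE;
}

// Physical base address of a frame
static uint64_t frame_to_addr(uint64_t frame) {
    return PMM_RAM_BASE + frame * PAGE_SIZE;
}

// Bytes of bitmap needed to track `ram_bytes` of RAM (one bit per frame)
static uint64_t bitmap_bytes_for(uint64_t ram_bytes) {
    uint64_t frames = ram_bytes / PAGE_SIZE;
    return (frames + 7) / 8; // round up to whole bytes
}
```

For example, the kernel's load address 0x40100000 falls in frame 256, which is bit 0 of bitmap byte 32. Tracking 1GB of RAM needs 32KB of bitmap, so the 128KB reserved in the linker script is enough for 4GB.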

Let’s look at the implementation in frame_alloc.h:

#ifndef FRAME_ALLOC_H
#define FRAME_ALLOC_H

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "kernel.h"

// Define page size (typically 4KB for AArch64)
#define PAGE_SIZE 4096
#define PAGE_SHIFT 12

// Define the RAM region for QEMU virt machine
#define PMM_RAM_BASE 0x40000000

// Note: Linker symbols are now included from kernel.h

// Initialize the physical memory manager
void frame_alloc_init(const KERNEL_BOOT_PARAMS *params);

// Allocate a physical frame, returns NULL if no free frames
void* alloc_frame(void);

// Free a previously allocated physical frame
void free_frame(void *frame);

// Get information about memory
uint64_t pmm_get_total_memory(void);
uint64_t pmm_get_free_memory(void);
uint64_t pmm_get_highest_usable_address(void);

#endif // FRAME_ALLOC_H

And the implementation in frame_alloc.c:

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "memory/frame_alloc.h"
#include "lib/string.h"
#include "lib/stdio.h"

// For QEMU virt, RAM often starts at 0x40000000 and can be e.g., 128MB or more.
// Let's assume a max manageable physical address space, e.g., 1GB beyond RAM start.
// Adjust this based on actual QEMU configuration or dynamic detection.
#define PMM_MANAGEABLE_SIZE (1024ULL * 1024ULL * 1024ULL) // Manage up to 1GB
#define PMM_MAX_ADDRESS (PMM_RAM_BASE + PMM_MANAGEABLE_SIZE)
#define PMM_TOTAL_FRAMES (PMM_MANAGEABLE_SIZE / PAGE_SIZE)

// Bitmap for tracking frame usage. Each bit represents one frame.
// Size = Total Frames / 8 bits per byte
static uint8_t *frame_bitmap = &_pmm_bitmap_start;

static uint64_t total_memory = 0;
static uint64_t free_memory = 0;
static uint64_t highest_usable_address = 0;

// Helper function to set a bit in the bitmap
static void set_bit(size_t bit) {
    frame_bitmap[bit / 8] |= (1 << (bit % 8));
}

// Helper function to clear a bit in the bitmap
static void clear_bit(size_t bit) {
    frame_bitmap[bit / 8] &= ~(1 << (bit % 8));
}

// Helper function to test a bit in the bitmap
static bool test_bit(size_t bit) {
    return (frame_bitmap[bit / 8] & (1 << (bit % 8))) != 0;
}

// Mark a range of frames as used
static void mark_range_used(uint64_t base_addr, uint64_t size) {
    uint64_t end_addr = base_addr + size;

    // Nothing to do for empty ranges or ranges entirely below managed RAM
    if (size == 0 || end_addr <= PMM_RAM_BASE) return;

    // Clamp a start below the RAM base to frame 0 so the overlapping part is still marked
    uint64_t start_frame = (base_addr >= PMM_RAM_BASE) ?
                          (base_addr - PMM_RAM_BASE) / PAGE_SIZE :
                          0;
    uint64_t end_frame = (end_addr - 1 - PMM_RAM_BASE) / PAGE_SIZE;

    if (start_frame >= PMM_TOTAL_FRAMES) return; // Start address out of managed range
    if (end_frame >= PMM_TOTAL_FRAMES) end_frame = PMM_TOTAL_FRAMES - 1; // Cap end address

    kprintf("PMM: Marking used 0x%llx - 0x%llx (Frames %llu - %llu)\n",
            base_addr, base_addr + size, start_frame, end_frame);

    for (uint64_t i = start_frame; i <= end_frame; ++i) {
        if (!test_bit(i)) {
            set_bit(i);
            // Assuming these were initially counted as free, decrement count
            if (total_memory >= PAGE_SIZE) total_memory -= PAGE_SIZE;
            if (free_memory >= PAGE_SIZE) free_memory -= PAGE_SIZE;
        }
    }
}

// Mark a range of frames as free
static void mark_range_free(uint64_t base_addr, uint64_t size) {
    uint64_t end_addr = base_addr + size;

    // Nothing to do for empty ranges or ranges entirely below managed RAM
    if (size == 0 || end_addr <= PMM_RAM_BASE) return;

    // Clamp a start below the RAM base to frame 0 so the overlapping part is still handled
    uint64_t start_frame = (base_addr >= PMM_RAM_BASE) ?
                          (base_addr - PMM_RAM_BASE) / PAGE_SIZE :
                          0;
    uint64_t end_frame = (end_addr - 1 - PMM_RAM_BASE) / PAGE_SIZE;

    if (start_frame >= PMM_TOTAL_FRAMES) return; // Start address out of managed range
    if (end_frame >= PMM_TOTAL_FRAMES) end_frame = PMM_TOTAL_FRAMES - 1; // Cap end address

    kprintf("PMM: Marking free 0x%llx - 0x%llx (Frames %llu - %llu)\n",
            base_addr, base_addr + size, start_frame, end_frame);

    for (uint64_t i = start_frame; i <= end_frame; ++i) {
        if (test_bit(i)) { // Only count if it wasn't already free
            total_memory += PAGE_SIZE;
            free_memory += PAGE_SIZE;
            if ((PMM_RAM_BASE + (i + 1) * PAGE_SIZE) > highest_usable_address) {
                highest_usable_address = PMM_RAM_BASE + (i + 1) * PAGE_SIZE;
            }
        }
        clear_bit(i); // Mark as free regardless
    }
}

void frame_alloc_init(const KERNEL_BOOT_PARAMS *params) {
    kprintf("PMM: Initializing Physical Memory Manager...\n");

    // Calculate the bitmap size
    size_t bitmap_size = (&_pmm_bitmap_end - &_pmm_bitmap_start);
    kprintf("PMM: Bitmap size: %zu bytes, located at %p\n",
            bitmap_size, frame_bitmap);

    // Initially, mark all manageable frames as used
    memset(frame_bitmap, 0xFF, bitmap_size);
    total_memory = 0;
    free_memory = 0;
    highest_usable_address = PMM_RAM_BASE;

    if (params) {
        kprintf("PMM: Kernel Physical Range: 0x%llx - 0x%llx\n",
                params->kernel_phys_start, params->kernel_phys_end);

        // For now, we'll simplify and just reserve the kernel address range
        // In a full implementation, we would parse the UEFI memory map from params

        // Mark all memory as free initially (simplified approach)
        mark_range_free(PMM_RAM_BASE, PMM_MANAGEABLE_SIZE);

        // Then mark kernel memory as used
        mark_range_used(params->kernel_phys_start,
                       params->kernel_phys_end - params->kernel_phys_start);
    } else {
        // No boot parameters, use linker-provided kernel boundaries
        uint64_t kernel_start = (uint64_t)&_kernel_start;
        uint64_t kernel_end = (uint64_t)&_kernel_end;

        kprintf("PMM: Kernel boundaries from linker: 0x%llx - 0x%llx\n",
                kernel_start, kernel_end);

        // Mark all memory as free initially
        mark_range_free(PMM_RAM_BASE, PMM_MANAGEABLE_SIZE);

        // Then mark kernel memory as used
        mark_range_used(kernel_start, kernel_end - kernel_start);
    }

    // Mark the bitmap itself as used (it lies within kernel memory, but just to be explicit)
    mark_range_used((uint64_t)frame_bitmap, bitmap_size);

    kprintf("PMM: Initialization complete. Total: %llu KB, Free: %llu KB\n",
            total_memory / 1024, free_memory / 1024);
}

void* alloc_frame(void) {
    // Simple linear search for first free frame
    for (size_t i = 0; i < PMM_TOTAL_FRAMES; i++) {
        if (!test_bit(i)) {
            set_bit(i);
            free_memory -= PAGE_SIZE;

            // Calculate the physical address
            void *frame_addr = (void*)(PMM_RAM_BASE + i * PAGE_SIZE);

            // Zero the frame for security/predictability
            memset(frame_addr, 0, PAGE_SIZE);

            return frame_addr;
        }
    }

    kprintf("PMM: ERROR - Out of physical frames!\n");
    return NULL; // No free frame found
}

void free_frame(void *frame) {
    if (!frame) return;

    uint64_t addr = (uint64_t)frame;

    // Basic validation
    if (addr < PMM_RAM_BASE || addr >= PMM_MAX_ADDRESS) {
        kprintf("PMM: Attempt to free invalid frame at %p\n", frame);
        return;
    }

    // Check alignment
    if (addr % PAGE_SIZE != 0) {
        kprintf("PMM: Attempt to free unaligned address %p\n", frame);
        return;
    }

    // Calculate the bit index
    size_t frame_idx = (addr - PMM_RAM_BASE) / PAGE_SIZE;

    if (frame_idx >= PMM_TOTAL_FRAMES) {
        kprintf("PMM: Frame index %zu out of range\n", frame_idx);
        return;
    }

    // Check if frame is currently marked as used
    if (!test_bit(frame_idx)) {
        kprintf("PMM: Warning - double free detected for frame %p\n", frame);
        return;
    }

    // Mark as free
    clear_bit(frame_idx);
    free_memory += PAGE_SIZE;
}

uint64_t pmm_get_total_memory(void) {
    return total_memory;
}

uint64_t pmm_get_free_memory(void) {
    return free_memory;
}

uint64_t pmm_get_highest_usable_address(void) {
    return highest_usable_address;
}

The PMM:

Uses a bitmap for frame tracking (1 bit per 4KB frame)
Reserves the frames occupied by the kernel image, which includes the stack and the bitmap itself
Provides functions to allocate and free frames
Includes validation to catch errors like double-frees

Under the Hood: Physical Memory Allocation

┌───────────────────────────────────────────────────────────────────────┐
│ Physical Memory (1GB Starting at 0x40000000)                          │
│                                                                       │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐             ┌─────────┐           │
│ │ Frame 0 │ │ Frame 1 │ │ Frame 2 │    ...      │Frame N-1│           │
│ │ (4 KB)  │ │ (4 KB)  │ │ (4 KB)  │             │ (4 KB)  │           │
│ └─────────┘ └─────────┘ └─────────┘             └─────────┘           │
└───────────────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌───────────────────────────────────────────────────────────────────────┐
│ Frame Bitmap (1 bit per frame)                                        │
│                                                                       │
│ ┌───┬───┬───┬───┬───┬───┬───┬───┬───┐                                 │
│ │ 1 │ 1 │ 0 │ 0 │ 1 │ 0 │ 1 │ 1 │...│ (1=used, 0=free)                │
│ └───┴───┴───┴───┴───┴───┴───┴───┴───┘                                 │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

When alloc_frame() is called:

The bitmap is scanned for a free frame (bit=0)
The bit is set to 1 (marking it as used)
The corresponding physical address is calculated, the frame is zeroed, and the address is returned
If no free frame is found, NULL is returned
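
The scan in alloc_frame() tests one bit at a time. One possible optimization, shown here only as a sketch and not something the MeringueOS code does, is to skip bytes that are entirely 0xFF, since all eight frames they describe are in use. The function name and the NO_FREE_FRAME sentinel are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_FREE_FRAME ((size_t)-1)

// Returns the index of the first clear bit among the first `nbits` bits,
// or NO_FREE_FRAME if every bit is set. Bit numbering matches set_bit():
// bit n lives in byte n / 8 at position n % 8.
static size_t find_first_free(const uint8_t *bitmap, size_t nbits) {
    size_t nbytes = (nbits + 7) / 8;
    for (size_t byte = 0; byte < nbytes; byte++) {
        if (bitmap[byte] == 0xFF) continue; // all 8 frames used, skip at once
        for (size_t bit = 0; bit < 8; bit++) {
            size_t idx = byte * 8 + bit;
            if (idx >= nbits) return NO_FREE_FRAME; // past the managed range
            if (!(bitmap[byte] & (1u << bit))) return idx;
        }
    }
    return NO_FREE_FRAME;
}
```

The result is the same frame the bit-by-bit loop would pick, so this changes only the cost of the search, not the allocation policy.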

Troubleshooting Memory Issues

Common memory management bugs and how to diagnose them:

Out of memory errors:

Check bitmap initialization - after the initial 0xFF fill, ensure mark_range_free() actually clears the bits for usable RAM
Verify that free_frame() is correctly clearing bits
Add accounting code to track memory usage patterns

Memory corruption:

Ensure frames are aligned properly (check address % PAGE_SIZE == 0)
Verify bitmap is correctly representing frame states
Add debug logging for allocation/free operations

Double free bugs:

Use the validation in free_frame() to catch and log double-frees
Add optional “poisoning” to freed frames for easier debugging
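
For the accounting suggestion above, a simple consistency check is to recount the free bits and compare the result with the running free_memory counter. The sketch below takes the bitmap and counter as parameters so it can be exercised on a host; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

// Count clear (free) bits among the first `nframes` bits of the bitmap
static uint64_t count_free_frames(const uint8_t *bitmap, size_t nframes) {
    uint64_t free_count = 0;
    for (size_t i = 0; i < nframes; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8)))) free_count++;
    }
    return free_count;
}

// True when the running free-byte counter agrees with the bitmap contents
static bool pmm_accounting_ok(const uint8_t *bitmap, size_t nframes,
                              uint64_t free_memory) {
    return count_free_frames(bitmap, nframes) * PAGE_SIZE == free_memory;
}
```

Calling a check like this after a burst of allocations and frees in a debug build would flag the moment the counter and the bitmap drift apart.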

Kernel Heap Implementation

The kernel heap provides dynamic memory allocation. Let’s look at the implementation in kheap.h:

#ifndef KHEAP_H
#define KHEAP_H

#include <stddef.h>
#include <stdint.h>

// Initialize the kernel heap
void kheap_init(void);

// Allocate a block of memory
void *kmalloc(size_t size);

// Free a previously allocated block
void kfree(void *ptr);

#endif // KHEAP_H

And the implementation in kheap.c:

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "memory/kheap.h"
#include "memory/frame_alloc.h"
#include "lib/string.h"
#include "lib/stdio.h"

// Header for memory blocks (both allocated and free)
typedef struct heap_block {
    size_t size;          // Size of the data area *excluding* this header
    bool is_free;         // True if block is free, false if allocated
    struct heap_block *next; // Pointer to the next block in the heap (physical order)
    struct heap_block *prev; // Pointer to the previous block in the heap (physical order)
    struct heap_block *next_free; // Pointer to the next free block in the free list
    struct heap_block *prev_free; // Pointer to the previous free block in the free list
} heap_block_t;

#define HEAP_HEADER_SIZE sizeof(heap_block_t)
#define HEAP_MIN_BLOCK_SIZE (HEAP_HEADER_SIZE * 2) // Minimum size to allow splitting

// Head of the free list (doubly linked)
static heap_block_t *free_list_head = NULL;
static heap_block_t *heap_start = NULL;
static heap_block_t *heap_end = NULL;

// --- Free List Management ---

static void add_to_free_list(heap_block_t *block) {
    block->is_free = true;
    block->next_free = free_list_head;
    block->prev_free = NULL;
    if (free_list_head) {
        free_list_head->prev_free = block;
    }
    free_list_head = block;
}

static void remove_from_free_list(heap_block_t *block) {
    if (block->prev_free) {
        block->prev_free->next_free = block->next_free;
    } else {
        free_list_head = block->next_free; // It was the head
    }
    if (block->next_free) {
        block->next_free->prev_free = block->prev_free;
    }
    block->is_free = false; // Mark as not free after removal
    block->next_free = NULL;
    block->prev_free = NULL;
}

// --- Heap Expansion ---

// True if block `a` ends exactly where block `b` begins in memory
static bool blocks_adjacent(const heap_block_t *a, const heap_block_t *b) {
    return (const uint8_t *)a + HEAP_HEADER_SIZE + a->size == (const uint8_t *)b;
}

static bool expand_heap(size_t min_expand_size) {
    // Request at least a page, or more if needed
    size_t pages_needed = (min_expand_size + HEAP_HEADER_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
    if (pages_needed == 0) pages_needed = 1;

    kprintf("KHeap: Expanding heap by %zu pages\n", pages_needed);

    for (size_t i = 0; i < pages_needed; ++i) {
        void *frame = alloc_frame();
        if (!frame) {
            kprintf("KHeap Error: Failed to allocate frame during expansion!\n");
            // Frames obtained so far are already linked into the heap and stay usable
            return false;
        }

        // If the frame directly follows a free block at the end of the heap,
        // grow that block in place so multi-page requests can be satisfied
        if (heap_end && heap_end->is_free &&
            (uint8_t *)heap_end + HEAP_HEADER_SIZE + heap_end->size == (uint8_t *)frame) {
            heap_end->size += PAGE_SIZE;
            continue;
        }

        // Otherwise the frame becomes a new free block at the end of the heap
        heap_block_t *block = (heap_block_t *)frame;
        block->size = PAGE_SIZE - HEAP_HEADER_SIZE;
        block->next = NULL;
        block->prev = heap_end;
        if (heap_end) {
            heap_end->next = block;
        } else {
            heap_start = block; // This is the very first block in the heap
        }
        heap_end = block;

        // Every free block must be on the free list, or kmalloc() cannot find it
        add_to_free_list(block);
    }

    return true;
}

// --- Coalescing ---

// Merge a free block (not currently on the free list) with free neighbors
// that are physically adjacent. The returned block is not on the free list.
static heap_block_t* coalesce(heap_block_t *block) {
    if (!block || !block->is_free) return block;

    heap_block_t *current = block;

    // Coalesce with next block if it's free and directly follows in memory
    if (current->next && current->next->is_free &&
        blocks_adjacent(current, current->next)) {
        heap_block_t *next_block = current->next;
        kprintf("KHeap: Coalescing forward %p (%zu) with %p (%zu)\n",
                current, current->size, next_block, next_block->size);
        remove_from_free_list(next_block); // Remove next block from free list
        current->size += next_block->size + HEAP_HEADER_SIZE;
        current->next = next_block->next;
        if (current->next) {
            current->next->prev = current;
        }
        if (heap_end == next_block) {
            heap_end = current;
        }
        // next_block is now merged into current, clear it for safety
        memset(next_block, 0, HEAP_HEADER_SIZE);
    }

    // Coalesce with previous block if it's free and directly precedes in memory
    if (current->prev && current->prev->is_free &&
        blocks_adjacent(current->prev, current)) {
        heap_block_t *prev_block = current->prev;
        kprintf("KHeap: Coalescing backward %p (%zu) with %p (%zu)\n",
                prev_block, prev_block->size, current, current->size);
        remove_from_free_list(prev_block); // Previous block is already in free list, remove it
        prev_block->size += current->size + HEAP_HEADER_SIZE;
        prev_block->next = current->next;
        if (prev_block->next) {
            prev_block->next->prev = prev_block;
        }
        if (heap_end == current) {
            heap_end = prev_block;
        }
        // current is now merged into prev_block, clear it for safety
        memset(current, 0, HEAP_HEADER_SIZE);
        current = prev_block; // The result of the coalesce is the previous block
    }

    return current; // Return the potentially larger coalesced block
}

// --- Public API ---

void kheap_init(void) {
    free_list_head = NULL;
    heap_start = NULL;
    heap_end = NULL;

    // Pre-allocate some initial pages (16KB requested; expand_heap rounds up
    // to whole pages after adding header overhead)
    if (!expand_heap(PAGE_SIZE * 4)) {
        kprintf("KHeap Warning: Initial heap expansion failed\n");
    }

    kprintf("KHeap: Initialized.\n");
}

void* kmalloc(size_t size) {
    if (size == 0) {
        return NULL;
    }

    // Ensure minimum allocation size and alignment (e.g., align to 8 or 16 bytes)
    // For simplicity, let's align to sizeof(void*)
    size_t alignment = sizeof(void*);
    size = (size + alignment - 1) & ~(alignment - 1);

    // Add space for the header
    size_t total_size_needed = size + HEAP_HEADER_SIZE;

    // First-fit search (the variable is named best_fit, but the first
    // block that is large enough wins)
    heap_block_t *current_free = free_list_head;
    heap_block_t *best_fit = NULL;

    while (current_free) {
        if (current_free->size >= size) { // Found a block large enough
            best_fit = current_free;
            break; // First fit
        }
        current_free = current_free->next_free;
    }

    // If no block found, try to expand the heap
    if (!best_fit) {
        if (!expand_heap(total_size_needed)) {
            kprintf("KHeap Error: Failed to expand heap for allocation of size %zu\n", size);
            return NULL; // Expansion failed
        }
        // Retry finding a block (the new block should be suitable)
        current_free = free_list_head;
        while (current_free) {
            if (current_free->size >= size) {
                best_fit = current_free;
                break;
            }
            current_free = current_free->next_free;
        }

        if (!best_fit) {
            kprintf("KHeap Error: Still no suitable block after expansion!\n");
            // Possible when the new frames did not form one block large enough
            return NULL;
        }
    }

    // We found a suitable block (best_fit)
    remove_from_free_list(best_fit); // Remove it from the free list

    // Check if we can split the block
    if (best_fit->size >= size + HEAP_MIN_BLOCK_SIZE) {
        // Split the block
        size_t remaining_size = best_fit->size - size - HEAP_HEADER_SIZE;
        heap_block_t *new_free_block = (heap_block_t *)((uint8_t *)best_fit + HEAP_HEADER_SIZE + size);

        new_free_block->size = remaining_size;
        new_free_block->is_free = true; // Will be added to free list
        new_free_block->next = best_fit->next;
        new_free_block->prev = best_fit;

        if (new_free_block->next) {
            new_free_block->next->prev = new_free_block;
        } else {
            heap_end = new_free_block; // It's the new end of the heap
        }

        best_fit->size = size; // Adjust size of the allocated block
        best_fit->next = new_free_block;

        // Add the new smaller free block to the free list
        add_to_free_list(new_free_block);
        kprintf("KHeap: Split block %p. Allocated %zu, remaining %zu at %p\n",
                best_fit, best_fit->size, new_free_block->size, new_free_block);

    } else {
        // Cannot split, allocate the whole block
        kprintf("KHeap: Allocated whole block %p (%zu) for size %zu\n",
                best_fit, best_fit->size, size);
    }

    best_fit->is_free = false;

    // Return pointer to the data area (after the header)
    void *data_ptr = (void *)((uint8_t *)best_fit + HEAP_HEADER_SIZE);
    // Optionally zero the allocated memory
    memset(data_ptr, 0, best_fit->size);

    // kprintf("KHeap: kmalloc(%zu) -> %p\n", size, data_ptr);
    return data_ptr;
}

void kfree(void *ptr) {
    if (!ptr) {
        return;
    }

    // Get the header from the pointer
    heap_block_t *block = (heap_block_t *)((uint8_t *)ptr - HEAP_HEADER_SIZE);

    // Basic validation
    if (block->is_free) {
        kprintf("KHeap Warning: Double free detected for pointer %p\n", ptr);
        return;
    }
    // More robust validation would involve checking magic numbers in the header
    // or ensuring the block pointer is within the known heap range [heap_start, heap_end].

    kprintf("KHeap: kfree(%p) - block %p, size %zu\n", ptr, block, block->size);
    block->is_free = true;

    // Attempt to coalesce with neighbors
    heap_block_t *coalesced_block = coalesce(block);

    // coalesce() returns a block that is not on the free list, whether or not
    // a merge happened, so add the result back exactly once
    add_to_free_list(coalesced_block);
    kprintf("KHeap: Added block %p (%zu) to free list\n",
            coalesced_block, coalesced_block->size);
}

The kernel heap:

Uses a linked list of memory blocks with headers
Implements first-fit allocation with splitting
Coalesces adjacent free blocks during kfree() to reduce fragmentation
Can expand dynamically by allocating more physical frames

How the Kernel Heap works

The kernel heap is a region of memory that can be dynamically allocated and freed as needed. When code calls kmalloc(256), the heap allocator searches through a list of free memory blocks to find one at least 256 bytes large. If it finds a 1024-byte free block, it splits it into a 256-byte allocated block and a free block that holds the remainder. The new free block needs its own header, so the usable remainder is 768 bytes minus sizeof(heap_block_t), not the full 768. That remainder goes back on the free list. When memory is freed with kfree(), the allocator marks the block as available and checks whether adjacent blocks are also free; if so, it merges them into a larger block to limit fragmentation. This constant splitting and merging lets memory be reused efficiently throughout the kernel’s lifetime.

Allocation Strategy Deep Dive

Our implementation uses a first-fit approach with splitting to reduce fragmentation:

Before allocation of 100 bytes (rounded up to 104 for 8-byte alignment):
┌──────────────┬──────────────────────────────────┬───────────────────┐
│ Header       │ Free Block (256 bytes)           │ Header            │
│ size=256     │                                  │ Next block...     │
│ is_free=1    │                                  │                   │
└──────────────┴──────────────────────────────────┴───────────────────┘

After allocation (remaining = 256 - 104 - HEAP_HEADER_SIZE, i.e. 104 with the 48-byte header this struct has on a 64-bit target):
┌──────────────┬────────────────┬──────────────┬──────────────────────┐
│ Header       │ Used Block     │ Header       │ Free Block           │
│ size=104     │ (104 bytes)    │ size=104     │ (104 bytes)          │
│ is_free=0    │                │ is_free=1    │                      │
└──────────────┴────────────────┴──────────────┴──────────────────────┘
                                 ▲
                                 │
                    New header created during split
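
The bookkeeping behind a split is easy to get wrong by one header, so it can help to write it out as plain functions. The sketch below mirrors the alignment and split test used in kmalloc(), but takes the header size as a parameter so the numbers can be checked on any host; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

// Round `size` up to a multiple of `alignment` (which must be a power of two)
static size_t align_up(size_t size, size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Mirrors the split test in kmalloc(): a block is split only if the leftover
// can hold a header plus at least one more header's worth of data.
static bool can_split(size_t block_size, size_t request, size_t header_size) {
    return block_size >= request + 2 * header_size;
}

// Data bytes left in the new free block after a split
static size_t split_remainder(size_t block_size, size_t request, size_t header_size) {
    return block_size - request - header_size;
}
```

With a 48-byte header, a 100-byte request is first rounded up to 104 bytes; a 256-byte block can then be split and leaves 104 data bytes, while a 128-byte block is too small to split and would be handed out whole.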

Design Decision: Memory Allocation Algorithm

I implemented a first-fit algorithm with splitting and coalescing.

Alternatives considered:

Best-fit: Find the smallest block that fits the request
Worst-fit: Find the largest block to minimize fragmentation
Buddy system: Power-of-2 sizes with efficient coalescing
Slab allocator: Pre-allocated objects of common sizes

Tradeoffs:

First-fit: Good general performance, simple implementation
Best-fit: Reduces wasted space but slower search
Worst-fit: Can reduce fragmentation but wastes memory
Buddy: Excellent for power-of-2 allocations, but internal fragmentation
Slab: Excellent for fixed-size allocations, poor for variable sizes

I chose first-fit for its balance of simplicity and reasonable performance characteristics. Production kernels often use combinations of these approaches.
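
The difference between first-fit and best-fit is easiest to see on a toy free list. The sketch below models the free list as a plain array of block sizes, which is a simplification of the linked list used in kheap.c; the function names and the NO_BLOCK sentinel are invented for this comparison.

```c
#include <stddef.h>

#define NO_BLOCK ((size_t)-1)

// First-fit: index of the first block whose size is >= request
static size_t first_fit(const size_t *sizes, size_t count, size_t request) {
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] >= request) return i;
    }
    return NO_BLOCK;
}

// Best-fit: index of the smallest block whose size is >= request
static size_t best_fit(const size_t *sizes, size_t count, size_t request) {
    size_t best = NO_BLOCK;
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] >= request && (best == NO_BLOCK || sizes[i] < sizes[best])) {
            best = i;
        }
    }
    return best;
}
```

Given free blocks of 64, 512, 128 and 256 bytes and a 100-byte request, first-fit stops at the 512-byte block (index 1), while best-fit scans the whole list and picks the 128-byte block (index 2). First-fit can return early; best-fit always pays for the full scan.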

Troubleshooting Heap Issues

Common kernel heap bugs and how to diagnose them:

Double free:

Symptom: Corruption in free list
Detection: Add validation in kfree() to check if block is already marked as free
Fix: Track allocations during debugging or use memory poisoning

Use after free:

Symptom: Random crashes or data corruption
Detection: Fill freed memory with pattern (0xDE) for debug builds
Fix: Implement pointer nulling after free, heap validation routines

Buffer overflow:

Symptom: Corruption of adjacent blocks’ headers
Detection: Add canary values after allocations
Fix: Add bounds checking, validate heap integrity periodically

Memory leaks:

Symptom: Gradually running out of memory
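Detection: Keep running counts of live allocations and bytes in use, and watch whether they only ever grow
Fix: In debug builds, record the allocation site (file and line) for each block so you can find the caller that never frees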
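
One way to wire up the double-free and use-after-free checks above is sketched here. It is an illustration under assumptions, not the actual kfree(): the dbg_header_t layout, the HEAP_MAGIC value, and the kfree_checked name are made up for this example, and it uses the hosted memset where the kernel would use its own from lib/string.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_MAGIC  0xC0FFEE42u  // hypothetical marker written at allocation time
#define POISON_BYTE 0xDE         // pattern that makes use-after-free visible

// Hypothetical debug header that sits directly in front of the payload.
typedef struct {
    uint32_t magic;    // HEAP_MAGIC if this really is a heap block
    uint32_t is_free;  // 1 once the block has been freed
    uint64_t size;     // payload size in bytes
} dbg_header_t;

typedef enum {
    FREE_OK = 0,
    FREE_NULL,        // kfree(NULL)
    FREE_BAD_HEADER,  // pointer not from the heap, or header overwritten
    FREE_DOUBLE       // block was already free
} free_status_t;

static free_status_t kfree_checked(void *ptr) {
    if (ptr == NULL) {
        return FREE_NULL;
    }
    dbg_header_t *h = (dbg_header_t *)ptr - 1;
    if (h->magic != HEAP_MAGIC) {
        return FREE_BAD_HEADER;  // likely a buffer overflow from the block before
    }
    if (h->is_free) {
        return FREE_DOUBLE;
    }
    // Poison the payload so stale pointers read back 0xDEDEDE...
    memset(ptr, POISON_BYTE, (size_t)h->size);
    h->is_free = 1;
    return FREE_OK;
}
```

Returning a status instead of silently ignoring the problem lets a debug build call panic() at the moment of the second free, which is usually much closer to the bug than the crash that would otherwise follow later.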

6. Exception Handling

Exception handling is essential for responding to hardware events and system calls.

Mental Model: Exception Handling as Emergency Response

Exception handling is like emergency response in a city. When an alarm sounds (exception occurs), specially trained teams (exception handlers) respond to specific types of emergencies, following prescribed protocols (vector table).

Exception Vector Tables

ARM AArch64 uses a vector table with 16 entries: 4 exception types (Synchronous, IRQ, FIQ, SError) for each of 4 possible origins. Each entry is 128 bytes, so the whole table spans 2KB:

┌─────────────────────────────┬─────────────────────────────┐
│ Current EL, using SP0       │ Current EL, using current SP│
├─────────────────────────────┼─────────────────────────────┤
│ Synchronous                 │ Synchronous                 │
├─────────────────────────────┼─────────────────────────────┤
│ IRQ/vIRQ                    │ IRQ/vIRQ                    │
├─────────────────────────────┼─────────────────────────────┤
│ FIQ/vFIQ                    │ FIQ/vFIQ                    │
├─────────────────────────────┼─────────────────────────────┤
│ SError/vSError              │ SError/vSError              │
├─────────────────────────────┼─────────────────────────────┤
│ Lower EL, using AArch64     │ Lower EL, using AArch32     │
├─────────────────────────────┼─────────────────────────────┤
│ Synchronous                 │ Synchronous                 │
├─────────────────────────────┼─────────────────────────────┤
│ IRQ/vIRQ                    │ IRQ/vIRQ                    │
├─────────────────────────────┼─────────────────────────────┤
│ FIQ/vFIQ                    │ FIQ/vFIQ                    │
├─────────────────────────────┼─────────────────────────────┤
│ SError/vSError              │ SError/vSError              │
└─────────────────────────────┴─────────────────────────────┘
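
Because every entry is 128 bytes and each origin group holds four entries, the offset of any entry from the table base can be computed rather than memorized. A small sketch (the enum and function names are mine, not from the MeringueOS source):

```c
#include <assert.h>
#include <stdint.h>

// Exception type: selects the entry within a group (0x80 bytes each).
typedef enum {
    VEC_SYNC   = 0,
    VEC_IRQ    = 1,
    VEC_FIQ    = 2,
    VEC_SERROR = 3
} vec_type_t;

// Exception origin: selects the group (4 entries = 0x200 bytes each).
typedef enum {
    VEC_CUR_EL_SP0   = 0,  // current EL, using SP_EL0
    VEC_CUR_EL_SPX   = 1,  // current EL, using SP_ELx
    VEC_LOWER_EL_A64 = 2,  // lower EL running AArch64
    VEC_LOWER_EL_A32 = 3   // lower EL running AArch32
} vec_origin_t;

// Byte offset of a vector entry from the table base held in VBAR_ELx.
static inline uint32_t vector_offset(vec_origin_t origin, vec_type_t type) {
    return (uint32_t)origin * 0x200u + (uint32_t)type * 0x80u;
}
```

For example, an IRQ taken while the kernel is running on SP_EL1 lands at offset 0x280, and a system call from a 64-bit user program lands at offset 0x400. The last entry starts at 0x780, so the table ends at 0x800 (2KB).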

Let’s look at our implementation in exceptions_asm.S:

.section ".text.exceptions"
.align 11 // Align to 2KB (2^11)

.globl _exception_vector_table
_exception_vector_table:
    // Handlers for exceptions from Current EL using SP0
    .align 7 // Align each entry to 128 bytes (2^7)
    b sync_handler_sp0      // Synchronous EL1t
    .align 7
    b irq_handler_sp0       // IRQ EL1t
    .align 7
    b fiq_handler_sp0       // FIQ EL1t
    .align 7
    b serror_handler_sp0    // SError EL1t

    // Handlers for exceptions from Current EL using SPx (SP_EL1)
    .align 7
    b sync_handler_spx      // Synchronous EL1h
    .align 7
    b irq_handler_spx       // IRQ EL1h
    .align 7
    b fiq_handler_spx       // FIQ EL1h
    .align 7
    b serror_handler_spx    // SError EL1h

    // Handlers for exceptions from Lower EL (EL0) using AArch64
    .align 7
    b sync_handler_el0_64   // Synchronous EL0 (64-bit)
    .align 7
    b irq_handler_el0_64    // IRQ EL0 (64-bit)
    .align 7
    b fiq_handler_el0_64    // FIQ EL0 (64-bit)
    .align 7
    b serror_handler_el0_64 // SError EL0 (64-bit)

    // Handlers for exceptions from Lower EL (EL0) using AArch32 (Placeholder)
    .align 7
    b sync_handler_el0_32   // Synchronous EL0 (32-bit)
    .align 7
    b irq_handler_el0_32    // IRQ EL0 (32-bit)
    .align 7
    b fiq_handler_el0_32    // FIQ EL0 (32-bit)
    .align 7
    b serror_handler_el0_32 // SError EL0 (32-bit)

// Common entry point macro for saving context
// Assumes exception taken to EL1 using SP_EL1 (current SP),
// and that SP is 16-byte aligned on entry
.macro save_context
    // Allocate space on the stack for GPRs (x0-x30), SPSR_EL1, ELR_EL1, SP_EL0
    // 31 GPRs + 3 system regs = 34 registers * 8 bytes/reg = 272 bytes
    // We reserve 288 (272 + 16 bytes of padding). Since 288 is a multiple
    // of 16, SP stays aligned without any extra masking, and the matching
    // "add sp, sp, #288" in restore_context undoes it exactly.
    // No register may be modified before it has been stored below.
    sub sp, sp, #288

    // Store GPRs x0-x30 (31 registers)
    stp x0, x1, [sp, #16 * 0]
    stp x2, x3, [sp, #16 * 1]
    stp x4, x5, [sp, #16 * 2]
    stp x6, x7, [sp, #16 * 3]
    stp x8, x9, [sp, #16 * 4]
    stp x10, x11, [sp, #16 * 5]
    stp x12, x13, [sp, #16 * 6]
    stp x14, x15, [sp, #16 * 7]
    stp x16, x17, [sp, #16 * 8]
    stp x18, x19, [sp, #16 * 9]
    stp x20, x21, [sp, #16 * 10]
    stp x22, x23, [sp, #16 * 11]
    stp x24, x25, [sp, #16 * 12]
    stp x26, x27, [sp, #16 * 13]
    stp x28, x29, [sp, #16 * 14]
    str x30, [sp, #16 * 15] // Store LR (x30)

    // Store relevant system registers (x0-x2 are already saved, so they are free to use)
    mrs x0, spsr_el1
    mrs x1, elr_el1
    mrs x2, sp_el0
    stp x0, x1, [sp, #16 * 15 + 8]  // Store SPSR_EL1, ELR_EL1
    str x2, [sp, #16 * 16 + 8]      // Store SP_EL0

    // Pass pointer to saved registers (current SP) to C handler in x0
    mov x0, sp
.endm

// Common exit point macro for restoring context
.macro restore_context
    // x0 might contain return value from C handler, but we don't use it here.
    // Restore system registers first
    ldp x0, x1, [sp, #16 * 15 + 8]  // Load SPSR_EL1, ELR_EL1
    ldr x2, [sp, #16 * 16 + 8]      // Load SP_EL0
    msr spsr_el1, x0
    msr elr_el1, x1
    msr sp_el0, x2

    // Restore GPRs x0-x30
    ldp x0, x1, [sp, #16 * 0]
    ldp x2, x3, [sp, #16 * 1]
    ldp x4, x5, [sp, #16 * 2]
    ldp x6, x7, [sp, #16 * 3]
    ldp x8, x9, [sp, #16 * 4]
    ldp x10, x11, [sp, #16 * 5]
    ldp x12, x13, [sp, #16 * 6]
    ldp x14, x15, [sp, #16 * 7]
    ldp x16, x17, [sp, #16 * 8]
    ldp x18, x19, [sp, #16 * 9]
    ldp x20, x21, [sp, #16 * 10]
    ldp x22, x23, [sp, #16 * 11]
    ldp x24, x25, [sp, #16 * 12]
    ldp x26, x27, [sp, #16 * 13]
    ldp x28, x29, [sp, #16 * 14]
    ldr x30, [sp, #16 * 15] // Restore LR (x30)

    // Deallocate stack space used for saving context
    add sp, sp, #288        // Must match allocation size

    eret // Return from exception
.endm

// --- Specific Handlers ---
// These handlers assume the exception was taken to EL1 using SP_EL1

sync_handler_common:
    save_context
    bl handle_sync_exception // Call C handler
    restore_context

irq_handler_common:
    save_context
    bl handle_irq           // Call C handler
    restore_context

fiq_handler_common:
    save_context
    bl handle_fiq           // Call C handler
    restore_context

serror_handler_common:
    save_context
    bl handle_serror        // Call C handler
    restore_context

// --- Vector Table Entry Points ---
// Route different vector entries to appropriate common handlers
// For simplicity, this example routes most EL1/EL0_64 entries to common handlers.
// A real OS might need more differentiation based on SP0/SPx.
sync_handler_sp0:
sync_handler_spx:
sync_handler_el0_64:
    b sync_handler_common

irq_handler_sp0:
irq_handler_spx:
irq_handler_el0_64:
    b irq_handler_common

fiq_handler_sp0:
fiq_handler_spx:
fiq_handler_el0_64:
    b fiq_handler_common

serror_handler_sp0:
serror_handler_spx:
serror_handler_el0_64:
    b serror_handler_common

// Placeholder handlers for AArch32 (just loop indefinitely)
sync_handler_el0_32:
irq_handler_el0_32:
fiq_handler_el0_32:
serror_handler_el0_32:
1:  wfi
    b 1b                    // Branch back to the wfi so the core keeps sleeping

// --- External C function declarations ---
.globl handle_sync_exception
.globl handle_irq
.globl handle_fiq
.globl handle_serror

This code:

Creates a properly aligned vector table (2KB alignment, 128-byte entries)
Provides entry points for all 16 vector table slots (the four AArch32 ones are placeholders that just halt)
Uses macros to save and restore all registers (context)
Calls C functions for actual exception handling

Under the Hood: The Exception Mechanism

When an exception occurs:

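CPU saves critical state in dedicated registers: the return address goes to ELR_ELx and PSTATE goes to SPSR_ELx
CPU masks further interrupts and switches to the target exception level (and its stack pointer) if needed
PC is set to VBAR_ELx plus the offset of the vector table entry for this exception type and origin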
Exception handler code executes
When complete, the handler executes eret to return to the interrupted code

Normal Execution                  Exception Occurs              Handler Execution
┌─────────────────┐               ┌─────────────────┐           ┌─────────────────┐
│ Instruction 1   │               │ Save state:     │           │ Exception       │
│ Instruction 2   │               │ - ELR_ELx = PC  │           │ handler code    │
│ Instruction 3   │───Exception──▶│ - SPSR_ELx = PSR│──Jump to─▶│ Analyze cause   │
│ Instruction 4   │               │ - Switch SP     │  handler  │ Handle exception│
│ ...             │               │                 │           │ Restore state   │
└─────────────────┘               └─────────────────┘           └─────────────────┘
        ▲                                                                │
        │                                                                │
        └────────────────────────────Return (eret)───────────────────────┘

Design Decision: Context Saving Strategy

My implementation saves all registers for simplicity.

Alternative: Partial register saving based on the AArch64 procedure call standard.

Tradeoffs:

Full saving: Simpler, consistent, but more overhead for simple exceptions
Partial saving: More efficient but requires understanding which registers must be preserved

I chose full context saving for educational clarity and robustness, though production kernels often optimize this.

Saving and Restoring Context

When an exception occurs, we need to preserve the CPU state. Our save_context macro:

Allocates stack space for saving registers
Saves all 31 general-purpose registers (x0 through x30)
Saves ELR_EL1 (return address), SPSR_EL1 (saved program status), and SP_EL0

This allows the exception handler to use the registers without disrupting the interrupted code.
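
On the C side, the handler receives a pointer to this saved frame, so the struct it uses has to match the order in which the assembly stores things. The field names below (regs, spsr_el1, elr_el1, sp_el0) are the ones the C handler accesses, but the actual definition in exceptions.h isn't shown in this guide, so treat this layout as an assumption derived from the store offsets in save_context. The CONTEXT_FRAME_SIZE name is also mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed layout of the frame built by save_context.
typedef struct {
    uint64_t regs[31];  // x0..x30, stored at byte offsets 0..240
    uint64_t spsr_el1;  // stored at [sp, #16 * 15 + 8]  = 248
    uint64_t elr_el1;   // second half of the same stp   = 256
    uint64_t sp_el0;    // stored at [sp, #16 * 16 + 8]  = 264
} saved_registers_t;

// Bytes reserved by "sub sp, sp, #288" in the assembly.
#define CONTEXT_FRAME_SIZE 288

// Compile-time checks: if the struct and the assembly ever drift apart,
// the build fails instead of the handler reading the wrong register.
_Static_assert(offsetof(saved_registers_t, regs) + 30 * sizeof(uint64_t) == 16 * 15,
               "x30 must sit at offset 240");
_Static_assert(offsetof(saved_registers_t, spsr_el1) == 16 * 15 + 8,
               "SPSR_EL1 offset must match save_context");
_Static_assert(offsetof(saved_registers_t, sp_el0) == 16 * 16 + 8,
               "SP_EL0 offset must match save_context");
_Static_assert(sizeof(saved_registers_t) <= CONTEXT_FRAME_SIZE,
               "frame must fit in the reserved stack space");
_Static_assert(CONTEXT_FRAME_SIZE % 16 == 0,
               "frame size must keep SP 16-byte aligned");
```

Checks like these cost nothing at run time and catch the most common mistake with hand-written context switching: editing the assembly and forgetting the struct, or the other way around.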

Handling Different Types of Exceptions

The C handler in exceptions.c processes exceptions based on their type:

#include <stdint.h>
#include <stdbool.h>
#include "exceptions/exceptions.h"
#include "lib/stdio.h"

// Helper function to read ESR_EL1
static inline uint64_t read_esr_el1(void) {
    uint64_t val;
    asm volatile("mrs %0, esr_el1" : "=r" (val));
    return val;
}

// Helper function to read ELR_EL1
static inline uint64_t read_elr_el1(void) {
    uint64_t val;
    asm volatile("mrs %0, elr_el1" : "=r" (val));
    return val;
}

// Helper function to read FAR_EL1
static inline uint64_t read_far_el1(void) {
    uint64_t val;
    asm volatile("mrs %0, far_el1" : "=r" (val));
    return val;
}

// Simple panic function
void panic(const char *message) {
    kprintf("\nKERNEL PANIC: %s\n", message);
    kprintf("System halted.\n");
    // Mask all exceptions (D, A, I, F) so nothing interrupts the halt loop
    asm volatile("msr daifset, #0xf");
    while (1) {
        asm volatile("wfi"); // Wait for interrupt (effectively halt)
    }
}

// Function to print register context
void print_registers(const saved_registers_t *context) {
    kprintf("Saved Registers:\n");
    // x0..x29 print as 15 pairs; x30 (the link register) gets its own line
    for (int i = 0; i < 30; i += 2) {
        kprintf("  x%-2d: %016llx   x%-2d: %016llx\n",
                i, context->regs[i], i + 1, context->regs[i + 1]);
    }
    kprintf("  x30: %016llx\n", context->regs[30]);
    kprintf("  SPSR_EL1: %016llx\n", context->spsr_el1);
    kprintf("  ELR_EL1:  %016llx\n", context->elr_el1);
    kprintf("  SP_EL0:   %016llx\n", context->sp_el0);
}

// --- Exception Handlers ---

// Called by assembly wrapper for synchronous exceptions
void handle_sync_exception(saved_registers_t *context) {
    uint64_t esr = read_esr_el1();
    uint64_t elr = context->elr_el1; // Use saved ELR
    uint64_t far = read_far_el1(); // Fault Address Register

    uint32_t ec = (esr >> 26) & 0x3F; // Extract Exception Class (bits 31:26)
    uint32_t iss = esr & 0x1FFFFFF;   // Extract Instruction Specific Syndrome (bits 24:0)

    kprintf("\n--- Synchronous Exception Taken ---\n");
    kprintf(" ESR_EL1: %016llx (EC: 0x%x, ISS: 0x%x)\n", esr, ec, iss);
    kprintf(" ELR_EL1: %016llx (Return Address)\n", elr);

    const char* ec_str = "Unknown";
    bool far_valid = false;

    // EC encodings follow the ESR_EL1 description in the Arm architecture manual.
    // Several classes come in pairs: an even value for "from a lower Exception
    // level" and the next odd value for "from the current Exception level".
    switch (ec) {
        case 0b000000: ec_str = "Unknown reason"; break;
        case 0b000001: ec_str = "Trapped WFI or WFE"; break;
        //... other EC values for MCR/MRC, MCRR/MRRC, LDC/STC etc. (AArch32 related)
        case 0b001110: ec_str = "Illegal Execution State"; break;
        case 0b010001: ec_str = "SVC instruction execution in AArch32 state"; break;
        case 0b010101: ec_str = "SVC instruction execution in AArch64 state"; break;
        case 0b011000: ec_str = "Trapped MSR, MRS or System instruction execution in AArch64 state"; break;
        case 0b011001: ec_str = "Access to SVE functionality trapped"; break; // Added in ARMv8.2
        case 0b100000: ec_str = "Instruction Abort from a lower Exception level"; far_valid = true; break;
        case 0b100001: ec_str = "Instruction Abort from the current Exception level"; far_valid = true; break;
        case 0b100010: ec_str = "PC alignment fault exception"; far_valid = true; break;
        case 0b100100: ec_str = "Data Abort from a lower Exception level"; far_valid = true; break;
        case 0b100101: ec_str = "Data Abort from the current Exception level"; far_valid = true; break;
        case 0b100110: ec_str = "SP alignment fault exception"; break;
        case 0b101000: ec_str = "Trapped floating-point exception (AArch32)"; break;
        case 0b101100: ec_str = "Trapped floating-point exception (AArch64)"; break;
        case 0b101111: ec_str = "SError interrupt"; break;
        case 0b110000: ec_str = "Breakpoint exception from a lower Exception level"; break;
        case 0b110001: ec_str = "Breakpoint exception from the current Exception level"; break;
        case 0b110010: ec_str = "Software Step exception from a lower Exception level"; break;
        case 0b110011: ec_str = "Software Step exception from the current Exception level"; break;
        case 0b110100: ec_str = "Watchpoint exception from a lower Exception level"; far_valid = true; break;
        case 0b110101: ec_str = "Watchpoint exception from the current Exception level"; far_valid = true; break;
        case 0b111000: ec_str = "BKPT instruction execution in AArch32 state"; break;
        case 0b111100: ec_str = "BRK instruction execution in AArch64 state"; break;
        default: ec_str = "Unhandled Exception Class"; break;
    }

    kprintf(" Type: %s\n", ec_str);
    if (far_valid) {
        kprintf(" FAR_EL1: %016llx (Faulting Virtual Address)\n", far);
    }
    print_registers(context);
    kprintf("-------------------------------------\n");

    // Handle specific exceptions or panic
    if (ec == 0b111100) { // BRK instruction
        kprintf("BRK instruction encountered. Continuing execution.\n");
        // For BRK, ELR_EL1 points at the BRK itself, so step over it
        // (every AArch64 instruction is 4 bytes)
        context->elr_el1 += 4;
        // Return normally via restore_context -> eret
    } else if (ec == 0b010101) { // SVC instruction
        uint16_t svc_imm = iss & 0xFFFF; // Extract immediate value from ISS
        kprintf("SVC instruction encountered (Imm: 0x%x). Implement SVC handler.\n", svc_imm);
        // Handle the system call based on svc_imm and registers x0-x7 in context
        // For SVC, ELR_EL1 already points at the instruction after the SVC,
        // so it must not be advanced. For now, just return.
    } else {
        // For most other synchronous exceptions, panic.
        panic("Unhandled synchronous exception");
    }
}

// Called by assembly wrapper for IRQ exceptions
void handle_irq(saved_registers_t *context) {
    kprintf("\n--- IRQ Received ---\n");
    // TODO: Interact with the Generic Interrupt Controller (GIC)
    // 1. Read Interrupt Acknowledge Register (IAR) from CPU interface (GICC_IAR)
    //    to get the interrupt ID and acknowledge the interrupt.
    // 2. Dispatch to the appropriate driver/handler based on the ID.
    // 3. Write to End Of Interrupt Register (EOIR) (GICC_EOIR) to signal completion.
    kprintf(" (No GIC driver implemented yet)\n");
    print_registers(context);
    kprintf("--------------------\n");
    // For now, just return via restore_context -> eret
}

// Placeholder for FIQ
void handle_fiq(saved_registers_t *context) {
    kprintf("\n--- FIQ Received ---\n");
    print_registers(context);
    panic("FIQ handling not implemented");
}

// Placeholder for SError
void handle_serror(saved_registers_t *context) {
    uint64_t esr = read_esr_el1();
    kprintf("\n--- SError Received ---\n");
    kprintf(" ESR_EL1: %016llx\n", esr);
    print_registers(context);
    panic("SError handling not implemented");
}

This code:

Reads exception information from system registers
Decodes the exception type using the Exception Class (EC) field from ESR_EL1
Handles SVC (system calls) and BRK instructions explicitly, reports the faulting address for aborts, and panics on everything else
Provides detailed diagnostics by printing register contents
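
To see the EC/ISS extraction in isolation, here is a small sketch that splits a raw ESR_EL1 value into its fields. The esr_fields_t type and esr_decode name are hypothetical helpers, not part of exceptions.c; the bit positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) follow the architecture's definition of the register.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical container for the decoded syndrome fields.
typedef struct {
    uint32_t ec;   // Exception Class, bits 31:26
    uint32_t il;   // Instruction Length, bit 25 (1 = 32-bit instruction)
    uint32_t iss;  // Instruction Specific Syndrome, bits 24:0
} esr_fields_t;

static inline esr_fields_t esr_decode(uint64_t esr) {
    esr_fields_t f;
    f.ec  = (uint32_t)((esr >> 26) & 0x3F);
    f.il  = (uint32_t)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1FFFFFF);
    return f;
}
```

Feeding it 0x56000005 yields EC 0x15 (SVC from AArch64) with ISS 5, which is what svc #5 produces. 0xF2000000 decodes to EC 0x3C with ISS 0, the value you would see for brk #0. Because this is plain C with no system register access, it can be unit-tested on your development machine.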

Debugging Exception Handlers

Tips for debugging exception-related issues:

System appears to hang:

Check that the vector table is properly registered using vbar_el1
Verify that exception handlers eventually return (check for missing eret)
Use simple, direct UART output before the register-saving code

Corrupted state after exception:

Validate register save/restore sequence matches in pairs
Check stack alignment (must be 16-byte aligned for AArch64)
Verify SP adjustment matches between save and restore

Recursive exceptions:

Implement guard against reentrant exceptions using a simple counter
Add debug output showing exception nesting level
Check for exceptions during context saving/restoring
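
A minimal sketch of the nesting counter mentioned above, assuming a single core and a hypothetical limit of two nested exceptions. The names exception_enter, exception_leave, and MAX_EXCEPTION_DEPTH are illustrative. In the kernel, each C handler would call exception_enter() first and panic() if it returns false, then call exception_leave() just before returning.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXCEPTION_DEPTH 2u  // hypothetical limit for this sketch

// Current nesting level. volatile because handlers that interrupt each
// other read and write it. Single-core only; no atomics are used.
static volatile unsigned int exception_depth = 0;

// Call on handler entry. Returns false when nesting is deeper than allowed,
// which usually means a handler is itself faulting.
static bool exception_enter(void) {
    exception_depth = exception_depth + 1;
    return exception_depth <= MAX_EXCEPTION_DEPTH;
}

// Call on handler exit. Never drops below zero.
static void exception_leave(void) {
    if (exception_depth > 0) {
        exception_depth = exception_depth - 1;
    }
}
```

Printing exception_depth at the top of each handler gives you the nesting-level debug output from the list above for free.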

Implementing System Calls

System calls allow user programs to request services from the kernel. They’re implemented using the SVC instruction, which generates a synchronous exception.

The full flow is:

User code executes svc #N with a number indicating the desired function
CPU takes a synchronous exception to EL1
Our exception handler decodes the SVC number
The appropriate system call handler is invoked
Results are returned to the user program
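
The decode, dispatch, and return steps of that flow can be sketched as a table-driven dispatcher. Everything here is hypothetical: the syscall numbers, the two toy handlers, and the Linux-style -38 "not implemented" code. The only thing taken from the handler earlier in this section is the idea that arguments arrive in the saved x0-x7 and the result goes back in the saved x0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSCALL_ENOSYS (-38)  // assumed error code for unknown syscalls

// Hypothetical syscall numbers, matching the immediate in "svc #N".
enum { SYS_NOP = 0, SYS_ADD = 1, SYS_COUNT };

typedef int64_t (*syscall_fn_t)(uint64_t a0, uint64_t a1, uint64_t a2);

static int64_t sys_nop(uint64_t a0, uint64_t a1, uint64_t a2) {
    (void)a0; (void)a1; (void)a2;
    return 0;
}

// Toy syscall: returns the sum of its first two arguments.
static int64_t sys_add(uint64_t a0, uint64_t a1, uint64_t a2) {
    (void)a2;
    return (int64_t)(a0 + a1);
}

static const syscall_fn_t syscall_table[SYS_COUNT] = {
    [SYS_NOP] = sys_nop,
    [SYS_ADD] = sys_add,
};

// regs points at the saved x0..x30. The result is written to the saved x0,
// so the user program sees it in x0 after eret.
static void dispatch_syscall(uint16_t svc_imm, uint64_t regs[31]) {
    int64_t result = SYSCALL_ENOSYS;
    if (svc_imm < SYS_COUNT && syscall_table[svc_imm] != NULL) {
        result = syscall_table[svc_imm](regs[0], regs[1], regs[2]);
    }
    regs[0] = (uint64_t)result;
}
```

Bounds-checking the number before indexing the table matters here: the immediate comes from user code, so it must be treated as untrusted input.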

Let’s implement I/O functions to interact with our OS.

7. UART and I/O

The Universal Asynchronous Receiver/Transmitter (UART) provides serial communication for console I/O.

How UART Serial Communication Works

UART is one of the simplest ways computers communicate: it sends data one bit at a time over a single wire in each direction. To send a byte like ‘A’ (0x41), the UART hardware breaks it into individual bits and transmits them at a rate both sides have agreed on in advance (such as 115200 bits per second). Before each byte, it sends a “start bit” to signal incoming data, then the 8 data bits (least significant bit first), and finally a “stop bit” to mark the end. The receiving UART samples the line at the same rate, reconstructing bytes from the bit stream. This simplicity makes UART perfect for console output and debugging, though it is slow, and its only built-in error handling is an optional parity bit, which can detect a corrupted byte but not correct it.
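
The framing described above is easy to model in code. This sketch builds the 10-bit sequence for one byte using 8 data bits, no parity, and 1 stop bit (commonly written 8N1). The function name and the one-bit-per-array-element representation are just for illustration; real hardware does this with a shift register.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FRAME_BITS 10  // 1 start + 8 data + 1 stop

// Fill bits[] with the line levels for one 8N1 frame, in transmit order.
static void uart_frame_8n1(uint8_t byte, uint8_t bits[UART_FRAME_BITS]) {
    bits[0] = 0;  // start bit: the idle-high line is pulled low
    for (int i = 0; i < 8; i++) {
        bits[1 + i] = (uint8_t)((byte >> i) & 1u);  // data, LSB first
    }
    bits[9] = 1;  // stop bit: line returns to idle (high)
}
```

For ‘A’ (0x41, binary 01000001) the frame is 0, then 1 0 0 0 0 0 1 0, then 1. Since every byte costs 10 bits on the wire, 115200 baud works out to at most 11,520 bytes per second.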

The PL011 UART Controller

The QEMU virt platform uses ARM’s PL011 UART at address 0x09000000.

Under the Hood: UART Registers

The PL011 UART is a memory-mapped device with these key registers:

┌──────────────────────────────────────────────────────────────┐
│ PL011 UART Registers                                         │
├─────────────┬────────────────────┬───────────────────────────┤
│ Offset      │ Register           │ Purpose                   │
├─────────────┼────────────────────┼───────────────────────────┤
│ 0x000       │ UART_DR            │ Data Register (read/write)│
│ 0x018       │ UART_FR            │ Flag Register             │
│ 0x024       │ UART_IBRD          │ Integer Baud Rate Divisor │
│ 0x028       │ UART_FBRD          │ Fractional Baud Rate      │
│ 0x02C       │ UART_LCRH          │ Line Control Register     │
│ 0x030       │ UART_CR            │ Control Register          │
└─────────────┴────────────────────┴───────────────────────────┘

When writing to the UART:

Check UART_FR to see if the transmit FIFO is full
Write the character to UART_DR when there’s space
The UART hardware handles sending the bit sequence

When reading from the UART:

Check UART_FR to see if the receive FIFO has data
Read the character from UART_DR if data is available

The implementation is in uart.c:

#include "lib/uart.h"
#include <stdint.h>

// QEMU virt PL011 UART registers
#define UART_BASE       0x09000000
#define UART_DR         ((volatile uint32_t*)(UART_BASE + 0x00))
#define UART_FR         ((volatile uint32_t*)(UART_BASE + 0x18))
#define UART_IBRD       ((volatile uint32_t*)(UART_BASE + 0x24))
#define UART_FBRD       ((volatile uint32_t*)(UART_BASE + 0x28))
#define UART_LCRH       ((volatile uint32_t*)(UART_BASE + 0x2C))
#define UART_CR         ((volatile uint32_t*)(UART_BASE + 0x30))
#define UART_IMSC       ((volatile uint32_t*)(UART_BASE + 0x38))

// Flag register bits
#define UART_FR_RXFE    0x10    // Receive FIFO empty
#define UART_FR_TXFF    0x20    // Transmit FIFO full

// Line control register bits
#define UART_LCRH_FEN   0x10    // Enable FIFOs
#define UART_LCRH_WLEN_8 0x60   // 8 bits word length

// Control register bits
#define UART_CR_UARTEN  0x01    // UART enable
#define UART_CR_TXE     0x100   // Transmit enable
#define UART_CR_RXE     0x200   // Receive enable

void uart_init(void) {
    // Disable UART while configuring
    *UART_CR = 0;

    // Configure baud rate: 115200 baud
    // Assuming 48MHz clock, divisor = 48000000/(16*115200) = 26.041666...
    // Integer part = 26
    // Fractional part = 0.041666... * 64 = 2.66... ≈ 3
    *UART_IBRD = 26;
    *UART_FBRD = 3;

    // Configure line control: 8 bits, no parity, 1 stop bit, FIFOs enabled
    *UART_LCRH = UART_LCRH_WLEN_8 | UART_LCRH_FEN;

    // Mask all interrupts initially
    *UART_IMSC = 0;

    // Enable UART, transmit and receive
    *UART_CR = UART_CR_UARTEN | UART_CR_TXE | UART_CR_RXE;
}

void uart_putc(char c) {
    // Terminals expect CR+LF, so send a carriage return before each newline
    if (c == '\n') {
        uart_putc('\r');
    }

    // Wait for FIFO to have space
    while (*UART_FR & UART_FR_TXFF);

    // Send character (cast so bytes >= 0x80 are not sign-extended)
    *UART_DR = (uint8_t)c;
}

char uart_getc(void) {
    // If receive FIFO is empty, return 0
    if (*UART_FR & UART_FR_RXFE) {
        return 0;
    }

    // Read and return character (the low 8 bits of DR hold the data;
    // the bits above them are receive error flags)
    return (char)(*UART_DR & 0xFF);
}

int uart_is_data_available(void) {
    // Check if receive FIFO is not empty
    return !(*UART_FR & UART_FR_RXFE);
}
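
The divisor arithmetic in the uart_init() comments can be done in integer math, which is handy when the clock or baud rate changes. A sketch, assuming the PL011 formula divisor = UARTCLK / (16 × baud) with a 6-bit fractional part; the pl011_baud_t type and the function name are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical result type: the values to write to UART_IBRD and UART_FBRD.
typedef struct {
    uint32_t ibrd;  // integer part of the divisor
    uint32_t fbrd;  // fractional part, in 1/64ths (0..63)
} pl011_baud_t;

static pl011_baud_t pl011_baud_divisors(uint32_t uartclk_hz, uint32_t baud) {
    assert(baud != 0);
    // divisor * 64 = clk * 64 / (16 * baud) = clk * 4 / baud.
    // Adding baud / 2 before dividing rounds to the nearest 1/64th.
    uint64_t div_x64 = ((uint64_t)uartclk_hz * 4u + baud / 2u) / baud;
    pl011_baud_t d;
    d.ibrd = (uint32_t)(div_x64 >> 6);
    d.fbrd = (uint32_t)(div_x64 & 0x3Fu);
    return d;
}
```

With a 48 MHz clock and 115200 baud this returns 26 and 3, matching the hand-computed values in the comments.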

Design Decision: Polling vs. Interrupt-Driven I/O

Our implementation uses polling for simplicity.

Alternative: Interrupt-driven I/O would use IRQ handlers to process UART events.

Tradeoffs:

Polling: Simple implementation, but wastes CPU cycles when waiting
Interrupt-driven: More efficient CPU usage, but more complex to implement

I chose polling for educational clarity, though a production OS would typically use interrupts for better performance.
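
For comparison, the heart of an interrupt-driven receiver is usually a small ring buffer: the IRQ handler pushes each received byte, and the rest of the kernel pops bytes whenever it wants them. This is a sketch of that data structure only, with hypothetical names and a single producer and single consumer assumed. Hooking it up to a real UART interrupt would also need a GIC driver, which this guide hasn't built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u  // must be a power of two for the index mask below

typedef struct {
    volatile uint32_t head;     // next slot to write (producer: IRQ handler)
    volatile uint32_t tail;     // next slot to read (consumer: kernel code)
    uint8_t data[RX_BUF_SIZE];
} rx_ring_t;

// Called from the (hypothetical) UART IRQ handler for each received byte.
// Returns false and drops the byte if the buffer is full.
static bool rx_ring_push(rx_ring_t *r, uint8_t c) {
    uint32_t next = (r->head + 1u) & (RX_BUF_SIZE - 1u);
    if (next == r->tail) {
        return false;  // full: one slot is kept empty to tell full from empty
    }
    r->data[r->head] = c;
    r->head = next;
    return true;
}

// Called from normal kernel code. Returns false if no data is waiting.
static bool rx_ring_pop(rx_ring_t *r, uint8_t *out) {
    if (r->head == r->tail) {
        return false;  // empty
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) & (RX_BUF_SIZE - 1u);
    return true;
}
```

Unlike uart_getc(), which returns 0 both for "no data" and for a received NUL byte, the separate bool return here keeps those two cases apart.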

Implementing Standard I/O Functions

Now let’s implement a basic printf-like function in stdio.c:

#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "lib/stdio.h"
#include "lib/uart.h"
#include "lib/string.h"

// Buffer sizes
#define PRINTF_BUFFER_SIZE 1024
#define MAX_INT_DIGITS 21  // For 64-bit integer

// Basic kprintf implementation
int kprintf(const char *format, ...) {
    char buffer[PRINTF_BUFFER_SIZE];
    char *buf_ptr = buffer;
    va_list args;
    va_start(args, format);

    while (*format != '\0' && (size_t)(buf_ptr - buffer) < PRINTF_BUFFER_SIZE - 1) {
        if (*format == '%') {
            format++;

            // Skip flags and field width (e.g. the "016" in "%016llx").
            // This simple version accepts them so that arguments stay in
            // sync with the format string, but it does not apply padding.
            while (*format == '-' || (*format >= '0' && *format <= '9')) {
                format++;
            }

            // Check for length modifiers
            bool is_long = false;
            bool is_longlong = false;

            if (*format == 'l') {
                is_long = true;
                format++;
                if (*format == 'l') {
                    is_longlong = true;
                    is_long = false;
                    format++;
                }
            }

            // A '%' at the very end of the string has no specifier:
            // emit it and stop before reading past the terminator
            if (*format == '\0') {
                *buf_ptr++ = '%';
                break;
            }

            // Process format specifier
            switch (*format) {
                case 'c':
                    *buf_ptr++ = (char)va_arg(args, int);
                    break;
                case 's': {
                    const char *str = va_arg(args, const char*);
                    if (str == NULL) str = "(null)";
                    size_t len = strlen(str);
                    if (len > PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1)
                        len = PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1;
                    memcpy(buf_ptr, str, len);
                    buf_ptr += len;
                    break;
                }
                case 'd':
                case 'i': {
                    int64_t value;
                    if (is_longlong)
                        value = va_arg(args, int64_t);
                    else if (is_long)
                        value = va_arg(args, long);
                    else
                        value = va_arg(args, int);

                    // Convert to string, filling num_buf from the end backwards
                    char num_buf[MAX_INT_DIGITS];
                    char *num_ptr = num_buf + MAX_INT_DIGITS - 1;
                    *num_ptr = '\0';

                    bool is_negative = value < 0;
                    // Negate in unsigned arithmetic so INT64_MIN doesn't overflow
                    uint64_t abs_value = is_negative ? 0 - (uint64_t)value : (uint64_t)value;

                    do {
                        *--num_ptr = '0' + (abs_value % 10);
                        abs_value /= 10;
                    } while (abs_value > 0);

                    if (is_negative) {
                        *--num_ptr = '-';
                    }

                    size_t len = strlen(num_ptr);
                    if (len > PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1)
                        len = PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1;
                    memcpy(buf_ptr, num_ptr, len);
                    buf_ptr += len;
                    break;
                }
                case 'u': {
                    uint64_t value;
                    if (is_longlong)
                        value = va_arg(args, uint64_t);
                    else if (is_long)
                        value = va_arg(args, unsigned long);
                    else
                        value = va_arg(args, unsigned int);

                    // Convert to string
                    char num_buf[MAX_INT_DIGITS];
                    char *num_ptr = num_buf + MAX_INT_DIGITS - 1;
                    *num_ptr = '\0';

                    do {
                        *--num_ptr = '0' + (value % 10);
                        value /= 10;
                    } while (value > 0);

                    size_t len = strlen(num_ptr);
                    if (len > PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1)
                        len = PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1;
                    memcpy(buf_ptr, num_ptr, len);
                    buf_ptr += len;
                    break;
                }
                case 'x':
                case 'X': {
                    uint64_t value;
                    if (is_longlong)
                        value = va_arg(args, uint64_t);
                    else if (is_long)
                        value = va_arg(args, unsigned long);
                    else
                        value = va_arg(args, unsigned int);

                    // Convert to hex string
                    char num_buf[MAX_INT_DIGITS];
                    char *num_ptr = num_buf + MAX_INT_DIGITS - 1;
                    *num_ptr = '\0';

                    const char *hex_chars = (*format == 'x') ? "0123456789abcdef" : "0123456789ABCDEF";

                    do {
                        *--num_ptr = hex_chars[value & 0xF];
                        value >>= 4;
                    } while (value > 0);

                    size_t len = strlen(num_ptr);
                    if (len > PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1)
                        len = PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1;
                    memcpy(buf_ptr, num_ptr, len);
                    buf_ptr += len;
                    break;
                }
                case 'p': {
                    void *ptr = va_arg(args, void*);

                    // Print "0x" prefix
                    if (buf_ptr + 2 < buffer + PRINTF_BUFFER_SIZE - 1) {
                        *buf_ptr++ = '0';
                        *buf_ptr++ = 'x';
                    }

                    // Convert to hex string
                    uint64_t value = (uint64_t)ptr;
                    char num_buf[MAX_INT_DIGITS];
                    char *num_ptr = num_buf + MAX_INT_DIGITS - 1;
                    *num_ptr = '\0';

                    // Always print full pointer width (16 hex digits for 64-bit)
                    int digit_count = 0;
                    do {
                        *--num_ptr = "0123456789abcdef"[value & 0xF];
                        value >>= 4;
                        digit_count++;
                    } while (value > 0 || digit_count < 16);

                    size_t len = strlen(num_ptr);
                    if (len > PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1)
                        len = PRINTF_BUFFER_SIZE - (buf_ptr - buffer) - 1;
                    memcpy(buf_ptr, num_ptr, len);
                    buf_ptr += len;
                    break;
                }
                case '%':
                    *buf_ptr++ = '%';
                    break;
                default:
                    // Unsupported format specifier: copy it through unchanged,
                    // but only if both characters (plus the terminator) still fit
                    if ((size_t)(buf_ptr - buffer) < PRINTF_BUFFER_SIZE - 2) {
                        *buf_ptr++ = '%';
                        *buf_ptr++ = *format;
                    }
                    break;
            }
        } else {
            *buf_ptr++ = *format;
        }

        format++;
    }

    // Null-terminate the buffer
    *buf_ptr = '\0';

    // Output the buffer through UART
    char *output_ptr = buffer;
    while (*output_ptr) {
        uart_putc(*output_ptr++);
    }

    va_end(args);
    return buf_ptr - buffer;
}

293
// Get a character (non-blocking)
294
char kgetc(void) {
295
    return uart_getc();
296
}
297

298
// Simple blocking character read
299
char kgetc_blocking(void) {
300
    char c;
301
    while ((c = uart_getc()) == 0) {
302
        // Wait for character
303
        asm volatile("yield");
304
    }
305

306
    // Convert CR (Enter key) to LF for processing
307
    if (c == '\r') {
308
        return '\n';
309
    }
310

311
    return c;
312
}

Implementing printf: Format String Parsing

Our kprintf implementation parses format strings character by character:

┌─────────────────────────────────────────────────────────────┐
│ Format String: "Value: %d, Hex: 0x%x"                       │
└───────────┬─────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────┐
│ Parser State Machine                        │
│                                             │
│       ┌───────┐      '%'      ┌───────┐     │
│ ─────▶│ Normal│─────────────▶ │ Format│     │
│       │ Text  │               │ Spec  │     │
│       └───────┘     format    └───┬───┘     │
│          ▲          parsed        │         │
│          └────────────────────────┘         │
└─────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────┐
│ Output: "Value: 42, Hex: 0x2A"              │
└─────────────────────────────────────────────┘

This code:

Implements kprintf for formatted console output
Supports basic format specifiers: %c, %s, %d, %x, %p
Uses varargs for handling variable argument lists
Includes length modifiers for different integer sizes
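
To make the state machine concrete, here is a minimal sketch of the same idea that can be compiled and tested on a host machine. It is not the kprintf from this guide: mini_format, put_char and put_unsigned are hypothetical names, it writes into a caller-supplied buffer instead of the UART, and it only handles %d, %x, %s and %%. Every write goes through one bounds-checked helper, so the buffer cannot overflow.

```c
#include <stdarg.h>
#include <stddef.h>

/* Append one character, always keeping a byte free for the terminator. */
static void put_char(char *buf, size_t cap, size_t *pos, char c) {
    if (*pos + 1 < cap) {
        buf[(*pos)++] = c;
    }
}

/* Append an unsigned value in base 10 or 16 (lowercase hex digits). */
static void put_unsigned(char *buf, size_t cap, size_t *pos,
                         unsigned long long v, unsigned base) {
    char tmp[24]; /* a 64-bit value needs at most 20 decimal digits */
    size_t n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0) {
        put_char(buf, cap, pos, tmp[--n]);
    }
}

/* Hypothetical helper: formats into buf and returns the length written. */
size_t mini_format(char *buf, size_t cap, const char *fmt, ...) {
    size_t pos = 0;
    va_list args;

    if (cap == 0) {
        return 0;
    }
    va_start(args, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {              /* "Normal Text" state */
            put_char(buf, cap, &pos, *fmt);
            continue;
        }
        fmt++;                          /* "Format Spec" state */
        if (*fmt == '\0') {             /* stray '%' at the end of the string */
            put_char(buf, cap, &pos, '%');
            break;
        }
        switch (*fmt) {
        case 'd': {
            int v = va_arg(args, int);
            unsigned long long mag = (unsigned long long)v;
            if (v < 0) {
                put_char(buf, cap, &pos, '-');
                mag = 0ULL - mag;       /* magnitude, safe even for INT_MIN */
            }
            put_unsigned(buf, cap, &pos, mag, 10);
            break;
        }
        case 'x':
            put_unsigned(buf, cap, &pos, va_arg(args, unsigned int), 16);
            break;
        case 's': {
            const char *s = va_arg(args, const char *);
            while (*s != '\0') {
                put_char(buf, cap, &pos, *s++);
            }
            break;
        }
        case '%':
            put_char(buf, cap, &pos, '%');
            break;
        default:                        /* unsupported: copy through */
            put_char(buf, cap, &pos, '%');
            put_char(buf, cap, &pos, *fmt);
            break;
        }
    }
    va_end(args);
    buf[pos] = '\0';
    return pos;
}
```

Routing every write through put_char is the design choice worth noticing: truncation is handled in one place, so no individual format case can forget the bounds check.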

Troubleshooting I/O Issues

Common UART/printf issues and how to diagnose them:

No output from UART:

Check UART initialization (baud rate, format settings)
Verify UART registers are mapped to correct physical addresses
Try direct register writes to UART_DR as a test

Garbled UART output:

Check baud rate settings match expected clock frequency
Verify line settings (parity, stop bits, word length)
Check for buffer overflows in printf implementation

Printf format handling issues:

Add bounds checking to prevent buffer overflows
Check for missing null termination in generated strings
Verify format specifiers are fully supported

With I/O functions in place, let’s build a shell interface for user interaction.

8. Building a Basic Shell

A shell provides a command-line interface for interacting with the OS. We'll implement one in shell.c, after a quick look at how shells work.

How Command-Line Shells Work

A shell runs a continuous Read-Evaluate-Print Loop (REPL). First, it waits for and reads user input, typically one line at a time, echoing each character as you type and handling special keys like backspace. Once you press Enter, it parses the command line into arguments by splitting on spaces - the first token becomes the command name, and the rest become parameters. The shell then looks up this command in its command table and executes the corresponding function, passing the parsed arguments. After the command completes and prints its output, the shell loops back to wait for the next command. The loop processes one command at a time and, in MeringueOS, never exits.

┌───────────────────────────────────────────────────────────────┐
│ Shell REPL Cycle                                              │
│                                                               │
│           ┌──────────┐                                        │
│           │  Read    │                                        │
│           │  Input   │                                        │
│           └────┬─────┘                                        │
│                │                                              │
│ ┌───────────┐  │   ┌────────────┐    ┌────────────────────┐   │
│ │  Wait for │◀─┘   │ Parse into │    │ Execute Command    │   │
│ │  Command  │      │ Arguments  │───▶│ Corresponding to   │   │
│ └───────────┘      └────────────┘    │ Command Table      │   │
│       ▲                              └─────────┬──────────┘   │
│       │                                        │              │
│       └────────────────────────────────────────┘              │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Here’s the implementation in shell.c:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include "kernel.h"
#include "shell/shell.h"
#include "lib/stdio.h"
#include "lib/string.h"
#include "lib/stdlib_stubs.h"
#include "memory/frame_alloc.h"
#include "memory/kheap.h"

#define MAX_CMD_LEN 128
#define MAX_ARGS 10

// --- Address Validation ---
// Needs access to PMM's knowledge of valid RAM regions.
// This is a simplified check against the highest known usable address.
bool is_address_valid(uint64_t addr, size_t len) {
    uint64_t highest_ram = pmm_get_highest_usable_address();
    // The start must lie within the known usable range.
    // Assumes PMM_RAM_BASE is the lowest valid RAM address.
    if (addr < PMM_RAM_BASE || addr >= highest_ram) {
        return false;
    }
    // The end must lie within it too. Comparing len against the space that
    // remains avoids computing addr + len, which can wrap around for a huge
    // len and slip past a naive end-address check.
    if (len > highest_ram - addr) {
        return false;
    }
    return true;
}

// --- Shell Commands ---

void cmd_memdump(int argc, char **argv) {
    if (argc < 2) {
        kprintf("Usage: memdump <address> [length]\n");
        return;
    }

    char *endptr;
    uint64_t addr = simple_strtoull(argv[1], &endptr, 0);
    if (*endptr != '\0') {
        kprintf("Error: Invalid address format '%s'\n", argv[1]);
        return;
    }

    size_t length = 256; // Default length
    if (argc > 2) {
        length = (size_t)simple_strtoull(argv[2], &endptr, 0);
        if (*endptr != '\0') {
            kprintf("Error: Invalid length format '%s'\n", argv[2]);
            return;
        }
    }

    if (length == 0) return;

    // Validate the entire range
    if (!is_address_valid(addr, length)) {
        kprintf("Error: Address range 0x%llx - 0x%llx is not within valid RAM.\n",
                addr, addr + length - 1);
        return;
    }

    kprintf("Memory dump from 0x%llx (length %zu):\n", addr, length);

    volatile uint8_t *ptr = (volatile uint8_t *)addr;
    for (size_t i = 0; i < length; i += 16) {
        kprintf("%016llx: ", addr + i);
        // Print hex bytes
        for (size_t j = 0; j < 16; ++j) {
            if (i + j < length) {
                kprintf("%02x ", ptr[i + j]);
            } else {
                kprintf("   ");
            }
            if (j == 7) kprintf(" "); // Add extra space in the middle
        }
        kprintf(" |");
        // Print ASCII chars
        for (size_t j = 0; j < 16; ++j) {
            if (i + j < length) {
                char c = ptr[i + j];
                kprintf("%c", (c >= 32 && c <= 126) ? c : '.');
            } else {
                kprintf(" ");
            }
        }
        kprintf("|\n");
    }
}

void cmd_peek(int argc, char **argv) {
    if (argc < 2) {
        kprintf("Usage: peek <address> [size: b/h/w/d (default: d)]\n");
        return;
    }

    char *endptr;
    uint64_t addr = simple_strtoull(argv[1], &endptr, 0);
    if (*endptr != '\0') {
        kprintf("Error: Invalid address format '%s'\n", argv[1]);
        return;
    }

    char size_char = 'd'; // Default to double word (64-bit)
    size_t size_bytes = 8;
    if (argc > 2) {
        size_char = argv[2][0];
        if (strlen(argv[2]) != 1) {
            kprintf("Error: Invalid size format '%s'\n", argv[2]);
            return;
        }
    }

    switch (size_char) {
        case 'b': size_bytes = 1; break; // Byte
        case 'h': size_bytes = 2; break; // Half-word (16-bit)
        case 'w': size_bytes = 4; break; // Word (32-bit)
        case 'd': size_bytes = 8; break; // Double-word (64-bit)
        default:
            kprintf("Error: Invalid size '%c'. Use b, h, w, or d.\n", size_char);
            return;
    }

    // Validate address for the specified size
    if (!is_address_valid(addr, size_bytes)) {
        kprintf("Error: Address 0x%llx is not within valid RAM for size %zu.\n", addr, size_bytes);
        return;
    }

    // Ensure address alignment for larger types
    if ((size_bytes > 1) && (addr % size_bytes != 0)) {
        kprintf("Warning: Address 0x%llx is not aligned for size %zu.\n", addr, size_bytes);
        // Proceeding might cause alignment fault depending on CPU config / EL
    }

    uint64_t value = 0;
    volatile void *ptr = (volatile void *)addr;

    kprintf("Peek at 0x%llx (size %zu): ", addr, size_bytes);

    switch (size_bytes) {
        case 1: value = *(volatile uint8_t*)ptr; kprintf("0x%02llx\n", value); break;
        case 2: value = *(volatile uint16_t*)ptr; kprintf("0x%04llx\n", value); break;
        case 4: value = *(volatile uint32_t*)ptr; kprintf("0x%08llx\n", value); break;
        case 8: value = *(volatile uint64_t*)ptr; kprintf("0x%016llx\n", value); break;
    }
}

void cmd_poke(int argc, char **argv) {
    if (argc < 3) {
        kprintf("Usage: poke <address> <value> [size: b/h/w/d (default: d)]\n");
        return;
    }

    char *endptr;
    uint64_t addr = simple_strtoull(argv[1], &endptr, 0);
    if (*endptr != '\0') {
        kprintf("Error: Invalid address format '%s'\n", argv[1]);
        return;
    }

    uint64_t value = simple_strtoull(argv[2], &endptr, 0);
    if (*endptr != '\0') {
        kprintf("Error: Invalid value format '%s'\n", argv[2]);
        return;
    }

    char size_char = 'd'; // Default to double word (64-bit)
    size_t size_bytes = 8;
    if (argc > 3) {
        size_char = argv[3][0];
        if (strlen(argv[3]) != 1) {
            kprintf("Error: Invalid size format '%s'\n", argv[3]);
            return;
        }
    }

    switch (size_char) {
        case 'b': size_bytes = 1; break; // Byte
        case 'h': size_bytes = 2; break; // Half-word (16-bit)
        case 'w': size_bytes = 4; break; // Word (32-bit)
        case 'd': size_bytes = 8; break; // Double-word (64-bit)
        default:
            kprintf("Error: Invalid size '%c'. Use b, h, w, or d.\n", size_char);
            return;
    }

    // Validate address for the specified size
    if (!is_address_valid(addr, size_bytes)) {
        kprintf("Error: Address 0x%llx is not within valid RAM for size %zu.\n", addr, size_bytes);
        return;
    }

    // Ensure address alignment for larger types
    if ((size_bytes > 1) && (addr % size_bytes != 0)) {
        kprintf("Warning: Address 0x%llx is not aligned for size %zu.\n", addr, size_bytes);
        // Proceeding might cause alignment fault depending on CPU config / EL
    }

    volatile void *ptr = (volatile void *)addr;

    kprintf("Poke at 0x%llx (size %zu) with value 0x%llx\n", addr, size_bytes, value);

    switch (size_bytes) {
        case 1: *(volatile uint8_t*)ptr = (uint8_t)value; break;
        case 2: *(volatile uint16_t*)ptr = (uint16_t)value; break;
        case 4: *(volatile uint32_t*)ptr = (uint32_t)value; break;
        case 8: *(volatile uint64_t*)ptr = (uint64_t)value; break;
    }
}

void cmd_alloc(int argc, char **argv) {
    if (argc < 2) {
        kprintf("Usage: alloc <size>\n");
        return;
    }

    char *endptr;
    size_t size = (size_t)simple_strtoull(argv[1], &endptr, 0);
    if (*endptr != '\0' || size == 0) {
        kprintf("Error: Invalid size '%s'\n", argv[1]);
        return;
    }

    void *ptr = kmalloc(size);
    if (ptr) {
        // kprintf's %p already emits the "0x" prefix
        kprintf("Allocated %zu bytes at %p\n", size, ptr);
    } else {
        kprintf("Allocation failed!\n");
    }
}

void cmd_free(int argc, char **argv) {
    if (argc < 2) {
        kprintf("Usage: free <address>\n");
        return;
    }

    char *endptr;
    uint64_t addr = simple_strtoull(argv[1], &endptr, 0);
    if (*endptr != '\0') {
        kprintf("Error: Invalid address format '%s'\n", argv[1]);
        return;
    }

    // Note: the address is passed to kfree unchecked, so it must be a
    // pointer previously returned by alloc
    void *ptr = (void *)addr;
    kprintf("Freeing memory at %p\n", ptr);
    kfree(ptr);
}

void cmd_help(int argc, char **argv) {
    (void)argc; // Unused
    (void)argv;
    kprintf("Available commands:\n");
    kprintf("  help          - Display this help message\n");
    kprintf("  memdump <addr> [len] - Dump memory contents (default len=256)\n");
    kprintf("  peek <addr> [sz] - Read value from memory (sz=b/h/w/d, default=d)\n");
    kprintf("  poke <addr> <val> [sz] - Write value to memory (sz=b/h/w/d, default=d)\n");
    kprintf("  alloc <size>  - Allocate memory of given size\n");
    kprintf("  free <addr>   - Free previously allocated memory\n");
    kprintf("  pmm_info      - Display Physical Memory Manager info\n");
}

void cmd_pmm_info(int argc, char **argv) {
    (void)argc; // Unused
    (void)argv;
    kprintf("Physical Memory Manager Info:\n");
    kprintf("  Total Usable Memory: %llu KB\n", pmm_get_total_memory() / 1024);
    kprintf("  Free Memory:         %llu KB\n", pmm_get_free_memory() / 1024);
    kprintf("  Highest Usable Addr: 0x%llx\n", pmm_get_highest_usable_address());
}

// --- Shell Main Loop ---

// Table of command handlers
typedef struct {
    const char *name;
    void (*func)(int argc, char **argv);
} command_t;

// Command array: 7 commands plus a NULL sentinel, filled in at runtime
static command_t commands[8];

// Runtime initialization of the command table to work around section initialization issues
static void init_command_table(void) {
    static char help_cmd[] = "help";
    static char memdump_cmd[] = "memdump";
    static char peek_cmd[] = "peek";
    static char poke_cmd[] = "poke";
    static char alloc_cmd[] = "alloc";
    static char free_cmd[] = "free";
    static char pmm_info_cmd[] = "pmm_info";

    commands[0].name = help_cmd;
    commands[0].func = cmd_help;

    commands[1].name = memdump_cmd;
    commands[1].func = cmd_memdump;

    commands[2].name = peek_cmd;
    commands[2].func = cmd_peek;

    commands[3].name = poke_cmd;
    commands[3].func = cmd_poke;

    commands[4].name = alloc_cmd;
    commands[4].func = cmd_alloc;

    commands[5].name = free_cmd;
    commands[5].func = cmd_free;

    commands[6].name = pmm_info_cmd;
    commands[6].func = cmd_pmm_info;

    // Sentinel
    commands[7].name = NULL;
    commands[7].func = NULL;

    kprintf("Command table initialized:\n");
    for (int i = 0; commands[i].name != NULL; i++) {
        // strlen returns size_t, so print it with %zu rather than %d
        kprintf("  [%d] name at %p: '%s', len=%zu\n",
                i, commands[i].name, commands[i].name,
                strlen(commands[i].name));
    }
}

void shell_loop(void) {
    char cmd_buffer[MAX_CMD_LEN];
    char *argv[MAX_ARGS];
    int argc;

    kprintf("\nMeringueOS Shell\n");
    kprintf("Type 'help' for available commands.\n");

    // Initialize command table at runtime
    init_command_table();

    // Debug command table
    kprintf("Command table at %p:\n", commands);
    kprintf("Debug: .rodata section address range: %p to %p\n", &_rodata_start, &_rodata_end);
    for (int i = 0; commands[i].name != NULL; i++) {
        kprintf("  [%d] name at %p: '%s', func at %p\n",
                i, commands[i].name, commands[i].name, commands[i].func);
    }

    while (1) {
        kprintf("> ");
        memset(cmd_buffer, 0, MAX_CMD_LEN);
        int i = 0;
        char c;

        // Read command line
        while (i < MAX_CMD_LEN - 1) {
            c = kgetc_blocking(); // Use a blocking read
            if (c == '\r' || c == '\n') {
                kprintf("\n"); // Echo newline
                break;
            } else if (c == '\b' || c == 127) { // Handle backspace
                if (i > 0) {
                    i--;
                    kprintf("\b \b"); // Erase character on screen
                }
            } else if (c >= 32 && c <= 126) { // Printable ASCII
                cmd_buffer[i++] = c;
                kprintf("%c", c); // Echo character
            }
            // Ignore other characters
        }
        cmd_buffer[i] = '\0'; // Null-terminate

        if (i == 0) {
            continue; // Empty command
        }

        // Parse command and arguments using strtok
        argc = 0;
        char *token = strtok(cmd_buffer, " ");
        while (token != NULL && argc < MAX_ARGS) {
            argv[argc++] = token;
            token = strtok(NULL, " ");
        }

        if (argc == 0) {
            continue; // Only whitespace
        }

        // Find and execute command
        bool found = false;

        // Debug info
        kprintf("Command entered: '%s'\n", argv[0]);

        // Use a separate index so the line-length counter i is not shadowed
        for (int cmd = 0; commands[cmd].name != NULL; cmd++) {
            kprintf("Comparing with command: '%s'\n", commands[cmd].name);
            if (strcmp(argv[0], commands[cmd].name) == 0) {
                kprintf("Match found! Executing...\n");
                commands[cmd].func(argc, argv);
                found = true;
                break;
            }
        }

        if (!found) {
            kprintf("Unknown command: %s\n", argv[0]);
        }
    }
}

Command Parsing Algorithm

The shell parses command lines using this algorithm:

Skip leading whitespace
Mark the start of an argument
Find the next whitespace or end of string
Replace the whitespace with a null terminator
Add the argument to the argv array
Repeat until the end of the string

Input:  "  memdump   0x40100000  64 "
         ↓
Step 1:  "memdump   0x40100000  64 "
         ↓
Step 2:  argv[0] = "memdump   0x40100000  64 "
         ↓
Step 3:  Find next whitespace after "memdump"
         ↓
Step 4:  "memdump\0  0x40100000  64 "
         ↓
Step 5:  argv[0] = "memdump"
         ↓
         Repeat for remaining arguments...
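
The six steps above can also be written directly, without strtok. The sketch below is an illustration rather than the code shell.c uses; split_args is a hypothetical name, and it treats only the space character as whitespace, as the shell does.

```c
#include <stddef.h>

/* Hypothetical helper: splits line in place and returns the argument count.
 * argv entries point into line, so line must stay alive while they are used. */
int split_args(char *line, char **argv, int max_args) {
    int argc = 0;
    char *p = line;

    while (argc < max_args) {
        while (*p == ' ') {             /* step 1: skip leading whitespace */
            p++;
        }
        if (*p == '\0') {               /* step 6: end of the string */
            break;
        }
        argv[argc++] = p;               /* steps 2 and 5: mark and record */
        while (*p != '\0' && *p != ' ') {
            p++;                        /* step 3: find the end of the token */
        }
        if (*p == '\0') {
            break;                      /* last token, already terminated */
        }
        *p++ = '\0';                    /* step 4: terminate the token */
    }
    return argc;
}
```

Because it keeps no static state, this version can be called from more than one place at once, which strtok cannot.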

Design Decision: Command Table Structure

I implement a command table with function pointers for extensibility.

Alternative: A giant switch statement or if-else chain.

Tradeoffs:

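Command table: More modular and easier to extend (a new command is one table entry), at the cost of an indirect call through a function pointer
Switch statement or if-else chain: Less indirection, but C cannot switch on strings, so in practice it becomes a chain of strcmp calls that must be edited for every new command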

I chose the command table approach for its modularity and to demonstrate function pointer usage in C.
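
For comparison, here is a minimal sketch of the same table built with a compile-time initializer and looked up through a helper. It assumes the linker script and loader place initialized data correctly, which is exactly what the runtime initialization in shell.c works around. The handler names, last_called and find_command are hypothetical, and the host string.h stands in for lib/string.h.

```c
#include <stddef.h>
#include <string.h>

typedef void (*cmd_fn)(int argc, char **argv);

typedef struct {
    const char *name;
    cmd_fn func;
} command_t;

/* Illustration only: records which handler ran last. */
static int last_called;

static void cmd_help(int argc, char **argv) {
    (void)argc; (void)argv;
    last_called = 1;
}

static void cmd_hello(int argc, char **argv) {
    (void)argc; (void)argv;
    last_called = 2;
}

/* Adding a command means adding one line here; the lookup never changes. */
static const command_t commands[] = {
    { "help",  cmd_help  },
    { "hello", cmd_hello },
};

/* Returns the matching table entry, or NULL for an unknown command. */
const command_t *find_command(const char *name) {
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(name, commands[i].name) == 0) {
            return &commands[i];
        }
    }
    return NULL;
}
```

Deriving the entry count from sizeof removes the need for a sentinel and for the hard-coded array length.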

Implementing Memdump Command

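The memdump command demonstrates several important OS concepts. A sample run is shown below with the hex column only; the implementation above also appends an ASCII column to each row: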

Command:  memdump 0x40100000 64

┌───────────────────────────────────────────────────────────────┐
│ 0x40100000: 60 02 00 d4 00 00 00 00 e1 03 00 aa e2 03 01 aa   │
│ 0x40100010: e3 03 02 aa e4 03 03 aa e5 03 04 aa e6 03 05 aa   │
│ 0x40100020: e7 03 06 aa e8 03 07 aa e9 03 08 aa ea 03 09 aa   │
│ 0x40100030: eb 03 0a aa ec 03 0b aa ed 03 0c aa ee 03 0d aa   │
└───────────────────────────────────────────────────────────────┘

To implement this safely:

The address and length are parsed, then the whole range is validated with is_address_valid before any byte is read
Memory is accessed through volatile pointers to prevent optimization
Both hex and ASCII representations are shown for clarity

Debugging Shell Issues

Common shell issues and how to diagnose them:

Command parsing fails:

Add debug output showing tokenization results
Check for buffer overflows during parsing
Verify edge cases (empty input, excessive whitespace)

Commands not found:

Check string comparison logic
Verify command table is properly initialized
Debug with explicit comparisons of each character

Memory-related commands crash:

Implement robust address validation before access
Add range checking to prevent accessing unmapped regions
Use volatile qualifiers for register and device access
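
Range checks are easy to get subtly wrong, because addr + len can wrap around to a small value and pass a naive end-address test. Below is a minimal sketch of an overflow-safe check. range_in_ram is a hypothetical name, and the base and limit are passed in as parameters so the function can be tested on a host; in the kernel they would come from the PMM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the byte range [addr, addr + len) lies entirely inside
 * [base, limit). A zero-length range is valid if addr itself is in bounds. */
bool range_in_ram(uint64_t addr, size_t len, uint64_t base, uint64_t limit) {
    if (addr < base || addr >= limit) {
        return false;
    }
    /* addr < limit here, so limit - addr cannot wrap. Comparing len with
     * the remaining space never computes addr + len at all. */
    return len <= limit - addr;
}
```

The trick is to rearrange the inequality so that the only arithmetic is a subtraction already known to be safe.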

9. Standard Library Implementation

A standard library provides essential functions for memory and string operations.

Writing a LibC from Scratch

MeringueOS implements its own C standard library functions from scratch, providing the fundamental building blocks for string and memory operations. The memset and memcpy functions handle raw memory manipulation byte-by-byte - while production systems optimize with word-sized operations, our implementation prioritizes clarity and correctness. String functions like strlen count characters until hitting a null terminator, while strcmp compares strings character-by-character, carefully casting to unsigned to handle high-bit characters correctly. The strtok function tokenizes strings by maintaining static state between calls (making it non-reentrant), using helper functions strspn and strpbrk to skip delimiters and find token boundaries. These implementations form the foundation that the rest of the kernel relies on - from parsing shell commands to managing memory blocks.

String Library

Let’s look at string.c:

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "lib/string.h"

void* memset(void *s, int c, size_t n) {
    unsigned char *p = (unsigned char *)s;
    unsigned char uc = (unsigned char)c;
    while (n-- > 0) {
        *p++ = uc;
    }
    return s;
}

void* memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;
    while (n-- > 0) {
        *d++ = *s++;
    }
    return dest;
}

size_t strlen(const char *s) {
    size_t len = 0;
    while (*s++) {
        len++;
    }
    return len;
}

int strcmp(const char *s1, const char *s2) {
    while (*s1 && (*s1 == *s2)) {
        s1++;
        s2++;
    }
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

int strncmp(const char *s1, const char *s2, size_t n) {
    while (n > 0 && *s1 && (*s1 == *s2)) {
        s1++;
        s2++;
        n--;
    }
    if (n == 0) {
        return 0;
    }
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

// Note: strtok is not re-entrant due to the static pointer!
static char *strtok_last;

char* strtok(char *str, const char *delim) {
    char *token;

    if (str == NULL) {
        str = strtok_last;
    }
    if (str == NULL) {
        return NULL; // No more tokens
    }

    // Skip leading delimiters
    str += strspn(str, delim);
    if (*str == '\0') {
        strtok_last = NULL;
        return NULL;
    }

    // Find the end of the token
    token = str;
    str = strpbrk(token, delim);
    if (str == NULL) {
        // This token is the last one
        strtok_last = NULL;
    } else {
        // Terminate the token and save the position for the next call
        *str = '\0';
        strtok_last = str + 1;
    }

    return token;
}

// Helper for strtok: length of initial segment consisting of accept characters
size_t strspn(const char *s, const char *accept) {
    size_t count = 0;
    const char *a;
    bool found;

    while (*s) {
        found = false;
        for (a = accept; *a; a++) {
            if (*s == *a) {
                found = true;
                break;
            }
        }
        if (!found) {
            break;
        }
        count++;
        s++;
    }
    return count;
}

// Helper for strtok: find first occurrence of reject character
char* strpbrk(const char *s, const char *reject) {
    const char *r;
    while (*s) {
        for (r = reject; *r; r++) {
            if (*s == *r) {
                return (char *)s;
            }
        }
        s++;
    }
    return NULL;
}

These functions provide basic string operations:

memset: Set memory to a value
memcpy: Copy memory regions
strlen: Get string length
strcmp/strncmp: Compare strings
strtok: Tokenize strings
strspn: Get length of substring with accepted characters
strpbrk: Find character from set in string
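
Since strtok keeps its position in a static pointer, two tokenizations cannot be interleaved. A reentrant variant in the style of POSIX strtok_r moves that state into a caller-supplied pointer. The sketch below is not part of MeringueOS: k_strtok_r is a hypothetical name, and it uses the host's strspn and strpbrk where the kernel would use its own versions from lib/string.h.

```c
#include <stddef.h>
#include <string.h>

/* Reentrant tokenizer sketch: the caller owns the state in *saveptr. */
char *k_strtok_r(char *str, const char *delim, char **saveptr) {
    char *end;

    if (str == NULL) {
        str = *saveptr;                 /* continue a previous tokenization */
    }
    if (str == NULL) {
        return NULL;                    /* no more tokens */
    }

    str += strspn(str, delim);          /* skip leading delimiters */
    if (*str == '\0') {
        *saveptr = NULL;
        return NULL;
    }

    end = strpbrk(str, delim);          /* find the end of the token */
    if (end == NULL) {
        *saveptr = NULL;                /* this token is the last one */
    } else {
        *end = '\0';                    /* terminate and remember the rest */
        *saveptr = end + 1;
    }
    return str;
}
```

Each caller passes its own saveptr, so a command handler could tokenize one of its arguments while the shell is still walking the command line.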

Design Decision: String.h Implementation

When implementing string functions, I opted for safety and simplicity.

Alternative: Highly optimized SIMD or word-by-word implementations.

Tradeoffs:

Simple byte-by-byte: Easier to understand, safer, but slower
Optimized word-by-word: Much faster but more complex and architecture-specific

I chose the simpler approach for educational purposes, but production kernels would use highly optimized implementations.

Memory Functions: The Foundation

Memory functions like memcpy and memset are some of the most frequently called functions in an OS:

┌───────────────────────────────────────────────────────────────┐
│ memcpy(dest, src, n)                                          │
│                                                               │
│ Source Memory                       Destination Memory        │
│ ┌───┬───┬───┬───┬───┬───┐          ┌───┬───┬───┬───┬───┬───┐  │
│ │ A │ B │ C │ D │ E │ F │   Copy   │ ? │ ? │ ? │ ? │ ? │ ? │  │
│ └───┴───┴───┴───┴───┴───┘   ───▶   └───┴───┴───┴───┴───┴───┘  │
│                                                               │
│ After memcpy:                                                 │
│ ┌───┬───┬───┬───┬───┬───┐          ┌───┬───┬───┬───┬───┬───┐  │
│ │ A │ B │ C │ D │ E │ F │          │ A │ B │ C │ D │ E │ F │  │
│ └───┴───┴───┴───┴───┴───┘          └───┴───┴───┴───┴───┴───┘  │
└───────────────────────────────────────────────────────────────┘

Under the Hood: Optimizing Memory Operations

In production kernels, memory operations use several optimization techniques:

Alignment handling:

Check source and destination alignment
Use aligned word-sized operations when possible
Fall back to byte operations for misaligned portions

SIMD instructions:

Use vector operations (like NEON on ARM) for bulk transfers
Process 16 or 32 bytes per instruction

Cache considerations:

Use prefetch instructions for large transfers
Consider cache line size for optimal performance
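
As a taste of the alignment-handling technique, here is a sketch of a fill routine that writes 8 bytes at a time once the pointer is aligned. It is an illustration, not the memset used in MeringueOS. fill_words and word64_t are hypothetical names, and the may_alias attribute is a GCC/Clang extension that I am assuming is available so the 64-bit stores are permitted on a byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 64-bit type allowed to alias any object. */
typedef uint64_t __attribute__((__may_alias__)) word64_t;

void *fill_words(void *dst, int c, size_t n) {
    unsigned char *p = (unsigned char *)dst;
    unsigned char byte = (unsigned char)c;
    /* Replicate the byte into all eight lanes of a 64-bit word. */
    uint64_t word = 0x0101010101010101ULL * byte;

    /* Head: byte stores until p reaches an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)p & 7u) != 0) {
        *p++ = byte;
        n--;
    }
    /* Body: one aligned 64-bit store per 8 bytes. */
    while (n >= 8) {
        *(word64_t *)p = word;
        p += 8;
        n -= 8;
    }
    /* Tail: the remaining 0 to 7 bytes. */
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

The head/body/tail split is the same shape that NEON-based routines use, only with wider stores in the body.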

Stdlib Stubs

For completeness, we provide stubs for standard library functions in stdlib_stubs.c:

#include <stdbool.h>
#include <stdint.h>
#include <limits.h>
#include "lib/stdlib_stubs.h"
#include "lib/stdio.h"

// Simple strtoull: supports bases 2 through 16. Base 0 auto-detects
// hexadecimal (0x prefix) and otherwise falls back to decimal.
unsigned long long simple_strtoull(const char *nptr, char **endptr, int base) {
    unsigned long long result = 0;
    bool any_digits = false;
    const char *orig_nptr = nptr;

    // Skip leading whitespace
    while (*nptr == ' ' || *nptr == '\t' || *nptr == '\n' ||
           *nptr == '\r' || *nptr == '\f' || *nptr == '\v') {
        nptr++;
    }

    // Accept an optional sign. A '-' is consumed but otherwise ignored,
    // since the result is unsigned.
    if (*nptr == '+' || *nptr == '-') {
        nptr++;
    }

    // Determine base
    if ((base == 0 || base == 16) && *nptr == '0' && (nptr[1] == 'x' || nptr[1] == 'X')) {
        base = 16;
        nptr += 2;
    } else if (base == 0) {
        base = 10;
    }

    if (base < 2 || base > 16) { // Simplified: only support up to base 16
        if (endptr) *endptr = (char *)orig_nptr;
        return 0; // Invalid base
    }

    unsigned long long cutoff = UINT64_MAX / base;
    unsigned long long cutlim = UINT64_MAX % base;

    while (true) {
        int digit;
        char c = *nptr;

        if (c >= '0' && c <= '9') {
            digit = c - '0';
        } else if (c >= 'a' && c <= 'z') {
            digit = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'Z') {
            digit = c - 'A' + 10;
        } else {
            break; // Invalid character
        }

        if (digit >= base) {
            break; // Invalid digit for base
        }
        any_digits = true;

        // Basic overflow check
        if (result > cutoff || (result == cutoff && (unsigned long long)digit > cutlim)) {
            result = UINT64_MAX; // Indicate overflow
            // Consume remaining valid digits for endptr
            while (true) {
                c = *(++nptr);
                if (c >= '0' && c <= '9') digit = c - '0';
                else if (c >= 'a' && c <= 'z') digit = c - 'a' + 10;
                else if (c >= 'A' && c <= 'Z') digit = c - 'A' + 10;
                else break;
                if (digit >= base) break;
            }
            break; // Exit main loop after overflow
        }

        result = result * base + digit;
        nptr++;
    }

    if (endptr) {
        // With no digits consumed, report the original string so callers
        // that test *endptr can reject inputs such as "-" or "0x"
        *endptr = (char *)(any_digits ? nptr : orig_nptr);
    }

    return result;
}

// Simple wrapper for unsigned long
unsigned long simple_strtoul(const char *nptr, char **endptr, int base) {
    unsigned long long val = simple_strtoull(nptr, endptr, base);
    // Clamp to ULONG_MAX; this only matters where unsigned long is
    // narrower than 64 bits
    if (val > ULONG_MAX) {
        return ULONG_MAX;
    }
    return (unsigned long)val;
}

// Basic abort function - enters infinite loop
void abort(void) {
    kprintf("ABORT CALLED!\n");
    // Disable interrupts?
    while(1) {}
}

These stubs provide:

simple_strtoull: Convert string to unsigned long long
simple_strtoul: Convert string to unsigned long
abort: Print an error message and halt in an infinite loop

Troubleshooting Standard Library Issues

Common standard library bugs and how to diagnose them:

String function crashes:

Add NULL pointer checks in all functions
Verify bounds checking for fixed-size buffers
Check for proper null termination

Memory corruption with memcpy:

Verify source and destination don’t overlap, or use memmove
Check that size parameter is correct
Verify alignment requirements are met

Integer conversion issues:

Add explicit overflow checking in conversion functions
Use appropriate integer types for the expected range
Implement bounds checking for all conversions
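Explicit overflow checking means testing whether the next step would exceed the type's range before performing it. Here is a minimal sketch for the digit-accumulation step of a string-to-integer conversion; accumulate_digit is a hypothetical helper, not a function from the guide.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: computes *result = acc * base + digit.
// Returns false, leaving *result untouched, if the value would not fit
// in 64 bits or the inputs are invalid.
bool accumulate_digit(uint64_t acc, unsigned base, unsigned digit,
                      uint64_t *result) {
    if (base == 0 || digit >= base) {
        return false;
    }
    // acc * base + digit <= UINT64_MAX  <=>  acc <= (UINT64_MAX - digit) / base
    if (acc > (UINT64_MAX - digit) / base) {
        return false;
    }
    *result = acc * base + digit;
    return true;
}
```

The comparison is done with a division, so the check itself can never overflow.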

10. Advanced Topics

Now that we have the core OS working, let’s look at some advanced topics.

Text-Based UI

MeringueOS includes a simple text-based UI layer in tui.c:

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include "ui/tui.h"
#include "lib/stdio.h"

// This is a placeholder implementation for the TUI
// In the full version, this would integrate with a framebuffer

int tui_init(void) {
    kprintf("TUI: Initializing...\n");
    kprintf("TUI: Using fallback console mode (no framebuffer)\n");

    // Return success for now - we'll just use UART output
    return 0;
}

void tui_write(const char *data, size_t len) {
    // For now, just forward to UART output via kprintf
    for (size_t i = 0; i < len; i++) {
        kprintf("%c", data[i]);
    }
}

The TUI module:

Forwards text output to the UART through kprintf
Could be extended to use a framebuffer in the future
Provides a clean abstraction for console output

ANSI Terminal Control

While our current UI is UART-based, it can be extended with ANSI escape sequences:

┌───────────────────────────────────────────────────────────────┐
│ ANSI Terminal Control                                         │
│                                                               │
│ Code          Effect                                          │
├───────────────┬───────────────────────────────────────────────┤
│ \033[2J       │ Clear the entire screen                       │
│ \033[H        │ Move cursor to top-left corner                │
│ \033[<n>A     │ Move cursor up n lines                        │
│ \033[<n>B     │ Move cursor down n lines                      │
│ \033[<n>C     │ Move cursor forward n characters              │
│ \033[<n>D     │ Move cursor backward n characters             │
│ \033[<n>;<m>H │ Move cursor to row n, column m                │
│ \033[0m       │ Reset all attributes                          │
│ \033[1m       │ Bold/bright                                   │
│ \033[31m      │ Red text                                      │
│ \033[42m      │ Green background                              │
└───────────────┴───────────────────────────────────────────────┘
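As a rough illustration of how these sequences could be wrapped, the sketch below formats them into a caller-supplied buffer. The macro and function names are assumptions, and it uses the hosted snprintf so it can be tried on a development machine; inside the kernel the same format strings would go through kprintf, assuming it supports %d.

```c
#include <stdio.h>

// Fixed sequences from the table above (names are hypothetical)
#define ANSI_CLEAR_SCREEN "\033[2J"
#define ANSI_CURSOR_HOME  "\033[H"
#define ANSI_RESET        "\033[0m"

// Formats a cursor-position sequence (1-based row and column) into buf.
// Returns the number of characters written, or -1 if buf is too small.
int ansi_cursor_to(char *buf, size_t size, int row, int col) {
    int n = snprintf(buf, size, "\033[%d;%dH", row, col);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

// Formats a single attribute sequence, e.g. 31 for red text
// or 42 for a green background.
int ansi_attr(char *buf, size_t size, int code) {
    int n = snprintf(buf, size, "\033[%dm", code);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting buffer can then be handed to tui_write along with the returned length.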

Extending the System

MeringueOS can be extended in several ways:

Process Management

To implement processes, you’d need:

Process Control Block (PCB): Data structure tracking process state

   typedef struct pcb {
       uint64_t pid;              // Process ID
       process_state_t state;     // Running, Ready, Blocked, etc.
       saved_registers_t context; // Saved CPU context
       void *stack_pointer;       // Process stack
       void *page_table;          // Virtual memory mappings
       struct pcb *next;          // For scheduler list
   } pcb_t;

Context Switching: Mechanism to save/restore process state

   ┌─────────────┐     ┌────────────────┐     ┌─────────────┐
   │ Process A   │     │ Scheduler      │     │ Process B   │
   │ Running     │────▶│ 1. Save A state│────▶│ Restore B   │
   │             │     │ 2. Pick next   │     │ state       │
   └─────────────┘     └────────────────┘     └─────────────┘

Scheduler: Algorithm to determine which process runs next

Round-robin: Give each process equal time slices
Priority-based: Higher priority processes run first
Multi-level feedback queue: Adaptive scheduling based on behavior
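To make the round-robin option concrete, here is a minimal sketch of the "pick next" step. It assumes the PCBs are linked into a circular list through their next pointers and uses a trimmed-down PCB; the state names and rr_pick_next are hypothetical, not part of MeringueOS.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { PROC_READY, PROC_RUNNING, PROC_BLOCKED } process_state_t;

// Trimmed-down PCB for illustration; a real one would also hold the
// saved registers, stack pointer, and page table.
typedef struct pcb {
    uint64_t pid;
    process_state_t state;
    struct pcb *next;   // Assumed to form a circular list of all processes
} pcb_t;

// Returns the first READY process after `current` in the circular list.
// Falls back to `current` if it is the only runnable process,
// or NULL if nothing can run.
pcb_t *rr_pick_next(pcb_t *current) {
    if (current == NULL) {
        return NULL;
    }
    for (pcb_t *p = current->next; p != current; p = p->next) {
        if (p->state == PROC_READY) {
            return p;
        }
    }
    // Wrapped around without finding another candidate
    if (current->state == PROC_RUNNING || current->state == PROC_READY) {
        return current;
    }
    return NULL;
}
```

A timer interrupt handler would typically call a function like this when the current time slice expires, then perform the context switch to the returned process.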

Virtual Memory System

A virtual memory system would require:

Page Tables: Data structures mapping virtual to physical addresses

   ┌───────────────────┐     ┌───────────────────────────────┐
   │ Virtual Address   │────▶│ Page Table                    │
   │ 0xFFFF000000000000│     │ VA: 0xFFFF000000000000        │
   └───────────────────┘     │ PA: 0x40200000                │
                             │ Permissions: Read/Write/Exec  │
                             └───────────────────────────────┘
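The lookup itself works by slicing the virtual address into table indices. The sketch below assumes a 4 KB granule with 48-bit virtual addresses, which gives four levels of 512-entry tables; that configuration and the names used are assumptions for illustration, since this guide does not fix them.

```c
#include <stdint.h>

#define INDEX_MASK 0x1FFu    // 9 bits: 512 entries per table
#define PAGE_MASK  0xFFFu    // 12 bits: offset within a 4 KB page

typedef struct {
    unsigned l0, l1, l2, l3;  // Table index at each translation level
    unsigned offset;          // Byte offset within the page
} va_parts_t;

// Splits a virtual address into its per-level indices (hypothetical helper).
va_parts_t split_va(uint64_t va) {
    va_parts_t p;
    p.l0 = (unsigned)((va >> 39) & INDEX_MASK);  // VA bits [47:39]
    p.l1 = (unsigned)((va >> 30) & INDEX_MASK);  // VA bits [38:30]
    p.l2 = (unsigned)((va >> 21) & INDEX_MASK);  // VA bits [29:21]
    p.l3 = (unsigned)((va >> 12) & INDEX_MASK);  // VA bits [20:12]
    p.offset = (unsigned)(va & PAGE_MASK);       // VA bits [11:0]
    return p;
}
```

Each index selects one 8-byte descriptor in the table for that level, and the final descriptor supplies the physical page to which the offset is added.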

MMU Configuration: Setting up translation control registers

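   // Enable the MMU with the page tables we've created.
   // MAIR_EL1 and TCR_EL1 must already be configured before this is called.
   void enable_mmu(uint64_t ttbr0_base) {
       // Set Translation Table Base Register (TTBR0 covers the low VA range;
       // high addresses such as 0xFFFF... are translated through TTBR1_EL1)
       asm volatile("msr ttbr0_el1, %0" : : "r" (ttbr0_base));
       asm volatile("isb");

       // Set the M bit in the System Control Register to turn translation on
       uint64_t sctlr;
       asm volatile("mrs %0, sctlr_el1" : "=r" (sctlr));
       sctlr |= (1UL << 0);  // M bit: enable MMU
       asm volatile("msr sctlr_el1, %0" : : "r" (sctlr));
       asm volatile("isb");  // Ensure the change takes effect before continuing
   }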
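Before the M bit is set, the memory attribute and translation control registers also need values. The sketch below only computes candidate values for MAIR_EL1 and TCR_EL1 so the bit layout is visible; the macro names, the choice of attribute slots, and the TTBR0-only configuration are assumptions rather than the settings MeringueOS would use.

```c
#include <stdint.h>

// MAIR_EL1 holds eight 8-bit attribute slots; slot n is bits [8n+7:8n].
#define MAIR_ATTR(idx, val)  ((uint64_t)(val) << ((idx) * 8))
#define MAIR_DEVICE_nGnRnE   0x00u   // Device memory, no gathering/reordering
#define MAIR_NORMAL_WB       0xFFu   // Normal memory, write-back cacheable

// Selected TCR_EL1 fields
#define TCR_T0SZ(n)      ((uint64_t)(n) & 0x3F)      // TTBR0 VA size = 64 - T0SZ
#define TCR_IRGN0_WBWA   (1ULL << 8)    // Inner cacheable table walks
#define TCR_ORGN0_WBWA   (1ULL << 10)   // Outer cacheable table walks
#define TCR_SH0_INNER    (3ULL << 12)   // Inner shareable table walks
#define TCR_TG0_4K       (0ULL << 14)   // 4 KB granule for TTBR0
#define TCR_EPD1         (1ULL << 23)   // Disable walks through TTBR1_EL1
#define TCR_IPS(n)       (((uint64_t)(n) & 0x7) << 32)  // Physical address size

// Slot 0: device memory, slot 1: normal cacheable memory (an arbitrary choice)
uint64_t make_mair(void) {
    return MAIR_ATTR(0, MAIR_DEVICE_nGnRnE) | MAIR_ATTR(1, MAIR_NORMAL_WB);
}

// va_bits: virtual address width for TTBR0 (e.g. 48)
// ips:     encoded physical address size (e.g. 2 selects 40 bits)
uint64_t make_tcr(unsigned va_bits, unsigned ips) {
    return TCR_T0SZ(64 - va_bits) | TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |
           TCR_SH0_INNER | TCR_TG0_4K | TCR_EPD1 | TCR_IPS(ips);
}
```

On the target these values would be written with msr mair_el1 and msr tcr_el1, followed by an isb, before the MMU is switched on.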

TLB Management: Invalidating translations after page table changes

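   void invalidate_tlb_entry(uint64_t va) {
       uint64_t arg = (va >> 12) & ((1UL << 44) - 1);  // Operand holds VA[55:12] in bits [43:0]
       asm volatile("dsb ishst" : : : "memory");       // Make the page table update visible first
       asm volatile("tlbi vaae1, %0" : : "r" (arg));
       asm volatile("dsb sy" : : : "memory");          // Wait for the invalidation to complete
       asm volatile("isb");
   }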

11. Conclusion

In this guide, we’ve built MeringueOS from scratch, implementing:

A boot process for AArch64
Memory management with physical frames and a kernel heap
Exception handling with a complete vector table
UART-based I/O
A command-line shell
Standard library functions
A text-based UI

You now have a working operating system for the ARM architecture and an understanding of how its components interact. This is just the beginning - there are many ways to extend and improve MeringueOS.

Building an OS from scratch teaches valuable lessons about:

Hardware-software interactions
Memory management principles
System architecture and design
Low-level optimization
Debugging complex systems

The most important takeaway is the depth of understanding you gain by implementing these fundamental components yourself.

Happy system programming!
