STM32 Series #6: The Complete Pointer Accident Handbook — How and Why Things Break

In Episode 5 we fully internalized that a pointer is a typed address. Declaration *, dereference *, casts, the -> operator — we mastered the correct way to use them all.

This time we flip to the dark side. What actually happens when you get a pointer wrong?

HardFault, undefined behavior (UB), silent memory corruption: these sit behind the all-too-familiar embedded symptoms “it was working fine and suddenly broke,” “the debugger stopped at a strange place,” and “only the release build reproduces it.” Very often the root cause is a pointer accident.

Knowing how things break is the most powerful way to learn. This time we deliberately step on the land mines and watch what happens under the debugger.

📖 Previous Article

#5: Pointers = Addresses with Types — Turning Pointers into a Weapon

📍 Series Index

Full 13-Part Series: The Embedded World Beyond Pointers

✅ What You'll Be Able to Do After This Article

Explain the mechanism by which NULL dereference causes a HardFault
Understand why a dangling pointer "breaks silently"
Explain with a stack diagram why returning the address of a local variable is dangerous
Understand why out-of-bounds array access doesn't produce a compile error
Grasp the concept of undefined behavior (UB) in C and its relationship to optimization
Read the fault location and Fault Status registers in the debugger when a HardFault occurs

Experiment Environment and Ground Rules

This time we’re writing code that deliberately breaks things. Some accident patterns only reproduce in a release build (-O2) and not in a debug build (-O0).

Switching Build Configurations in STM32CubeIDE

Having Debug and Release build configurations is standard practice in Eclipse-based IDEs (STM32CubeIDE is one of them). Visual Studio, Keil MDK, and other IDEs have the same concept. STM32CubeIDE provides both configurations by default.

🐛 Debug Build (-O0)

Used during normal development and debugging

Click the hammer icon ▼ in the toolbar
Select Debug
Build Project (Ctrl+B)

📍 Check/change optimization level:
Project → Properties → C/C++ Build → Settings →
MCU GCC Compiler → Optimization → -O0

🚀 Release Build (-O2/-Os)

Used for shipping products and reproducing optimization bugs

Click the hammer icon ▼ in the toolbar
Select Release
Build Project (Ctrl+B)

📍 Check/change optimization level:
Same path → Optimization → -O2 or -Os

Switching the configuration for debug execution: After building, click the green bug icon ▼ and select the .elf file for the configuration you want to use.

Debug/project.elf    ← Debug build output (use this with the debugger)
Release/project.elf  ← Release build output

⚠️ Note: In a Release build, optimization often places variables in registers, so the debugger's variable watch may not display values correctly. You can use it to confirm the broken behavior, but stepping through code won't be reliable.

What Changes Between Optimization Levels

“Optimization” means the compiler rewrites your code to achieve the same behavior faster and smaller. The higher the level, the more the generated machine code diverges from what you wrote in C.

Overview of Each Level

Flag	CubeIDE label	Main effect
`-O0`	None	No optimization. C code is translated almost literally to machine code
`-O1`	Optimize	Removes side-effect-free computations, simple inlining
`-O2`	Optimize more	Loop unrolling, function inlining, variable register promotion
`-O3`	Optimize most	More aggressive inlining and vectorization (rare in embedded)
`-Os`	Optimize for size	Code size priority (disables some `-O2` optimizations)

What Changes Between `-O0` and `-O2`

Let’s look at a concrete example:

int loop_sum(void) {
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += i;
    }
    return sum;
}

What -O0 generates (conceptual):

// Translates the for loop to machine code step by step
// (simplified: real -O0 output also loads and stores sum and i on the stack every iteration)
mov  r3, #0        // sum = 0
mov  r2, #0        // i = 0
loop:
  add  r3, r3, r2  // sum += i
  add  r2, r2, #1  // i++
  cmp  r2, #4      // i < 4 ?
  blt  loop        // branch
mov  r0, r3        // return sum

What -O2 generates (conceptual):

// The compiler pre-computed "sum is always 6"
// The entire loop disappears
mov  r0, #6        // return 6 (computed at compile time)

The loop shrank to one instruction. That’s the power of optimization.

How Optimization Changes the Way Bugs Appear

This is the most important point in this article. Let’s understand the two cases that produce “works in debug build, breaks in release build.”

Case 1: The compiler “relocates” a variable from memory to a register

The setup: a timer interrupt sets flag to 1, and the main loop waits for it before proceeding.

// ── Global variable ──────────────────────────────
uint32_t flag = 0;   // shared between interrupt and main loop

// ── Interrupt handler (called automatically by the timer) ──
void TIM2_IRQHandler(void) {
    flag = 1;   // ← writes 1 to flag in RAM
}

// ── Main loop ───────────────────────────────
int main(void) {
    // ... initialization ...

    while (flag == 0) {
        // wait here until flag becomes 1
    }

    // continue once flag == 1
    do_something();
}

How this differs between -O0 and -O2, from the CPU’s perspective:

The CPU has ultra-fast working scratch pads called registers (r0–r12, etc.). They are the fastest storage the CPU has, faster than RAM, but there are only a handful of them. The compiler decides which values to keep in registers.

[Inside CPU]  super fast       [Outside CPU]  slow, large
+---------------------+        +---------------------------+
|  Registers          |        |  RAM                      |
|    r0 = 0           |<------>|  0x20000000: flag = 0     |
|    r1 = ...         |        |  0x20000004: ...          |
+---------------------+        +---------------------------+
  CPU's working scratch pad        where data actually lives
  (few, ultra-fast)                (large, slower than CPU)

With -O0 (no optimization):

Each iteration of the while loop:
  1. Read flag from RAM  →  r0 = RAM[flag]   (check if 0 or 1)
  2. r0 == 0? → Yes → continue loop
  3. Interrupt fires, TIM2_IRQHandler writes 1 to flag in RAM
  4. Next iteration reads RAM again  →  r0 = 1
  5. r0 == 0? → No → exits loop ✅

With -O2 (optimized):

The compiler analyzes the while (flag == 0) loop body and concludes: “there is no code inside this loop that modifies flag.” It optimizes as follows:

Once at the start:
  r0 = RAM[flag]   (= 0)

Every iteration after that:
  r0 == 0? → Yes → continue loop   ← never reads RAM again, only r0
  r0 == 0? → Yes → continue loop
  r0 == 0? → Yes → continue loop
  ...（infinite loop）

Even when the interrupt fires and sets flag = 1 in RAM, the main loop is no longer looking at RAM. r0 still holds the 0 it read at the start. Result: infinite loop.

With -O0, RAM is re-read every iteration so it works correctly. With -O2, RAM is never read again so it breaks. This is the true cause of “works in debug build.”

The fix: Add volatile to flag.

volatile uint32_t flag = 0;   // ← just add volatile

volatile is the instruction: “This variable may change due to external factors (like interrupts) that the compiler doesn’t know about. Always read from memory every time.” This forces -O2 to re-read RAM every iteration.
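The fixed pattern can be tried on a PC before touching the board. The following is a minimal sketch under stated assumptions: `g_flag`, `fake_timer_isr`, and `wait_for_flag` are hypothetical names, a plain function stands in for the real interrupt handler, and the poll limit is an addition that the original loop does not have.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the (simulated) interrupt and the main loop.
 * volatile forces a fresh load from memory on every read. */
static volatile uint32_t g_flag = 0;

/* Stand-in for TIM2_IRQHandler; hypothetical name for this sketch. */
void fake_timer_isr(void) {
    g_flag = 1;
}

/* Poll the flag at most max_polls times.
 * Returns true if the flag was seen set, false on timeout. */
bool wait_for_flag(uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (g_flag != 0) {
            return true;
        }
    }
    return false;
}
```

Bounding the wait is a design choice, not a requirement of volatile: it simply keeps a missing interrupt from hanging the main loop forever.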

Case 2: The compiler deletes an entire branch it judges “will never be reached”

int32_t x = INT32_MAX;   // x = 2,147,483,647 (maximum value)
x = x + 1;               // ① signed integer overflow → UB

if (x < 0) {             // ② will x be negative?
    error_handler();
}

The compiler follows the C language specification and “reasons” as follows:

C spec: "Signed integer overflow is undefined behavior"
  ↓
Compiler: "A correct program never triggers UB, so ① x + 1 does not overflow"
  ↓
Compiler: "Without overflow, x + 1 is greater than x, so the result cannot be negative"
  ↓
Compiler: "② x < 0 is always false → this if-branch is unnecessary"
  ↓
Compiler: may delete the entire call to error_handler()

With -O0, the code is typically executed as written: the addition wraps to a negative value on this CPU, so error_handler() is called. With -O2, the compiler may conclude that the branch is “logically unnecessary” and remove that code entirely.

The same C code behaves in completely opposite ways depending on the optimization level.
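One way to avoid this trap is to test for overflow before doing the arithmetic, so the UB never happens and the compiler has nothing to reason away. The sketch below assumes a hypothetical helper name, `checked_add_i32`; it is one possible approach, not the only one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds a and b only if the result fits in int32_t.
 * The range checks themselves cannot overflow. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    if ((b > 0 && a > INT32_MAX - b) ||
        (b < 0 && a < INT32_MIN - b)) {
        return false;   /* would overflow: report it instead of computing */
    }
    *out = a + b;
    return true;
}
```

A caller would then write `if (!checked_add_i32(x, 1, &x)) { error_handler(); }`, which keeps the error path alive at every optimization level.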

💡 The common message in both cases:
The compiler optimizes under the assumption that C code is written correctly.
Using a variable that can change from outside without volatile, or writing code that contains UB — these betray that assumption. The result is the most painful class of bug: works in debug build, breaks in release build.

Each experiment is labeled with these symbols:

Symbol	Meaning
💥 Immediate crash	Running this almost certainly causes a HardFault or runaway
🕵️ Breaks silently	Appears to work but data is being corrupted internally
🎲 UB (undefined behavior)	Result varies by compiler; the most treacherous pattern

All experiment code is written inside main() in main.c, or as standalone functions. Work through them while observing with the STM32CubeIDE debugger (ST-Link).

💥 Accident 1: NULL Dereference

What Happens

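When you dereference a NULL pointer (a pointer holding 0x00000000), the access goes to address 0. On the STM32 this can be detected as an abnormal condition at the hardware level, in which case a HardFault exception is raised. Whether a fault actually fires depends on the kind of access, as the two patterns below show.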

💬 What is a HardFault exception?
The Cortex-M4 CPU’s emergency handler, called automatically when it detects an “unauthorized memory access” or “invalid instruction.” It’s similar to a “Segmentation fault” on a PC OS. On a microcontroller without an OS, the default behavior is to enter the HardFault_Handler infinite loop, halting the program completely.

// ❌ Pattern 1: Write to NULL pointer (may not fault in some environments)
uint32_t* ptr = NULL;
*ptr = 42;

// 💥 Pattern 2: Call a NULL function pointer (guaranteed HardFault)
void (*fn)(void) = NULL;
fn();   // Branch to address 0 with the Thumb bit clear → invalid state → HardFault

Why It Causes a HardFault

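In the STM32 (Cortex-M4) memory map, the area near address 0x00000000 is an alias of Flash (or of another boot memory, depending on the boot configuration). A write there is not a valid RAM access: depending on the device and its configuration, it may be rejected by the MPU (Memory Protection Unit), reported as a bus fault, or silently ignored.

💬 What is the MPU (Memory Protection Unit)?
A hardware feature that lets you configure rules like “writes to this address range are forbidden.” The STM32F401 has an MPU, but it’s disabled by default. With the MPU disabled, a write to address 0 is only caught if the bus reports an error (a BusFault, escalated to HardFault); as noted below, on the STM32F401 the write may instead be ignored without any fault.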

Cortex-M4 fault sequence:

ptr = 0x00000000
   ↓
*ptr = 42  // write instruction to address 0
   ↓
Bus or MPU detects access violation
   ↓
Jump to HardFault exception handler
   ↓
Default handler: infinite loop (while(1))

Observing in the Debugger

Open main.c and add the following code between /* USER CODE BEGIN 2 */ and /* USER CODE END 2 */. This area is a “user code protected zone” that won’t be erased when CubeMX regenerates code.

int main(void)
{
  HAL_Init();
  SystemClock_Config();
  MX_GPIO_Init();

  /* USER CODE BEGIN 2 */

  /* Experiment 1: NULL dereference (guaranteed HardFault method) */
  void (*fn)(void) = NULL;   // create a NULL function pointer
  fn();                      // ← put a breakpoint here and press F6 to single-step

  /* USER CODE END 2 */

  while (1)
  {
  }
}

💬 Why use a function pointer?
Writing *(uint32_t*)NULL = value; on the STM32F401 may silently fail without a HardFault, because address 0 is mapped as an alias of Flash and writes may be ignored. Calling a NULL function pointer (a branch to address 0 with the Thumb bit clear) puts the CPU into an invalid execution state, which reliably produces a HardFault. The INVSTATE mechanism is explained below.

Debug procedure:

Set a breakpoint on the fn(); line (double-click the line number → blue dot ●)
Start debug execution with F11 or the bug icon
When execution stops at the breakpoint, press F6 (step over) once
The debugger automatically jumps to and stops at HardFault_Handler

STM32CubeIDE’s debugger stops at HardFault_Handler.

Confirming the HardFault

💬 Reading the cause from SCB registers:
In the debugger, go to the Expressions tab → Add new expression and enter these addresses. While stopped in HardFault_Handler, values are highlighted in red.
*(uint32_t*)0xE000ED28
*(uint32_t*)0xE000ED2C
When calling a NULL function pointer (fn()), the actual values displayed are:

Register	Address	Measured (decimal)	Measured (hex)	Meaning
CFSR	`0xE000ED28`	131072	`0x00020000`	INVSTATE: tried to execute a non-Thumb instruction
HFSR	`0xE000ED2C`	1073741824	`0x40000000`	FORCED: a lower-priority fault was escalated to HardFault

💡 HardFault is an “escalated” error:
HardFault rarely occurs in isolation: in most cases it’s the result of a BusFault, MemManage fault, or UsageFault being escalated. Those three handlers are disabled after reset, so unless the firmware enables them, any such fault is promoted to HardFault. The FORCED bit in HFSR (0x40000000) means exactly that: “a configurable fault occurred and was escalated to HardFault.” In this case, INVSTATE is a type of UsageFault, escalated to HardFault.
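Reading these registers by eye gets tedious, so it can help to decode them in code. The sketch below uses the bit positions defined by the ARMv7-M architecture; the macro and function names (`HFSR_FORCED`, `is_escalated`, `is_invstate`, and so on) are chosen for this example and are not CMSIS identifiers. It only decodes values passed in, so it runs on a PC as well as on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks as defined by the ARMv7-M architecture (Cortex-M3/M4). */
#define HFSR_VECTTBL     (1u << 1)    /* fault while reading the vector table */
#define HFSR_FORCED      (1u << 30)   /* configurable fault escalated to HardFault */
#define UFSR_UNDEFINSTR  (1u << 0)    /* undefined instruction */
#define UFSR_INVSTATE    (1u << 1)    /* invalid execution state (Thumb bit clear) */

/* CFSR packs three status fields into one 32-bit word. */
static inline uint32_t cfsr_mmfsr(uint32_t cfsr) { return cfsr & 0xFFu; }            /* MemManage  */
static inline uint32_t cfsr_bfsr(uint32_t cfsr)  { return (cfsr >> 8) & 0xFFu; }     /* BusFault   */
static inline uint32_t cfsr_ufsr(uint32_t cfsr)  { return (cfsr >> 16) & 0xFFFFu; }  /* UsageFault */

/* True if the HardFault was escalated from another fault. */
bool is_escalated(uint32_t hfsr) {
    return (hfsr & HFSR_FORCED) != 0u;
}

/* True if the UsageFault field reports INVSTATE. */
bool is_invstate(uint32_t cfsr) {
    return (cfsr_ufsr(cfsr) & UFSR_INVSTATE) != 0u;
}
```

Fed the values measured above, `is_escalated(0x40000000)` and `is_invstate(0x00020000)` both return true, while the MemManage and BusFault fields of that CFSR value are zero.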

Why does INVSTATE occur?

ARM processors have two instruction sets (types of machine code the CPU can interpret):

Mode	Instruction width	Characteristics
ARM mode	32-bit fixed	Legacy instruction set. High performance but larger code size
Thumb mode	16/32-bit mixed	Compressed form of ARM mode. Memory-efficient and effective

Cortex-M series (what STM32 uses) only supports Thumb mode. ARM mode has been removed.

CPU series	Primary use	ARM mode	Thumb mode
Cortex-M (STM32, nRF5x, etc.)	Microcontrollers, embedded	❌ None	✅ Only
Cortex-A (Raspberry Pi, smartphones, etc.)	Linux, high-performance apps	✅ Yes	✅ Yes
Cortex-R (automotive, storage control, etc.)	Real-time, high-reliability	✅ Yes	✅ Yes
AVR (Arduino Uno’s ATmega)	Microcontrollers	—	— (proprietary ISA)

Cortex-M deliberately omits ARM mode to minimize code size and power consumption. The Thumb-2 extension it uses instead supports 32-bit instructions, so there’s virtually no performance penalty.

How does the CPU know to execute in Thumb mode? It looks at the least significant bit (LSB) of the branch target address:
Address 0x08000001 → LSB = 1 → Execute in Thumb mode (normal)
Address 0x08000000 → LSB = 0 → Execute in ARM mode (not supported on Cortex-M!)
Normally, the compiler and linker automatically set the LSB of function pointers to 1. But NULL (= 0x00000000) has LSB = 0, which means “execute in ARM mode.” Since Cortex-M has no ARM mode, an INVSTATE (Invalid State) UsageFault occurs and is escalated to HardFault.
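The LSB rule above is just a bit test, which a few lines of C can make concrete. This is an illustrative sketch with hypothetical helper names (`has_thumb_bit`, `instruction_address`); it works on plain 32-bit values rather than real function pointers so that it also runs on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a branch to this target would select Thumb state (LSB = 1). */
bool has_thumb_bit(uint32_t branch_target) {
    return (branch_target & 1u) != 0u;
}

/* The address the instruction is actually fetched from:
 * the branch target with the Thumb bit masked off. */
uint32_t instruction_address(uint32_t branch_target) {
    return branch_target & ~1u;
}
```

With the two example addresses, `has_thumb_bit(0x08000001)` is true and `has_thumb_bit(0x00000000)` (NULL) is false, which is the condition that leads to INVSTATE.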

⚠️ In real embedded development

Code with omitted NULL checks is running with a time bomb. It's "fine while it works," but the moment it calls an uninitialized function pointer or unregistered callback, it explodes. The HAL library takes a related precaution: its callbacks are defined as __weak empty functions, so there is always a valid function to call even when the application provides none.

Prevention

// ✅ Always NULL-check before using a pointer
if (ptr != NULL) {
    *ptr = 42;
}

// ✅ Explicitly use NULL during initialization to represent "not set"
typedef void (*callback_t)(void);
callback_t on_complete = NULL;   // intentional NULL

// Don't call before registering
if (on_complete != NULL) {
    on_complete();
}
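An alternative to checking at every call site is to make sure the pointer is never NULL in the first place. The sketch below is one possible design, with hypothetical names (`noop_callback`, `set_on_complete`, `notify_complete`, `demo_handler`): the callback starts out pointing at an empty function, and registering NULL falls back to that same function.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*callback_t)(void);

/* Empty default: always safe to call. */
static void noop_callback(void) { }

/* Never NULL: starts at the no-op and only ever holds valid functions. */
static callback_t on_complete = noop_callback;

/* Register a handler; NULL means "unregister" and restores the no-op. */
void set_on_complete(callback_t cb) {
    on_complete = (cb != NULL) ? cb : noop_callback;
}

/* Call site needs no NULL check. */
void notify_complete(void) {
    on_complete();
}

/* Demo handler used only to show that registration works. */
static uint32_t demo_calls = 0;
void demo_handler(void) {
    demo_calls++;
}
```

The trade-off is that a forgotten registration now fails quietly instead of loudly, so this approach suits optional callbacks better than mandatory ones.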

What is NULL?

NULL is a sentinel value meaning “a pointer that points nowhere.” Its actual value is just 0.

// Using NULL
uint32_t* ptr = NULL;

// This means exactly the same thing
uint32_t* ptr = 0;

Both express “ptr doesn’t point to anything.”

Common usage: Initialize a pointer to “not yet usable” state and NULL-check before use.

uint32_t* ptr = NULL;   // ← explicitly "not pointing to anything yet"

// ... later, assign an address to ptr ...

if (ptr != NULL) {
    *ptr = 42;   // ← use only after NULL check
}

This habit alone prevents the majority of NULL dereference accidents.

🕵️ Accident 2: Dangling Pointer

What is a “Dangling Pointer”?

A dangling pointer is “a pointer that was once valid but now points to an invalid location.”

💡 In embedded development, the “returning a local variable’s address” pattern (Accident 3 below) is far more commonly encountered than malloc. It’s a type of dangling pointer.

Most typical example (in PC-style code):

uint32_t* ptr = (uint32_t*)malloc(4);   // allocate memory
*ptr = 100;
free(ptr);                               // release memory

// ↓ ptr still holds the same address (dangling!)
*ptr = 200;   // 🕵️ write to already-freed memory

Why It Breaks Silently

After free(), ptr’s value (the address) doesn’t change. The write succeeds somewhere in memory. The result:

May not break immediately (the freed region hasn’t been reallocated yet)
Corrupts another variable or the malloc management structure
Surfaces much later as a mysterious bug

This is why it “breaks silently.”
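A common mitigation in PC-style code is to clear the pointer at the moment the memory is released, so that a later use hits the NULL check instead of stale memory. The helper below is a minimal sketch with a hypothetical name, `release_u32`; it only protects the one pointer variable passed in, and any copies of the address made elsewhere still dangle.

```c
#include <stdint.h>
#include <stdlib.h>

/* Frees the block and sets the caller's pointer to NULL.
 * Safe to call twice: free(NULL) is defined to do nothing. */
void release_u32(uint32_t **pp) {
    if (pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

Usage would look like `release_u32(&ptr);` followed by the usual `if (ptr != NULL)` guard before any further access.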

Reality in STM32 Development

Embedded code rarely uses malloc/free, but the same pattern appears in a different form:

💬 Why avoid malloc/free in bare-metal embedded?
① Memory fragmentation: repeated allocation and deallocation leaves free memory in small, unusable pieces, eventually making it impossible to allocate larger blocks.
② Fixed heap size: a microcontroller’s RAM is only tens to hundreds of KB, and if the heap runs out, malloc returns NULL.
③ Real-time disruption: malloc’s execution time is non-deterministic and can delay interrupt response.
In embedded, the basic design principle is “allocate all needed memory at compile time; don’t dynamically grow or shrink.”

// ❌ A global pointer pointing to a local variable on the stack
static uint32_t* g_sensor_ptr;

void init_sensor(void) {
    uint32_t local_val = 0;         // ← placed on the stack
    g_sensor_ptr = &local_val;      // ← save address in global
}   // ← local_val's lifetime ends here (stack frame disappears)

void read_sensor(void) {
    uint32_t val = *g_sensor_ptr;   // 🕵️ reads from an already-invalid address
}
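One way to repair this pattern is to give the pointed-to value static storage, so its lifetime covers the whole program. The sketch below reuses the names from the broken example but is an assumption about how a fix might look: `g_sensor_val` and `store_sensor` are added for illustration, and `read_sensor` returns the value here so the result can be checked.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t  g_sensor_val;          /* static storage: valid for the whole program */
static uint32_t *g_sensor_ptr = NULL;   /* NULL until init_sensor() has run */

void init_sensor(void) {
    g_sensor_val = 0;
    g_sensor_ptr = &g_sensor_val;       /* points at storage that never goes away */
}

/* Hypothetical writer, e.g. called after a conversion completes. */
void store_sensor(uint32_t value) {
    if (g_sensor_ptr != NULL) {
        *g_sensor_ptr = value;
    }
}

/* Returns the last stored value, or 0 if the sensor was never initialized. */
uint32_t read_sensor(void) {
    return (g_sensor_ptr != NULL) ? *g_sensor_ptr : 0u;
}
```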

This is closely related to the next accident: stack lifetime.

🕵️ Accident 3: Stack Lifetime (Returning the Address of a Local Variable)

One of the Most Commonly Stepped-On Land Mines

How to try it: Write two functions in main.c and call them from inside /* USER CODE BEGIN 2 */.

/* ---- Write above main() (inside USER CODE BEGIN 0) ---- */

// ❌ Function that returns the address of a local variable (intentionally broken)
uint32_t* get_value(void) {
    uint32_t result = 42;
    return &result;   // ⚠️ result disappears from the stack when this function ends
}


/* ---- Write inside main() (USER CODE BEGIN 2) ---- */

uint32_t* ptr = get_value();   // ptr points to an invalid address
uint32_t val = *ptr;           // 🕵️ reads "garbage" from the stack (42 is not guaranteed)

Building this, GCC issues a warning:

warning: function returns address of local variable [-Wreturn-local-addr]

Debugger observation steps:

Set a breakpoint on uint32_t* ptr = get_value();
Step (F6) to stop immediately after calling get_value()
Note the value (address) of ptr in the Variables view
Step (F6) once more to execute val = *ptr
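Check val: 42 is not guaranteed, and garbage may be read. Depending on the GCC version, ptr itself may even show 0x0, because some GCC versions replace the returned address of a local variable with a null pointer.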

Visualizing the Stack

💬 What is a stack frame?
Every time a function is called, a dedicated memory region (for local variables, return address, etc.) is pushed onto the top of the stack. This region is the “stack frame.” When the function returns, its frame is marked as “no longer needed” and removed from the stack (the SP just moves back; the data isn’t actually erased), ready to be overwritten by the next function call.

[While get_value() is executing]

Stack (RAM)                           Notes
+----------------------+
|  result = 42         |  <-- get_value's frame (SP points here)
|  return address      |      SP = stack pointer
+----------------------+
|  main_loop frame     |
+----------------------+

[After get_value() has returned]

Stack (RAM)                           Notes
+----------------------+
|  42 (remnant)        |  <-- frame is "freed" but value still remains
|  ...                 |      will be overwritten by next function call
+----------------------+  <-- SP returned here
|  main_loop frame     |
+----------------------+

ptr still points to the old result address → this is the dangling pointer!

After the function returns, its stack frame is “freed.” The next time any function is called, that same region gets overwritten.

Using GCC Warnings

GCC can detect this pattern:

warning: function returns address of local variable [-Wreturn-local-addr]

Enabling -Wall -Wextra in STM32CubeIDE’s build options will surface warnings like this at compile time. The habit of fixing all warnings prevents these accidents proactively.

Correct Patterns

// ✅ Pattern 1: Use a static variable (lifetime is the entire program)
uint32_t* get_value_safe(void) {
    static uint32_t result = 42;   // static → placed in static storage (RAM; initial value copied from Flash at startup)
    return &result;                // ← lifetime is the entire program, so this is safe
}

// ✅ Pattern 2: Write to the caller's buffer (the most embedded-style approach)
void get_value_ptr(uint32_t* out) {
    *out = 42;   // write into the variable the caller provided
}

// Usage
uint32_t val;
get_value_ptr(&val);

⚠️ Side effect of static: loss of reentrancy

A static variable exists exactly once for the whole program. In a design where the same function is also called from an interrupt handler, two contexts may access result simultaneously, causing a race condition:

// If a TIM interrupt fires while get_value_safe() is running in the main loop,
// and the interrupt handler also calls get_value_safe()...
// → two contexts writing to static result simultaneously → value corruption

Conclusion: For functions that might be called from an interrupt, Pattern 2 (pass via argument) is the embedded iron rule. Only use static where you're certain the function will never be called from an interrupt. The reentrancy problem is covered in greater depth in Episode 9 on interrupt anti-patterns.

💥 Accident 4: Out-of-Bounds Array Access

C Has No Bounds Checking

uint32_t buf[4] = {10, 20, 30, 40};

// Indices 0–3 are valid
buf[4] = 99;    // 💥 out of bounds (buf[4] doesn't exist)
buf[-1] = 99;   // 💥 negative index

💬 buf[4] and *(buf+4) are the same:
In C, buf[i] is shorthand for *(buf + i). buf[4] means “read/write the location 4 elements (= 16 bytes) forward from buf’s base address” — the compiler does not check whether that range is valid.
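The address arithmetic can be checked directly. This small sketch uses a hypothetical helper, `byte_offset`, that measures how far `base + i` lies from `base` in bytes. It only computes addresses and never dereferences them; the caller is assumed to keep `i` no greater than the array length, since forming a pointer further out than one past the end is itself not allowed.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in bytes between base and base + i.
 * Computes an address only; nothing is read or written. */
size_t byte_offset(const uint32_t *base, size_t i) {
    return (size_t)((const char *)(base + i) - (const char *)base);
}
```

For `uint32_t buf[4]`, `byte_offset(buf, 4)` is 16, which is exactly where a write to `buf[4]` would land.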

Python raises IndexError and Java throws ArrayIndexOutOfBoundsException. C omits runtime bounds checking to avoid the cost of a check on every access, which suits small microcontrollers. That’s both C’s strength and its danger.

💬 What is bounds checking?
An array has a “valid index range.” For uint32_t buf[4], valid indices are 0–3; 4 and -1 are invalid. Checking whether access stays within these boundaries is bounds checking. Python and Java do this automatically on every access and raise an error if violated. C does none of this — the programmer is responsible.

Destroying Variables on the Stack

💬 “On the stack”?
Local variables declared inside a function are automatically placed on the stack (a region of RAM). When variables are “on the stack,” they are laid out adjacent to each other in memory, so writing beyond an array’s boundary overwrites neighboring variables.
void example(void) {
    uint32_t buf[4];    // placed on the stack (local variable)
    uint32_t canary;    // placed immediately next to buf
}

When an array is on the stack, out-of-bounds writes corrupt neighboring stack variables or the return address:

💬 What is the return address?
When a function is called, the CPU saves “where to return in the caller” on the stack. This is the return address. When return is executed, the CPU jumps to that address. If it’s overwritten by an out-of-bounds access, return jumps to a completely unrelated address and the program goes haywire. (Stack buffer overflow attacks on PCs exploit exactly this.)

How to try it: Write a function in /* USER CODE BEGIN 0 */ and call it from /* USER CODE BEGIN 2 */.

/* ---- Write in USER CODE BEGIN 0 ---- */

void bad_function(void) {
    uint32_t buf[4];       // stack-placed array (valid indices 0–3)
    uint32_t canary = 0xCAFEBABE;  // sentinel variable placed right next to buf

    for (int i = 0; i <= 4; i++) {  // ← <= 4 is wrong! (< 4 is correct)
        buf[i] = 0xDEAD;            // when i=4, overwrites canary
    }

    // Check canary value → still 0xCAFEBABE? Or changed to 0xDEAD?
    volatile uint32_t check = canary;
    (void)check;
}


/* ---- Write in USER CODE BEGIN 2 ---- */

bad_function();

Observing in the Debugger

Set a breakpoint inside bad_function() and observe changes in the Variables view and Memory view.

Key thing to watch: the value of canary

After the loop runs, check canary in the Variables view. If it’s still 0xCAFEBABE, the write stayed within buf’s bounds. If it changed to 0 or another value, the buf[4] write overwrote canary.

Actual observed result:

i      = 4      ← loop ran through i=4 (should have stopped at i<4)
buf[0] = 57005  (= 0xDEAD)
buf[1] = 57005  (= 0xDEAD)
buf[2] = 57005  (= 0xDEAD)
buf[3] = 57005  (= 0xDEAD)
canary = 57005  (= 0xDEAD)  ← overwritten from 0xCAFEBABE! (evidence of the accident)

canary changed to the same 0xDEAD as buf. Proof that the write to buf[4] overwrote the adjacent canary. The debugger highlights changed variables in pink.

What happened:

The loop went one element past the array boundary and silently overwrote the adjacent canary.
No compile error, and at -O0 typically no warning either.

buf has 4 elements, so valid addresses are buf[0]–buf[3]. Each element is 4 bytes, so memory looks like this:

Address       Variable    Value after loop
0x20017fd8    buf[0]      0xDEAD   (written at i=0)
0x20017fdc    buf[1]      0xDEAD   (written at i=1)
0x20017fe0    buf[2]      0xDEAD   (written at i=2)
0x20017fe4    buf[3]      0xDEAD   (written at i=3)
0x20017fe8    canary      0xDEAD   (written at i=4) ← this is the problem!
              ↑
              same address as buf[4]

buf[4] means “the address 4×4=16 bytes past buf’s start.” C writes there regardless of whether that’s where canary lives. As a result, canary was overwritten from 0xCAFEBABE to 0xDEAD.

In real code, important variables sit where canary is

In this experiment, only the dummy canary was corrupted. But in real code, that location might hold:

Sensor readings
Communication buffers
Motor output values
The return address (the address to return to)

These get overwritten to 0xDEAD and no compile error or warning is produced. You end up chasing bugs like “why is the motor running away?” or “HardFault on return” — notoriously difficult to trace.

The Memory view shows 0xDEAD (little-endian: AD DE 00 00) filling the region past buf’s base address (0x20017fd8).

Prevention: Manage Array Size as a Constant

#define BUF_SIZE 4

uint32_t buf[BUF_SIZE];

for (int i = 0; i < BUF_SIZE; i++) {   // ← < BUF_SIZE (NOT <=!)
    buf[i] = 0;
}
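The same idea can be taken one step further by deriving the length from the array itself and routing writes through a function that checks the index. This is a sketch under stated assumptions: `ARRAY_LEN`, `g_buf`, `buf_set`, and `buf_fill` are hypothetical names, and the macro only works on true arrays, not on pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Element count of a true array (not valid for pointers). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static uint32_t g_buf[4];

/* Writes value at index only if the index is in range. */
bool buf_set(size_t index, uint32_t value) {
    if (index >= ARRAY_LEN(g_buf)) {
        return false;   /* out of bounds: refuse instead of corrupting memory */
    }
    g_buf[index] = value;
    return true;
}

/* The loop bound follows the array size automatically. */
void buf_fill(uint32_t value) {
    for (size_t i = 0; i < ARRAY_LEN(g_buf); i++) {
        g_buf[i] = value;
    }
}
```

If the array is later resized, neither the loop nor the check needs to be touched, which removes one place where an off-by-one can creep in.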

🎲 Accident 5: Undefined Behavior (UB)

What is UB?

The C language specification has a concept called “Undefined Behavior (UB)”.

In one sentence: “an operation for which the C specification imposes no requirements at all on what happens if you execute it.”

Why does this exist?
C was designed to run on any CPU. Different CPUs have different capabilities and limitations, so the spec marks certain operations as “undefined” — leaving it up to each compiler to make the optimal decision for its target CPU.

The critical point: UB does not cause an error.

Ordinary bug: compile error or runtime crash → immediately noticeable
UB           : may not error at all → ships undetected

Code containing UB produces inconsistent results depending on compiler and optimization level:

Situation	Result
Works as expected by chance at `-O0`	→ “It worked!” — false confidence
Crashes at `-O2`	→ “Why does only the release build break?”
Related code deleted entirely at `-O2`	→ Most dangerous. No error, the operation just doesn’t happen

The last case — “code silently disappears” — is the most dangerous UB in embedded. The compiler analyzes code under the assumption “UB won’t happen,” so it may conclude “code after a UB point will never execute” and delete it.

Representative UB Patterns

UB-1: Using an Uninitialized Pointer

uint32_t* ptr;          // ← uninitialized (contains garbage address)
uint32_t val = *ptr;    // 🎲 UB: completely unpredictable what happens

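ptr contains “garbage” from the stack. If it happens to point to an unmapped address, you’ll notice it via HardFault. But if it happens to look like a valid address (including 0x00000000, which aliases Flash on this device and can typically be read without a fault), it reads or writes somewhere else in memory entirely.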

UB-2: Signed Integer Overflow

int32_t x = INT32_MAX;   // 2147483647 (maximum value of a 32-bit signed integer)
x = x + 1;              // 🎲 UB: signed integer overflow is UB

💬 What is INT32_MAX?
A constant defined in <stdint.h>: the maximum value representable by int32_t, 2,147,483,647 (about 2.1 billion). On two’s-complement hardware, adding 1 to the raw bit pattern would give the most negative value, but the C spec says the behavior of that signed addition is “undefined.”

💬 What is wrap-around?
When a value exceeds its maximum and wraps back to the minimum (0). Unsigned integers (uint32_t) are guaranteed to wrap around by the C spec (UINT32_MAX + 1 = 0). Signed integers (int32_t) are NOT guaranteed to wrap — it’s UB.

Signed integer overflow is UB (unsigned uint32_t wraps around, so it’s not UB). GCC may optimize under the assumption that signed integers don’t overflow.
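The guaranteed wrap-around of unsigned types is what makes tick-counter arithmetic safe. The sketch below is an illustration with hypothetical names (`elapsed_ticks`, `wrap_inc`); on an STM32 the tick values would typically come from a free-running counter, which is an assumption and not something this code depends on.

```c
#include <stdint.h>

/* Ticks elapsed between start and now.
 * Unsigned subtraction wraps modulo 2^32, so the result stays
 * correct even if the counter rolled over in between. */
uint32_t elapsed_ticks(uint32_t start, uint32_t now) {
    return now - start;
}

/* Increment with defined wrap-around: UINT32_MAX + 1 becomes 0. */
uint32_t wrap_inc(uint32_t x) {
    return x + 1u;
}
```

For example, with `start = 0xFFFFFFF0` and `now = 0x00000010` the counter has rolled over once, and `elapsed_ticks` still returns `0x20`.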

UB-3: Hardware Registers Without volatile

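Also discussed in Episode 5. Strictly speaking, a missing volatile is not UB in the C-standard sense (Accident 6 draws the distinction), but the symptom is the same: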

// ❌ Reading a register without volatile
uint32_t* reg = (uint32_t*)0x40020014;   // GPIOA ODR
*reg = 0x01;

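while (*reg != 0x00) {   // ← compiler may judge "this condition never changes"
    // ... do nothing ...
}
// → with -O2, *reg may never be re-read: the loop spins forever on a stale value

The compiler sees that 0x01 was just written through reg and that no code in this function changes *reg afterwards. It may therefore treat the while condition as always true and emit a loop that never re-reads the register, so a change made by hardware or by an interrupt goes unnoticed. volatile prevents this.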

💬 What is volatile?
A keyword that tells the compiler “this memory may change externally at any time, in ways you don’t know about.” With volatile, the compiler never caches that variable in a register; every access forces an actual memory read/write. Always add it to hardware registers and variables modified by interrupts.
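For registers, the usual form is a pointer to volatile rather than a volatile variable. The sketch below is a host-runnable illustration under stated assumptions: `fake_odr` is an ordinary variable standing in for a real register address such as 0x40020014, and `reg_write`, `reg_read`, and `wait_until_zero` are hypothetical helpers, not HAL or CMSIS functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* On the target this would be a fixed address, for example
 * ((volatile uint32_t *)0x40020014). Here a variable stands in. */
static volatile uint32_t fake_odr;

/* Every call performs a real store to *reg. */
static inline void reg_write(volatile uint32_t *reg, uint32_t value) {
    *reg = value;
}

/* Every call performs a real load from *reg. */
static inline uint32_t reg_read(volatile uint32_t *reg) {
    return *reg;
}

/* Re-reads the register on each pass; gives up after max_polls reads.
 * Returns true if the register was seen as zero. */
bool wait_until_zero(volatile uint32_t *reg, uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        if (reg_read(reg) == 0u) {
            return true;
        }
    }
    return false;
}
```

Because the qualifier sits on the pointed-to type, the compiler has to emit one memory access per `reg_read` or `reg_write` call, whatever the optimization level.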

🎲 Accident 6: Unexpected Behavior Changes Due to Optimization

💬 How this differs from Accident 5:
Accident 5 involves “operations explicitly designated as UB by the C spec (signed integer overflow, uninitialized pointers, etc.).” Accident 6 involves “code that is valid C, but where the absence of volatile or similar causes optimization to produce unintended behavior.” Both produce the same symptom — “works in debug build, breaks in release build” — so learn them together.

Works in Debug Build but Breaks in Release Build

Most of the nastiest “hard to reproduce” bugs in embedded development are caused by differences in optimization level.

Example: Shared Variable Without volatile (Foreshadowing Interrupts)

// ❌ Variable shared between interrupt handler and main loop, without volatile
uint32_t flag = 0;   // no volatile!

// Interrupt handler (TIM2_IRQHandler, etc.)
void TIM2_IRQHandler(void) {
    flag = 1;   // set by interrupt
}

// main loop
while (flag == 0) {
    // waiting for something
}
// → at -O2, flag may be loaded into a register once and reused
// → even when the interrupt sets flag to 1 in RAM, the register still holds 0
// → infinite loop

At -O0, RAM is re-read on every iteration, so the bug goes unnoticed. At -O2, the compiler is allowed to hoist the load out of the loop, and the wait can then spin forever.

The Correct Way

// ✅ Add volatile to tell the compiler "read from memory every time"
volatile uint32_t flag = 0;

volatile will be covered in detail at the assembly level in Episode 12 (Optimization and Assembly). For now, remember: “add volatile to any variable touched by interrupts or hardware.”
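Putting the handler and the main-loop side together, a minimal sketch could look like this. The names tick_flag and consume_tick are hypothetical. A real TIM2 handler must also clear the timer's interrupt flag in the peripheral, which is omitted here so the snippet stays independent of the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt handler and the main loop. */
static volatile uint32_t tick_flag = 0;

/* Interrupt side: only records that the event happened. */
void TIM2_IRQHandler(void) {
    /* (clearing the timer's own interrupt flag is omitted in this sketch) */
    tick_flag = 1;
}

/*
 * Main-loop side: returns true once per event and re-arms the flag.
 * volatile forces a fresh read of tick_flag on every call.
 */
bool consume_tick(void) {
    if (tick_flag == 0u) {
        return false;
    }
    tick_flag = 0;
    return true;
}
```

The main loop would call consume_tick() on each pass and run its periodic work only when it returns true.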

Example: Behavior Change After Stack Corruption

void stack_corruption(void) {
    uint32_t buf[4];

    // -O0: padding may be inserted between buf and other variables
    // -O2: optimization may eliminate variables or change layout
    //      → same code can break different variables

    buf[4] = 0xDEAD;   // stack corruption
}

Because the stack layout changes between optimization levels, the same out-of-bounds write can be harmless in a debug build and corrupt a different variable in a release build.
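One way to make this kind of corruption easier to observe is to stop relying on where the compiler places separate locals. The sketch below puts the buffer between two guard words inside a single struct, because struct members keep their declaration order. The type guarded_buf_t and both helper names are hypothetical. This only detects damage after the fact. An out-of-bounds write is still UB, and the compiler is not obliged to make it land on the guard.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUARD_WORD 0xCAFEBABEu
#define BUF_WORDS  4u

/* Buffer sandwiched between two guard words in a fixed member order. */
typedef struct {
    uint32_t guard_lo;
    uint32_t data[BUF_WORDS];
    uint32_t guard_hi;
} guarded_buf_t;

/* Sets both guards to the known pattern and zeroes the payload. */
void guarded_buf_init(guarded_buf_t *g) {
    g->guard_lo = GUARD_WORD;
    for (uint32_t i = 0; i < BUF_WORDS; i++) {
        g->data[i] = 0u;
    }
    g->guard_hi = GUARD_WORD;
}

/* Returns true while both guard words still hold the pattern. */
bool guarded_buf_intact(const guarded_buf_t *g) {
    return g->guard_lo == GUARD_WORD && g->guard_hi == GUARD_WORD;
}
```

Calling guarded_buf_intact() at checkpoints, or watching the two guard members in the debugger, shows when something has written past either end of data.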

🔬 Practice: Break-it Code + Debugger Observation

Let’s create an experiment project and observe each accident pattern in the debugger.

Project Setup

Create a new project (NUCLEO-F401RE) in STM32CubeIDE and add experiment code to the main() function in main.c.

Important: Set Optimization to -O0 (debug build).
Project → Properties → C/C++ Build → Settings → Tool Settings → MCU GCC Compiler → Optimization

💬 How to set a breakpoint:
In the STM32CubeIDE editor, double-click the margin to the left of the line number where you want to stop, and a blue dot ● appears. During debug execution (bug icon), the program stops automatically when it reaches that line. Then press F6 (Step Over) to advance one line at a time.

Full Experiment Code

Run experiments one at a time by uncommenting them. Detailed observation steps are in each accident section above.

| Experiment | What to uncomment | What to check |
| --- | --- | --- |
| Experiment 1: NULL dereference | 2 lines for `fn` | Debugger stops at HardFault_Handler |
| Experiment 2: Dangling pointer | None (covered in Accident 2 section) | Follow the description in Accident 2 |
| Experiment 3: Stack lifetime | already active | `dummy_read` shows garbage in Variables view |
| Experiment 4: Out-of-bounds | already active | `canary` changes from `0xCAFEBABE` in Variables view |

/* Experiment functions — NEVER include these in production code */

/* Experiment 3: Return address of local variable (compiler warning will appear) */
uint32_t* dangerous_get_ptr(void) {
    uint32_t local = 0x12345678;
    return &local;   // ⚠️ warning: function returns address of local variable
}

/* Experiment 4: Out-of-bounds array access (run with -O0; layout changes at -O2) */
void array_overflow_demo(void) {
    uint32_t buf[4]  = {0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD};
    uint32_t canary  = 0xCAFEBABE;   // placed right next to buf (likely)

    /* Check the address of buf in the debugger before running */
    buf[4] = 0xDEAD;   // out-of-bounds write (= write to *(buf+4))

    /* Check whether canary changed */
    (void)canary;
}

int main(void) {
    HAL_Init();
    SystemClock_Config();
    MX_GPIO_Init();

    /* ---- Experiment 1: NULL dereference ---- */
    /*
    void (*fn)(void) = NULL;
    fn();   // → debugger stops at HardFault_Handler
    */

    /* ---- Experiment 3: Stack lifetime ---- */
    uint32_t* stale_ptr = dangerous_get_ptr();
    /*
     * stale_ptr is now invalid.
     * In the debugger, confirm that the address stale_ptr points to
     * no longer holds "0x12345678"
     */
    volatile uint32_t dummy_read = *stale_ptr;   // read garbage
    (void)dummy_read;

    /* ---- Experiment 4: Out-of-bounds array access ---- */
    array_overflow_demo();
    /* Confirm the change to canary in the debugger */

    while (1) {
    }
}
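When Experiment 1 stops in HardFault_Handler, the value of SCB->CFSR shown in the debugger can be decoded by hand or with a small helper like the sketch below. The function cfsr_first_cause and the table are hypothetical helpers written for this illustration. The bit positions follow the Cortex-M3/M4 CFSR layout (MemManage in bits 0-7, BusFault in bits 8-15, UsageFault in bits 16-31), and only a subset of the bits is listed.

```c
#include <stddef.h>
#include <stdint.h>

/* One named cause bit of the Configurable Fault Status Register. */
typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_bit_t;

static const cfsr_bit_t CFSR_BITS[] = {
    { 1u << 0,  "IACCVIOL"    },  /* MemManage: instruction access violation */
    { 1u << 1,  "DACCVIOL"    },  /* MemManage: data access violation        */
    { 1u << 8,  "IBUSERR"     },  /* BusFault: instruction bus error         */
    { 1u << 9,  "PRECISERR"   },  /* BusFault: precise data bus error        */
    { 1u << 10, "IMPRECISERR" },  /* BusFault: imprecise data bus error      */
    { 1u << 16, "UNDEFINSTR"  },  /* UsageFault: undefined instruction       */
    { 1u << 17, "INVSTATE"    },  /* UsageFault: invalid (non-Thumb) state   */
    { 1u << 18, "INVPC"       },  /* UsageFault: invalid PC load             */
    { 1u << 19, "NOCP"        },  /* UsageFault: no coprocessor              */
    { 1u << 24, "UNALIGNED"   },  /* UsageFault: unaligned access trap       */
    { 1u << 25, "DIVBYZERO"   },  /* UsageFault: divide by zero trap         */
};

/* Returns the name of the lowest listed cause bit set in cfsr, or "none". */
const char *cfsr_first_cause(uint32_t cfsr) {
    for (size_t i = 0; i < sizeof CFSR_BITS / sizeof CFSR_BITS[0]; i++) {
        if ((cfsr & CFSR_BITS[i].mask) != 0u) {
            return CFSR_BITS[i].name;
        }
    }
    return "none";
}
```

Calling a NULL function pointer typically shows up as INVSTATE, because the branch target 0 has its Thumb bit cleared. Copy the CFSR value from the debugger and pass it to the helper on a PC, or call the helper from the fault handler with SCB->CFSR.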

🛡️ Habits That Prevent Accidents

| Accident pattern | Prevention |
| --- | --- |
| NULL dereference | `!= NULL` check before using any pointer |
| Dangling pointer | Assign `ptr = NULL` after `free()` to invalidate it |
| Stack lifetime expired | Never return the address of a local variable. Use `static` or pass via argument |
| Out-of-bounds array access | `< SIZE` (NOT `<=`). Manage size with `#define` |
| Undefined behavior | Fix all compiler warnings with `-Wall -Wextra` |
| Optimization-induced misbehavior | Add `volatile` to any variable touched by hardware or interrupts |
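Three of these habits (the NULL check, the `< SIZE` bound, and passing storage in via an argument) fit into a few lines. The sketch below uses hypothetical names (SAMPLE_COUNT, samples_store, read_value_into) and a placeholder value. It illustrates the pattern and is not code from the experiment project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 4u   /* single source of truth for the array size */

/*
 * Stores value at samples[index] only if the pointer is valid and the
 * index is strictly below SAMPLE_COUNT. Returns false otherwise.
 */
bool samples_store(uint32_t *samples, size_t index, uint32_t value) {
    if (samples == NULL) {
        return false;
    }
    if (index >= SAMPLE_COUNT) {   /* valid indices are 0 .. SAMPLE_COUNT - 1 */
        return false;
    }
    samples[index] = value;
    return true;
}

/*
 * Writes the result into caller-owned storage instead of returning
 * the address of a local variable.
 */
bool read_value_into(uint32_t *out) {
    if (out == NULL) {
        return false;
    }
    *out = 0x12345678u;   /* placeholder for a real measurement */
    return true;
}
```

The caller declares the array as uint32_t samples[SAMPLE_COUNT], so the bound used in the check and the real array size cannot drift apart.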

Make the Compiler Work for You

Add these flags to STM32CubeIDE’s compiler options:

-Wall -Wextra -Wpointer-arith -Wstrict-prototypes

Project → Properties → C/C++ Build → Settings → MCU GCC Compiler → Miscellaneous → Other flags

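With just these flags, several of the accidents covered in this article, such as returning the address of a local variable and some uses of uninitialized variables, are reported as build-time warnings. Runtime-only problems such as dangling pointers or a missing volatile are generally not caught this way.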

Use Static Analysis Tools

For patterns compiler warnings can’t catch, static analysis tools are effective. cppcheck is a free static analyzer for C and C++ that can detect null pointer dereferences, out-of-bounds accesses, and uninitialized variables without compiling the code:

cppcheck --enable=all --inconclusive src/

Integrate it into CI/CD to run checks automatically before code review.

Summary

We experienced 6 types of “how things break”:

NULL dereference → HardFault. SCB->CFSR identifies the fault type
Dangling pointer → silently corrupts other variables. Assign NULL after free()
Stack lifetime → never return a local variable’s address. GCC warns you
Out-of-bounds array access → C has no bounds checking. Enforce < SIZE
Undefined behavior → compiler can make code “not exist”
Optimization-induced breakage → hotbed of hard-to-notice bugs. volatile is the key

“Knowing how things break” means that when you encounter a bug, you can think “ah, that’s that pattern.” This intuition is what separates the experienced engineer.

“Code that works in a debug build is not correct code.”
Behavior at -O0 only tells you “works without optimization.” Code with missing volatile or UB can suddenly break in a release build or with a different compiler. Don’t stop at debug-build testing — develop the habit of also verifying in a release build.

From the next episode we enter “the world of time.” Clocks, timers, execution time measurement — embedded development must master not just “space” but “time.”

What’s Next

⏱️ Episode 7: The World of Time — Knowing the Weight of a Single Cycle

How many CPU cycles does "wait 1 second" actually consume? Learn to measure execution time in µs using DWT CYCCNT, and internalize the embedded engineer's iron rule: "you can't discuss it without measuring it."

📖 Read Episode 7

📚 Related Articles

STM32 Series #5: Pointers = Addresses with Types — Turning Pointers into a Weapon

STM32 Series #0: Why Embedded Programming Looks Hard — Place and Time

Full 13-Part Series: The Embedded World Beyond Pointers — Memory and Time in STM32 Programming

STM32 Series #12: Optimization and Assembly — Watching C Become Machine Code, and Becoming a Strong Embedded Engineer

STM32 Series #10: The DMA Idea — Understanding the Transfer Architecture That Makes the CPU Idle

STM32 Series #8: Understanding Interrupts — Vector Table, NVIC, Context Saving, and TIM2 Implementation
