In Episode 10 we learned DMA. “The DMA buffer must be a global variable.” “You can’t pass a stack variable to DMA.” But did you wonder — why do global variables and stack variables behave so differently in memory?

The answer is in the linker script.


📖 Previous Episode

Episode 10: The DMA Idea — Making the CPU Idle

📍 Series Top Page

Full 13-Part Series: The Embedded World Beyond Pointers


✅ What You'll Be Able to Do After This Article

  • Explain what .text / .data / .bss are and where each is placed
  • Read the MEMORY / SECTIONS blocks of a linker script (.ld file)
  • Read arm-none-eabi-size output to accurately determine Flash and RAM usage
  • Use a map file to identify which module is consuming the most memory
  • Avoid the stack overflow and missing const pitfalls

Table of Contents

  1. What Linker Scripts and Map Files Are
  2. The Big Picture: Memory Layout
  3. What .text / .data / .bss Actually Are
  4. The Startup Sequence: Who Copies .data?
  5. Reading the Linker Script
  6. arm-none-eabi-size and the Map File
  7. Hands-On: Finding and Reducing Memory Usage
  8. Common Pitfalls
  9. Summary

What Linker Scripts and Map Files Are

The Problem the Linker Solves

Compiling a C file produces a .o object file. But at this stage, “where exactly in Flash does main() go?” is still undecided.

main.o                              stm32f4xx_hal_uart.o
├── main()          ← unresolved    ├── HAL_UART_Transmit()  ← unresolved
├── led_toggle()    ← unresolved    └── ...
└── g_counter       ← unresolved

The linker’s (arm-none-eabi-ld) job is to take all of these .o files and:

  1. Resolve symbol references — main.c calls HAL_UART_Transmit, but where is that function?
  2. Assign final addresses and combine everything into a single .elf binary.

The linker script (.ld file) is the instruction sheet you hand to the linker telling it which memory regions exist and how to map each section into them.

What Makes Linker Scripts Valuable

Without a linker script, the linker has no idea that “Flash starts at 0x08000000” or “SRAM starts at 0x20000000.” It would either guess or fail.

By writing an explicit linker script you can:

| Goal | Example |
|---|---|
| Separate bootloader from app | Reserve first 32 KB of Flash for the bootloader, rest for the app |
| Place code in TCM | Put ISR handlers in Tightly Coupled Memory for zero-latency execution |
| Define multi-core shared memory | Fix the address of a communication buffer between Core0 and Core1 |
| Add external memory | Declare an external SRAM or QSPI Flash region and use it transparently |
| Optimize section placement | Cluster frequently-accessed constants in the fast-access Flash region |

✅ Linker scripts in bootloader development

When building a bootloader for STM32, you split Flash into “bootloader (0x08000000–0x08007FFF)” and “application (0x08008000–)”. Change the ORIGIN and LENGTH of the FLASH region in the app’s linker script and the linker (not the compiler) places the app at the correct address automatically; the application code itself needs no hard-coded addresses.
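
The address arithmetic behind that split can be sketched in a few lines of C. This is only an illustration: `mem_region_t` and `app_region()` are hypothetical names made up for this example, not part of any STM32 tool or library.

```c
#include <stdint.h>

/* Hypothetical helper type: one MEMORY region (ORIGIN and LENGTH). */
typedef struct {
    uint32_t origin;  /* start address */
    uint32_t length;  /* size in bytes */
} mem_region_t;

/* Return the application region that remains after reserving
 * `boot_size` bytes at the start of `flash` for the bootloader.
 * If the bootloader does not fit, the returned length is 0. */
mem_region_t app_region(mem_region_t flash, uint32_t boot_size)
{
    mem_region_t app = { flash.origin, 0 };
    if (boot_size <= flash.length) {
        app.origin = flash.origin + boot_size;
        app.length = flash.length - boot_size;
    }
    return app;
}
```

For a 512 KB Flash at 0x08000000 and a 32 KB bootloader, this yields origin 0x08008000 and a length of 480 KB, which are the values you would write as ORIGIN and LENGTH in the app's MEMORY block.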

Linker Scripts Are Not Just for Microcontrollers

Every compiled binary — not just embedded firmware — goes through a linker, and most linkers support linker scripts.

| Environment | Linker | Linker script usage |
|---|---|---|
| Embedded (ARM) | arm-none-eabi-ld | Written explicitly (this article) |
| Linux kernel | GNU ld | arch/arm/kernel/vmlinux.lds.S, mandatory per architecture |
| Linux apps | GNU ld / lld | GCC’s default script applied automatically |
| macOS | Apple ld64 | __TEXT / __DATA segment layout |
| Windows | MSVC link.exe | Controlled via /ENTRY and .def files |
| WebAssembly | wasm-ld | Controls .wasm memory section layout |

The Linux kernel’s vmlinux.lds places boot code at a specific address, collects interrupt vectors, and separates read-only from writable sections, which is exactly the same structure as an STM32 linker script. If you can read an embedded linker script, you’re already halfway to understanding how the Linux kernel is built.

💡 Why linker scripts seem invisible for PC apps

Running gcc main.c -o main on Linux applies GCC’s built-in default linker script automatically. That script already knows the standard ELF layout, so typical app development never touches it.

On a microcontroller there is no OS and no standard memory map, so you must always write the script explicitly. The linker script exists in both worlds — it’s just invisible on the PC side.

The Map File: A Build Report Card

The map file is the complete record of what the linker decided — which symbol ended up at which address and how many bytes it occupies.

| Use case | What you actually do |
|---|---|
| Check Flash / RAM usage | Sum section sizes, monitor against limits |
| Find what’s bloating the binary | Identify the module or function using the most space |
| Diagnose a crash address | Look up a hard-fault address to find the function name |
| Verify linker symbols | Confirm _estack and _sbss are at the expected addresses |
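
To make the "diagnose a crash address" row concrete, here is a minimal sketch of the lookup you do by eye in a map file. The table values are copied from the .text excerpt shown later in this article; `map_entry_t`, `k_map`, and `map_lookup()` are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One map-file entry: start address, size, and owning object file. */
typedef struct {
    uint32_t    addr;
    uint32_t    size;
    const char *owner;
} map_entry_t;

/* Values taken from the .text excerpt in this article. */
static const map_entry_t k_map[] = {
    { 0x0800000cu, 0x1d4u, "main.o" },
    { 0x080001e0u, 0x2a0u, "stm32f4xx_hal_uart.o" },
    { 0x08000480u, 0x0b4u, "stm32f4xx_hal_dma.o" },
};

/* Return the owner whose [addr, addr + size) range contains `pc`,
 * or NULL if the address falls outside every listed entry. */
const char *map_lookup(uint32_t pc)
{
    for (size_t i = 0; i < sizeof k_map / sizeof k_map[0]; i++) {
        if (pc >= k_map[i].addr && pc - k_map[i].addr < k_map[i].size) {
            return k_map[i].owner;
        }
    }
    return NULL;
}
```

A faulting address of 0x08000200 falls inside the second entry, so the lookup points at stm32f4xx_hal_uart.o. A real map file also lists individual function symbols, which narrows the search further.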

🗺️ The Big Picture: Memory Layout

STM32F401RE Memory Map

The STM32F401RE (NUCLEO-F401RE) has two types of memory:

| Type | Start Address | Size | Purpose |
|---|---|---|---|
| Flash | 0x08000000 | 512 KB | Persistent storage for program code and constants |
| SRAM | 0x20000000 | 96 KB | Runtime data, stack, and heap |

[Diagram: memory layout. Flash (512 KB, from 0x08000000) holds .text (code + interrupt vectors), .rodata (string literals + const), and the .data initial values (copy source for globals). SRAM (96 KB, from 0x20000000) holds .data (initialized globals), .bss (zero-initialized globals), the heap (grows up, malloc region), and the stack (grows down, local variables). The .data initial values are copied from Flash to SRAM at startup.]

Flash is non-volatile (survives power-off); SRAM is volatile (cleared on power-off). Programs execute from Flash; only data that changes at runtime lives in SRAM.

💡 Why Flash speed isn't a problem

The STM32F401’s Flash has an ART Accelerator (prefetch + instruction cache). At 84 MHz, instruction fetches run at near-zero wait states in practice, so Flash read latency is typically not a bottleneck.

Prefetch: The Flash interface reads the next sequential line of instructions from Flash before the CPU asks for it. Because the data is already loaded by the time the CPU requests it, prefetch hides the inherent read latency of Flash from the pipeline.

Instruction cache: Recently-fetched Flash instructions are held in a small, fast memory (the cache). When the same code runs repeatedly — loops are the classic example — subsequent iterations are served from the cache without touching Flash at all.


📦 What .text / .data / .bss Actually Are

How you declare a variable in C determines which section it goes into.

.text Section

Holds code and constants. Placed in Flash.

/* All of the following go into .text (or .rodata) */

void led_toggle(void) {           /* Function code → .text */
    HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
}

const uint8_t lut[256] = { ... }; /* const array → .rodata (subset of .text) */
const char msg[] = "Hello\r\n";   /* String literal → .rodata */

.rodata (read-only data) is stored in Flash as part of .text. Global and static variables marked const go to Flash and consume no RAM.

.data Section

Holds global and static variables that have a non-zero initial value.

/* The following go into .data */

uint32_t g_counter = 100;          /* Has a non-zero initial value → .data */
static uint8_t s_mode = 2;         /* File-scope static, same rule */

void foo(void) {
    static int s_count = 1;        /* Static local, non-zero initializer → .data */
    s_count++;
}

.data is special — it exists in both Flash and SRAM:

  • Flash (LMA: Load Memory Address): the initial-value copy, preserved during power-off
  • SRAM (VMA: Virtual Memory Address): where the CPU actually reads and writes at runtime

At startup, the startup code (startup.s) copies the Flash copy into SRAM.

[Diagram: Flash (LMA) holds the .data initial values, e.g. g_counter=100. startup.s copies them at boot into the .data runtime area in SRAM (VMA), which is what the CPU reads and writes at runtime.]

💡 LMA vs VMA
  • LMA (Load Memory Address): the address in the binary (.bin/.hex) — the physical location in Flash.
  • VMA (Virtual Memory Address): the address the program expects at runtime — what the CPU uses.

.text executes from Flash, so LMA = VMA. .data is stored in Flash but copied to SRAM, so LMA (Flash) ≠ VMA (SRAM).

.bss Section

Holds zero-initialized global and static-local variables.

/* The following go into .bss */

uint32_t g_error_count;            /* No initializer (= 0) → .bss */
static uint8_t rx_buf[256];        /* Zero-initialized → .bss */
uint8_t g_flag = 0;                /* Explicitly 0 → .bss (compiler may optimize) */

.bss initial values are not stored in Flash. The startup code simply zeros out the .bss region. This means a large zero-initialized buffer adds no Flash cost.

| Section | Flash | SRAM | Notes |
|---|---|---|---|
| .text | ✅ | ✗ | Code + const. Read-only |
| .rodata | ✅ | ✗ | String literals + const arrays |
| .data | ✅ (init copy) | ✅ | Consumes both Flash and SRAM |
| .bss | ✗ (size only) | ✅ | Flash-efficient. Zero-init guaranteed |
| stack | ✗ | ✅ (dynamic) | Local variables, call frames |
| heap | ✗ | ✅ (dynamic) | malloc-allocated memory |
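
The cost rules in this table can be written down as a small C function. This is a simplified model for illustration only: it ignores section attributes and compiler flags, and `decl_kind_t`, `footprint_t`, and `footprint()` are hypothetical names.

```c
#include <stdint.h>

/* How a global or static variable is declared (simplified model). */
typedef enum {
    DECL_CONST_INIT,    /* const with an initializer  -> .rodata */
    DECL_NONZERO_INIT,  /* non-zero initializer       -> .data   */
    DECL_ZERO_OR_NONE   /* zero or no initializer     -> .bss    */
} decl_kind_t;

typedef struct {
    uint32_t flash_bytes;
    uint32_t sram_bytes;
} footprint_t;

/* Bytes of Flash and SRAM consumed by a variable of `size` bytes. */
footprint_t footprint(decl_kind_t kind, uint32_t size)
{
    footprint_t f = { 0, 0 };
    switch (kind) {
    case DECL_CONST_INIT:   f.flash_bytes = size;                      break;
    case DECL_NONZERO_INIT: f.flash_bytes = size; f.sram_bytes = size; break;
    case DECL_ZERO_OR_NONE: f.sram_bytes  = size;                      break;
    }
    return f;
}
```

Under this model a 4096-byte buffer costs 4096 bytes of SRAM and no Flash when it has no initializer, and 4096 bytes of each as soon as it gets a non-zero one.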

💡 Quick glossary for this chapter

| Term | Meaning |
|---|---|
| Section | A named “bucket” (.text, .data, .bss, …) grouping related content. The linker places each bucket into the appropriate memory region |
| Symbol | A function or variable name. To the linker it’s just “a name attached to an address.” The map file lists every symbol with its final address |
| Non-volatile | Retains its value when power is removed (Flash). Opposite: volatile (SRAM) |
| Alignment | Placing data at an address that is a multiple of 2, 4, or 8 bytes so the CPU can access it in a single bus cycle. The . = ALIGN(8) lines in the linker script enforce this |
| LMA / VMA | LMA (Load Memory Address) = where the data lives in the binary on Flash. VMA (Virtual Memory Address) = where the CPU expects it at runtime (SRAM for .data) |

⚙️ The Startup Sequence: Who Copies .data?

Before main() is called, the startup code (startup_stm32f401retx.s) runs these steps in order:

[Diagram: Reset_Handler → set SP to _estack (top of SRAM) → copy .data from Flash to SRAM (_sidata → _sdata through _edata) → zero out .bss (fill _sbss through _ebss with 0) → call SystemInit() (clocks, FPU, etc.) → call __libc_init_array() (C++ global constructors) → call main()]

Here is what startup_stm32f401retx.s’s Reset_Handler does, expressed as C pseudo-code:

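/* Linker-script symbols; only their addresses are meaningful. */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;

void Reset_Handler(void)
{
    /* (The stack pointer has already been set to _estack at this point.) */

    /* 1. Copy .data section from Flash (LMA) to SRAM (VMA) */
    uint32_t *src = &_sidata;   /* Flash: start of init data */
    uint32_t *dst = &_sdata;    /* SRAM: start of .data */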
    while (dst < &_edata) {
        *dst++ = *src++;
    }

    /* 2. Zero out .bss section */
    uint32_t *bss = &_sbss;
    while (bss < &_ebss) {
        *bss++ = 0;
    }

    /* 3. System initialization (clocks, FPU, etc.) */
    SystemInit();

    /* 4. C++ global constructors (empty for pure-C projects) */
    __libc_init_array();

    /* 5. Jump to application */
    main();
}
✅ Why global variables start at zero

The C standard requires zero-initialization for global and static variables with no explicit initializer. This is guaranteed because startup.s zeroes out .bss before main() is called.

Local variables (stack variables) are not handled by startup.s, which is why reading an uninitialized local variable gives an indeterminate value.

About __libc_init_array(): In C++ projects, this calls the constructors of any global objects. In pure-C projects the constructor tables it walks are normally empty, so the call does almost nothing, but it is still made. This is where the sequence diverges between C and C++ firmware.

Startup order note: On the STM32F401 with CubeIDE-generated startup code, .data/.bss initialization happens before SystemInit(). Some boards with external SDRAM (STM32H7, etc.) require MPU/FMC configuration before SRAM copy can succeed, so the order may differ on those platforms.
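
The two memory-initialization loops in Reset_Handler can be tried out on a host PC by passing ordinary arrays in place of the linker symbols. The sketch below is a host-side model, not the real startup code; `init_memory()` and its parameter names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the two memory-init loops in Reset_Handler.
 * The pointers play the role of the linker symbols:
 *   sidata        -> _sidata (init values, "Flash")
 *   sdata..edata  -> _sdata.._edata (.data in "SRAM")
 *   sbss..ebss    -> _sbss.._ebss   (.bss in "SRAM") */
void init_memory(const uint32_t *sidata,
                 uint32_t *sdata, const uint32_t *edata,
                 uint32_t *sbss,  const uint32_t *ebss)
{
    while (sdata < edata) {   /* copy .data: LMA -> VMA */
        *sdata++ = *sidata++;
    }
    while (sbss < ebss) {     /* zero .bss */
        *sbss++ = 0;
    }
}
```

Calling it with a small "Flash" array of initial values and a garbage-filled "SRAM" array shows the effect: the .data words take on their initial values, the .bss words become zero, and anything past the end of .bss is left untouched.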


📄 Reading the Linker Script

STM32CubeIDE generates a file named STM32F401RETx_FLASH.ld in each project.

MEMORY Block

Declares the physical memory regions:

MEMORY
{
  RAM    (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
  FLASH  (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
}
  • ORIGIN: start address
  • LENGTH: size
  • (rx) / (xrw): permissions (r=read, x=execute, w=write)
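
ORIGIN and LENGTH together define a half-open address range. As a rough illustration, the check below tests whether an address belongs to a region; the macro values mirror the MEMORY block above, while `in_region()` is a hypothetical helper written for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the MEMORY block above. */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (512u * 1024u)
#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (96u * 1024u)

/* True if `addr` lies inside [origin, origin + length). */
bool in_region(uint32_t addr, uint32_t origin, uint32_t length)
{
    return addr >= origin && (addr - origin) < length;
}
```

The last valid RAM address is 0x20017FFF; 0x20018000 is one past the end, which is the value the startup code loads into the stack pointer.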

SECTIONS Block

Maps each section to a memory region:

SECTIONS
{
  /* Code and constants → Flash */
  .text :
  {
    *(.text)         /* .text from all object files */
    *(.text*)
    *(.rodata)
    *(.rodata*)
  } >FLASH

  /* .data: VMA in SRAM, but initial values stored in Flash */
  _sidata = LOADADDR(.data);   /* Flash address of init values (LMA) */

  .data :
  {
    _sdata = .;      /* Start of .data in SRAM (VMA) — startup.s reads this */
    *(.data)
    *(.data*)
    _edata = .;      /* End of .data in SRAM */
  } >RAM AT> FLASH   /* VMA = RAM, LMA = FLASH */

  /* .bss: SRAM only — nothing written to Flash */
  .bss :
  {
    _sbss = .;
    *(.bss)
    *(.bss*)
    *(COMMON)
    _ebss = .;
  } >RAM

  /* Reserve space for heap and stack */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM
}

The >RAM AT> FLASH directive on .data is the key: “place the VMA in RAM, but the LMA (what gets written to the binary) in Flash.”

Key Linker Symbols

These symbols are defined by the linker script and consumed by startup.s and the C runtime support code:

| Symbol | Meaning |
|---|---|
| _estack | Initial stack pointer value (top of SRAM) |
| _sidata | Flash address of .data initial values (LMA) |
| _sdata | Start of .data in SRAM (VMA) |
| _edata | End of .data in SRAM |
| _sbss | Start of .bss in SRAM |
| _ebss | End of .bss in SRAM |
| _Min_Stack_Size | Minimum stack reservation (default 0x400 = 1 KB) |
| _Min_Heap_Size | Minimum heap reservation (default 0x200 = 512 B) |

⚠️ _Min_Stack_Size doesn't cap actual stack usage

_Min_Stack_Size makes the linker check that at least this much RAM (together with _Min_Heap_Size) is still free after .data and .bss; if it is not, the build fails. It does not limit how deep the stack can grow at runtime. If the stack overflows, it will silently overwrite the heap and then .bss below it, and the linker cannot warn you about that.
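
The ._user_heap_stack output section is effectively a size check at build time. A minimal model of that check, assuming the simplified SECTIONS layout shown above (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Round `x` up to the next multiple of 8, like `. = ALIGN(8)`. */
static uint32_t align8(uint32_t x)
{
    return (x + 7u) & ~7u;
}

/* Model of the build-time check: do .data, .bss and the minimum
 * heap/stack reservation fit inside a RAM region of `ram_len` bytes? */
bool ram_reservation_fits(uint32_t data, uint32_t bss,
                          uint32_t min_heap, uint32_t min_stack,
                          uint32_t ram_len)
{
    uint32_t end = align8(data + bss);          /* start of ._user_heap_stack */
    end = align8(end + min_heap + min_stack);   /* end of the reservation */
    return end <= ram_len;
}
```

When this returns false for your numbers, the real linker reports that the RAM region overflowed. When it returns true, you only know the minimum fits, not that the stack will stay within it at runtime.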


🔍 arm-none-eabi-size and the Map File

Reading arm-none-eabi-size

After building, run this command (STM32CubeIDE displays it automatically in the build log):

$ arm-none-eabi-size firmware.elf
   text    data     bss     dec     hex filename
  12480     116    2072   14668    394c firmware.elf

| Column | Meaning | Memory consumed |
|---|---|---|
| text | .text + .rodata bytes | Flash |
| data | .data bytes | Flash (init copy) + SRAM (runtime) |
| bss | .bss bytes | SRAM only |
| dec | Total (decimal) | |

Flash usage = text + data. Static SRAM usage = data + bss.

$$\text{Flash usage} = \text{text} + \text{data}$$

$$\text{Static SRAM usage} = \text{data} + \text{bss}$$

For the example above:

  • Flash: 12480 + 116 = 12,596 bytes (2.4% of 512 KB)
  • Static SRAM: 116 + 2072 = 2,188 bytes (2.2% of 96 KB)
💡 Static vs dynamic SRAM

The SRAM figure from arm-none-eabi-size is only the statically allocated portion. Runtime stack depth (function call depth × local variable sizes) and heap allocations from malloc are not included. The difference between 96 KB total and this static figure is what stack and heap can use.
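
The same arithmetic as a small C sketch, in case you want to script a size check in a build step. `usage_t`, `static_usage()`, and `percent_of()` are hypothetical names for this example.

```c
#include <stdint.h>

typedef struct {
    uint32_t flash_bytes;  /* text + data */
    uint32_t sram_bytes;   /* data + bss (static portion only) */
} usage_t;

/* Combine the three arm-none-eabi-size columns into Flash/SRAM totals. */
usage_t static_usage(uint32_t text, uint32_t data, uint32_t bss)
{
    usage_t u = { text + data, data + bss };
    return u;
}

/* Usage as a percentage of a region of `region_len` bytes. */
double percent_of(uint32_t used, uint32_t region_len)
{
    return 100.0 * (double)used / (double)region_len;
}
```

Feeding in the columns from the example output (12480, 116, 2072) gives 12,596 bytes of Flash and 2,188 bytes of static SRAM, or roughly 2.4% and 2.2% of the STM32F401RE's regions.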

Reading the Map File

The firmware.map file generated by the build contains the full memory layout. Start with the usage summary, which GNU ld prints to the build console when it is passed --print-memory-usage:

Memory region         Used Size  Region Size  %age Used
             RAM:       2188 B        96 KB      2.22%
           FLASH:      12596 B       512 KB      2.40%

Then look at the .text section breakdown (excerpt):

.text           0x0800000c     0x2c04
 *(.text)
 .text          0x0800000c      0x1d4 ./Core/Src/main.o
 .text          0x080001e0      0x2a0 ./Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_uart.o
 .text          0x08000480       0xb4 ./Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_dma.o

Each line shows: address, size (hex), and object file. The .bss section looks the same:

.bss            0x20000090      0x818
 .bss           0x20000090        0x4 ./Core/Src/main.o
 .bss           0x20000094      0x800 ./Core/Src/usart.o   ← rx_buf[2048]
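
If you want to process these lines with a tool instead of by eye, note that both numeric columns are hexadecimal. A minimal parsing sketch, assuming each entry sits on a single line and the path contains no spaces (real map files sometimes wrap long section names onto the next line); `map_line_t` and `parse_map_line()` are hypothetical names.

```c
#include <stdio.h>

typedef struct {
    char          section[32];
    unsigned long addr;
    unsigned long size;
    char          object[128];
} map_line_t;

/* Parse one input-section line of a GNU ld map file, e.g.
 *   " .bss  0x20000094  0x800 ./Core/Src/usart.o"
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_map_line(const char *line, map_line_t *out)
{
    /* %lx accepts an optional 0x prefix, so hex fields parse directly. */
    int n = sscanf(line, " %31s %lx %lx %127s",
                   out->section, &out->addr, &out->size, out->object);
    return n == 4;
}
```

For the usart.o line above, the parsed size is 0x800, which is 2048 bytes and matches the rx_buf[2048] annotation. Header lines such as the `.bss 0x20000090 0x818` summary have only three fields and are rejected.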

The Information-Leak Risk of Map Files

A map file contains every function name, variable name, address, and size in the binary. That’s invaluable for debugging — but it also means the map file is a complete blueprint of your firmware’s internal structure.

For security-sensitive products:

  • Never ship the map file with the product binary — treat it as internal documentation only
  • Never copy a debug-symbol .elf to a production environment
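  • Ship only a raw .bin or .hex image (arm-none-eabi-objcopy), which carries no symbol table; if an .elf must be distributed, run arm-none-eabi-strip on it first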
⚠️ One missing line in a config file — 512,000 lines of source code exposed

The risk of accidentally exposing a “map” of your internal structure is not unique to embedded systems. The web world has JavaScript source maps — files that map minified, obfuscated production JavaScript back to the original TypeScript source. Same name, same concept: a file that connects the compiled artifact back to its internal structure.

On March 31, 2026, a single missing line (*.map) in Anthropic’s .npmignore file caused 512,000+ lines of Claude Code TypeScript source — including designs for unreleased projects — to be published to the global npm registry.

👉 Full story: Claude Code Source Code Leak — the .map File Mechanism and Its Impact


🛠️ Hands-On: Finding and Reducing Memory Usage

Watching the Numbers Change

/* Before: nothing extra */

/* After: add a large global buffer */
uint8_t log_buf[4096];

arm-none-eabi-size output:

/* Before */
   text    data     bss     dec
  12480     116    2072   14668

/* After */
   text    data     bss     dec
  12480     116    6168   18764   ← bss grew by 4096, Flash unchanged

log_buf has no initializer so it goes into .bss. Flash is untouched; only SRAM grows.

Adding an Initializer Costs Flash Too

uint8_t log_buf[4096] = { 0xFF };   /* First byte 0xFF, rest 0 */
   text    data     bss     dec
  12480    4212    2072   18764   ← data grew by 4096 (Flash and SRAM both grow)

If you don’t need a non-zero initial value, leaving the initializer off saves Flash.

Finding the Biggest Consumer in the Map File

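# List the 20 largest .bss symbols, biggest last (Linux/Mac).
# nm reads the .elf directly: the map file prints sizes in hex,
# which sort -n cannot order correctly.
arm-none-eabi-nm --print-size --size-sort --radix=d firmware.elf | grep -i ' b ' | tail -20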

On Windows, open firmware.map in a text editor and search for .bss, then sort by the size column manually.

✅ RAM reduction priority order

When a map file identifies a large consumer, work through these options in order:

  1. Right-size the buffer (do you really need 4096 bytes, or would 512 suffice?)
  2. Add const (lookup tables and constant data should live in Flash)
  3. Switch from static to dynamic allocation (malloc/free if the data isn’t always needed)
  4. Redesign the algorithm (can you stream data rather than buffer it all?)

⚠️ Common Pitfalls

Pitfall 1: Stack Overflow (Silent Corruption)

The stack grows downward from the top of SRAM; the heap grows upward from below it. When the stack overflows into the heap region it corrupts malloc’s internal structures; if it continues past the heap into .bss, it silently overwrites global variables. In projects that don’t use malloc the heap is near-zero in size, so .bss becomes the primary victim.

high address
_estack (0x20018000) / initial SP
stack ↓ grows down
(unused space)
heap ↑ grows up
.bss  ← overflow overwrites here!
.data
← 0x20000000
low address

Symptoms: Global variables suddenly have wrong values. Specific function calls corrupt unrelated memory. Non-reproducible behavior.

/* ❌ Large local array → stack overflow candidate */
void process_data(void)
{
    char work_buf[4096];   /* 4 KB on the stack */
    uint8_t temp[2048];
}

/* ✅ Make it static — moves to .bss, off the stack */
void process_data(void)
{
    static char work_buf[4096];
    static uint8_t temp[2048];
}
⚠️ Detecting stack overflow on STM32F4

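The Cortex-M4 core in the STM32F4 has no stack-limit registers (ARMv8-M cores such as the Cortex-M33 add MSPLIM/PSPLIM for this). Its MPU can be set up as a guard region below the stack, but nothing does so by default, so stack overflow typically manifests as a hard fault or mysterious data corruption.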

A simple “stack canary” technique: write a known pattern at the boundary between the stack reserve area and .bss, and check it in your main loop.

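extern uint32_t _ebss;   /* Linker symbol: first address past .bss */

/* Assumes malloc is unused, so the word right after .bss is free.
 * (&_ebss - 4 would land 16 bytes inside .bss and clobber a variable.) */
volatile uint32_t *canary = &_ebss;
*canary = 0xDEADBEEF;    /* Run once at startup, e.g. early in main() */

/* In the main loop: */
if (*canary != 0xDEADBEEF) {
    Error_Handler();   /* Stack overflow detected */
}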
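
A related technique, sometimes called stack painting, extends the canary idea from one word to the whole stack region so you can measure how deep the stack has ever gone. The sketch below is a host-testable model under stated assumptions: on a target you would pass the lowest address of the stack region, and `STACK_FILL`, `stack_paint()`, and `stack_unused_words()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* arbitrary paint pattern */

/* Fill a (not yet used) stack region with the paint pattern. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL;
    }
}

/* Count how many words at the low end still hold the pattern.
 * The stack grows downward, so these are the words never touched. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Painting once at startup and logging the unused count from time to time gives a high-water mark. A count that approaches zero is the warning sign to act on before corruption starts.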

Pitfall 2: Forgetting const

/* ❌ No const → .data → wastes both Flash AND RAM */
uint8_t sin_table[256] = { 0, 1, 3, 5, ... };   /* 256 bytes of RAM */

/* ✅ With const → .rodata → Flash only, zero RAM cost */
const uint8_t sin_table[256] = { 0, 1, 3, 5, ... };

Large lookup tables without const inflate both data (Flash) and SRAM. If arm-none-eabi-size shows a suspiciously large data value, look for missing const qualifiers.

Pitfall 3: Heap/Stack Collision

The heap grows upward from below the stack. If you allocate a large heap block, it can collide with a deeply-nested stack frame. malloc may return NULL, but in some scenarios the collision goes undetected.

In embedded systems, static allocation (globals / static locals) is almost always safer than malloc — no fragmentation, no collision risk, and the linker will tell you at build time if you’ve exceeded SRAM.
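
When some dynamic behavior is still needed, one common compromise is a fixed-block pool carved out of a static array. The following is a minimal sketch with hypothetical names and sizes; it is not interrupt-safe, and the blocks are plain byte arrays, so add an alignment attribute if wider types will be stored in them.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     4
#define POOL_BLOCK_SIZE 64

/* Storage lives in .bss, so its size shows up in arm-none-eabi-size. */
static uint8_t s_pool[POOL_BLOCKS][POOL_BLOCK_SIZE];
static uint8_t s_used[POOL_BLOCKS];

/* Hand out one free block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!s_used[i]) {
            s_used[i] = 1;
            return s_pool[i];
        }
    }
    return NULL;
}

/* Return a block to the pool; pointers outside the pool are ignored. */
void pool_free(void *p)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (p == (void *)s_pool[i]) {
            s_used[i] = 0;
            return;
        }
    }
}
```

Because every block has the same size there is nothing to fragment, and the worst-case memory use is fixed at link time rather than discovered at runtime.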

Pitfall 4: Setting _Min_Stack_Size Too Small

/* STM32CubeIDE defaults */
_Min_Heap_Size  = 0x200;   /* 512 B */
_Min_Stack_Size = 0x400;   /* 1 KB */

This reserves at least 1 KB for the stack, but it does not prevent deeper stack usage. Interrupt handlers run on the same stack as main(), so every ISR, plus any HAL callback it invokes, adds frames on top of whatever the main code was already using. A safe rule of thumb for small STM32F4 projects: raise _Min_Stack_Size to 0x800 (2 KB).
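
To sanity-check a chosen _Min_Stack_Size, a rough upper bound can be estimated by adding the deepest main-context usage to the cost of each interrupt level that can preempt it. The sketch below assumes the basic 8-word Cortex-M exception frame with no FPU context; all input figures are hypothetical per-project estimates, and `worst_case_stack()` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic Cortex-M exception frame: R0-R3, R12, LR, PC, xPSR = 8 words.
 * (Larger when the FPU context is also stacked.) */
#define EXC_FRAME_BYTES 32u

/* Rough upper bound: deepest main-context usage plus, for each interrupt
 * priority level that can preempt, one exception frame and that handler's
 * own usage. */
uint32_t worst_case_stack(uint32_t main_bytes,
                          const uint32_t *isr_bytes, size_t levels)
{
    uint32_t total = main_bytes;
    for (size_t i = 0; i < levels; i++) {
        total += EXC_FRAME_BYTES + isr_bytes[i];
    }
    return total;
}
```

For example, 600 bytes of main-context usage plus two nested handlers using 120 and 200 bytes comes to 984 bytes, uncomfortably close to the 1 KB default. Per-function figures for the inputs can be taken from GCC's -fstack-usage output.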


Summary

| Concept | Key Point |
|---|---|
| .text / .rodata | Code and const data. Flash only. Read-only at runtime |
| .data | Initialized globals. Flash (init copy) + SRAM (runtime). Costs both |
| .bss | Zero-initialized globals. SRAM only. Flash-efficient |
| LMA / VMA | .data exists in Flash (LMA) and SRAM (VMA) simultaneously |
| startup.s | Copies .data, zeroes .bss, calls SystemInit, then main |
| arm-none-eabi-size | text+data = Flash usage; data+bss = static SRAM usage |
| Map file | Per-module, per-symbol layout and size |
| Stack overflow | Stack grows down, silently overwrites .bss |
| Missing const | Without const, tables go into .data, wasting Flash and SRAM |

📌 The linker script is a design document

Once you can read a linker script, the foundation of why embedded code works becomes visible all the way down to Flash and RAM.

Why must DMA buffers be global? Why is zero-initialization guaranteed for globals? Where exactly should you add const? Every one of these questions connects to section layout and the linker script.

Building the habit of checking the map file on every non-trivial firmware change will save you hours of debugging when you finally hit a memory constraint.

Next up: Episode 12 — Optimization and Assembly (Trust the Compiler, But Understand It). What changes with -O2? Inlining, loop unrolling, the impact of volatile on optimization, and a look at the disassembly with arm-none-eabi-objdump.


Next Episode

🚀 Episode 12: Optimization and Assembly — Trust the Compiler, But Understand It

What -O2 optimization actually does. Inlining, loop unrolling, and how volatile interacts with the optimizer. Using arm-none-eabi-objdump to disassemble and read the compiler's decisions.


FAQ

Q. Is zero-initialization of global variables truly guaranteed?

Yes. The C standard (C11 §6.7.9; §6.7.8 in C99) requires zero-initialization for all globals and static variables without explicit initializers. This is implemented by startup.s zeroing the .bss section. Memory allocated with malloc is not initialized; use calloc if you need zeroed memory.

Q. What happens when Flash is full?

The linker fails the build with an error like region 'FLASH' overflowed by X bytes. It is a build-time error, not a runtime failure — you’ll know immediately.

Q. What happens when SRAM is full?

Static overflow (too many globals/statics) is also a linker error (region 'RAM' overflowed). Stack and heap overflow at runtime are not caught by the linker. They manifest as hard faults or silent data corruption — which is why pitfall 1 is so dangerous.

Q. Can I place a large array directly in Flash?

Yes. Add const to a global variable and it goes into .rodata in Flash. Note that Flash has limited write endurance (the STM32F4 datasheets specify on the order of 10,000 erase cycles), so it’s fine for read-only data but not for runtime-modifiable values. Frequently-accessed large tables in Flash may also cause cache misses, so hot data is sometimes copied to SRAM first.

Q. Do I ever need to edit the linker script myself?

Rarely, for most projects. You will need to if you: add external SRAM or QSPI Flash, place specific code in TCM (Tightly Coupled Memory) for maximum speed, or need to co-exist with a bootloader that occupies part of Flash.