In Episode 10 we learned DMA. “The DMA buffer must be a global variable.” “You can’t pass a stack variable to DMA.” But did you wonder — why do global variables and stack variables behave so differently in memory?
The answer is in the linker script.
✅ What You'll Be Able to Do After This Article
- Explain what .text / .data / .bss are and where each is placed
- Read the MEMORY / SECTIONS blocks of a linker script (.ld file)
- Read arm-none-eabi-size output to accurately determine Flash and RAM usage
- Use a map file to identify which module is consuming the most memory
- Avoid the stack overflow and missing const pitfalls
Table of Contents
- What Linker Scripts and Map Files Are
- The Big Picture: Memory Layout
- What .text / .data / .bss Actually Are
- The Startup Sequence: Who Copies .data?
- Reading the Linker Script
- arm-none-eabi-size and the Map File
- Hands-On: Finding and Reducing Memory Usage
- Common Pitfalls
- Summary
What Linker Scripts and Map Files Are
The Problem the Linker Solves
Compiling a C file produces a .o object file. But at this stage, “where exactly in Flash does main() go?” is still undecided.
main.o
├── main() ← unresolved
├── led_toggle() ← unresolved
└── g_counter ← unresolved
stm32f4xx_hal_uart.o
├── HAL_UART_Transmit() ← unresolved
└── ...
The linker’s (arm-none-eabi-ld) job is to take all of these .o files and:
- Resolve symbol references: main.c calls HAL_UART_Transmit, but where is that function?
- Assign final addresses and combine everything into a single .elf binary.
The linker script (.ld file) is the instruction sheet you hand to the linker telling it which memory regions exist and how to map each section into them.
What Makes Linker Scripts Valuable
Without a linker script, the linker has no idea that “Flash starts at 0x08000000” or “SRAM starts at 0x20000000.” It would either guess or fail.
By writing an explicit linker script you can:
| Goal | Example |
|---|---|
| Separate bootloader from app | Reserve first 32 KB of Flash for the bootloader, rest for the app |
| Place code in TCM | Put ISR handlers in Tightly Coupled Memory for zero-latency execution |
| Define multi-core shared memory | Fix the address of a communication buffer between Core0 and Core1 |
| Add external memory | Declare an external SRAM or QSPI Flash region and use it transparently |
| Optimize section placement | Cluster frequently-accessed constants in the fast-access Flash region |
When building a bootloader for STM32, you split Flash into “bootloader (0x08000000–0x08007FFF)” and “application (0x08008000–)”. Change the ORIGIN and LENGTH of FLASH in the app’s linker script and the linker (not the compiler) places every function and constant at the correct address automatically, so no addresses have to be hard-coded in the application source. The application still has to point the vector table at its new base (SCB->VTOR).
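As a minimal sketch of the arithmetic behind that split, the helper below derives the application's ORIGIN and LENGTH from the whole Flash region and the bootloader size. The names flash_region_t and app_region_after_bootloader are hypothetical, made up for this illustration; the numbers are the ones used in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one MEMORY region, as ORIGIN + LENGTH. */
typedef struct {
    uint32_t origin; /* start address */
    uint32_t length; /* size in bytes */
} flash_region_t;

/* Given the whole Flash and the number of bytes reserved for the
 * bootloader, return the region left over for the application. */
flash_region_t app_region_after_bootloader(flash_region_t flash,
                                           uint32_t bootloader_bytes)
{
    assert(bootloader_bytes <= flash.length);
    flash_region_t app = {
        .origin = flash.origin + bootloader_bytes,
        .length = flash.length - bootloader_bytes,
    };
    return app;
}
```

With the 512 KB Flash at 0x08000000 and a 32 KB bootloader, this yields ORIGIN = 0x08008000 and LENGTH = 480K, which are the values that would go into the app's FLASH line.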
Linker Scripts Are Not Just for Microcontrollers
Every compiled binary — not just embedded firmware — goes through a linker, and most linkers support linker scripts.
| Environment | Linker | Linker script usage |
|---|---|---|
| Embedded (ARM) | arm-none-eabi-ld | Written explicitly — this article |
| Linux kernel | GNU ld | arch/arm/kernel/vmlinux.lds.S — mandatory per architecture |
| Linux apps | GNU ld / lld | GCC’s default script applied automatically |
| macOS | Apple ld64 | __TEXT / __DATA segment layout |
| Windows | MSVC link.exe | Controlled via /ENTRY and .def files |
| WebAssembly | wasm-ld | Controls .wasm memory section layout |
The Linux kernel’s vmlinux.lds places boot code at a specific address, collects interrupt vectors, and separates read-only from writable sections — exactly the same structure as a STM32 linker script. If you can read an embedded linker script, you’re already halfway to understanding how the Linux kernel is built.
Running gcc main.c -o main on Linux applies GCC’s built-in default linker script automatically. That script already knows the standard ELF layout, so typical app development never touches it.
On a microcontroller there is no OS and no standard memory map, so you must always write the script explicitly. The linker script exists in both worlds — it’s just invisible on the PC side.
The Map File: A Build Report Card
The map file is the complete record of what the linker decided — which symbol ended up at which address and how many bytes it occupies.
| Use case | What you actually do |
|---|---|
| Check Flash / RAM usage | Sum section sizes, monitor against limits |
| Find what’s bloating the binary | Identify the module or function using the most space |
| Diagnose a crash address | Look up a hard-fault address to find the function name |
| Verify linker symbols | Confirm _estack and _sbss are at the expected addresses |
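The "diagnose a crash address" row boils down to a range lookup: find the map entry whose address range contains the faulting PC. The sketch below shows that lookup over a small, hand-copied table. The type map_symbol_t and the function map_lookup are hypothetical names for this illustration, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, hand-copied subset of a map file: start, size, name. */
typedef struct {
    uint32_t    addr;
    uint32_t    size;
    const char *name;
} map_symbol_t;

/* Return the entry whose [addr, addr + size) range contains pc,
 * or NULL if no entry covers it. */
const map_symbol_t *map_lookup(const map_symbol_t *table, size_t count,
                               uint32_t pc)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pc >= table[i].addr && pc - table[i].addr < table[i].size) {
            return &table[i];
        }
    }
    return NULL;
}
```

In practice you would do this by eye in the map file, or let a tool such as arm-none-eabi-addr2line do it against the .elf; the code only makes the rule explicit.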
🗺️ The Big Picture: Memory Layout
STM32F401RE Memory Map
The STM32F401RE (NUCLEO-F401RE) has two types of memory:
| Type | Start Address | Size | Purpose |
|---|---|---|---|
| Flash | 0x08000000 | 512 KB | Persistent storage for program code and constants |
| SRAM | 0x20000000 | 96 KB | Runtime data, stack, and heap |
Memory layout diagram:
- Flash 512KB (0x08000000–): .text (code + interrupt vectors), .rodata (string literals + const), .data init values (copy source for globals)
- SRAM 96KB (0x20000000–): .data (initialized globals), .bss (zero-initialized globals), heap ↑ (malloc region), stack ↓ (local variables)
- The .data init values in Flash are copied to .data in SRAM at startup.
Flash is non-volatile (survives power-off); SRAM is volatile (cleared on power-off). Programs execute from Flash; only data that changes at runtime lives in SRAM.
The STM32F401’s Flash has an ART Accelerator (prefetch + instruction cache). At 84 MHz, instruction fetches run at near-zero wait states in practice, so Flash read latency is typically not a bottleneck.
Prefetch: The Flash interface reads the next sequential line of instructions while the CPU is still executing the current one. Because the data is loaded before the CPU actually requests it, prefetch hides the inherent read latency of Flash from the pipeline.
Instruction cache: Recently-fetched Flash instructions are held in a small, fast memory (the cache). When the same code runs repeatedly — loops are the classic example — subsequent iterations are served from the cache without touching Flash at all.
📦 What .text / .data / .bss Actually Are
How you declare a variable in C determines which section it goes into.
.text Section
Holds code and constants. Placed in Flash.
/* All of the following go into .text (or .rodata) */
void led_toggle(void) { /* Function code → .text */
HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
}
const uint8_t lut[256] = { ... }; /* const array → .rodata (subset of .text) */
const char msg[] = "Hello\r\n"; /* String literal → .rodata */
.rodata (read-only data) is stored in Flash as part of .text. Variables marked const go to Flash and consume no RAM.
.data Section
Holds global and static variables that have a non-zero initializer.
/* The following go into .data */
uint32_t g_counter = 100; /* Has an initial value → .data */
static uint8_t s_mode = 2; /* file-scope static, same rule */
void foo(void) {
static int s_count = 1; /* static local with a non-zero initializer → .data */
s_count++;
}
.data is special — it exists in both Flash and SRAM:
- Flash (LMA: Load Memory Address): the initial-value copy, preserved during power-off
- SRAM (VMA: Virtual Memory Address): where the CPU actually reads and writes at runtime
At startup, the startup code (startup.s) copies the Flash copy into SRAM.
- LMA (Load Memory Address): the address in the binary (.bin/.hex) — the physical location in Flash.
- VMA (Virtual Memory Address): the address the program expects at runtime — what the CPU uses.
.text executes from Flash, so LMA = VMA. .data is stored in Flash but copied to SRAM, so LMA (Flash) ≠ VMA (SRAM).
.bss Section
Holds zero-initialized global and static-local variables.
/* The following go into .bss */
uint32_t g_error_count; /* No initializer (= 0) → .bss */
static uint8_t rx_buf[256]; /* Zero-initialized → .bss */
uint8_t g_flag = 0; /* Explicitly 0 → .bss (compiler may optimize) */
.bss initial values are not stored in Flash. The startup code simply zeros out the .bss region. This means a large zero-initialized buffer adds no Flash cost.
| Section | Flash | SRAM | Notes |
|---|---|---|---|
| .text | ✅ | ✗ | Code + const. Read-only |
| .rodata | ✅ | ✗ | String literals + const arrays |
| .data | ✅ (init copy) | ✅ | Consumes both Flash and SRAM |
| .bss | ✗ (size only) | ✅ | Flash-efficient. Zero-init guaranteed |
| stack | ✗ | ✅ (dynamic) | Local variables, call frames |
| heap | ✗ | ✅ (dynamic) | malloc-allocated memory |
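The static rows of that table can be restated as a small function, which is handy when estimating what a new buffer will cost before adding it. This is only a sketch of the table's rules; section_t, mem_cost_t and section_cost are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum naming where a statically allocated object lands. */
typedef enum { SEC_TEXT, SEC_RODATA, SEC_DATA, SEC_BSS } section_t;

typedef struct {
    uint32_t flash_bytes;
    uint32_t sram_bytes;
} mem_cost_t;

/* Bytes of Flash and SRAM consumed by `size` bytes placed in `sec`. */
mem_cost_t section_cost(section_t sec, uint32_t size)
{
    mem_cost_t cost = { 0u, 0u };
    switch (sec) {
    case SEC_TEXT:
    case SEC_RODATA:
        cost.flash_bytes = size; /* lives in Flash only */
        break;
    case SEC_DATA:
        cost.flash_bytes = size; /* initial-value copy (LMA) */
        cost.sram_bytes  = size; /* runtime copy (VMA) */
        break;
    case SEC_BSS:
        cost.sram_bytes  = size; /* zeroed at startup, no Flash image */
        break;
    default:
        assert(0 && "unknown section");
    }
    return cost;
}
```

A 4096-byte array therefore costs 4096 bytes of SRAM in .bss, but 4096 bytes of Flash and 4096 bytes of SRAM once it moves to .data.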
| Term | Meaning |
|---|---|
| Section | A named “bucket” (.text, .data, .bss, …) grouping related content. The linker places each bucket into the appropriate memory region |
| Symbol | A function or variable name. To the linker it’s just “a name attached to an address.” The map file lists every symbol with its final address |
| Non-volatile | Retains its value when power is removed (Flash). Opposite: volatile (SRAM) |
| Alignment | Placing data at an address that is a multiple of 2, 4, or 8 bytes so the CPU can access it in a single bus cycle. The . = ALIGN(8) lines in the linker script enforce this |
| LMA / VMA | LMA (Load Memory Address) = where the data lives in the binary on Flash. VMA (Virtual Memory Address) = where the CPU expects it at runtime (SRAM for .data) |
⚙️ The Startup Sequence: Who Copies .data?
Before main() is called, the Reset_Handler in the startup code (startup_stm32f401retx.s) runs the steps below in order, expressed here as C pseudo-code:
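#include <stdint.h>
/* Defined by the linker script; only their addresses are meaningful */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
void SystemInit(void);
void __libc_init_array(void);
int main(void);
void Reset_Handler(void)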
{
/* 1. Copy .data section from Flash (LMA) to SRAM (VMA) */
uint32_t *src = &_sidata; /* Flash: start of init data */
uint32_t *dst = &_sdata; /* SRAM: start of .data */
while (dst < &_edata) {
*dst++ = *src++;
}
/* 2. Zero out .bss section */
uint32_t *bss = &_sbss;
while (bss < &_ebss) {
*bss++ = 0;
}
/* 3. System initialization (clocks, FPU, etc.) */
SystemInit();
/* 4. C++ global constructors (empty for pure-C projects) */
__libc_init_array();
/* 5. Jump to application */
main();
}
The C standard requires zero-initialization for global and static variables with no explicit initializer. This is guaranteed because startup.s zeroes out .bss before main() is called.
Local variables (stack variables) are not handled by startup.s, which is why reading an uninitialized local variable gives an indeterminate value.
About __libc_init_array(): In C++ projects, this calls the constructors of any global objects. In pure-C projects the constructor tables it walks are normally empty, so it has almost nothing to do, but it is always called. This is where the sequence diverges between C and C++ firmware.
Startup order note: On the STM32F401 with CubeIDE-generated startup code, .data/.bss initialization happens before SystemInit(). Some boards with external SDRAM (STM32H7, etc.) require MPU/FMC configuration before SRAM copy can succeed, so the order may differ on those platforms.
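To watch the copy and zero loops work without hardware, here is a host-side sketch in which ordinary arrays stand in for Flash and SRAM. The function names copy_data and zero_bss are hypothetical; on the real target the pointers come from the linker symbols _sidata, _sdata, _edata, _sbss and _ebss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the .data image word by word, like step 1 of Reset_Handler.
 * src plays the role of _sidata, dst of _sdata, dst_end of _edata. */
void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(src != NULL && dst != NULL && dst_end != NULL);
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero the .bss range, like step 2. start/end play _sbss/_ebss. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    assert(start != NULL && end != NULL);
    while (start < end) {
        *start++ = 0u;
    }
}
```

If you pre-fill a fake SRAM array with garbage, call copy_data for its first part and zero_bss for the rest, the array ends up holding the initial values followed by zeros. That is the state main() relies on.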
📄 Reading the Linker Script
STM32CubeIDE generates a file named STM32F401RETx_FLASH.ld in each project.
MEMORY Block
Declares the physical memory regions:
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 96K
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
}
- ORIGIN: start address
- LENGTH: size
- (rx) / (xrw): permissions (r=read, x=execute, w=write)
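The same two regions can be mirrored in C, which makes the meaning of ORIGIN and LENGTH concrete: the end of a region is ORIGIN + LENGTH, and an address belongs to the region if it falls in between. The sketch below is illustrative only; mem_region_t, region_end and region_contains are hypothetical names, and the constants are copied from the MEMORY block above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors one line of the MEMORY block: ORIGIN and LENGTH. */
typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_region_t;

/* Values taken from the MEMORY block (K = 1024 bytes). */
static const mem_region_t REGION_RAM   = { 0x20000000u,  96u * 1024u };
static const mem_region_t REGION_FLASH = { 0x08000000u, 512u * 1024u };

/* First address past the end of the region (ORIGIN + LENGTH). */
uint32_t region_end(mem_region_t r)
{
    assert(r.length <= UINT32_MAX - r.origin); /* no wrap-around */
    return r.origin + r.length;
}

/* True if addr lies inside [ORIGIN, ORIGIN + LENGTH). */
bool region_contains(mem_region_t r, uint32_t addr)
{
    return addr >= r.origin && (addr - r.origin) < r.length;
}
```

region_end(REGION_RAM) evaluates to 0x20018000, which is the value the linker script assigns to _estack, the initial stack pointer.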
SECTIONS Block
Maps each section to a memory region:
SECTIONS
{
/* Code and constants → Flash */
.text :
{
*(.text) /* .text from all object files */
*(.text*)
*(.rodata)
*(.rodata*)
} >FLASH
/* .data: VMA in SRAM, but initial values stored in Flash */
_sidata = LOADADDR(.data); /* Flash address of init values (LMA) */
.data :
{
_sdata = .; /* Start of .data in SRAM (VMA) — startup.s reads this */
*(.data)
*(.data*)
_edata = .; /* End of .data in SRAM */
} >RAM AT> FLASH /* VMA = RAM, LMA = FLASH */
/* .bss: SRAM only — nothing written to Flash */
.bss :
{
_sbss = .;
*(.bss)
*(.bss*)
*(COMMON)
_ebss = .;
} >RAM
/* Reserve space for heap and stack */
._user_heap_stack :
{
. = ALIGN(8);
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >RAM
}
The >RAM AT> FLASH directive on .data is the key: “place the VMA in RAM, but the LMA (what gets written to the binary) in Flash.”
Key Linker Symbols
These symbols are used by startup.s and the HAL:
| Symbol | Meaning |
|---|---|
| _estack | Initial stack pointer value (top of SRAM) |
| _sidata | Flash address of .data initial values (LMA) |
| _sdata | Start of .data in SRAM (VMA) |
| _edata | End of .data in SRAM |
| _sbss | Start of .bss in SRAM |
| _ebss | End of .bss in SRAM |
| _Min_Stack_Size | Minimum stack reservation (default 0x400 = 1 KB) |
| _Min_Heap_Size | Minimum heap reservation (default 0x200 = 512 B) |
_Min_Stack_Size tells the linker to check that at least this much SRAM is left over for the stack after .data, .bss, and the heap reservation; if it is not, the link fails. It does not limit how deep the stack can grow at runtime. If the stack overflows, it silently overwrites the heap and then .bss below it, and the linker cannot warn you.
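What the ._user_heap_stack section does to the location counter can be replayed in a few lines of C. The sketch below assumes the section layout shown earlier (align to 8, add the heap reservation, add the stack reservation, align to 8 again); align_up and heap_stack_fits are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two),
 * which is what ". = ALIGN(8)" does to the location counter. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr + align - 1u) & ~(align - 1u);
}

/* Replay the ._user_heap_stack section: start at the end of .bss,
 * align, add the heap and stack reservations, align again, and
 * report whether the result still fits below the end of RAM. */
bool heap_stack_fits(uint32_t ebss, uint32_t min_heap, uint32_t min_stack,
                     uint32_t ram_end)
{
    uint32_t dot = align_up(ebss, 8u);
    dot += min_heap;
    dot += min_stack;
    dot = align_up(dot, 8u);
    return dot <= ram_end;
}
```

When this check would return false, the real linker reports that region RAM overflowed. When it returns true, nothing is enforced at runtime; the reservation is only a build-time sanity check.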
🔍 arm-none-eabi-size and the Map File
Reading arm-none-eabi-size
After building, run this command (STM32CubeIDE displays it automatically in the build log):
$ arm-none-eabi-size firmware.elf
text data bss dec hex filename
12480 116 2072 14668 394c firmware.elf
| Column | Meaning | Memory consumed |
|---|---|---|
| text | .text + .rodata bytes | Flash |
| data | .data bytes | Flash (init copy) + SRAM (runtime) |
| bss | .bss bytes | SRAM only |
| dec | Total (decimal) | — |
Flash usage = text + data. Static SRAM usage = data + bss.
For the example above:
- Flash: 12480 + 116 = 12,596 bytes (2.4% of 512 KB)
- Static SRAM: 116 + 2072 = 2,188 bytes (2.2% of 96 KB)
The SRAM figure from arm-none-eabi-size is only the statically allocated portion. Runtime stack depth (function call depth × local variable sizes) and heap allocations from malloc are not included. The difference between 96 KB total and this static figure is what stack and heap can use.
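The arithmetic above is easy to automate, for example in a host-side build check. The sketch below takes the three columns from arm-none-eabi-size and derives Flash usage, static SRAM usage, and a rounded percentage. The names size_report_t, flash_used, static_ram_used and usage_permille are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three numeric columns printed by arm-none-eabi-size. */
typedef struct {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
} size_report_t;

/* Flash holds the code/constants plus the initial-value copy of .data. */
uint32_t flash_used(size_report_t s) { return s.text + s.data; }

/* Static SRAM holds the runtime copy of .data plus .bss. */
uint32_t static_ram_used(size_report_t s) { return s.data + s.bss; }

/* Usage in tenths of a percent, rounded to nearest (24 means 2.4%). */
uint32_t usage_permille(uint32_t used, uint32_t region_size)
{
    assert(region_size != 0u);
    return (uint32_t)(((uint64_t)used * 1000u + region_size / 2u)
                      / region_size);
}
```

For the example output this gives 12,596 bytes of Flash (2.4%) and 2,188 bytes of static SRAM (2.2%), leaving 96,116 bytes of SRAM for stack and heap.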
Reading the Map File
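The firmware.map file generated by the build contains the full memory layout. Before diving in, start with the usage summary that GNU ld prints to the build log when it is run with --print-memory-usage:
Memory region Used Size Region Size %age Used
RAM: 2188 B 96 KB 2.23%
FLASH: 12596 B 512 KB 2.40%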
Then look at the .text section breakdown (excerpt):
.text 0x0800000c 0x2c04
*(.text)
.text 0x0800000c 0x1d4 ./Core/Src/main.o
.text 0x080001e0 0x2a0 ./Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_uart.o
.text 0x08000480 0xb4 ./Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_dma.o
Each line shows: address, size (hex), and object file. The .bss section looks the same:
.bss 0x20000090 0x818
.bss 0x20000090 0x4 ./Core/Src/main.o
.bss 0x20000094 0x800 ./Core/Src/usart.o ← rx_buf[2048]
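If you want to process these lines in a script or small tool, each one splits into four whitespace-separated fields. The following sketch parses a single input-section line with sscanf. It assumes the one-line layout shown in the excerpts (long section names that wrap onto a second line are not handled), and map_entry_t and parse_map_line are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One input-section line from a GNU ld map file. */
typedef struct {
    char          section[64];  /* e.g. ".bss" */
    unsigned long address;      /* start address */
    unsigned long size;         /* size in bytes */
    char          object[256];  /* contributing object file */
} map_entry_t;

/* Parse a line such as
 *   " .bss 0x20000094 0x800 ./Core/Src/usart.o"
 * Returns true only if all four fields were found. */
bool parse_map_line(const char *line, map_entry_t *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* %lx accepts the 0x prefix; widths keep the strings in bounds. */
    int fields = sscanf(line, " %63s %lx %lx %255s",
                        out->section, &out->address, &out->size,
                        out->object);
    return fields == 4;
}
```

Lines without an object file, such as the section header or the *(.bss) pattern line, yield fewer than four fields and are rejected, which is usually what you want when summing per-module sizes.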
The Information-Leak Risk of Map Files
A map file contains every function name, variable name, address, and size in the binary. That’s invaluable for debugging — but it also means the map file is a complete blueprint of your firmware’s internal structure.
For security-sensitive products:
- Never ship the map file with the product binary — treat it as internal documentation only
- Never copy a debug-symbol .elf to a production environment
- Ship only a symbol-stripped .bin (arm-none-eabi-strip)
The risk of accidentally exposing a “map” of your internal structure is not unique to embedded systems. The web world has JavaScript source maps — files that map minified, obfuscated production JavaScript back to the original TypeScript source. Same name, same concept: a file that connects the compiled artifact back to its internal structure.
On March 31, 2026, a single missing line (*.map) in Anthropic’s .npmignore file caused 512,000+ lines of Claude Code TypeScript source — including designs for unreleased projects — to be published to the global npm registry.
👉 Full story: Claude Code Source Code Leak — the .map File Mechanism and Its Impact
🛠️ Hands-On: Finding and Reducing Memory Usage
Watching the Numbers Change
/* Before: nothing extra */
/* After: add a large global buffer */
uint8_t log_buf[4096];
arm-none-eabi-size output:
/* Before */
text data bss dec
12480 116 2072 14668
/* After */
text data bss dec
12480 116 6168 18764 ← bss grew by 4096, Flash unchanged
log_buf has no initializer so it goes into .bss. Flash is untouched; only SRAM grows.
Adding an Initializer Costs Flash Too
uint8_t log_buf[4096] = { 0xFF }; /* First byte 0xFF, rest 0 */
text data bss dec
12480 4212 2072 18764 ← data grew by 4096 (Flash and SRAM both grow)
If you don’t need a non-zero initial value, leaving the initializer off saves Flash.
Finding the Biggest Consumer in the Map File
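# List .bss input sections by size in bytes, largest first (Linux/Mac).
# Sizes in the map file are hex, so convert them before the numeric sort.
perl -ne 'print hex($2), " $3\n" if /^\s*\.bss\S*\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(\S+)/' firmware.map | sort -rn | head -20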
On Windows, open firmware.map in a text editor and search for .bss, then sort by the size column manually.
When a map file identifies a large consumer, work through these options in order:
- Right-size the buffer (do you really need 4096 bytes, or would 512 suffice?)
- Add const (lookup tables and constant data should live in Flash)
- Switch from static to dynamic allocation (malloc/free if the data isn’t always needed)
- Redesign the algorithm (can you stream data rather than buffer it all?)
⚠️ Common Pitfalls
Pitfall 1: Stack Overflow (Silent Corruption)
The stack grows downward from the top of SRAM; the heap grows upward from below it. When the stack overflows into the heap region it corrupts malloc’s internal structures; if it continues past the heap into .bss, it silently overwrites global variables. In projects that don’t use malloc the heap is near-zero in size, so .bss becomes the primary victim.
The initial SP is _estack (0x20018000), the top of SRAM.
Symptoms: Global variables suddenly have wrong values. Specific function calls corrupt unrelated memory. Non-reproducible behavior.
/* ❌ Large local array → stack overflow candidate */
void process_data(void)
{
char work_buf[4096]; /* 4 KB on the stack */
uint8_t temp[2048];
}
/* ✅ Make it static — moves to .bss, off the stack */
void process_data(void)
{
static char work_buf[4096];
static uint8_t temp[2048];
}
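The Cortex-M4 in the STM32F4 has no dedicated stack-limit register (Cortex-M33 devices add MSPLIM/PSPLIM for this). The MPU can be configured as a guard region below the stack, but nothing sets that up by default. Stack overflow therefore typically manifests as a hard fault or mysterious data corruption.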
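A simple “stack canary” technique: write a known pattern at the lowest address of the stack reserve area (_estack - _Min_Stack_Size) and check it in your main loop. Do not place it below _ebss, because that address still belongs to a global variable.
#include <stdint.h>
extern uint32_t _estack;         /* top of SRAM, from the linker script */
extern uint32_t _Min_Stack_Size; /* absolute symbol: its address is the size */
static volatile uint32_t *canary;
/* Once, early in main(): */
canary = (volatile uint32_t *)((uintptr_t)&_estack - (uintptr_t)&_Min_Stack_Size);
*canary = 0xDEADBEEF;
/* In the main loop: */
if (*canary != 0xDEADBEEF) {
Error_Handler(); /* Stack grew past its reserved area */
}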
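A related technique, not the single-word canary above, is "stack painting": fill the whole unused stack area with a pattern at startup and later count how many words were never overwritten. That gives a high-water mark instead of a yes/no answer. The sketch below runs on a plain array so it can be tried on a PC. stack_paint and stack_unused_words are hypothetical names, and on a real target you would paint only the region below the current stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the (not yet used) stack area with a known pattern.
 * `lowest` is the lowest address of the area. */
void stack_paint(uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count words, from the lowest address upward, that still hold the
 * pattern. The stack grows downward, so this is the headroom that
 * has never been touched. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    size_t unused = 0;
    while (unused < words && lowest[unused] == STACK_PAINT) {
        unused++;
    }
    return unused;
}
```

If the unused count ever approaches zero during testing, the reserved stack size is too small for the deepest call path plus interrupts.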
Pitfall 2: Forgetting const
/* ❌ No const → .data → wastes both Flash AND RAM */
uint8_t sin_table[256] = { 0, 1, 3, 5, ... }; /* 256 bytes of RAM */
/* ✅ With const → .rodata → Flash only, zero RAM cost */
const uint8_t sin_table[256] = { 0, 1, 3, 5, ... };
Large lookup tables without const inflate both data (Flash) and SRAM. If arm-none-eabi-size shows a suspiciously large data value, look for missing const qualifiers.
Pitfall 3: Heap/Stack Collision
The heap grows upward from below the stack. If you allocate a large heap block, it can collide with a deeply-nested stack frame. malloc may return NULL, but in some scenarios the collision goes undetected.
In embedded systems, static allocation (globals / static locals) is almost always safer than malloc — no fragmentation, no collision risk, and the linker will tell you at build time if you’ve exceeded SRAM.
Pitfall 4: Setting _Min_Stack_Size Too Small
/* STM32CubeIDE defaults */
_Min_Heap_Size = 0x200; /* 512 B */
_Min_Stack_Size = 0x400; /* 1 KB */
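This reserves at least 1 KB for the stack, but it does not prevent deeper stack usage. Interrupts preempt the main code and run on the same stack, so every ISR (and any HAL callback it invokes) adds frames on top of whatever main() was doing at that moment. A reasonable starting point for small STM32F4 projects is to raise _Min_Stack_Size to 0x800 (2 KB) and then check actual usage.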
Summary
| Concept | Key Point |
|---|---|
| .text / .rodata | Code and const data. Flash only. Read-only at runtime |
| .data | Initialized globals. Flash (init copy) + SRAM (runtime). Costs both |
| .bss | Zero-initialized globals. SRAM only. Flash-efficient |
| LMA / VMA | .data exists in Flash (LMA) and SRAM (VMA) simultaneously |
| startup.s | Copies .data, zeroes .bss, calls SystemInit, then main |
| arm-none-eabi-size | text+data = Flash usage; data+bss = static SRAM usage |
| Map file | Per-module, per-symbol layout and size |
| Stack overflow | Stack grows down, silently overwrites .bss |
| Missing const | Without const, tables go into .data — wasting Flash and SRAM |
Once you can read a linker script, the foundation of why embedded code works becomes visible all the way down to Flash and RAM.
Why must DMA buffers be global? Why is zero-initialization guaranteed for globals? Where exactly should you add const? Every one of these questions connects to section layout and the linker script.
Building the habit of checking the map file on every non-trivial firmware change will save you hours of debugging when you finally hit a memory constraint.
Next up: Episode 12 — Optimization and Assembly (Trust the Compiler, But Understand It). What changes with -O2? Inlining, loop unrolling, the impact of volatile on optimization, and a look at the disassembly with arm-none-eabi-objdump.
FAQ
Q. Is zero-initialization of global variables truly guaranteed?
Yes. The C standard (C11 §6.7.9; §6.7.8 in C99) requires zero-initialization for all globals and static variables without explicit initializers. This is implemented by startup.s zeroing the .bss section. Memory allocated with malloc is not initialized, so use calloc if you need zeroed memory.
Q. What happens when Flash is full?
The linker fails the build with an error like region 'FLASH' overflowed by X bytes. It is a build-time error, not a runtime failure — you’ll know immediately.
Q. What happens when SRAM is full?
Static overflow (too many globals/statics) is also a linker error (region 'RAM' overflowed). Stack and heap overflow at runtime are not caught by the linker. They manifest as hard faults or silent data corruption — which is why pitfall 1 is so dangerous.
Q. Can I place a large array directly in Flash?
Yes. Add const to a global variable and it goes into .rodata in Flash. Note that Flash has limited write endurance (the STM32F4 datasheets specify on the order of 10k erase cycles), so it’s fine for read-only data but not for runtime-modifiable values. Frequently-accessed large tables in Flash may also cause cache misses, so hot data is sometimes copied to SRAM first.
Q. Do I ever need to edit the linker script myself?
Rarely, for most projects. You will need to if you: add external SRAM or QSPI Flash, place specific code in TCM (Tightly Coupled Memory) for maximum speed, or need to co-exist with a bootloader that occupies part of Flash.