fix(stm32wl): Harden LittleFS flash driver and FS layer for reliable prefs persistence#10126

Draft
ndoo wants to merge 8 commits into meshtastic:develop from mesh-malaysia:stm32-littlefs

Conversation

@ndoo
Contributor

@ndoo ndoo commented Apr 10, 2026

Summary

This PR fixes persistent configuration loss on STM32WL devices (fixes #9704) by hardening the LittleFS flash driver and correcting several missing or incorrect FS layer behaviours dating from when LittleFS support first landed in #5987.

Changes are grouped into three layers:


1. Flash driver robustness (src/platform/stm32wl/LittleFS.cpp)

The original _internal_flash_prog and _internal_flash_erase implementations had four correctness issues that could silently corrupt or lose data:

| Issue | Fix |
| --- | --- |
| _internal_flash_erase discarded HAL_FLASH_Unlock() return, proceeding into erase even if flash remained locked | Return LFS_ERR_IO on failure |
| _internal_flash_prog loop continued writing after HAL_FLASH_Program() failed, touching subsequent addresses; HAL_FLASH_Lock() was also skipped on the early-return error path | Break on first failure; always HAL_FLASH_Lock() after the loop |
| Stale error flags in FLASH->SR from a previous failed operation cause the HAL to return HAL_ERROR immediately on the next operation without attempting it | Call __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ALL_ERRORS) after each successful HAL_FLASH_Unlock() |
| The STM32WL HAL minimum write unit is one 64-bit doubleword (8 bytes). dw_count = size / 8 silently truncated trailing bytes when size % 8 != 0; LittleFS never saw the short write | Return LFS_ERR_INVAL early if size % 8 != 0 |

Also moved the bounds check before the write loop (validate the full range before touching flash) rather than checking per-doubleword mid-loop.
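
For reviewers who want to see the four fixes together, here is a rough host-side sketch of the hardened control flow. It is not the code in LittleFS.cpp: FakeFlash and the fake* helpers are hypothetical stand-ins for the STM32 HAL, the ERR_* constants only mirror the LittleFS error values, and returning ERR_INVAL for an out-of-range write is an assumption rather than something stated in this PR.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins so the sketch builds on a host; not the STM32 HAL.
enum HalStatus { HAL_OK, HAL_ERROR };
constexpr int ERR_OK = 0, ERR_IO = -5, ERR_INVAL = -22; // mirror LFS_ERR_OK / _IO / _INVAL

struct FakeFlash {
    uint8_t mem[64];
    bool locked = true;
    bool staleError = false; // models leftover error bits in FLASH->SR
    int failAtWrite = -1;    // inject a failure on the Nth doubleword write
    int writes = 0;
    FakeFlash() { std::memset(mem, 0xFF, sizeof mem); }
};

HalStatus fakeUnlock(FakeFlash &f) { f.locked = false; return HAL_OK; }
void fakeLock(FakeFlash &f) { f.locked = true; }
void fakeClearErrors(FakeFlash &f) { f.staleError = false; }

HalStatus fakeProgramDoubleword(FakeFlash &f, uint32_t addr, uint64_t data) {
    if (f.locked || f.staleError) return HAL_ERROR; // stale flags block the operation
    if (f.writes++ == f.failAtWrite) { f.staleError = true; return HAL_ERROR; }
    std::memcpy(&f.mem[addr], &data, 8);
    return HAL_OK;
}

int flashProg(FakeFlash &f, uint32_t addr, const void *buf, uint32_t size) {
    // Reject partial doublewords instead of silently dropping the tail.
    if (size % 8 != 0) return ERR_INVAL;
    // Validate the whole range before touching flash.
    if (addr > sizeof f.mem || size > sizeof f.mem - addr) return ERR_INVAL;
    // Never proceed if the flash could not be unlocked.
    if (fakeUnlock(f) != HAL_OK) return ERR_IO;
    // Clear error flags left over from an earlier failed operation.
    fakeClearErrors(f);

    int rc = ERR_OK;
    const uint8_t *src = static_cast<const uint8_t *>(buf);
    for (uint32_t off = 0; off < size; off += 8) {
        uint64_t dw;
        std::memcpy(&dw, src + off, 8);
        if (fakeProgramDoubleword(f, addr + off, dw) != HAL_OK) {
            rc = ERR_IO; // stop at the first failure
            break;
        }
    }
    fakeLock(f); // runs on both the success and the error path
    return rc;
}
```

The single exit after the loop is what guarantees the lock is re-applied; the erase path would follow the same unlock-check, clear-flags, operate, lock sequence.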


2. FS layer completeness (src/FSCommon.cpp, src/mesh/NodeDB.cpp)

Three higher-level FS operations were missing STM32WL coverage:

rmDir() — STM32WL fell through all branches silently. Extended the existing ARCH_NRF52 branch to ARCH_NRF52 || ARCH_STM32WL; STM32_LittleFS::rmdir_r() delegates to lfs_remove().

saveToDisk() format-on-retry — The NRF52 recovery path (format + remount on a failed save, then retry) was not extended to STM32WL. Extended the #if defined(ARCH_NRF52) guard to include ARCH_STM32WL, matching existing NRF52 resilience semantics.
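
The format-on-retry behaviour described above can be sketched roughly as follows. The function name, the RetryResult struct and the two callbacks are hypothetical and exist only for illustration; the real logic lives in NodeDB.cpp behind the architecture guard.

```cpp
#include <functional>

// Hypothetical result type: did the save succeed, and did we have to format?
struct RetryResult {
    bool saved;
    bool formatted;
};

// Try to save once; on failure, format + remount, then try exactly once more.
RetryResult saveWithFormatRetry(const std::function<bool()> &trySave,
                                const std::function<bool()> &formatAndRemount) {
    if (trySave())
        return {true, false};  // first attempt worked, nothing to recover
    if (!formatAndRemount())
        return {false, false}; // recovery itself failed, give up
    return {trySave(), true};  // second and final attempt on a fresh filesystem
}
```

Note that a format discards everything on the filesystem, so this path trades the other stored files for the ability to persist the current save.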

renameFile() (atomic SafeFile close): The non-ESP32 path implemented rename as copyFile + remove, costing two full block write+erase cycles per SafeFile::close() and, crucially, opening the destination with FILE_O_WRITE without truncation, which is the root cause of the stale-byte corruption identified in #9927. Extended the ARCH_ESP32 branch to ARCH_ESP32 || ARCH_STM32WL so that STM32_LittleFS::rename() → lfs_rename() is used instead: a metadata-only atomic operation with no extra flash wear and no truncation concern.


3. Flash headroom (variants/stm32/stm32.ini, src/mesh/NodeDB.cpp, src/mesh/NodeDB.h, src/modules/AdminModule.cpp)

backup.proto requires two LittleFS blocks (4 KB out of ~28 KB usable on the STM32WL filesystem). Backup is only written on an explicit admin command and adds no value on a flash-constrained node. Added MESHTASTIC_EXCLUDE_BACKUP=1 to stm32.ini and gated the three AdminModule case handlers and NodeDB declarations/implementations behind #if !MESHTASTIC_EXCLUDE_BACKUP, matching the pattern used by existing MESHTASTIC_EXCLUDE_* flags throughout the codebase.
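
As an illustration of the gating pattern, here is a minimal sketch. The AdminRequest enum and handleAdminRequest() are hypothetical names, not the real AdminModule handlers, and defaulting the flag to 0 when the build does not define it is an assumption about how the flag is consumed.

```cpp
// Assumption: the flag defaults to 0 unless the build sets it,
// e.g. via a -D build flag in the variant's .ini file.
#ifndef MESHTASTIC_EXCLUDE_BACKUP
#define MESHTASTIC_EXCLUDE_BACKUP 0
#endif

// Hypothetical request types standing in for the real admin message cases.
enum class AdminRequest { Backup, Restore, RemoveBackup, Reboot };

// Returns true when the request is handled by this build.
bool handleAdminRequest(AdminRequest req) {
    switch (req) {
#if !MESHTASTIC_EXCLUDE_BACKUP
    // The three backup-related handlers are compiled out on constrained targets.
    case AdminRequest::Backup:
    case AdminRequest::Restore:
    case AdminRequest::RemoveBackup:
        return true;
#endif
    case AdminRequest::Reboot:
        return true;
    default:
        return false; // excluded requests fall through to "not handled"
    }
}
```

With the flag set to 1 the backup cases disappear at compile time, so neither the handlers nor the NodeDB code they would call take up flash.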


Relationship to open PRs

vs #9927: fix(STM32WL) fix LittleFS FILE_O_WRITE truncation for persistent prefs

#9927 adds LFS_O_TRUNC to the FILE_O_WRITE open flags in the STM32WL backend to prevent stale bytes from a previous write remaining after a shorter protobuf is written over a larger one.

This PR addresses the same symptom through a more complete fix: by routing renameFile() through lfs_rename() for STM32WL, SafeFile writes to a .tmp file and atomically replaces the target via a metadata-only rename — the destination is always either the complete old version or the complete new version. The FILE_O_WRITE truncation issue only matters if stale bytes can survive in the final file; with an atomic rename the final file is the temp file's inode, so truncation of the destination never occurs. This PR's approach is a superset of #9927's fix and supersedes it.
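
To make the stale-byte argument concrete, here is a toy model in which a filesystem is just a map from path to contents. ToyFs, writeNoTrunc, writeTrunc and renameEntry are hypothetical helpers that only mimic the behaviours discussed; they are not LittleFS or Meshtastic APIs.

```cpp
#include <map>
#include <string>
#include <utility>

using ToyFs = std::map<std::string, std::string>; // path -> file contents

// Overwrite from offset 0 without truncating: a shorter write leaves the old tail.
void writeNoTrunc(ToyFs &fs, const std::string &path, const std::string &data) {
    std::string &file = fs[path];
    if (file.size() < data.size())
        file.resize(data.size());
    file.replace(0, data.size(), data);
}

// Write to a fresh or truncated file: contents are exactly the new data.
void writeTrunc(ToyFs &fs, const std::string &path, const std::string &data) {
    fs[path] = data;
}

// Rename: the destination entry is replaced wholesale by the source entry.
bool renameEntry(ToyFs &fs, const std::string &from, const std::string &to) {
    if (from == to)
        return fs.count(from) != 0;
    auto it = fs.find(from);
    if (it == fs.end())
        return false;
    fs[to] = std::move(it->second);
    fs.erase(it);
    return true;
}
```

Writing "NEW" over "OLD-LONG-CONFIG" with writeNoTrunc leaves "NEW-LONG-CONFIG", which is the corruption pattern. Writing "NEW" to a .tmp path and then calling renameEntry leaves exactly "NEW" at the destination, because the old contents are never written into.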

vs #10072: Remove copyFile/renameFile reimplementations, adjust STM32 SafeFile to match NRF52

#10072 removes the copyFile/renameFile reimplementations entirely for all non-ESP32 platforms (also superseding #9927). The renameFile fix in this PR overlaps with #10072.

If #10072 lands first, only the renameFile hunk in FSCommon.cpp would need to be dropped; the remaining changes (flash driver hardening, rmDir, format-on-retry, EXCLUDE_BACKUP) are fully complementary and not covered by #10072.


Context

The original STM32WL LittleFS implementation was contributed in #5987 and has served as the foundation, but several edge cases in the flash HAL wrapper and FS-layer integration were not discovered until devices accumulated more real-world usage. The bugs in _internal_flash_prog (continuing past errors, skipping lock on error path) and the missing HAL_FLASH_Unlock() check in _internal_flash_erase are the most likely root causes of flash corruption that manifest as settings reverting to defaults after reboot.


Testing

Tested on the Seeed Wio-E5 (STM32WLE5JC) mini dev board, RAK3172 (original variant + Russell) and Milesight GS301. On the GS301 the fixes helped to unbrick broken storage, probably because the corrupted flash triggered a format; this can't be confirmed due to lack of logging. Configuration persistence was verified across reboots with multiple setting changes (channel config, device config, module config).

Changes to FSCommon.cpp and NodeDB.cpp that aren't already guarded by ARCH_STM32WL follow the exact pattern of existing ARCH_NRF52 code paths, which have been in production for some time. Non-STM32WL builds are unaffected.

Hardware help welcome for regression testing on other STM32WL variants (CDEBYTE E77, RAK3172, etc.).


🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other:
      • Seeed Wio-E5 (STM32WLE5JC) - mini dev board
      • RAK3172
      • Milesight GS301 (toilet sensor)

github-actions bot added the needs-review and bugfix labels Apr 10, 2026
@ndoo
Contributor Author

ndoo commented Apr 10, 2026

Please tag as AI.

thebentern added the ai-generated label Apr 10, 2026
@Komzpa
Contributor

Komzpa commented Apr 11, 2026

maybe relevant: #6895 (my attempt at that issue, not only for stm)

@ndoo
Contributor Author

ndoo commented Apr 12, 2026

> maybe relevant: #6895 (my attempt at that issue, not only for stm)

Probably related, but it may not have been a fix for STM32, since SafeFile was broken.

ndoo added 8 commits April 14, 2026 16:58
STM32_LittleFS already implements rmdir_r() which delegates to
lfs_remove(). STM32WL was excluded from all branches in rmDir(),
causing directory cleanup to silently do nothing.

Extend the existing NRF52 branch to cover STM32WL.

Signed-off-by: Andrew Yong <me@ndoo.sg>
LittleFS::begin() already formats on mount failure. Extend the
existing NRF52 format-on-write-retry path to STM32WL so a failed
save triggers a format + remount before the second attempt, matching
the NRF52 recovery semantics.

Signed-off-by: Andrew Yong <me@ndoo.sg>
The non-ESP32 path in renameFile() does copyFile + remove, costing
two full block write+erase cycles per rename. STM32_LittleFS::rename()
delegates to lfs_rename() which is an atomic metadata-only operation
with no extra flash wear.

SafeFile uses renameFile() on every atomic close, so this directly
reduces write amplification on the 28KB STM32WL filesystem.

Signed-off-by: Andrew Yong <me@ndoo.sg>
_internal_flash_prog already checks HAL_FLASH_Unlock() and returns
LFS_ERR_IO on failure. _internal_flash_erase discarded the return
value, proceeding to erase even if the flash was not unlocked.

Apply the same check for consistency and safety.

Signed-off-by: Andrew Yong <me@ndoo.sg>
Previously the programming loop continued to the next doubleword after
HAL_FLASH_Program() failed, potentially writing to invalid addresses
and returning a misleading error code only at the end (last iteration).
HAL_FLASH_Lock() was also skipped on the mid-loop early return path.

- Move bounds check before the loop (validate full range at once)
- Break on first HAL error so subsequent doublewords are not written
- Move HAL_FLASH_Lock() after the loop so it always runs

Signed-off-by: Andrew Yong <me@ndoo.sg>
Stale error flags in FLASH->SR from a previous failed operation can
cause HAL_FLASH_Program() or HAL_FLASHEx_Erase() to return HAL_ERROR
immediately without attempting the operation.

Add __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ALL_ERRORS) after each
HAL_FLASH_Unlock() in both _internal_flash_prog and
_internal_flash_erase to ensure a clean state before each operation.

Signed-off-by: Andrew Yong <me@ndoo.sg>
The STM32WL HAL minimum write unit is one 64-bit doubleword (8 bytes).
_internal_flash_prog silently truncated any trailing bytes when size % 8
!= 0 because dw_count = size / 8 drops the remainder. Return LFS_ERR_INVAL
early so LittleFS sees the error rather than a silent short write.

Signed-off-by: Andrew Yong <me@ndoo.sg>
backup.proto requires 2 LittleFS blocks (4 KB) and is only written on an
explicit admin command. On a flash-constrained node with 12 usable blocks,
this is headroom better preserved for normal operation. Guard the three
AdminModule case handlers and NodeDB declarations/implementations behind
#if !MESHTASTIC_EXCLUDE_BACKUP, matching the pattern used by existing
MESHTASTIC_EXCLUDE_* flags throughout the codebase.

Signed-off-by: Andrew Yong <me@ndoo.sg>

Labels

ai-generated: Possible AI-generated low-quality content
bugfix: Pull request that fixes bugs
needs-review: Needs human review


Development

Successfully merging this pull request may close these issues.

[Bug]: STM32/E5 settings not saved

3 participants