fix(stm32wl): Harden LittleFS flash driver and FS layer for reliable prefs persistence#10126
Draft
ndoo wants to merge 8 commits intomeshtastic:developfrom
Draft
fix(stm32wl): Harden LittleFS flash driver and FS layer for reliable prefs persistence#10126ndoo wants to merge 8 commits intomeshtastic:developfrom
ndoo wants to merge 8 commits intomeshtastic:developfrom
Conversation
Contributor
Author
|
Please tag as AI. |
Contributor
|
maybe relevant: #6895 (my attempt at that issue, not only for stm) |
Contributor
Author
probably related but may have been not a fix for stm32 since safefile was broken |
STM32_LittleFS already implements rmdir_r() which delegates to lfs_remove(). STM32WL was excluded from all branches in rmDir(), causing directory cleanup to silently do nothing. Extend the existing NRF52 branch to cover STM32WL. Signed-off-by: Andrew Yong <me@ndoo.sg>
LittleFS::begin() already formats on mount failure. Extend the existing NRF52 format-on-write-retry path to STM32WL so a failed save triggers a format + remount before the second attempt, matching the NRF52 recovery semantics. Signed-off-by: Andrew Yong <me@ndoo.sg>
The non-ESP32 path in renameFile() does copyFile + remove, costing two full block write+erase cycles per rename. STM32_LittleFS::rename() delegates to lfs_rename() which is an atomic metadata-only operation with no extra flash wear. SafeFile uses renameFile() on every atomic close, so this directly reduces write amplification on the 28KB STM32WL filesystem. Signed-off-by: Andrew Yong <me@ndoo.sg>
_internal_flash_prog already checks HAL_FLASH_Unlock() and returns LFS_ERR_IO on failure. _internal_flash_erase discarded the return value, proceeding to erase even if the flash was not unlocked. Apply the same check for consistency and safety. Signed-off-by: Andrew Yong <me@ndoo.sg>
Previously the programming loop continued to the next doubleword after HAL_FLASH_Program() failed, potentially writing to invalid addresses and returning a misleading error code only at the end (last iteration). HAL_FLASH_Lock() was also skipped on the mid-loop early return path. - Move bounds check before the loop (validate full range at once) - Break on first HAL error so subsequent doublewords are not written - Move HAL_FLASH_Lock() after the loop so it always runs Signed-off-by: Andrew Yong <me@ndoo.sg>
Stale error flags in FLASH->SR from a previous failed operation can cause HAL_FLASH_Program() or HAL_FLASHEx_Erase() to return HAL_ERROR immediately without attempting the operation. Add __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ALL_ERRORS) after each HAL_FLASH_Unlock() in both _internal_flash_prog and _internal_flash_erase to ensure a clean state before each operation. Signed-off-by: Andrew Yong <me@ndoo.sg>
The STM32WL HAL minimum write unit is one 64-bit doubleword (8 bytes). _internal_flash_prog silently truncated any trailing bytes when size % 8 != 0 because dw_count = size / 8 drops the remainder. Return LFS_ERR_INVAL early so LittleFS sees the error rather than a silent short write. Signed-off-by: Andrew Yong <me@ndoo.sg>
backup.proto requires 2 LittleFS blocks (4 KB) and is only written on an explicit admin command. On a flash-constrained node with 12 usable blocks, this is headroom better preserved for normal operation. Guard the three AdminModule case handlers and NodeDB declarations/implementations behind #if !MESHTASTIC_EXCLUDE_BACKUP, matching the pattern used by existing MESHTASTIC_EXCLUDE_* flags throughout the codebase. Signed-off-by: Andrew Yong <me@ndoo.sg>
75e8379 to
feee3fd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes persistent configuration loss on STM32WL devices (fixes #9704) by hardening the LittleFS flash driver and correcting several missing or incorrect FS layer behaviours introduced when LittleFS support was first landed in #5987.
Changes are grouped into three layers:
1. Flash driver robustness (
src/platform/stm32wl/LittleFS.cpp)The original
_internal_flash_progand_internal_flash_eraseimplementations had four correctness issues that could silently corrupt or lose data:_internal_flash_erasediscardedHAL_FLASH_Unlock()return, proceeding into erase even if flash remained lockedLFS_ERR_IOon failure_internal_flash_progloop continued writing afterHAL_FLASH_Program()failed, touching subsequent addresses;HAL_FLASH_Lock()was also skipped on the early-return error pathHAL_FLASH_Lock()after the loopFLASH->SRfrom a previous failed operation cause the HAL to returnHAL_ERRORimmediately on the next operation without attempting it__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_ALL_ERRORS)after each successfulHAL_FLASH_Unlock()dw_count = size / 8silently truncated trailing bytes whensize % 8 != 0; LittleFS never saw the short writeLFS_ERR_INVALearly ifsize % 8 != 0Also moved the bounds check before the write loop (validate the full range before touching flash) rather than checking per-doubleword mid-loop.
2. FS layer completeness (
src/FSCommon.cpp,src/mesh/NodeDB.cpp)Three higher-level FS operations were missing STM32WL coverage:
rmDir()— STM32WL fell through all branches silently. Extended the existingARCH_NRF52branch toARCH_NRF52 || ARCH_STM32WL;STM32_LittleFS::rmdir_r()delegates tolfs_remove().saveToDisk()format-on-retry — The NRF52 recovery path (format + remount on a failed save, then retry) was not extended to STM32WL. Extended the#if defined(ARCH_NRF52)guard to includeARCH_STM32WL, matching existing NRF52 resilience semantics.renameFile()— atomic SafeFile close — The non-ESP32 path implemented rename ascopyFile + remove, costing two full block write+erase cycles perSafeFile::close()and, crucially, opening the destination withFILE_O_WRITEwithout truncation — the root cause of stale-byte corruption identified in #9927. Extended theARCH_ESP32branch toARCH_ESP32 || ARCH_STM32WLso thatSTM32_LittleFS::rename()→lfs_rename()is used instead: a metadata-only atomic operation with no extra flash wear and no truncation concern.3. Flash headroom (
variants/stm32/stm32.ini,src/mesh/NodeDB.cpp,src/mesh/NodeDB.h,src/modules/AdminModule.cpp)backup.protorequires two LittleFS blocks (4 KB out of ~28 KB usable on the STM32WL filesystem). Backup is only written on an explicit admin command and adds no value on a flash-constrained node. AddedMESHTASTIC_EXCLUDE_BACKUP=1tostm32.iniand gated the threeAdminModulecase handlers andNodeDBdeclarations/implementations behind#if !MESHTASTIC_EXCLUDE_BACKUP, matching the pattern used by existingMESHTASTIC_EXCLUDE_*flags throughout the codebase.Relationship to open PRs
vs #9927 —
fix(STM32WL) fix LittleFS FILE_O_WRITE truncation for persistent prefs#9927 adds
LFS_O_TRUNCto theFILE_O_WRITEopen flags in the STM32WL backend to prevent stale bytes from a previous write remaining after a shorter protobuf is written over a larger one.This PR addresses the same symptom through a more complete fix: by routing
renameFile()throughlfs_rename()for STM32WL,SafeFilewrites to a.tmpfile and atomically replaces the target via a metadata-only rename — the destination is always either the complete old version or the complete new version. TheFILE_O_WRITEtruncation issue only matters if stale bytes can survive in the final file; with an atomic rename the final file is the temp file's inode, so truncation of the destination never occurs. This PR's approach is a superset of #9927's fix and supersedes it.vs #10072 —
Remove copyFile/renameFile reimplementations, adjust STM32 SafeFile to match NRF52#10072 removes the
copyFile/renameFilereimplementations entirely for all non-ESP32 platforms (also superseding #9927). TherenameFilefix in this PR overlaps with #10072.If #10072 lands first, only the
renameFilehunk inFSCommon.cppwould need to be dropped; the remaining changes (flash driver hardening,rmDir, format-on-retry,EXCLUDE_BACKUP) are fully complementary and not covered by #10072.Context
The original STM32WL LittleFS implementation was contributed in #5987 and has served as the foundation, but several edge cases in the flash HAL wrapper and FS-layer integration were not discovered until devices accumulated more real-world usage. The bugs in
_internal_flash_prog(continuing past errors, skipping lock on error path) and the missingHAL_FLASH_Unlock()check in_internal_flash_eraseare the most likely root causes of flash corruption that manifest as settings reverting to defaults after reboot.Testing
Tested on Seeed Wio-E5 (STM32WLE5JC) mini dev board, RAK3172 (original variant + Russell) and Milesight GS301 (where the fixes helped to unbrick broken storage, likely due to triggering a format due to the corrupted flash - can’t confirm due to lack of logging). Configuration persistence verified across reboots with multiple setting changes (channel config, device config, module config).
Changes to
FSCommon.cppandNodeDB.cppthat aren't already guarded byARCH_STM32WLfollow the exact pattern of existingARCH_NRF52code paths, which have been in production for some time. Non-STM32WL builds are unaffected.Hardware help welcome for regression testing on other STM32WL variants (CDEBYTE E77, RAK3172, etc.).
🤝 Attestations