Fix cs_loader process crash on missing TPA directory (#789)#831
Open
G00dS0ul wants to merge 7 commits into
Open
Fix cs_loader process crash on missing TPA directory (#789)#831G00dS0ul wants to merge 7 commits into
G00dS0ul wants to merge 7 commits into
Conversation
…a-crash-guard-clean
…rd (metacall#789) With the TPA crash guard in place, loading an invalid C# configuration now fails gracefully (non-zero) instead of aborting the process. Enable viferga's invalid-configuration test and update its assertion from ASSERT_EQ to ASSERT_NE to express that corrected behavior.
Point the install-tree DOTNET_CORE_LOADER_ASSEMBLY_PATH at
${CMAKE_INSTALL_PREFIX}/${INSTALL_LIB}/CSLoader.dll instead of the
build-output dir, which the slim cli/runtime images delete. Mirrors the
py_loader dev/install split. Complements the crash guard so C# is
functional in the packaged image, not just non-fatal on missing config.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix cs_loader process crash on missing TPA directory (#789)
Summary
Fixes the hard process crash (
std::terminate, exit 134) from #789 when the C# loader is initialized with a TPA directory that doesn't exist — e.g. in the slim/cli image where the build tree has been removed.This PR addresses issue (1) of two: the error handling. A bad configuration should never abort the whole process — it should degrade gracefully. The underlying root cause (issue (2): the loader assembly path being baked to the build-output directory) is intentionally left for a follow-up PR, so the two concerns stay separate and the real packaging bug isn't hidden behind the now-graceful skip.
Root cause
netcore_linux::AddFilesFromDirectoryToTpaListbuilt the TPA list using the throwing form of the iterator:If
directoryis missing/unreadable, thedirectory_iteratorconstructor throwsstd::filesystem::filesystem_error. Nothing on the C++ host-bootstrap path catches it, so it propagates out and the process callsstd::terminate.There was also a latent
size_tunderflow:path.compare(path.length() - 4, 4, ".dll")underflows for any entry name shorter than 4 characters.What this PR changes
In
source/loaders/cs_loader/include/cs_loader/netcore_linux.h:std::error_codeoverload ofdirectory_iterator, so a missing/unreadable directory is skipped gracefully instead of crashing:path.length() >= 4guard to fix the underflow.fsalias (resolved via__has_includetostd::filesystemorstd::experimental::filesystem), so no new includes or namespace changes are introduced.What this PR deliberately does NOT do
This is two issues in reality, and this PR is only the first:
${PROJECT_OUTPUT_DIR}/CSLoader.dll) in both the development and install blocks ofcs_loader/CMakeLists.txt, so the installedcs_loader.jsonpoints at a directory the slim image deletes. This will be fixed in a separate PR, repointing the install config at the installedCSLoader.dlllocation (mirroring the dev/install splitpy_loaderalready uses).Keeping them separate avoids hiding (2) behind the graceful skip introduced by (1).
Reproducing #789
The crash is on the C++ host-bootstrap path:
It is not exercised by the managed xUnit suite (
cs_loader_test, which runsdotnet testand never boots the native host). The C++ integration testsmetacall-cs-test/metacall-csharp-static-class-testreproduce it.Force the missing-directory condition (the same situation the slim/cli image creates by deleting the build tree) by pointing
dotnet_loader_assembly_pathat a path that doesn't exist:Before the fix (develop)
metacall-cs-test,metacall-csharp-static-class-testpass (exit 0).C++ exception with description "filesystem error: directory iterator cannot open directory: No such file or directory [/tmp/metacall-missing]" thrown in the test body.GTEST_CATCH_EXCEPTIONS=0, i.e. real production behavior where nothing wraps the call) — the throw is uncaught and the process aborts:cs_loader_test(managed) — passes regardless; confirms it never touches the native init path.This is exactly the cs_loader: filesystem_error crash when LOADER_SCRIPT_PATH directory does not exist #789 crash chain.
After the fix (branch
fix/cs-loader-tpa-crash-guard-clean, verified Jun 25)Re-running the exact same steps on the patched branch:
metacall-csharp-static-class-testpasses (exit 0): no regression on the normal path.GTEST_CATCH_EXCEPTIONS=0— the process no longer aborts. There is noterminate called …, no__cxa_throw -> std::terminate -> abort, and noAborted (Signal sent by tkill()). The loader fails gracefully instead:The test still fails — correctly, because the config genuinely points at a directory that doesn't exist, so
CSLoader.dllcan't be found and initialization returnsNULL. But the host stays alive and returns a clean error code instead of crashing. That is the fault-tolerance fix: a bad config degrades gracefully instead of taking down the whole process.Testing
cmake --build .— full build succeeds.ctest -R cs_loader_test --output-on-failure— passes (1/1): the guard doesn't regress normal assembly loading / C# execution.metacall-cs-test/metacall-csharp-static-class-test— pass on a valid config; with a missing-directory config they no longer abort the process (see reproduction above).Refs