Fix cs_loader process crash on missing TPA directory (#789) #831

Open
G00dS0ul wants to merge 7 commits into metacall:develop from G00dS0ul:fix/cs-loader-tpa-crash-guard-clean

G00dS0ul (Contributor) commented Jun 25, 2026

Fix cs_loader process crash on missing TPA directory (#789)

Summary

Fixes the hard process crash (std::terminate, exit 134) from #789 when the C# loader is initialized with a TPA directory that doesn't exist — e.g. in the slim/cli image where the build tree has been removed.
This PR addresses issue (1) of two: the error handling. A bad configuration should never abort the whole process — it should degrade gracefully. The underlying root cause (issue (2): the loader assembly path being baked to the build-output directory) is intentionally left for a follow-up PR, so the two concerns stay separate and the real packaging bug isn't hidden behind the now-graceful skip.

Root cause

netcore_linux::AddFilesFromDirectoryToTpaList built the TPA list using the throwing form of the iterator:

for (auto &dirent : fs::directory_iterator(directory)) { ... }

If directory is missing/unreadable, the directory_iterator constructor throws std::filesystem::filesystem_error. Nothing on the C++ host-bootstrap path catches it, so it propagates out and the process calls std::terminate.
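There was also a latent size_t underflow: path.compare(path.length() - 4, 4, ".dll") computes a wrapped-around start position for any path shorter than 4 characters, and std::string::compare then throws std::out_of_range, which is just as uncaught.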
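
Both failure modes can be reproduced in isolation. The sketch below is illustrative only and is not code from the loader: the function names are hypothetical, and it assumes a C++17 toolchain with std::filesystem rather than the file's fs alias.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if the throwing constructor raised filesystem_error.
bool throws_on_missing_dir(const std::string &directory)
{
    try
    {
        fs::directory_iterator it(directory);
        (void)it;
        return false;
    }
    catch (const fs::filesystem_error &)
    {
        return true;
    }
}

// Returns true if the error_code overload reported the failure without throwing.
bool reports_error_code(const std::string &directory)
{
    std::error_code ec;
    fs::directory_iterator it(directory, ec);
    (void)it;
    return static_cast<bool>(ec);
}

// Returns true if the unguarded suffix check throws std::out_of_range.
// For length < 4, `path.length() - 4` wraps to a huge unsigned value.
bool unguarded_compare_throws(const std::string &path)
{
    try
    {
        int result = path.compare(path.length() - 4, 4, ".dll");
        (void)result;
        return false;
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
}
```

Called with a directory that does not exist, the first two helpers show the same failure surfacing as an exception and as an error code respectively. The third shows why the length guard matters independently of the directory check.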

What this PR changes

In source/loaders/cs_loader/include/cs_loader/netcore_linux.h:

  • Uses the non-throwing std::error_code overload of directory_iterator, so a missing/unreadable directory is skipped gracefully instead of crashing:
void AddFilesFromDirectoryToTpaList(std::string directory, std::string &tpaList)
{
    // A bad/missing config directory must not abort the process.
    // Use the error_code overload so it can't throw an uncaught filesystem_error.
    std::error_code ec;
    fs::directory_iterator it(directory, ec);
    if (ec)
    {
        // (optional) log a warning here once logging is confirmed reachable
        return; // bad config dir -> skip gracefully instead of std::terminate
    }

    for (const auto &dirent : it)
    {
        std::string path = dirent.path();

        // length guard avoids size_t underflow for names shorter than 4 chars
        if (path.length() >= 4 && path.compare(path.length() - 4, 4, ".dll") == 0)
        {
            tpaList.append(path + ":");
        }
    }
}
  • Adds a path.length() >= 4 guard to fix the underflow.
  • Reuses the file's existing fs alias (resolved via __has_include to std::filesystem or std::experimental::filesystem), so no new includes or namespace changes are introduced.
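
One detail the error_code constructor alone does not cover: a range-based for advances the iterator with operator++, which can still throw if reading a later entry fails. A minimal sketch of a variant that stays on the non-throwing path for the whole walk is shown here. It is not the code in this PR; append_dlls_to_tpa is a hypothetical name, and the sketch assumes C++17 std::filesystem directly instead of the file's fs alias.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Appends every entry in `directory` whose extension is ".dll" to `tpaList`,
// each followed by ':'. Returns false if the directory could not be opened
// or if advancing the iterator failed part-way through.
bool append_dlls_to_tpa(const std::string &directory, std::string &tpaList)
{
    std::error_code ec;
    fs::directory_iterator it(directory, ec);
    if (ec)
    {
        return false; // missing or unreadable directory: nothing appended
    }

    const fs::directory_iterator end;
    while (it != end)
    {
        const fs::path &entry = it->path();

        // extension() is empty for short names, so no length guard is needed
        if (entry.extension() == ".dll")
        {
            tpaList.append(entry.string());
            tpaList.push_back(':');
        }

        // non-throwing advance, unlike operator++
        it.increment(ec);
        if (ec)
        {
            return false;
        }
    }

    return true;
}
```

Returning a bool lets the caller decide whether to log a warning. Note that extension() treats a file named exactly ".dll" as having no extension, which differs slightly from the suffix comparison used in the PR.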

What this PR deliberately does NOT do

There are really two issues here, and this PR fixes only the first:

  1. Bad error handling — fixed here (a bad config must not crash the process).
  2. Real error — the loader assembly path is configured to the build-output directory (${PROJECT_OUTPUT_DIR}/CSLoader.dll) in both the development and install blocks of cs_loader/CMakeLists.txt, so the installed cs_loader.json points at a directory the slim image deletes. This will be fixed in a separate PR, repointing the install config at the installed CSLoader.dll location (mirroring the dev/install split py_loader already uses).
    Keeping them separate avoids hiding (2) behind the graceful skip introduced by (1).

Reproducing #789

The crash is on the C++ host-bootstrap path:

metacall_load_from_file_ex -> loader_impl_load_from_file -> cs_loader_impl_initialize
  -> simple_netcore::start -> netcore_linux::start -> ConfigAssemblyName
  -> AddFilesFromDirectoryToTpaList -> directory_iterator (throws)

It is not exercised by the managed xUnit suite (cs_loader_test, which runs dotnet test and never boots the native host). The C++ integration tests metacall-cs-test / metacall-csharp-static-class-test reproduce it.
Force the missing-directory condition (the same situation the slim/cli image creates by deleting the build tree) by pointing dotnet_loader_assembly_path at a path that doesn't exist:

rm -rf /tmp/metacall-missing
find build -name cs_loader.json | while read f; do
  cp "$f" "$f.bak"
  sed -i 's#"dotnet_loader_assembly_path"[[:space:]]*:[[:space:]]*"[^"]*"#"dotnet_loader_assembly_path": "/tmp/metacall-missing/CSLoader.dll"#' "$f"
done

Before the fix (develop)

  • Valid config: metacall-cs-test, metacall-csharp-static-class-test pass (exit 0).
  • Missing dir, gtest catching ON — test FAILS with C++ exception with description "filesystem error: directory iterator cannot open directory: No such file or directory [/tmp/metacall-missing]" thrown in the test body.
  • Missing dir, gtest catching OFF (GTEST_CATCH_EXCEPTIONS=0, i.e. real production behavior where nothing wraps the call) — the throw is uncaught and the process aborts:
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
  what():  filesystem error: directory iterator cannot open directory: No such file or directory [/tmp/metacall-missing]
...
#12  cs_loader_impl_initialize        (libcs_loader.so)
...
#6   __cxa_throw
#5   std::terminate()
#2   abort
Aborted (Signal sent by tkill())

After the fix (branch fix/cs-loader-tpa-crash-guard-clean, verified Jun 25)

Re-running the exact same steps on the patched branch:

  • Valid config: metacall-csharp-static-class-test passes (exit 0), so there is no regression on the normal path.
  • Missing dir, GTEST_CATCH_EXCEPTIONS=0 — the process no longer aborts. There is no terminate called …, no __cxa_throw -> std::terminate -> abort, and no Aborted (Signal sent by tkill()). The loader fails gracefully instead:
Error: CreateDelegate status (0x80070002)        # 0x80070002 = ERROR_FILE_NOT_FOUND
Error: Loader (cs) returned NULL value on the initialization
...
1 FAILED TEST
exit=8

The test still fails — correctly, because the config genuinely points at a directory that doesn't exist, so CSLoader.dll can't be found and initialization returns NULL. But the host stays alive and returns a clean error code instead of crashing. That is the fault-tolerance fix: a bad config degrades gracefully instead of taking down the whole process.
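
As defense in depth, exceptions can also be caught at the point where C code calls into the loader's C++ code, so that any other throwing call yields an error code rather than std::terminate. The following is a sketch under that assumption; guarded_call is a hypothetical helper and is not part of MetaCall or of this PR.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Runs `body` and converts any escaping exception into a non-zero status,
// so no C++ exception crosses into the C caller.
int guarded_call(const std::function<void()> &body) noexcept
{
    try
    {
        body();
        return 0;
    }
    catch (const std::exception &)
    {
        return 1; // standard exception: caller can report and continue
    }
    catch (...)
    {
        return 2; // exception of an unknown type
    }
}
```

A wrapper like this does not replace the error_code fix. It only limits the damage if another throwing call is added to the bootstrap path later.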

Testing

  • cmake --build . — full build succeeds.
  • ctest -R cs_loader_test --output-on-failure — passes (1/1): the guard doesn't regress normal assembly loading / C# execution.
  • metacall-cs-test / metacall-csharp-static-class-test — pass on a valid config; with a missing-directory config they no longer abort the process (see reproduction above).

Refs

G00dS0ul and others added 7 commits June 25, 2026 15:33
…rd (metacall#789)

With the TPA crash guard in place, loading an invalid C# configuration now
fails gracefully (non-zero) instead of aborting the process. Enable viferga's
invalid-configuration test and update its assertion from ASSERT_EQ to ASSERT_NE
to express that corrected behavior.

Point the install-tree DOTNET_CORE_LOADER_ASSEMBLY_PATH at
${CMAKE_INSTALL_PREFIX}/${INSTALL_LIB}/CSLoader.dll instead of the
build-output dir, which the slim cli/runtime images delete. Mirrors the
py_loader dev/install split. Complements the crash guard so C# is
functional in the packaged image, not just non-fatal on missing config.