
[mimalloc] Only use __builtin_thread_pointer() in multithreaded builds#26747

Merged
juj merged 1 commit into emscripten-core:main from kleisauke:mimalloc-issue-26742 on Apr 22, 2026

@kleisauke
Collaborator

In single-threaded builds, __builtin_thread_pointer() may return 0, breaking mimalloc's non-null thread pointer assumption and causing a segfault with -sSAFE_HEAP. Only use it when threading is enabled.

Fixes: #26742.
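
For readers who want to see the shape of such a guard, here is a minimal sketch. It is not the diff from this PR: `example_thread_id` and `example_check_thread_id` are hypothetical names, and the sketch assumes `__EMSCRIPTEN_PTHREADS__` is the macro that marks a multithreaded build. The fallback (the address of a thread-local object) is one possible non-null substitute, chosen here only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not mimalloc's actual code): return a non-zero
   identifier for the calling thread. */
uintptr_t example_thread_id(void) {
#if defined(__EMSCRIPTEN__) && defined(__EMSCRIPTEN_PTHREADS__)
  /* Multithreaded Emscripten build: use the thread pointer directly. */
  return (uintptr_t)__builtin_thread_pointer();
#else
  /* Single-threaded (or non-Emscripten) build: the address of a
     thread-local object is never null and differs per thread. */
  static _Thread_local char anchor;
  return (uintptr_t)&anchor;
#endif
}

/* Sanity check for the invariant mimalloc relies on: the id is never zero. */
void example_check_thread_id(void) {
  assert(example_thread_id() != 0);
}
```

Calling `example_check_thread_id()` from a test program exercises the non-null assumption on whichever path the build selects.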

Collaborator

@juj juj left a comment


Great!

@juj juj merged commit 736b63f into emscripten-core:main Apr 22, 2026
29 checks passed
@juj
Collaborator

juj commented Apr 22, 2026

Btw, does mimalloc compile on top of emmalloc? I see there are emmalloc specific fields

    # build emmalloc as only a system allocator, without exporting itself onto
    # malloc/free in the global scope
    '-DEMMALLOC_NO_STD_EXPORTS',

in the config fields for mimalloc?

@sbc100
Collaborator

sbc100 commented Apr 22, 2026

I think we should probably fix __builtin_thread_pointer() in this case. It should probably always return a pointer to the TLS region, even when there are zero bytes of TLS data and even in single-threaded builds.

@sbc100
Collaborator

sbc100 commented Apr 22, 2026

Can you add the failing test to the list of core2s tests that we run in .circleci/config.yml?

Or, better still, add a SAFE_HEAP variant to one of the tests in test_other.py that reproduces this issue?

@kleisauke
Collaborator Author

> Btw, does mimalloc compile on top of emmalloc?

mimalloc is built on top of emmalloc. There are some design notes here:

// Design
// ======
//
// mimalloc is built on top of emmalloc. emmalloc is a minimal allocator on top
// of sbrk. The reason for having three layers here is that we want mimalloc to
// be able to allocate and release system memory properly, the same way it would
// when using VirtualAlloc on Windows or mmap on POSIX, and sbrk is too limited.
// Specifically, sbrk can only go up and down, and not "skip" over regions, and
// so we end up either never freeing memory to the system, or we can get stuck
// with holes.
//
// Atm wasm generally does *not* free memory back the system: once grown, we do
// not shrink back down (https://github.com/WebAssembly/design/issues/1397).
// However, that is expected to improve
// (https://github.com/WebAssembly/memory-control/blob/main/proposals/memory-control/Overview.md)
// and so we do not want to bake those limitations in here.
//
// Even without that issue, we want our system allocator to handle holes, that
// is, it should merge freed regions and allow allocating new content there of
// the full size, etc., so that we do not waste space. That means that the
// system allocator really does need to handle the general problem of allocating
// and freeing variable-sized chunks of memory in a random order, like malloc/
// free do. And so it makes sense to layer mimalloc on top of such an
// implementation.
//
// emmalloc makes sense for the lower level because it is small and simple while
// still fully handling merging of holes etc. It is not the most efficient
// allocator, but our assumption is that mimalloc needs to be fast while the
// system allocator underneath it is called much less frequently.
//
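
The sbrk limitation described in these notes can be illustrated with a toy model. This is only a sketch: the `toy_*` names are hypothetical and nothing here is taken from emmalloc or mimalloc. It models a heap whose in-use memory is always the range from 0 up to a single "break" offset.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an sbrk-style heap: memory in use is always [0, toy_brk). */
static size_t toy_brk = 0;

/* Grow the heap by `size` bytes and return the offset of the new region. */
size_t toy_sbrk_alloc(size_t size) {
  size_t offset = toy_brk;
  toy_brk += size;
  return offset;
}

/* Release a region. Only the topmost region can lower the break; anything
   below it becomes a hole that this layer cannot give back.
   Returns the number of bytes actually returned to the "system". */
size_t toy_sbrk_free(size_t offset, size_t size) {
  if (offset + size == toy_brk) {
    toy_brk = offset;
    return size;
  }
  return 0;
}

/* Allocate two regions, free the lower one, and report the final break. */
size_t toy_demo_hole(void) {
  toy_brk = 0;
  size_t a = toy_sbrk_alloc(64);
  size_t b = toy_sbrk_alloc(64);
  (void)b;                                  /* b stays allocated */
  size_t released = toy_sbrk_free(a, 64);   /* a sits below b */
  assert(released == 0);                    /* nothing could be released */
  return toy_brk;
}
```

In this model `toy_demo_hole()` leaves the break at 128 even though half of that range is free, which is the "stuck with holes" case that motivates putting a general-purpose allocator underneath mimalloc.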

> I think we should probably fix __builtin_thread_pointer() in this case.

I agree. Would it be worth adding a TODO comment in a follow-up PR? Also, feel free to revert this PR once __builtin_thread_pointer() is fixed in LLVM.

> Can you add the failing test to the list of core2s tests that we run in .circleci/config.yml?
>
> Or, better still, add a SAFE_HEAP variant to one of the tests in test_other.py that reproduces this issue?

I could add the previously failing test to the CircleCI config, but I've just noticed that it doesn't reproduce in the core0s suite (which is somewhat unexpected). For anyone interested in digging deeper, the segfault occurs in __wasm_call_ctors() / mi_process_attach().

#if defined(__clang__)
  #define mi_attr_constructor __attribute__((constructor(101)))
  #define mi_attr_destructor __attribute__((destructor(101)))
#else
  #define mi_attr_constructor __attribute__((constructor))
  #define mi_attr_destructor __attribute__((destructor))
#endif
static void mi_attr_constructor mi_process_attach(void) {
  _mi_auto_process_init();
}
static void mi_attr_destructor mi_process_detach(void) {
  _mi_auto_process_done();
}
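
As background on the `constructor(101)` attribute used above, here is a small sketch of how constructor priorities order startup code on GCC and Clang. The names `early_init`, `late_init`, and `ctor_order_ok` are hypothetical and unrelated to mimalloc; the point is only that a priority-101 constructor runs before constructors with no explicit priority, so `mi_process_attach` is among the first user-level code to execute.

```c
#include <assert.h>

/* Records the order in which the constructors below ran. */
static int ctor_order[2];
static int ctor_count = 0;

/* No explicit priority: runs after all explicitly prioritized constructors. */
static void __attribute__((constructor)) late_init(void) {
  ctor_order[ctor_count++] = 2;
}

/* Priority 101: the lowest value available to user code
   (0 to 100 are reserved for the implementation). */
static void __attribute__((constructor(101))) early_init(void) {
  ctor_order[ctor_count++] = 1;
}

/* Returns 1 if early_init ran before late_init. */
int ctor_order_ok(void) {
  return ctor_count == 2 && ctor_order[0] == 1 && ctor_order[1] == 2;
}

/* Convenience check that can be called from main(). */
void ctor_order_check(void) {
  assert(ctor_order_ok());
}
```

Both constructors have already run by the time `main` starts, so `ctor_order_check()` can be called from there.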

@sbc100
Collaborator

sbc100 commented Apr 22, 2026

But we have a shortlist of core2s tests that we run in CI. Presumably you could just add it to that list?

sbc100 added a commit to sbc100/llvm-project that referenced this pull request Apr 22, 2026
…ilds

Without this fix, `__tls_base` can remain set to zero, which leads
`__builtin_thread_pointer` to return NULL, which it should not.

See emscripten-core/emscripten#26747
@kleisauke
Collaborator Author

Checking that __builtin_thread_pointer() is non-null in single-threaded builds is enough to cover this (as done in PR #26751).

FWIW, here's a minimal reproducer reduced by cvise:


reduced.c:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  _Atomic(uintptr_t) tid;
} mi_atomic_once_t;

// A first-class heap
typedef struct mi_heap_s mi_heap_t;

// A sub-process
typedef struct mi_subproc_s mi_subproc_t;

// A thread-local heap ("theap") owns a set of thread-local pages
typedef struct mi_theap_s mi_theap_t;

struct mi_heap_s {
  mi_subproc_t *subproc;
};

struct mi_subproc_s {
  _Atomic(int64_t) purge_expire; // REMOVE THIS TO ALSO SEGFAULT ON -g -sSAFE_HEAP=1
  _Atomic(mi_heap_t *) heap_main;
};

struct mi_theap_s {
  _Atomic(mi_heap_t *) heap;
};

static mi_subproc_t subproc_main = { 0 };

mi_heap_t heap_main = { 0 };

mi_theap_t theap_main = { &heap_main };

// the theap belonging to the main heap
__thread mi_theap_t *__mi_theap_main = NULL;

bool _mi_atomic_once_enter(mi_atomic_once_t *once) {
  const uintptr_t current_tid = (uintptr_t)__builtin_thread_pointer();
  if (once->tid == current_tid) {
    return false;
  }

  return true;
}

#define mi_atomic_do_once \
  static mi_atomic_once_t _mi_once = { 0 }; \
  for (bool _mi_exec = _mi_atomic_once_enter(&_mi_once); _mi_exec; _mi_exec = false)

static void mi_heap_main_init(void) {
  heap_main.subproc = &subproc_main;
}

static void mi_process_init_once(void) {
  mi_heap_main_init();
}

void mi_process_init(void) {
  mi_atomic_do_once {
    mi_process_init_once();
  }
}

void _mi_theap_default_set(mi_theap_t *theap) {
  if (theap->heap->subproc->heap_main)
    __mi_theap_main = theap;
}

static void __attribute__((constructor(101))) mi_process_attach(void) {
  mi_process_init();
  _mi_theap_default_set(&theap_main);
}

Works without any issues:

$ emcc -O2 -sSAFE_HEAP=1 -pthread reduced.c && node a.out.js
$ emcc -g -sSAFE_HEAP=1 reduced.c && node a.out.js

Segfault:

$ emcc -O2 -sSAFE_HEAP=1 reduced.c && node a.out.js
...
RuntimeError: Aborted(segmentation fault). Build with -sASSERTIONS for more info.

The original segfault happened here:

if (_mi_is_heap_main(_mi_theap_heap(theap))) {
  __mi_theap_main = theap;
}
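
Reading reduced.c, the failure chain appears to be: `_mi_once` is zero-initialized, so when the thread pointer is also 0 the comparison in `_mi_atomic_once_enter` matches, initialization is skipped, `heap_main.subproc` stays NULL, and `_mi_theap_default_set` then reads through that null pointer. The sketch below simulates only the first half of that chain on any host by passing the thread id in as a parameter. The types are simplified, non-atomic stand-ins and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, non-atomic stand-ins for the types in reduced.c. */
typedef struct { uintptr_t tid; } once_t;
typedef struct { void *subproc; } heap_t;

/* Same comparison as _mi_atomic_once_enter, with the thread id passed in
   so that a null thread pointer can be simulated. */
static bool once_enter(const once_t *once, uintptr_t current_tid) {
  return once->tid != current_tid;
}

/* Returns true if initialization ran, i.e. heap.subproc was set. */
bool init_runs_with_tid(uintptr_t current_tid) {
  static int subproc_main;          /* stand-in for the real sub-process */
  once_t once = { 0 };              /* zero-initialized, as in reduced.c */
  heap_t heap = { NULL };
  if (once_enter(&once, current_tid)) {
    heap.subproc = &subproc_main;   /* the step mi_heap_main_init performs */
  }
  return heap.subproc != NULL;
}

/* With a null thread id the init step is skipped; otherwise it runs. */
void init_chain_check(void) {
  assert(!init_runs_with_tid(0));
  assert(init_runs_with_tid(0x1000));
}
```

This sketch does not explain why the `-g` build or the extra `purge_expire` field changes the outcome in the commands above.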

@kleisauke kleisauke deleted the mimalloc-issue-26742 branch April 23, 2026 08:44
Development

Successfully merging this pull request may close these issues.

core2s.test_wrap_malloc_mimalloc failure
