Make threads thread-safe#24439
Conversation
This fixes a race condition that was very occasionally causing dist/threads/t/free.t to fail. PL_veto_switch_non_tTHX_context was originally added as a global boolean-ish variable. It was temporarily set to true during a thread's freeing, to stop the thread's locale being switched to when that locale was in the middle of being freed. The variable was then restored to its previous value at the end of freeing the thread. Unfortunately this wasn't thread-safe: if multiple threads were being freed at the same time, the variable's value could get overwritten and/or restored to the wrong value. This commit changes the variable to be per-thread. This actually simplifies things: just set the thread's instance variable to true when freeing the thread, and you don't even need to restore the value to false later on. Given that there isn't always a one-to-one mapping between threads and interpreters, it's possible that there's some subtlety I've overlooked. But it seems to work and makes 'valgrind --tool=helgrind' happier.
Add some code comments to some functions and variables whose details confused me.
Add some code comments to join() and detach(), better explaining when the thread's interpreter is freed and when the thread itself is freed.
In S_ithread_create(), rename these variables:
rc_stack_size => setstack_err
rc_thread_create => create_err
They are return codes from function calls (hence the 'rc') but that
wasn't obvious to me at the time, and cognitively I was reading things
like
if (rc_stack_size) { ...}
as "if a non-default stack size has been specified" rather than
"if there was an error while setting the stack size".
No doubt someone else's cognitive biases will work differently from mine
and in 10 years time they will rename these vars back to rc_foo.
In the ithread_create() XS function, the variable 'thread' is used early on to refer to the parent thread, and later on to refer to the just-created thread. This is confusing. Add a separate, tightly scoped var - parent_thread - for the former purpose.
Currently a newly-created thread struct (not yet associated with a running OS thread) is added to the list of threads very early. This commit changes it so that it's added to the list after the thread is fully created (including creating the OS thread) but hasn't yet started executing (the underlying OS thread is still waiting on a lock to be released to indicate that it should start running). This change shouldn't make any practical difference, as currently the MY_POOL.create_destruct_mutex mutex is held throughout the creation process; so no other threads could see the not-fully-formed thread in the list of threads. But it will become useful during the next few commits which will try to reduce how long create_destruct_mutex is held for.
Previously, create_destruct_mutex was locked on the line before calling S_ithread_create(); that function would then be responsible for unlocking it later. This commit moves that locking to (almost) the first line of S_ithread_create() instead. Since that static function is only called from one place, this can be done safely without changing functionality. However, by having both the locking and unlocking done in the same function, it makes static analysers happier and doesn't require special hints like PERL_TSA_RELEASE() any more. And it also makes it easier for humans to understand the locking.
Previously, the XSUB ithread_create() called S_ithread_create(), which created the thread struct, including its mutex, and locked it early on. On return from S_ithread_create(), ithread_create() unlocked that mutex, allowing the child to start running. This commit moves that unlocking from just after where S_ithread_create() is called from, to being the last action of S_ithread_create() itself. This is virtually functionally the same, apart from a couple of lines of code in the parent which are now run *after* the unlock, but which are unaffected by whether the child has just started running. By having both the locking and unlocking done in the same function, it makes static analysers happier and doesn't require special hints like CLANG_DIAG_IGNORE_STMT(-Wthread-safety) any more.
There's a mutex, create_destruct_mutex, which is sort of global,
and which is currently held for the entirely of the duration of creating
a new perl thread (including the time-consuming cloning the parent's
interpreter).
This means that on a multi-processor system, only one new thread can be
created at a time, which is very slow.
I can see no good reason for holding this global lock for an extended
duration: all the cloning is just something done privately by the parent
thread copying stuff into a new interpreter which hasn't even been
allocated a new OS thread yet. Any other thread shouldn't care about
this and should be allowed to create its own child threads at the same
time.
So this commit modifies S_ithread_create() to only have
create_destruct_mutex held during the brief places where the global
thread pool state is modified.
Consider the following somewhat contrived code example:
use threads;
sub empty {}
sub f {
for (1..10000) {
threads->new(\&empty)->join();
}
}
for my $i (1..16) {
push @t, threads->new(\&f);
}
$_->join() for @t;
It creates 16 parallel jobs, where each job is just creating and
destroying a thread 10000 times.
On my 8 core laptop, before this commit it took 64 wallclock secs;
after, it takes 10 secs. A similar total CPU (in fact slightly more),
but now spread over all the CPUs in parallel.
As well as a performance boost, part of the motivation for this commit
is to rationalise and simplify when both a thread's mutex and the global
mutex are being held at the same time - currently helgrind is
complaining the ordering of acquiring both locks isn't consistent.
Subsequent commits will address this further.
Following this commit, the global(ish) create_destruct_mutex lock will now always be locked *before* a thread's mutex in situations where both need to be held. Previously this wasn't consistent, and deadlocks were possible. A new test file has been added, zz_deadlock.t, which demonstrates a reproducible (if somewhat contrived) deadlock. The biggest change in this commit is that a call to perl_destruct for a thread is no longer done with the thread's mutex held. This should be safe, as at that point the thread is no longer running. Other threads may still hold a pointer to the thread object, and they can still lock that to get the thread's status, but whether the thread's interpreter is being actively freed or not at that point wont affect that. The other change in this commit is a few tweaks to ordering in S_ithread_create(). First, the newly-created thread struct doesn't get its mutex locked now until we're about to actually create the OS thread - it doesn't need locking before then. Second, the new thread struct is added to the pool slightly earlier, just *before* the thread mutex has to be locked and held until the end. (It has to be held until the end because the lock is being used to stop the just-created OS thread from starting to run). The issue with perl_destruct() was that it would lock thread A's mutex, then free all its objects, which might include thread objects if that thread itself created other threads. If such a thread B was finished, then deleting the thread B object might cause thread B's refcount to reach zero and the thread struct would be removed from the pool, involving locking create_destruct_mutex. Thus we have a situation of thread A's mutex being locked and held and then the pool mutex being locked. Conversely, methods like threads->list() lock the pool, then traverse the list of of threads, briefly locking each thread in turn. This is an example of the opposite order. This is what zz_deadlock.t exercises. The new test file is prefixed 'zz' so that it's the last to run. So if it deadlocks and hangs, all the other test files at least get to run first.
Two variants of 'if (...) {' are selected via #ifdef; this commit
moves the '{' at the end of both lines outside of the #ifdef to ensure
balanced braces for text editors.
Clarify the order of locking.
Rename the static function S_ithread_free() to S_ithread_dec_free() to better indicate that its job is more like SvREFCNT_dec(): i.e. rather than unconditionally free the thread, decrement its ref count and only free if it reaches zero. Then add some more code comments explaining what's happening with the ref count at various places.
During the last stages of freeing a thread, its interpreter is freed. Sometimes my_perl is set to MY_POOL.main_thread.interp for the remaining few steps of thread destruction, to provide an interpreter for those bits of code which need it. This is on the basis that the main interpreter is freed last, so is always available. However, this is a deeply unsatisfactory arrangement. It means that for a short while both the being-free thread and the main thread share the same interpreter, and even if that doesn't cause any active harm, it makes tools like helgrind spit out false positives. In particular, it was getting upset about free2.t, since the main thread was updating PL_phase while the other thread was testing its value. This commit instead resets my_perl to NULL in places where it would otherwise point to a freed interpreter, and only narrowly and temporarily sets it to MY_POOL.main_thread.interp in one place where PerlMemShared_free() needs it. My rationale is that it is better to set my_perl to NULL and deal with any immediate and obvious SEGV fallout, than to set it to MY_POOL.main_thread.interp and silently suffer from rare race conditions.
This commit adds a harmless extra final lock and immediate unlock of
create_destruct_mutex just before the OS thread exits. This is to
workaround a false positive being reported by the
valgrind --tool=helgrind
thread-race detection tool.
See the code comments for the gory details.
Add some comments at the top of this test file explaining its purpose.
This commit adds a new field, pool, to the thread struct to provide an
alternative way to determine the pool which the thread is a part of.
This avoids a rare race condition where the usual lookup method via
PL_modglobal crashes because PL_modglobal has already been freed.
In more detail. There is a pool of threads created via the 'threads'
module. Typically there is only a single pool, but where the perl
interpreter is embedded within an application that creates its own
interpreters and/or threads, each interpreter's invocation of 'use
threads' creates a new pool which is used by all threads created within
that interpreter via 'threads->new()' etc.
For a particular pool, each thread within that pool has an interpreter
with a PL_modglobal hash, and the "threads::_pool$XS_VERSION" key holds
an SvIV whose value is a pointer to the pool.
Many thread XS functions start with dMY_POOL, which declares and
retrieves the pool pointer via the passed pTHX.
There is a specific issue with S_ithread_dec_free(). This also does
dMY_POOL, but there is a rare race condition where PL_modglobal has
already been freed by that point, and so a SEGV occurs in dMY_POOL.
In particular, it's a race between detach() and the thread ending that
occasionally manifested itself in t/blocks.t. If a thread finishes at
almost exactly the same time as another thread calls detach() on that
finishing thread, the following can happen:
running thread: thread calling detach():
finish perl code and
return from perl to
S_ithread_run()
lock thread
set PERL_ITHR_FINISHED
unlock thread
ithread_detach():
lock thread
if PERL_ITHR_FINISHED
call S_ithread_clear(), which does:
perl_destruct(thread->interp)
thread->interp = NULL
lock thread
call S_ithread_dec_free()
which does:
dMY_POOL;
which tries to access
thread->interp->Imodglobal
SEGV
Now, that race could possibly be fixed by not unlocking and relocking
the thread between setting PERL_ITHR_FINISHED and calling
S_ithread_dec_free(), but in reality its a bit more complex than that
as two locks are actually held and need to be unlocked and relocked in
the right order.
So this commit takes a different approach, and in particular makes
S_ithread_dec_free() no longer reliant on PL_modglobal still being
valid. It achieves this by storing a pool pointer in the thread
structure whenever a new thread is created, and using that in any
functions where a thread pointer is available in addition to pTHX.
The next couple of commits will fix up some of the untidiness.
In S_ithread_clear(), when freeing a perl's interpreter, set thread->interp() early on, so that nothing else can access the interpreter in a semi-freed state. This commit doesn't fix any particular bug; its just a bit of general defensive coding.
Remove this internal state flag. When S_ithread_create() fails to successfully create an OS thread, this flag used to be set to indicate to S_ithread_dec_free() and S_ithread_clear() that the thread and interpreter should be unconditionally freed, bypassing the normal reference count and flag checks. This is a bit of a hack. Instead, this commit makes it so that if the OS thread creation fails, the ref count is set to 1, "finished" flags are set, then S_ithread_dec_free() is called and behaves in a normal fashion (without special-case handling) to free the otherwise unused interpreter and threads struct.
whitespace-only: reindent a code block after previous commit removed a condition.
This internal flag is defined as:
#define PERL_ITHR_UNCALLABLE (PERL_ITHR_DETACHED|PERL_ITHR_JOINED)
Eliminate PERL_ITHR_UNCALLABLE and instead explicitly use
(DETATCHED|JOINED) everywhere. The define obfuscates rather than
enlightens (IMHO) - it's not immediately obvious what is meant by an
"uncallable" thread.
Rather than declaring all the threads::foo() XS methods via:
MODULE = threads PACKAGE = threads PREFIX = ithread_
void
ithread_foo(...)
just declare them as:
MODULE = threads PACKAGE = threads
void
foo(...)
....
Since we're not wrapping an underlying C library which has ithread_foo()
functions, the PREFIX is pointless: both XS variants above generate the
same threads.c file. It just confuses things. So remove it.
Add a new test file which tests for all of (well, most of) the warnings and errors which shared.pm and shared.xs can generate.
When cloning an interpreter, the refcount of shared objects was being
incremented without being locked: a potential race condition.
This commit makes it hold the shared interpreter lock before doing so.
That lock is a recursive lock, where currently the only way to unlock it
is that recursive_lock_acquire() does a
SAVEDESTRUCTOR_X(recursive_lock_release,lock)
so that the lock is automatically released during scope exit. This is
too heavyweight for our needs here (and because we're cloning we don't
necessarily even have a savestack). Also, incrementing an SV's RC can't
possibly croak(), so there's no danger of the lock left dangling.
So this commit adds a boolean arg to recursive_lock_acquire() to
indicate whether it should do SAVEDESTRUCTOR_X() or not.
The bodies of the two XSUBs cond_wait() and cond_timedwait() are virtually identical. So this commit moves the common code out into a new static function, S_do_cond_timedwait(). No functional changes, except that in one warning message cond_timedwait() no longer misidentifies itself as 'cond_wait'.
Explain better how the perl/XS cond_wait() functions mimic the equivalent OS functions.
There are two forms of cond_wait():
cond_wait($cond);
cond_wait($cond, $lock);
The first form uses the same variable for both condition and lock.
When signalling, cond_signal() warns if the variable wasn't locked
prior: you're supposed to do:
lock($cond);
cond_signal($cond);
and if you skip the locking, you get the warning.
Unfortunately the warning check in threads.xs always checks whether the
$cond lock was locked; but if the second form of wait() was used, then
instead it aught to check whether $lock was locked.
So after doing a 2-arg wait, this:
lock($lock);
cond_signal($cond);
is the correct form; unfortunately it was giving a spurious warning:
cond_signal() called on unlocked variable
A couple of commits ago wait() was made to start tracking what lock was
used as its argument; the presence of this value now allows us to check
the correct lock and thus avoid the incorrect warning.
This is mainly to keep thread-debugging tools like helgrind happy rather than strictly being necessary. See the added code comments for more details.
These two XS methods require their first argument to be a reference, but this wasn't checked for and passing a non-ref argument used to SEGV.
shared_clone() is documented as refusing to clone certain SV types such
as typeglobs: instead it is supposed to warn or croak, depending on the
setting of $threads::shared::clone_warn.
However, the cloning code only did the check for *references* to
disallowed types. So the following:
my $x = shared_clone(\*ABC);
would correctly produce this error:
Unsupported ref type: GLOB at ...
while this:
my $x = shared_clone([*ABC]);
produced this:
perl: sv.c:3908: S_glob_assign_glob: Assertion `isGV_with_GP(gvgp_)' failed.
This commit changes the code so that non-ref types are ban-checked too.
The checking isn't comprehensive - I've currently made it just look for
GLOB and CODE as these are the documented exceptions.
GLOB in particular needs to be banned as it is hard to successfully
share a typeglob between threads: S_glob_assign_glob() in sv.c does all
sorts of things like adding a backreference from the glob's stash, and
it all goes horribly wrong in relation to which pointers are in which
interpreter. This is even if it gets that far; currently the XS sharing
code upgrades a new SV to a typeglob without giving it a GP, resulting
in the specific assertion failure shown above. This is fixed in the next
commit.
Also, its not very clear what the semantics of sharing or cloning a
typeglob between threads should be. If *foo is gets shared, do $foo,
@foo etc also become shared?
See the next commit for further banning of shared typeglobs.
Previously this code:
my $sh : shared = *ABC;
produced:
perl: sv.c:3908: S_glob_assign_glob: Assertion `isGV_with_GP(gvgp_)' failed.
Now it produces:
Invalid value for shared scalar at ...
This is a follow-up to the previous commit, which did a similar thing
for shared_clone(). That commit worked at the perl level, objecting if a
typeglob was found while recursing through a structure to be cloned.
This commit works at the XS level, croaking on any attempt to copy a
private typeglob value into the shadow SV in the shared interpreter.
In addition, the magic-set code now avoids upgrading the shared SV to
the target type: this is now left to sv_setsv() itself, which will know
the best way to upgrade. In particular, a naive
sv_upgrade(nullsv, SVt_PVGV) left the SV as a GV with no GP, leading to
the assertion failure seen above.
See previous commit for some reasons why sharing a GV is a bad idea.
This debugging code, as used by Devel::Peek::Dump(), would fail an assert when dumping a malformed GV which had no GvNAME_HEK(). Now fixed. I've also made it display the raw GvNAME_HEK() address in addition to displaying its decoded NAME/NAMELEN values.
Make the code more readable, especially as it has many, and nested, #ifdefs. Its mainly just whitespace changes, although I did put a pair of braces round a complex single-statement else clause to better delineate it visually, and I reformatted a code comment.
This commit is essentially a revert of d525a7b from 2009, which changed Newxz(PL_psig_ptr, SIG_SIZE, SV*); Newxz(PL_psig_name, SIG_SIZE, SV*); to Newxz(PL_psig_name, 2 * SIG_SIZE, SV*); PL_psig_ptr = PL_psig_name + SIG_SIZE; The original commit doesn't give any specific reasoning, but I assume it was for a miniscule startup efficiency boost. It worked on the assumption that both PL_psig_name and PL_psig_ptr were SIG_SIZE sized arrays of SV pointers and so could be handled with a single double-sized alloc(); but I want to break that assumption: allowing for example, the pointers in PL_psig_ptr to be declared atomic.
For various signal-related vars, change e.g.
PL_psig_ptr = (SV**)NULL;
to
PL_psig_ptr = NULL;
We don't need casts on NULL these days, and the casts will get in the
way of declaring some vars to be atomic types in the next few commits.
This allows C variable types to be declared atomic, so that they can be used in a lock-free thread-safe way. However, there are are currently issues with portability and C++, so it isn't enabled by default, nor via a Configure probe: it has to be explicitly enabled via -Accflags='-DPERL_USE_ATOMIC'. See the thread http://nntp.perl.org/group/perl.perl5.porters/270882 for details of the issue. For now it will be used in the next few commits to add optional atomicity to PL_sig_pending and similar to achieve the target of making 'valgrind --tool=helgrind' run cleanly. So even if the thread signalling issues can't yet be fixed in production, at least it can be confirmed that the fix is conceptually correct, and that there are no other issues being masked by all the noise from the failing threads signal code.
Declare PL_sig_pending, PL_psig_pend, PL_psig_ptr as PERL_ATOMIC()
types. Note that unless perl is explicitly built with
-Accflags='-DPERL_USE_ATOMIC'
this macro is currently a NOOP. So by default this commit makes no
change in behaviour. (This macro was added by the previous commit.)
This fixes possible race conditions where the PL_[p]sig_foo variables
are being checked and/or updated by multiple threads. In particular, the
$thread->kill() method works by allowing the caller to update $thread's
PL_sig_foo state. The kill() XSUB locks the target thread's mutex, but
that isn't sufficient, as the target thread isn't also locking its mutex
every time it checks PL_sig_pending in PERL_ASYNC_CHECK().
In any case, normal mutex locking isn't appropriate here, as
PL_sig_pending is checked after every op is executed, so would slow
things down; and some of these variables are modified within a signal
handler, where locking would inappropriate.
The issues show up mainly in the dist/threads/t/kill*.t test files when
run under valgrind --tool=helgrind.
Add some comments at the top of this test file explaining what its purpose is. This is essentially just a cut+paste of the commit message which created the file.
Fix a threads race condition.
Perl_rcpv_free() does OP_REFCNT_LOCK then messes with a
reference-counted OP's reference count. A couple of asserts are done
just before the lock is acquired; this commit moves the locking earlier
so that in particular, the
assert(rcpv->refcount);
is done while the ref count is locked. This fixes a race condition
flagged by valgrind --tool=drd
(It's not a very significant race condition, as it could only be
triggered on DEBUGGING builds).
The other PERL_REENTRANT_foo() macros have this line:
CLANG_DIAG_IGNORE(-Wthread-safety)
Add it to PERL_REENTRANT_UNLOCK() too. Otherwise building perl under
clang with -Wthread-safety gives zillions of errors along the lines of:
warning: mutex 'PL_env_mutex.lock' is not held on every path through here.
This is because the static analysis done by -Wthread-safety isn't smart
enough to realise that a recursive lock won't actually be
locked/unlocked when count > 1.
It's not clear to me why this wasn't needed in the UNLOCK macro when the
macros were first created.
|
force-pushed a windows portability fix in commit "add workaround for helgrind false +ve" |
|
Looks like it's deadlocking on MacOS in threads-shared, and somewhere in Cygwin. Here's the run and tracebacks on MacOS, could the panic be the problem?: |
|
Cygwin also blocked on threads-shared/t/err.t in the same way: Cygwin backtraces below, looks like this one also has a panic: Thread 5 is a system thread from attaching the debugger. Thread 2 looks like some cygwin signal delivery thread. |
|
It might be worth pushing a smoke-me branch to get some coverage on the BSDs and Solaris. (there's also https://github.com/vmactions/ but these are slow) |
|
Is this p.r. something we're going to try to get in to perl-5.44? (It's large enough that, IMO, it should be deferred to the 5.45 dev cycle.) |
It's already marked defer-next-dev. |
|
|
||
| /* Allocate thread structure in context of the main thread's interpreter */ | ||
| { | ||
| MUTEX_LOCK(&my_pool->create_destruct_mutex); |
There was a problem hiding this comment.
Why do these two lines need to be locked?
There was a problem hiding this comment.
Looks they don't need to be - this is vestigal behaviour from when the function itself was invoked while locked.
| # define PERL_ATOMIC(atype) _Atomic(atype) | ||
| # endif | ||
| # endif | ||
| #endif |
There was a problem hiding this comment.
Can't we just use atomic_int instead? AFAIK that should be portable between C and C++. Or do we really need specific atomic types?
There was a problem hiding this comment.
PL_sig_ptr has atomic SV* pointers, and there doesn't seem to be a built-in atomic pointer type?
There was a problem hiding this comment.
Yeah, that is a bit unfortunate.
This test checks that a warning is issued when two threads call cond_wait() for the same condition var, but with different locks. The problem is that shared.xs then calls the underlying OS's cond_wait() functions with similarly bad values, triggering undefined behaviour. On some platforms this caused pthreads_cond_wait() to return an error code, triggering a panic and deadlock. So just remove the test.
After some refactoring, there was a vestigial piece of locking around a PerlMemShared_malloc() call in S_ithread_create(). Remove it, since its unnecessary now.
A recent commit of mine made threads set their context to NULL when in the late stages of being freed - previously they were set briefly to the main thread's context. In theory at that point the context shouldn't be used at all; if accessed, it likely indicates a bug. I decided that dereffing a null pointer was more likely to obviously show up a fault than randomly accessing a different thread's context at this point. This worked out on platforms where PERL_SET_CONTEXT() is implemented directly; on platforms where it falls back to Perl_set_context(), it was assert-failing due to the NULL argument.
The previous commit fixed the general Perl_set_context() function; it turns out win32 has its own version of that function: fix that too.
esabol
left a comment
There was a problem hiding this comment.
Please fix the typo in the comment added to perlvars.h.
| #endif | ||
|
|
||
| #ifdef USE_POSIX_2008_LOCALE | ||
| /* This variable is is initialised just once, at process start-up time, to |
There was a problem hiding this comment.
Typo! is is should just be is.
This branch of nearly 50 commits fixes many thread-related race conditions and similar, and collectively make all the threads and threads::shared tests pass under valgrind's helgrind and drd tools.
These commits are approximately sorted into three batches: 1) fix up the threads distro; 2) fix up threads::shared; 3) add atomic stuff to make threads and signal handlers work together.
Some commits are actual fixes; others are refactoring in preparation for a fix; some are adding comments etc; some are fixing other (non-race related) issues I spotted along the way.
Given the large number of commits, potential reviewers might want to announce in advance that they'll review only a specific subset of the commits; or just proofread; or whatever.