Enable race detector for CI by typeless · Pull Request #1441 · go-gitea/gitea

typeless · 2017-04-04T08:20:20Z

tboerger · 2017-04-04T08:22:02Z

I was thinking more of a flag like ENABLE_RACE or something like that.

typeless · 2017-04-04T08:33:25Z

@tboerger I am okay with either way. We can have the final decision discussed in #1430. I'll change the PRs accordingly.

strk · 2017-04-04T08:37:33Z

LGTM

typeless · 2017-04-04T08:43:09Z

I have checked 'Allow edits from maintainers', you know 😄
Otherwise, we need a follow-up PR to update the signature.

Edit: maybe you guys with the maintainer key could upload a new PR with the updated signature. This change is only a one-liner, so that would probably be easier for you.

appleboy · 2017-04-11T13:32:22Z

LGTM

strk · 2017-04-18T20:44:59Z

@appleboy can you help with the signature? @tboerger mentioned you as his preferred successor, as he's stepping down (mentioned on Gitter).

strk · 2017-04-19T07:26:31Z

It is the coverage step that fails, due to:

Unable to authenticate user. Error fetching user. GET https://api.github.com/user: 401 Bad credentials []

Is that what the signature is for?
Was there a plan to drop the signature requirement, @appleboy?

appleboy · 2017-04-19T07:47:41Z

@strk I don't have permission to update the sig file. @lunny can help with this.

strk · 2017-04-30T05:44:33Z

The mysql check fails on drone with:

undefined reference to `__libc_malloc'
race_linux_amd64.sy

Are some linker flags missing, @typeless?

appleboy · 2017-04-30T06:31:56Z

@lunny We need to re-sign the drone config.

lunny · 2017-04-30T09:59:52Z

The build failed. What's this: collect2: error: ld returned 1 exit status

bkcsoft · 2017-05-03T01:16:39Z

Is the drone.sig updated? otherwise put status/blocked on this :sli

lunny · 2017-05-03T01:41:15Z

@bkcsoft yes, updated

lunny · 2017-05-25T02:24:39Z

@typeless any news?

typeless · 2017-05-25T02:49:29Z

@lunny It looks like glibc is absent. Alpine Linux uses musl libc, which is probably incompatible (or maybe it only supports static linking?)

Also I found this golang/go#9918.

lunny · 2017-05-25T03:00:15Z

So let's move it to v1.x.x

zeripath · 2021-07-21T21:35:36Z

Something is logging after or in-between tests again.

lunny · 2021-07-22T04:03:01Z

By design, testing.T should not be shared, but we shared it without a lock.

zeripath · 2021-07-22T10:15:02Z

> By design, testing.T should not be shared, but we shared it without a lock.

It's not that. This is a race forced by Go because t.Logf(...) has been called between tests.

testlogger works by writing our logging to t.Logf(...) instead of to os.Stdout. This has the benefit that we only see logging from failed tests and that we have clear delineation of what logs relate to which tests.

If you look carefully at the stacktrace you will see that the race is detected deep within Go's internal code. This is a deliberate race added by Go in order to detect logging to t outside of a test.

Logging to t outside of a test is considered a race because tests are supposed to be self-contained and not make changes that could affect other tests or continue work outside of those tests.

I have three suspicions for what is happening:

1. Queue Asynchronicity

Let's look at a previous incarnation of this problem (and perhaps one that has returned):

Test A involves pushing something to a Queue.
Test A finishes successfully.
But... the Queue hasn't finished processing the item pushed to it.
The Queue logs something.
The logger is a testlogger, which is a wrapper around the current t.Logf().
Test B hasn't started yet... so t is still the t of the previous, completed Test A.
Go's race detector detects this and declares a race in the testlogger. (This is the technique Go uses to detect it.)
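The sequence above can be illustrated with a minimal sketch of a logger wrapper that refuses to forward messages once its test has finished. This is not Gitea's testlogger: the names guardedLogger, Attach, Dropped and fakeT are hypothetical, and fakeT only stands in for *testing.T (which satisfies the same one-method interface) so that the example runs outside of go test.

```go
package main

import (
	"fmt"
	"sync"
)

// logfer is the one method of *testing.T that this sketch needs.
type logfer interface {
	Logf(format string, args ...any)
}

// guardedLogger is a hypothetical wrapper, not Gitea's testlogger.
// It forwards to the current test only while that test is attached.
type guardedLogger struct {
	mu      sync.Mutex
	current logfer // nil while no test is running
	dropped int    // messages that arrived between tests
}

// Attach makes t the log target and returns a function that detaches it.
func (g *guardedLogger) Attach(t logfer) (detach func()) {
	g.mu.Lock()
	g.current = t
	g.mu.Unlock()
	return func() {
		g.mu.Lock()
		g.current = nil
		g.mu.Unlock()
	}
}

// Logf forwards to the attached test, or counts the message as dropped.
func (g *guardedLogger) Logf(format string, args ...any) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.current == nil {
		g.dropped++
		return
	}
	g.current.Logf(format, args...)
}

// Dropped reports how many messages arrived while no test was attached.
func (g *guardedLogger) Dropped() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.dropped
}

// fakeT stands in for *testing.T so the sketch runs as a plain program.
type fakeT struct{ lines []string }

func (f *fakeT) Logf(format string, args ...any) {
	f.lines = append(f.lines, fmt.Sprintf(format, args...))
}

func main() {
	g := &guardedLogger{}
	testA := &fakeT{}

	detach := g.Attach(testA)
	g.Logf("pushed item %d to the queue", 1) // logged during Test A
	detach()                                 // Test A finishes

	g.Logf("queue finished item %d", 1) // late log, Test B not started yet

	fmt.Println(len(testA.lines), g.Dropped()) // prints: 1 1
}
```

In a real test the detach function would presumably be registered with t.Cleanup. Note that dropping late messages only hides the symptom: the asynchronous work still outlives the test.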

I previously attempted to handle this by flushing the queues, but this failed, and ultimately @lunny worked around it by adding the immediate queue, abandoning asynchronicity in queues during tests.

Now, the above goroutine trace mentions LevelDB, which makes me suspect that there is still a non-immediate disk/level queue in the tests, and that may be the cause of this. That could be because of my changes to the default queue settings, but it might always have been there, waiting for some timing change to make it appear.
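To make the difference between the two queue styles concrete, here is a minimal sketch. It does not reflect Gitea's queue package: Queue, immediateQueue, channelQueue and newChannelQueue are hypothetical names chosen for illustration. The immediate variant runs the handler inside Push, so nothing can still be pending when a test returns; the channel variant hands work to a goroutine and has to be drained explicitly.

```go
package main

import "fmt"

// Queue is a hypothetical minimal queue interface for this sketch.
type Queue interface {
	Push(item string)
}

// immediateQueue runs the handler synchronously inside Push.
type immediateQueue struct {
	handle func(string)
}

func (q immediateQueue) Push(item string) { q.handle(item) }

// channelQueue hands items to a worker goroutine.
type channelQueue struct {
	items chan string
	done  chan struct{}
}

func newChannelQueue(handle func(string)) *channelQueue {
	q := &channelQueue{items: make(chan string, 16), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for it := range q.items {
			handle(it)
		}
	}()
	return q
}

func (q *channelQueue) Push(item string) { q.items <- item }

// Close stops accepting items and blocks until the worker has drained them.
func (q *channelQueue) Close() {
	close(q.items)
	<-q.done
}

func main() {
	var handled []string
	record := func(s string) { handled = append(handled, s) }

	var iq Queue = immediateQueue{handle: record}
	iq.Push("a")
	fmt.Println(len(handled)) // handler already ran: prints 1

	cq := newChannelQueue(record)
	cq.Push("b")
	// Without Close, "b" may still be unprocessed when a test returns.
	cq.Close()
	fmt.Println(len(handled)) // prints 2
}
```

The trade-off is that tests using an immediate queue no longer exercise the asynchronous code path at all.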

2. Logger Asynchronicity

Fundamentally, the logger has to be asynchronous to be performant, and it uses logevent channels as its method of asynchronicity. The current testlogger, which we absolutely need in order to make our test logs interpretable at all (see above), is hooked in using such a channel. There could be a number of logs that are waiting to be processed when the test ends.

An additional log.Flush in the cleanup defer might help with this issue; otherwise we're looking at another immediate-type solution. Unfortunately, an immediate log solution here would be more involved than in the queue case.
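As a rough illustration of how a flush can work for a channel-based logger, the sketch below sends a marker through the same channel as the log events and waits for the worker to acknowledge it. This is an assumption about one possible design, not Gitea's log package: asyncLogger, event, Log and Flush are hypothetical names.

```go
package main

import "fmt"

// event is either a log message or, when flushed is non-nil, a flush marker.
type event struct {
	msg     string
	flushed chan struct{}
}

// asyncLogger is a hypothetical channel-based logger for this sketch.
type asyncLogger struct {
	events chan event
	sink   func(string)
}

func newAsyncLogger(sink func(string)) *asyncLogger {
	l := &asyncLogger{events: make(chan event, 64), sink: sink}
	go l.run()
	return l
}

// run is the single consumer; it handles events in the order they were sent.
func (l *asyncLogger) run() {
	for ev := range l.events {
		if ev.flushed != nil {
			close(ev.flushed) // everything queued before the marker is done
			continue
		}
		l.sink(ev.msg)
	}
}

// Log queues a message and returns without waiting for it to be written.
func (l *asyncLogger) Log(msg string) { l.events <- event{msg: msg} }

// Flush blocks until every event queued before this call has been handled.
func (l *asyncLogger) Flush() {
	ack := make(chan struct{})
	l.events <- event{flushed: ack}
	<-ack
}

func main() {
	var lines []string
	l := newAsyncLogger(func(s string) { lines = append(lines, s) })

	l.Log("one")
	l.Log("two")
	l.Flush() // in a test this would sit in the cleanup defer

	fmt.Println(len(lines)) // prints 2
}
```

A flush like this only covers messages queued before it was called. Anything logged afterwards by a goroutine that is still running would again arrive between tests.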

3. Other asynchronous behaviour

Gitea has other uncontrolled asynchronous activity that is external to Queues and could be the cause of this "race". Go's race detector is deeply unhelpful here: it doesn't tell us what was being logged, and the indirection used in the logger to make it asynchronous means that the stacktrace provided by Go's race detection doesn't tell us who actually called log. This makes debugging the issue difficult ...

Summary

Ultimately, we're going to need to add complete lifecycle (start/stop) control to every test (and by extension the whole of Gitea) to prevent random races like this in the future. That is, everything, including logging, would be shut down between tests. graceful gets us a long way toward this, but there are still quite a lot of things that would need to be done, e.g. (live) reconfiguration, proper service registration, and likely even dependency injection.

There's the other option of just throwing away inter-test logging or emitting it to the console, but that's just ignoring the issue.

6543 · 2021-08-10T00:42:39Z

EDIT: nope at least it had passed once https://drone.gitea.io/go-gitea/gitea/42750 🙈

Set RACE_ENABLED=0 to disable it when release

lunny · 2021-08-26T05:48:02Z

The CI failure seems unrelated.

lunny · 2021-08-26T08:49:26Z

make L-G-T-M work

tboerger added the lgtm/need 2 label Apr 4, 2017

tboerger added lgtm/need 1 and removed lgtm/need 2 labels Apr 4, 2017

lunny added this to the 1.2.0 milestone Apr 4, 2017

tboerger added lgtm/done and removed lgtm/need 1 labels Apr 11, 2017

appleboy approved these changes Apr 11, 2017

lunny force-pushed the enable-race-detector-in-ci branch from ab4224e to 57a07cd April 19, 2017 07:54

lunny force-pushed the enable-race-detector-in-ci branch from 57a07cd to e9915f9 April 30, 2017 06:24

lunny force-pushed the enable-race-detector-in-ci branch from e9915f9 to 34a00ff April 30, 2017 06:33

bkcsoft added and removed the status/blocked label May 3, 2017

lunny added the status/blocked label May 25, 2017

lunny added this to the 1.x.x milestone May 25, 2017

This was referenced Jul 19, 2021

Fix race in log #16490

Merged

Fix race in log (#16490) #16505

Merged

typeless and others added 10 commits August 25, 2021 23:55

Enable race detector by default

782b968

Set RACE_ENABLED=0 to disable it when release

Disable race detector for release builds

5b6bdaa

use true

dd4674b

fix

7694598

debug issue

f280790

fix

6118e40

verbose

54ddf05

clean

79b9228

Fix wrong merge

8a969a1

Merge branch 'main' into enable-race-detector-in-ci

e61f744

6543 approved these changes Aug 25, 2021

Merge branch 'main' into enable-race-detector-in-ci

fbd09bc

lunny added 2 commits August 26, 2021 15:43

Fix coverage merge

89e19ef

Merge branch 'enable-race-detector-in-ci' of github.com:typeless/gite…

c9eef66

…a into typeless-enable-race-detector-in-ci

This was referenced Aug 26, 2021

Prevent "Race" detected in TestAdmin*User #16830

Merged

Use git attributes to determine generated and vendored status for language stats and diffs #16773

Merged

6543 mentioned this pull request Aug 28, 2021

[CI] rm unit-test-race step since its now coverd by unit-test too #16856

Merged

This was referenced Sep 13, 2021

Update go-chi/session (fixes "race" in tests) #17031

Merged

Workaround coverage bug part 2 #16906

Merged

Prevent coverage break #16887

Merged

Remove the final test from the testlogger safely #16907

Merged

12 participants
