Skip to content

Enable race detector for CI#1441

Merged
lunny merged 13 commits intogo-gitea:mainfrom
typeless:enable-race-detector-in-ci
Aug 26, 2021
Merged

Enable race detector for CI#1441
lunny merged 13 commits intogo-gitea:mainfrom
typeless:enable-race-detector-in-ci

Conversation

@typeless
Copy link
Copy Markdown
Contributor

@typeless typeless commented Apr 4, 2017

close #1430

@tboerger
Copy link
Copy Markdown
Member

tboerger commented Apr 4, 2017

I thought more about a flag like ENABLE_RACE or something like that

@tboerger tboerger added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Apr 4, 2017
@typeless
Copy link
Copy Markdown
Contributor Author

typeless commented Apr 4, 2017

@tboerger I am okay with either way. We can have the final decision discussed in #1430. I'll change the PRs accordingly.

@strk
Copy link
Copy Markdown
Member

strk commented Apr 4, 2017

LGTM

@tboerger tboerger added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Apr 4, 2017
@typeless
Copy link
Copy Markdown
Contributor Author

typeless commented Apr 4, 2017

I have checked 'Allow edits from maintainers.'. You know 😄
Otherwise, we need a follow-up PR to update the signature.

Edit: maybe you guys with the maintainer key could upload a new PR with the updated signature. This change is only a one-liner, that would probably easier for you.

@lunny lunny added this to the 1.2.0 milestone Apr 4, 2017
@appleboy
Copy link
Copy Markdown
Member

LGTM

@tboerger tboerger added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Apr 11, 2017
@strk
Copy link
Copy Markdown
Member

strk commented Apr 18, 2017

@appleboy can you help with the signature ? @tboerger mentioned you as his preferred successor as he's stepping down (mentioned on Gitter)

@strk
Copy link
Copy Markdown
Member

strk commented Apr 19, 2017

It is the coverage step that fails, due to:

Unable to authenticate user. Error fetching user. GET https://api.github.com/user: 401 Bad credentials []

Is that what the signature is for ?
Was there a plan to drop the signature requirement @appleboy ?

@appleboy
Copy link
Copy Markdown
Member

@strk I don't have permission to update the sig file. @lunny can help this.

@lunny lunny force-pushed the enable-race-detector-in-ci branch from ab4224e to 57a07cd Compare April 19, 2017 07:54
@strk
Copy link
Copy Markdown
Member

strk commented Apr 30, 2017

The mysql check fails on drone with:

undefined reference to `__libc_malloc'
race_linux_amd64.sy

Some linker flags missing @typeless ?

@lunny lunny force-pushed the enable-race-detector-in-ci branch from 57a07cd to e9915f9 Compare April 30, 2017 06:24
@appleboy
Copy link
Copy Markdown
Member

@lunny Need to resign drone config.

@lunny lunny force-pushed the enable-race-detector-in-ci branch from e9915f9 to 34a00ff Compare April 30, 2017 06:33
@lunny
Copy link
Copy Markdown
Member

lunny commented Apr 30, 2017

build failed. what's this collect2: error: ld returned 1 exit status

@bkcsoft bkcsoft added status/blocked This PR cannot be merged yet, i.e. because it depends on another unmerged PR and removed status/blocked This PR cannot be merged yet, i.e. because it depends on another unmerged PR labels May 3, 2017
@bkcsoft
Copy link
Copy Markdown
Contributor

bkcsoft commented May 3, 2017

Is the drone.sig updated? otherwise put status/blocked on this :sli

@lunny
Copy link
Copy Markdown
Member

lunny commented May 3, 2017

@bkcsoft yes, updated

@lunny
Copy link
Copy Markdown
Member

lunny commented May 25, 2017

@typeless any news?

@typeless
Copy link
Copy Markdown
Contributor Author

typeless commented May 25, 2017

@lunny It looks that glibc is absent. Alpine Linux uses musl libc which is probably incompatible (or it only supports static linking?)

Also I found this golang/go#9918.

@lunny
Copy link
Copy Markdown
Member

lunny commented May 25, 2017

So let's move it to v1.x.x

@lunny lunny added the status/blocked This PR cannot be merged yet, i.e. because it depends on another unmerged PR label May 25, 2017
@lunny lunny added this to the 1.x.x milestone May 25, 2017
This was referenced Jul 19, 2021
@6543

This comment has been minimized.

@zeripath
Copy link
Copy Markdown
Contributor

Something is logging after or in-between tests again.

@lunny
Copy link
Copy Markdown
Member

lunny commented Jul 22, 2021

The testing.T should not be shared by design but we did with no lock.

@zeripath
Copy link
Copy Markdown
Contributor

zeripath commented Jul 22, 2021

The testing.T should not be shared by design but we did with no lock.

It's not that. This is a race forced by go because t.Logf(...) has been called between tests.

testlogger works by writing our logging to t.Logf(...) instead of to os.Stdout. This has the benefit that we only see logging from failed tests and that we have clear delineation of what logs relate to which tests.

If you look carefully at the stacktrace you will see that the race is detected deep within go internal code. This is a deliberate race added by Go in order to detect logging to t outside of a test.

Logging to t outside of a test is considered a race because tests are supposed to be self-contained and not make changes that could affect other tests or continue work outside of those tests.


I have three suspicions for what is happening:

1. Queue Asynchronicity

Let's look at a previous incarnation of this problem - (and perhaps one that has returned):

  1. Test A involves pushing something to a Queue
  2. Test A finishes successfully.
  3. But... the Queue hasn't finished processing the item pushed to it.
  4. The Queue logs something.
  5. The logger is a testlogger which is a wrapper around the current t.Logf()
  6. TestB hasn't started yet... so t is the previous completed Test A t.
  7. Go's race detector detects this - and declares a Race in the testlogger. (Which is the technique go uses to detect this.)

I previously attempted to handle this through Flushing the queues - but this failed - and ultimately @lunny worked around this by adding the immediate queue - abandoning asynchronicity in queues during tests.

Now the above goroutine trace mentions a level db which makes me suspicious that there is still a non-immediate disk/level queue in the tests which may be the cause of this. That could be because of my changes to the default queue settings but it might have always been there waiting for some timing change to appear.

2. Logger Asynchronicity

Fundamentally the logger has to be asynchronous to be performant and it uses logevent channels as its method of asynchronicity. The current testlogger - which we absolutely need to be able to make interpreting our test logs at all helpful (see above) - is hooked-in using such a channel. There could be a number of logs that that are waiting to be processed when the test ends.

An additional log.Flush in the cleanup defer might help this issue - otherwise we're looking at another immediate type solution. Unfortunately an immediate log solution here would be more involved than in the queue case.

3. Other asynchronous behaviour

Gitea has other uncontrolled asynchronous activity that is external to Queues and could be the cause of this "race". Go's race detector here is deeply unhelpful as it's not telling us what was trying to be logged and the indirection used in the logger to make it asynchronous means that the stacktrace provided by go's race detection isn't telling us who actually called log. This makes debugging the issue difficult ...

Summary

Ultimately we're going to need to add complete lifecycle (start/stop) control to every test (and by extension the whole of Gitea) to prevent random races like this in the future. That is - each test will shutdown everything between each test - including logging. graceful gets us a long way toward this but there are still quite a lot of things that would need to be done for this. E.g. (live) reconfiguration, proper service registration, likely even dependency injection.

There's the other option of just throwing away inter-test logging or emitting it to console but that's just ignoring the issue.

@6543
Copy link
Copy Markdown
Member

6543 commented Aug 10, 2021

EDIT: nope at least it had passed once https://drone.gitea.io/go-gitea/gitea/42750 🙈

@lunny
Copy link
Copy Markdown
Member

lunny commented Aug 26, 2021

CI failed seems not related.

@lunny
Copy link
Copy Markdown
Member

lunny commented Aug 26, 2021

make L-G-T-M work

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/testing type/upstream This is an issue in one of Gitea's dependencies and should be reported there

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make use of the race detector