Skip to content

fix: Fix deathlock during statically registering services#4657

Draft
pavel-jares-bcm wants to merge 1 commit into
v3.x.xfrom
reboot/eureka-registry-deathlock-fix
Draft

fix: Fix deathlock during statically registering services#4657
pavel-jares-bcm wants to merge 1 commit into
v3.x.xfrom
reboot/eureka-registry-deathlock-fix

Conversation

@pavel-jares-bcm
Copy link
Copy Markdown
Contributor

Description

When service is statically registered (since #4051) there is one additional lock in the class ApimlInstanceRegistry (see the synchronization removal). In case of multiple threads it could lead to service freezing on start-up during reading the static definition file.

Lock order of registerStatically:

  • org.zowe.apiml.discovery.ApimlInstanceRegistry#registerStatically : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.lock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : leave synchronized block, but it is still locked
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.unlock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : unlock synchronized block

Lock order of register:

  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.lock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : unlock synchronized block
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.unlock();

Deathlock:
thead1 (register): read.lock()
thead2 (registerStatically): synchronized (lock)
thread1 (register): is waiting for synchronized block to be available (to be done by thread2)
thread2 (registerStatically): is waiting for read.unlock() (to be issued by thread1)

Note: locks are used also by other methods but it should be the issue.

The source is the REST call from any south-bound service during the initialization of static services. The risk is higher when the static definition file or amount of service is bigger.

The additional lock (see the synchronized (lock) in registerStatically) was added because the original method register shouldn't change the property expectedNumberOfClientsSendingRenews. Using the same lock should prevent exclusivity to update the field. It is not correct solution because of orders of locks.

The new solution store information about the change (correction - the oposite operation) in a ThreadLocal, so the locks are not necessary now.

The similar issue was in cancel method as well. But it is not big deal as this method is used very rarelly.

Other changes in the PR becomes from analysis that using reflection is not needed at this version of Eureka (there were very probably changes in the overriden code). We can call the whole method now and it is not necessary to reimplement it via calling private methods via reflection.

Linked to # (issue)
Part of the # (epic)

Type of change

Please delete options that are not relevant.

  • fix: Bug fix (non-breaking change which fixes an issue)
  • feat: New feature (non-breaking change which adds functionality)
  • docs: Change in a documentation
  • refactor: Refactor the code
  • chore: Chore, repository cleanup, updates the dependencies.
  • BREAKING CHANGE or !: Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines of this project
  • PR title conforms to commit message guideline ## Commit Message Structure Guideline
  • I have commented my code, particularly in hard-to-understand areas. In JS I did provide JSDoc
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • The java tests in the area I was working on leverage @nested annotations
  • Any dependent changes have been merged and published in downstream modules

For more details about how should the code look like read the Contributing guideline

Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Development

Successfully merging this pull request may close these issues.

1 participant