This repository was archived by the owner on Jul 28, 2021. It is now read-only.

Intermittent DNS failures when running Alpine containers in user-defined docker-compose network #303

@Iristyle

This is a cross-post from moby/libnetwork#2371 as I don't know where the bug lies.

In my environment, I can minimally reproduce DNS resolution failures with the following compose file when running LCOW (Linux Containers on Windows).

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup bar.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "while true; do nslookup foo.internal && sleep 1s; done"
    networks:
      default:
        aliases:
         - bar.internal

Running docker-compose up yields output like the following. Note the failures such as bar_1 | nslookup: can't resolve 'foo.internal': Name does not resolve and foo_1 | nslookup: can't resolve 'bar.internal': Name does not resolve, mixed in with successful resolutions:

PS C:\source\alpine-test> docker-compose -f .\docker-compose-bad.yml up
Creating network "alpine-test_default" with the default driver
Creating alpine-test_bar_1 ... done
Creating alpine-test_foo_1 ... done
Attaching to alpine-test_foo_1, alpine-test_bar_1
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | nslookup: can't resolve 'foo.internal': Name does not resolve
bar_1  |
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  |
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
foo_1  |
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25 alpine-test_foo_1.alpine-test_default
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | nslookup: can't resolve 'bar.internal': Name does not resolve
bar_1  |
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
foo_1  |
foo_1  | nslookup: can't resolve '(null)': Name does not resolve
foo_1  | Name:      bar.internal
foo_1  | Address 1: 172.18.76.19 alpine-test_bar_1.alpine-test_default
bar_1  |
bar_1  | Name:      foo.internal
bar_1  | Address 1: 172.18.67.25
bar_1  | nslookup: can't resolve '(null)': Name does not resolve
Gracefully stopping... (press Ctrl+C again to force)

I can run this compose stack on OSX and it does not fail. If I switch the containers from Alpine to Ubuntu, the resolutions don't fail either.
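
To put a number on how often the lookups fail, the compose output above could be tallied per service. The following is only a rough sketch: the function name tally_lookups and the regular expressions are my own assumptions based on the log lines shown in this issue, not part of docker-compose or any other tool. It treats a "Name:" line as a successful resolution and ignores the '(null)' message.

```python
import re
from collections import Counter

# Hypothetical patterns, derived only from the log lines quoted in this issue.
# A compose log line looks like: "foo_1  | Name:      bar.internal"
LINE = re.compile(r"^(?P<svc>\S+)\s+\|\s?(?P<msg>.*)$")
FAIL = re.compile(r"nslookup: can't resolve '(?P<name>[^']+)': Name does not resolve")
OK = re.compile(r"^Name:\s+(?P<name>\S+)")


def tally_lookups(log_text):
    """Count resolved and failed lookups per service in docker-compose output.

    Returns a Counter keyed by (service, outcome), where outcome is
    "resolved" or "failed". Failures for '(null)' are skipped.
    """
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue  # not a container log line (e.g. "Creating network ...")
        svc = match.group("svc")
        msg = match.group("msg").strip()
        fail = FAIL.search(msg)
        if fail:
            if fail.group("name") != "(null)":
                counts[(svc, "failed")] += 1
            continue
        if OK.match(msg):
            counts[(svc, "resolved")] += 1
    return counts
```

Saving the docker-compose up output to a file and passing its contents to tally_lookups gives a failure count per container, which makes it easier to compare runs (for example, Alpine against Ubuntu images).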

I can at least work around the problem somewhat by modifying the compose file to first run dig against the other host, like this:

version: '3'

services:
  foo:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig bar.internal; while true; do nslookup bar.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - foo.internal

  bar:
    image: alpine:latest
    dns_search: internal
    entrypoint: sh -c "apk add bind-tools; dig foo.internal; while true; do nslookup foo.internal; sleep 2s; done"
    networks:
      default:
        aliases:
         - bar.internal

The nslookup: can't resolve '(null)': Name does not resolve message in the original case is reported to be unnecessary per gliderlabs/docker-alpine#476 (comment). After performing a dig, however, that message changes and resolutions look like:

bar_1  | Server:                172.25.128.1
bar_1  | Address:       172.25.128.1#53
bar_1  |
bar_1  | Non-authoritative answer:
bar_1  | Name:  foo.internal
bar_1  | Address: 172.25.139.149
bar_1  |

My host is as follows (docker info output):

Client:
 Debug Mode: false
 Plugins:
  app: Docker Application (Docker Inc., v0.8.0-beta2)
  buildx: Build with BuildKit (Docker Inc., v0.2.0-6-g509c4b6-tp)

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 138
 Server Version: master-dockerproject-2019-04-28
 Storage Driver: windowsfilter (windows) lcow (linux)
  Windows:
  LCOW:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: hyperv
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows 10 Enterprise Version 1809 (OS Build 17763.437)
 OSType: windows
 Architecture: x86_64
 CPUs: 2
 Total Memory: 16GiB
 Name: ci-lcow-prod-1
 ID: 0ac02c9d-aaba-42f4-8749-5a64af3068d8
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The LCOW image is built from linuxkit/lcow@d5dfdbc. I tried the latest merged PR, but it didn't launch containers and I had to revert (more info in linuxkit/lcow#45 (comment)).

There are some further details in the original issue I filed at moby/libnetwork#2371.
