2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ Today, Micromize attaches eBPF programs to LSM hooks and enforces:
- **Strict container boundaries** — blocks filesystem escapes and host access
- **Capability restriction** — prevents privilege escalation via `unshare`/`clone`/`setns`
- **Ptrace blocking** — eliminates ptrace-based debugging/injection attacks
- **Socket restriction** — blocks `AF_ALG` (kernel crypto userspace API) socket usage in containers, mitigating CVE-2026-31431 and related attack surface
- **Socket restriction** — blocks dangerous socket address families (AF_ALG, AF_VSOCK, AF_PACKET, AF_TIPC, AF_RDS, AF_SMC, AF_CAN, AF_NFC, AF_BLUETOOTH, …) and AF_NETLINK protocols used by container-escape chains (NETLINK_NETFILTER for nf_tables LPEs, NETLINK_XFRM/AUDIT/KOBJECT_UEVENT), mitigating CVE-2026-31431, CVE-2024-50264, and the nf_tables LPE family (CVE-2022-32250 / CVE-2024-1086 / CVE-2024-26925 / …).
- **Execution integrity** — SBOM + runtime hash validation via `bpf_ima_file_hash`

Policies are loaded before container start and enforced at execution time. No runtime replacement. No learning mode. Kernel-native enforcement.
6 changes: 6 additions & 0 deletions cmd/micromize/root_test.go
@@ -40,6 +40,12 @@ func TestBuildDisabledSet(t *testing.T) {
wantDisabled: []string{"ptrace-restrict", "cap-restrict"},
wantEnabled: []string{"fs-restrict"},
},
{
name: "socket restrict can be disabled alongside others",
disableGadgets: "socket-restrict,cap-restrict",
wantDisabled: []string{"socket-restrict", "cap-restrict"},
wantEnabled: []string{"fs-restrict", "ptrace-restrict", "binary-attestation"},
},
{
name: "whitespace around names is trimmed",
disableGadgets: " ptrace-restrict , cap-restrict ",
61 changes: 49 additions & 12 deletions gadgets/socket-restrict/README.md
@@ -1,24 +1,61 @@
# socket-restrict

Restrict dangerous socket primitives in containers.
Restrict dangerous socket address families and high-risk `AF_NETLINK`
protocols in containers.

This gadget blocks all `AF_ALG` (kernel crypto userspace API) socket usage
inside containers. `AF_ALG` is rarely needed in containerized production
workloads — most TLS, SSH, and dm-crypt use cases never touch it — and
blocking it eliminates a class of kernel attack surface from the container
boundary.
This gadget started as an `AF_ALG` hardening control and now applies a baked-in
deny-list for socket families that are rarely needed in cloud-native workloads
but repeatedly show up in container-escape and local-privilege-escalation
chains. In addition to `AF_ALG`, it blocks families such as `AF_VSOCK`,
`AF_PACKET`, `AF_TIPC`, `AF_RDS`, `AF_SMC`, `AF_CAN`, `AF_NFC`,
`AF_BLUETOOTH`, `AF_AX25`, `AF_ATMPVC`, `AF_ATMSVC`, `AF_X25`, `AF_KCM`, and
`AF_CAIF`.

The initial motivation is CVE-2026-31431 (Copy Fail), a Linux kernel local
privilege escalation in `algif_aead` that can be triggered via `AF_ALG`
sockets. This gadget blocks the entire killchain at socket creation time,
before any vulnerable kernel path is reached.
The initial motivation is still CVE-2026-31431 (Copy Fail), a Linux kernel
local privilege escalation in `algif_aead` that can be triggered via `AF_ALG`
sockets. `AF_ALG` is rarely needed in containerized production workloads —
most TLS, SSH, and dm-crypt use cases never touch it — and blocking it
eliminates that kernel attack surface before any vulnerable path is reached.

The scope now also covers `AF_VSOCK` (for example CVE-2024-50264) and selected
`AF_NETLINK` protocols that repeatedly appear in `nf_tables` LPE chains,
including CVE-2022-32250, CVE-2022-34918, CVE-2023-32233, CVE-2024-1086,
CVE-2024-26925, CVE-2024-26581, and CVE-2024-26809.

Allowed by default: `AF_INET`, `AF_INET6`, `AF_UNIX`, and `AF_NETLINK`
protocols such as `NETLINK_ROUTE`, `NETLINK_GENERIC`, and
`NETLINK_SOCK_DIAG`.

## Default deny-list

| Scope | Value | Rationale |
|---|---|---|
| Family | `AF_ALG` (38) | Kernel crypto userspace API; preserves the original CVE-2026-31431 mitigation |
| Family | `AF_VSOCK` (40) | Virtio-vsock attack surface; see CVE-2024-50264 |
| Family | `AF_TIPC` (30) | Rare in containers; unnecessary kernel messaging surface |
| Family | `AF_RDS` (21) | Rare in containers; unnecessary kernel messaging surface |
| Family | `AF_SMC` (43) | Rare in containers; unnecessary specialized transport surface |
| Family | `AF_CAN` (29) | Rare in containers; unnecessary CAN bus surface |
| Family | `AF_NFC` (39) | Rare in containers; unnecessary NFC surface |
| Family | `AF_BLUETOOTH` (31) | Rare in containers; unnecessary Bluetooth surface |
| Family | `AF_AX25` (3) | Rare in containers; unnecessary amateur radio surface |
| Family | `AF_ATMPVC` (8) | Rare in containers; unnecessary ATM surface |
| Family | `AF_ATMSVC` (20) | Rare in containers; unnecessary ATM surface |
| Family | `AF_X25` (9) | Rare in containers; unnecessary X.25 surface |
| Family | `AF_KCM` (41) | Rare in containers; unnecessary KCM surface |
| Family | `AF_CAIF` (37) | Rare in containers; unnecessary CAIF surface |
| Family | `AF_PACKET` (17) | Raw link-layer socket surface rarely needed in application containers |
| `AF_NETLINK` protocol | `NETLINK_NETFILTER` (12) | Blocks the main `nf_tables` LPE family used by container-escape chains |
| `AF_NETLINK` protocol | `NETLINK_XFRM` (6) | Unnecessary IPsec/XFRM control plane in typical containers |
| `AF_NETLINK` protocol | `NETLINK_AUDIT` (9) | Unnecessary audit control plane in typical containers |
| `AF_NETLINK` protocol | `NETLINK_KOBJECT_UEVENT` (15) | Unnecessary uevent channel in typical containers |

## Hooks

| Hook | Purpose |
|---|---|
| `lsm/socket_create` | Block `AF_ALG` socket creation (main choke point) |
| `lsm/socket_bind` | Defense-in-depth: block `AF_ALG` bind if a socket FD exists from before policy load. Preserves `alg_type`/`alg_name` for visibility. |
| `lsm/socket_create` | Block denied socket families and selected `AF_NETLINK` protocols at creation time |
| `lsm/socket_bind` | Defense-in-depth: block denied binds if a socket FD existed before policy load. Preserves `alg_type` / `alg_name` for `AF_ALG` visibility. |
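
The deny decision described by the two tables above can be mirrored in plain userspace C for quick sanity checks. This is only a minimal sketch: the `sketch_*` names are hypothetical, the numbers are copied from the deny-list table, and the lookup is table-driven here, whereas the gadget itself uses a `switch` inside the eBPF program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Family numbers copied from the "Default deny-list" table. */
static const uint16_t sketch_denied_families[] = {
    38, /* AF_ALG */
    40, /* AF_VSOCK */
    30, /* AF_TIPC */
    21, /* AF_RDS */
    43, /* AF_SMC */
    29, /* AF_CAN */
    39, /* AF_NFC */
    31, /* AF_BLUETOOTH */
    3,  /* AF_AX25 */
    8,  /* AF_ATMPVC */
    20, /* AF_ATMSVC */
    9,  /* AF_X25 */
    41, /* AF_KCM */
    37, /* AF_CAIF */
    17, /* AF_PACKET */
};

/* AF_NETLINK protocol numbers copied from the same table. */
static const uint32_t sketch_denied_netlink[] = {
    12, /* NETLINK_NETFILTER */
    6,  /* NETLINK_XFRM */
    9,  /* NETLINK_AUDIT */
    15, /* NETLINK_KOBJECT_UEVENT */
};

enum { SKETCH_AF_NETLINK = 16 };

#define SKETCH_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical userspace mirror of the gadget's deny decision. */
bool sketch_is_denied(uint16_t family, uint32_t protocol) {
    if (family == SKETCH_AF_NETLINK) {
        /* AF_NETLINK is denied only for the listed protocols. */
        for (size_t i = 0; i < SKETCH_COUNT(sketch_denied_netlink); i++) {
            if (sketch_denied_netlink[i] == protocol)
                return true;
        }
        return false;
    }
    /* Every other family is denied regardless of protocol. */
    for (size_t i = 0; i < SKETCH_COUNT(sketch_denied_families); i++) {
        if (sketch_denied_families[i] == family)
            return true;
    }
    return false;
}

/* Spot checks against the "Allowed by default" list. */
void sketch_self_check(void) {
    assert(sketch_is_denied(38, 0));  /* AF_ALG */
    assert(sketch_is_denied(16, 12)); /* AF_NETLINK + NETLINK_NETFILTER */
    assert(!sketch_is_denied(16, 0)); /* AF_NETLINK + NETLINK_ROUTE */
    assert(!sketch_is_denied(2, 0));  /* AF_INET */
    assert(!sketch_is_denied(1, 0));  /* AF_UNIX */
}
```

Calling `sketch_self_check()` from a small test harness is enough to catch a mistyped family number when the table is edited.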

## Getting Started

3 changes: 3 additions & 0 deletions gadgets/socket-restrict/gadget.yaml
@@ -18,6 +18,9 @@ datasources:
family:
annotations:
description: Socket address family
protocol:
annotations:
description: Socket protocol (e.g. NETLINK_NETFILTER for AF_NETLINK)
process:
annotations:
description: The process attempting a restricted socket operation
84 changes: 64 additions & 20 deletions gadgets/socket-restrict/program.bpf.c
@@ -5,6 +5,7 @@

#include <vmlinux.h>

#include <bpf/bpf_core_read.h>
#include <gadget/buffer.h>
#include <gadget/filter.h>
#include <gadget/macros.h>
@@ -16,17 +17,53 @@ GADGET_TRACER_MAP(events, 1024 * 256);

GADGET_TRACER(socket_restrict, events, event);

// Block AF_ALG socket creation — main choke point.
static __always_inline bool is_denied_family(__u16 family, __u32 protocol) {
switch (family) {
case AF_ALG:
case AF_VSOCK:
case AF_TIPC:
case AF_RDS:
case AF_SMC:
case AF_CAN:
case AF_NFC:
case AF_BLUETOOTH:
case AF_AX25:
case AF_ATMPVC:
case AF_ATMSVC:
case AF_X25:
case AF_KCM:
case AF_CAIF:
case AF_PACKET:
return true;
case AF_NETLINK:
switch (protocol) {
case NETLINK_NETFILTER:
case NETLINK_XFRM:
case NETLINK_AUDIT:
case NETLINK_KOBJECT_UEVENT:
return true;
default:
return false;
}
default:
return false;
}
}

// Block denied socket families and selected AF_NETLINK protocols.
SEC("lsm/socket_create")
int BPF_PROG(micromize_socket_create, int family, int type, int protocol,
int kern) {
(void)type;

if (kern)
return 0;

if (gadget_should_discard_data_current())
return 0;

if (family != AF_ALG)
__u32 socket_protocol = protocol >= 0 ? (__u32)protocol : 0;
if (!is_denied_family((__u16)family, socket_protocol))
return 0;

struct event *event;
@@ -39,8 +76,10 @@ int BPF_PROG(micromize_socket_create, int family, int type, int protocol,

gadget_process_populate(&event->process);
event->timestamp_raw = bpf_ktime_get_boot_ns();
event->event_type = EVENT_TYPE_SOCKET_AF_ALG_CREATE;
event->family = AF_ALG;
event->event_type = family == AF_ALG ? EVENT_TYPE_SOCKET_AF_ALG_CREATE
: EVENT_TYPE_SOCKET_FAMILY_DENIED_CREATE;
event->family = (__u32)family;
event->protocol = socket_protocol;
event->alg_type[0] = '\0';
event->alg_name[0] = '\0';

@@ -52,22 +91,23 @@ int BPF_PROG(micromize_socket_create, int family, int type, int protocol,
return 0;
}

// Defense-in-depth: block AF_ALG bind if a socket FD exists from before
// policy load. Preserves alg_type/alg_name for visibility.
// Defense-in-depth: block denied socket binds if a socket FD exists from before
// policy load. Preserve AF_ALG alg_type/alg_name for visibility.
SEC("lsm/socket_bind")
int BPF_PROG(micromize_socket_bind, struct socket *sock,
struct sockaddr *address, int addrlen) {
(void)sock;

if (gadget_should_discard_data_current())
return 0;

if (!address || addrlen < SOCKADDR_ALG_TYPE_END)
if (!address || addrlen < sizeof(__u16))
return 0;

__u16 family = 0;
bpf_probe_read_kernel(&family, sizeof(family), address);
if (family != AF_ALG)

struct sock *sk = BPF_CORE_READ(sock, sk);
__u32 protocol = sk ? (__u32)BPF_CORE_READ_BITFIELD_PROBED(sk, sk_protocol) : 0;
if (!is_denied_family(family, protocol))
return 0;

struct event *event;
@@ -80,19 +120,23 @@ int BPF_PROG(micromize_socket_bind, struct socket *sock,

gadget_process_populate(&event->process);
event->timestamp_raw = bpf_ktime_get_boot_ns();
event->event_type = EVENT_TYPE_SOCKET_AF_ALG_BIND;
event->event_type = family == AF_ALG ? EVENT_TYPE_SOCKET_AF_ALG_BIND
: EVENT_TYPE_SOCKET_FAMILY_DENIED_BIND;
event->family = family;
event->protocol = protocol;
event->alg_type[0] = '\0';
event->alg_name[0] = '\0';

bpf_probe_read_kernel(event->alg_type, SOCKADDR_ALG_TYPE_LEN,
(const char *)address + SOCKADDR_ALG_TYPE_OFFSET);
event->alg_type[SOCKADDR_ALG_TYPE_LEN] = '\0';
if (family == AF_ALG && addrlen >= SOCKADDR_ALG_TYPE_END) {
bpf_probe_read_kernel(event->alg_type, SOCKADDR_ALG_TYPE_LEN,
(const char *)address + SOCKADDR_ALG_TYPE_OFFSET);
event->alg_type[SOCKADDR_ALG_TYPE_LEN] = '\0';

if (addrlen >= SOCKADDR_ALG_MIN_LEN) {
bpf_probe_read_kernel(event->alg_name, SOCKADDR_ALG_NAME_LEN,
(const char *)address + SOCKADDR_ALG_NAME_OFFSET);
event->alg_name[SOCKADDR_ALG_NAME_LEN - 1] = '\0';
} else {
event->alg_name[0] = '\0';
if (addrlen >= SOCKADDR_ALG_MIN_LEN) {
bpf_probe_read_kernel(event->alg_name, SOCKADDR_ALG_NAME_LEN,
(const char *)address + SOCKADDR_ALG_NAME_OFFSET);
event->alg_name[SOCKADDR_ALG_NAME_LEN - 1] = '\0';
}
}

gadget_submit_buf(ctx, &events, event, sizeof(*event));
77 changes: 77 additions & 0 deletions gadgets/socket-restrict/program.bpf.h
@@ -8,10 +8,86 @@
#define EPERM 1
#endif

#ifndef AF_AX25
#define AF_AX25 3
#endif

#ifndef AF_ATMPVC
#define AF_ATMPVC 8
#endif

#ifndef AF_X25
#define AF_X25 9
#endif

#ifndef AF_NETLINK
#define AF_NETLINK 16
#endif

#ifndef AF_PACKET
#define AF_PACKET 17
#endif

#ifndef AF_ATMSVC
#define AF_ATMSVC 20
#endif

#ifndef AF_RDS
#define AF_RDS 21
#endif

#ifndef AF_CAN
#define AF_CAN 29
#endif

#ifndef AF_TIPC
#define AF_TIPC 30
#endif

#ifndef AF_BLUETOOTH
#define AF_BLUETOOTH 31
#endif

#ifndef AF_CAIF
#define AF_CAIF 37
#endif

#ifndef AF_ALG
#define AF_ALG 38
#endif

#ifndef AF_NFC
#define AF_NFC 39
#endif

#ifndef AF_VSOCK
#define AF_VSOCK 40
#endif

#ifndef AF_KCM
#define AF_KCM 41
#endif

#ifndef AF_SMC
#define AF_SMC 43
#endif

#ifndef NETLINK_XFRM
#define NETLINK_XFRM 6
#endif

#ifndef NETLINK_AUDIT
#define NETLINK_AUDIT 9
#endif

#ifndef NETLINK_NETFILTER
#define NETLINK_NETFILTER 12
#endif

#ifndef NETLINK_KOBJECT_UEVENT
#define NETLINK_KOBJECT_UEVENT 15
#endif

#define SOCKADDR_ALG_TYPE_OFFSET 2
#define SOCKADDR_ALG_TYPE_LEN 14
#define SOCKADDR_ALG_TYPE_END (SOCKADDR_ALG_TYPE_OFFSET + SOCKADDR_ALG_TYPE_LEN)
@@ -27,6 +103,7 @@ struct event {
struct gadget_process process;
__u32 event_type;
__u32 family;
__u32 protocol;
char alg_type[EVENT_ALG_TYPE_LEN];
char alg_name[SOCKADDR_ALG_NAME_LEN];
};
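
The `SOCKADDR_ALG_TYPE_*` offsets in this header, and the length check the bind hook applies before copying `alg_type`, can be checked in userspace. The sketch below is illustrative only: `struct sketch_sockaddr_alg` is a hand-written mirror of `struct sockaddr_alg` from `<linux/if_alg.h>`, and `SKETCH_ALG_TYPE_BUF` and `sketch_copy_alg_type` are hypothetical names. The buffer size is an assumption chosen so that the terminator written at index `SOCKADDR_ALG_TYPE_LEN` fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values shown in program.bpf.h. */
#define SOCKADDR_ALG_TYPE_OFFSET 2
#define SOCKADDR_ALG_TYPE_LEN 14
#define SOCKADDR_ALG_TYPE_END (SOCKADDR_ALG_TYPE_OFFSET + SOCKADDR_ALG_TYPE_LEN)

/* Assumption: one extra byte for the NUL written at index TYPE_LEN. */
#define SKETCH_ALG_TYPE_BUF (SOCKADDR_ALG_TYPE_LEN + 1)

/* Hand-written mirror of struct sockaddr_alg from <linux/if_alg.h>. */
struct sketch_sockaddr_alg {
    uint16_t salg_family;
    uint8_t salg_type[14];
    uint32_t salg_feat;
    uint32_t salg_mask;
    uint8_t salg_name[64];
};

/* Compile-time checks that the macros match the struct layout. */
_Static_assert(offsetof(struct sketch_sockaddr_alg, salg_type) ==
                   SOCKADDR_ALG_TYPE_OFFSET,
               "salg_type offset mismatch");
_Static_assert(sizeof(((struct sketch_sockaddr_alg *)0)->salg_type) ==
                   SOCKADDR_ALG_TYPE_LEN,
               "salg_type length mismatch");

/*
 * Copy the algorithm type out of a raw sockaddr buffer.
 * Returns 0 on success, or -1 (with out set to "") when the address is
 * missing or too short to contain the whole salg_type field.
 */
int sketch_copy_alg_type(const unsigned char *addr, int addrlen,
                         char out[SKETCH_ALG_TYPE_BUF]) {
    out[0] = '\0';
    if (!addr || addrlen < SOCKADDR_ALG_TYPE_END)
        return -1;
    memcpy(out, addr + SOCKADDR_ALG_TYPE_OFFSET, SOCKADDR_ALG_TYPE_LEN);
    out[SOCKADDR_ALG_TYPE_LEN] = '\0'; /* terminate even if the field is full */
    return 0;
}
```

Clearing `out` before the length check means a short address always yields an empty string rather than leftover bytes, which matches the hook's habit of zeroing `alg_type[0]` before any conditional read.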
2 changes: 2 additions & 0 deletions include/micromize/event_types.h
@@ -29,6 +29,8 @@ enum micromize_event_type {
// socket-restrict
EVENT_TYPE_SOCKET_AF_ALG_CREATE = 11,
EVENT_TYPE_SOCKET_AF_ALG_BIND = 12,
EVENT_TYPE_SOCKET_FAMILY_DENIED_CREATE = 14,
EVENT_TYPE_SOCKET_FAMILY_DENIED_BIND = 15,
};

#endif /* __MICROMIZE_EVENT_TYPES_H */
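
As a rough illustration of how the new event types are selected and named, the sketch below pairs the numeric values from this header with the names that `internal/operators/operators.go` maps them to in this PR. The `sketch_*` and `SKETCH_*` identifiers are hypothetical, and the `"unknown"` fallback is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Numeric values shown in include/micromize/event_types.h. */
enum sketch_event_type {
    SKETCH_SOCKET_AF_ALG_CREATE = 11,
    SKETCH_SOCKET_AF_ALG_BIND = 12,
    SKETCH_SOCKET_FAMILY_DENIED_CREATE = 14,
    SKETCH_SOCKET_FAMILY_DENIED_BIND = 15,
};

enum { SKETCH_AF_ALG = 38 };

/* AF_ALG keeps its dedicated event types; other denied families share one. */
uint32_t sketch_create_event_type(int family) {
    return family == SKETCH_AF_ALG ? SKETCH_SOCKET_AF_ALG_CREATE
                                   : SKETCH_SOCKET_FAMILY_DENIED_CREATE;
}

uint32_t sketch_bind_event_type(int family) {
    return family == SKETCH_AF_ALG ? SKETCH_SOCKET_AF_ALG_BIND
                                   : SKETCH_SOCKET_FAMILY_DENIED_BIND;
}

/* Names as listed in the Go eventTypeNames map. */
const char *sketch_event_type_name(uint32_t event_type) {
    switch (event_type) {
    case SKETCH_SOCKET_AF_ALG_CREATE:
        return "af_alg_socket_create";
    case SKETCH_SOCKET_AF_ALG_BIND:
        return "af_alg_socket_bind";
    case SKETCH_SOCKET_FAMILY_DENIED_CREATE:
        return "socket_family_denied_create";
    case SKETCH_SOCKET_FAMILY_DENIED_BIND:
        return "socket_family_denied_bind";
    default:
        return "unknown"; /* assumption: fallback for unmapped values */
    }
}
```

Keeping `AF_ALG` on its original event types means existing consumers of `af_alg_socket_create` and `af_alg_socket_bind` should see no change, while the other denied families are reported under the two new names.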
19 changes: 19 additions & 0 deletions internal/gadget/registry_test.go
@@ -60,6 +60,25 @@ func TestRegistry_Register(t *testing.T) {
}
}

func TestRegistry_RegisterAllDefaultGadgets(t *testing.T) {
r := NewRegistry(&mockContextCreator{}, &mockRuntimeManager{})
gadgets := []string{"fs-restrict", "cap-restrict", "ptrace-restrict", "socket-restrict", "binary-attestation"}

for _, name := range gadgets {
r.Register(name, &GadgetConfig{ImageName: name + "-image"})
}

if len(r.gadgets) != len(gadgets) {
t.Fatalf("expected %d gadgets, got %d", len(gadgets), len(r.gadgets))
}

for _, name := range gadgets {
if _, ok := r.gadgets[name]; !ok {
t.Errorf("expected gadget %q to be registered", name)
}
}
}

func TestRegistry_RunAll(t *testing.T) {
done := make(chan struct{})
var once sync.Once
4 changes: 4 additions & 0 deletions internal/operators/operators.go
@@ -310,6 +310,8 @@ const (
eventTypeSocketAFAlgCreate = 11
eventTypeSocketAFAlgBind = 12
eventTypeCapModuleAutoload = 13
eventTypeSocketFamilyDeniedCreate = 14
eventTypeSocketFamilyDeniedBind = 15
)

var eventTypeNames = map[uint32]string{
@@ -327,6 +329,8 @@ var eventTypeNames = map[uint32]string{
eventTypeSocketAFAlgCreate: "af_alg_socket_create",
eventTypeSocketAFAlgBind: "af_alg_socket_bind",
eventTypeCapModuleAutoload: "module_autoload",
eventTypeSocketFamilyDeniedCreate: "socket_family_denied_create",
eventTypeSocketFamilyDeniedBind: "socket_family_denied_bind",
}

// NewEventTypeOperator creates an operator that enriches events with a