Linux Kernel Sandboxing: Landlock & Secc…

Introduction: Why Kernel-Level Sandboxing Matters More Than Ever

The Linux kernel attack surface has gotten dramatically larger, and honestly, the numbers are sobering. In just the first 16 days of 2025, 134 new Linux kernel CVEs were published. By October 2025, CISA had added seven kernel vulnerabilities to its Known Exploited Vulnerabilities catalog — every single one actively weaponized against enterprise infrastructure. Ransomware groups like Qilin, Kraken, and RansomHub exploited kernel-level flaws to breach more than 700 organizations across 62 countries.

The takeaway? Perimeter defenses and system call monitoring alone just aren't enough anymore.

Modern Linux security demands defense-in-depth at the kernel level: restricting what processes can do even after an attacker gains code execution. This is where kernel sandboxing comes in. Landlock and seccomp-BPF, together with deliberate restrictions on io_uring, form the modern triad of process-level security controls that every Linux administrator and DevSecOps engineer needs to understand.

In this guide, I'll walk you through each mechanism — how they work, how to implement them, and how to layer them together for robust defense-in-depth. Whether you're hardening a production server, building a CI/CD pipeline, or securing containerized workloads, you'll come away with the knowledge and tools to implement kernel-level sandboxing today.

Understanding the Linux Sandboxing Stack

Before we dive into the individual mechanisms, it's worth stepping back to see how they all fit together in the broader Linux security architecture. Each one tackles a different layer of the problem:

Seccomp-BPF — Filters which system calls a process is allowed to make. It sits at the syscall entry point, deciding whether a call proceeds or gets blocked.
Landlock — Controls which filesystem paths and network ports a process can access. It works at the LSM (Linux Security Module) layer, enforcing fine-grained access control without needing root privileges.
Namespaces and cgroups — Isolate process views of system resources (PIDs, network, mount points) and limit resource consumption.
io_uring — An asynchronous I/O interface that bypasses traditional syscall paths, creating a security blind spot that you need to explicitly address.

The most effective sandboxing strategies combine these mechanisms. Seccomp restricts which operations a process can perform, Landlock restricts which resources it can access, and io_uring needs to be either disabled or carefully monitored to prevent bypass. Think of it as locks on different doors — each one independently valuable, but together they're far stronger.

Seccomp-BPF: System Call Filtering

How Seccomp Works

Seccomp (Secure Computing Mode) was originally introduced in Linux 2.6.12 as a strict mode that only allowed read(), write(), exit(), and sigreturn(). That's it — four syscalls and nothing else. In Linux 3.5, seccomp-BPF extended this with programmable Berkeley Packet Filter programs that can inspect and filter any system call based on its number and arguments.

Here are the key security properties that make seccomp-BPF so valuable:

Irreversible — Once a seccomp filter is applied, it can't be removed. Child processes inherit the filter, too.
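TOCTOU-resistant — BPF programs can't dereference pointers, so they can only evaluate the raw syscall argument values directly. This rules out time-of-check-time-of-use races in the filter itself. The trade-off is that a filter can't inspect pointed-to data such as a path string, which is where path-based controls like Landlock come in.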
Architecture-aware — Filters must verify the CPU architecture first, since syscall numbers vary across architectures (a subtle detail that bites people more often than you'd expect).
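The inheritance and architecture properties above are easy to check empirically. The following is a minimal sketch that assumes an x86-64 or AArch64 machine and uses raw BPF rather than libseccomp. The helper names are hypothetical. It installs an allow-by-default filter that fails one harmless syscall (getppid, chosen arbitrarily) with EPERM. It then verifies that a grandchild forked after installation still sees the denial.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "Add the AUDIT_ARCH_* constant for this architecture"
#endif
#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000
#endif

/* Allow everything except syscall `nr`, which fails with EPERM.
 * The arch check comes first because syscall numbers differ per ABI. */
static int deny_syscall_eperm(unsigned int nr) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __x86_64__
        /* x32 calls also report AUDIT_ARCH_X86_64 but set bit 30 of nr */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, __X32_SYSCALL_BIT, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if a grandchild forked after the filter was installed still
 * gets EPERM from getppid(), 0 if it doesn't, -1 on setup failure. */
static int filter_is_inherited(void) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (deny_syscall_eperm(SYS_getppid) != 0)
            _exit(2);
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0) {
            long r = syscall(SYS_getppid);
            _exit(r == -1 && errno == EPERM ? 0 : 1);
        }
        int gst;
        if (waitpid(grandchild, &gst, 0) < 0 || !WIFEXITED(gst))
            _exit(2);
        _exit(WEXITSTATUS(gst));
    }
    int st;
    if (waitpid(child, &st, 0) < 0 || !WIFEXITED(st) || WEXITSTATUS(st) == 2)
        return -1;
    return WEXITSTATUS(st) == 0;
}
```

Because this filter is a deny-list, the x32 check matters. On x86-64, x32 calls report the same AUDIT_ARCH_X86_64 value but set bit 30 of the syscall number, so they would never match __NR_getppid. An allow-list doesn't strictly need the check, since such numbers match nothing on the list and fall through to the default action.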

Writing a Seccomp Filter in C

Here's a practical example of a seccomp-BPF filter that implements a whitelist approach — only allowing the system calls your application actually needs:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/syscall.h>

/* Macros for BPF filter construction (this example is x86-64 only) */
#define VALIDATE_ARCHITECTURE \
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, \
             offsetof(struct seccomp_data, arch)), \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS)

#define EXAMINE_SYSCALL \
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, \
             offsetof(struct seccomp_data, nr))

/* If nr matches, fall through to ALLOW; otherwise skip over it */
#define ALLOW_SYSCALL(name) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_##name, 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* SECCOMP_RET_KILL_PROCESS (Linux 4.14+) kills every thread;
 * the older SECCOMP_RET_KILL only kills the calling thread */
#define KILL_PROCESS \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS)

int main(int argc, char *argv[]) {
    struct sock_filter filter[] = {
        /* Validate architecture */
        VALIDATE_ARCHITECTURE,
        /* Load syscall number */
        EXAMINE_SYSCALL,
        /* Allow only essential syscalls */
        ALLOW_SYSCALL(read),
        ALLOW_SYSCALL(write),
        ALLOW_SYSCALL(exit),
        ALLOW_SYSCALL(exit_group),
        ALLOW_SYSCALL(brk),
        ALLOW_SYSCALL(mmap),
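        ALLOW_SYSCALL(fstat),
        /* Some glibc versions implement fstat() via newfstatat,
         * which printf() uses on stdout */
        ALLOW_SYSCALL(newfstatat),
        ALLOW_SYSCALL(close),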
        /* Kill on any other syscall */
        KILL_PROCESS,
    };

    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required before installing a filter without CAP_SYS_ADMIN */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return 1;
    }

    /* Apply the seccomp filter */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
        perror("prctl(PR_SET_SECCOMP)");
        return 1;
    }

    /* This will succeed — write is allowed */
    printf("Seccomp filter applied. Only whitelisted syscalls allowed.\n");

    /* Any disallowed syscall would kill the process */
    return 0;
}

Using libseccomp for Easier Filter Management

Writing raw BPF programs is error-prone — I've seen even experienced developers make off-by-one mistakes in their filter jump offsets. The libseccomp library provides a much friendlier high-level API that's both safer and more maintainable:

#include <seccomp.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Create a filter context whose default action kills the whole
     * process (SCMP_ACT_KILL only kills the offending thread) */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL_PROCESS);
    if (!ctx) {
        fprintf(stderr, "Failed to initialize seccomp\n");
        return 1;
    }

    /* Whitelist necessary syscalls */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mmap), 0);
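    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
    /* Some glibc versions implement fstat() via newfstatat */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(newfstatat), 0);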

    /* Load and enforce the filter */
    if (seccomp_load(ctx) < 0) {
        fprintf(stderr, "Failed to load seccomp filter\n");
        seccomp_release(ctx);
        return 1;
    }

    printf("Seccomp filter active via libseccomp.\n");

    seccomp_release(ctx);
    return 0;
}

Compile with: gcc -o sandbox sandbox.c -lseccomp

Seccomp in Practice: Systemd Integration

For services managed by systemd, you can apply seccomp filters declaratively without writing any code at all. The SystemCallFilter directive in unit files is probably the most practical way to harden production services — and it's something you can roll out incrementally:

# /etc/systemd/system/myapp.service
[Unit]
Description=My Hardened Application
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/myapp
User=myapp
Group=myapp

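# Seccomp: start from the @system-service allow-list, then remove
# privileged, resource-control and mount syscalls from it
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources @mount
# Reject syscalls from non-native ABIs (e.g. x32 or i386 on x86-64)
SystemCallArchitectures=native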

# Additional hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes

[Install]
WantedBy=multi-user.target

The @system-service group includes the syscalls typically needed by well-behaved system services. The ~ prefix turns a line into a deny-list. When an allow-list line comes first, as here, later ~ lines subtract from it. You can inspect what's in each group with systemd-analyze syscall-filter, which is genuinely useful for understanding what you're allowing and blocking.

Landlock: Unprivileged Filesystem and Network Access Control

What Makes Landlock Different

Landlock is a Linux Security Module (LSM) introduced in kernel 5.13 that fundamentally changes the access control paradigm. Unlike SELinux or AppArmor — which require administrator involvement and system-wide configuration — Landlock lets any unprivileged process restrict its own access rights. This makes it uniquely suited for application-level sandboxing.

Here's what makes it special:

Stackable — Landlock works alongside SELinux, AppArmor, and other LSMs. It adds restrictions but can't bypass existing ones.
Unprivileged — No root or CAP_SYS_ADMIN required. Any process can sandbox itself.
Cumulative — Each new ruleset adds another layer of restrictions. A process can tighten but never loosen its own constraints.
Inheritable — Child processes inherit the parent's Landlock domain and can't escape it.

That last point is really important. Once you sandbox a process, everything it spawns is sandboxed too. No backsies.
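The inheritable behavior can be observed directly. This sketch assumes a kernel with Landlock enabled and uses raw syscalls with hypothetical helper names. It enforces a ruleset that handles LANDLOCK_ACCESS_FS_READ_FILE but adds no rules for it, so opening any file for reading is denied. It then checks that a grandchild forked afterwards is denied too. When Landlock isn't available, it returns -1 instead of failing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/landlock.h>

#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef __NR_landlock_restrict_self
#define __NR_landlock_restrict_self 446
#endif

/* Enforce a ruleset that handles READ_FILE but grants it nowhere:
 * from now on, this process cannot open any file for reading. */
static int deny_all_file_reads(void) {
    struct landlock_ruleset_attr attr = {
        .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
    };
    int fd = (int)syscall(__NR_landlock_create_ruleset, &attr, sizeof(attr), 0);
    if (fd < 0)
        return -1;                      /* Landlock unavailable or disabled */
    int ret = -1;
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0)
        ret = (int)syscall(__NR_landlock_restrict_self, fd, 0);
    close(fd);
    return ret;
}

/* 1: a grandchild of the sandboxed child is also denied (domain inherited)
 * 0: it was not denied; -1: Landlock unavailable or setup failed */
static int landlock_domain_is_inherited(const char *path) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (deny_all_file_reads() != 0)
            _exit(3);
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(3);
        if (grandchild == 0) {
            int fd = open(path, O_RDONLY | O_CLOEXEC);
            _exit(fd < 0 && errno == EACCES ? 0 : 1);
        }
        int gst;
        if (waitpid(grandchild, &gst, 0) < 0 || !WIFEXITED(gst))
            _exit(3);
        _exit(WEXITSTATUS(gst));
    }
    int st;
    if (waitpid(child, &st, 0) < 0 || !WIFEXITED(st) || WEXITSTATUS(st) == 3)
        return -1;
    return WEXITSTATUS(st) == 0;
}
```

The same layering explains the "Cumulative" rule. A ruleset enforced later can add handled rights, but no rule in it can re-grant READ_FILE, because an access succeeds only if every stacked layer allows it.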

Landlock ABI Versions

Landlock has evolved across kernel releases, and understanding the ABI versions is essential for writing portable code:

ABI Version	Kernel	Capabilities Added
1	5.13	Basic filesystem access rights (read, write, execute, make_dir, etc.)
2	5.19	`LANDLOCK_ACCESS_FS_REFER` — control file renames and links across directories
3	6.2	`LANDLOCK_ACCESS_FS_TRUNCATE` — control file truncation
4	6.7	Network access rights: `LANDLOCK_ACCESS_NET_BIND_TCP` and `LANDLOCK_ACCESS_NET_CONNECT_TCP`
5	6.10	`LANDLOCK_ACCESS_FS_IOCTL_DEV` — control device ioctl operations
6	6.12	IPC scoping: `LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET` and `LANDLOCK_SCOPE_SIGNAL`
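Portable code usually queries the ABI at runtime and drops any rights the kernel doesn't know about, as the table suggests. Below is a minimal sketch of that best-effort pattern for filesystem rights. The function name is hypothetical. The fallback bit values mirror the kernel UAPI header, for toolchains with older headers.

```c
#include <stdint.h>
#include <linux/landlock.h>

/* Fallbacks for older UAPI headers; bit positions are fixed by the kernel ABI */
#ifndef LANDLOCK_ACCESS_FS_REFER
#define LANDLOCK_ACCESS_FS_REFER (1ULL << 13)
#endif
#ifndef LANDLOCK_ACCESS_FS_TRUNCATE
#define LANDLOCK_ACCESS_FS_TRUNCATE (1ULL << 14)
#endif
#ifndef LANDLOCK_ACCESS_FS_IOCTL_DEV
#define LANDLOCK_ACCESS_FS_IOCTL_DEV (1ULL << 15)
#endif

/* Drop filesystem rights that the running kernel's ABI predates, so that
 * landlock_create_ruleset() doesn't reject the ruleset on older kernels. */
static uint64_t landlock_fs_rights_for_abi(int abi, uint64_t wanted) {
    if (abi < 1)
        return 0;                                          /* no Landlock */
    if (abi < 2)
        wanted &= ~(uint64_t)LANDLOCK_ACCESS_FS_REFER;     /* added in ABI 2 */
    if (abi < 3)
        wanted &= ~(uint64_t)LANDLOCK_ACCESS_FS_TRUNCATE;  /* added in ABI 3 */
    if (abi < 5)
        wanted &= ~(uint64_t)LANDLOCK_ACCESS_FS_IOCTL_DEV; /* added in ABI 5 */
    return wanted;
}
```

Pass the value returned by landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION) as abi. Network rights (ABI 4) and scopes (ABI 6) live in separate fields of struct landlock_ruleset_attr and would need the same treatment. Dropping REFER on ABI 1 is a slight semantic change: on those kernels, renaming or linking files across directories is always denied for sandboxed processes.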

Implementing Landlock in C

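Here's a complete example that restricts a process to read-only (plus execute) access to /usr and read-write access to /tmp. Landlock only restricts the access rights a ruleset declares as handled. Here that means reading, listing, writing, creating regular files, and executing are denied outside those two trees. Unhandled operations, such as deleting files or creating directories, are not restricted.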

#include <linux/landlock.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef landlock_create_ruleset
static inline int landlock_create_ruleset(
    const struct landlock_ruleset_attr *attr, size_t size, __u32 flags) {
    return syscall(__NR_landlock_create_ruleset, attr, size, flags);
}
#endif

#ifndef landlock_add_rule
static inline int landlock_add_rule(int ruleset_fd,
    enum landlock_rule_type type,
    const void *attr, __u32 flags) {
    return syscall(__NR_landlock_add_rule, ruleset_fd, type, attr, flags);
}
#endif

#ifndef landlock_restrict_self
static inline int landlock_restrict_self(int ruleset_fd, __u32 flags) {
    return syscall(__NR_landlock_restrict_self, ruleset_fd, flags);
}
#endif

int add_fs_rule(int ruleset_fd, const char *path, __u64 access) {
    struct landlock_path_beneath_attr path_beneath = {
        .allowed_access = access,
    };
    path_beneath.parent_fd = open(path, O_PATH | O_CLOEXEC);
    if (path_beneath.parent_fd < 0) {
        perror("open");
        return -1;
    }
    int ret = landlock_add_rule(ruleset_fd,
        LANDLOCK_RULE_PATH_BENEATH, &path_beneath, 0);
    close(path_beneath.parent_fd);
    return ret;
}

int main(int argc, char *argv[]) {
    /* Check Landlock ABI version */
    int abi = landlock_create_ruleset(NULL, 0,
        LANDLOCK_CREATE_RULESET_VERSION);
    if (abi < 0) {
        perror("Landlock not supported");
        return 1;
    }
    printf("Landlock ABI version: %d\n", abi);

    /* Define the ruleset: handle filesystem access */
    struct landlock_ruleset_attr ruleset_attr = {
        .handled_access_fs =
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR |
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_EXECUTE,
    };

    int ruleset_fd = landlock_create_ruleset(&ruleset_attr,
        sizeof(ruleset_attr), 0);
    if (ruleset_fd < 0) {
        perror("landlock_create_ruleset");
        return 1;
    }

    /* Allow read-only + execute access to /usr */
    __u64 ro_access = LANDLOCK_ACCESS_FS_READ_FILE |
                      LANDLOCK_ACCESS_FS_READ_DIR |
                      LANDLOCK_ACCESS_FS_EXECUTE;
    if (add_fs_rule(ruleset_fd, "/usr", ro_access) < 0) {
        fprintf(stderr, "Failed to add rule for /usr\n");
        return 1;
    }

    /* Allow read-write access to /tmp */
    __u64 rw_access = LANDLOCK_ACCESS_FS_READ_FILE |
                      LANDLOCK_ACCESS_FS_READ_DIR |
                      LANDLOCK_ACCESS_FS_WRITE_FILE |
                      LANDLOCK_ACCESS_FS_MAKE_REG;
    if (add_fs_rule(ruleset_fd, "/tmp", rw_access) < 0) {
        fprintf(stderr, "Failed to add rule for /tmp\n");
        return 1;
    }

    /* Required for landlock_restrict_self() without CAP_SYS_ADMIN */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return 1;
    }

    /* Enforce the ruleset */
    if (landlock_restrict_self(ruleset_fd, 0)) {
        perror("landlock_restrict_self");
        return 1;
    }
    close(ruleset_fd);

    printf("Landlock sandbox active.\n");
    printf("Try accessing files outside /usr and /tmp...\n");

    /* This should fail: /etc is not covered by any rule */
    FILE *f = fopen("/etc/passwd", "r");
    if (!f) {
        printf("Blocked: Cannot read /etc/passwd (expected)\n");
    } else {
        printf("Unexpected: /etc/passwd is readable; sandbox not effective\n");
        fclose(f);
    }

    /* This will succeed — /tmp is read-write */
    f = fopen("/tmp/landlock_test.txt", "w");
    if (f) {
        fprintf(f, "Landlock allows writing here.\n");
        fclose(f);
        printf("Success: Wrote to /tmp/landlock_test.txt\n");
    }

    return 0;
}

Using Landrun for Quick Command-Line Sandboxing

So, you don't want to write C code just to sandbox a process? Fair enough. For administrators and DevOps engineers who need practical sandboxing right now, landrun is a lightweight CLI tool written in Go that wraps Landlock in a user-friendly interface. Think of it as a simpler alternative to firejail. It relies on Landlock alone and, unlike firejail's setuid-root binary, needs no privileges at all.

Install landrun:

# Download the latest release (writing to /usr/local/bin needs root;
# running landrun afterwards does not)
sudo curl -L https://github.com/Zouuup/landrun/releases/latest/download/landrun-linux-amd64 \
  -o /usr/local/bin/landrun
sudo chmod +x /usr/local/bin/landrun

Here are some practical examples:

# Run ls with read+execute access to /usr/bin/ls and /usr/lib, read-only access to /home
landrun --rox /usr/bin/ls --rox /usr/lib --ro /home -- ls /home

# Allow a build tool to read source and write to output only
landrun --ro /src --rw /src/build --rox /usr -- make -C /src

# Block all filesystem access except what is explicitly allowed
# This will fail because /tmp is not granted
landrun --rox /usr -- touch /tmp/blocked
# touch: cannot touch '/tmp/blocked': Permission denied

# Grant write access to /tmp
landrun --rox /usr --rw /tmp -- touch /tmp/allowed
# Success

# Restrict network access on Linux 6.7+ kernels (Landlock ABI 4)
# Allow binding only to TCP port 8080
landrun --rox /usr --rw /tmp --bind-tcp 8080 -- ./my-server

# Allow connecting only to TCP port 443 (HTTPS); read access to /etc is
# needed for DNS configuration and CA certificates
landrun --rox /usr --ro /etc --rw /tmp --connect-tcp 443 -- curl https://example.com

One thing to watch out for: by default, landrun passes no environment variables to the sandboxed process. Use --env to explicitly pass what you need:

landrun --rox /usr --ro /etc --env PATH --env HOME -- my-command

The io_uring Security Problem

Why io_uring Is a Blind Spot

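Here's where things get uncomfortable. io_uring, introduced in Linux 5.1, provides a high-performance asynchronous I/O interface using shared memory ring buffers between user space and the kernel. The performance gains are real and impressive for I/O-heavy workloads. But there's a catch, and it's a big one. Once a ring is set up, individual operations (reads, writes, opens, connects) are submitted through shared memory rather than as separate system calls. The only syscalls involved are io_uring_setup, io_uring_enter, and io_uring_register. With submission-queue polling, even io_uring_enter can be skipped.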

This has three critical implications:

Seccomp bypass — Seccomp filters have zero visibility into operations submitted via io_uring. A process with access to io_uring can perform file reads, writes, network operations, and more, all while bypassing your carefully crafted seccomp sandbox. LSM-based controls such as Landlock and SELinux still apply, because io_uring operations pass through the same kernel permission hooks.
Security tool evasion — In April 2025, security firm ARMO published a proof-of-concept rootkit called "curing" that showed how io_uring can bypass major security tools including CrowdStrike Falcon, Microsoft Defender for Endpoint, Falco, and Tetragon. These tools rely on system call hooking, which io_uring simply sidesteps.
Vulnerability history — Google reported that 60% of kernel exploits submitted to their bug bounty program in 2022 targeted io_uring. That number was alarming enough for them to disable it on ChromeOS and their production servers and to restrict it on Android.

When Google disables a feature on their own infrastructure, that should tell you something.

Disabling io_uring System-Wide

Linux 6.6 introduced the io_uring_disabled sysctl parameter, giving you a straightforward way to disable io_uring at runtime without recompiling the kernel:

# Check current io_uring status
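cat /proc/sys/kernel/io_uring_disabled
# 0 = fully enabled (default)
# 1 = disabled for unprivileged processes, unless they hold CAP_SYS_ADMIN
#     or belong to the group set in kernel.io_uring_group
# 2 = disabled for all users
# (the setting only blocks creating new rings; existing ones keep working)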

# Disable io_uring for unprivileged users
echo 1 | sudo tee /proc/sys/kernel/io_uring_disabled

# Disable io_uring completely
echo 2 | sudo tee /proc/sys/kernel/io_uring_disabled

# Make the change persistent across reboots
echo "kernel.io_uring_disabled = 2" | sudo tee /etc/sysctl.d/99-disable-io-uring.conf
sudo sysctl --system

For the vast majority of server workloads, disabling io_uring carries no functional impact whatsoever. Unless your application specifically uses io_uring for async I/O (common in some database engines and high-performance network services), disabling it is frankly a no-brainer for your security posture.
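To confirm what a given process actually sees after changing the sysctl (or after a seccomp filter is applied), you can simply try to create a ring. Below is a minimal sketch using the raw syscall and the UAPI header. The helper name is hypothetical, and the fallback syscall number is the one used on x86-64 and AArch64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425        /* x86-64 and AArch64 */
#endif

/* Try to create a one-entry ring. Returns 0 if io_uring is usable,
 * otherwise the errno from io_uring_setup(), typically:
 *   EPERM  - blocked by kernel.io_uring_disabled or a seccomp filter
 *   ENOSYS - kernel built without io_uring, or a filter returning ENOSYS */
static int io_uring_probe(void) {
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    int fd = (int)syscall(__NR_io_uring_setup, 1, &params);
    if (fd < 0)
        return errno;
    close(fd);            /* nothing was submitted; just release the ring */
    return 0;
}
```

EPERM is the result to expect once io_uring_disabled is 2. You should also see it with a value of 1 for an unprivileged process outside kernel.io_uring_group.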

Detecting io_uring Usage

Before flipping that switch, you'll probably want to check if anything on your system is actually using io_uring:

# Check for processes that have io_uring file descriptors
# io_uring fds show as "anon_inode:[io_uring]" in /proc
for pid in /proc/[0-9]*; do
    if ls -la "$pid/fd" 2>/dev/null | grep -q io_uring; then
        echo "PID $(basename $pid): $(cat $pid/comm 2>/dev/null)"
    fi
done

# Using bpftrace to monitor io_uring_setup calls in real-time
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_io_uring_setup {
    printf("PID %d (%s) called io_uring_setup\n", pid, comm);
}'
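The same /proc check can be built into a monitoring agent. This sketch, with a hypothetical helper name, reads the fd symlinks of one process and looks for the anon_inode:[io_uring] target mentioned above.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if any fd in fd_dir (e.g. "/proc/1234/fd") is an io_uring
 * instance, 0 if none, -1 if the directory can't be read (typically
 * EACCES for other users' processes unless run as root). */
static int fd_dir_has_io_uring(const char *fd_dir) {
    DIR *d = opendir(fd_dir);
    if (!d)
        return -1;
    int found = 0;
    struct dirent *e;
    while (!found && (e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                           /* skip "." and ".." */
        char link[PATH_MAX], target[PATH_MAX];
        snprintf(link, sizeof(link), "%s/%s", fd_dir, e->d_name);
        ssize_t n = readlink(link, target, sizeof(target) - 1);
        if (n < 0)
            continue;                           /* fd closed meanwhile */
        target[n] = '\0';
        /* io_uring fds are anonymous inodes */
        if (strcmp(target, "anon_inode:[io_uring]") == 0)
            found = 1;
    }
    closedir(d);
    return found;
}
```

Calling it for each /proc/<pid>/fd directory reproduces the shell loop. Run it as root, since other users' fd directories are otherwise unreadable and the function returns -1 for them.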

Restricting io_uring via Seccomp

If you can't use the sysctl approach (for example, on kernels older than 6.6), you can block io_uring at the process level using seccomp instead:

#include <seccomp.h>
#include <stdio.h>
#include <errno.h>

int block_io_uring(void) {
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (!ctx) return -1;

    /* Block io_uring_setup — prevents creating new io_uring instances */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
        SCMP_SYS(io_uring_setup), 0);

    /* Block io_uring_enter — prevents submitting operations */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
        SCMP_SYS(io_uring_enter), 0);

    /* Block io_uring_register — prevents registering buffers */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM),
        SCMP_SYS(io_uring_register), 0);

    int ret = seccomp_load(ctx);
    seccomp_release(ctx);
    return ret;
}

For systemd services, it's even simpler — just one line:

[Service]
# Block io_uring syscalls
SystemCallFilter=~io_uring_setup io_uring_enter io_uring_register

Combining Mechanisms for Defense-in-Depth

A Layered Security Architecture

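No single sandboxing mechanism is enough on its own. I can't stress this enough. The most secure approach layers seccomp, Landlock, and io_uring restrictions together. Here's the recommended order of operations:

Disable io_uring — Remove the primary bypass vector, ideally system-wide via the sysctl. At the process level, make sure your seccomp filter blocks the io_uring syscalls and that no ring is created before the sandbox is in place.
Drop capabilities — Remove any Linux capabilities that aren't required, while the syscalls needed to do so are still permitted.
Set NO_NEW_PRIVS — Prevent privilege escalation via setuid binaries. An unprivileged process must set this before it can enforce a Landlock ruleset or install a seccomp filter.
Apply Landlock rules — Restrict filesystem and network access to the bare minimum.
Install seccomp filters — Whitelist only the system calls your application needs. Doing this last means the filter doesn't have to allow the prctl and landlock_* calls made by the earlier steps.

Order matters here. A ring created before the filter is installed survives it, and operations submitted through that ring never pass through seccomp.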

Combined Sandbox Implementation

Here's a shell script that applies comprehensive sandboxing to any command using available tools:

#!/bin/bash
# combined-sandbox.sh — Defense-in-depth sandboxing wrapper
# Usage: ./combined-sandbox.sh <command> [args...]

set -euo pipefail

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 <command> [args...]" >&2
    exit 1
fi

echo "[*] Setting up defense-in-depth sandbox..."

# Step 1: Check whether io_uring is disabled system-wide (this only warns)
IO_URING_STATUS=$(cat /proc/sys/kernel/io_uring_disabled 2>/dev/null || echo "N/A")
if [ "$IO_URING_STATUS" = "0" ]; then
    echo "[!] WARNING: io_uring is enabled system-wide."
    echo "    Consider: echo 2 | sudo tee /proc/sys/kernel/io_uring_disabled"
fi

# Step 2: Create an isolated working directory with an unpredictable name,
# and remove it on exit even if the sandboxed command fails
SANDBOX_DIR=$(mktemp -d /tmp/sandbox-XXXXXX)
trap 'rm -rf "$SANDBOX_DIR"; echo "[+] Sandbox cleaned up."' EXIT
echo "[+] Created sandbox directory: $SANDBOX_DIR"

# Step 3: seccomp + namespace hardening via a transient systemd service.
# These execution properties apply to services, not scopes, so the command
# runs as a service (--wait --pipe) rather than with --scope.
# In --user mode, the namespace-based options need unprivileged user namespaces.
# PrivateTmp= is omitted: it would hide $SANDBOX_DIR, which lives in the host /tmp.
if command -v systemd-run &>/dev/null; then
    echo "[+] Using systemd-run for comprehensive sandboxing..."
    systemd-run \
        --user \
        --wait --pipe --collect \
        --property="NoNewPrivileges=yes" \
        --property="ProtectSystem=strict" \
        --property="ProtectHome=yes" \
        --property="PrivateDevices=yes" \
        --property="ProtectKernelTunables=yes" \
        --property="ProtectKernelModules=yes" \
        --property="SystemCallFilter=~io_uring_setup io_uring_enter io_uring_register" \
        --property="SystemCallFilter=~@mount @reboot @swap @raw-io" \
        --property="ReadWritePaths=$SANDBOX_DIR" \
        -- "$@"

# Step 4: Fallback to landrun (Landlock) if available
elif command -v landrun &>/dev/null; then
    echo "[+] Using landrun for Landlock sandboxing..."
    landrun \
        --rox /usr \
        --rox /lib \
        --rox /lib64 \
        --ro /etc \
        --rw "$SANDBOX_DIR" \
        --env PATH \
        -- "$@"
else
    echo "[-] No sandboxing tool available. Running unsandboxed."
    "$@"
fi

Hardening a Container Runtime

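For containerized workloads, you'll want to apply these mechanisms at the container level. Here's an OCI seccomp profile that explicitly blocks io_uring. The allow-list is abbreviated for illustration. A real workload typically needs many more syscalls (glibc-based programs, for example, open files with openat rather than open), so derive the list from your application's observed behavior. Because defaultAction already rejects anything not listed, the explicit io_uring entry mainly documents intent and keeps the block in place if the default is ever relaxed.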

{
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 1,
    "archMap": [
        {
            "architecture": "SCMP_ARCH_X86_64",
            "subArchitectures": ["SCMP_ARCH_X86", "SCMP_ARCH_X32"]
        },
        {
            "architecture": "SCMP_ARCH_AARCH64",
            "subArchitectures": ["SCMP_ARCH_ARM"]
        }
    ],
    "syscalls": [
        {
            "names": ["read", "write", "open", "close", "stat", "fstat",
                       "mmap", "mprotect", "munmap", "brk", "exit_group",
                       "access", "getpid", "socket", "connect", "bind",
                       "listen", "accept4", "sendto", "recvfrom",
                       "clone", "execve", "wait4", "kill", "fcntl",
                       "dup2", "pipe2", "epoll_create1", "epoll_ctl",
                       "epoll_wait", "futex", "nanosleep"],
            "action": "SCMP_ACT_ALLOW"
        },
        {
            "names": ["io_uring_setup", "io_uring_enter", "io_uring_register"],
            "action": "SCMP_ACT_ERRNO",
            "errnoRet": 1,
            "comment": "Block io_uring to prevent seccomp bypass"
        }
    ]
}

Apply this profile when running a container:

# Docker
docker run --security-opt seccomp=custom-seccomp.json myapp

# Podman
podman run --security-opt seccomp=custom-seccomp.json myapp

# Kubernetes (via pod security context)
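# Reference the profile in your pod spec (localhostProfile is relative to
# the kubelet's seccomp directory, /var/lib/kubelet/seccomp by default):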
# securityContext:
#   seccompProfile:
#     type: Localhost
#     localhostProfile: profiles/custom-seccomp.json

Auditing and Monitoring Your Sandbox

Verifying Seccomp Status

Trust but verify. Here's how you can confirm that seccomp filters are actually active for a running process:

# Check seccomp status for a process
grep Seccomp /proc/<PID>/status
# Seccomp:         2
# Seccomp_filters: 1
# Mode 2 = SECCOMP_MODE_FILTER (BPF active)

# List all processes with seccomp filters
for pid in /proc/[0-9]*/status; do
    if grep -q "Seccomp:[[:space:]]*2" "$pid" 2>/dev/null; then
        name=$(grep "^Name:" "$pid" | awk '{print $2}')
        p=$(echo "$pid" | grep -o '[0-9]*')
        echo "PID $p ($name): seccomp-bpf active"
    fi
done
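Agents can check this programmatically rather than with grep. Here is a minimal C sketch, with a hypothetical helper name, that reads the same Seccomp: field.

```c
#include <stdio.h>

/* Returns the "Seccomp:" mode from /proc/<pid>/status
 * (0 = disabled, 1 = strict, 2 = filter), or -1 if it can't be read.
 * A pid of 0 inspects the calling process via /proc/self. */
static int seccomp_mode(int pid) {
    char path[64], line[256];
    if (pid > 0)
        snprintf(path, sizeof(path), "/proc/%d/status", pid);
    else
        snprintf(path, sizeof(path), "/proc/self/status");
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int mode = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* The literal "Seccomp:" does not match "Seccomp_filters:" */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            break;
    }
    fclose(f);
    return mode;
}
```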

Landlock Audit Integration

Starting with Linux kernel 6.15, Landlock integrates with the Linux Audit subsystem. Denied access attempts generate audit events you can actually monitor — a huge improvement over the earlier "silent deny" behavior:

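# Log openat calls that failed with EACCES, the error Landlock returns for
# denied filesystem access. This rule also catches EACCES from ordinary
# permission checks and other LSMs, not only Landlock.
sudo auditctl -a always,exit -F arch=b64 -S openat -F exit=-EACCES -k landlock-deny

# Search audit logs for these denials
sudo ausearch -k landlock-deny --start recent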

Building a Monitoring Dashboard

For production environments, you'll want to pipe sandbox monitoring into your observability stack. Here's a quick script that exports seccomp violations as Prometheus metrics:

#!/bin/bash
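# seccomp-metrics.sh — Export seccomp audit events as Prometheus metrics
# Run once a minute (e.g. from cron or a systemd timer).

METRICS_FILE="/var/lib/node_exporter/seccomp.prom"

# Count seccomp audit records (type=1326) in the kernel log from the last
# minute. Only logged actions show up: by default mainly the kill actions;
# others are logged only if the filter set SECCOMP_FILTER_FLAG_LOG. When
# auditd is running, these records go to the audit log instead
# (use ausearch -m SECCOMP there).
# grep -c already prints 0 when nothing matches (but exits 1), so
# "|| true" keeps the status clean without appending a second "0".
VIOLATIONS=$(journalctl --since "1 minute ago" -k --no-pager | grep -c "type=1326" || true)

# Write to a temporary file and rename it, so node_exporter never reads a
# partially written file
cat > "$METRICS_FILE.$$" <<METRICS_EOF
# HELP seccomp_violations_last_minute Number of logged seccomp violations in the last minute
# TYPE seccomp_violations_last_minute gauge
seccomp_violations_last_minute $VIOLATIONS
METRICS_EOF
mv "$METRICS_FILE.$$" "$METRICS_FILE"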

Best Practices and Recommendations

For Server Administrators

Disable io_uring system-wide unless you have a verified, specific need for it. Set kernel.io_uring_disabled = 2 in your sysctl configuration (Linux 6.6+; on older kernels, block the io_uring syscalls with seccomp instead). This is the single highest-impact change you can make.
Use systemd hardening directives for every service you run. At minimum, set NoNewPrivileges=yes, ProtectSystem=strict, and a restrictive SystemCallFilter.
Audit your services with systemd-analyze security <service> to get a security exposure score and spot hardening opportunities. You might be surprised by what it finds.
Enable Landlock audit logging on kernels 6.15+ to gain visibility into access denials.

For Application Developers

Embed Landlock sandboxing directly in your application startup code. Drop filesystem and network access rights as early as possible — ideally right after initialization.
Use libseccomp rather than raw BPF for maintainability and cross-architecture support. Your future self will thank you.
Test your sandbox — make sure blocked operations actually fail. A sandbox that silently permits everything is worse than having no sandbox at all, because it gives you a false sense of security.
Handle ABI compatibility — check the Landlock ABI version at runtime and gracefully degrade on older kernels.

For DevSecOps Engineers

Include seccomp profiles in your CI/CD pipeline. Generate profiles during development and test them in staging before deploying to production.
Use OCI seccomp profiles with explicit io_uring blocks for all container workloads.
Automate sandbox verification — write integration tests that confirm sandboxed processes can't access restricted resources. If you're not testing your sandbox, you don't really have one.
Monitor sandbox violations as security signals, not just noise. A spike in seccomp denials might indicate an active exploitation attempt.

Conclusion

Linux kernel sandboxing has come a long way. With Landlock providing unprivileged filesystem and network access control, seccomp-BPF offering system call filtering, and the ability to restrict or disable io_uring, Linux administrators now have genuinely powerful tools to implement true defense-in-depth.

The key insight is that these mechanisms are complementary. Seccomp controls what operations a process can perform, and Landlock controls which resources it can access. io_uring restrictions close the bypass vector that would otherwise undermine seccomp filtering and syscall-based monitoring.

The tooling has matured too. Systemd provides declarative sandboxing for services, landrun makes Landlock accessible from the command line, and container runtimes support custom seccomp profiles natively. There's really no longer a good excuse for running production workloads without kernel-level sandboxing.

My recommendation? Start with the highest-impact, lowest-effort change: disable io_uring system-wide. Then progressively apply systemd hardening directives to your services. Finally, invest in application-level Landlock and seccomp integration for your most security-critical workloads. Each layer you add raises the bar for attackers and shrinks the blast radius of any single vulnerability.
