Inspecting container checkpoints with checkpointctl

One of the newer features in Kubernetes is the Kubelet Checkpoint API, introduced as alpha in 1.25 and enabled by default as beta since 1.30. This API lets users create a stateful copy of a running container, which is useful for forensic analysis or debugging.

In Kubernetes clusters where this feature is enabled, a checkpoint is created by sending a POST request to the corresponding Kubelet API endpoint, for example with curl. In the following example, I go through the Kubernetes API server's /proxy endpoint for the node (the same request can also be sent on the Node itself via localhost:10250/checkpoint/...):

$ curl -k -X POST --header "Authorization: Bearer $TOKEN" "$KUBERNETES_API_URL/api/v1/nodes/$NODE_NAME/proxy/checkpoint/$NAMESPACE_NAME/$POD_NAME/$CONTAINER_NAME"
{"items":["/var/lib/kubelet/checkpoints/checkpoint-fedora-74d79dd7f4-csrmg_skrenger-container-2024-12-12T12:56:19Z.tar"]}

This creates a checkpoint archive of the container named container in the Pod fedora-74d79dd7f4-csrmg in the namespace skrenger. The Kubelet requests the checkpoint from the underlying CRI implementation (CRI-O in this case), which stores the archive on the Node under the path returned in the response.
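If you create checkpoints more than once, the request can be wrapped in two small shell helpers. This is only a sketch under a few assumptions: the function names checkpoint_url and first_checkpoint_path are hypothetical, the response is assumed to be single-line JSON like the one shown above, and sed is used instead of jq purely to avoid an extra dependency.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the Kubelet checkpoint request shown above.

# checkpoint_url API_URL NODE NAMESPACE POD CONTAINER
# Prints the API server proxy URL for the Kubelet checkpoint endpoint.
checkpoint_url() {
  printf '%s/api/v1/nodes/%s/proxy/checkpoint/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# first_checkpoint_path
# Reads the Kubelet JSON response on stdin and prints the first entry of
# "items". Assumption: a single-line response of the form {"items":["..."]}.
first_checkpoint_path() {
  sed -n 's/.*"items":\["\([^"]*\)".*/\1/p'
}
```

Piping the curl command from above (with the URL built by checkpoint_url) into first_checkpoint_path prints only the archive path on the Node. If jq is available, jq -r '.items[0]' does the same more robustly.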

Once the checkpoint archive is copied to our local machine (and renamed to something shorter), we can inspect it with the checkpointctl inspect command. The --all switch gives a good overview:

$ sudo checkpointctl inspect --all checkpoint.tar 

Displaying container checkpoint tree view from checkpoint.tar

container
├── Image: registry.fedoraproject.org/fedora:41
├── ID: 3d948544a98b53f3d55c4ee276ad1aa386edef338047e798660d878d41e44eaa
├── Runtime: runc
├── Created: 2024-12-12T12:51:29.151586225Z
├── Checkpointed: 2024-12-12T12:56:19Z
├── Engine: CRI-O
├── IP: 10.128.2.14
├── Checkpoint size: 122.8 KiB
│   └── Memory pages size: 104.0 KiB
├── Root FS diff size: 3.0 KiB
├── CRIU dump statistics
│   ├── Freezing time: 326 µs
│   ├── Frozen time: 60.087 ms
│   ├── Memdump time: 447 µs
│   ├── Memwrite time: 97 µs
│   ├── Pages scanned: 643
│   └── Pages written: 26
├── Metadata
│   ├── Pod name: fedora-74d79dd7f4-csrmg
│   ├── Kubernetes namespace: skrenger
│   └── Annotations
│       ├── io.kubernetes.container.name: container
│       ├── io.kubernetes.container.terminationMessagePath: /dev/termination-log
│       ├── io.kubernetes.cri-o.ContainerType: container
│       ├── io.kubernetes.cri-o.Created: 2024-12-12T12:51:29.151586225Z
[..]
├── Process tree
│   └── [1]  sleep infinity 
│       ├── Environment variables
│       │   ├── PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
│       │   ├── TERM=xterm
│       │   ├── HOSTNAME=fedora-74d79dd7f4-csrmg
│       │   ├── MYSECRETVARIABLE=supersecret
[..]
│       └── Open files
│           ├── [REG 0]  /dev/null
│           ├── [PIPE 1]  pipe[2114213]
│           ├── [PIPE 2]  pipe[2114214]
│           ├── [cwd]  /
│           └── [root]  /
└── Overview of mounts
    ├── Destination: /proc
    │   ├── Type: proc
    │   └── Source: proc
[..]
    ├── Destination: /run/secrets
    │   ├── Type: bind
    │   └── Source: /var/run/containers/storage/overlay-containers/3d948544a98b53f3d55c4ee276ad1aa386edef338047e798660d878d41e44eaa/userdata/run/secrets
    └── Destination: /var/run/secrets/kubernetes.io/serviceaccount
        ├── Type: bind
        └── Source: /var/lib/kubelet/pods/84f50b87-6a7f-4dd0-a492-88691809d92b/volumes/kubernetes.io~projected/kube-api-access-m7lkb
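As the MYSECRETVARIABLE entry shows, the environment variables of the checkpointed processes are visible to anyone who can read the archive. A minimal sketch for filtering the inspect output for such entries, assuming the NAME=value tree format shown above; the function name list_env_matches and the search pattern are illustrative, not part of checkpointctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter `checkpointctl inspect --all` output for
# environment variables whose name matches an extended regex.

# list_env_matches PATTERN
# Reads the tree output on stdin, extracts NAME=value tokens and prints
# only those whose NAME contains a match for PATTERN.
list_env_matches() {
  grep -oE '[A-Za-z_][A-Za-z0-9_]*=[^[:space:]]*' |
    grep -E "^[^=]*($1)[^=]*="
}
```

For example, sudo checkpointctl inspect --all checkpoint.tar | list_env_matches 'SECRET|PASSWORD|TOKEN' would print MYSECRETVARIABLE=supersecret for the checkpoint above.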

Of course, we can also unpack the archive and look at it directly. The archive contains a rootfs-diff.tar file with the changes made to the container's ephemeral filesystem, which is useful for identifying files that were created. In this example, I created an evilscript.sh file that I want to find again later. Additionally, the checkpoint/ directory contains a lot of lower-level information, such as open file descriptors, registers, stack frames, memory maps and more:

$ tar xf checkpoint.tar
$ tar xf rootfs-diff.tar
$ sudo tree .
.
├── bind.mounts
├── checkpoint
│   ├── cgroup.img
│   ├── core-1.img
│   ├── descriptors.json
│   ├── fdinfo-2.img
│   ├── files.img
│   ├── fs-1.img
│   ├── ids-1.img
│   ├── inventory.img
│   ├── ipcns-var-11.img
│   ├── mm-1.img
│   ├── mountpoints-13.img
│   ├── netns-10.img
│   ├── pagemap-1.img
│   ├── pages-1.img
│   ├── pstree.img
│   ├── seccomp.img
│   ├── timens-0.img
│   ├── tmpfs-dev-649.tar.gz.img
│   ├── tmpfs-dev-652.tar.gz.img
│   ├── tmpfs-dev-653.tar.gz.img
│   ├── tmpfs-dev-654.tar.gz.img
│   └── utsns-12.img
├── checkpoint.tar
├── config.dump
├── dump.log
├── io.kubernetes.cri-o.LogPath
├── rootfs-diff.tar
├── spec.dump
├── stats-dump
└── tmp
    └── evilscript.sh
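The two tar commands above extract the root filesystem diff into the same directory as the checkpoint metadata, which is why tmp/ ends up next to config.dump. A small sketch that keeps them apart, assuming (as in the listing above) that rootfs-diff.tar is a top-level member of the checkpoint archive; the function name unpack_checkpoint and the rootfs-diff/ subdirectory are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a checkpoint archive into DEST and extract the
# container's filesystem changes into DEST/rootfs-diff, so they do not mix
# with the checkpoint metadata.

# unpack_checkpoint ARCHIVE DEST
unpack_checkpoint() {
  mkdir -p "$2/rootfs-diff" || return 1
  tar -xf "$1" -C "$2" || return 1
  # Assumption: rootfs-diff.tar sits at the top level of the archive.
  if [ -f "$2/rootfs-diff.tar" ]; then
    tar -xf "$2/rootfs-diff.tar" -C "$2/rootfs-diff" || return 1
  fi
}
```

As with the tree command above, running this with sudo may be necessary if the archive contains files that are only readable by root.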

Note that tmp/evilscript.sh is the file I created manually inside the container before taking the checkpoint; it was extracted from the rootfs-diff.tar archive. This makes the checkpoint helpful for identifying what is running in the container and which files were created.
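To get a quick list of the files changed in the container without unpacking anything, the inner archive can be streamed out of the checkpoint and listed directly. A sketch, again assuming the member is named rootfs-diff.tar as in the listing above; list_rootfs_changes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the entries of rootfs-diff.tar inside a checkpoint
# archive by extracting the inner tar to stdout (-O) and listing it (-t).

# list_rootfs_changes ARCHIVE
list_rootfs_changes() {
  # Assumption: the inner archive is stored as "rootfs-diff.tar".
  tar -xOf "$1" rootfs-diff.tar | tar -tf -
}
```

For the checkpoint above, the output should include tmp/evilscript.sh.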

Using CRIU, we could even resume the container on our local machine to investigate its processes further. In the long run, this feature could also be used to migrate running containers from one node to another, but that still requires more work.


My name is Simon Krenger, and I am a Technical Account Manager (TAM) at Red Hat. I advise our customers on using Kubernetes, Containers, Linux and Open Source.
