docker-runc (CVE-2019-5736) Vulnerability Analysis, Second Edition

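runc is a command-line tool that creates and runs containers according to the OCI (Open Container Initiative) specification, and the Docker engine is built on top of it. On February 11, 2019, researchers involved with runc published the details of a runc container-escape vulnerability on the oss-security mailing list.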

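By performing specific operations inside a container, an attacker can exploit this vulnerability to overwrite the runc binary on the host. This lets the attacker execute code on the host as root and escape the container.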

Affected versions: runc <= 1.0-rc6
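As a rough triage aid, the affected range can be expressed as a version comparison. The Go sketch below uses hypothetical helpers (parseRunc, isAffected) that look only at the version string and treat anything below 1.0.0-rc7 as affected. Builds that carry the fix as a backported patch will still be reported as affected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRunc splits "1.0.0-rc6" or "1.0-rc6" into numeric parts and an rc
// number; rc is -1 when the version has no "-rcN" suffix.
func parseRunc(v string) (nums []int, rc int, err error) {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	rc = -1
	if i := strings.Index(v, "-rc"); i >= 0 {
		if rc, err = strconv.Atoi(v[i+3:]); err != nil {
			return nil, 0, fmt.Errorf("bad rc suffix in %q: %w", v, err)
		}
		v = v[:i]
	}
	for _, part := range strings.Split(v, ".") {
		n, convErr := strconv.Atoi(part)
		if convErr != nil {
			return nil, 0, fmt.Errorf("bad version %q: %w", v, convErr)
		}
		nums = append(nums, n)
	}
	return nums, rc, nil
}

// isAffected treats every version below 1.0.0-rc7 as vulnerable, matching the
// "runc <= 1.0-rc6" range; it cannot see patches backported by vendors.
func isAffected(version string) (bool, error) {
	nums, rc, err := parseRunc(version)
	if err != nil {
		return false, err
	}
	for len(nums) < 3 {
		nums = append(nums, 0) // "1.0" is compared as "1.0.0"
	}
	fixedBase := [3]int{1, 0, 0}
	for i, want := range fixedBase {
		if nums[i] != want {
			return nums[i] < want, nil // below 1.0.0 is affected, above is not
		}
	}
	// Exactly 1.0.0: the final release is fixed, release candidates up to rc6 are not.
	return rc >= 0 && rc <= 6, nil
}

func main() {
	for _, v := range []string{"1.0.0-rc6", "1.0-rc5", "1.0.0-rc7", "1.1.12"} {
		affected, err := isAffected(v)
		fmt.Println(v, affected, err)
	}
}
```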

The process by which Docker starts a container through runc can be summarized in the following steps:

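  • docker-cli sends a request to docker-daemon according to the user's command, and docker-daemon launches runc through the call chain docker-daemon -> containerd -> containerd-shim -> runc
  • runc executes the runc run command, creating the runc process that lives outside the container
  • The runc process outside the container prepares the information needed to create the container, then executes the runc init command to create a child process
  • The runc init child process sets up the namespaces and related settings, turns itself into the container process, and finally replaces its own process image with the program specified by the user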

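In runc 1.0-rc6 and earlier, the binary executed by the runc init process is the runc binary on the host itself. An attacker can therefore overwrite a file inside the container that is about to be executed, replacing its contents with the line #!/proc/self/exe. When runc init executes that file, the kernel runs /proc/self/exe as its interpreter, and /proc/self/exe is the host's runc binary. The attacker can then obtain a file descriptor for that binary from inside the container and modify it.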
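To see why runc init runs the host binary, it helps to look at the self re-exec pattern. As far as we can tell, runc at these versions starts runc init by executing /proc/self/exe with an init argument, and /proc/self/exe always resolves to the executable of the calling process. The Go sketch below reproduces only that pattern. newInitCommand and the child's output are illustrative, and no namespaces are set up.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// selfExe is the magic link that always points at the executable of the
// calling process, which is why the child runs the very same binary.
const selfExe = "/proc/self/exe"

// newInitCommand builds a command that re-executes the current binary with a
// hypothetical "init" subcommand, mirroring the run -> init split above.
func newInitCommand() *exec.Cmd {
	cmd := exec.Command(selfExe, "init")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "init" {
		// Child: in runc, this is where container setup happens before exec.
		exe, _ := os.Readlink(selfExe)
		fmt.Println("init child running from", exe)
		return
	}
	// Parent: start the child from the same binary and wait for it.
	if err := newInitCommand().Run(); err != nil {
		fmt.Fprintln(os.Stderr, "init failed:", err)
		os.Exit(1)
	}
}
```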

The figure below shows the normal flow of creating a container and executing a command inside it:

/2021-04-27-docker-runc-cve-2019-5736-%E6%BC%8F%E6%B4%9E%E5%88%86%E6%9E%90-%E7%AC%AC%E4%BA%8C%E7%89%88/upload_757c2cc51fb90da6bd6126b193581a20.png

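After the contents of the file to be executed have been replaced, runc init ends up executing itself, which exposes the host's runc binary to the container: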

/2021-04-27-docker-runc-cve-2019-5736-%E6%BC%8F%E6%B4%9E%E5%88%86%E6%9E%90-%E7%AC%AC%E4%BA%8C%E7%89%88/upload_c5457e4e1d840ef2efb4fa18764992cf.png

First, runc run executes the action registered for the run command, which in turn calls the startContainer function:

func startContainer(context *cli.Context, spec *specs.Spec, action CtAct, criuOpts *libcontainer.CriuOpts) (int, error) {
    id := context.Args().First()
    if id == "" {
        return -1, errEmptyID
    }

    notifySocket := newNotifySocket(context, os.Getenv("NOTIFY_SOCKET"), id)
    if notifySocket != nil {
        if err := notifySocket.setupSpec(context, spec); err != nil {
            return -1, err
        }
    }

    container, err := createContainer(context, id, spec)
    if err != nil {
        return -1, err
    }

    if notifySocket != nil {
        if err := notifySocket.setupSocketDirectory(); err != nil {
            return -1, err
        }
        if action == CT_ACT_RUN {
            if err := notifySocket.bindSocket(); err != nil {
                return -1, err
            }
        }
    }

    // Support on-demand socket activation by passing file descriptors into the container init process.
    listenFDs := []*os.File{}
    if os.Getenv("LISTEN_FDS") != "" {
        listenFDs = activation.Files(false)
    }

    logLevel := "info"
    if context.GlobalBool("debug") {
        logLevel = "debug"
    }

    r := &runner{
        enableSubreaper: !context.Bool("no-subreaper"),
        shouldDestroy:   true,
        container:       container,
        listenFDs:       listenFDs,
        notifySocket:    notifySocket,
        consoleSocket:   context.String("console-socket"),
        detach:          context.Bool("detach"),
        pidFile:         context.String("pid-file"),
        preserveFDs:     context.Int("preserve-fds"),
        action:          action,
        criuOpts:        criuOpts,
        init:            true,
        logLevel:        logLevel,
    }
    return r.run(spec.Process)
}

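The r.run() call at the end of the function builds the Process object for the new process and calls the Run() method implemented by the linuxContainer struct: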

func (c *linuxContainer) Run(process *Process) error {
	if err := c.Start(process); err != nil {
		return err
	}
	if process.Init {
		return c.exec()
	}
	return nil
}

Run() calls Start(), which is a thin wrapper around start(). start() calls newParentProcess() to create an object of type initProcess, and then calls the start() method implemented by the initProcess struct:

func (c *linuxContainer) start(process *Process) (retErr error) {
    parent, err := c.newParentProcess(process)
    if err != nil {
        return newSystemErrorWithCause(err, "creating new parent process")
    }

    ...

    if err := parent.start(); err != nil {
        return newSystemErrorWithCause(err, "starting container process")
    }

    ...
    return nil
}
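The start() excerpt deals with the parent process only through an interface. The sketch below is a heavily simplified, hypothetical model of that dispatch. The real parentProcess interface has many more methods. Besides initProcess, runc also returns a setnsProcess when the process is not the container's init process, which is the runc exec case.

```go
package main

import "fmt"

// parentProcess is a trimmed-down stand-in for runc's interface of the same name.
type parentProcess interface {
	start() error
}

// initProcess models the parent side of "runc init" for a new container.
type initProcess struct{ args []string }

// setnsProcess models the parent side of joining an existing container.
type setnsProcess struct{ args []string }

func (p *initProcess) start() error {
	fmt.Println("starting init process:", p.args)
	return nil
}

func (p *setnsProcess) start() error {
	fmt.Println("starting setns process:", p.args)
	return nil
}

// newParentProcess picks the concrete type depending on whether this is the
// container's init process; both variants re-execute the runc binary.
func newParentProcess(init bool) parentProcess {
	args := []string{"/proc/self/exe", "init"}
	if !init {
		return &setnsProcess{args: args}
	}
	return &initProcess{args: args}
}

// start mirrors the shape of linuxContainer.start: build the parent, then start it.
func start(init bool) error {
	parent := newParentProcess(init)
	if err := parent.start(); err != nil {
		return fmt.Errorf("starting container process: %w", err)
	}
	return nil
}

func main() {
	if err := start(true); err != nil {
		fmt.Println(err)
	}
}
```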

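The start() method implemented by initProcess executes the runc init command. It then synchronizes with the runc init child process through the parseSync loop over messageSockPair while the child sets up the container and prepares to replace its process image: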

func (p *initProcess) start() (retErr error) {
    defer p.messageSockPair.parent.Close()
    err := p.cmd.Start()
    p.process.ops = p

    ...

    ierr := parseSync(p.messageSockPair.parent, func(sync *syncT) error {
        switch sync.Type {
        case procReady:
            // set rlimits, this has to be done here because we lose permissions
            ...
            sentRun = true
        case procHooks:
            // Setup cgroup before prestart hook, so that the prestart hook could apply cgroup permissions.
            ...
            sentResume = true
        default:
            return newSystemError(errors.New("invalid JSON payload from child"))
        }

        return nil
    })
    return nil
}

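Once runc init is executed, the action registered for the init command runs in the same way, and the runc init process eventually becomes the process of the started container: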

var initCommand = cli.Command{
    Name:  "init",
    Usage: `initialize the namespaces and launch the process (do not call it outside of runc)`,
    Action: func(context *cli.Context) error {
        factory, _ := libcontainer.New("")
        if err := factory.StartInitialization(); err != nil {
            // as the error is sent back to the parent there is no need to log
            // or write it to stderr because the parent process will handle this
            os.Exit(1)
        }
        panic("libcontainer: container init failed to exec")
    },
}

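StartInitialization() creates an initializer object of type linuxStandardInit (through newContainerInit()) and calls its Init() method to initialize the container process: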

// StartInitialization loads a container by opening the pipe fd from the parent to read the configuration and state
// This is a low level implementation detail of the reexec and should not be consumed externally
func (l *LinuxFactory) StartInitialization() (err error) {
    ...
    envInitType := os.Getenv("_LIBCONTAINER_INITTYPE")
    it := initType(envInitType)
    ...
    i, err := newContainerInit(it, pipe, consoleSocket, fifofd, logPipeFd)
    if err != nil {
        return err
    }

    // If Init succeeds, syscall.Exec will not return, hence none of the defers will be called.
    return i.Init()
}

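StartInitialization() picks the initializer from the _LIBCONTAINER_INITTYPE environment variable set by the parent. The sketch below shows that selection under two assumed values. "standard" covers a new container, the runc run path followed here, and "setns" covers joining an existing container, as runc exec does. standardInit and setnsInit are hypothetical stand-ins for linuxStandardInit and linuxSetnsInit.

```go
package main

import (
	"fmt"
	"os"
)

// initType mirrors the values carried in _LIBCONTAINER_INITTYPE.
type initType string

const (
	initSetns    initType = "setns"
	initStandard initType = "standard"
)

// initer is a stand-in for the interface implemented by runc's initializers.
type initer interface {
	Init() error
}

// standardInit plays the role of linuxStandardInit (runc run / runc create).
type standardInit struct{}

// setnsInit plays the role of linuxSetnsInit (runc exec).
type setnsInit struct{}

func (standardInit) Init() error {
	fmt.Println("standard init: set up network, rootfs, console, then exec the user process")
	return nil
}

func (setnsInit) Init() error {
	fmt.Println("setns init: already inside the container's namespaces, exec the user process")
	return nil
}

// newContainerInit selects the initializer from the init type.
func newContainerInit(t initType) (initer, error) {
	switch t {
	case initSetns:
		return setnsInit{}, nil
	case initStandard:
		return standardInit{}, nil
	default:
		return nil, fmt.Errorf("unknown init type %q", t)
	}
}

func main() {
	it := initType(os.Getenv("_LIBCONTAINER_INITTYPE"))
	i, err := newContainerInit(it)
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := i.Init(); err != nil {
		fmt.Println(err)
	}
}
```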
Init() sets up the network, routes, namespaces, rootfs and so on, and finally replaces the process image with the program specified by the user:

func (l *linuxStandardInit) Init() error {
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()
    ...
    if err := setupNetwork(l.config); err != nil {
        return err
    }
    if err := setupRoute(l.config.Config); err != nil {
        return err
    }

    // initialises the labeling system
    selinux.GetEnabled()
    if err := prepareRootfs(l.pipe, l.config); err != nil {
        return err
    }
    // Set up the console. This has to be done *before* we finalize the rootfs,
    // but *after* we've given the user the chance to set up all of the mounts
    // they wanted.
    if l.config.CreateConsole {
        if err := setupConsole(l.consoleSocket, l.config, true); err != nil {
            return err
        }
        if err := system.Setctty(); err != nil {
            return errors.Wrap(err, "setctty")
        }
    }

    // Finish the rootfs setup.
    if l.config.Config.Namespaces.Contains(configs.NEWNS) {
        if err := finalizeRootfs(l.config.Config); err != nil {
            return err
        }
    }
    ... 
    // Compare the parent from the initial start of the init process and make
    // sure that it did not change.  if the parent changes that means it died
    // and we were reparented to something else so we should just kill ourself
    // and not cause problems for someone else.
    if unix.Getppid() != l.parentPid {
        return unix.Kill(unix.Getpid(), unix.SIGKILL)
    }
    // Check for the arg before waiting to make sure it exists and it is
    // returned as a create time error.
    name, err := exec.LookPath(l.config.Args[0])    
    ...
    if err := unix.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
        return newSystemErrorWithCause(err, "exec user process")
    }
    return nil
}
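The last two steps of Init() resolve argv[0] with exec.LookPath and then exec it, replacing the Go process with the user's program. The sketch below covers those two steps and uses syscall.Exec from the standard library in place of unix.Exec. resolveArgv0 and execUserProcess are hypothetical names. LookPath accepts /proc/self/exe as an ordinary executable path. Whether the resolved file runs as an ELF binary or as a #! script is decided by the kernel during exec, not by runc.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// resolveArgv0 mirrors the exec.LookPath check done before exec in Init().
func resolveArgv0(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("no command given")
	}
	return exec.LookPath(args[0])
}

// execUserProcess replaces the current process image; it only returns on failure.
func execUserProcess(args []string) error {
	name, err := resolveArgv0(args)
	if err != nil {
		return err
	}
	if err := syscall.Exec(name, args, os.Environ()); err != nil {
		return fmt.Errorf("exec user process: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: execdemo <command> [args...]")
		return
	}
	// On success this never returns: the named program replaces this process.
	if err := execUserProcess(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

For example, running the compiled program as execdemo echo hello prints hello from echo, which has taken over the demo's process.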

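Because the runc init process here runs from the host's runc binary, an attacker inside the container can modify the file corresponding to the command to be executed (for example /bin/sh) so that it contains #!/proc/self/exe. The unix.Exec() call above then makes the kernel run /proc/self/exe as the interpreter, which means runc init executes the host's runc binary and exposes it to the container. The attacker can then obtain a file descriptor for it from inside the container and overwrite the runc binary on the host. This achieves command execution as root on the host and completes the escape.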
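The reason #!/proc/self/exe works is that, for a file beginning with #!, execve runs the interpreter named on that line. That interpreter path is resolved while runc init is still the calling process, so /proc/self/exe still points at runc. The Go sketch below is defensive and not part of runc. Its hypothetical helpers extract the interpreter from a file's first line and flag /proc/self/exe or /proc/<pid>/exe. This could be used, for instance, to audit files in a container image.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// procExe matches /proc/self/exe and /proc/<pid>/exe.
var procExe = regexp.MustCompile(`^/proc/(self|[0-9]+)/exe$`)

// interpreter returns the interpreter path from a "#!" first line, if any.
func interpreter(content []byte) (string, bool) {
	if !bytes.HasPrefix(content, []byte("#!")) {
		return "", false
	}
	rest := content[2:]
	if i := bytes.IndexByte(rest, '\n'); i >= 0 {
		rest = rest[:i] // only the first line matters
	}
	fields := strings.Fields(string(rest)) // spaces after "#!" are allowed
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

// suspiciousShebang reports whether executing the file would re-run the
// binary of the process that calls execve.
func suspiciousShebang(content []byte) bool {
	interp, ok := interpreter(content)
	return ok && procExe.MatchString(interp)
}

func main() {
	for _, s := range []string{"#!/proc/self/exe\n", "#!/bin/sh\necho hi\n", "\x7fELF"} {
		fmt.Printf("%q suspicious=%v\n", s, suspiciousShebang([]byte(s)))
	}
}
```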

A PoC is available at https://github.com/Frichetten/CVE-2019-5736-PoC

Taking the start() method implemented by linuxContainer as the starting point, the call graph and the location of the fix relate as follows:

/2021-04-27-docker-runc-cve-2019-5736-%E6%BC%8F%E6%B4%9E%E5%88%86%E6%9E%90-%E7%AC%AC%E4%BA%8C%E7%89%88/upload_ae590742f6f2af6da2355b832cb069e0.png

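The runc team fixed the issue in later versions. init.go imports the nsenter package. Through Go's cgo mechanism, the cgo preamble in nsenter.go declares a C constructor function that calls nsexec(), defined in nsexec.c. As a result, nsexec() runs as soon as the binary starts, before the Go runtime initializes.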

In nsexec(), the team added a step that makes the process copy the host's runc binary and re-execute itself from that copy:

void nsexec(void)
{
    int pipenum;
    jmp_buf env;
    int sync_child_pipe[2], sync_grandchild_pipe[2];
    struct nlconfig_t config = { 0 };

    /*
     * Setup a pipe to send logs to the parent. This should happen
     * first, because bail will use that pipe.
     */
    setup_logpipe();

    /*
     * If we don't have an init pipe, just return to the go routine.
     * We'll only get an init pipe for start or exec.
     */
    pipenum = initpipe();
    if (pipenum == -1)
        return;

    /*
     * We need to re-exec if we are not in a cloned binary. This is necessary
     * to ensure that containers won't be able to access the host binary
     * through /proc/self/exe. See CVE-2019-5736.
     */
    if (ensure_cloned_binary() < 0)
        bail("could not ensure we are a cloned binary");
    ...
}

The ensure_cloned_binary() function copies the runc binary and re-executes the current process from the copy with fexecve():

int ensure_cloned_binary(void)
{
    int execfd;
    char **argv = NULL;

    /* Check that we're not self-cloned, and if we are then bail. */
    int cloned = is_self_cloned();
    if (cloned > 0 || cloned == -ENOTRECOVERABLE)
        return cloned;

    if (fetchve(&argv) < 0)
        return -EINVAL;

    execfd = clone_binary();
    if (execfd < 0)
        return -EIO;

    if (putenv(CLONED_BINARY_ENV "=1"))
        goto error;

    fexecve(execfd, argv, environ);
    error:
    close(execfd);
    return -ENOEXEC;
}
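Mitigations:
  • Promptly upgrade the runc used by Docker to a version newer than rc6
  • Monitor runc init processes to make sure the binaries they execute are located inside the container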
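One possible way to implement the monitoring recommendation (our assumption, not an official tool) is to read /proc/<pid>/exe of running runc init processes. On a runc that clones itself into a memfd, the link should read /memfd:... (deleted) rather than the host path of runc. The hypothetical helpers below only classify command lines and link targets. As far as we know, the fix can fall back to a temporary file where memfd_create is unavailable. A non-memfd target is therefore a reason to investigate rather than proof of an attack.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// looksCloned reports whether an exe link target points at an in-memory copy.
func looksCloned(target string) bool {
	return strings.HasPrefix(target, "/memfd:")
}

// isRuncInit reports whether a NUL-separated /proc/<pid>/cmdline looks like
// "<...>runc init" (assumed argv layout; also matches docker-runc).
func isRuncInit(cmdline []byte) bool {
	args := bytes.Split(bytes.TrimRight(cmdline, "\x00"), []byte{0})
	if len(args) < 2 {
		return false
	}
	return strings.HasSuffix(filepath.Base(string(args[0])), "runc") && string(args[1]) == "init"
}

func main() {
	dirs, _ := filepath.Glob("/proc/[0-9]*")
	for _, dir := range dirs {
		cmdline, err := os.ReadFile(filepath.Join(dir, "cmdline"))
		if err != nil || !isRuncInit(cmdline) {
			continue
		}
		target, err := os.Readlink(filepath.Join(dir, "exe"))
		if err != nil {
			continue // reading other users' exe links usually requires root
		}
		if !looksCloned(target) {
			fmt.Printf("%s: runc init runs from %s, not from an in-memory copy\n", dir, target)
		}
	}
}
```

runc init is short-lived, so a single scan easily misses it. In practice, such a check would need to run continuously or be driven by process-exec events.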