Unprivileged Container

Last updated: 2023-04-11

Introduction

Unprivileged containers require user namespace support in the kernel that the container runs on.

A process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace but is unprivileged for operations outside it.

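With such a container, SELinux, AppArmor, Seccomp and capabilities aren't necessary for security. LXC still applies them as an extra layer of protection (e.g. against kernel bugs), but the security model doesn't depend on them.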


Install

 

# LXC itself

apt-get install lxc systemd-services

# Tools needed by unprivileged containers (newuidmap, newgidmap)

apt-get install uidmap

# libpam-cgfs: PAM module for managing cgroups for LXC

# Not needed if you only use system-wide unprivileged containers

apt-get install libpam-cgfs

Files it installs:

  • /usr/share/pam-configs/cgfs
  • /lib/x86_64-linux-gnu/security/pam_cgfs.so

Without it, starting a container as a regular user fails with an error like:

lxc-start: cgroups/cgfs.c: lxc_cgroupfs_create: 1027 Permission denied -
  Could not create cgroup '/user.slice/user-0.slice/session-2.scope/lxc' in '/sys/fs/cgroup/systemd'.

 


"Per User" unprivileged container

 

Create & Set Up the User

useradd vps-nginx -s /bin/bash -m

usermod --add-subuids 100000-165536 vps-nginx

usermod --add-subgids 100000-165536 vps-nginx

grep vps-nginx /etc/sub?id

/etc/subgid:vps-nginx:689824:65536
/etc/subgid:vps-nginx:100000:65537
/etc/subuid:vps-nginx:689824:65536
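/etc/subuid:vps-nginx:100000:65537

Note: useradd already allocated a range automatically (689824:65536). The usermod range is inclusive, so 100000-165536 adds 65537 ids, one more than the 65536 mapped by lxc.id_map.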
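The inclusive-range arithmetic, and whether the granted ranges really contain the host range used by the id map (100000, 65536 ids), can be checked with a short script. A minimal POSIX sh sketch; `range_end` and `covers` are hypothetical helper names, not commands provided by shadow-utils or LXC:

```shell
#!/bin/sh
# Sketch: check /etc/subuid-style entries (name:first_id:count).
# Each entry covers first_id .. first_id+count-1 (inclusive).

# range_end FIRST COUNT -> print the last id covered by the range
range_end() {
    echo $(( $1 + $2 - 1 ))
}

# covers FILE USER START COUNT -> exit 0 if a single entry for USER
# contains the whole host range START .. START+COUNT-1
covers() {
    awk -F: -v user="$2" -v start="$3" -v count="$4" '
        $1 == user && start + 0 >= $2 + 0 && start + count <= $2 + $3 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

e.g. `covers /etc/subuid vps-nginx 100000 65536 && covers /etc/subgid vps-nginx 100000 65536 && echo ok`.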

ulimits

ulimits are, as their name suggests, tied to a uid at the kernel level. That's a global kernel uid, not a uid inside a user namespace.

This means that if two containers share a common kernel uid through identical or overlapping id maps, they also share limits, so a user in one container can effectively DoS the same user in the other container.
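Whether two containers share kernel uids (and therefore ulimits) comes down to whether their host ranges overlap. A sketch with a hypothetical `ranges_overlap` helper, treating each range as FIRST .. FIRST+COUNT-1:

```shell
#!/bin/sh
# Sketch: do two host id ranges overlap?
# Arguments: FIRST1 COUNT1 FIRST2 COUNT2
ranges_overlap() {
    # Half-open intervals [a, a+n) and [b, b+m) overlap iff a < b+m and b < a+n.
    [ "$1" -lt $(( $3 + $4 )) ] && [ "$3" -lt $(( $1 + $2 )) ]
}
```

For example, `ranges_overlap 100000 65537 165536 65536` succeeds, because id 165536 falls in both ranges, while `ranges_overlap 100000 65536 165536 65536` fails (the ranges are adjacent).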

 


The standard paths map

 

 * Only used by "per user" unprivileged containers

LXC paths for a regular (non-root) user:

/etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
/etc/lxc/default.conf => ~/.config/lxc/default.conf
    
/var/lib/lxc => ~/.local/share/lxc
/var/lib/lxcsnaps => ~/.local/share/lxcsnaps
/var/cache/lxc => ~/.cache/lxc
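The path table above is a fixed lookup, so it can be written as a `case` statement. A sketch only; `user_lxc_path` is a hypothetical name, and it covers just the five paths listed rather than reproducing how LXC itself computes them:

```shell
#!/bin/sh
# Sketch: translate a system-wide LXC path to its per-user equivalent,
# using the mapping table for unprivileged ("per user") containers.
user_lxc_path() {  # user_lxc_path SYSTEM_PATH
    case "$1" in
        /etc/lxc/lxc.conf)     echo "$HOME/.config/lxc/lxc.conf" ;;
        /etc/lxc/default.conf) echo "$HOME/.config/lxc/default.conf" ;;
        /var/lib/lxc)          echo "$HOME/.local/share/lxc" ;;
        /var/lib/lxcsnaps)     echo "$HOME/.local/share/lxcsnaps" ;;
        /var/cache/lxc)        echo "$HOME/.cache/lxc" ;;
        *)                     return 1 ;;   # not in the table
    esac
}
```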

su - vps-nginx

mkdir -p ~/.config/lxc

cp /etc/lxc/default.conf ~/.config/lxc/default.conf

# Add to ~/.config/lxc/default.conf
# container(0->65535) => host(100000->165535)
# (LXC 2.x key name; LXC 3.0 and later use lxc.idmap)
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
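Each id map line reads TYPE, first container id, first host id, count, so a container id is translated by a constant offset. A minimal sketch of that arithmetic (the helper `to_host_id` is hypothetical, not an LXC tool):

```shell
#!/bin/sh
# Sketch: translate a container id to a host id with one id map entry,
# e.g. "u 0 100000 65536" (type, first container id, first host id, count).
# Returns 1 and prints nothing if the id is outside the mapped range.
to_host_id() {  # to_host_id ID TYPE NSID HOSTID COUNT
    _id=$1 _nsid=$3 _host=$4 _count=$5
    if [ "$_id" -ge "$_nsid" ] && [ "$_id" -lt $(( _nsid + _count )) ]; then
        echo $(( _host + _id - _nsid ))
    else
        return 1
    fi
}
```

`map="u 0 100000 65536"; to_host_id 1000 $map` prints 101000; leaving `$map` unquoted lets the shell split it into the four fields.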

 


lxc-usernet (network device quota)

 

lxc-usernet - unprivileged user network administration file (only used by per-user containers)

lxc-user-nic - setuid helper to create a veth pair and bridge it on the host

/etc/lxc/lxc-usernet

# USERNAME TYPE BRIDGE COUNT
# vps-nginx may create up to 10 veth devices on br0
vps-nginx veth br0 10
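A sketch of reading such a quota, assuming the four whitespace-separated columns shown above. `usernet_quota` is hypothetical; it does not handle `@group` entries, and it adds up the counts when several lines match, which is an assumption rather than documented lxc-user-nic behaviour:

```shell
#!/bin/sh
# Sketch: how many veth devices may USER attach to BRIDGE?
# File columns: USERNAME TYPE BRIDGE COUNT ("#" starts a comment line).
usernet_quota() {  # usernet_quota FILE USER BRIDGE
    awk -v user="$2" -v bridge="$3" '
        /^[ \t]*#/ { next }                                   # skip comments
        $1 == user && $2 == "veth" && $3 == bridge { total += $4 }
        END { print total + 0 }                               # 0 if no match
    ' "$1"
}
```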

Shared network bridges

As a container connected to a bridge can transmit any layer 2 traffic it wishes, it can effectively do MAC or IP spoofing on the bridge.

When running untrusted containers, or when allowing untrusted users to run containers, one should ideally create one bridge per user or per group of untrusted containers and configure /etc/lxc/lxc-usernet so that users may only use the bridges they have been allocated.
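To audit an lxc-usernet file against that advice, list every bridge that is allocated to more than one user. A sketch with a hypothetical `shared_bridges` helper (same column layout as above; a `@group` entry is simply treated as one more owner name):

```shell
#!/bin/sh
# Sketch: print bridges that appear with more than one distinct owner.
shared_bridges() {  # shared_bridges FILE
    awk '
        /^[ \t]*#/ || NF < 4 { next }               # skip comments and short lines
        !seen[$3 SUBSEP $1]++ { owners[$3]++ }      # count distinct owners per bridge
        END { for (b in owners) if (owners[b] > 1) print b }
    ' "$1" | sort
}
```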

 


"lxc-download" template

 

Usage

List the available images

lxc-create -t download -n C1

Downloading the image index

---
DIST    RELEASE ARCH    VARIANT BUILD
---
almalinux       8       amd64   default 20210715_23:08
almalinux       8       arm64   default 20210715_23:08
alpine  3.11    amd64   default 20210716_13:00
...
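If the listing is saved to a file (for example copied from the terminal), it can be filtered for one distribution and architecture. A sketch assuming the whitespace-separated DIST RELEASE ARCH VARIANT BUILD columns shown above; `images_for` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: print RELEASE VARIANT BUILD for every image of DIST on ARCH.
# Header and "---" lines never match, so they are skipped automatically.
images_for() {  # images_for FILE DIST ARCH
    awk -v dist="$2" -v arch="$3" '$1 == dist && $3 == arch { print $2, $4, $5 }' "$1"
}
```

The RELEASE column of the output is what goes into `-r`, e.g. `-d almalinux -r 8 -a arm64`.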

Create a container

lxc-create -t download -n {container-name-here} -- -d {DISTRONAME} -r {RELEASE} -a {ARCH}

Remark

If '-a {ARCH}' is omitted, the template asks which architecture to use:

Setting up the GPG keyring
Downloading the image index

---
DIST    RELEASE ARCH    VARIANT BUILD
---
centos  7       amd64   default 20201028_07:08
centos  7       armhf   default 20201028_07:08
centos  7       i386    default 20201028_07:08

Architecture:

e.g.

lxc-create -t download -n sshgw -B lvm --vgname myvg --fssize 30G -- -d centos -r 7 -a amd64

Privileged vs. Unprivileged Templates

The template states its LXC version requirements:

 NO
 YES (1.0 and up)
 YES (1.1 and up)
 YES (2.0 and up)

lxc-ls --version

2.0.11
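The requirement list above can be checked against the installed version with a numeric, field-by-field comparison. Sketch only; `version_ge` is a hypothetical helper, and non-numeric suffixes inside a field are ignored:

```shell
#!/bin/sh
# Sketch: exit 0 if version HAVE >= NEED, comparing dot-separated fields
# numerically (so 2.0.11 > 2.0.9); missing fields count as 0.
version_ge() {  # version_ge HAVE NEED
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, "."); m = split(need, r, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            a = h[i] + 0; b = r[i] + 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0    # all fields equal
    }'
}
```

e.g. `version_ge "$(lxc-ls --version)" 2.0 && echo "LXC 2.0 or later"`.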

Troubleshoot

[1]

lxc-create -t download -n C1

ERROR: Unable to fetch GPG key from keyserver.

Fix: tell the template to use another keyserver:

lxc-create -t download -n C1 -- --keyserver keyserver.ubuntu.com

 



System-wide Unprivileged Container

 

UID, GID mapping

/etc/subuid

# 100000 = 10^5
root:100000:65536

/etc/subgid

# 100000 = 10^5
root:100000:65536

For instance, uid 0 in the container becomes uid 100000 on the host.

=> uids and gids 0-65535 in the container map to uids and gids 100000-165535 on the host

/etc/lxc/default.conf

# LXC 3.0+ unprivileged container setting
# 100000 = 10^5
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

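After the container is created, its own config will contain the same setting. However, the setting must not be placed only in the container config; it has to be in /etc/lxc/default.conf before running lxc-create, otherwise the files lxc-create writes when building the container will have the wrong permissions.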

 * "root" doesn't need a network device quota (lxc-usernet)

P.S.

# LXC 2.x (older key name)

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

Testing

# Creating unprivileged containers

lxc-create -t download -n C1 -- -d ubuntu -r xenial -a amd64

# Start the container

lxc-start -n C1

# Checking

lxc-attach -n C1 -- id

uid=0(root) gid=0(root) groups=0(root)

lxc-attach -n C1 -- cat /proc/self/uid_map

# ID-inside-ns ID-outside-ns length
             0        100000  65536

lxc-attach -n C1 -- cat /proc/self/gid_map

         0     100000      65536
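The same check can be scripted: in the initial (host) user namespace the map is the identity `0 0 4294967295`, while inside an unprivileged container it is a shifted range like the one above. A sketch with a hypothetical `in_userns` helper; note that a privileged container, which has no user namespace of its own, also shows the identity map:

```shell
#!/bin/sh
# Sketch: exit 0 if the given uid_map/gid_map file describes a translated
# (non-identity) mapping, i.e. the reader is inside a mapped user namespace.
in_userns() {  # in_userns [MAPFILE]  (default: /proc/self/uid_map)
    awk '!($1 == 0 && $2 == 0 && $3 == 4294967295) { mapped = 1 }
         END { exit (mapped ? 0 : 1) }' "${1:-/proc/self/uid_map}"
}
```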

Remark

After system-wide unprivileged containers are enabled, containers can no longer be created with the original (distribution-specific) templates.

e.g.

cat /etc/lxc/default.conf

lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

lxc-create -n sshgw -t centos

Error:

This template can't be used for unprivileged containers.
You may want to try the "download" template instead.

 


User Namespace Limitations

 

You won't be allowed to use mknod to create device nodes.

 


Owner & Group 65534

 

ls -n /dev | grep 65534

crw-rw-rw- 1 65534 65534   1, 7 Aug 25 19:11 full
drwxrwxrwt 2 65534 65534     40 Dec 18 17:59 mqueue
crw-rw-rw- 1 65534 65534   1, 3 Aug 25 19:11 null
crw-rw-rw- 1 65534 65534   1, 8 Aug 25 19:11 random
crw-rw-rw- 1 65534 65534   5, 0 Dec 18 16:44 tty
crw-rw-rw- 1 65534 65534   1, 9 Aug 25 19:11 urandom
crw-rw-rw- 1 65534 65534   1, 5 Aug 25 19:11 zero

 * If you need /dev/null & friends to be owned by root then you need a privileged container.

65534 is used by the kernel to represent a uid/gid which is outside of the container’s map.

This can only be observed inside unprivileged containers as they are the only ones with such a map in place.

It’s normal for the kernel filesystems (/sys, /proc, …) to show up this way, because those paths are owned by the real root, not by the container root. The same goes for anything that you pass in from outside the container.
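The rule above can be sketched as a function: a host owner inside the map is translated back, anything outside it is shown as the overflow id. The helper `shown_in_container` is hypothetical; its map arguments follow the `/proc/self/uid_map` columns (inside id, outside id, length):

```shell
#!/bin/sh
# Sketch: how a host-side owner appears inside the container.
# A host id inside the map is translated back; anything outside it
# (e.g. real root owning /proc or /sys) shows up as 65534.
shown_in_container() {  # shown_in_container HOSTID NSID MAPHOSTID COUNT
    if [ "$1" -ge "$3" ] && [ "$1" -lt $(( $3 + $4 )) ]; then
        echo $(( $2 + $1 - $3 ))
    else
        echo 65534   # kernel overflowuid default
    fi
}
```

With the map `0 100000 65536` from the test above, host uid 0 shows as 65534 and host uid 101000 shows as 1000.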

 
