Last updated: 2023-04-11
Introduction
Unprivileged containers require user namespace support in the kernel that the container runs on.
A process can have a normal unprivileged user ID outside a user namespace
while at the same time having user ID 0 inside the namespace
(the process has full privileges for operations inside the user namespace).
With such containers, SELinux, AppArmor, seccomp and capabilities aren't necessary for security.
Contents
- Install
- "Per User" unprivileged container
- The standard paths map
- lxc-usernet (network devices quota)
- "lxc-download" template
- System-wide Unprivileged Container
- Template Tips
- User namespace limitation
- Owner & Group 65534
- Import image (rootfs.tar.xz & meta.tar.xz)
- LV with Unprivileged Container
See also
Install
# LXC itself
apt-get install lxc systemd-services
# Tools needed by unprivileged containers (newuidmap, newgidmap)
apt-get install uidmap
# libpam-cgfs: PAM module for managing cgroups for LXC
# Not needed if you only use system-wide unprivileged containers
apt-get install libpam-cgfs
Files installed by libpam-cgfs:
- /usr/share/pam-configs/cgfs
- /lib/x86_64-linux-gnu/security/pam_cgfs.so
Without it, a regular user who starts a container gets this error:
lxc-start: cgroups/cgfs.c: lxc_cgroupfs_create: 1027 Permission denied - Could not create cgroup '/user.slice/user-0.slice/session-2.scope/lxc' in '/sys/fs/cgroup/systemd'.
"Per User" unprivileged container
Create & Setup User
useradd vps-nginx -s /bin/bash -m
usermod --add-subuids 100000-165536 vps-nginx
usermod --add-subgids 100000-165536 vps-nginx
grep vps-nginx /etc/sub?id
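/etc/subgid:vps-nginx:689824:65536
/etc/subgid:vps-nginx:100000:65537
/etc/subuid:vps-nginx:689824:65536
/etc/subuid:vps-nginx:100000:65537
The 689824:65536 entries were allocated automatically by useradd; the 100000:65537 entries come from usermod. The range 100000-165536 is inclusive, so it covers 65537 IDs.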
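The 65537 count follows from the range being inclusive. A small shell sketch of that arithmetic; `range_to_subid` is a hypothetical helper for illustration, not part of shadow-utils:

```shell
# Hypothetical helper (not part of shadow-utils): convert the inclusive
# FIRST-LAST range passed to `usermod --add-subuids` into the FIRST:COUNT
# form that ends up in /etc/subuid.
range_to_subid() {
    local first last
    first=${1%-*}    # part before the dash
    last=${1#*-}     # part after the dash
    # Both ends are included, hence the +1
    echo "${first}:$((last - first + 1))"
}

range_to_subid 100000-165536   # 100000:65537
range_to_subid 100000-165535   # 100000:65536
```

So to allocate exactly 65536 IDs, the range to pass would be 100000-165535.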
ulimits
ulimits are, as their name suggests, tied to a uid at the kernel level.
That's the global kernel uid, not a uid inside a user namespace.
This means that if two containers share a common kernel uid, through identical or overlapping id maps,
they also share limits,
so a user in one container can effectively DoS the same user in another container.
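To keep containers from sharing kernel uids (and therefore ulimits), their host-side ranges should not overlap. A minimal sketch of that check, assuming each range is given as START COUNT like the fields in /etc/subuid; `ranges_overlap` is a hypothetical helper:

```shell
# Hypothetical helper: do two host-side ID ranges overlap?
# Arguments: START1 COUNT1 START2 COUNT2 (host IDs, as in /etc/subuid)
ranges_overlap() {
    local s1=$1 n1=$2 s2=$3 n2=$4
    # [s1, s1+n1) and [s2, s2+n2) intersect iff each starts before the other ends
    [ "$s1" -lt $((s2 + n2)) ] && [ "$s2" -lt $((s1 + n1)) ]
}

# The two vps-nginx ranges from /etc/subuid above
if ranges_overlap 100000 65537 689824 65536; then
    echo "overlap: containers would share kernel uids (and ulimits)"
else
    echo "disjoint"
fi
```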
The standard paths map
* Only "per user" unprivileged containers use these paths
Where LXC keeps its files for a regular user:
/etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
/etc/lxc/default.conf => ~/.config/lxc/default.conf
/var/lib/lxc => ~/.local/share/lxc
/var/lib/lxcsnaps => ~/.local/share/lxcsnaps
/var/cache/lxc => ~/.cache/lxc
su - vps-nginx
mkdir -p ~/.config/lxc
cp /etc/lxc/default.conf ~/.config/lxc/default.conf
Then add the id mapping to ~/.config/lxc/default.conf (LXC 2.x key names):
# container (0-65535) => host (100000-165535)
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
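The map lines can also be derived from the user's own /etc/subuid and /etc/subgid entries instead of typed by hand. A sketch under these assumptions: `idmap_lines` is hypothetical, it takes the *first* entry for the user (for vps-nginx above that may be the automatically allocated 689824 range rather than 100000, which is equally valid as long as it belongs to the user), and it emits LXC 2.x key names:

```shell
# Hypothetical helper: print lxc.id_map lines for USER from the first
# matching entry in a subuid file and a subgid file.
# Usage: idmap_lines USER [SUBUID_FILE] [SUBGID_FILE]
idmap_lines() {
    local user=$1 uidf=${2:-/etc/subuid} gidf=${3:-/etc/subgid}
    local u g
    # Each line is NAME:START:COUNT; print "START COUNT" of the first match
    u=$(awk -F: -v u="$user" '$1 == u { print $2, $3; exit }' "$uidf")
    g=$(awk -F: -v u="$user" '$1 == u { print $2, $3; exit }' "$gidf")
    [ -n "$u" ] && [ -n "$g" ] || { echo "no subordinate ids for $user" >&2; return 1; }
    echo "lxc.id_map = u 0 $u"
    echo "lxc.id_map = g 0 $g"
}

# As vps-nginx (not run here):
# idmap_lines vps-nginx >> ~/.config/lxc/default.conf
```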
lxc-usernet(network devices quota)
lxc-usernet - unprivileged user network administration file (only used by per-user containers)
lxc-user-nic - setuid helper to create a veth pair and bridge it on the host
/etc/lxc/lxc-usernet
# USERNAME TYPE BRIDGE COUNT
# vps-nginx may create up to 10 veth devices on br0
vps-nginx veth br0 10
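To check what a given line grants, a small lookup over the USERNAME TYPE BRIDGE COUNT format above can help. This is a hypothetical sketch, not how lxc-user-nic itself parses the file: it only handles plain user lines (no group entries) and reports the first matching line:

```shell
# Hypothetical helper: print the veth quota granted to USER on BRIDGE in an
# lxc-usernet style file (USERNAME TYPE BRIDGE COUNT); prints 0 if none.
usernet_quota() {
    local file=$1 user=$2 bridge=$3
    awk -v u="$user" -v b="$bridge" '
        /^[ \t]*#/ { next }                           # skip comment lines
        $1 == u && $2 == "veth" && $3 == b { print $4; found = 1; exit }
        END { if (!found) print 0 }
    ' "$file"
}

# e.g. usernet_quota /etc/lxc/lxc-usernet vps-nginx br0
```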
Shared network bridges
Because a container connected to a bridge can transmit any layer 2 traffic it wishes,
it can effectively do MAC or IP spoofing on the bridge.
When running untrusted containers, or when allowing untrusted users to run containers,
ideally create one bridge per user or per group of untrusted containers, and
configure /etc/lxc/lxc-usernet so that users can only use the bridges allocated to them.
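Following the one-bridge-per-user advice, an audit can flag bridges that appear for more than one user in /etc/lxc/lxc-usernet. A minimal sketch; `shared_bridges` is hypothetical and only considers plain user lines of type veth:

```shell
# Hypothetical audit helper: list bridges in an lxc-usernet style file that
# are allocated to more than one user.
shared_bridges() {
    awk '
        /^[ \t]*#/ || NF < 4 { next }                  # skip comments and short lines
        $2 == "veth" && !seen[$3, $1]++ { users[$3]++ } # count each user once per bridge
        END { for (b in users) if (users[b] > 1) print b }
    ' "$1"
}

# e.g. shared_bridges /etc/lxc/lxc-usernet
```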
"lxc-download" template
Usage
List the available images:
lxc-create -t download -n C1
Downloading the image index
---
DIST RELEASE ARCH VARIANT BUILD
---
almalinux 8 amd64 default 20210715_23:08
almalinux 8 arm64 default 20210715_23:08
alpine 3.11 amd64 default 20210716_13:00
...
Create a container
lxc-create -t download -n {container-name-here} -- -d {DISTRONAME} -r {RELEASE} -a {ARCH}
Remark
If '-a {ARCH}' is omitted, the template asks which architecture to use:
Setting up the GPG keyring
Downloading the image index
---
DIST RELEASE ARCH VARIANT BUILD
---
centos 7 amd64 default 20201028_07:08
centos 7 armhf default 20201028_07:08
centos 7 i386 default 20201028_07:08
Architecture:
e.g.
lxc-create -t download -n sshgw -B lvm --vgname myvg --fssize 30G -- -d centos -r 7 -a amd64
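To avoid the architecture prompt, -a can be filled in from the host's machine type. A sketch assuming the image index uses Debian-style names as in the listing above; `lxc_image_arch` is hypothetical and its table is deliberately partial (on Debian/Ubuntu, `dpkg --print-architecture` already prints these names):

```shell
# Hypothetical helper: translate `uname -m` output into the architecture
# names shown in the image index (amd64, arm64, armhf, i386). Partial table.
lxc_image_arch() {
    local m=${1:-$(uname -m)}
    case "$m" in
        x86_64)    echo amd64 ;;
        aarch64)   echo arm64 ;;
        armv7l)    echo armhf ;;
        i386|i686) echo i386 ;;
        *)         echo "unknown machine type: $m" >&2; return 1 ;;
    esac
}

# e.g. (not run here):
# lxc-create -t download -n C1 -- -d centos -r 7 -a "$(lxc_image_arch)"
```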
Privileged vs. Unprivileged templates
The template list states the required LXC version:
NO YES (1.0 and up) YES (1.1 and up) YES (2.0 and up)
lxc-ls --version
2.0.11
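Comparing the installed version against such a requirement can be scripted. A sketch that relies on GNU `sort -V`; `version_ge` is a hypothetical helper and the 2.0 minimum is only an example:

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed LXC against an example minimum of 2.0
installed=$(lxc-ls --version 2>/dev/null || echo 0)
if version_ge "$installed" 2.0; then
    echo "LXC $installed meets 2.0"
else
    echo "LXC $installed is older than 2.0"
fi
```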
Troubleshoot
[1]
lxc-create -t download -n C1
ERROR: Unable to fetch GPG key from keyserver.
Workaround: specify another keyserver:
lxc-create -t download -n C1 -- --keyserver keyserver.ubuntu.com
System-wide Unprivileged Container
UID, GID mapping
/etc/subuid
# 10^6
root:100000:65536
/etc/subgid
# 10^6
root:100000:65536
This maps uids and gids 0-65535 in the container to uids and gids 100000-165535 on the host.
For instance, uid 0 (root) in the container is uid 100000 on the host.
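The translation is a fixed offset within the mapped window. A sketch of that arithmetic, assuming a single map line CONTAINER_START HOST_START COUNT such as `0 100000 65536`; `to_host_id` is a hypothetical helper:

```shell
# Hypothetical helper: translate a container uid to the host uid using one
# id map line "CONTAINER_START HOST_START COUNT". Fails if not covered.
to_host_id() {
    local id=$1 cstart=$2 hstart=$3 count=$4
    if [ "$id" -ge "$cstart" ] && [ "$id" -lt $((cstart + count)) ]; then
        echo $((hstart + id - cstart))
    else
        return 1   # id lies outside the mapped window
    fi
}

to_host_id 0 0 100000 65536       # 100000
to_host_id 1000 0 100000 65536    # 101000
to_host_id 65535 0 100000 65536   # 165535
```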
/etc/lxc/default.conf
# LXC3 Unprivileged Container Setting
# 10^6
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
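After the container is created, its config also contains this setting, but the setting cannot be placed directly in the container config instead;
otherwise the files that lxc-create creates for the VPS will have the wrong permissions.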
* root doesn't need a network device quota (lxc-usernet)
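A sanity check can confirm that the host range used in default.conf stays inside root's allocation in /etc/subuid, so it cannot collide with ranges given to other users. A sketch with a hypothetical `idmap_within_subid`; the four numbers are passed in by hand:

```shell
# Hypothetical check: does an idmap host range (HOST_START COUNT) fit inside
# a subordinate range from /etc/subuid (SUB_START SUB_COUNT)?
idmap_within_subid() {
    local hstart=$1 count=$2 sstart=$3 scount=$4
    [ "$hstart" -ge "$sstart" ] && [ $((hstart + count)) -le $((sstart + scount)) ]
}

# lxc.idmap = u 0 100000 65536 against root:100000:65536
if idmap_within_subid 100000 65536 100000 65536; then
    echo "idmap fits root's subuid range"
fi
```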
P.S.
# LXC 2
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
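When moving a config from LXC 2 to LXC 3, the key name changes from lxc.id_map to lxc.idmap. A minimal sketch of that rename using GNU sed; `idmap_to_lxc3` is a hypothetical helper and touches only this one key:

```shell
# Sketch: rename the LXC 2.x key lxc.id_map to the LXC 3.x key lxc.idmap in
# one config file. GNU sed; -i.bak edits in place and keeps a backup copy.
idmap_to_lxc3() {
    sed -i.bak 's/^\([[:space:]]*\)lxc\.id_map\([[:space:]]*=\)/\1lxc.idmap\2/' "$1"
}

# e.g. idmap_to_lxc3 /etc/lxc/default.conf
```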
Testing
# Creating unprivileged containers
lxc-create -t download -n C1 -- -d ubuntu -r xenial -a amd64
# Start vps
lxc-start -n C1
# Checking
lxc-attach -n C1 -- id
uid=0(root) gid=0(root) groups=0(root)
lxc-attach -n C1 -- cat /proc/self/uid_map
# ID-inside-ns ID-outside-ns length
0 100000 65536
lxc-attach -n C1 -- cat /proc/self/gid_map
0 100000 65536
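The same file tells whether a process is in a user namespace at all: in the initial (host) namespace, uid_map is the identity map `0 0 4294967295`. A sketch of that check with a hypothetical `in_userns` helper; it only compares against the identity map, so treat the result as a heuristic:

```shell
# Hypothetical helper: succeed if the given uid_map (default: this process's)
# differs from the identity map "0 0 4294967295" of the initial namespace.
in_userns() {
    local map=${1:-/proc/self/uid_map}
    # awk '{ $1 = $1 }' collapses the column padding the kernel prints
    [ "$(awk '{ $1 = $1; print }' "$map")" != "0 0 4294967295" ]
}

if in_userns; then echo "inside a user namespace"; else echo "initial namespace"; fi
```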
Remark
Once system-wide unprivileged containers are enabled, containers can no longer be created with the original (non-download) templates.
e.g.
cat /etc/lxc/default.conf
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc-create -n sshgw -t centos
error
This template can't be used for unprivileged containers. You may want to try the "download" template instead.
User Namespace Limitation
You cannot use mknod to create device nodes inside the container.
Owner & Group 65534
ls -n /dev | grep 65534
crw-rw-rw- 1 65534 65534 1, 7 Aug 25 19:11 full
drwxrwxrwt 2 65534 65534 40 Dec 18 17:59 mqueue
crw-rw-rw- 1 65534 65534 1, 3 Aug 25 19:11 null
crw-rw-rw- 1 65534 65534 1, 8 Aug 25 19:11 random
crw-rw-rw- 1 65534 65534 5, 0 Dec 18 16:44 tty
crw-rw-rw- 1 65534 65534 1, 9 Aug 25 19:11 urandom
crw-rw-rw- 1 65534 65534 1, 5 Aug 25 19:11 zero
* If you need /dev/null and friends to be owned by root, you need a privileged container.
65534 is the ID the kernel uses to represent a uid/gid that is outside the container's map.
It can only be observed inside unprivileged containers, as they are the only ones with such a map in place.
It's normal for kernel filesystems (/sys, /proc, …) to show up this way, because those paths are owned by the real root,
not by the container root. The same applies to anything you pass in from outside the container.
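The reverse direction explains the 65534 owners: a host id inside the mapped window shows up shifted, anything outside it shows up as the overflow id. A sketch with a hypothetical `seen_in_container` helper, assuming one map line CONTAINER_START HOST_START COUNT and the default overflow id 65534 (the live value is in /proc/sys/kernel/overflowuid):

```shell
# Hypothetical helper: how a host uid appears inside the container, given one
# map line "CONTAINER_START HOST_START COUNT"; unmapped ids show as overflow.
seen_in_container() {
    local host_id=$1 cstart=$2 hstart=$3 count=$4 overflow=${5:-65534}
    if [ "$host_id" -ge "$hstart" ] && [ "$host_id" -lt $((hstart + count)) ]; then
        echo $((cstart + host_id - hstart))
    else
        echo "$overflow"
    fi
}

seen_in_container 0 0 100000 65536        # host root   -> 65534
seen_in_container 100000 0 100000 65536   # mapped root -> 0
```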
LV with Unprivileged Container
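A chroot SFTP setup in an unprivileged container runs into problems on an LV: the LV's top-level directory is owned by host root, which shows up as 65534 inside the container, while sshd requires a ChrootDirectory to be owned by root.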
Fix
mount /dev/vg3t/sftp /mnt/tmp
chown 100000:100000 /mnt/tmp
umount /mnt/tmp
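The three steps can be wrapped so the LV is always unmounted even if chown fails. A sketch; `fix_lv_owner` and its DRY_RUN switch are hypothetical, and 100000 assumes the `u 0 100000 65536` map used throughout this page:

```shell
# Hypothetical wrapper around the three commands above.
#   $1 = LV device, $2 = mount point, $3 = host id of container root (default 100000)
# With DRY_RUN=1 the commands are only printed; otherwise it must run as root.
fix_lv_owner() {
    local dev=$1 mnt=$2 root_id=${3:-100000} run="" rc
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    $run mount "$dev" "$mnt" || return 1
    # Only the top-level directory is chowned, as in the manual fix
    $run chown "$root_id:$root_id" "$mnt"
    rc=$?
    # Unmount even if chown failed, then report chown's status
    $run umount "$mnt"
    return $rc
}

# DRY_RUN=1 fix_lv_owner /dev/vg3t/sftp /mnt/tmp
```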