linuxcontainers

Last updated: 2019-04-24

Introduction

 

HomePage: https://linuxcontainers.org/

Projects

  • LXC
  • LXD
  • LXCFS
  • CGManager

 


LXC Version

 

LXC 5.0 LTS

supported until June 2027

New keys select the cgroup paths used for the container and its monitor, including a nested (inner) cgroup for the container itself

  • lxc.cgroup.dir.container
  • lxc.cgroup.dir.monitor
  • lxc.cgroup.dir.monitor.pivot
  • lxc.cgroup.dir.container.inner

Time namespace support

  • lxc.time.offset.boot
  • lxc.time.offset.monotonic

VLAN support on VETH devices

  • veth.vlan.id               # sets the primary (untagged) VLAN
  • veth.vlan.tagged.id

Configurable transmit/receive queues on VETH devices

  • veth.n_rxqueues
  • veth.n_txqueues

LXC 4.0.x

[1] Add lxc.autodev.tmpfs.size

# LXC sets up a tmpfs mount on /dev
lxc.autodev = 1

lxc.autodev.tmpfs.size now makes it possible to limit the size of that tmpfs mount

LXC 3.2.x

Add IPVLAN support

Add support for static routes

Add router veth mode

LXC 3.1.x

LXC 3.0.x

This is the result of over 6 months of intense work since the LXC 2.1.0 release

New:

[1] All commands support "lxc-? <container-name>" syntax

The LXC tools now support passing the container name without the -n / -d (Default) command line flag.

[2] Introduced a new configuration key "lxc.cgroup2.[controller name]"

[3] LXC removes the legacy template-based container build system in favor of the new project distrobuilder

[4] Support for creating application containers from OCI formats

lxc-create -t oci -n a1 -- -u oci:../oci:alpine

[5] LXC 3.0 now introduces a ringbuffer for console logging.

[6] Allow seccomp to filter syscalls based on arguments

In order to support filtering syscalls based on arguments, the seccomp version 2 specification is extended to the following form:

syscall_name action [index,value,op,valueTwo] [index,value,op]...
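
As a rough illustration of the rule shape above, the Python sketch below splits one rule line into the syscall name, the action, and its bracketed argument filters. The function name parse_seccomp_rule and the returned dictionary layout are made up for this example, the sample rule is illustrative only, and this is not LXC's own parser.

```python
import re

# Matches one bracketed argument filter such as [0,1,SCMP_CMP_LE]
_ARG_FILTER = re.compile(r"\[([^\]]*)\]")


def parse_seccomp_rule(line):
    """Split 'syscall_name action [index,value,op,valueTwo] ...' into parts.

    Hypothetical helper for illustration; the field names follow the rule
    format quoted above, not any LXC data structure.
    """
    # Everything before the first '[' is the syscall name plus the action.
    head = line.split("[", 1)[0].split()
    if len(head) < 2:
        raise ValueError("expected 'syscall_name action ...': %r" % line)
    syscall, action = head[0], " ".join(head[1:])

    filters = []
    for body in _ARG_FILTER.findall(line):
        fields = [f.strip() for f in body.split(",")]
        if len(fields) not in (3, 4):
            raise ValueError("expected 3 or 4 fields in [%s]" % body)
        filters.append({
            "index": int(fields[0]),
            "value": int(fields[1], 0),   # base 0 also accepts hex like 0x10
            "op": fields[2],
            "valueTwo": int(fields[3], 0) if len(fields) == 4 else None,
        })
    return {"syscall": syscall, "action": action, "filters": filters}


rule = parse_seccomp_rule(
    "kexec_load errno 1 [0,1,SCMP_CMP_LE][5,1,SCMP_CMP_MASKED_EQ,1]"
)
print(rule["syscall"], rule["action"], len(rule["filters"]))
# prints: kexec_load errno 1 2
```

The fourth field is optional, which is why the second filter in the sample carries a valueTwo and the first does not.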

[7] Support for daemonized app container

LXC runs application containers through a minimal init system, so the application always runs as the second process. Such containers can now also be started daemonized (-d):

lxc-execute xenial -d -- bash   # lxc-execute xenial -d -- sleep 100

lxc-attach xenial

lxc-stop xenial

[8] Support mount propagation for mounts

This adds support for mount propagation (private, shared, slave, unbindable, rprivate, rshared, rslave, runbindable) to mount entries specified via lxc.mount.entry and lxc.mount.fstab.

[9] lxc.sysctl.[kernel parameters name]

Specify the kernel parameters to be set. The parameters available are those listed under /proc/sys/.

Note that not all sysctls are namespaced. Changing non-namespaced sysctls will modify the system-wide setting. See sysctl(8).

If used with no value, LXC will clear the parameters specified up to this point.
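
Because the available parameters are the ones under /proc/sys/, a key name maps directly onto a file path. The short Python sketch below shows that mapping; sysctl_to_proc_path is a hypothetical helper and net.ipv4.ip_forward is just a sample parameter.

```python
import posixpath

LXC_SYSCTL_PREFIX = "lxc.sysctl."


def sysctl_to_proc_path(config_key):
    """Return the /proc/sys file behind an 'lxc.sysctl.<name>' key.

    Hypothetical helper: it only swaps dots for slashes, which is how
    sysctl names normally correspond to paths under /proc/sys.
    """
    if not config_key.startswith(LXC_SYSCTL_PREFIX):
        raise ValueError("not an lxc.sysctl key: %r" % config_key)
    name = config_key[len(LXC_SYSCTL_PREFIX):]
    if not name:
        raise ValueError("missing parameter name")
    return posixpath.join("/proc/sys", *name.split("."))


print(sysctl_to_proc_path("lxc.sysctl.net.ipv4.ip_forward"))
# prints: /proc/sys/net/ipv4/ip_forward
```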

[10] lxc.hook.start-host

A hook to be run in the host’s namespace after the container has been setup,

and immediately before starting the container init.

[11] lxc.proc.[proc file name]

Specify the proc file name to be set. The file names available are those listed under the /proc/PID/ directory.

For example:

lxc.proc.oom_score_adj = 10

[12] lxc.execute.cmd

Absolute path, relative to the container rootfs, of the binary that lxc-execute runs by default.

[13] lxc.init.cwd

Absolute path inside the container to use as the working directory.

LXC will switch to this directory before executing init

LXC 2.1.x

[1] LXC 2.1 comes with a new script "lxc-update-config" which can be used to upgrade existing legacy configurations

[2] Limits for kernel resources

Any resource limit the kernel is aware of can be set by prefixing the name of the limit with "lxc.prlimit."

lxc.prlimit.nproc = unlimited
lxc.prlimit.nice = 4
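
To make the value format concrete, here is a small Python sketch that reads such a line into a (name, soft, hard) tuple. It assumes a value is either a single number or the word "unlimited" (applied to both the soft and the hard limit), or two such values separated by a colon; parse_prlimit is a hypothetical name, and the code does not change any limit.

```python
import resource


def parse_prlimit(line):
    """Parse 'lxc.prlimit.<name> = <value>' into (name, soft, hard).

    Assumes the value is one number / 'unlimited', or 'soft:hard'.
    'unlimited' is returned as resource.RLIM_INFINITY.
    """
    key, _, value = line.partition("=")
    key, value = key.strip(), value.strip()
    prefix = "lxc.prlimit."
    if not key.startswith(prefix) or not value:
        raise ValueError("not an lxc.prlimit line: %r" % line)

    def to_limit(text):
        text = text.strip()
        return resource.RLIM_INFINITY if text == "unlimited" else int(text)

    parts = value.split(":")
    if len(parts) == 1:
        parts = parts * 2          # one value sets both soft and hard
    elif len(parts) != 2:
        raise ValueError("expected 'value' or 'soft:hard': %r" % value)
    soft, hard = (to_limit(p) for p in parts)
    return key[len(prefix):], soft, hard


print(parse_prlimit("lxc.prlimit.nice = 4"))
# prints: ('nice', 4, 4)
```

The resource module is available on Unix-like systems only, which matches where LXC itself runs.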

[3] Support for unprivileged openvswitch networks

lxc.net.0.type = veth
lxc.net.0.link = ovsbr0

[4] Support for hybrid cgroup layout

cgroup v1 per-controller hierarchies can be used simultaneously with an empty cgroup v2 hierarchy.

To check the cgroup layout on the host, run `findmnt | grep cgroup2`
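
The same check can be sketched in Python by looking at the filesystem-type column of /proc/mounts. The function name cgroup_layout and the labels it returns ("legacy", "hybrid", "unified", "none") are choices made for this example, and the sample mount table is invented.

```python
def cgroup_layout(mounts_text):
    """Classify the cgroup layout from /proc/mounts-style text.

    Returns 'unified' (cgroup2 only), 'legacy' (cgroup v1 only),
    'hybrid' (both), or 'none' (no cgroup mounts seen).
    """
    has_v1 = has_v2 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        fstype = fields[2]          # third column is the filesystem type
        if fstype == "cgroup":
            has_v1 = True
        elif fstype == "cgroup2":
            has_v2 = True
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "unified"
    if has_v1:
        return "legacy"
    return "none"


sample = """\
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
"""
print(cgroup_layout(sample))
# prints: hybrid
```

On a real host the input would come from open("/proc/mounts").read().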

[5] Limiting the number of ptys a container can allocate

lxc.pty.max = 10    # Default 1024

[6] The network configuration keys have all been given a new prefix.

# NIC 1
lxc.net.0.type  = veth
lxc.net.0.flags = up
lxc.net.0.link  = lxcbr0
lxc.net.0.name  = eth0
lxc.net.0.veth.pair = c1-eth0

# NIC 2
lxc.net.1.type      = veth
lxc.net.1.flags     = up
lxc.net.1.link      = lxcbr1
lxc.net.1.name      = eth1
lxc.net.1.veth.pair = c1-eth1

# IPv4 for NIC 1 (index 0); the gateway setting may not work
lxc.net.0.ipv4.address = 192.168.201.11/24
lxc.net.0.ipv4.gateway = 192.168.201.254
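
The renaming can be pictured as a mechanical rewrite. The Python sketch below rests on a few assumptions that the notes above do not state: that the pre-2.1 keys used the lxc.network. prefix, that each lxc.network.type line opened a new NIC, and that lxc.network.ipv4 / lxc.network.ipv6 became .ipv4.address / .ipv6.address. It handles only simple "key = value" lines, upgrade_network_keys is a hypothetical name, and it is no substitute for lxc-update-config.

```python
LEGACY_PREFIX = "lxc.network."
# Legacy sub-keys whose name changed beyond the prefix (assumption, see above)
RENAMED = {"ipv4": "ipv4.address", "ipv6": "ipv6.address"}


def upgrade_network_keys(lines):
    """Rewrite legacy 'lxc.network.*' lines to indexed 'lxc.net.<i>.*' lines."""
    index = -1
    out = []
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith(LEGACY_PREFIX):
            out.append(line)              # comments, blanks, other keys
            continue
        sub = key[len(LEGACY_PREFIX):]
        if sub == "type":
            index += 1                    # a 'type' line opens the next NIC
        if index < 0:
            raise ValueError("network key before any lxc.network.type: %r" % line)
        sub = RENAMED.get(sub, sub)
        out.append("lxc.net.%d.%s = %s" % (index, sub, value.strip()))
    return out


legacy = [
    "lxc.network.type = veth",
    "lxc.network.link = lxcbr0",
    "lxc.network.ipv4 = 192.168.201.11/24",
    "lxc.network.type = veth",
    "lxc.network.link = lxcbr1",
]
for new_line in upgrade_network_keys(legacy):
    print(new_line)
```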

[7] Removed keys

lxc.kmsg
lxc.pivotdir

LXC 2.0.x

All main LXC commands have now been rewritten in C

lxc-ls, lxc-device, lxc-copy

lxc.rebootsignal: Overrides the signal sent to reboot the container

lxc.hook.stop: Runs in the host context, with references to the container's namespaces, just before namespace teardown

lxc.init_uid: Used by lxc-execute to set an alternative user

lxc.init_gid: Used by lxc-execute to set an alternative group

LXC 1.1.x

LXC 1.0.x

 

 


CGManager

 

CGManager is a cgroup manager daemon.

It is designed to let nested unprivileged containers still create and manage their cgroups through a DBus API.

 


LXD (lxc2)

 

LXD offers a fresh and intuitive user experience, with a single command line tool to manage your containers.

Containers can be managed over the network in a transparent way through a REST API.
 


LXCFS

 

https://datahunter.org/lxcfs

 


Lifecycle management hooks

 

Pre-start - runs in the host's namespace before the container's ttys, consoles, or mounts are up.

Pre-mount - runs in the container's namespaces, but before the root filesystem has been mounted.

Mount - runs after the container filesystems have been mounted, but before the container has called pivot_root to change its root filesystem.

Start - runs immediately before executing the container's init.

Post-stop - runs in the host's namespace after the container has been shut down.

 * If any hook returns an error, the container's run will be aborted.
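
A hook is just an executable, so the abort rule above can be illustrated with a small script. The Python sketch below assumes LXC exports the container name to hooks in the LXC_NAME environment variable; the function names and the "blocked" policy list are invented for the example.

```python
import os
import sys


def prestart_check(env, blocked=("broken-ct",)):
    """Return the exit status a pre-start hook would report.

    0 lets the container continue starting; a non-zero status counts as
    an error, which aborts the run. 'blocked' is a made-up policy list.
    """
    name = env.get("LXC_NAME")
    if not name:
        print("LXC_NAME is not set", file=sys.stderr)
        return 1
    if name in blocked:
        print("refusing to start %s" % name, file=sys.stderr)
        return 1
    return 0


def main():
    # A real hook script would end with: sys.exit(main())
    return prestart_check(os.environ)


print(prestart_check({"LXC_NAME": "c1"}))
# prints: 0
```

Saved as an executable file, such a script could be wired in with a line like lxc.hook.pre-start = /path/to/hook (the path is a placeholder).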

 


seccomp

 

Since Ubuntu 12.10 (Quantal) a container can also be constrained by a seccomp filter.

Seccomp is a new kernel feature that filters the system calls a task and its children may use.

While improved and simplified policy management is expected in the near future, the current policy consists of a simple whitelist of system call numbers.

 

Even for system containers running a full distribution, security gains may be had, for instance by removing the 32-bit compatibility system calls in a 64-bit container.

 


 

 

 
