Capabilities

Last updated: 2023-09-07


 


Check Process Capabilities

 

# When a process has no additional threads, TID = PID

/proc/PID/task/TID/status

e.g.

cat /proc/$$/task/$$/status

Name:   bash
Umask:  0022
State:  S (sleeping)
Tgid:   1880
...
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
...
  • Inheritable capabilities (CapInh)
    capabilities preserved across an execve(2)
  • Permitted capabilities (CapPrm)
    capabilities the thread may raise into its effective set when needed, using syscalls
  • Effective capabilities (CapEff)
    capabilities the kernel checks for each privileged action
  • Bounding set (CapBnd)
    limiting superset; nothing beyond this set can be acquired
  • Ambient capabilities (CapAmb)
    capabilities preserved across an execve(2) of a non-privileged program
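
The five fields above can be pulled out of the status file with a small filter. This is only a sketch: cap_fields is a hypothetical helper name (not part of libcap), and it assumes the "CapXxx:<tab>hex" layout shown in the sample output.

```shell
# cap_fields: read /proc/<pid>/status-style text on stdin and print the five
# capability masks as NAME=HEX lines. Hypothetical helper, not a libcap tool.
cap_fields() {
    awk '/^Cap(Inh|Prm|Eff|Bnd|Amb):/ { sub(":", "", $1); print $1 "=" $2 }'
}

# Usage on a Linux host (current shell):
#   cap_fields < /proc/$$/status
```

Reading from stdin keeps the helper usable on a saved copy of a status file as well as on the live /proc entry.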

capsh --decode=0000001fffffffff

0x0000001fffffffff=cap_chown,cap_dac_override,...
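
When capsh is not installed, the mask can be decoded by hand, because each bit position corresponds to one capability number in <linux/capability.h>. The sketch below assumes the bit order of current kernel headers (bits 0 to 40); decode_caps and CAP_NAMES are names made up for this illustration.

```shell
# Capability names in bit order (bit 0 first), following <linux/capability.h>.
CAP_NAMES="cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid
cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable
cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock
cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace
cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource
cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write
cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog
cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf
cap_checkpoint_restore"

# decode_caps HEX: print the names whose bits are set in HEX, comma-separated.
# Hypothetical helper; it only mimics the list part of "capsh --decode".
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        # Test bit number $bit of the mask.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

decode_caps 0000001fffffffff
```

The mask 0000001fffffffff has bits 0 to 36 set, so the list runs from cap_chown to cap_block_suspend.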

getpcaps $$

Capabilities for `1913': = cap_chown,cap_dac_override,...

 


capsh

 

  • --print          display capability relevant state
  • --decode=xxx     decode a hex string to a list of caps
  • --supports=xxx   exit 1 if capability xxx unsupported
  • --drop=xxx       remove xxx,.. capabilities from bset
  • --caps=xxx       set caps as per cap_from_text()
  • --inh=xxx        set xxx,.. inheritable set
  • --secbits=<n>    write a new value for securebits
  • --keep=<n>       set keep-capability bit to <n>
  • --uid=<n>        set uid to <n> (hint: id <username>)
  • --gid=<n>        set gid to <n> (hint: id <username>)
  • --groups=g,...   set the supplemental groups
  • --user=<name>    set uid, gid and groups to that of user
  • --chroot=path    chroot(2) to this path
  • --killit=<n>     send signal(n) to child
  • --forkfor=<n>    fork and make child sleep for <n> sec

 


Process execution identity

 

  • CAP_SETUID
    Make arbitrary manipulations of process UIDs;
  • CAP_SETGID
    Make arbitrary manipulations of process GIDs and supplementary GID list;

 


audit

 

cap_audit_control

enable and disable kernel auditing; 

change auditing filter rules;

retrieve auditing status and filtering rules.

cap_audit_write (since Linux 2.6.11)

write records to the kernel auditing log

 


sys_resource

  • Increase resource limits (see setrlimit(2));
  • Override maximum number of consoles on console allocation;

cap_net_raw

 * use raw and packet sockets
 * bind to any address for transparent proxying

Without it, tcpdump and iptables are affected:

tcpdump -i eth0

tcpdump: eth0: You don't have permission to capture on that device

iptables -nL

iptables v1.4.12: can't initialize iptables table `filter': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
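
To check whether a missing cap_net_raw is behind errors like these, the CapEff mask from /proc/PID/status can be tested for bit 13, the value of CAP_NET_RAW in <linux/capability.h>. A minimal sketch; has_cap_bit is a hypothetical helper name, and the second mask is simply the full mask from this page with bit 13 cleared.

```shell
# has_cap_bit HEXMASK BIT: succeed (exit 0) if bit number BIT is set in HEXMASK.
# Hypothetical helper for illustration.
has_cap_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_RAW_BIT=13   # CAP_NET_RAW in <linux/capability.h>

# Full mask: bit 13 is set.
has_cap_bit 0000001fffffffff "$CAP_NET_RAW_BIT" && echo "net_raw present"
# Same mask with bit 13 (0x2000) cleared.
has_cap_bit 0000001fffffdfff "$CAP_NET_RAW_BIT" || echo "net_raw missing"
```

On a live system the first argument would be the CapEff value of the process in question.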

cap_net_broadcast (unused)

make socket broadcasts, and listen to multicasts

cap_ipc_lock

lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).

---

cap_sys_ptrace

trace arbitrary processes using ptrace(2);

---

cap_fsetid

    Overrides the following restrictions: that the effective user ID shall
    match the file owner ID when setting the S_ISUID and S_ISGID bits on
    that file; that the effective group ID (or one of the supplementary
    group IDs) shall match the file owner ID when setting the S_ISGID bit
    on that file; that the S_ISUID and S_ISGID bits are cleared on
    successful return from chown(2) (not implemented).

Well, the Pine "problem" isn't really a problem as long as I leave the
CAP_FSETID bit off of it. Pine opens the file $HOME/mail/saved-messages
when I look in saved messages, so if $HOME/mail/saved-messages is a
symlink to /etc/shadow and pine has the CAP_FSETID capability, it can
read /etc/shadow even though you normally wouldn't have read access to
it.

---

cap_block_suspend (since Linux 3.5)

block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock).

---

cap_sys_boot

use reboot(2) and kexec_load(2)

By default, lxc does not support rebooting a container from within.

It will simply stop and the host will not know to start it.

If you want the container to reboot gracefully, it needs the sys_boot capability.

---

cap_sys_chroot

     use chroot(2)

---

cap_sys_time

set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time (hardware) clock.

---

cap_sys_module

load and unload kernel modules (see init_module(2) and delete_module(2));

---

CAP_SETFCAP

Set file capabilities.

CLI: setcap - set file capabilities

---

CAP_SETPCAP

If file capabilities are supported:

make changes to the securebits flags

add any capability from the calling thread's bounding set to its inheritable set

drop capabilities from the bounding set

If file capabilities are not supported:

grant or remove any capability in the caller's permitted capability set to or from any other process.

---

cap_mac_admin

# mac = Mandatory Access Control

allow MAC configuration or state changes

---

cap_mac_override

override Mandatory Access Control (MAC)

 


CAP_SYS_ADMIN

 

* perform a range of system administration operations including: quotactl(2), mount(2), umount(2), swapon(2), swapoff(2), sethostname(2), and setdomainname(2);

* perform privileged syslog(2) operations (since Linux 2.6.37, CAP_SYSLOG should be used to permit such operations);

* perform VM86_REQUEST_IRQ vm86(2) command;

* perform IPC_SET and IPC_RMID operations on arbitrary System V IPC objects;

* perform operations on trusted and security Extended Attributes (see attr(5));

* use lookup_dcookie(2);

* use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;

* forge UID when passing socket credentials;

* exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2));

* employ CLONE_* flags that create new namespaces with clone(2) and unshare(2);

* call perf_event_open(2);

* access privileged perf event information;

* call setns(2);

* call fanotify_init(2);

* perform KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2) operations;

* perform madvise(2) MADV_HWPOISON operation;

* employ the TIOCSTI ioctl(2) to insert characters into the input queue of a terminal other than the caller's controlling terminal;

* employ the obsolete nfsservctl(2) system call;

* employ the obsolete bdflush(2) system call;

* perform various privileged block-device ioctl(2) operations;

* perform various privileged filesystem ioctl(2) operations;

* perform administrative operations on many device drivers.

 


LXC Settings

 

lxc.cap.drop          # space-separated list of capabilities to drop

My LXC Settings

#### Capabilities ####
# This line is always present
lxc.cap.drop = sys_module sys_time mac_admin mac_override

lxc.cap.drop = sys_admin

lxc.cap.drop = sys_resource

lxc.cap.drop = sys_rawio

lxc.cap.drop = mknod setuid net_raw

lxc.cap.drop = setfcap setpcap

lxc.cap.drop = sys_pacct sys_ptrace

lxc.cap.drop = audit_control audit_write

lxc.cap.drop = sys_tty_config sys_resource
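
Because the drops are spread over several lxc.cap.drop lines, a combined view is handy when reviewing a container config. The sketch below assumes that repeated lxc.cap.drop lines accumulate; it ignores the special case of an empty value (which, as far as I know, clears the list), and merged_drops is a hypothetical helper name.

```shell
# merged_drops: read an LXC config on stdin and print every capability named on
# an active (uncommented) lxc.cap.drop line, one per line, sorted, no duplicates.
# Hypothetical helper; it does not handle "lxc.cap.drop =" with an empty value.
merged_drops() {
    sed -n 's/^[[:space:]]*lxc\.cap\.drop[[:space:]]*=[[:space:]]*//p' |
        tr -s ' \t' '\n\n' |
        sed '/^$/d' |
        sort -u
}

# Usage (the path is only an example):
#   merged_drops < /var/lib/lxc/mycontainer/config
```

Commented-out lines such as "#lxc.cap.drop = sys_boot" are skipped, and a capability listed twice (sys_resource in the settings above) is printed once.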

Notes

# Ubuntu 12 needs this capability in order to start
#lxc.cap.drop = sys_admin

# Needed to run reboot inside the VPS
#lxc.cap.drop = sys_boot

# sshd needs it in order to start
#lxc.cap.drop = sys_chroot

# DHCP, iptables, and tcpdump need it
#lxc.cap.drop = net_raw

# Ubuntu 16 needs these in order to log in; otherwise login keeps failing
#lxc.cap.drop = audit_control audit_write

# Better to keep it, because entries under /dev may otherwise be missing
#lxc.cap.drop = mknod

Remark

Dropping sys_admin and net_admin isn't very practical, and it won't make your container much safer.

Reason: root in the container will be able to re-grant itself any dropped capability.

Capability names are written in lowercase without the CAP_ prefix; for example, CAP_SYS_MODULE should be specified as sys_module.

 


Doc

 

http://manpages.ubuntu.com/manpages/trusty/en/man7/capabilities.7.html

 
