Keeping Docker containers safe

Weak Link

Contain Your Surprise

Host security, however, is only half the battle. Now, consider the fractionally less complex containers themselves. I say that securing the containers is less complex because one trick can reduce a container's attack vectors significantly – that is, running a container using the --read--only option:

$ docker run -d --read-only chrisbinnie/my-web-server

As you might expect, this launches a container to which you cannot write in any way, shape, or form. In other words, if your container is attacked, the attacker can't write to the application. This is not always a popular way to use containers, although it's recognized as a quick and highly effective way of reducing their risk to other containers on a system and the host machine itself.

The effect of making a container read-only means, among other things, that when the container is stopped and restarted, the hack needs to take place all over again to be effective and can't be written into your application automatically. I think of these containers as Knoppix-style boot disks or ROM (Read-Only Memory) [10].

Consider that even opening a page in a web browser needs to write session data to your desktop machine. If you make a container read-only, then although you can read all the data you want from a container's processes, you need to provide other ways of writing data. You might try and save to the host itself, but a better method would likely be to write to a permanent or temporary storage device, depending on the type of data you're dealing with. For example, ephemeral session data is thrown away most of the time, but storing input from a user into an application that isn't written to a database might suit Amazon S3 [11] or sophisticated, redundant, off-host storage like Ceph like Ceph or the more cloud-friendly GlusterFS [12].

It's What's Inside that Counts

Security used to be much worse when it came to the internals of a container. Up until Docker v1.10, a host's root user also tied to the container's root user, which could cause all sorts of chaos, such as being able to load a kernel module dynamically into the kernel and do any damage to the host machine that you want. Thankfully, Docker has addressed this issue skillfully, but getting it to work requires overhead. The official line [13] from the Docker site is:

 

As of Docker 1.10 User Namespaces are supported directly by the docker daemon. This feature allows for the root user in a container to be mapped to a non uid-0 user outside the container, which can help to mitigate the risks of container breakout. This facility is available but not enabled by default.

 

It's highly recommended that you enable this functionality along with that of running "untouchable" containers that are read-only. A blog post [14] has information on how to map the root user's UID 0 to another UID and discusses how each tenant on a host can run their own range of UID and GID values without overlapping into the territories of others, thus causing other security concerns.

Mitigation Techniques

Moving away from doom and gloom, I'll now spend some time looking at how you can improve your security on Docker hosts and containers alike. You might be surprised at the number of additions you need to make to your Docker config to mitigate the many types of attacks. I will continue by touching on a few of them briefly to give you food for thought, in the hope that you can investigate further, because the list is extensive and a little daunting, especially when written in detail.

  • Improve your host and container logging (e.g., with a centralized, alerting Syslog server).
  • Run through the Center For Internet Security (CIS) hardening document for your daemon [15].
  • Run Docker on a host by itself, and don't introduce further issues with other on-host applications; in other words, keep a stripped-down, minimal install of all other packages on the host.
  • Run your Docker daemon from a Unix socket, as recommended in modern versions.
  • Re-map your root UID, despite the sometimes time-consuming overhead.
  • Lean on the --read-only option, and don't let anyone tell you otherwise; store your data off-host.
  • Never run privileged containers unless in development, because they give unmitigated root user access on the host itself by design.
  • Limit CPU usage per container and define maximum RAM usage to limit attacks on a container affecting others.
  • Use SELinux, if possible; otherwise, use AppArmor [16], grsecurity [17], or PaX [18] to lock down unexpected system resource access.
  • Patch your systems more often than usual; even official Docker images can be riddled with known vulnerabilities.
  • Don't run containers with cap-add=ALL; instead, shut all extended container capabilities down and then explicitly open them up. Listing 1 shows how to switch everything off and explicitly allow host access to a container to limit the host's exposure to exploits on the container. Table 1 lists capabilities that are not included by Docker by default but that can be enabled, and Table 2 lists capabilities that are enabled by default but that can be disabled [19].
  • Limit container-to-container communications by using --icc=false.
  • Check your configuration with the extensible Docker Bench for Security tool [20].

Table 1

Capabilities Not Included by Default

Capability Key Capability Description
SYS_MODULE Load and unload kernel modules.
SYS_RAWIO Perform I/O port operations (iopl(2) and ioperm(2)).
SYS_PACCT Use acct(2) to switch process accounting on or off.
SYS_ADMIN Perform a range of system administration operations.
SYS_NICE Raise the process nice value (nice(2), setpriority(2)) and change the nice value for arbitrary processes.
SYS_RESOURCE Override resource limits.
SYS_TIME Set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time (hardware) clock.
SYS_TTY_CONFIG Use vhangup(2) to employ various privileged ioctl(2) operations on virtual terminals.
AUDIT_CONTROL Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules.
MAC_OVERRIDE Allow MAC configuration or state changes. Implemented for the Smack Linux Security Module (LSM).
MAC_ADMIN Override Mandatory Access Control (MAC). Implemented for the Smack LSM.
NET_ADMIN Perform various network-related operations.
SYSLOG Perform privileged syslog(2) operations.
DAC_READ_SEARCH Bypass file read permission checks and directory read and execute permission checks.
LINUX_IMMUTABLE Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags.
NET_BROADCAST Make socket broadcasts and listen to multicasts.
IPC_LOCK Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)).
IPC_OWNER Bypass permission checks for operations on System V IPC objects.
SYS_PTRACE Trace arbitrary processes using ptrace(2).
SYS_BOOT Use reboot(2) and kexec_load(2) to reboot and load a new kernel for later execution.
LEASE Establish leases on arbitrary files (see fcntl(2)).
WAKE_ALARM Trigger something that will wake up the system.
BLOCK_SUSPEND Employ features that can block system suspend.

Table 2

Capabilities Enabled by Default

Capability Key Capability Description
SETPCAP Modify process capabilities.
MKNOD Create special files using mknod(2).
AUDIT_WRITE Write records to kernel auditing log.
CHOWN Make arbitrary changes to file UIDs and GIDs (see chown(2)).
NET_RAW Use RAW and PACKET sockets.
DAC_OVERRIDE Bypass file read, write, and execute permission checks.
FOWNER Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file.
FSETID Don't clear set-user-ID and set-group-ID permission bits when a file is modified.
KILL Bypass permission checks for sending signals.
SETGID Make arbitrary manipulations of process GIDs and supplementary GID list.
SETUID Make arbitrary manipulations of process UIDs.
NET_BIND_SERVICE Bind a socket to Internet domain privileged ports (port numbers <1024).
SYS_CHROOT Use chroot(2), change root directory.
SETFCAP Set file capabilities.

Listing 1

Shutting down container capabilities

$ docker run -d --cap-drop=CHOWN --cap-drop=DAC_OVERRIDE --cap-drop=FSETID --cap-drop=FOWNER --cap-drop=KILL --cap-drop=MKNOD --cap-drop=NET_RAW --cap-drop=SETGID
--cap-drop=SETUID --cap-drop=SETFCAP --cap-drop=SETPCAP --cap-drop=NET_BIND_SERVICE --cap-drop=SYS_CHROOT --cap-drop=AUDIT_WRITE chrisbinnie/my-web-server

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus

SysAdmin Day 2017!

  • Happy SysAdmin Day 2017!

    Download a free gift to celebrate SysAdmin Day, a special day dedicated to system administrators around the world. The Linux Professional Institute (LPI) and Linux New Media are partnering to provide a free digital special edition for the tireless and dedicated professionals who keep the networks running: “10 Terrific Tools."

Special Edition

Newsletter

Subscribe to ADMIN Update for IT news and technical tips.

ADMIN Magazine on Twitter

Follow us on twitter