Secure Linux Containers
In Fedora 18 we have enhanced the libvirt-sandbox package to allow for easy creation of Secure Containers.
Containers are a way of isolating one or more processes from the rest of the system. Sometimes containers are described as lightweight virtualization. Containers are really just a userspace concept. The Linux kernel has no concept of a container. The kernel implements namespaces and cgroups. Userspace tools can combine these kernel services into a "container".
Namespaces are a way of making a process's view of its environment differ from that of its parent processes. For example, the file system namespace allows me to change a process's view of the file system hierarchy. pam_namespace, introduced way back in Fedora 6/RHEL 5, allowed a login program to create a namespace and mount file systems that would not be seen by the ancestor processes. This means I could have multiple processes with different /tmp directories and multiple home directories mounted on /home/dwalsh.
The kernel currently implements five namespaces:
- mount - mounting and unmounting filesystems will not affect the rest of the system
- UTS - setting the hostname or domainname will not affect the rest of the system
- IPC - processes will have an independent namespace for System V message queues, semaphore sets and shared memory segments
- network - processes will have independent IPv4 and IPv6 stacks, IP routing tables, firewall rules, the /proc/net and /sys/class/net directory trees, sockets etc.
- pid - processes have pids independent from the rest of the system. Each namespace can have its own pid 1.
pam_namespace, sandbox -X, unshare and systemd all allow you to take advantage of namespaces.
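To make the namespace list above a little more concrete, here is a minimal sketch that prints which namespaces a process currently belongs to. It only reads /proc, so it does not need root. The function name list_namespaces is my own, and newer kernels may show more entries than the five listed above.

```shell
#!/bin/sh
# List the namespaces a process belongs to by reading the links under
# /proc/<pid>/ns. Each link target looks like "mnt:[4026531840]"; two
# processes share a namespace when the bracketed numbers match.
# list_namespaces is a hypothetical helper name, not part of any tool.
list_namespaces() {
    pid="${1:-$$}"                       # default to the current shell
    for link in /proc/"$pid"/ns/*; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Entries that cannot be read as links are skipped silently.
        target=$(readlink "$link") || continue
        printf '%s\n' "$target"
    done
}

list_namespaces
```

Comparing the output for two pids shows quickly whether they share, for example, the same mount or network namespace.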
Wikipedia describes cgroups as:
cgroups (control groups) is a Linux kernel feature to limit, account and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.
Basically, you can use cgroups to control the amount of resources a process or group of processes can get on a system.
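As a small illustration, the kernel exposes the cgroups of each process in /proc. The sketch below just prints that file; show_cgroups is a name I made up, and the exact layout of the output depends on how cgroups are set up on your system.

```shell
#!/bin/sh
# Print the control groups a process belongs to.
# Each line of /proc/<pid>/cgroup has the form
#   hierarchy-ID:controller-list:cgroup-path
# show_cgroups is a hypothetical helper name used only for this example.
show_cgroups() {
    pid="${1:-$$}"                       # default to the current shell
    file="/proc/$pid/cgroup"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "no cgroup information for pid $pid" >&2
        return 1
    fi
}

# Show the cgroups of the current shell; ignore the error on systems
# where the file is not available.
show_cgroups || true
```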
I put together a little screen-cast of cgroups to demonstrate their power.
Tools like LXC have existed for a while to allow users to create containers but the tool set is at a very low level.
"Libvirt is a C toolkit to interact with the virtualization capabilities of recent versions of Linux (and other OSes). The main package includes the libvirtd server exporting the virtualization support."
libvirt-lxc was introduced in Fedora 16. It enhanced the libvirt API to allow users to build containers using libvirt. This allows you to manage your kvm/qemu virtualization along with your Linux containers, all within the same framework. The only problem is that setting up a Linux container using the libvirt API is fairly difficult.
Dan Berrange created a new package called libvirt-sandbox in Fedora 17. The libvirt-sandbox package provides an application development library (libvirt-sandbox) to facilitate the embedding of virtualization into applications. One of the main advantages of this new tool set is that it greatly simplifies the API for creating virtual machines and containers.
Using containers by themselves does not give you good security separation. The reason for this is that kernel file systems like /proc, /sys, cgroupsfs and selinuxfs are not containerized. A privileged process running within a container can affect other processes running outside of the container or processes running in other containers. In libvirt-sandbox and libvirt-lxc you can use SELinux labelling to further lock down privileged processes, for example preventing mounting of random file systems or stopping processes from disabling SELinux.
Dan Berrange and I have been working to enhance libvirt-sandbox. We have added a command line tool called virt-sandbox-service which allows a user to easily create an application sandbox. virt-sandbox-service allows an administrator to run multiple services on the same machine, each service in its own secure Linux container. Some major features of virt-sandbox-service containers:
- Uses systemd within the container as the init process.
- Uses standard unit files for starting and stopping containerized applications.
- Shares the /usr partition, meaning if you are running hundreds of Apache containers and you update the Apache code, each container will instantly use the new version of Apache.
- Uses SELinux MCS Labelling to separate each container, preventing even root processes from interfering with the host or other containers.
I have done preliminary tests running httpd, mysql, postgresql and dovecot within these containers. I am hoping people begin to play with the tool and help us expand which applications can run within the container. You can also run multiple applications within a container at the same time. For example, I have tested httpd and mysql running within the same container.
How to use:
# yum install libvirt-sandbox httpd
There is a bug in the tool right now where it will not work without an /selinux file.
# touch /selinux
Use the virt-sandbox-service command to create a container.
virt-sandbox-service create -C -l s0:c1,c2 -u httpd.service container1
Created sandbox container dir /var/lib/libvirt/filesystems/container1
Created sandbox config /etc/libvirt-sandbox/services/container1.sandbox
Created unit file /etc/systemd/system/container1_sandbox.service
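If you want to double check what the create step left behind, a quick sketch like the following looks for the three paths reported in the output above. check_sandbox_files is a hypothetical helper of mine, and the naming pattern (NAME.sandbox, NAME_sandbox.service) is assumed from this single example rather than taken from the tool's documentation.

```shell
#!/bin/sh
# Check that the three paths reported by "virt-sandbox-service create"
# exist for a given container name. The path patterns are inferred from
# the example output for container1 and are an assumption.
check_sandbox_files() {
    name="$1"
    status=0
    for path in \
        "/var/lib/libvirt/filesystems/$name" \
        "/etc/libvirt-sandbox/services/$name.sandbox" \
        "/etc/systemd/system/${name}_sandbox.service"
    do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Report on container1; do not abort if something is missing.
check_sandbox_files container1 || true
```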
Manipulate the data within the container while running outside of the container.
cd /var/lib/libvirt/filesystems/container1/var/log
touch content
ls -lZ content
# Make sure the content gets created with the correct MCS label.
# Content should be labeled with s0:c1,c2, not s0.
Now create a file with a bad label for the container.
echo "Secret" > badcontent
chcon -l s0:c3,c4 badcontent
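To see why the second file counts as badly labeled, it can help to pull the MCS level out of each SELinux context and compare them. The sketch below does that with plain string handling. The helper names and the example_t type are made up for illustration, and the comparison is a simple string match, not the dominance check SELinux actually performs.

```shell
#!/bin/sh
# Extract the MLS/MCS level (everything after the third colon) from an
# SELinux context of the form user:role:type:level.
mcs_level() {
    printf '%s\n' "$1" | cut -d: -f4-
}

# Succeed only when both contexts carry exactly the same level string.
same_level() {
    [ "$(mcs_level "$1")" = "$(mcs_level "$2")" ]
}

# Hypothetical contexts for illustration. On an SELinux system, real
# ones could be read with GNU stat:  stat -c %C content
good="system_u:object_r:example_t:s0:c1,c2"
bad="system_u:object_r:example_t:s0:c3,c4"

mcs_level "$good"
if same_level "$good" "$bad"; then
    echo "levels match"
else
    echo "levels differ"
fi
```

With the two labels used in this walkthrough, the levels differ, which is what should cause the container to be denied access to badcontent.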
Start the container:
virt-sandbox-service start container1
In another window
Make sure the processes are running with the proper SELinux label.
ps -eZ | grep svirt_lxc
You should see processes like systemd, systemd-journal, dhclient and httpd running within the container with the MCS label of s0:c1,c2.
Connect to the container
virt-sandbox-service connect container1
id
getenforce # Should tell you SELinux is disabled.
setenforce 1 # Should be denied
touch /file # Should deny you creating this file
touch /var/www/html/content # Should be allowed
cat /var/www/html/badcontent # Should be denied
Configure the Apache server any way you would like, and manipulate HTML pages.
ifconfig eth0 # Grab the IP address for use in the next test
# Use the shell running within the container to attempt to break out of the container.
^]
In Firefox on your host, use the IP address from the container.
firefox $IP # Using the IP address from the container, make sure you see the content.
Shut down the container
virt-sandbox-service stop container1
Now let's try to do the same, but starting and stopping the container using systemctl commands
systemctl start container1_sandbox.service
systemctl enable container1_sandbox.service # Check on reboot if the container is running
Make sure the container is running.
virt-sandbox-service connect container1
ps -eZ
^]
I would like to hear what you think. What enhancements would you like to see? What applications would you like to see run within the containers? Since this is a first version, we think there could be some growing pains, so use at your own risk, but we would love to work with the community to improve this tool set.