New Security Feature in Fedora 18 Part 7: Secure Linux Containers
danwalsh

Secure Linux Containers

In Fedora 18 we have enhanced the libvirt-sandbox package to allow for easy creation of Secure Containers.

Containers are a way of isolating one or more processes from the rest of the system. Containers are sometimes described as lightweight virtualization. Containers are really just a userspace concept: the Linux kernel has no concept of a container. The kernel implements namespaces and cgroups, and userspace tools combine these kernel services into a "container".

Namespaces

Namespaces are a way of changing a process's view of its environment relative to its parent process. For example, the file system namespace allows me to change a process's view of the file system hierarchy. pam_namespace, introduced way back in Fedora 6/RHEL 5, allowed a login program to create a namespace and mount file systems that would not be seen by the ancestor processes. This meant I could have multiple processes with different /tmp directories, and multiple home directories mounted on /home/dwalsh.

The kernel currently implements five namespaces.

  1. mount - mounting and unmounting filesystems will not affect the rest of the system.
  2. UTS - setting the hostname or domainname will not affect the rest of the system.
  3. IPC - the process will have an independent namespace for System V message queues, semaphore sets, and shared memory segments.
  4. network - the process will have independent IPv4 and IPv6 stacks, IP routing tables, firewall rules, the /proc/net and /sys/class/net directory trees, sockets, etc.
  5. pid - processes have PIDs independent of the rest of the system. Each namespace can have its own pid 1.
Note: A UID namespace is being developed, but is not ready to be used yet, and I have some concerns about how well this will work. Our tools do not currently use the UID namespace.

pam_namespace, sandbox -X, unshare, and systemd all allow you to take advantage of namespaces.
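You can look at namespaces without any special tools. On current Linux kernels, each process exposes its namespaces as symbolic links under /proc/<pid>/ns (assumed here; older kernels exposed only some of them), and two processes are in the same namespace when their links name the same inode. The Python sketch below is a hypothetical helper, not part of any of the tools above, that reads and compares those links.

```python
import os
import re

# Each entry in /proc/<pid>/ns/ is a symbolic link whose target looks like
# "mnt:[4026531840]": the namespace type followed by an inode number.
_NS_LINK = re.compile(r"(\w+):\[(\d+)\]")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.fullmatch(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_ids(pid: str = "self") -> dict[str, int]:
    """Map each entry under /proc/<pid>/ns to its namespace inode (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))[1]
        for name in sorted(os.listdir(ns_dir))
    }


def shares_namespace(pid_a: str, pid_b: str, name: str) -> bool:
    """Two processes share a namespace when their links name the same inode."""
    return namespace_ids(pid_a)[name] == namespace_ids(pid_b)[name]
```

For example, namespace_ids() lists the namespaces of the current process, and shares_namespace("self", "1", "mnt") reports whether you share a mount namespace with PID 1. Reading another process's links may require root.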

CGROUPS

Wikipedia describes cgroups as:

  cgroups (control groups) is a Linux kernel feature to limit, account and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.

Basically, you can use cgroups to control the amount of resources a process or group of processes can get on a system.
I put together a little screencast of cgroups to demonstrate their power.
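cgroups are exposed to userspace through a pseudo-filesystem, and every process's cgroup membership is listed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. The Python sketch below is a hypothetical helper (not part of libvirt or libvirt-sandbox) that parses that file so you can see which cgroup, and therefore which resource limits, apply to a process.

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each cgroup controller to the cgroup path a process belongs to.

    Each line of /proc/<pid>/cgroup has the form
    hierarchy-ID:controller-list:cgroup-path (see cgroups(7)).
    On the cgroup v2 unified hierarchy the controller list is empty, so that
    entry is stored under the key "unified".
    """
    memberships: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split at most twice: the path itself may contain ':' characters.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if not controllers:
            memberships["unified"] = path
        else:
            for controller in controllers.split(","):
                memberships[controller] = path
    return memberships
```

On a Linux host you would call it as parse_proc_cgroup(open("/proc/self/cgroup").read()). The returned path locates the cgroup's directory under /sys/fs/cgroup, where limit files such as memory.limit_in_bytes (cgroup v1) or memory.max (cgroup v2) live.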

LXC
Tools like LXC have existed for a while to allow users to create containers, but the tool set is at a very low level.

Libvirt-lxc

"Libvirt is a C toolkit to interact with the virtualization capabilities of recent versions of Linux (and other OSes). The main package includes the libvirtd server exporting the virtualization support."

libvirt-lxc was introduced in Fedora 16. It enhanced the libvirt API to allow users to build containers using libvirt. This allows you to manage your kvm/qemu virtualization along with your Linux containers, all within the same framework. The only problem is that setting up a Linux container using the libvirt API is fairly difficult.
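To give a feel for what the raw API involves, the Python sketch below builds a minimal libvirt-lxc domain definition and hands it to libvirt through its Python bindings. The XML shape is modelled on the minimal container examples in libvirt's LXC driver documentation; the exact set of elements your libvirt version requires is an assumption, and lxc_domain_xml and define_and_start are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def lxc_domain_xml(name: str, init: str = "/bin/sh", memory_kib: int = 500000) -> str:
    """Build a minimal libvirt-lxc domain definition that runs `init` in a container."""
    domain = ET.Element("domain", type="lxc")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory").text = str(memory_kib)  # libvirt's default unit is KiB
    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type").text = "exe"  # containers run a program, not a kernel
    ET.SubElement(os_el, "init").text = init
    devices = ET.SubElement(domain, "devices")
    ET.SubElement(devices, "console", type="pty")
    return ET.tostring(domain, encoding="unicode")


def define_and_start(xml: str) -> None:
    """Register and boot the container; needs the libvirt Python bindings and root."""
    import libvirt  # imported lazily so the XML builder works without the bindings

    conn = libvirt.open("lxc:///")
    try:
        dom = conn.defineXML(xml)
        dom.create()
    finally:
        conn.close()
```

Even this small definition leaves out filesystem mounts, networking, and SELinux labels, which is the kind of plumbing a higher-level tool has to take care of.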

libvirt-sandbox

Dan Berrange created a new package called libvirt-sandbox in Fedora 17. The libvirt-sandbox package provides an application development library (libvirt-sandbox) to facilitate embedding virtualization into applications. One of the main advantages of this new tool set is that it greatly simplifies the API for creating virtual machines and containers.

SELinux

Containers by themselves do not give you good security separation. The reason is that kernel file systems like /proc, /sys, the cgroup file system, and selinuxfs are not containerized. A privileged process running within a container can affect processes running outside of the container, or processes running in other containers. In libvirt-sandbox and libvirt-lxc you can use SELinux labelling to further lock down privileged processes, for example preventing them from mounting random file systems or disabling SELinux.

virt-sandbox-service

Dan Berrange and I have been working to enhance libvirt-sandbox. We have added a command line tool called virt-sandbox-service, which allows a user to easily create an application sandbox. virt-sandbox-service allows an administrator to run multiple services on the same machine, each service in its own secure Linux container. Some major features of virt-sandbox-service containers:

  • Uses systemd within the container as the init process.
  • Uses standard unit files for starting and stopping containerized applications.
  • Shares the /usr partition, meaning that if you are running hundreds of Apache containers and update the Apache code, every container instantly uses the new version of Apache.
  • Uses SELinux MCS labelling to separate each container, preventing even root processes from interfering with the host or other containers.
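The idea behind MCS separation can be modelled in a few lines. In this simplified Python sketch (an illustration only, not the kernel's or libvirt's code), a process may access a file only when the file's category set is a subset of the process's category set. The real policy expresses this with MLS constraints and combines it with type enforcement rules, and the sketch handles only single levels such as s0:c1,c2, not ranges like s0-s0:c0.c1023.

```python
import re

# Matches one category ("c5") or a category range ("c0.c1023").
_CATEGORY = re.compile(r"c(\d+)(?:\.c(\d+))?")


def categories(level: str) -> frozenset[int]:
    """Return the category set of a single MCS level such as "s0:c1,c2".

    A bare "s0" has no categories; "cA.cB" expands to every category A..B.
    """
    sensitivity, _, category_list = level.partition(":")
    if sensitivity != "s0":
        raise ValueError(f"this sketch only handles sensitivity s0: {level!r}")
    result: set[int] = set()
    if category_list:
        for item in category_list.split(","):
            match = _CATEGORY.fullmatch(item)
            if match is None:
                raise ValueError(f"bad category {item!r} in {level!r}")
            low = int(match.group(1))
            high = int(match.group(2)) if match.group(2) else low
            result.update(range(low, high + 1))
    return frozenset(result)


def may_access(process_level: str, file_level: str) -> bool:
    """Simplified MCS check: the process's categories must include the file's."""
    return categories(file_level) <= categories(process_level)
```

Under this model a container running as s0:c1,c2 can read its own s0:c1,c2 files and shared s0 content such as /usr, but not files labelled s0:c3,c4 that belong to another container.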
The goal of this tool is not to run general-purpose applications within the container, although we will work to get most services running. The tool is not aimed at running a full OS chroot, but rather at particular applications.

I have done preliminary tests running httpd, mysql, postgresql, and dovecot within these containers. I am hoping people begin to play with the tool and help us expand which applications can run within the container. You can also run multiple applications within a container at the same time; for example, I have tested httpd and mysql running within the same container.

How to use:

# yum install libvirt-sandbox httpd
There is currently a bug in the tool where it will not work unless a /selinux file exists.
# touch /selinux

Use the virt-sandbox-service command to create a container.

virt-sandbox-service create -C -l s0:c1,c2 -u httpd.service container1
Created sandbox container dir /var/lib/libvirt/filesystems/container1
Created sandbox config /etc/libvirt-sandbox/services/container1.sandbox
Created unit file /etc/systemd/system/container1_sandbox.service
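Each container needs its own category pair (c1,c2 in the create command above). If you create many containers, you need some way to avoid reusing a pair. The Python function below is a hypothetical helper, not part of virt-sandbox-service, that sketches one approach: pick random pairs from the c0 through c1023 categories of the targeted policy until it finds one that is not already in use.

```python
import random
from typing import Optional

MAX_CATEGORY = 1023  # the targeted policy defines categories c0 through c1023


def pick_mcs_label(in_use: set[str], rng: Optional[random.Random] = None) -> str:
    """Return an unused label of the form "s0:cX,cY" with X < Y.

    Hypothetical helper: giving every container its own category pair means no
    container's processes dominate another container's files.
    """
    rng = rng or random.Random()
    total_pairs = (MAX_CATEGORY + 1) * MAX_CATEGORY // 2
    if len(in_use) >= total_pairs:
        raise RuntimeError("every category pair is already in use")
    while True:
        # Two distinct categories, sorted so the label is always written low,high.
        low, high = sorted(rng.sample(range(MAX_CATEGORY + 1), 2))
        label = f"s0:c{low},c{high}"
        if label not in in_use:
            return label
```

The result would then be passed to the -l option of virt-sandbox-service create.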

You can manipulate the data within the container while running outside of it.

cd /var/lib/libvirt/filesystems/container1/var/log
touch content
ls -lZ content
# Make sure the content gets created with the correct MCS label.
# The content should be labeled s0:c1,c2, not s0.
# Now create a file with a bad label for the container.
echo "Secret" > badcontent
chcon -l s0:c3,c4 badcontent

Start the container:

virt-sandbox-service start container1

In another window, make sure the processes are running with the proper SELinux label:

ps -eZ | grep svirt_lxc

You should see processes like systemd, systemd-journal, dhclient, and httpd running within the container with the MCS label s0:c1,c2.

Connect to the container

virt-sandbox-service connect container1
id
getenforce   # Should tell you SELinux is disabled.
setenforce 1 # Should be denied
touch /file  # Should deny you creating this file
touch /var/www/html/content  # Should be allowed
cat /var/log/badcontent  # Should be denied
# Configure the Apache server any way you like, and edit the HTML pages.
ifconfig eth0  # Grab the IP address for use in the next test
# Use the shell running within the container to attempt to break out of the container.
^]  # Press Ctrl+] to disconnect from the container

On the host, point Firefox at the container's IP address:

firefox $IP # Using IP address from container, make sure you see the content.

Shut down the container
virt-sandbox-service stop container1

Now let's do the same, but start and stop the container using systemctl commands:
systemctl start container1_sandbox.service
systemctl enable container1_sandbox.service # After a reboot, check that the container is running

Make sure the container is running.

virt-sandbox-service connect container1
ps -eZ
^]

I would like to hear what you think. What enhancements would you like to see? What applications would you like to see run within the containers?

Since this is a first version, we think there could be some growing pains, so use it at your own risk, but we would love to work with the community to improve this tool set.


Process Confinement in Fedora 18
danwalsh
I have not written this post for a while; the last one may have been for Fedora 12.

A good estimate of the number of different confined processes is the number of types with the domain attribute.

seinfo -adomain -x | tail -n +2 | wc -l
707


Note: I am removing the first line because it lists the attribute name.

Not all domain types are confined. If we want to look at the number of unconfined domains, we can use the unconfined_domain_type attribute.

seinfo -aunconfined_domain_type -x | tail -n +2 | wc -l
61
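The pipelines above just count the lines seinfo prints after its header. If you want to script these statistics, a small Python wrapper around the same seinfo commands could look like the sketch below. The helper names are hypothetical, and the sketch skips blank lines, so its totals may differ slightly from wc -l. With the numbers above, subtracting the unconfined domains from all domains gives an estimate of 707 - 61 = 646 confined domains.

```python
import subprocess


def count_after_header(text: str, header_lines: int) -> int:
    """Count non-blank lines after skipping a fixed number of header lines.

    Similar to `tail -n +(header_lines + 1) | wc -l`, except blank lines are ignored.
    """
    return sum(1 for line in text.splitlines()[header_lines:] if line.strip())


def count_seinfo(args: list[str], header_lines: int) -> int:
    """Run seinfo (from setools) with the given arguments and count its entries."""
    result = subprocess.run(
        ["seinfo", *args], capture_output=True, text=True, check=True
    )
    return count_after_header(result.stdout, header_lines)


def confined_domain_estimate() -> int:
    """All domains minus unconfined domains, using the seinfo commands above."""
    domains = count_seinfo(["-adomain", "-x"], header_lines=1)
    unconfined = count_seinfo(["-aunconfined_domain_type", "-x"], header_lines=1)
    return domains - unconfined
```

The permissive-domain count later in this post would use count_seinfo(["--permissive", "-x"], header_lines=2), matching its tail -n +3.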

Unconfined Domains
sosreport_t  bootloader_t  devicekit_power_t  nova_api_t
nova_network_t  dirsrvadmin_unconfined_script_t  nova_objectstore_t  certmonger_unconfined_t
unconfined_cronjob_t  abrt_handle_event_t  setfiles_mac_t  initrc_t
fsadm_t  lvm_t  mdadm_t  rpm_t
wine_t  nova_vncproxy_t  unconfined_dbusd_t  nova_volume_t
nova_scheduler_t  prelink_t  anaconda_t  boinc_project_t
nova_ajax_t  rpm_script_t  system_cronjob_t  openshift_initrc_t
samba_unconfined_net_t  kdumpctl_t  devicekit_disk_t  firstboot_t
samba_unconfined_script_t  nagios_eventhandler_plugin_t  httpd_unconfined_script_t  depmod_t
insmod_t  kernel_t  livecd_t  puppet_t
tomcat_t  apmd_t  clvmd_t  crond_t
inetd_t  init_t  udev_t  virtd_t
nagios_unconfined_plugin_t  rgmanager_t  devicekit_t  inetd_child_t
nova_direct_t  semanage_t  sge_shepherd_t  xdm_unconfined_t
unconfined_t  abrt_watch_log_t  sge_job_t  xserver_t

If you disable the unconfined policy package, which I recommend, only user domains are left unconfined, along with some domains that do not make sense to confine (anaconda, firstboot, kernel, rpm).

# semodule -d unconfined
seinfo -aunconfined_domain_type -x | tail -n +2 | wc -l
14


Unconfined Domains
rpm_t  anaconda_t  rpm_script_t  openshift_initrc_t
firstboot_t  kernel_t  livecd_t  unconfined_t


You can disable all unconfined domains by disabling the unconfineduser module:

# semodule -d unconfineduser

Note: You need to set up all your users as confined users before removing the unconfineduser module.
Disabling the unconfined and unconfineduser policy modules is the equivalent of what we used to call strict policy.

Permissive domains are another interesting class of domains. They can be listed with the --permissive qualifier.

# seinfo --permissive -x | tail -n +3 | wc -l
31

Permissive Domains
phpfpm_t  virt_qemu_ga_t  pkcsslotd_t  realmd_t
mandb_t  rngd_t  slpd_t  glusterd_t
stapserver_t  sensord_t


A couple of other interesting statistics.

Total number of file types.

seinfo -afile_type -x | tail -n +2  | wc -l
2375


To get the number of allow rules, you need to use sesearch:

sesearch --allow | wc -l
81736

Dontaudit Rules

sesearch --dontaudit | wc -l
6532
