Containers and MLS

I have just updated the container-selinux policy to support MLS (Multi Level Security).  

SELinux and container technology have a long history together.  Some people imagine that containers started just a few years ago with the introduction of Docker, but the technology goes back a lot further than that.

SELinux originally supported two types of Mandatory Access Control: Type Enforcement and RBAC (Role Based Access Control).

Type Enforcement

Type Enforcement is SELinux's main security measure.  Basically, every process on the system gets assigned a type (httpd_t, unconfined_t, container_t ...), and every object (file, directory, socket, TCP port ...) gets assigned a type (httpd_sys_content_t, user_home_t, http_port_t, container_file_t).  Then we write rules that define the access between process types and object types. Note: The type field is always the third field of the SELinux label.

allow container_t container_file_t:file { open read write };

Anything that is not allowed is denied.
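
To make the default-deny idea concrete, here is a rough Python sketch of how a Type Enforcement check behaves. It is only a toy model and not how the kernel stores or evaluates policy; the ALLOW_RULES table and the type_of and is_allowed helpers are names I made up for this illustration.

```python
# Toy model of Type Enforcement: access is denied unless a rule allows it.
# ALLOW_RULES, type_of and is_allowed are hypothetical names for illustration.

# (source type, target type, object class) -> set of allowed permissions
ALLOW_RULES = {
    ("container_t", "container_file_t", "file"): {"open", "read", "write"},
}


def type_of(label: str) -> str:
    """Return the type, which is the third colon-separated field of a label."""
    return label.split(":")[2]


def is_allowed(source_label: str, target_label: str, obj_class: str, perm: str) -> bool:
    """Allow only if an explicit rule grants the permission; otherwise deny."""
    key = (type_of(source_label), type_of(target_label), obj_class)
    return perm in ALLOW_RULES.get(key, set())


process = "system_u:system_r:container_t:s0:c1,c2"
container_file = "system_u:object_r:container_file_t:s0:c1,c2"
home_file = "system_u:object_r:user_home_t:s0"

print(is_allowed(process, container_file, "file", "read"))  # True
print(is_allowed(process, home_file, "file", "read"))       # False
```

The second check fails simply because no rule mentions container_t and user_home_t together; nothing has to be written to deny it.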

RBAC (Role Based Access Control)

RBAC, although seldom used, was an enhancement on Type Enforcement; it basically controls the types that a user process can become.  When a user logs onto the system, he gets assigned an SELinux user (for example staff_u), which can have one or more roles, including a default role (staff_r).  Note: the user field is the first field of the SELinux label and the role field is the second field.  Policy defines which types the staff_r role can run with.  For example, staff_r can run with the staff_t type, but it is not allowed to use the unconfined_t type.  If the SELinux user that the user logs in with has multiple roles, he can change his role using the newrole command or sudo.  I often set up my account with an SELinux user that has the staff_r and unconfined_r roles.  When I log in I get the staff_r role, but when I become root through sudo I switch to the unconfined_r role with the unconfined_t type.  This means that if I accidentally ran a setuid app on my system that made me root, I would still be staff_t and not able to modify the kernel.
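
The user, role and type relationship can be sketched in a few lines of Python. This is a simplified model of the setup described above, not real policy: the USER_ROLES and ROLE_TYPES tables and the helper names are assumptions for illustration, and real roles are allowed many more types than shown here.

```python
# Toy model of SELinux RBAC: a user maps to roles, and a role maps to types.
# The tables below are illustrative assumptions, not real policy contents.

USER_ROLES = {
    # The first role in the list is treated as the default login role.
    "staff_u": ["staff_r", "unconfined_r"],
}

ROLE_TYPES = {
    "staff_r": {"staff_t"},
    "unconfined_r": {"unconfined_t"},
}


def default_role(user: str) -> str:
    """Return the role a user gets at login in this toy model."""
    return USER_ROLES[user][0]


def can_run_as(user: str, role: str, type_: str) -> bool:
    """Permit a type only if the user holds the role and the role may use the type."""
    return role in USER_ROLES.get(user, []) and type_ in ROLE_TYPES.get(role, set())


print(default_role("staff_u"))                               # staff_r
print(can_run_as("staff_u", "staff_r", "unconfined_t"))      # False
print(can_run_as("staff_u", "unconfined_r", "unconfined_t")) # True
```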

MLS (Multi Level Security)

Back around the RHEL 5 time frame (2005/2006), we decided we wanted to use SELinux for MLS.  We wanted to get RHEL certified as EAL4+ LSPP, which required us to support handling data at different levels.  MLS is very different than Type Enforcement, in that it is not concerned with the type of the process that is running, but with its sensitivity or security level.  The easiest way to understand this is to imagine a group of processes running as TopSecret and another group of processes running as Secret, with the data/files on the system also assigned a level, like TopSecret or Secret.  The kernel then controls the communications between these processes based on levels.  A Secret process cannot read TopSecret data.  MLS labels consist of this sensitivity field and up to 1024 different categories.  Note: the MLS field is the fourth field of an SELinux label.  You might have a TopSecret process with the British Intelligence category.  In order for processes to communicate, the kernel enforces SELinux policy on both the sensitivities and the categories.  I don't want to get into the weeds on this; there is plenty of information on the web on how MLS systems work.
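
For readers who like code better than prose, here is a small Python sketch of the read check. It is a simplification that only models reads (real MLS policy also restricts writes), and the numeric values I give to Secret and TopSecret and the category number for British Intelligence are arbitrary assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """An MLS level: one sensitivity plus a set of category numbers."""
    sensitivity: int
    categories: frozenset = frozenset()


def dominates(a: Level, b: Level) -> bool:
    """a dominates b if its sensitivity is at least b's and it holds every category of b."""
    return a.sensitivity >= b.sensitivity and a.categories >= b.categories


def can_read(process: Level, data: Level) -> bool:
    """A process may read data only at a level that it dominates."""
    return dominates(process, data)


# Hypothetical numbering chosen only for this example.
SECRET = 1
TOP_SECRET = 2
BRITISH_INTELLIGENCE = 100

secret_proc = Level(SECRET)
top_secret_proc = Level(TOP_SECRET)
british_data = Level(TOP_SECRET, frozenset({BRITISH_INTELLIGENCE}))

print(can_read(top_secret_proc, Level(SECRET)))   # True
print(can_read(secret_proc, Level(TOP_SECRET)))   # False
print(can_read(top_secret_proc, british_data))    # False: category is missing
```

The last line shows why categories matter: being TopSecret is not enough if the process does not also hold the category attached to the data.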

Way back in the RHEL 5 time frame, the "mount namespace" was added to the kernel to handle MLS workloads.  We wanted to set up a system which allowed a user to log in at the Secret level, and then later log in to the same system at TopSecret.  But we wanted him to have different home directories and /tmp directories depending on how he logged in.  He would see his Secret home directory when he logged in as Secret and the TopSecret home directory when he logged in as TopSecret.  The mount namespace allowed us to change the "mount tables" for a group of processes, so /home/dwalsh and /tmp would vary for different processes on the system.  Now the mount namespace is the crucial container feature that allows us to have multiple containers on a system, each seeing a different version of the OS mounted on /.

MCS (Multi Category Security)

MCS was introduced around the RHEL 6 time frame (2008) in order to keep virtual machines separate.  We needed a way to make sure that two virtual machines could not attack each other if there was a hypervisor breakout.  From a Type Enforcement point of view, each VM would run as svirt_t, and their images would be svirt_image_t.  We had rules that said:

allow svirt_t svirt_image_t:file manage_file_perms;

This means we could confine a VM so that it could only read/write svirt images and would not be allowed to attack the rest of the host.  But if every VM ran as svirt_t and all their images were svirt_image_t, they would be able to attack each other.  We needed a way in SELinux to keep the VMs apart.

Most systems did not use the MLS field of SELinux since they were not running in MLS mode.  If you are on a targeted system, you will see most processes running with an MLS field of `s0`.  We decided to take advantage of the MLS field and create a new form of policy enforcement.  We would ignore the sensitivity and just use the categories.  We modified libvirt, the tool we use to launch VMs, to assign two random categories to each process and image, making sure the categories for the VM matched its image.  Then the kernel would enforce that the categories had to match exactly to allow access.  This means a VM running with a label like system_u:system_r:svirt_t:s0:c1,c2 would be allowed to manage an image labeled system_u:object_r:svirt_image_t:s0:c1,c2 but not allowed access to one labeled system_u:object_r:svirt_image_t:s0:c3,c4.  We called this isolation sVirt.
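
Here is a rough Python sketch of the idea: pick an unused pair of categories for each VM and compare category sets on access. This is not libvirt's actual code, and the function names are made up for the example. It only understands comma-separated categories, not ranges like c0.c1023, and it models the exact-match comparison described above.

```python
import random


def pick_categories(in_use: set, rng: random.Random, total: int = 1024) -> frozenset:
    """Pick a pair of distinct categories that no other VM is using yet."""
    while True:
        pair = frozenset(rng.sample(range(total), 2))
        if pair not in in_use:
            in_use.add(pair)
            return pair


def mcs_field(pair: frozenset) -> str:
    """Render a category pair as the MLS/MCS field of a label, e.g. 's0:c1,c2'."""
    return "s0:" + ",".join(f"c{c}" for c in sorted(pair))


def categories_of(label: str) -> frozenset:
    """Extract the category set from a full label (empty if the label has none)."""
    fields = label.split(":")
    if len(fields) < 5:
        return frozenset()
    return frozenset(fields[4].split(","))


def mcs_match(process_label: str, file_label: str) -> bool:
    """Model the check described above: the category sets must be identical."""
    return categories_of(process_label) == categories_of(file_label)


in_use = set()
rng = random.Random()
vm1 = pick_categories(in_use, rng)
vm2 = pick_categories(in_use, rng)
print(mcs_field(vm1), mcs_field(vm2))  # two different random pairs

vm = "system_u:system_r:svirt_t:s0:c1,c2"
print(mcs_match(vm, "system_u:object_r:svirt_image_t:s0:c1,c2"))  # True
print(mcs_match(vm, "system_u:object_r:svirt_image_t:s0:c3,c4"))  # False
```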

We now use MCS Separation and Type Enforcement to isolate containers.  We use different types for containers, so containers run as container_t and content is labeled container_file_t.

Containers and MLS

But what about people who want to run containers on MLS systems?  I recently worked with a user who wanted to run containers on an MLS machine, so we had to modify the policy.  The container-selinux policy is available on GitHub.

init_ranged_daemon_domain(container_runtime_t, container_runtime_exec_t, s0 - mls_systemhigh)

The only changes we needed to make were to allow the container runtimes to run fully ranged.  This means container runtimes like CRI-O, Podman, Buildah and Docker can execute at any level and MCS category.  We also had to allow container runtimes to manage content at all sensitivities and categories.  Luckily this only meant we had to add three lines to the existing policy.  Now an MLS user can install the container-selinux package onto an MLS system, install a container runtime, and select which MLS label they want to run at.

podman run --security-opt label=level:TopSecret:BritishIntelligence fedora sh
docker run --security-opt label=level:s15:c100,c200 -v /var/lib/content:/var/lib/content:Z nginx

The processes in the container and the content in the container will then carry the matching level.
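
If you want to sanity check a level before handing it to the runtime, something like the following Python sketch works. It assumes the runtime's range is s0 through s15:c0.c1023, which is my assumption about what mls_systemhigh expands to on your system, so check your own policy. The parse_level and within_range helpers are hypothetical, and they only understand raw levels such as s15:c100,c200, not translated names like TopSecret.

```python
def parse_level(text: str):
    """Parse 's15:c100,c200' or 's15:c0.c1023' into (sensitivity, categories)."""
    sens, _, cats = text.partition(":")
    categories = set()
    for part in filter(None, cats.split(",")):
        if "." in part:
            # 'c0.c1023' is shorthand for every category from c0 to c1023.
            low, high = part.split(".")
            categories.update(range(int(low[1:]), int(high[1:]) + 1))
        else:
            categories.add(int(part[1:]))
    return int(sens[1:]), frozenset(categories)


def within_range(level: str, low: str, high: str) -> bool:
    """True if the level sits between the low and high ends of a range."""
    s, c = parse_level(level)
    low_s, low_c = parse_level(low)
    high_s, high_c = parse_level(high)
    return low_s <= s <= high_s and low_c <= c <= high_c


# Assumed range for a fully ranged container runtime.
LOW, HIGH = "s0", "s15:c0.c1023"

print(within_range("s15:c100,c200", LOW, HIGH))  # True
print(within_range("s16", LOW, HIGH))            # False
```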

