Container Domains (Types)

One of the things people have always had a hard time understanding about SELinux is the different types. In this blog, I am going to discuss Container Domains.

Recently someone asked me about specifying types to run containers inside of Kubernetes. Basically, he wanted to run a locked-down container that could read and write content inside of /var/log. He saw that the content in /var/log was labeled var_log_t, and assumed that if he ran the container with var_log_t, it would be able to manage content with that label.

This is not a crazy assumption. After all, in DAC, if a file is owned by the user dwalsh, processes owned by dwalsh are usually able to read and write it (if the permission flags allow it). But SELinux type enforcement is different. CRI-O failed to execute the container process for Kubernetes, and an AVC was generated that looked like:

type=AVC msg=audit(1558135492.958:247182): avc:  denied  { transition } for  pid=22423 comm="runc:[2:INIT]" path="/usr/bin/pod" dev="sda1" ino=570425443 scontext=system_u:system_r:container_runtime_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=process permissive=0
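To see what this AVC is saying, it helps to pull the source type, target type, and object class out of the record. The following is a minimal sketch in POSIX shell; the helper names avc_type and avc_class are made up for illustration and are not part of any SELinux tool. They only do text extraction on a record like the one above.

```sh
# Hypothetical helpers (not part of any SELinux package): extract fields
# from a single AVC record using plain sed.

# avc_type <scontext|tcontext> <avc record>
# Prints the type, i.e. the third colon-separated field of the context.
avc_type() {
    printf '%s\n' "$2" | sed -n "s/.* $1=[^:]*:[^:]*:\([^:]*\):.*/\1/p"
}

# avc_class <avc record>
# Prints the object class the access was checked against.
avc_class() {
    printf '%s\n' "$1" | sed -n 's/.* tclass=\([^ ]*\).*/\1/p'
}

avc='type=AVC msg=audit(1558135492.958:247182): avc:  denied  { transition } for  pid=22423 comm="runc:[2:INIT]" path="/usr/bin/pod" dev="sda1" ino=570425443 scontext=system_u:system_r:container_runtime_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=process permissive=0'

echo "source type: $(avc_type scontext "$avc")"
echo "target type: $(avc_type tcontext "$avc")"
echo "class:       $(avc_class "$avc")"
```

The class is process and the target type is var_log_t, so the container runtime was asked to transition into var_log_t as if it were a process domain. var_log_t is a type for files, which is why the transition is denied.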


Musings on Hybrid Cloud

I work on the lowest levels of container runtimes and usually around process security. My team and I work on basically everything needed to run containers on the host operating system under Kubernetes. I also work in the OpenShift group at Red Hat.

I hear a lot of thoughts on Hybrid Cloud and how the goal of OpenShift is to bridge the gap between on-prem data center services and virtualization with cloud services. Usually these services are provided by the big three clouds: Amazon AWS, Microsoft Azure, and Google GCE. Maybe I should add Alibaba to this list.

It is really cool that OpenShift and Kubernetes have the ability to move workloads from your in-house data centers to different clouds. Imagine you have Kubernetes nodes on VMware, OpenStack, or RHEV virtualization running alongside nodes in the cloud services, all powered by OpenShift/Kubernetes.

OpenShift/Kubernetes can scale up from your in-house data centers to the cloud, basically renting capacity when demand skyrockets and dropping back when demand slackens, saving you the rent check.

I envision a world where you could get deals off of Microsoft Azure to save .05 cents per hour on your rent. You press a button on OpenShift which moves hundreds/thousands of nodes off of AWS and onto Azure. (Of course, to make this work, customers need to make sure they don't get tied into services on any of the big cloud vendors.)

Big Cloud Vendors == Walmart/Amazon Retail Business


Container Labeling

An issue was recently raised on libpod, the GitHub repo for Podman.

"container_t isn't allowed to access container_var_lib_t"

Container policy is defined in the container-selinux package. By default, containers run with the SELinux type "container_t", whether they are launched by podman, cri-o, docker, buildah, moby, or just about any other container engine. Most people who use SELinux with container runtimes like runc and systemd-nspawn use it as well.

By default, container_t is allowed to read/execute content under /usr and to read generically labeled content in the host's /etc directory (etc_t).

The default label for content in /var/lib/docker and /var/lib/containers is container_var_lib_t. This is not accessible by containers (container_t), whether they are running under podman, cri-o, docker, buildah ... We specifically do not want containers to be able to read this content, because content that uses block devices like devicemapper and btrfs (I believe) is labeled container_var_lib_t when the containers are not running.

For overlay content, we need to allow containers to read/execute the content, so we use the type container_share_t for it. container_t is allowed to read/execute container_share_t files, but not write/modify them.

Content under /var/lib/containers/overlay* and /var/lib/docker/overlay* is labeled container_share_t by default.


SELinux blocks podman container from talking to libvirt

I received this bug report this week.

"I see this when I try to use vagrant from a container using podman on Fedora 29 Beta.

Podman version: 0.8.4

Command to run container:

sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up

Logs:

...

Sep 30 21:17:25 Home audit[22760]: AVC avc:  denied  { connectto } for  pid=22760 comm="batch_action.r*" path="/run/libvirt/libvirt-sock" scontext=system_u:system_r:container_t:s0:c57,c527 tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

"

This is an interesting use case of SELinux and containers. SELinux is protecting the file system and the host from attack from inside of the container. People who have listened to me over the years understand that SELinux protection is based on the labels of files; in the case of containers, it only allows a container_t process to read/write/execute files labeled container_file_t.

But the reporter of the bug thinks he did the right thing: he told podman to relabel the volumes he was mounting into the container.

Let's look at his command to launch the container.

sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up


SELinux prevents users from executing programs, for security? Who cares.

I recently received the following email about using SELinux to prevent users from executing programs.
 

I just started to learn SELinux and this is nice utility if you want confine any user who interact with your system.

A lot of information on Net about how to confine programs, but can't find about confining man's :)

I found rbash (https://access.redhat.com/solutions/65822) which help me forbid execution any software inside and outside user home directory except few.

As I understand correctly to do this using SELinux I need a new user domain(customuser)  which by default should deny all or I can start with predefined       guest_t?

Next then for example I can enable netutils_exec_ping(customuser_t, customuser_r).

I responded that:

SELinux does not worry so much about executing individual programs, although it can do this. SELinux is basically about defining the access of a process type.
Just because a program can execute another program does not mean that this process type is going to be allowed the access that the program requires. For example:

A user running as guest_t can execute su and sudo, but even if the user discovers the correct password to become root, they cannot become root on the system; SELinux would block it. Similarly, guest_t is not allowed to connect out of the system, so being able to execute ssh or ping does not mean that the user would be able to ping another host or ssh to another system.


unlabeled_t type

I often see bug reports or people showing AVC messages about confined domains not able to deal with unlabeled_t files.

type=AVC msg=audit(1530786314.091:639): avc:  denied  { read } for  pid=4698 comm="modprobe" name="modules.alias.bin" dev="dm-0" ino=9115100 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file

I just saw this AVC, which shows the openvswitch domain attempting to read a file, modules.alias.bin, with modprobe.   The usual response to this is to run restorecon on the files and everything should be fine.

But the next question I get is how this content got the label unlabeled_t, and my response is usually: I don't know, you did something.

Well, let's look at how unlabeled_t files get created.

unlabeled_t really just means that the file on disk does not have an SELinux xattr indicating a file label. Here are a few ways these files can get created:

1. The file was created on a file system when the kernel was not running in SELinux mode. If you take a system that was installed without SELinux (God forbid), or someone booted the machine with SELinux disabled, then all files created will not have labels. This is why we force a relabel any time someone changes from SELinux disabled to SELinux enabled at boot time.


Fun with DAC_OVERRIDE and SELinux

Lately the SELinux team has been trying to remove DAC_OVERRIDE from as many SELinux domain types as possible.

man capabilities

...

       CAP_DAC_OVERRIDE

              Bypass file read, write, and execute permission checks.  (DAC is an abbreviation of "discretionary access control".)

This means a process with CAP_DAC_OVERRIDE can read any file on the system and can write any file on the system from a standard permissions point of view. With SELinux, it means that the process can read all file types that SELinux allows it to read, even if it is running with a process UID that is not allowed to read the file. Similarly, it is allowed to write all SELinux-writable types even if it isn't allowed to write based on UID.

Obviously most confined domains never need to have this access, but somehow over the years lots of domains were granted it.

I recently received an email asking about syslog generating lots of AVCs. The writer said that he understood SELinux and had set up the types for syslog to write to, and the content was even getting written properly. But the kernel was generating an AVC every time the service started.

Here is the AVC.

Jul 09 15:24:57

 audit[9346]: HOSTNAME AVC avc:  denied  { dac_override }  for  pid=9346 comm=72733A6D61696E20513A526567 capability=1   scontext=system_u:system_r:syslogd_t:s0  tcontext=system_u:system_r:syslogd_t:s0 tclass=capability permissive=0
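The comm field in this record is hex-encoded, which the audit system typically does for values that contain characters such as spaces. A minimal sketch of decoding it in POSIX shell follows; decode_hex is a hypothetical helper name used only for illustration. On a real system, ausearch -i will interpret such fields for you.

```sh
# decode_hex <hex string>
# Hypothetical helper: turn a hex-encoded audit field back into text.
decode_hex() {
    hex=$1
    # Refuse odd-length input so the loop below always consumes two digits.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}          # everything after the first two digits
        pair=${hex%"$rest"}     # the first two digits
        hex=$rest
        # Convert the pair to octal and let printf %b emit that byte.
        printf '%b' "\\0$(printf '%o' "0x$pair")"
    done
    printf '\n'
}

decode_hex 72733A6D61696E20513A526567
```

For the record above this prints rs:main Q:Reg, a process name containing a space, which is consistent with the syslog service the writer described.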


Cool SELinux hack provided by systemd

Sometimes content is created in /run during boot that ends up mislabeled. We sometimes hear, "Every time I boot, this file gets created with the wrong label."

This can happen if initramfs is creating content before systemd has loaded policy.  This means the content would get created with var_run_t as the label.

Well, I was looking at tmpfiles.d, and it has a cool feature.

man tmpfiles.d

...

       Z

           Recursively set the access mode, group and user, and restore the SELinux security context of a file or directory
           if it exists, as well as of its subdirectories and the files contained therein (if applicable). Lines of this type
           accept shell-style globs in place of normal path names. Does not follow symlinks.

One hack you could try would be to add an entry for /run to the tmpfiles.d directory, and systemd will relabel all of the content in /run when the system reboots.

echo "Z /run - - - - -" > /etc/tmpfiles.d/relabelrun.conf

Of course, if the content gets created with the wrong label after systemd-tmpfiles has run, you are out of luck, unless you enable the old restorecond service...
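The tmpfiles.d entry described here can be sketched as a small helper. This is only an illustration: write_relabel_conf is a made-up name, and the demo writes to a scratch file rather than to /etc/tmpfiles.d.

```sh
# write_relabel_conf <path to relabel> <config file>
# Hypothetical helper: write a tmpfiles.d "Z" line. The five "-" fields
# leave mode, user, group, age, and argument unset, so the entry is there
# for the SELinux context restore.
write_relabel_conf() {
    printf 'Z %s - - - - -\n' "$1" > "$2"
}

# Demonstrate against a scratch file instead of the real config directory.
conf=$(mktemp)
write_relabel_conf /run "$conf"
cat "$conf"
rm -f "$conf"
```

On a real system you would run it as root with /etc/tmpfiles.d/relabelrun.conf as the second argument. I would expect systemd-tmpfiles --create with that file name to apply the entry without waiting for a reboot, but treat that as an assumption and test it on your own system.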

SELinux team works to remove DAC_OVERRIDE Permissions.

DAC_OVERRIDE is one of the most powerful capabilities, and most app developers don't understand when they are taking advantage of it, or how easy it is to eliminate the need.

What is DAC_OVERRIDE?

man capabilities

...

       CAP_DAC_OVERRIDE

              Bypass file read, write, and execute permission checks.  (DAC is an abbreviation of "discretionary access control".)

Looking at /usr/include/linux/capability.h

#define CAP_DAC_OVERRIDE     1

/* Override all DAC access, including ACL execute access if [_POSIX_ACL] is defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE. */

Giving a process this access means it can ignore file system permission checks. Admittedly everyone thinks root can do this by default anyways, but if you can eliminate this access from a system service, you really can tighten the security.  
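To check whether a given process actually holds this capability, you can test bit 1 of its effective capability mask. The following is a minimal sketch in POSIX shell; cap_is_set is a hypothetical helper name, and the only assumption about the system is that it is Linux, where /proc/<pid>/status reports the effective set as a hex mask on the CapEff line.

```sh
# CAP_DAC_OVERRIDE is capability number 1, per capability.h.
CAP_DAC_OVERRIDE=1

# cap_is_set <hex mask> <capability number>
# Hypothetical helper: prints "yes" if that bit is set in the mask, else "no".
cap_is_set() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Read this shell's own effective capability mask, if /proc provides one.
mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null || true)
if [ -n "$mask" ]; then
    echo "this shell has CAP_DAC_OVERRIDE: $(cap_is_set "$mask" "$CAP_DAC_OVERRIDE")"
fi
```

Pointing the awk line at /proc/<pid>/status for a system service shows whether the service is running with the capability at the DAC level; whether SELinux allows the service to use it is a separate check.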

SELinux

SELinux ignores DAC permissions; it does not care if a process is running as root or any other UID. The only part of SELinux that concerns itself with UID/GID permissions is in Linux capabilities like DAC_OVERRIDE.

With SELinux we often look at what process types require DAC_OVERRIDE and try to figure out if we can get rid of the access.


Customizing container types

In my previous blog, I talked about the container types container_t and svirt_lxc_net_t. Today I got an email asking about the new container_t type replacing svirt_lxc_net_t.

On 05/23/2018 11:50 PM, Dustin C. Hatch wrote:
I recently upgraded some of my Docker hosts to CentOS 7.5 and started getting "Permission Denied" errors inside of containers. I traced this down to any container that mounts and uses /etc/passwd from the host (so that UIDs inside the container map to the same username as on the host), because the SELinux policy in CentOS 7.5 does not allow the new container_t domain to read passwd_file_t.  
The old svirt_lxc_net_t domain had the nsswitch_domain attribute, while its replacement, container_t, does not. I cannot find any reference for this change, so I was wondering if it was deliberate or not. If it was deliberate, what would be the consequences if I were to make a local policy change to add that attribute back? If it was not deliberate, I would be happy to open a ticket in Bugzilla. 

First, let's remove the misconception: container_t was not a new type replacing svirt_lxc_net_t; it was a rename (typealias) of the old type.
