danwalsh


Dan Walsh's Blog

Got SELinux?


pam_mkhomedir versus SELinux -- Use pam_oddjob_mkhomedir
danwalsh
SELinux is all about separation of powers, minamal privs or reasonable privs.

If  you can break a program into several separate applications, then you can use SELinux to control what each application is allowed.  Then SELinux could prevent a hacked application from doing more then expected.

The pam stack was invented a long time ago to allow customizations of the login process.  One problem with the pam_stack is it allowed programmers to slowly hack it up to give the programs more and more access.  I have seen pam modules that do some crazy stuff.

Since we confine login applications with SELinux, we sometimes come in conflict with some of the more powerful pam modules.
We in the SELinux world want to control what login programs can do.  For example we want to stop login programs like sshd from reading/writing all content in your homedir.

Why is this important?

Over the years it has been shown that login programs have had bugs that led to information leakage without the users ever being able to login to a system.

One use case of pam, was the requirement of creating a homedir, the first time a user logs into a system.  Usually colleges and universities use this for students logging into a shared service.  But many companies use it also.

man pam_mkhomedir
  The pam_mkhomedir PAM module will create a users home directory if it does not exist when the session begins. This allows    users to be present in central database (such as NIS, kerberos or LDAP) without using a distributed file system or pre-creating a large number of directories. The skeleton directory (usually /etc/skel/) is used to copy default files and also sets a umask for the creation.


This means with pam_mkhomedir, login programs have to be allowed to create/read/write all content in your homedir.  This means we would have to allow sshd or xdm to read the content even if the user was not able to login, meaning a bug in one of these apps could allow content to be read or modified without the attacker ever logging into the machine.

man pam_oddjob_mkhomedir
       The pam_oddjob_mkhomedir.so module checks if the user's home  directory exists,  and  if it does not, it invokes the mkhomedirfor method of the com.redhat.oddjob_mkhomedir service for the PAM_USER if the  module  is running with superuser privileges.  Otherwise, it invokes the mkmyhome‐dir method.
       The location of the skeleton directory and the default umask are deter‐mined  by  the  configuration for the corresponding service in oddjobd-mkhomedir.conf, so they can not be specified as arguments to this  module.
       If  D-Bus  has  not been configured to allow the calling application to invoke these methods provided as part of the  com.redhat.oddjob_mkhome‐dir interface of the / object provided by the com.redhat.oddjob_mkhome‐dir service, then oddjobd will not receive the  request  and  an  error  will be returned by D-Bus.


Nalin Dahyabhai wrote pam_oddjob_mkhomedir many years ago to separate out the ability to create a home directory and all of the content from the login programs.  Basically the pam module sends a dbus signal to a dbus service oddjob, which launches a tool to create the homedir and its content.  SELinux policy is written to allow this application to succeed.   We end up with much less access required for the login programs.

If you want the home directory created at login time if it does not exist. Use pam_oddjob_mkhomedir instead of pam_mkhomedir.

DAC_READ_SEARCH/DAC_OVERRIDE - common SELinux issue that people handle badly.
danwalsh
MYTH: ROOT is all powerful.

Root is all powerful is a common misconception by administrators and users of Unix/Linux systems.  Many years ago the Linux kernel tried to break the power of root down into a series of capabilities.  Originally there were 32 capabilities, but recently that grew to 64.  Capabilities allowed programmers to code application in such a way that the ping command can create rawip-sockets or httpd can bind to a port less then 1024 and then drop all of the other capabilities of root.

SELinux also controls the access to all of the capabilities for a process.    A common bugzilla is for a process requiring the DAC_READ_SEARCH or DAC_OVERRIDE capability.  DAC stands for Discretionary Access Control.  DAC Means standard Linux Ownership/permission flags.  Lets look at the power of the capabilities.

more /usr/include/linux/capability.h
...
/* Override all DAC access, including ACL execute access if
   [_POSIX_ACL] is defined. Excluding DAC access covered by
   CAP_LINUX_IMMUTABLE. */

#define CAP_DAC_OVERRIDE     1

/* Overrides all DAC restrictions regarding read and search on files
   and directories, including ACL restrictions if [_POSIX_ACL] is
   defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE. */

#define CAP_DAC_READ_SEARCH  2


If you read the descriptions these basically say a process running as UID=0 with DAC_READ_SEARCH can read any file on the system, even if the permission flags would not allow a root process to read it.  Similarly DAC_OVERRIDE, means the process can ignore all permission/ownerships of all files on the system.  Usually when I see AVC messages that require this access, I take a look at the process UID, and almost always I see the process is running as uid=0.

What users often do when they see this access denial is to add the permissions, which is almost always wrong.  These AVC's indicate to me that you have permission flags to tight on a file. Usually a config file.

Imagine the httpd process needs to read /var/lib/myweb/content which is owned by the httpd user and has permissions 600 set on it.

 ls -l /var/lib/myweb/content
-rw-------. 1 apache apache 0 May 12 13:50 /var/lib/myweb/content

If for some reason the httpd process needs to read this file while it is running as UID=0, the system will deny access and generate a DAC_* AVC.  A simple fix would be to change the permission on the file to be 644.

# chmod 644 /var/lib/myweb/content
# ls -l /var/lib/myweb/content
-rw-r--r--. 1 apache apache 0 May 12 13:50 /var/lib/myweb/content


Which would now allow a root process to read the file using the "other" permissions.

Another option would be to change the group to root and change the permissions to 640.

# chmod 640 /var/lib/myweb/content
# chgrp root /var/lib/myweb/content
# ls -l /var/lib/myweb/content
-rw-r-----. 1 apache root 0 May 12 13:50 /var/lib/myweb/content


Now root can read the file based on the group permissions. but others can not read it.  You could also use ACLs to provide access.    Bottom line this is probably not an SELinux issue, and not something you want to loosen SELinux security around.

One problem with SELinux system here is the capabilities AVC message does not tell you which object on the file system blocked the access by default.  The reason for this is performance as I explained in previous blog.


Why doen't SELinux give me the full path in an error message?


If you turn on full auditing and regenerate the AVC, you will get the path of the object with the bad DAC Controls, as I explained in the blog.

Writing custom policy for an Apache Application.
danwalsh
Received the following email this week:

I've a PHP application that sends data to a USB tty device e.g. /dev/usbDataCollector

Unfortunately selinux is blocking this action. When set to permissive, the alert browser suggests the command:

setsebool -P daemons_use_tty 1

The documentation says Allow all daemons the ability to use unallocated ttys. This naturally doesn't sound like a good idea although admittedly it probably won't hurt in this particular installation. However, I thought it would be good to find the 'correct' solution to this.

But I am unable to find a more fine grain SELinux control for this, Fedora 20 has no documentation and the only vaguely relevant one I could find elsewhere is httpd_tty_com which appears unrelated as it is about allow httpd to communicate with terminal.


So the question is whether there is any way to do this or is allowing all daemons the only option?

My Answer

Simplest would be to just use

# grep usbDataCollector /var/log/audit/audit.log | audit2allow -M myhttp
# semodule -i myhttp.pp

This would allow all httpd_t processes the ability to use usb_device_t, of course if you had other usb_device_t devices on your system, your apache processes would also gain access to them.

Tighter Controls

SELinux is a labeling system, so you can manipulate the labels on the system to get tighter controls.

If you really wanted to tighten it up, you could build a custom policy that put a different label on /dev/usbDataCollector and allow httpd_t processes access to this label.

Something like

# cat myhttp.te
policy_module(myhttp, 1.0)
gen_require(`
type httpd_t;
')

type httpd_device_t;
dev_node(httpd_device_t)

allow httpd_t httpd_device_t:chr_file rw_chr_file_perms;


Note that I am create a new type httpd_device_t, and I define it as a device node, which gives it attributes to allow domains that manage devices to manage this new device. Then I allow the apache process type, httpd_t, to be able to read/write chr_files with this label.

# cat myhttpd.fc
/dev/usbDataCollector -c
gen_context(system_u:object_r:httpd_device_t,s0)


I also want to put the label on the device automatically so I add a file context file with the /dev/usbDataCollector labeled as httpd_device_t.

# make -f /usr/share/selinux/devel/Makefile
# semodule -i myhttpd.pp
# restorecon -v /dev/usbDataCollector


Finally I compile the myhttpd.te an myhttpd.fc file into a myhttpd.pp policy package, and install it on the system. Since the device is probably already created I need to run restorecon on it to fix the label. udev will set the label automatically on the next reboot.


Now httpd_t processes would only be able to use the /dev/usbDataCollector chr_file and not other usb devices on the system.
</div>

DAC check before MAC check. SELinux will stop wine'ing.
danwalsh
When it comes to SELinux, one of the most aggravating bugs we see are when the kernel does a MAC check before a DAC Check. 

This means SELinux checks happen before normal ownership/permission checks.  I always prefer to have the DAC check happen first.  This is important because code that is attempting the denied access usually will handle the EPERM silently and go down a different code path.    But if a MAC Failure happens, SELinux writes an AVC to the audit log, and setroubleshoot reports it to the user.

One of the biggest offenders of this was the mmap_zero check.  Every time a process tries to map low kernel memory, the kernel denies it, in both DAC and MAC.  Wine applications are notorious for this.  We block mmap_zero because it can potentially trigger kernel bugs which can lead to privilege escalation.

Eric Paris explains the vulnerability here.

Since the MAC check was done before the DAC check, the wine applications tend to work correctly.  When the wine application attempts to mmap low memory, it gets denied, and then reattempts the mmap with a higher memory value.  On an SELinux system the kernel generates AVC.  The user sees something like:

SELinux is preventing /usr/bin/wine-preloader from 'mmap_zero' accesses on the memprotect.

Reading about the mmap_zero, scares the user and they think their machine is vulnerable.  The only thing SELinux policy writers can do is write a dontaudit rule or allow the access, which defeats the purpose of the check.

We still want to block this access if a privileged confined process got it and report the SELinux violation.   If an confined application running as root, attempts a mmap_zero access, SELinux should block it and report the AVC.  If a normal unprivileged process triggered the access check, we would prefer to allow DAC to handle it, and not print the message.

To give you an idea of how often people have seen this; Google "SELinux mmap_zero" and you will get more then 13,000 hits.

Today the upstream kernel has been fixed to report check for mmap_zero for MAC AFTER DAC.

Thanks to Eric Paris and Paul Moore for fixing this issue.

SELinux Transitions do not happen on mountpoints mounted with nosuid.
danwalsh
Today one of our customers was trying to run openshift enterprise and it was blowing up because of SELinux.
Openshift sets up the Apache daemon to run /var/www/openshift/broker/script/broker_ruby.

When looked at the log, it was stating that Apache was not allowed to execute broker_ruby permission denied.

ls -lZ /var/www/openshift/broker/script/broker_ruby
Shows that broker_ruby is labeled as httpd_sys_content_t

I went and looked at policy, I saw.

sesearch -A -s httpd_t -t httpd_sys_content_t -p execute -C
DT allow httpd_t httpdcontent : file { ioctl read write create getattr setattr lock append unlink link rename execute open } ; [ httpd_enable_cgi httpd_unified && httpd_builtin_scripting && ]


This shows that the httpd_t (Apache) process is allowed to execute the broker_ruby script if all of the following booleans are enabled.
httpd_enable_cgi, httpd_unified, httpd_builtin_scripting

Turns out the were.  I then went back and looked at the AVC.

type=AVC msg=audit(28/02/14 13:56:52.702:24992) : avc:  denied  { execute_no_trans } for  pid=6031 comm=PassengerHelper path=/var/www/openshift/broker/script/broker_ruby dev=dm-3 ino=817 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:httpd_sys_content_t:s0 tclass=file

This AVC means that the Apache daemon (httpd_t) is not allowed to execute the broker_ruby application (httpd_sys_content_t) without a transition, meaning in the current label (httpd_t).

Which I understood, since when the above booleans are turned on httpd_t is supposed to transition to httpd_sys_script_t when executing httpd_sys_content_t.  This sesearch command shows the transition rule.

sesearch -T -s httpd_t -t httpd_sys_content_t -c process -C
DT type_transition httpd_t httpd_sys_content_t : process httpd_sys_script_t; [ httpd_enable_cgi httpd_unified && httpd_builtin_scripting && ]


Why wasn't the process transitioning?

Then I remembered that SELinux transitions do not happen on mounted partitions that are mounted with the nosuid flag.

man mount
...
       nosuid Do not allow set-user-identifier or set-group-identifier bits to take effect. (This seems safe, but is in fact rather  unsafe  if  you have suidperl(1) installed.)


SELinux designers feel that a transition can be a potential privilege escalation similar to a suid root application.  Therefore if an administrator has told the system that no suid apps should be allowed on a mount point, then it also means no SELinux transitions will happen.

Removing the nosuid flag from the mount point fixes the problem.

Containers your time is now. Lets look at Namespaces.
danwalsh
Lately I have been spending a lot of time working on Containers.  Containers are a mechanism for controlling what a process does on a system.

Resource Constraints can be considered a form of containerment.

In Fedora and RHEL we use cgroups for this, and with the new systemd controls in Fedora and RHEL7, managing cgroups has gotten a lot easier.  Out of the box all of your processes are put into a cgroup based on whether they are a user, system service or a Machine (VMs).  These processes are grouped at the unit level, meaning two users logged into a system will get and "Fair Share" of the system, even if one user forks off thousands of processes.  Similarly if you run an httpd service and a mariadb service, they each get an equal share of the system, meaning that httpd can not fork 1000 process while mariadb only runs three, the httpd 1000 processes can not dominate the machine leaving no memory of cpu for mariadb.  Of course you can go into the unit files for httpd or mariadb and add a couple of simple resource constraints to further limit them

Adding

MemoryLimit: 500m

to httpd.service  unit file

For example will limit the service to only use 500 megabytes to httpd processes.

Security Containment

Some could say I have been working on containers for years since SELinux is a container technology for controlling what a process does on the system.  I will talk about SELinux and advanced containers in my next blog.

Process Separation Containment

The last component of containers is Namespaces.  The linux kernel implements a few namespaces for process separation.  There are currently 6 namespaces.

Namespaces can be used to Isolate processes. They can create a new environment where changes to the process are not reflected in other namespace.
Once set up, namespaces are transparent for processes.

Red Hat Enterprise Linux  and Fedora currently support 5 namespace

  • ipc

  • ipc namespace allows you to have shared memory, semaphores with only processes within the namespace.

  • pid

  • pid namespace eliminates the view of other processes on the system and restarts pids at pid 1.

  • mnt

  • mnt namespace allows processes within the container to mount file systemd over existing files/directories without affecting file systems outside the namespace

  • net

  • net namespace creates network devices that can have IP Addresses assigned to them, and even configure iptables rules and routing tables

  • uts

  • uts namespace allows you to assign a different hostname to processes within the container. Often useful with the network namespace

Rawhide also supports the user namespace.  We hope to add the user namespace support to a  future Red Hat Enterprise Linux 7.

User namespace allows you to map real user ids on the host to container uids.  For example you can map UID 5000-5100 to 0-100 within the container.  This means you could have uid=0 with rights to manipulate other namespaces within the container.  You could for example set the IP Address on the network namespaced ethernet device.  Outside of the container your process would be treated as a non privileged process.  User namespace is fairly young and people are just starting to use it.

I have put together a video showing namespaces in Red Hat Enterprise Linux 7.
https://www.youtube.com/watch?v=e4NXJ5nM-_M&feature=youtu.be

file_t we hardly new you...
danwalsh
file_t disappeared as a file type in Rawhide today.  It is one of the oldest types in SELinux policy.  It has been aliased to unlabeled_t.

Why did we remove it?

Let's look at the comments written in the policy source to describe file_t.

# file_t is the default type of a file that has not yet been
# assigned an extended attribute (EA) value (when using a filesystem
# that supports EAs).


Now lets look at the description of unlabeled_t

# unlabeled_t is the type of unlabeled objects.
# Objects that have no known labeling information or that
# have labels that are no longer valid are treated as having this type.


Notice the conflict.

If a file object does not have a labeled assigned to it, then it would be labeled unlabeled_t.  Unless it is on a file system that supports extended attributes then it would be file_t?

I always hated explaining this, and we have finally removed the conflict for future Fedora's.  Sadly this change has not been made in RHEL7 or any older RHELs or Fedoras.

We also added a type alias for unlabeled_t to file_t.

Note: Seandroid made this change when the policy was first being written.

One other conflict I would like to fix is that a file with a label that the kernel does not understand, is labeled unlabeled_t. (IE It has a label but it is invalid.)  I have argued for having the kernel differentiate the two situations.

  • No label -> unlabeled_t

  • Invalid Label -> invalid_t.

Upstream has pointed out from a practical/security point of view you really need to treat them both as the same thing.  Confined domains are not allowed to use unlabeled_t objects.  And if it is a file system object you should run restorecon on it.  Putting a legitimate label on the object.  Probably I will not get this change, but I can always hope. 

How come somethings get blocked by SELinux in permissive mode?
danwalsh
SELinux can be setup to run in three modes.

* Enforcing (My favorite)
* Permissive
* Disabled

Often permissive is described as the same as enforcing except everything is allowed and logged.

For the most part this is true, except when their are bugs or a "Access Control Manager" does not respect the permissive flag.

Most of SELinux is written where the kernel control's access, and it would be very strange for the kernel to block an access in permissive mode. 

But there are several situations where we want to check access outside the kernel.  For example.

  • Can an application connect to a particular dbus daemon?

  • Can a service start a particular systemd daemon?

  • Can a root process change the password of something?

  • Will sshd allow dwalsh to login as unconfined_t?

All of these checks are not seen by the kernel.  We implement SELinux checks in places like dbus daemon, systemd, X Server, sshd, passwd ...  When one of these services denies access you will see a USER_AVC generated rather then an AVC.  If these SELinux checks are not written correctly to check the permissive flag when an access is denied, you could get a real denial in permissive mode.

Usually we see these as bugs, but in certain situations the upstream does not want to accept patches to check the permissive flag.

If you know of a situation where this happens, open a bugzilla on it and we can work with the packager to fix the problem.

When you see an AVC or USER_AVC that is generated in permissive mode, you should see a flag that states "success=yes" in the AVC record, this indicates that the AVC was generated but still allowed.  If it says "success=no" in permissive mode then that should be considered a bug.

Awesome new coreutils with improved SELinux support
danwalsh

When I first started working on SELinux over 10 years ago, one of the first packages I worked on was coreutils.    We were adding SELinux support to insure proper handling of labeling.  After that we did not touch it for several years.

Last year, I decided to investigate if I could improve coreutils handling of labels on initial content creation.   Well it took a while but my patches were finally accepted, with lots of fixes from the upstream, and coreutils-8.22 just showed up today in  Rawhide.

I am very excited about this release.  I believe it can allow Administrators to fix one of the biggest problems users have with SELinux, objects getting created with the incorrect context.

My patches basically standardized "-Z" with no options to indicate you wanted the target directory to get the "default" label.

Example:

# touch /tmp/foobar
# mv /tmp/foobar /etc
# ls -lZ /etc/foobar
# -rw-r--r--. root root staff_u:object_r:user_tmp_t:s0   /etc/foobar


As opposed to:

# touch /tmp/foobar
# mv -Z /tmp/foobar /etc
# ls -lZ /etc/foobar
# -rw-r--r--. root root staff_u:object_r:etc_t:s0   /etc/foobar


The traditional use of a command like mv was to maintain the "Security" of an object you are moving.  mv command would maintain the ownership, permissions, and SELinux Labels.  The problem with this is users/administrators would not expect this, by adding the "-Z" to the mv command, the administrator guarantees that the object will get he correct label based on the destination path, which over the years, I believe is what the administrator would expect.  The "-Z" option in coreutils now indicates the equivalent of running restorecon on the target, except in most cases the label is correct on creation of the content.

"mv -Z /tmp/foobar /etc/foobar" == "mv /tmp/foobar /etc/foobar; restorecon /tmp/foobar"

One of the reasons we did not do this sooner, was the speed of reading in the labeling database.  The latest SELinux toolchain loads the labeling database in a fraction of the previous time, allowing us to make these changes.

Setting up coreutils alias

I would even suggest that it would be a good idea to alias

alias mv='mv -Z'

for most users.

A common mistake is to mv content around in the homedir.   A mistake I have made in the past was to an html file to a my account on people.fedoraproject.org and then to ssh into the machine and then mv it to the public_html directory.  ~/public_html is labeled httpd_user_content_t which is readable by default from the apache server, while the default label of my homedir is not, user_home_t.

mv ~/content.html ~/public_html/

This command would end up with the ~/public_html/content.html being labeled user_home_t, and the page would not show up on the web site.  Users would not know why, and would probably not no about SELinux.  But if the admistrator changed the alias for the mv command, everything would just work.

Other Commands

Similarly the -Z option has been implemented for all of the commands that create content in coreutils.

mknod -Z, mkdir -Z, mkfifo -Z, install -Z

Currently in init scripts we have lots of code that does; \

mkdir /run/myapp; restorecon /run/myapp

Which can be replaced with

mkdir -Z /run/myapp

What about Disabled Machines, or machines that do not support SELinux?

On an SELinux disabled system, the -Z option will be ignored.

Conclusion

Getting the Label correct at file creation has been improved greatly in the current Fedora's with the introduction of file name transitions.  Fixing coreutils to allow administrators to change the default of standard tools to set default labels on object creation is nice.

alias mv='mv -iZ'
alias cp='cp -iZ'
alias mkdir='mkdir -Z"
alias mknod='mknod -Z"
alias install='install -Z"


I hope to get this new coreutils backported into RHEL7...

Security

One thing to remember about this from a security point of view.  A calling confined domain would still be prevented from creating content with the default label, if it was not allowed by SELinux policy to create content with that label.  The change to coreutils, just allows the process to attempt to create the content with the correct label.

Thanks to coreutils upstream for working on these patches with us.


golang support for libselinux in Rawhide.
danwalsh
Every so often I get to spend a couple of days working on a new computer language, but it has been a while.

I am working on a project to bring SELinux support to docker.

The basic idea is to launch containers with a specific SELinux type and Random MCS label.  Using pretty much the same technology as we use with sVirt.  We do this using libvirt and virt-sandbox-service in Fedora now, but we want to implement similar support for docker.

One problem I had when I first starting working on this project was that docker is written in the go programming language. I did not know the go language and there were no libselinux bindings for go.

Luckily go is fairly easy to bind to the C Language using cgo.  After a couple of weeks work, I put together selinux.go which implements all of the functions that I needed to get containers running with SELinux labels.  Going forward it would be nice to hook up all of the libselinux functions. (Patches welcomed).

Package will show up in libselinux-2.2.1-3.fc21

/usr/share/gocode/selinux/selinux.go

Any input for improvements to go code would be welcome.

You are viewing danwalsh