Dan Walsh's Blog

Got SELinux?


SELinux Reveals Bugs in Code
danwalsh
I have noticed over the years random confined processes that get AVC denials for the SYS_RESOURCE and SYS_ADMIN capabilities.  Most of the time, these are not easily repeated.  The combination of these two usually indicates that a confined process is attempting to use system resources beyond the limits for the owner UID.  For example, in RHEL6 a user, dwalsh, is only allowed to run 1024 processes, and an individual process running as dwalsh is only allowed to open 1024 files.
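
You can check the limits in effect for your own shell with ulimit (-u is the maximum number of processes for your UID, -n the maximum number of open files per process). The exact numbers depend on the distribution and local configuration, but on a stock RHEL6 system a normal user sees something like:

> ulimit -u
1024
> ulimit -n
1024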

/usr/include/linux/capability.h documents SYS_RESOURCE as follows:

/* Override resource limits. Set resource limits. */
/* Override quota limits. */
/* Override reserved space on ext2 filesystem */
/* Modify data journaling mode on ext3 filesystem (uses journaling
   resources) */
/* NOTE: ext2 honors fsuid when checking for resource overrides, so
   you can override using fsuid too */
/* Override size restrictions on IPC message queues */
/* Allow more than 64hz interrupts from the real-time clock */
/* Override max number of consoles on console allocation */
/* Override max number of keymaps */


The goal of these limits is to prevent an individual user from launching a fork bomb or opening so many files that the system suffers a denial of service.

Even root processes are governed by these limits; however, root processes almost always have the SYS_ADMIN and SYS_RESOURCE capabilities, unless they have dropped them using something like libcap/libcap-ng or you are using a Mandatory Access Control system like SELinux.
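
If you are curious which capabilities a particular process actually holds, you can read the effective capability mask out of /proc and decode it with capsh from the libcap package. The bitmask below is only an illustration; for an unconfined root shell the decoded list will normally include cap_sys_admin and cap_sys_resource, while a confined process may hold them yet still be denied their use by SELinux policy.

# grep CapEff /proc/self/status
CapEff: 0000001fffffffff
# capsh --decode=0000001fffffffff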

Lon Hohberger, who is on the OpenStack team at Red Hat and currently working on making SELinux work well with OpenStack, discovered some problems in RHEL6 that could be, and probably are, triggering these AVCs.

Bug #1

Prior to RHEL6.4, ANY login to the system, including root, would get a process limit of 1024.

# ulimit -u
1024


Meaning that if you started a process on a system that already had 1024 processes running under that UID, the kernel would check SYS_RESOURCE.  If you executed a command like

# service httpd restart

Then httpd would fall under the same limits.  Since httpd_t is not allowed these capabilities, httpd would generate AVCs and probably fail to start.
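
One way to see what a daemon actually inherited is /proc/<pid>/limits. After a restart from a pre-6.4 root login the process limit would look something like the following (the pgrep lookup is just a convenient way to find the parent httpd PID, and the hard limit will vary by machine):

# grep 'Max processes' /proc/$(pgrep -o httpd)/limits
Max processes             1024                 38567                processes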

Worse than this, if you executed:

# yum -y update

There is a decent chance that during the update some package's post-install script would do a

# service foobar restart

If foobar were confined by SELinux, then AVCs could be generated.
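
If you want to check whether this has already happened on one of your machines, the denials show up in the audit log with tclass=capability. The search would look something like this; the timestamp, PID and contexts in the sample line are made up for illustration:

# ausearch -m avc -ts recent | grep -e sys_resource -e sys_admin
type=AVC msg=audit(1363292400.123:4242): avc:  denied  { sys_resource } for  pid=12345 comm="httpd" capability=24 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:system_r:httpd_t:s0 tclass=capability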

Luckily we have a fix for this in RHEL6.4, although the fix has not gone into Fedora yet...

The following line was added to /etc/security/limits.d/90-nproc.conf:

root soft nproc unlimited
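
For reference, the stock file caps everyone at 1024 processes with a wildcard entry, so after the fix 90-nproc.conf looks roughly like this (comments trimmed; your copy may differ slightly):

# cat /etc/security/limits.d/90-nproc.conf
*          soft    nproc     1024
root       soft    nproc     unlimited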

Bug #2

First, if you log in as root on a RHEL6.4 system and check the max processes limit, you will get something like:

# ulimit -u
29924


If you log in as a normal user, you would get:

> ulimit -u
1024


If you run su from your normal user account and check the ulimit, you get:

# ulimit -u
29924


But if you run sudo as a normal user and run ulimit, you get:

# ulimit -u
1024


su and sudo should work the same way.

This is probably a bug in sudo or in the sudo PAM stack; I have not determined which yet.
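
A reasonable place to start digging is the PAM configuration, since pam_limits.so is the module that applies limits.conf and limits.d during session setup. Comparing which service files pull it in, directly or via system-auth, shows whether su and sudo even run the same session stack:

# grep -rl pam_limits /etc/pam.d/
# grep '^session' /etc/pam.d/su /etc/pam.d/sudo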

Bug #3

This one might be controversial.  Since these limits are supposed to count the resources used by a particular user, we should be looking at ALL of the processes kicked off by the user, not only those running under his UID.  Since sudo in the above example was not modifying the maximum running processes when user dwalsh became root, the processes started counted against the total number of root processes rather than the total number of processes started by dwalsh.  I believe this is wrong.  The kernel should be counting the number of processes started by user dwalsh, and should look at the LoginUID rather than the actual UID.  Then, if dwalsh logged onto a system and was able to get some processes running as root, they would continue to count against dwalsh's resource constraints and not against the system's.  I realize this might be difficult, since you would probably want processes that have the SYS_RESOURCE capability not to count against the total.
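
The kernel already tracks such an identity for auditing: the loginuid set by pam_loginuid at login time survives su and sudo even though the UID changes. The UID 500 below is just an example of what a normal user might see:

> id -u
500
> sudo id -u
0
> sudo cat /proc/self/loginuid
500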

systemd to the rescue

systemd has helped fix a lot of these issues.

In RHEL6, if an admin starts/restarts a system daemon, that daemon ends up being a subprocess of the user's session, meaning it inherits any of the constraints on the user's processes.  In the latest Fedoras, systemd starts most system daemons: the user process sends a message to systemd, and systemd starts/restarts the service, which means the service gets the system constraints, not the user constraints.
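
You can see the difference in ancestry directly: on a systemd system the restarted daemon is a child of PID 1, not of the admin's shell, so it picks up systemd's limits rather than whatever the admin's session happened to have. Using httpd.service as an example:

# systemctl restart httpd.service
# ps -o ppid= -p $(pgrep -o httpd)
    1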

As Dan points out, it works both ways. You can accidentally restrict root processes by rlimit inheritance across sudo, but you can also grant things unintentionally. For example, if root restarts apache (via sudo, if you want, but you need to start as root or 'su -' to root):

First, apache's UID:

[root@ayanami ~]# id apache
uid=48(apache) gid=48(apache) groups=48(apache)

So, here's our instance of apache started by 'service httpd start':
crash> set 10076
    PID: 10076
COMMAND: "httpd"
   TASK: ffff88017b423540  [THREAD_INFO: ffff88012b75a000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 
crash> task | grep real_cred
  real_cred = 0xffff880179f6ebc0, 
crash> p ((struct cred *)0xffff880179f6ebc0)->uid
$3 = 48
crash> task | grep signal
  exit_signal = 17, 
  pdeath_signal = 0, 
  signal = 0xffff880155efc1c0, 
    signal = {
crash> p ((struct signal_struct *)0xffff880155efc1c0)->rlim[6]
$4 = {
  rlim_cur = 38567, 
  rlim_max = 38567
}

In the above, httpd, run as the apache UID, has root's process limit. Now, if I exit and, as 'lon', run 'sudo service httpd restart', we get the following:
crash> set 11025
    PID: 11025
COMMAND: "httpd"
   TASK: ffff88017c182aa0  [THREAD_INFO: ffff880179756000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 
crash> task | grep real_cred
  real_cred = 0xffff88017a0dbb40, 
crash> p ((struct cred *)0xffff88017a0dbb40)->uid
$3 = 48
crash> task | grep signal
  exit_signal = 17, 
  pdeath_signal = 0, 
  signal = 0xffff880155f60900, 
    signal = {
crash> p ((struct signal_struct *)0xffff880155f60900)->rlim[6]
$4 = {
  rlim_cur = 1024, 
  rlim_max = 38567
}



I'd argue that Bug #2 isn't a bug so much as design. ulimits are applied when you log in to the system, and by default, while sudo lets you act like root, it doesn't log you in as root (or any other user for that matter). In order to get ulimits set properly you need to use "sudo -i". This sort of issue comes up regularly when you start dealing with products like Cassandra or Riak that like to open a lot of handles, and the workaround is to start them using "sudo -i -u cassandra service cassandra restart", for example.
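
If you want to compare the two behaviors described here, remember that ulimit is a shell builtin, so it has to be run through a shell on the far side of sudo. Something like the following will show whether -i makes a difference on your system:

> sudo sh -c 'ulimit -u'
> sudo -i sh -c 'ulimit -u'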

Well, I guess we can agree to disagree. You might be technically right, but from a usability point of view this is poor.

If a user can blow up an application by doing
sudo service APP restart

not because of a process he started but because of other root processes that are running, then this could be seen as unexpected and confusing.
