June 20th, 2017

What capabilities do I really need in my container?

I have written previous blog posts about using Linux capabilities in containers.

Recently I gave a talk in New York, and someone in the audience asked me how to figure out what capabilities their containers require.

This person was dealing with a company that was shipping its software as a container image, but the company had instructed buyers that they would have to run the container “fully privileged”.  He wanted to know what privileges the container actually needed.  I told him about a project we worked on a few years ago, called Friendly EPERM.

Permission Denied!  WHY?

A few years ago the SELinux team realized that more and more applications were getting EPERM returned when a syscall requested some access.  Most operators understood EPERM (strictly "Operation not permitted"; its sibling EACCES is "Permission denied") in a log file to mean that something was wrong with the ownership of the process or of the content it was trying to access, or that the permission flags on the object were wrong.  This type of access control is called DAC (Discretionary Access Control).  Under certain conditions, however, SELinux also caused the kernel to return EPERM.  This confused operators and is one of the reasons they did not like SELinux.  They would ask, why didn’t httpd report that permission was denied because of SELinux?

We realized that there was a growing list of other mechanisms besides regular DAC and SELinux that could cause EPERM: things like SECCOMP, dropped capabilities, other LSMs …  The problem was that a process getting EPERM had no way to know why it got EPERM.  The only one that knew was the kernel, and in a lot of cases the kernel was not even logging the fact that it denied access.  At least SELinux denials usually show up in the audit log (AVCs).  The goal of Friendly EPERM was to allow processes to figure out why they got EPERM and to make denials easier for admins to diagnose.
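To make the ambiguity concrete, here is a minimal Python sketch (not part of the Friendly EPERM work; the helper name `describe_denial` is made up for illustration).  It prints everything a process learns when the kernel refuses a request: an errno name, its number, and a generic message, with no hint of which mechanism said no.

```python
import errno
import os


def describe_denial(exc: OSError) -> str:
    """Return the only details user space gets about a refused syscall."""
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    return f"{name} ({exc.errno}): {os.strerror(exc.errno)}"


if __name__ == "__main__":
    try:
        # Switching to uid 0 is refused with EPERM for an unprivileged process.
        # If the process is already uid 0 the call simply succeeds.
        os.setuid(0)
    except OSError as exc:
        # Same text whether DAC, a dropped capability, seccomp or an LSM refused.
        print(describe_denial(exc))
```

Nothing in the exception distinguishes a DAC failure from a dropped capability or an SELinux denial, which is the gap Friendly EPERM was trying to close.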

Here is the request that talks about the proposal.


The basic idea was to have something in the /proc file system which would identify why the previous EPERM happened.  You are running a process, say httpd, and it gets permission denied.  Now somehow the process can get information on why it got permission denied.  One suggestion was that we enhance libc and the kernel to provide this information.  The logical place for the kernel to reveal it would be in /proc/self.  But the act of httpd attempting to read the information out of /proc/self could itself result in a permission denied.  Basically we did not succeed because reading the reason is inherently racy: by the time the process asks, the recorded information could be wrong.

Here is a link to the discussion: https://groups.google.com/forum/#!msg/fa.linux.kernel/WQyHPUdvodE/ZGTnxBQw4ioJ

Bottom line, no one has figured out a way to get this information out of the kernel.


Later I received an email discussing the Friendly EPERM project and asking if there was a way to at least figure out what capabilities an application needed.

I wondered if the audit subsystem would give us anything here.  I contacted the audit guys at Red Hat, Steve Grubb and Paul Moore, and they informed me that no audit messages are generated when DAC capability checks fail.

An interesting discussion occurred in the email chain:

DWALSH: Well I would argue most developers have no idea what capabilities their application requires.

SGRUBB: I don't think people are that naive. If you are writing a program that runs as root and then you get the idea to run as a normal user, you will immediately see your program crash. You would immediately look at where it’s having problems. It’s pretty normal to look up the errno on the syscall man page to see what it says about it. They almost always list necessary capabilities for that syscall. If you are an admin restricting software you didn't write, then it’s kind of a puzzle. But the reason there's no infrastructure is because historically it’s never been a problem because the software developer had to choose to use capabilities and it’s incumbent on the developer to know what they are doing.  With new management tools offering to do this for you, I guess it’s new territory.

But here we had a vendor telling a customer that its application needed full root, ALL capabilities, to run.

DWALSH:  This is exactly what containers are doing.  Which is why the emailer is asking.  A vendor comes to him telling him it needs all Capabilities.  The emailer does not believe them and wants to diagnose what they actually need.

DWALSH: With containers and SELinux there is a great big "TURN OFF SECURITY" button, which is too easy for software packagers to push, and then they don't have to figure out exactly what their app needs.

Paul Moore, Red Hat SELinux kernel engineer, suggested that while audit cannot record the DAC failures, SELinux also enforces the capability checks.  If we could put the processes into an SELinux type that had no capabilities by default, then run the process with full capabilities and SELinux in permissive mode, we could gather the SELinux AVC messages indicating which capabilities the application required to run.

(Ab)using security to learn through denial messages. What could possibly go wrong?! :)

After investigating further, it turns out that the basic type used to run containers, `container_t`, can be set up to have no capabilities by turning off an SELinux boolean.

Turn off the capabilities via the boolean, and put the machine into permissive mode:

setsebool virt_sandbox_use_all_caps=0

setenforce 0

Now execute the application via docker with all capabilities allowed.

docker run --cap-add all IMAGE ...

Run and test the application. This should cause SELinux to generate AVC messages about capabilities used.

grep capability /var/log/audit/audit.log

type=AVC msg=audit(1495655327.756:44343): avc:  denied  { syslog } for  pid=5246 comm="rsyslogd" capability=34  scontext=system_u:system_r:container_t:s0:c795,c887 tcontext=system_u:system_r:container_t:s0:c795,c887 tclass=capability2   


Now you know your list.
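For a busy application, reading the log by eye gets tedious.  As a rough sketch (not from the original discussion; `capabilities_from_avcs` and `AVC_RE` are names invented here), the capability names can be pulled out of AVC records like the one above by taking the permissions inside the braces whenever the class is `capability` or `capability2`.

```python
import re

# Captures the permission list in braces and the object class of one AVC record.
AVC_RE = re.compile(r"avc:\s+denied\s+\{([^}]*)\}.*?\btclass=(\S+)")

# The AVC record shown in this post, used as sample input.
SAMPLE = (
    'type=AVC msg=audit(1495655327.756:44343): avc:  denied  { syslog } for  '
    'pid=5246 comm="rsyslogd" capability=34  '
    'scontext=system_u:system_r:container_t:s0:c795,c887 '
    'tcontext=system_u:system_r:container_t:s0:c795,c887 tclass=capability2'
)


def capabilities_from_avcs(lines):
    """Return the sorted, de-duplicated capability names found in AVC lines."""
    caps = set()
    for line in lines:
        match = AVC_RE.search(line)
        # Ignore denials for other classes (file, dir, ...).
        if match and match.group(2) in ("capability", "capability2"):
            caps.update(match.group(1).split())
    return sorted(caps)


if __name__ == "__main__":
    print(capabilities_from_avcs([SAMPLE]))  # prints ['syslog']
```

To use it on a real system, pass an open file object for /var/log/audit/audit.log instead of the sample list.  This sketch does not filter by `scontext`, so on a host running several containers you may want to keep only the records whose MCS label matches your container.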

Turns out the application the emailer was trying to containerize was a tool which was allowed to manipulate the syslog system, and the only capability it needed was CAP_SYSLOG.  The emailer should be able to run the container by simply adding the CAP_SYSLOG capability and everything else about the container should be locked down.

docker run --cap-add syslog IMAGE ...
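When the list is longer than one entry, the command line can be assembled mechanically.  The following is only an illustrative sketch (the function name `docker_run_args` is hypothetical); it uses nothing beyond the `--cap-add` flag shown above, repeated once per capability.

```python
import shlex


def docker_run_args(image, capabilities):
    """Build a `docker run` argument list that adds only the given capabilities."""
    args = ["docker", "run"]
    # De-duplicate and sort so the resulting command is stable.
    for cap in sorted(set(capabilities)):
        args += ["--cap-add", cap]
    args.append(image)
    return args


if __name__ == "__main__":
    # IMAGE is a placeholder, as in the commands above.
    print(shlex.join(docker_run_args("IMAGE", ["syslog"])))
```

Returning a list rather than a string keeps the result safe to hand to `subprocess.run` without shell quoting concerns.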


After writing this blog, I was pointed to

Find what capabilities an application requires to successful run in a container

This approach is similar in that it finds the capabilities needed by a container/process, but it does so using SystemTap.