Boolean: virt_use_execmem What? Why? Why not Default?
danwalsh
In a recent bugzilla, the reporter was asking about what the virt_use_execmem.

  • What is it?

  • What did it allow?

  • Why was it not on by default?


What is it?

Well lets first look at the AVC

type=AVC msg=audit(1448268142.167:696): avc:  denied  { execmem } for  pid=5673 comm="qemu-system-x86" scontext=system_u:system_r:svirt_t:s0:c679,c730 tcontext=system_u:system_r:svirt_t:s0:c679,c730 tclass=process permissive=0

If you run this under audit2allow it gives you the following message:


#============= svirt_t ==============

#!!!! This avc can be allowed using the boolean 'virt_use_execmem'
allow svirt_t self:process execmem;


Setroubleshoot also tells you to turn on the virt_use_execmem boolean.

# setsebool -P virt_use_execmem 1

What does the virt_use_execmem boolean do?

# semanage boolean -l | grep virt_use_execmem
virt_use_execmem               (off  ,  off)  Allow confined virtual guests to use executable memory and executable stack


Ok what does that mean?  Uli Drepper back in 2006 added a series of memory checks to the SELInux kernel to handle common
attack vectors on programs using executable memory.    Basically these memory checks would allow us to stop a hacker from taking
over confined applications using buffer overflow attacks.

If qemu needs this access, why is this not enabled by default?

Using standard kvm vm's does not require qemu to have execmem privilege.  execmem blocks certain attack vectors 
Buffer Overflow attack where the hacked process is able overwrite memory and then execute the code the hacked 
program wrote. 

When using different qemu emulators that do not use kvm, the emulators require execmem to work.  If you look at 
the AVC above, I highlighted that the user was running qemu-system-x86.  I order for this emulator to work it
needs execmem so we have to loosen the policy slightly to allow the access.  Turning on the virt_use_execmem boolean
could allow a qemu process that is susceptible to buffer overflow attack to be hacked. SELinux would not block this
attack.

Note: lots of other SELinux blocks would still be in effect.

Since most people use kvm for VM's we disable it by default.



I a perfect world, libvirt would be changed to launch different emulators with different SELinux types, based on whether or not the emulator
requires execmem.   For example svirt_tcg_t is defined which allows this access.

Then you could run svirt_t kvm/qemus and svirt_tcg_t/qemu-system-x86 VMs on the same machine at the same time without having to lower
the security.  I am not sure if this is a common situation, and no one has done the work to make this happen.

How come MCS Confinement is not working in SELinux even in enforcing mode?
danwalsh
MCS separation is a key feature in sVirt technology.

We currently use it for separation of our Virtual machines using libvirt to launch vms with different MCS labels.  SELinux sandbox relies on it to separate out its sandboxes.  OpenShift relies on this technology for separating users, and now docker uses it to separate containers.  

When I discover a hammer, everything looks like a nail.

I recently saw this email.

"I have trouble understanding how MCS labels work, they are not being enforced on my RHEL7 system even though selinux is "enforcing" and the policy used is "targeted". I don't think I should be able to access those files:

$ ls -lZ /tmp/accounts-users /tmp/accounts-admin
-rw-rw-r--. backup backup guest_u:object_r:user_tmp_t:s0:c3
/tmp/accounts-admin
-rw-rw-r--. backup backup guest_u:object_r:user_tmp_t:s0:c99
/tmp/accounts-users
backup@test ~ $ id
uid=1000(backup) gid=1000(backup) groups=1000(backup)
context=guest_u:guest_r:guest_t:s0:c1

root@test ~ # getenforce
Enforcing

I can still access them even though they have different labels (c3 and
c99 as opposed to my user having c1).
backup@test ~ $ cat /tmp/accounts-users
domenico balance: -30
backup@test ~ $ cat /tmp/accounts-admin
don't lend money to domenico

Am I missing something?
"

MCS Is different then type enforcement.

We decided not to apply MCS Separation to every type.    We only apply it to the types that we plan on running in a Multi-Tennant way.  Basically it is for objects that we want to share the same access to the system, but not to each other.  We introduced an attribute called mcs_constrained_type.

On my Fedora Rawhide box I can look for these types:

seinfo -amcs_constrained_type -x
   mcs_constrained_type
      netlabel_peer_t
      docker_apache_t
      openshift_t
      openshift_app_t
      sandbox_min_t
      sandbox_x_t
      sandbox_web_t
      sandbox_net_t
      svirt_t
      svirt_tcg_t
      svirt_lxc_net_t
      svirt_qemu_net_t
      svirt_kvm_net_t

If you add the mcs_constrained_type attribute to a type the kernel will start enforcing MCS separation on the type.

Adding a policy like this will MCS confine guest_t

# cat myguest.te 
policy_module(mymcs, 1.0)
gen_require(`
    type guest_t;
    attribute mcs_constrained_type;
')

typeattribute guest_t mcs_constrained_type;

# make -f /usr/share/selinux/devel/Makefile
# semodule -i myguest.pp

Now I want to test this out.  First i have to allow the guest_u user to use multiple MCS labels.  You would not
have to do this with non user types. 

# semanage user -m -r s0-s0:c0.c1023 guest_u

Create content to read and change it MCS label

# echo Read It > /tmp/test
# chcon -l s0:c1,c2 /tmp/test
# ls -Z /tmp/test
unconfined_u:object_r:user_tmp_t:s0:c1,c2 /tmp/test

Now login as a guest user

# id -Z
guest_u:guest_r:guest_t:s0:c1,c2
# cat /tmp/test
Read It

Now login as a guest user with a different MCS type

# id -Z
guest_u:guest_r:guest_t:s0:c3,c4
# cat /tmp/test
cat: /tmp/test: Permission denied

libselinux is a liar!!!
danwalsh
On an SELinux enabled machine, why does getenforce in a docker container say it is disabled?

SELinux is not namespaced

This means that there is only one SELinux rules base for all containers on a system.  When we attempt to confine containers we want to prevent them from writing to kernel file systems, which might be one mechanism for escape.  One of those file systems would be /proc/fs/selinux, and we also want to control there access to things like /proc/self/attr/* field.

By default Docker processes run as svirt_lxc_net_t and they are prevented from doing (almost) all SELinux operations.  But processes within containers do not know that they are running within a container.  SELinux aware applications are going to attempt to do SELinux operations, especially if they are running as root.

For example,  if you are running yum/dnf/rpm inside of a docker build container and the tools sees that SELinux is enabled, the tool is going to attempt to set labels on the file system, if SELinux blocks the setting of these file labels these calls will fail causing the tool will fail and exit.  Because of it SELinux aware applications within containers would mostly fail.

Libselinux is a liar

We obviously do not want  these apps failing,  so we decided to make libselinux lie to the processes.  libselinux checks if /proc/fs/selinux is mounted onto the system and whether it is mounted read/write.  If /proc/fs/selinux not mounted read/write, libselinux will report to calling applications that SELinux is disabled.  In containers we don't mount these file systems by default or we mount it read/only causing libselinux to report that it is disabled.

# getenforce
Enforcing
# docker run --rm fedora id -Z
id: --context (-Z) works only on an SELinux-enabled kernel

# docker run --rm -v /sys/fs/selinux:/sys/fs/selinux:ro fedora id -Z
id: --context (-Z) works only on an SELinux-enabled kernel
# docker run --rm -v /sys/fs/selinux:/sys/fs/selinux fedora id -Z
system_u:system_r:svirt_lxc_net_t:s0:c196,c525


When SELinux aware applications like yum/dnf/rpm see SELinux is disabled, they stop trying to do SELinux operations, and succeed within containers.

Applications work well even though SELinux is very much enforcing, and controlling their activity.

I believe that SELInux is the best tool we currently use to make Contaieners actually contain.

In this case SELinux disabled does not make me cry. 

nsenter gains SELinux support
danwalsh
nsenter is a program that allows you to run program with namespaces of other processes

This tool is often used to enter containers like docker, systemd-nspawn or rocket.   It can be used for debugging or for scripting
tools to work inside of containers.  One problem that it had was the process that would be entering the container could potentially
be attacked by processes within the container.   From an SELinux point of view, you might be injecting an unconfined_t process
into a container that is running as svirt_lxc_net_t.  We wanted a way to change the process context when it entered the container
to match the pid of the process who's namespaces you are entering.

As of util-linux-2.27, nsenter now has this support.

man nsenter
...
       -Z, --follow-context
              Set the SELinux  security  context  used  for  executing  a  new process according to already running process specified by --tar‐get PID. (The util-linux has to be compiled with SELinux support otherwise the option is unavailable.)



docker exec

Already did this but this gives debuggers, testers, scriptors a new tool to use with namespaces and containers.

'CVE-2015-4495 and SELinux', Or why doesn't SELinux confine Firefox?
danwalsh
Why don't we confine Firefox with SELinux?

That is one of the most often asked questions, especially after a new CVE like CVE-2015-4495, shows up.  This vulnerability in firefox allows a remote session to grab any files in your home directory.  If you can read the file then firefox can read it and send it back to the website that infected your browser.

The big problem with confining desktop applications is the way the desktop has been designed.

I wrote about confining the desktop several years ago. 

As I explained then the problem is applications are allowed to communicate with each other in lots of different ways. Here are just a few.

*   X Windows.  All apps need full access to the X Server. I tried several years ago to block applications access to the keyboard settings, in order to block keystroke logging, (google xspy).  I was able to get it to work but a lot of applications started to break.  Other access that you would want to block in X would be screen capture, access to the cut/paste buffer. But blocking
these would cause too much breakage on the system.  XAce was an attempt to add MAC controls to X and is used in MLS environments but I believe it causes to much breakage.
*   File system access.  Users expect firefox to be able to upload and download files anywhere they want on the desktop.  If I was czar of the OS, I could state that upload files must go into ~/Upload and Download files go into ~/Download, but then users would want to upload photos from ~/Photos.  Or to create their own random directories.  Blocking access to any particular directory including .ssh would be difficult, since someone probably has a web based ssh session or some other tool that can use ssh public key to authenticate.  (This is the biggest weakness in described in CVE-2015-4495
*   Dbus communications as well as gnome shell, shared memory, Kernel Keyring, Access to the camera, and microphone ...

Every one expects all of these to just work, so blocking these with MAC tools and SELinux is most likely to lead to "setenforce 0" then actually adding a lot of security.

Helper Applications.

One of the biggest problems with confining a browser, is helper applications.  Lets imagine I ran firefox with SELinux type firefox_t.  The user clicks on a .odf file or a .doc file, the browser downloads the file and launches LibreOffice so the user
can view the file.  Should LibreOffice run as LibreOffice_t or firefox_t?  If it runs as LibreOffice_t then if the LibreOffice_t app was looking at a different document, the content might be able to subvert the process.  If I run the LibreOffice as firefox_t, what happens when the user launched a document off of his desktop, it will not launch a new LibreOffice it will just communicate with the running LibreOffice and launch the document, making it accessible to firefox_t.

Confining Plugins.

For several years now we have been confining plugins with SELinux in Firefox and Chrome.  This prevents tools like flashplugin
from having much access to the desktop.  But we have had to add booleans to turn off the confinement, since certain plugins, end up wanting more access.

mozilla_plugin_bind_unreserved_ports --> off
mozilla_plugin_can_network_connect --> off
mozilla_plugin_use_bluejeans --> off
mozilla_plugin_use_gps --> off
mozilla_plugin_use_spice --> off
unconfined_mozilla_plugin_transition --> on


SELinux Sandbox

I did introduce the SELinux Sandbox a few years ago.

The SELinux sandbox would allow you to confine desktop applications using container technologies including SELinux.  You could run firefox, LibreOffice, evince ... in their own isolated desktops.  It is quite popular, but users must choose to use it.  It does not work by default, and it can cause unexpected breakage, for example you are not allowed to cut and paste from one window to another.

Hope on the way.

Alex Larsson is working on a new project to change the way desktop applications run, called Sandboxed Applications.

Alex explains that their are two main goals of his project.

* We want to make it possible for 3rd parties to create and distribute applications that works on multiple distributions.
* We want to run the applications with as little access as possible to the host. (For example user files or network access)

The second goal might allow us to really lock down firefox and friends in a way similar to what Android is able to do on your cell phone (SELinux/SEAndroid blocks lots of access on the web browser.)

Imagine that when a user says he wants upload a file he talks to the desktop rather then directly to firefox, and the desktop
hands the file to firefox.  Firefox could then be prevented from touching anything in the homedir.  Also if a user wanted to
save a file, firefox would ask the desktop to launch the file browser, which would run in the desktop context.   When the user
selected where to save the file, the browser would give a descriptor to firefox to write the file.

Similar controls could isolate firefox from the camera microphone etc.

Wayland which will eventually replace X Windows, also provides for better isolation of applications.

Needless to say, I am anxiously waiting to see what Alex and friends come up with.

The combination of Container Techonolgy including Namespaces and SELinux gives us a chance at controling the desktop

To exec or transition that is the question...
danwalsh
I recently recieved a question on writing policy via linkedin.

Hi, Dan -

I am working on SELinux right now and I know you are an expert on it.. I believe you can give me a help. Now in my policy, I did in myadm policy like
require { ...; type ping_exec_t; ...;class dir {...}; class file {...}; }

allow myadm_t ping_exec_t:file { execute execute_no_trans };

Seems the ping is not work, I got error
ping: icmp open socket: Permission denied

Any ideas?


My response:

When running another program there are two things that can happen:
1. You can either execute the program in the current context (Which is what  you did)
This means that myadm_t needs to have all of the permissions of ping.

2. You can transition to the executables domain  (ping_t)

We usually use interfaces for this.

netutils_domtrans_ping(myadm_t)

netutils_exec_ping(myadm_t)


I think if you looked at your AVC's you would probbaly see something about myadm_t needing the net_raw capability.

sesearch -A -s ping_t -c capability
Found 1 semantic av rules:
   allow ping_t ping_t : capability { setuid net_raw } ;


net_raw access allows ping_t to create and send icmp packets.  You could add that to myadm_t, but that would allow it
to listen at a low level to network traffic, which might not be something you want.  Transitioning is probably better.

BUT...

Transitioning could cause other problems, like leaked file descriptors or bash redirection.  For example if you do a
ping > /tmp/mydata, then you might have to add rules to ping_t to be allowed to write to the label of /tmp/mydata.

It is your choice about which way to go.

I usually transition if their is a lot of access needed, but if their is only a limited access, that I deem not too risky, I
exec and add the additional access to the current domain.

I get a SYS_PTRACE AVC when my utility runs ps, how come?
danwalsh
We often get random SYS_PTRACE AVCs, usually when an application is running the ps command or reading content in /proc.

https://bugzilla.redhat.com/show_bug.cgi?id=1202043

type=AVC msg=audit(1426354432.990:29008): avc:  denied  { sys_ptrace } for  pid=14391 comm="ps" capability=19  scontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tclass=capability permissive=0

sys_ptrace usually indicates that one process is trying to look at the memory of another process with a different UID.

man 

man capabilites
...

       CAP_SYS_PTRACE
              *  Trace arbitrary processes using ptrace(2);
              *  apply get_robust_list(2) to arbitrary processes;
              *  transfer data to or from the memory  of  arbitrary  processes
                 using process_vm_readv(2) and process_vm_writev(2).
              *  inspect processes using kcmp(2).


These types of access should probably be dontaudited. 

Running the ps command was a privileged process can cause sys_ptrace to happen.  

There is special data under /proc that a privileged process would access by running the ps command,  

This data is almost never actually needed by the process running ps, the data is used by debugging tools 
to see where some of the randomized memory of a process is setup.  

Easiest thing for policy writers to do is to dontaudit the access.



How do files get mislabled?
danwalsh
Sometimes we close bugs as CLOSED_NOT_A_BUG, because of a file being mislabeled, we then tell the user to just run restorecon on the object.

But this leaves the user with the question,

How did the file get mislabeled?

They did not run the machin in permissive mode or disable SELinux, but still stuff became mislabeled?  How come?

The most often case of this in my experience is the mv command, when users mv files around their system the mv command maintains the security contenxt of the src object.

sudo mv ~/mydir/index.html /var/www/html

This ends up with a file labeled user_home_t in the /var/www/html, rather then http_sys_content_t, and apache process is not allowed to read it.  If you use mv -Z on newer SELinux systems, it will change the context to the default for the target directory.

Another common cause is debugging a service or running a service by hand.

This bug report is a potential example.

https://bugzilla.redhat.com/show_bug.cgi?id=1175934

Sometimes we see content under /run (/var/run) which is labeled var_run_t, it should have been labeled something specific to the domain that created it , like apmd_var_run_t.
The most likely cause of this, is that the object was created by an unconfined domain like unconfined_t.  Basically an unconfined domain creates the object based on the parent directory, which would label it as var_run_t.

I would guess that the user/admin ran the daemon directly rather then through the init script.

# /usr/bin/acpid
or
#gdb /usr/bin/acpid


When acpid created the /run/acpid.socket then the object would be mislableed.  Later when the user runs the service through the init system it would get run with the correct type (apmd_t) and would be denied from deleting the file.

type=AVC msg=audit(1418942223.880:4617): avc:  denied  { unlink } for  pid=24444 comm="acpid" name="acpid.socket" dev="tmpfs" ino=2550865 scontext=system_u:system_r:apmd_t:s0 tcontext=system_u:object_r:var_run_t:s0 tclass=sock_file permissive=0


Sadly their is not much we can do to prevent this type of mislabeled file from being created, and end up having to tell the user to run restorecon.

Is SELinux good anti-venom?
danwalsh
SELinux to the Rescue 

If you have been following the news lately you might have heard of the "Venom" vulnerabilty.

Researchers found a bug in Qemu process, which is used to run virtual machines on top of KVM based linux
machines.  Red Hat, Centos and Fedora systems were potentially vulnerable.  Updated packages have been released for all platforms to fix the problem.

But we use SELinux to prevent virtual machines from attacking other virtual machines or the host.  SELinux protection on VM's is often called sVirt.  We run all virtual machines with the svirt_t type.  We also use MCS Separation to isolate one VM from other VMs and thier images on the system.

While to the best of my knowlege no one has developed an actual hack to break out of the virtualization layer, I do wonder whether or not the break out would even be allowed by SELinux. SELinux has protections against executable memory, which is usually used for buffer overflow attacks.  These are the execmem, execheap and execstack access controls.  There is a decent chance that these would have blocked the attack. 

# sesearch -A -s svirt_t -t svirt_t -c process -C
Found 2 semantic av rules:
   allow svirt_t svirt_t : process { fork sigchld sigkill sigstop signull signal getsched setsched getsession getcap getattr setrlimit } ; 
DT allow svirt_t svirt_t : process { execmem execstack } ; [ virt_use_execmem ]

Examining the policy on my Fedora 22 machine, we can look at the types that a svirt_t process would be allowed to write. These are the types that SELinux would allow the process to write, if they had matching MCS labels, or s0.

# sesearch -A -s svirt_t -c file -p write -C | grep open 
   allow virt_domain qemu_var_run_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
   allow virt_domain svirt_home_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
   allow virt_domain svirt_tmp_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
   allow virt_domain svirt_image_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
   allow virt_domain svirt_tmpfs_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
   allow virt_domain virt_cache_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; 
DT allow virt_domain fusefs_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; [ virt_use_fusefs ]
DT allow virt_domain cifs_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; [ virt_use_samba ]
ET allow virt_domain dosfs_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; [ virt_use_usb ]
DT allow virt_domain nfs_t : file { ioctl read write create getattr setattr lock append unlink link rename open } ; [ virt_use_nfs ]
ET allow virt_domain usbfs_t : file { ioctl read write getattr lock append open } ; [ virt_use_usb ]

Lines beginning with the D are disabled, and only enabled by toggling the boolean.  I did a video showing the access avialable to an OpenShift process running as root on your system using the same
technology.  Click here to view.

SELinux also blocks capabities, so the qemu process even if running as root would only have the net_bind_service capabilty, which allows it to bind to ports < 1024.

# sesearch -A -s svirt_t -c capability -C
Found 1 semantic av rules:
   allow svirt_t svirt_t : capability net_bind_service ; 

Dan Berrange, creator of libvirt, sums it up nicely on the Fedora Devel list:

"While you might be able to crash the QEMU process associated with your own guest, you should not be able to escalate from there to take over the host, nor be able to compromise other guests on the same host. The attacker would need to find a second independent security flaw to let them escape SELinux in some manner, or some way to trick libvirt via its QEMU monitor connection. Nothing is guaranteed 100% foolproof, but in absence of other known bugs, sVirt provides good anti-venom for this flaw IMHO."

Did you setenforce 1?


A follow up to the Bash Exploit and SELinux.
danwalsh
One of the advantages of a remote exploit is to be able to setup and launch attacks on other machines.

I wondered if it would be possible to setup a bot net attack using the remote attach on an apache server with the bash exploit.

Looking at my rawhide machine's policy

sesearch -A -s httpd_sys_script_t -p name_connect -C | grep -v ^D
Found 24 semantic av rules:
   allow nsswitch_domain dns_port_t : tcp_socket { recv_msg send_msg name_connect } ;
   allow nsswitch_domain dnssec_port_t : tcp_socket name_connect ;
ET allow nsswitch_domain ldap_port_t : tcp_socket { recv_msg send_msg name_connect } ; [ authlogin_nsswitch_use_ldap ]


The apache script would only be allowed to connect/attack a dns server and an LDAP server.  It would not be allowed to become a spam bot (No connection to mail ports) or even attack other web service.

Could an attacker leave a back door to be later connected to even after the bash exploit is fixed?

# sesearch -A -s httpd_sys_script_t -p name_bind -C | grep -v ^D
#

Nope!  On my box the httpd_sys_script_t process is not allowed to listen on any network ports.

I guess the crackers will just have to find a machine with SELinux disabled.

?

Log in