Next week I will be at the Red Hat Summit talking about SELinux, specifically sVirt, Secure Virtualization.
Virtualization seems to be the next big thing, providing great opportunities in resource allocation, system management, savings on power and cooling, and the ability to grow and shrink resources depending on demand.
But what about the security?
What happens when a cracker breaks into a virtual machine and takes it over? What happens if there is a bug in the hypervisor?
Before virtualization, we had isolated servers. A cracker taking over one server meant that he controlled just that server. The cracker would then have to launch network attacks against other servers in the environment. System administrators had lots of tools to defend against network attacks on machines: firewalls, network traffic analysis tools, intrusion detection tools, etc.
After virtualization, we have multiple services running on the same host. If a virtual machine is broken into, the cracker just needs to break through the hypervisor. If a hypervisor vulnerability exists, the cracker can take over all of the virtual machines on the host. He can even write into any virtual host images that are accessible from the host machine.
This is very scary stuff. The question is not "if", but "when". Hacker/cracker conventions are already examining hypervisor vulnerabilities. Crackers have already broken through the Xen hypervisor, as I documented in one of my previous blogs.
Now let's examine libvirtd/qemu/kvm in Fedora 11.
libvirtd starts all virtual machines. All virtual machines run as separate processes. Virtual images are stored as files or devices like logical volumes and iSCSI targets.
What is SELinux really good at?
It is great at labeling processes, files, and devices. It is great at defining rules on how labeled processes interact with labeled processes, files, and devices.
Seems like a nice match. SELinux can be used to mitigate the problems of a vulnerability in the hypervisor.
But, you ask, "Didn't we do this in Red Hat Enterprise Linux 5?" Yes, but we were still vulnerable to the Xen breakout.
If you read the Xen vulnerability document, it explains the mechanism used to thwart SELinux protection in RHEL5. The cracker realized that the Xen process, labeled xend_t, was allowed to read/write all fixed disks labeled fixed_disk_device_t. This allowed the cracker to break out of the SELinux confinement by writing to the physical disk. When I was writing policy for Xen in RHEL5, I had initially required the administrator to label Xen image volumes and devices as xen_image_t. The Xen developers thought this was too difficult for administrators to manage, and would cause too many failures. We ran out of time to make the management tool do this automatically. It was decided that usability was more important than security in this instance, and I had to allow this access. I won't make that mistake again.
In Fedora 11, James Morris, Daniel Berrange, I, and others have added SELinux support to libvirt, in the form of sVirt. We added a security plug-in architecture to libvirt that defaults to SELinux protection. Theoretically you can use other security architectures. libvirt dynamically labels the image files and starts the virtual machines with the correct labels. This allows us to avoid the problem of the administrator having to remember to set the correct label on the image files and devices. By default all virtual machines in F11 get labeled with the svirt_t type and all image files get the svirt_image_t type.
SELinux policy has rules that allow the svirt_t processes to read/write svirt_image_t files and devices.
This protection allows us to protect the host machine from any of its virtual machines. A virtual machine will only be able to interact with the files and devices with the correct labels. A compromised virtual machine would not be allowed to read my home directory, for example, even if the virtual machine is running as root.
However, this "type" protection does not prevent one virtual machine from attacking another virtual machine. We needed a way to label the domains and the image files with the same TYPES, but at the same time stop virtual machine 1, running as svirt_t, from attacking virtual machine 2, which would also be running as svirt_t.
Multi Category Security (MCS) to the rescue!
When we developed RHEL5 we added Multi Level Security (MLS) support. This involved adding a fourth field to the SELinux context.
Originally in RHEL4 the SELinux context consisted of three fields ("USER:ROLE:TYPE"). In RHEL5 the SELinux context consists of four fields ("USER:ROLE:TYPE:MLS"). For example, files in the home directory could be labeled "system_u:system_r:user_home_t:TopSecret
This field was only used in MLS policy. We attempted to make use of it in our default policy ("targeted"), by only defining a single sensitivity level ("s0") and allowing administrators to define categories. We called this Multi Category Security (MCS). The goal was to allow administrators and users to label their files based on the nature of their contents. For example, system_u:object_r:database_t:PatientReco
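To make the field layout concrete, here is a minimal Python sketch that splits a four-field context into its parts. The names `Context` and `parse_context` are made up for illustration and are not part of any SELinux library. The sketch only handles the simple "sensitivity plus comma-separated categories" form used in this post, not ranges such as `s0-s0:c0.c1023`.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """Hypothetical container for the fields of an SELinux context."""
    user: str
    role: str
    type: str
    sensitivity: str
    categories: frozenset


def parse_context(text):
    # USER:ROLE:TYPE:LEVEL, where LEVEL is "s0" or "s0:c0,c10".
    user, role, type_, level = text.split(":", 3)
    # Split the level into the sensitivity and an optional category list.
    sensitivity, _, cats = level.partition(":")
    categories = frozenset(c for c in cats.split(",") if c)
    return Context(user, role, type_, sensitivity, categories)


ctx = parse_context("system_u:system_r:svirt_t:s0:c0,c10")
print(ctx.type, ctx.sensitivity, sorted(ctx.categories))
```

Under targeted policy the sensitivity is always s0, so only the category set distinguishes one MCS label from another.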
When we were developing sVirt, though, we realized that we could use MCS to provide us separation between two virtual machines running with the same SELinux type, svirt_t. We designed libvirt to assign a different randomly-selected MCS label to each virtual machine and its associated virtual image. libvirt guarantees that the MCS fields it selects are unique. SELinux prevents different virtual machines running with different MCS fields from interacting with each other or any of their content.
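The selection step can be sketched roughly as follows. This is not libvirt's actual code. It assumes 1024 categories (c0 through c1023) and labels made of two distinct categories, and `pick_unique_mcs` is a hypothetical helper name. Under those assumptions the number of possible pairs comes out at a little over 500,000.

```python
import math
import random

NUM_CATEGORIES = 1024  # assumption: categories c0..c1023


def pick_unique_mcs(in_use, rng=random):
    """Pick a random two-category MCS label that is not already in use.

    Loops forever if every pair is taken; fine for a sketch.
    """
    while True:
        # Two distinct categories, sorted so each pair has one spelling.
        a, b = sorted(rng.sample(range(NUM_CATEGORIES), 2))
        label = f"s0:c{a},c{b}"
        if label not in in_use:
            in_use.add(label)
            return label


in_use = set()
vm1 = pick_unique_mcs(in_use)
vm2 = pick_unique_mcs(in_use)
print(vm1, vm2)

# Number of distinct unordered pairs under the assumptions above.
print(math.comb(NUM_CATEGORIES, 2))
```

The same label would then be applied to both the virtual machine process and its image, which is what ties the two together.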
For example, libvirt creates two virtual machines with these labels:
|Name||Virtual Machine Process label||Virtual Machine Image Label|
|Virtual Machine 1||system_u:system_r:svirt_t:s0:c0,c10||system_u:object_r:svirt_image_t:s0:c0,c1|
|Virtual Machine 2||system_u:system_r:svirt_t:s0:c101,c230||system_u:object_r:svirt_image_t:s0:c101,c|
SELinux prevents virtual machine 1 (system_u:system_r:svirt_t:s0:c0,c10) from accessing virtual machine 2's image file (system_u:object_r:svirt_image_t:s0:c101,c
These are the labels libvirt assigns:
|Type||SELinux Context||Description|
|Virtual Machine Processes||system_u:system_r:svirt_t:MCS1||MCS1 is a randomly selected MCS field. Currently we support ~500,000 labels.|
|Virtual Machine Image||system_u:object_r:svirt_image_t:MCS1||Only svirt_t processes with the same MCS fields are able to read/write these image files and devices.|
|Virtual Machine Shared Read/Write content||system_u:object_r:svirt_image_t:s0||All svirt_t processes are allowed to write to the svirt_image_t:s0 files and devices.|
|Virtual Machine Shared Read Only content||system_u:object_r:virt_content_t:s0||All svirt_t processes are able to read files/devices with this label.|
|Virtual Machine images||system_u:object_r:virt_image_t:s0||When a virtual machine exits, its image file is relabeled to the system default, which usually is virt_image_t:s0. No svirt_t virtual processes are allowed to read files/devices with this label.|
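The svirt_image_t rows of the table can be modelled with a small Python sketch. It is a deliberately simplified stand-in for the real policy, which expresses this through type rules plus MLS/MCS constraints. It assumes access needs both the right types and a process category set that includes every category on the file. `split_label` and `svirt_may_write` are hypothetical names, and the image labels assume each image carries the same MCS pair as its virtual machine, as described above.

```python
def split_label(label):
    """Return (type, categories) from 'user:role:type:s0[:cN,cM]'."""
    _user, _role, type_, level = label.split(":", 3)
    _sensitivity, _, cats = level.partition(":")
    return type_, frozenset(c for c in cats.split(",") if c)


def svirt_may_write(process_label, image_label):
    p_type, p_cats = split_label(process_label)
    i_type, i_cats = split_label(image_label)
    # Type enforcement: only svirt_t processes on svirt_image_t content.
    type_ok = p_type == "svirt_t" and i_type == "svirt_image_t"
    # MCS check (simplified): the process must hold every file category.
    mcs_ok = i_cats <= p_cats
    return type_ok and mcs_ok


vm1 = "system_u:system_r:svirt_t:s0:c0,c10"
vm1_image = "system_u:object_r:svirt_image_t:s0:c0,c10"
vm2_image = "system_u:object_r:svirt_image_t:s0:c101,c230"
shared = "system_u:object_r:svirt_image_t:s0"

print(svirt_may_write(vm1, vm1_image))  # own image
print(svirt_may_write(vm1, vm2_image))  # another machine's image
print(svirt_may_write(vm1, shared))     # shared read/write content
```

A file labeled with plain s0 has no categories, so the check passes for every svirt_t process. That is why svirt_image_t:s0 works as shared read/write content.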
We also added the ability to do static labeling to sVirt. Static labels allow the administrator to select a particular label, including the MCS/MLS field, for a virtual machine. The virtual machine will always be started with that label. Administrators who run static virtual machines are responsible for setting the correct label on the image files. libvirt will never modify the label of a statically-labelled virtual machine's content. This allows the sVirt component to run in an MLS environment. You can run multiple virtual machines on a libvirt system at different sensitivity levels.
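As a rough illustration, a static label is expressed in the libvirt domain XML with a `seclabel` element. The Python sketch below builds such a fragment with the standard library. The helper name `static_seclabel` and the MCS pair in the example are made up, and the exact attributes your libvirt version accepts should be checked against its documentation.

```python
import xml.etree.ElementTree as ET


def static_seclabel(label):
    """Build a <seclabel> fragment requesting a fixed SELinux label."""
    seclabel = ET.Element("seclabel", {"type": "static", "model": "selinux"})
    # The full process context, including the MCS/MLS field.
    ET.SubElement(seclabel, "label").text = label
    return ET.tostring(seclabel, encoding="unicode")


# Example label only; the category pair here is arbitrary.
fragment = static_seclabel("system_u:system_r:svirt_t:s0:c392,c662")
print(fragment)
```

With a static label like this, the administrator would still have to label the image files to match, since libvirt leaves them alone.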