Wednesday, May 28, 2008

Work Load balancing with IRQ SMP Affinity

Every time a piece of hardware needs attention from the CPU, it raises an interrupt. To prevent multiple devices from sending the same interrupt, the IRQ system was established: each device in a computer system is assigned its own IRQ so that its interrupts are unique.
Linux has the ability to assign certain IRQs to specific processors or groups of processors. This is known as SMP IRQ affinity, and it allows you to control how your system responds to various hardware events.

It allows you to restrict or repartition the workload that your server must do so that it can run more efficiently. Obviously you will need a system that has more than one processor (SMP), and you will also need to be running a 2.4 or higher kernel.
In /proc/irq/ are directories that correspond to the IRQs present on your system. In each of these directories, the file /proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted for a given IRQ source. It is a bitmask of allowed CPUs.
You cannot turn off all CPUs, and if an IRQ controller does not support IRQ affinity, the value will not change from the default 0xffffffff. The information on which IRQ a device is using is available in the /proc/interrupts file.

[root@boss ~]# cat /proc/interrupts
           CPU0       CPU1
  0:   30932404   28295981   IO-APIC-edge  timer
  1:          6          5   IO-APIC-edge  i8042
  2:          0          0         XT-PIC  cascade
  3:     117347     109140   IO-APIC-edge  serial
  8:          1          0   IO-APIC-edge  rtc
 12:         81         43   IO-APIC-edge  i8042
 14:     135539     138135   IO-APIC-edge  ide0
 15:     274707     249107   IO-APIC-edge  ide1
145:     238566     238566   IO-APIC-level  sii3114
153:    1888724    1725224   IO-APIC-level  mga@pci:0000:01:05.0
161:    1333787    1333787   IO-APIC-level  eth0
NMI:          0          0
LOC:   59233954   59233953
ERR:          6
MIS:          0
[root@boss ~]#

This is a 2-processor machine. The first column lists the IRQs used on the system. The columns labelled CPU0 and CPU1 show the number of times the corresponding processor has handled an interrupt from that particular IRQ.
For example, both CPUs have handled the same number of interrupts for IRQ 145. The fourth column lists whether or not the device driver associated with the interrupt supports IO-APIC.
SMP affinity will only work for IO-APIC enabled device drivers. For example, we will not be able to change the affinity for the "cascade" driver (IRQ 2) because it doesn't support IO-APIC. The last column lists the driver or device that is associated with the interrupt. In the above example, the Ethernet card (eth0) is using IRQ 161, and the RAID controller (sii3114) is using IRQ 145. We want to adjust the SMP affinity for the RAID controller (IRQ 145).
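Reading the IRQ number off the last column can also be scripted. The following is only a minimal sketch: the function name irq_for_device is made up for this example, and it assumes the device name is the last whitespace-separated field on the line and matches exactly, as in the listing above.

```shell
#!/bin/sh
# irq_for_device NAME [FILE]   (hypothetical helper, not a standard tool)
# Print the IRQ number(s) whose last column matches NAME exactly.
# FILE defaults to /proc/interrupts; another path can be given for testing.
irq_for_device() {
    awk -v dev="$1" '
        # Data lines start with a numeric IRQ followed by a colon;
        # this skips the CPU header and the NMI/LOC/ERR/MIS summary lines.
        $1 ~ /^[0-9]+:$/ && $NF == dev {
            sub(":", "", $1)   # strip the trailing colon from the IRQ
            print $1
        }
    ' "${2:-/proc/interrupts}"
}
```

On the machine shown above, calling irq_for_device eth0 would print 161 and irq_for_device sii3114 would print 145.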
Now that we've got the IRQ, we can change the processor affinity. To do this, we'll go into the /proc/irq/145/ directory, and see what the affinity is currently set to:

[root@boss ~]# cat /proc/irq/145/smp_affinity 
 ffffffff
[root@boss ~]#

This is a bitmask that represents which processors any interrupts on IRQ 145 should be routed to. Each bit in the mask corresponds to a processor, with the lowest bit standing for CPU0, so CPU0 alone is binary 01 (hex 01) and CPU1 alone is binary 10 (hex 02). The number held in the "smp_affinity" file is presented in hexadecimal format, so in order to manipulate it properly we will need to convert our bit patterns from binary to hex before setting them in the proc file.
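The binary-to-hex conversion can be left to the shell. This is a small sketch rather than a standard utility: the name cpu_mask is invented here, and it relies on the shell's integer arithmetic, so it only suits CPU numbers that fit in the shell's integer width.

```shell
#!/bin/sh
# cpu_mask CPU...   (hypothetical helper)
# Print the hexadecimal affinity mask with one bit set per CPU number given.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Bit N of the mask stands for CPU N.
        mask=$(( $mask | (1 << $cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # CPU0 only      -> 1
cpu_mask 1      # CPU1 only      -> 2
cpu_mask 0 1    # CPU0 and CPU1  -> 3
```

Leading zeros do not matter when the value is written to the proc file, which is why 1 and 01 mean the same mask.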
Let's assume that we want to dedicate our first CPU (CPU0) to handling the RAID controller interrupts. To do this, we would simply run the following command:

[root@boss ~]# echo 01 > /proc/irq/145/smp_affinity 
[root@boss ~]# cat /proc/irq/145/smp_affinity 
 00000001
[root@boss ~]#

Similarly, to dedicate the second CPU (CPU1) to handling the eth0 interrupts:

[root@boss ~]# echo 02 > /proc/irq/161/smp_affinity 
[root@boss ~]# cat /proc/irq/161/smp_affinity 
 00000002
[root@boss ~]#
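
To double-check what a value read back from smp_affinity means, the mask can be decoded into CPU numbers again. A minimal sketch, assuming a single hex word of at most 32 bits like the values shown in this post; mask_cpus is a name made up for the example.

```shell
#!/bin/sh
# mask_cpus HEXMASK   (hypothetical helper)
# Print the CPU numbers whose bits are set in HEXMASK, separated by spaces.
mask_cpus() {
    mask=$(( 0x$1 ))          # interpret the argument as hexadecimal
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( $mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"   # append, with a space after the first entry
        fi
        mask=$(( $mask >> 1 ))
        cpu=$(( $cpu + 1 ))
    done
    printf '%s\n' "$out"
}

mask_cpus 00000001   # -> 0
mask_cpus 00000002   # -> 1
```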

Now, let's test it out and see what happens:

[root@boss ~]# cat /proc/interrupts
           CPU0       CPU1
  0:   31568091   28887972   IO-APIC-edge  timer
  1:          6          5   IO-APIC-edge  i8042
  2:          0          0         XT-PIC  cascade
  3:     119765     111417   IO-APIC-edge  serial
  8:          1          0   IO-APIC-edge  rtc
 12:         81         43   IO-APIC-edge  i8042
 14:     138394     140972   IO-APIC-edge  ide0
 15:     280409     254277   IO-APIC-edge  ide1
145:     477740        169   IO-APIC-level  sii3114
153:    1927454    1761466   IO-APIC-level  mga@pci:0000:01:05.0
161:         19    2689528   IO-APIC-level  eth0
NMI:          0          0
LOC:   60461754   60461753
ERR:          6
MIS:          0
[root@boss ~]#

Nearly all of the interrupts from the RAID controller are now handled by the first CPU (CPU0), and those from eth0 by the second CPU (CPU1). The settings were placed in /etc/rc.local so that they are applied at every boot; the few interrupts still counted on the other CPU (169 for IRQ 145 and 19 for IRQ 161) were handled while the system was booting, before rc.local ran.
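
For the /etc/rc.local step, a small wrapper can make a failed write visible instead of silent. This is only a sketch of one way to do it: set_irq_affinity is a hypothetical name, and the optional third argument exists only so the function can be tried against a scratch directory instead of /proc/irq.

```shell
#!/bin/sh
# set_irq_affinity IRQ HEXMASK [BASE]   (hypothetical helper)
# Write HEXMASK to BASE/IRQ/smp_affinity. BASE defaults to /proc/irq.
set_irq_affinity() {
    file="${3:-/proc/irq}/$1/smp_affinity"
    if [ ! -w "$file" ]; then
        # The IRQ does not exist, or we are not allowed to write to it.
        echo "cannot set affinity for IRQ $1: $file is not writable" >&2
        return 1
    fi
    echo "$2" > "$file"
}
```

In /etc/rc.local the definition would be followed by one call per IRQ, for the machine above: set_irq_affinity 145 01 and set_irq_affinity 161 02. IRQ numbers can change between kernels or hardware changes, so they are worth re-checking in /proc/interrupts.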

