This is the first of my “Linux Performance” series. In this part, I focus on the CPU and the Linux process scheduler as performance bottlenecks and how to detect them. I will later write a series focusing on tuning resources.
While getting a first overview of the entire system, uptime can come in handy. Its load averages give a glimpse of whether the system has been busy over the last 1, 5 and 15 minutes and whether things are getting better or worse. Each figure is an average of the total number of runnable (running + waiting to run) and uninterruptible (i.e. waiting for IO etc.) processes. Note that it is not divided by the number of CPUs, so you have to compare it against the CPU count (this includes CPU hardware threads) yourself.
In this case, I see that the system was sitting almost idle, but in the last minute some processes ran. Keep in mind that these processes could have been CPU bound, IO bound or both; uptime does not give any clue about that.
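To relate the figures uptime prints to the size of the machine, the load averages can be divided by the number of logical CPUs. Here is a minimal sketch of that arithmetic; load_per_cpu is a helper name I made up for illustration, and the rounding to two decimals is just for readability.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the number of logical CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(value / cpu_count, 2) for value in load_averages)


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages,
    # the same three figures uptime prints.
    averages = os.getloadavg()
    cpus = os.cpu_count() or 1
    print("load averages:", averages)
    print("per logical CPU:", load_per_cpu(averages, cpus))
```

A per-CPU value that stays around or above 1.0 suggests more runnable or IO-blocked tasks than CPUs, but it still does not say which of the two it is.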
If I want to know what’s using the CPU, I run pidstat to see what’s going on. The -I option divides the %CPU count by the total number of CPU threads. CPU saturation can be suspected if %CPU reaches 100% or more.
As it shows, in this example sysbench was running and at times consumed more than 100% CPU. That’s a saturation point because I have only two CPU threads.
Other important fields are %usr and %system. High %usr can be caused by the application doing calculations, producing hashes etc. High %system can be caused by the application making too many syscalls or by syscalls that take a lot of CPU time. A %usr to %system ratio of 20/80 means the process spends most of its CPU time in the kernel, which could mean that it is IO bound, and it could turn out to be a performance issue depending upon your environment. But more on that later.
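The user/system split can also be read straight from /proc. Here is a rough sketch, assuming the field layout documented in proc(5), where fields 14 and 15 of /proc/XXX/stat are utime and stime in clock ticks. The helper names are mine. Unlike pidstat, these counters are cumulative since the process started, so take two samples and subtract them if you want a per-interval figure.

```python
import os


def cpu_times_from_stat(stat_line, ticks_per_second):
    """Return (user_seconds, system_seconds) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so cut at the last ')' before splitting on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    utime_ticks = int(fields[11])  # field 14 in proc(5): time in user mode
    stime_ticks = int(fields[12])  # field 15 in proc(5): time in kernel mode
    return utime_ticks / ticks_per_second, stime_ticks / ticks_per_second


def system_share(user_seconds, system_seconds):
    """Fraction of the consumed CPU time that was spent in the kernel."""
    total = user_seconds + system_seconds
    return 0.0 if total == 0 else system_seconds / total


if __name__ == "__main__":
    ticks = os.sysconf("SC_CLK_TCK")
    with open("/proc/self/stat") as handle:
        user, system = cpu_times_from_stat(handle.read(), ticks)
    print(f"user={user:.2f}s system={system:.2f}s share={system_share(user, system):.0%}")
```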
Quick Tip: If you want to know what a particular application is doing at the moment, you can take a look at the /proc/XXX/wchan, /proc/XXX/status and /proc/XXX/syscall files, where XXX stands for the PID of the process. For a process which is running entirely in userspace, you would not see any syscall entry (the syscall file just says running), wchan will be 0, and the status file would report the state as running. To get more insight into such a process, you may need to attach GDB to it (if it’s an ELF binary) or use other debuggers (if the process is written in an interpreted or managed language like Python, Ruby, Java etc.)
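The three files from the tip above can be collected in one go. This is only a sketch: the function names are hypothetical, and depending on permissions and kernel configuration some of the files may be unreadable, in which case the helper returns None instead of failing.

```python
from pathlib import Path


def read_proc_file(pid, name):
    """Return the stripped contents of /proc/<pid>/<name>, or None if unreadable."""
    try:
        return Path(f"/proc/{pid}/{name}").read_text().strip()
    except OSError:
        return None


def state_from_status(status_text):
    """Extract the value of the 'State:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def snapshot(pid):
    """Collect the state, wait channel and current syscall of one process."""
    status = read_proc_file(pid, "status")
    return {
        "state": state_from_status(status) if status else None,
        "wchan": read_proc_file(pid, "wchan"),
        "syscall": read_proc_file(pid, "syscall"),
    }
```

Calling snapshot(4927) a few times in a row gives a crude picture of whether the process keeps sitting in the same kernel function or syscall.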
For an SMP (Symmetric Multiprocessing) system, it is worth verifying whether the application is able to take advantage of multiple CPU cores and hardware threads. Suppose the pidstat output above showed me 50% CPU time and ~0% system time; it would mean that the process is using only 1 of the 2 CPU cores available (also watch the CPU column, which shows the processor number, as reported by /proc/cpuinfo, that the process was running on during the sampling time). If the process is reported to be “slower than expected”, I would investigate further whether the process is able to use more than one core at all. This is where mpstat comes in handy.
In this example, we can see that the utilization of one of the cores is almost always 100%. The other core is almost idle. This could indicate that the application is probably not multithreaded or is unable to use multiple cores. Before diving into application troubleshooting, I would check for any hardware issue. Depending upon the hardware vendor (or, if it’s a virtual machine, you may need to check the VM settings in KVM or ESXi), I would take a look at the faults reported by the service processor, either via ipmitool or via the service processor console. I have seen power and temperature issues causing the CPU to disable certain cores (refer to Intel SpeedStep technology).
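The per-core figures mpstat reports are derived from the per-CPU counters in /proc/stat. Here is a simplified sketch of that calculation, assuming the column order from proc(5) (user, nice, system, idle, iowait, irq, softirq, steal) and treating idle plus iowait as "not busy". The function names are mine, and mpstat itself breaks the time down in more detail.

```python
def parse_cpu_lines(stat_text):
    """Map 'cpu0', 'cpu1', ... to (busy_ticks, total_ticks) from /proc/stat text."""
    result = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpuN"; the bare "cpu" line is the aggregate.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:9]]  # user .. steal
        idle = sum(values[3:5])  # idle + iowait
        total = sum(values)
        result[fields[0]] = (total - idle, total)
    return result


def utilization(before, after):
    """Percent busy per CPU between two parse_cpu_lines() samples."""
    percentages = {}
    for cpu, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(cpu, (0, 0))
        elapsed = total_after - total_before
        busy = busy_after - busy_before
        percentages[cpu] = 100.0 * busy / elapsed if elapsed > 0 else 0.0
    return percentages
```

To use it, read /proc/stat twice a second or so apart, pass each text through parse_cpu_lines and hand both results to utilization. One core pinned near 100% while the others stay near 0% is the single-threaded pattern described above.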
Processor Affinity and taskset(1):
If it’s found that the process is only running on a single core, I would verify that the processor affinity of that process is set properly. The taskset command gets and sets the processor affinity of a process. The man pages of taskset(1) and sched_getaffinity(2)/sched_setaffinity(2) have good detail on the performance impact on a process. Sometimes a process might want to run on a single CPU to leverage already warm L1/L2 caches.
In the above example, PID 4927 is set to run on all available processors (two in my case).
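The same check can be done programmatically. Python exposes sched_getaffinity(2) as os.sched_getaffinity on Linux; the sketch below reads the allowed CPU set and renders it as the kind of hexadecimal mask taskset -p prints. describe_affinity and affinity_mask are illustrative helper names, not part of any tool.

```python
import os


def describe_affinity(pid=0):
    """Return the sorted list of CPUs the PID may run on (0 means this process)."""
    # os.sched_getaffinity is only available on platforms such as Linux.
    return sorted(os.sched_getaffinity(pid))


def affinity_mask(cpus):
    """Render a set of CPU numbers as a hexadecimal bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N set means CPU N is allowed
    return format(mask, "x")


if __name__ == "__main__":
    allowed = describe_affinity()
    print("allowed CPUs:", allowed, "mask:", affinity_mask(allowed))
```

On a two-CPU machine with no restriction, the allowed set is CPUs 0 and 1, which corresponds to mask 3.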
Let’s look at things from the process scheduler’s perspective and see if it all looks good.
The first thing I look at is the sched file under /proc for a specific PID (/proc/XXX/sched); it contains the scheduler statistics. The most important parts for troubleshooting are nr_voluntary_switches and nr_involuntary_switches.
nr_voluntary_switches stands for the number of context switches caused by the current thread blocking, i.e. waiting for IO etc.
nr_involuntary_switches stands for the number of times the kernel process scheduler had to kick the process out of the running state, probably because it exhausted its allocated CPU time slice.
Now, a high number of nr_voluntary_switches may indicate that the process is making lots of blocking system calls; we can further confirm this by looking at the pidstat data for this process.
But a high number of nr_involuntary_switches (how high? you need to compare stats on a different machine with the same kernel version and a similar workload) usually means that the task is not getting enough time to run on the CPU, probably due to too many tasks in the runnable queue, or simply because the priority of the process needs to be higher.
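Since both counters only ever grow, what matters is how fast they grow. Here is a minimal sketch that pulls the two fields out of the text of /proc/XXX/sched and turns two samples into switches per second. It assumes the "name : value" layout of that file, and the helper names are hypothetical. The file is only present when the kernel is built with scheduler debugging, so reading it is left to the caller.

```python
SWITCH_FIELDS = ("nr_voluntary_switches", "nr_involuntary_switches")


def parse_switches(sched_text):
    """Extract the voluntary and involuntary switch counters from sched text."""
    counters = {}
    for line in sched_text.splitlines():
        name, separator, value = line.partition(":")
        name = name.strip()
        if separator and name in SWITCH_FIELDS:
            counters[name] = int(value.strip())
    return counters


def switch_rates(before, after, seconds):
    """Context switches per second between two parse_switches() samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }
```

Sampling the file for PID XXX ten seconds apart and feeding both texts through these helpers gives rates that are easier to compare across machines than the raw totals.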