CPU Busy: It’s not a linear function
CPU busy, as plotted on a graph, is not a linear function of workload. And yet, many of us assume it is, and use that assumption for capacity planning.
Linear regression may work for CPU busy between roughly 20% and 80% utilization. It doesn't help you much at the lower or higher ends of the utilization graph, and those are exactly where you want to know the impact of changes in transaction volume or batch workload.
Let’s say you have 10,000 CICS transactions and each uses 0.02 seconds of CPU. It’s easy to assume that you can readily calculate the impact of increasing transaction rates. The same might be assumed by looking at SMF records, calculating the utilization of each job, and then projecting what happens if the jobs grow by X%.
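That linear projection can be sketched in a few lines. The transaction count and per-transaction CPU cost come from the example above; the one-hour measurement interval and single-engine capacity are assumptions for illustration.

```python
# A naive linear projection of CPU busy -- the assumption this article
# questions. 10,000 transactions at 0.02 CPU-seconds each are from the
# example above; the one-hour window and one-engine capacity are assumed.

TRANSACTIONS = 10_000
CPU_SEC_PER_TXN = 0.02               # CPU seconds per CICS transaction
INTERVAL_SEC = 3_600                 # one-hour measurement window (assumed)
ENGINE_CAPACITY_SEC = INTERVAL_SEC   # one engine's CPU seconds per hour

def projected_busy(growth_pct: float) -> float:
    """Linearly project CPU busy after growing the workload by growth_pct%."""
    cpu_used = TRANSACTIONS * (1 + growth_pct / 100) * CPU_SEC_PER_TXN
    return 100 * cpu_used / ENGINE_CAPACITY_SEC

# 10,000 txns * 0.02s = 200 CPU seconds, about 5.6% of one engine-hour
print(round(projected_busy(0), 1))    # 5.6
print(round(projected_busy(50), 1))   # 8.3
```

The arithmetic is trivial, and that is precisely the trap: it stays trivial all the way up the graph, while the real machine does not.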
But if your CPU is really busy, you’ll run out of usable capacity sooner than a linear projection predicts. So it’s important to understand why.
What are you waiting for?
Wait time becomes a significant part of response time as you push the processor busy, and dispatcher thrashing is also an issue. While you can push a mainframe to 100% busy and often get reasonable results, the workloads on it matter. Some workloads, like OLTP, probably aren’t as happy, because waits over the course of a transaction can add up to unacceptable response time.
The low utilization effect, seen at the low end of the CPU graph, is the result of work that takes place even when no business workload is present. There is always some demand, even if it is just the system looking for work. The low end effect is less important, as it rarely impacts performance and doesn’t affect capacity planning. Most people project CPU busy past existing numbers to see when they will run out of CPU. The high end effect means that linear regression is simply not a good approach to capacity planning if you are seeking out the processor limits.
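The shape of the high end effect can be illustrated with a textbook queueing model. A mainframe dispatcher is not literally an M/M/1 queue, so treat this as a sketch of the curve's shape rather than a model of z/OS: for service time S and utilization U, expected response time is R = S / (1 - U), which climbs gently through the middle of the graph and explodes near saturation.

```python
# Why wait time is non-linear: a simple M/M/1 queueing sketch (an
# assumption for illustration, not a model of the z/OS dispatcher).
# R = S / (1 - U): response time for service time S at utilization U.

def response_time(service_sec: float, utilization: float) -> float:
    """Expected response time (service + wait) at a given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_sec / (1 - utilization)

S = 0.02  # CPU seconds per transaction, as in the CICS example above
for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{u:.0%} busy -> {response_time(S, u) * 1000:.0f} ms")
```

Between 50% and 80% busy the response time merely doubles or triples; between 90% and 99% it multiplies by ten. That cliff is what a straight regression line through the 20-80% range cannot see.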
Dealing with multi-processors
In a multi-processor scenario, the situation gets more complex. You often get a single number that represents the average CPU busy of all processors. If work is uniformly balanced across all processors, the average isn’t a bad number. But that isn’t always the way it works. Some jobs (or workloads) can absorb an entire engine. Try running a very computationally intense workload, or even some SAS jobs; you’ll find CPU busy above 90% in many cases, but only on one processor.
This could impact performance, particularly if a workload is assigned to a specific processor, or was already running there when the CPU-soaker started up. What this means is that you want to report on CPU busy per engine whenever you can, rather than relying on averages. Otherwise, you can’t ensure that the workloads are balanced and performing as well as possible.
CPU metrics are much more complex than they were when uniprocessors were the norm and work was submitted on punch cards. We now have a 24/7 demand for online work, competing with necessary batch and management functions. Therefore, it’s more critical than ever to understand the CPU data you are gathering. Ensure that you have an accurate picture of your system.