Sunday, July 3, 2011

Code Profiling in Linux Using Gprof

Today, performance is a pivotal point in the programming world. Programmers constantly strive to make their code run in fewer milliseconds. Performance analysis can be done in various ways, static as well as dynamic. One way to optimise code is to dynamically analyse it for call-time and execution delays in order to find the bottlenecks in the execution. In Linux, Gprof can help to profile the code you compiled by using GCC.

Complete Code with compiled files can be downloaded from here.

To explain code profiling in Linux for C and C++ code in GCC, I developed a small C program for function calling. This shows the usage of Gprof, the GCC profiling program, and its reporting pattern. Profiling is used in Linux to improve code performance by analyzing call times and call chains involved in the operation. You can find out the time taken by functions (which function code took a long time to run) and this can be very useful to identify bottlenecks. There are two types of profiling -- the flat or the call-graph variety.

Environment and prerequisites
For the purpose of this article, I am using Ubuntu Server 9.10 in Oracle VirtualBox, with the following VM characteristics:

  • Virtual hardware: Single CPU, 2000 MB RAM, I/O APIC enabled; Acceleration: VT-X, Nested Paging, PAE/NX; Virtual hard disk: 8 GB on SATA controller
  • Software/OS: Linux 2.6.31-14-generic-pae, Ubuntu; GCC 4.4.1 (Ubuntu 4.4.1-4ubuntu9); SAR utility for CPU reporting (sysstat package); HTOP utility
The results may vary in different scenarios. I would recommend running this code in a highly isolated environment, as it can exhaust the CPU in no time. The code is pretty basic and relies only on basic mathematical operations, with a few system calls for the MD5 sum generation. The internal calls of the functions are counted by global counters to verify with the Gprof calls.

The code
A brief explanation about the code: First, create the files to append the hashes and random number. ‘p’  is the integer assigned; the value of ‘p’ must be less than 10000. p++ increases the value stored at the ‘p’ pointer by 1. func1() is called if the value lies within 9600 and 10000; i.e., func1() is called 399 times. func3() is called for the remaining ‘p’ values, i.e. 9601.

In func1(), ‘j’ is incremented. When ‘j’ reaches 66668, it increases the global counter1 to keep track of function calls. In func2(), ‘m’ is incremented in the for loop. The filegen() function is called thrice. Counter2 is incremented to track the filegen() function. Otherwise, it returns to func1().

In filegen(), tempc.txt is created and opened in append mode; 4000 values are added to this file in every filegen() call. Then the tempc.txt file is hashed, and this hash stored in hashes.txt. So for every single call of func2(), three calls of filegen() are made—3 MD5 hashes are computed for every increment of p. MD5 hashing is used to keep the CPU busy. func3() comprises of the ‘l’ variable, which is incremented to 50000, and will increment counter3 every time to track the calls. Every function has different time consumptions.
/*
  Name :Subodh Pachghare
  Author :I am the author!!!
  Description :Function calling test program for Gprof
*/
#include <stdio.h>
int counter1=0;                                  /*Global Variable*/
int counter2=0;                                  /*Global Variable*/
int counter3=0;                                  /*Global Variable*/
int main()
{
int p=0;
system("> /root/hashes.txt");           /*File will be overwritten or Created*/
system("> /root/tempc.txt");             /*File will be overwritten or Created*/
printf("Initial value in Main function= %d\n",p);
while(p<10000)                    /*Condition for while loop*/
            {
                        if(p>9600 && p<10000)
                                    {
                                    func1();
                                    }
                        else
                                    {
                                    func3();
                                    }
                        p++;                           
            }
printf("Last value in Main function (value(p) incremented for) = %d\n",p);
printf("Func1(),Func2(),Func3() & filegen() called for %d %d %d %d times\n",counter1,counter1,counter3,counter2);
}
func1()
{
int j=0;
while(j<=66667)                               /*Another operation to eat CPU cycles*/
            j++;
if(j==66668)                                      
            {
            counter1++;
            }
func2();                                   /*Call to Function func2()*/
}
func2()
{
int m=0;
for (m=1;m<1000000;m++)
            {
            if(m<4)
                        {
                                    filegen();         /*Call to Function filegen()*/
                                    counter2++;
                        }
            else
                        {
                                    return 0;          /*return to the func1()*/
                        }
            }
}
filegen()
{
FILE *fo,*fg;
int f=0;
fo=fopen("/root/tempc.txt","a");
while(f<4000)
            {
                        fprintf(fo,"%d\n",f);
                        f++;
            }
fclose(fo);
system("md5sum /root/tempc.txt >> /root/hashes.txt"); /*Unique MD5 hash generation*/
}
func3()
{
int l=0,b=0;
while(l<=50000)
            {
                        b=l;
                        l++;
            }
if(l==50001)
            {
            counter3++;
            }
}

Killing code before completion: Note that this code cannot be interrupted with Ctrl-C due to presence of system call. So if you want to stop execution in between, use kill -9 `pidof a.out` from another terminal to send a SIGKILL signal.

Reporting CPU activity
To report CPU utilisation in user-space and kernel-space while executing the code, use the SAR (system activity report) from the sysstat package, which is my favourite.

sar –u 2 400

Here, ‘-u’ is for CPU reporting; ‘2’ implies a 2-second interval; and ‘400’ is to run it 400 times.

Surprisingly, the code eats lots of cycles for kernel-space, as compared to user-space.

Compile the code using the –pg option to include the profiler code in the executable, as follows:

gcc –pg gprof_test_code_svp.c

This will generate an a.out executable file. You can specify another output filename with ‘-o’. Now run the binary with ./a.out . After executing the code, the following output will be generated:

root@ubuntu:~# gcc -pg gprof_test_code_svp.c
root@ubuntu:~# ./a.out
Initial value in Main function= 0
Last value in Main function (value(p) incremented for) = 10000
Func1(),Func2(),Func3() & filegen() called for 399 399 9601 1197 times
root@ubuntu:~#

Flat-profile generation
After successful execution of the program, it will generate the file gmon.out. Use Gprof on this file to generate the graph: 

gprof -p a.out gmon.out 

Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative self self total          
 time seconds seconds calls us/call us/call name   
 97.98 0.97 0.97 9601 101.03 101.03 func3
  2.02 0.99 0.02 399 50.13 50.13 func2
  0.00 0.99 0.00 1197 0.00 0.00 filegen
  0.00 0.99 0.00 399 0.00 50.13 func1
<output snipped>

This generates flat-profile analytics, where time calls are given for comparison. Indications from this profile are:
  • The program took 1.46 minutes of clock time to complete. From the flat-profile, func3 consumed 0.97 seconds per call and was called 9601 times; around 101 milliseconds are spent in this function for every call to func3. The same values in the self us/call and total us/call field indicate that the complete time was spent on the function's own operations, and not on the children functions.
  • In func1, which was the last entry presented in the profile, there is practically no time spent in itself (self us/call) but as it calls func2, most of its time is spent there. Hence, the total us/call field shows 50.13 milliseconds—the same as that of func2.
  • In func2, the self us/call and total us/call field are the same—the time is spent in doing self operations.
  • There is practically no time spent in the filegen function according to Gprof, as the md5sum is a system call, which is not presented in Gprof.
So, using this, we can clearly look for bottlenecks present in the code consuming CPU time.

Call-graph profile generation
To generate a call-graph profile of the functions used in the program, use gprof with the ‘q’ switch:

gprof -q a.out gmon.out

                             Call graph (explanation follows)

granularity: each sample hit covers 4 byte(s) for 1.01% of 0.99 seconds

index % time self children called name
                                                 <spontaneous>
[1]    100.0 0.00 0.99 main [1]
                0.97 0.00 9601/9601 func3 [2]
                0.00 0.02 399/399 func1 [3]
-----------------------------------------------
                0.97 0.00 9601/9601 main [1]
[2]     98.0 0.97 0.00 9601 func3 [2]
-----------------------------------------------
                0.00 0.02 399/399 main [1]
[3]      2.0 0.00 0.02 399 func1 [3]
                0.02 0.00 399/399 func2 [4]
-----------------------------------------------
                0.02 0.00 399/399 func1 [3]
[4]      2.0 0.02 0.00 399 func2 [4]
                0.00 0.00 1197/1197 filegen [5]
-----------------------------------------------
                0.00 0.00 1197/1197 func2 [4]
[5]      0.0 0.00 0.00 1197 filegen [5]
-----------------------------------------------

<Output Snipped>

Indications from call-graph are:
  • The number after the name indicates the index numbers. So we can see that main spends 0.99 seconds in the children waiting for func3 to complete, which is in the children field. This is substantiated by the 0.97 seconds spent per call in the self field of func3. This is again followed up by func1.
  • Every entry has the function name in the middle of it with the same index number; i.e., main with value 1 in index 1 field. So you can easily see the cycle for it, in Index 1 field kernel operations calling main, then main calling func3 and so on. The same follows for the rest of the index.
  • In Index 3, func1 is called by main with 399 calls; spent 0.02 time in children process (this is substantiated by 0.02 in func2 call). func3 is directly executed from main.
  • The Middle line in each section is referenced by unique index number. Above the line indicates originating function for the function given in Index Number & below the line indicates the process call forwarded to the next function. For example, In Index 3 section, the call is originated from main() function to func1() & then to func2(). Here func1() has index number 3.
  • <spontaneous> indicates that the parent function cannot be determined, generally due to system-related APIs.
  • A cumulative time recording implies that 98 per cent of the total time was spent in performing func3 operations, as per index 2.
Function calls substantiate that the filegen() function is called thrice for a single call of func2(). func1() and func2() are called for the same time; the counter values present in the code prove this. Counters are displayed after successful execution of the code. func1() and func2() functions comprise the same timings for calculations. func3 is called for 9601 times. The call-graph gives the call-tree data.

Source code annotations in profiling
To generate annotations in the source code presented, the source code has to be included in the executable while compiling code, using the ‘g’ switch:

gcc gprof_test_code_svp.c –g -pg -lc

Now run the code normally, and use the source code highlighting option ‘-A’ in Gprof to have annotations marked in the source code. This gives line-by-line profiling, which is easy to understand.

gprof a.out gmon.out -A


*** File /root/gprof_test_code_svp.c:
                /*
                  Name :Subodh Pachghare
                  Author :I am the author!!!
                  Description :Function calling test program for gprof
                */
                #include <stdio.h>
                int counter1=0;                              /*Global Variable*/
                int counter2=0;                              /*Global Variable*/
                int counter3=0;                              /*Global Variable*/
                int main()
       ##### -> {
                int p=0;
                system("> /root/hashes.txt");       /*File will be overwritten or Created*/
                system("> /root/tempc.txt");         /*File will be overwritten or Created*/
                printf("Initial value in Main function= %d\n",p);
                while(p<10000)                /*Condition for while loop*/
                        {
                                    if(p>9600 && p<10000)
                                                {
                                                func1();
                                                }
                                    else
                                                {
                                                func3();
                                                }
                                    p++;                           
                        }
                printf("Last value in Main function (value(p) incremented for) = %d\n",p);
                printf("Func1(),Func2(),Func3() & filegen() called for %d %d %d %d times\n",counter1,counter1,counter3,counter2);
                }
                func1()
         399 -> {
                int j=0;
                while(j<=66667)                           /*Another operation to eat CPU cycles*/
                        j++;
                if(j==66668)                                              
                        {
                        counter1++;
                        }
                func2();                               /*Call to Fucntion func2()*/
                }
                func2()
         399 -> {
                int m=0;
                for (m=1;m<1000000;m++)
                        {
                        if(m<4)
                                    {
                                                filegen();         /*Call to Function filegen()*/
                                                counter2++;
                                    }
                        else
                                    {
                                                return 0;          /*return to the func1()*/
                                    }
                        }
                }
                filegen()
        1197 -> {
                FILE *fo,*fg;
                int f=0;
                fo=fopen("/root/tempc.txt","a");
                while(f<4000)
                        {
                                    fprintf(fo,"%d\n",f);
                                    f++;
                        }
                fclose(fo);
                system("md5sum /root/tempc.txt >> /root/hashes.txt");/*Unique MD5 hash generation*/
                }
                func3()
        9601 -> {
                int l=0,b=0;
                while(l<=50000)
                        {
                                    b=l;
                                    l++;
                        }
                if(l==50001)
                        {
                        counter3++;
                        }
                }

Top 10 Lines:

     Line Count

       72 9601
       59 1197
       32 399
       43 399

Execution Summary:

        5 Executable lines in this file
        5 Lines executed
   100.00 Percent of the file executed

    11596 Total number of line executions
  2319.20 Average executions per line

The timing presented is by user-space; so the system calls generally spend far less time in user-space as compared to kernel-space. That’s why we see the 0.00 in the Time collection of Gprof. From the SAR (system activity report) snapshot attached, it can be seen that system space is using around 94 per cent of the CPU processing, as compared to 5-6 per cent of the user space.

That’s it, guys; happy coding! I will be back with some more cool stuff.

References
‘Gprof: a Call Graph Execution Profiler’ by Susan L Graham, Peter B Kessler, Marshall K McKusick

Note : - This Article is posted by me in the Linux for you Magazine for June 2011 issue. Linux for you is the leading Open-source & Linux magazine in the Asia Region.

No comments:

Post a Comment