Monday, September 30, 2013

My Partial BookShelf or shall I say "Shelfie"!

Catching segmentation fault, it's analysis & bunch of other stuff

We all know what is segmentation fault (SIGSEGV) very well, one liner definition of it would be when process writes to a memory location which isn't mapped into its address space, segmentation fault occurs. This is actually a SIGSEGV signal sent to offending process by kernel. To analyze & debug the core dump file is needed, now if due to some reasons core file is not getting generated then you can use catchsegv script provided in Linux to analyze this violation situation. This nifty little script gives us exact details when SIGSEGV signal is received.

Important excerpt from catchsegv script which does all the magic :
"$prog" ${1+"$@"} 2>&3 3>&-)
exval=$? itself generates segmentation fault if executed, but if it is preloaded using LD_PRELOAD then SIGSEGV signal can be caught on any executable following it. This library is provided by-default in any standard Linux distribution. is part of glibc package. When preloaded it basically takes the responsibility of signal handling before the main user program is invoked, if program doesn't handle the kernel signal, catches it and prints debugging information on console.

I have wrote a simple code to demonstrate this using mprotect() call. Luckily when idea about this post came into my mind I was writing a code to understand Memory Mapped I/O in Linux and its performance implication, so I decided to use it for generating segmentation fault. Following is the flow of code.
  1. A file will be present in current directory to which text content needs to be appended.
  2. Fix string to be appended to file contents.
  3. Optional query and print Kernel Page Size, as mapped pages will be aligned with this page boundary.
  4. Open file in RW permission and A mode.
  5. Initialize fstat() structure for original file size.
  6. Calculate offset for ftruncate() call by adding original size and length of string to be appended.
  7. Truncate the file to accommodate new contents.
  8. Again initialize fstat() structure with new file size.
  9. Memory map the file from beginning with new file size as RW.
  10. Put program in sleep to analyze the process memory mappings using pmap or /proc filesystem.
  11. If SEGFAULT is needed call mprotect() and make previously RW memory pages to Read Only.
  12. Again put program in sleep to analyze mappings with new permissions.
  13. Append string contents to memory file using memcpy() call at offset calculated based on initial length of file. This operation fails if mprotect() is executed i.e. memcpy() is going to write on read only memory page causing kernel to generate SIGSEGV, otherwise it will continue to next step.
  14. msync() call will flush all new changes back to disk with Synchronous I/O i.e. actual disk commit unlike Asynchronous I/O where you write to Kernel Buffer layer. Refer to my earlier post for more details.
    1. What are exactly O_DIRECT, O_SYNC Flags, Buffers & Cached in Linux-Storage I/O?
  15. msync() call doesn't generate SIGSEGV even after mprotect() is executed as it only reads memory pages.
  16. Unmap the whole region.
  17. Close file descriptor.
  18. Optionally measure timing at each stage.
Code excerpt from my program for mprotect() call :
//If user needs to generate segmentation fault, then only execute this mprotect() section
if (atoi(argv[1]) == 1)
        //Convert mapping protocol to read only
        //mmap_start          : Starting address of mapping
        //myfilestat2.st_size : Length to which mapping protocol needs to be changed, so whole pages are changed
        //PROT_READ           : New protocol is read only
        //FYI                 : Any write operation on these pages after this call will be aborted with SIGSEGV signal
        mprotect (mmap_start, myfilestat2.st_size, PROT_READ);

                if (errno)
                                printf ("Error from mprotect(): %s\n",strerror(errno));
                                return 1;
                                printf ("Mprotect() successful, mmap'ed converted to PROT_READ, SIGSEGV Imminent on memcpy()\n");
        //Print Memory Map
        system ("sleep 60");
Full source code can be downloaded from here.
Now lets execute this code and see it under catchsegv, gdb, valgrind, pmap & /proc/self/maps.
Both sleep timers will give us plenty of time to see memory mapping of file dst_file changing from RW-S to R--S.

catchsegv :

I ran segmentation fault generator under catchsegv utility and output is given below. Here you can see that memcpy()  call after mprotect() call generated the SIGSEGV as map was read only. Now a quick scan through below excerpt will reveal that "/tmp/dst_file" is mapped by mmap() in read mode(r) & shared(s), not for writing at all. This is done by mprotect() call. Now for those who don't know what is shared(s) flag on mapped pages here is a one liner explanation - it indicates that the changes made to this page are committed to the mapped file and are visible to all other processes sharing this mapping. Registers starting with R indicates this as 64 bit operation. 64 bit general purpose register R15 is holding address of starting point of dst_file mapping and accumulator register is holding a address which is in address range of dst_file. These are the registers at the time of segmentation fault. RAX stores string load & store operations, RDI stores points to destination in string operations. Explaining of all registers is all together a different topic. Hopefully you got the idea of debugging using this nifty utility.
[root@testsrv ~]# catchsegv ./sigsegv_generator 1
Appending initial file contents from src_file to dst_file using system() call
[+]cat redirection time: 0.002880 sec
Opened /tmp/dst_file for reading/writing
file desc = 3
Actual string length str=13 content to append with new line = "No Segfault
Original File size before truncate = 221
Successfully truncated file to digest new contents, new file size = 234
Mmap() successful
[+]mmap time: 0.000005 sec
Mprotect() successful, mmap'ed converted to PROT_READ, SIGSEGV Imminent on memcpy()
*** Segmentation fault
Register dump:

 RAX: 00002b12ead980dd   RBX: 000000000040146a   RCX: 000000000000004e
 RDX: 000000000000000d   RSI: 000000000040146a   RDI: 00002b12ead980dd
 RBP: 0000000000000003   R8 : 000000000040146a   R9 : 53202c444145525f
 R10: 00007fffe9b94a40   R11: 00002b12eae24530   R12: 00002b12eb12df08
 R13: 00007fffe9b94df0   R14: 00007fffe9b94de0   R15: 00002b12ead98000
 RSP: 00007fffe9b94cb8

 RIP: 00002b12eae24541   EFLAGS: 00010202

 CS: 0033   FS: 0000   GS: 0000

 Trap: 0000000e   Error: 00000006   OldMask: 00000000   CR2: ead980dd

 FPUCW: 0000037f   FPUSW: 00000000   TAG: 00000000
 RIP: 00000000   RDP: 00000000

 ST(0) 0000 000000000000003b   ST(1) 0000 0000000000000005
 ST(2) 0000 0000000000000005   ST(3) 0000 0000000000001d79
 ST(4) 0000 0000000000000000   ST(5) 0000 0000000000000000
 ST(6) 0000 0000000000000000   ST(7) 0000 0000000000000000
 mxcsr: 1fa0
 XMM0:  000000000000000000000000ff000000 XMM1:  000000000000000000000000ff000000
 XMM2:  000000000000000000000000ff000000 XMM3:  000000000000000000000000ff000000
 XMM4:  000000000000000000000000ff000000 XMM5:  000000000000000000000000ff000000
 XMM6:  000000000000000000000000ff000000 XMM7:  000000000000000000000000ff000000
 XMM8:  000000000000000000000000ff000000 XMM9:  000000000000000000000000ff000000
 XMM10: 000000000000000000000000ff000000 XMM11: 000000000000000000000000ff000000
 XMM12: 000000000000000000000000ff000000 XMM13: 000000000000000000000000ff000000
 XMM14: 000000000000000000000000ff000000 XMM15: 000000000000000000000000ff000000


Memory map:

00400000-00402000 r-xp 00000000 00:5a 25958017 /root/sigsegv_generator
00601000-00602000 rw-p 00001000 00:5a 25958017 /root/sigsegv_generator
00b56000-00b7b000 rw-p 00000000 00:00 0 [heap]
2b12ea970000-2b12ea990000 r-xp 00000000 00:5a 25300304 /lib64/
2b12ea990000-2b12ea991000 rw-p 00000000 00:00 0
2b12eab8f000-2b12eab90000 r--p 0001f000 00:5a 25300304 /lib64/
2b12eab90000-2b12eab91000 rw-p 00020000 00:5a 25300304 /lib64/
2b12eab91000-2b12eab92000 rw-p 00000000 00:00 0
2b12eab92000-2b12eab96000 r-xp 00000000 00:5a 25300249 /lib64/
2b12eab96000-2b12ead95000 ---p 00004000 00:5a 25300249 /lib64/
2b12ead95000-2b12ead96000 r--p 00003000 00:5a 25300249 /lib64/
2b12ead96000-2b12ead97000 rw-p 00004000 00:5a 25300249 /lib64/
2b12ead97000-2b12ead98000 rw-p 00000000 00:00 0
2b12ead98000-2b12ead99000 r--s 00000000 00:5a 25957962 /tmp/dst_file
2b12ead9c000-2b12eaf23000 r-xp 00000000 00:5a 25300327 /lib64/
2b12eaf23000-2b12eb123000 ---p 00187000 00:5a 25300327 /lib64/
2b12eb123000-2b12eb127000 r--p 00187000 00:5a 25300327 /lib64/
2b12eb127000-2b12eb128000 rw-p 0018b000 00:5a 25300327 /lib64/
2b12eb128000-2b12eb12f000 rw-p 00000000 00:00 0
2b12eb134000-2b12eb14a000 r-xp 00000000 00:5a 25300207 /lib64/
2b12eb14a000-2b12eb349000 ---p 00016000 00:5a 25300207 /lib64/
2b12eb349000-2b12eb34a000 rw-p 00015000 00:5a 25300207 /lib64/
7fffe9b82000-7fffe9b97000 rw-p 00000000 00:00 0 [stack]
7fffe9bfe000-7fffe9c00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[root@testsrv ~]#

gdb :

To debug above code under gdb compile using command to include DWARF debugging symbols inside the code.
gcc -O2 <<source file>> -ggdb -o <<executable name>>
I have modified the source string to be appended to a single NULL character for ease of calculation of offset in memcpy() call, source file is also containing a single character, so offset happens to be 2. Now execute the code under gdb.
gdb sigsegv_generator
Now run the executable inside gdb with argument 1.
(gdb) run 1
Now either you can use breakpoints to control the flow of execution or simply use  CTRL + C to stop at initial stage and then do step execution using n. When SIGSEGV is caught after memcpy()gdb will show following message.
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaad55541 in memcpy () from /lib64/
Above address 0x00002aaaaad55541 is the instruction on which SEGFAULT happened. Same address is also indicated by 64 bit Instruction Pointer RIP shown in below register information. Now lets see what happened before and after above memcpy() call.
(gdb) x 0x00002aaaaad55541
0x2aaaaad55541 <memcpy+17>:     0xff480f88
(gdb) x/10i 0x00002aaaaad55530
   0x2aaaaad55530 <memcpy>:     cmp    $0x20,%rdx
   0x2aaaaad55534 <memcpy+4>:   mov    %rdi,%rax
   0x2aaaaad55537 <memcpy+7>:   jae    0x2aaaaad555b0 <memcpy+128>
   0x2aaaaad55539 <memcpy+9>:   test   $0x1,%dl
   0x2aaaaad5553c <memcpy+12>:  je     0x2aaaaad55549 <memcpy+25>
   0x2aaaaad5553e <memcpy+14>:  movzbl (%rsi),%ecx
=> 0x2aaaaad55541 <memcpy+17>:  mov    %cl,(%rdi)
   0x2aaaaad55543 <memcpy+19>:  inc    %rsi
   0x2aaaaad55546 <memcpy+22>:  inc    %rdi
   0x2aaaaad55549 <memcpy+25>:  test   $0x2,%dl
Note that above assembly instruction are of AT&T syntax, I got little confused regarding Intel vs. AT&T related to MOV instruction and it appears that AT&T has MOV SRC DST format. As I am a noob in Assembly Programming I Stackoverflow'ed above query and got convincing response. Link is mentioned at end of section. Above arrow indicates the memcpy() operation.
(gdb) info proc mappings
Related excerpt from output :
Mapped address spaces:

Start Addr      End Addr        Size    Offset  objfile
0x2aaaaaacf000  0x2aaaaaad0000  0x1000  0       /tmp/dst_file
Lets dump the registers. (x86_64 machine, so expect registers starting with R)
(gdb) info registers
rax            0x2aaaaaacf002   46912496267266
rbx            0x40146a 4199530
rcx            0x69     105
rdx            0x1      1
rsi            0x40146a 4199530
rdi            0x2aaaaaacf002   46912496267266
rbp            0x7      0x7
rsp            0x7fffffffe468   0x7fffffffe468
r8             0x40146a 4199530
r9             0x53202c444145525f       5989836176067220063
r10            0x7fffffffe1f0   140737488347632
r11            0x2aaaaad55530   46912498914608
r12            0x2aaaab05eac8   46912502098632
r13            0x7fffffffe5a0   140737488348576
r14            0x7fffffffe590   140737488348560
r15            0x2aaaaaacf000   46912496267264
rip            0x2aaaaad55541   0x2aaaaad55541 <memcpy+17>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
Now we have all the inputs to make a conclusion what went wrong. As you can see from mapped address space output file dst_file starting address is 0x2aaaaaacf000, including the offset it will be 0x2aaaaaacf002. Now this offset address is present at RDI register which is used for Destination Index. RSI Source Index register is having address of fix string gets appends to the offset location. What "mov %cl,(%rdi)" sentence does is that it transfers contents at source address in CL register to content at offset address at RDI register. It is using register indirect addressing here to do so. CL register is low 8 bits of ECX register which in turn is low 32 bit of 64 bit RCX register. This ECX register is used to perform string operation. memcpy() call comes from string library family. Instruction "movzbl (%rsi),%ecx" just before memcpy() loads source address into ECX register, movzbl instruction performs operation on low 8 bits of address. So as RDI address is Read Only due to mprotect() call, write operation causes illegal memory access and causes kernel to send SIGSEGV to process.

Note 1 : To switch to Intel Assembly syntax, use following in gdb.
set disassembly-flavor intel
Stackoverflow link -

Note 2 : A very good way to analyze assembly code in gdb is to do it along with C code. So that each statement is properly implicated to assembly. To do this use following command in gdb just after memcpy() call.
(gdb) disassemble /m

pmap :

I have sleep() calls in program to give us time to analyze the process mappings using pmap command. After executing the code, it will sleep for 60 seconds after mmap() call, in this time we will observe that before the mprotect() call the dst_file mapping was RW and it changes to R afterwards. Use following command to see the process mappings.
pmap <<PID>> -x
RW mapping :
[root@testsrv ~]# pmap 1997 -x
1997:   ./sigsegv_generator 1
Address           Kbytes     RSS   Dirty Mode   Mapping
00002b6c7ad08000       4       0       0 rw-s-  dst_file
[root@testsrv ~]#
Same mapping R only :
[root@testsrv ~]# pmap 1997 -x
1997:   ./sigsegv_generator 1
Address           Kbytes     RSS   Dirty Mode   Mapping
00002b6c7ad08000       4       0       0 r--s-  dst_file
[root@testsrv ~]#

Note : "S" indicates that the map is shared, "P" will indicate it is private.

Extension to above section for more detailed reporting is to use /proc/<<pid>>/smaps. Here we can see Dirty section and other bunch of relevant stuff related to performance of memcpy().

RW mapping :
2b8571cf0000-2b8571cf1000 rw-s 00000000 00:5a 25957962                   /tmp/dst_file
Size:                  4 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
For Read only map, above information is the same except permissions which becomes r--s.

Valgrind :

Valgrind may not be completely useful here to debug illegal access to memory in our case, but it does report the call on which SEGFAULT occurred. Command line to use.
valgrind -v ./<<executable>> <<args>>
Output at Segfault :
--2064-- REDIR: 0x4eb5530 (memcpy) redirected to 0x4c28ec0 (memcpy)
==2064== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2064==  Bad permissions for mapped region at address 0x4022005
==2064==    at 0x4C2914F: memcpy (mc_replace_strmem.c:628)
==2064==    by 0x400CC5: main (mprotect_sigsegv.c:173)
--2064-- REDIR: 0x4ea77f0 (free) redirected to 0x4c26890 (free)
Output of Valgrind is pretty much self explanatory. There are no leaks detected.

So that's it for now, I learned a ton of new stuff. It's really been a long post which I didn't intended too but no worries this is the good stuff. Keep debugging.

Sunday, September 15, 2013

Compiling a strict x86 executable on x86_64 architecture kernel

A quick post, recently I felt a need to compile a strict 32-bit binary executable code to test abnormal behavior of CUDA C source code. I wanted the host code of CUDA to be compiled as x86 architecture on RHEL 6.0 x86_64 OS. CUDA compiler nvcc uses gcc to link and compile in the back-end with added inputs, you can easily verify this by looking at file in Nvidia Computing SDK.

First make sure to create object files for elf32-i386 architecture. x86_64 system uses 64-bit ELF (Executable and Linkable Format) as default. GCC on x86_64 refers to machine architecture as x86_64 by default if not specified, so there is a "-m64" implicit flag. You can verify this by using verbosity flag in gcc while compiling. COLLECT_GCC_OPTIONS doesn't indicate "-march=1686" as in case of "-m32" explicit flag. GCC provides options as "-m<<32 or 64>>" for specifying the architecture.

"gcc -v test.c" results
"gcc -m32 -v test.c" results
Remember if you are going to generate object files first and then going to link it, those object files must be in elf32-i386 architecture too. If you try to link elf64-x86-64 object files with elf32-i386 compilation following error will occur.

Linking x86_64 object files to x86 linker
Object files can be verified for ELF architecture by using either objdump or file command.

ELF 32-bit format of object file
ELF64-x86-64 format of object file
You might have noticed additional ELF section .eh_frame in x86_64 ELF header of object file. GCC doesn't seem to include this section in object file creation for x86 machine architecture. However if you go ahead and create executable out of this, objdump reveals that both x86 & x86_64 executables have .eh-frame & .eh_frame_hdr in them. This is strange & I want to dig more when I will have time. Anyways .eh_frame contains information for frame unwinding during exception handling & .eh_frame_hdr contains a pointer to the .eh_frame section. The concept of Call Stack Unwinding is hard to describe in this short post, though I will try to explain in short in next section. For detailed Information refer to following Linux Foundation documentation & supporting links which I found useful.

Linux Foundation - Linux Standard Base Core Specification 4.1

SCO SYSTEM V Application Binary Interface

ELF Specification for x86

ELF64 Object File Format

Good Call Stack Reference for C

A call stack is mainly used for keeping track of the point to which control should return after execution of subroutine or in general a called function. In general, if you guys are families with push and pop operations, adding a subroutine's entry to the call stack i.e. push operation is called as Winding & removing entries from call stack is obviously called as Unwinding. Unwinding transfers control out of nested subroutine to previous subroutine which in turn restores proper context after popping many stack frames to assist error handling. In other words if exception handling is enabled, then when run-time exception is thrown the stack unwound until an exception handler is matched. So as we can see elf64-x86_64 by default adds exception handling headers to the object file, however the specification document doesn't clearly mention this & I assumed it to be so. Remember that exception handling for C & C++ is not the same. I would suggest you to read more about this on your own. One more important tool in Linux to dig around ELF is readelf.

On the side note, if you got following error while compiling x86 executable on x86_64 system, maybe you need to install glibc-devel-i386 package compatible with your existing glibc version.