The problem

What happened to ROP?

Ah, the classic overflow challenge. By now most people are familiar with this style of exploit: there's some buffer on the stack, and you can provide more data than has been allocated for it, so the extra bytes spill over into whatever sits next to it, including the saved return address. And because it's so well known, so are the techniques for exploiting it. Take the simple program below.

// gcc demo.c -o demo -no-pie -fno-stack-protector
#include <stdio.h>

int main() {
	char buf[0x20];
	puts("ROP me if you can!");
	gets(buf);
}

It's obvious that we can overflow buf, so from here we'd typically use the classic ret2plt attack, where we:

  • Use puts to leak a GOT entry

  • Return to main

  • Call system("/bin/sh")

First, we're going to need a pop rdi ; ret gadget to control rdi (the register holding the first argument on x86-64), so let's run ROPgadget.

$ ROPgadget --binary demo
Gadgets information
============================================================
0x00000000004010ab : add bh, bh ; loopne 0x401115 ; nop ; ret
0x0000000000401037 : add byte ptr [rax], al ; add byte ptr [rax], al ; jmp 0x401020
0x000000000040115f : add byte ptr [rax], al ; add byte ptr [rax], al ; leave ; ret
0x0000000000401078 : add byte ptr [rax], al ; add byte ptr [rax], al ; nop dword ptr [rax] ; ret
0x0000000000401160 : add byte ptr [rax], al ; add cl, cl ; ret
0x000000000040111a : add byte ptr [rax], al ; add dword ptr [rbp - 0x3d], ebx ; nop ; ret
0x0000000000401039 : add byte ptr [rax], al ; jmp 0x401020
0x0000000000401161 : add byte ptr [rax], al ; leave ; ret
0x000000000040107a : add byte ptr [rax], al ; nop dword ptr [rax] ; ret
0x0000000000401034 : add byte ptr [rax], al ; push 0 ; jmp 0x401020
0x0000000000401044 : add byte ptr [rax], al ; push 1 ; jmp 0x401020
0x0000000000401009 : add byte ptr [rax], al ; test rax, rax ; je 0x401012 ; call rax
0x000000000040111b : add byte ptr [rcx], al ; pop rbp ; ret
0x0000000000401162 : add cl, cl ; ret
0x00000000004010aa : add dil, dil ; loopne 0x401115 ; nop ; ret
0x0000000000401047 : add dword ptr [rax], eax ; add byte ptr [rax], al ; jmp 0x401020
0x000000000040111c : add dword ptr [rbp - 0x3d], ebx ; nop ; ret
0x0000000000401117 : add eax, 0x2f03 ; add dword ptr [rbp - 0x3d], ebx ; nop ; ret
0x0000000000401118 : add ebp, dword ptr [rdi] ; add byte ptr [rax], al ; add dword ptr [rbp - 0x3d], ebx ; nop ; ret
0x0000000000401013 : add esp, 8 ; ret
0x0000000000401012 : add rsp, 8 ; ret
0x00000000004010a8 : and byte ptr [rax + 0x40], al ; add bh, bh ; loopne 0x401115 ; nop ; ret
0x0000000000401010 : call rax
0x0000000000401133 : cli ; jmp 0x4010c0
0x0000000000401130 : endbr64 ; jmp 0x4010c0
0x000000000040100e : je 0x401012 ; call rax
0x00000000004010a5 : je 0x4010b0 ; mov edi, 0x404020 ; jmp rax
0x00000000004010e7 : je 0x4010f0 ; mov edi, 0x404020 ; jmp rax
0x000000000040103b : jmp 0x401020
0x0000000000401134 : jmp 0x4010c0
0x00000000004010ac : jmp rax
0x0000000000401163 : leave ; ret
0x00000000004010ad : loopne 0x401115 ; nop ; ret
0x0000000000401116 : mov byte ptr [rip + 0x2f03], 1 ; pop rbp ; ret
0x000000000040115e : mov eax, 0 ; leave ; ret
0x00000000004010a7 : mov edi, 0x404020 ; jmp rax
0x00000000004010af : nop ; ret
0x000000000040112c : nop dword ptr [rax] ; endbr64 ; jmp 0x4010c0
0x000000000040107c : nop dword ptr [rax] ; ret
0x00000000004010a6 : or dword ptr [rdi + 0x404020], edi ; jmp rax
0x000000000040111d : pop rbp ; ret
0x0000000000401036 : push 0 ; jmp 0x401020
0x0000000000401046 : push 1 ; jmp 0x401020
0x0000000000401016 : ret
0x0000000000401042 : ret 0x2f
0x0000000000401022 : retf 0x2f
0x000000000040100d : sal byte ptr [rdx + rax - 1], 0xd0 ; add rsp, 8 ; ret
0x0000000000401169 : sub esp, 8 ; add rsp, 8 ; ret
0x0000000000401168 : sub rsp, 8 ; add rsp, 8 ; ret
0x000000000040100c : test eax, eax ; je 0x401012 ; call rax
0x00000000004010a3 : test eax, eax ; je 0x4010b0 ; mov edi, 0x404020 ; jmp rax
0x00000000004010e5 : test eax, eax ; je 0x4010f0 ; mov edi, 0x404020 ; jmp rax
0x000000000040100b : test rax, rax ; je 0x401012 ; call rax

Unique gadgets found: 53

Wait a second. Where's pop rdi ; ret? It should be here somewhere, right??? Well, actually, a lot of the usual gadgets are missing, such as:

  • pop rdi ; ret

  • pop rsi ; pop r15 ; ret

  • pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret

So what's going on? Let's investigate, shall we?

Where does pop rdi ; ret come from?

Where did it come from, and where did it go? Where did it come from, Cotton-Eye Joe?

To find out, let's take a binary that does have this gadget (and hasn't been stripped). There are plenty to choose from across countless CTFs; I chose Hack The Box's ropme. Running ROPgadget:

$ ROPgadget --binary ropme
Gadgets information
============================================================
...
0x00000000004006d3 : pop rdi ; ret
...

Let's see where this lies in the binary.

pwndbg> x/2i 0x00000000004006d3
   0x4006d3 <__libc_csu_init+99>:	pop    rdi
   0x4006d4 <__libc_csu_init+100>:	ret

So it seems our beloved gadget belongs to a function called __libc_csu_init.

pwndbg> disassemble __libc_csu_init
Dump of assembler code for function __libc_csu_init:
   0x0000000000400670 <+0>:	push   r15
   0x0000000000400672 <+2>:	push   r14
   0x0000000000400674 <+4>:	mov    r15d,edi
   0x0000000000400677 <+7>:	push   r13
   0x0000000000400679 <+9>:	push   r12
   0x000000000040067b <+11>:	lea    r12,[rip+0x20078e]        # 0x600e10
   0x0000000000400682 <+18>:	push   rbp
   0x0000000000400683 <+19>:	lea    rbp,[rip+0x20078e]        # 0x600e18
   0x000000000040068a <+26>:	push   rbx
   0x000000000040068b <+27>:	mov    r14,rsi
   0x000000000040068e <+30>:	mov    r13,rdx
   0x0000000000400691 <+33>:	sub    rbp,r12
   0x0000000000400694 <+36>:	sub    rsp,0x8
   0x0000000000400698 <+40>:	sar    rbp,0x3
   0x000000000040069c <+44>:	call   0x4004b0 <_init>
   0x00000000004006a1 <+49>:	test   rbp,rbp
   0x00000000004006a4 <+52>:	je     0x4006c6 <__libc_csu_init+86>
   0x00000000004006a6 <+54>:	xor    ebx,ebx
   0x00000000004006a8 <+56>:	nop    DWORD PTR [rax+rax*1+0x0]
   0x00000000004006b0 <+64>:	mov    rdx,r13
   0x00000000004006b3 <+67>:	mov    rsi,r14
   0x00000000004006b6 <+70>:	mov    edi,r15d
   0x00000000004006b9 <+73>:	call   QWORD PTR [r12+rbx*8]
   0x00000000004006bd <+77>:	add    rbx,0x1
   0x00000000004006c1 <+81>:	cmp    rbx,rbp
   0x00000000004006c4 <+84>:	jne    0x4006b0 <__libc_csu_init+64>
   0x00000000004006c6 <+86>:	add    rsp,0x8
   0x00000000004006ca <+90>:	pop    rbx
   0x00000000004006cb <+91>:	pop    rbp
   0x00000000004006cc <+92>:	pop    r12
   0x00000000004006ce <+94>:	pop    r13
   0x00000000004006d0 <+96>:	pop    r14
   0x00000000004006d2 <+98>:	pop    r15
   0x00000000004006d4 <+100>:	ret
End of assembler dump.

Interestingly, the function's disassembly doesn't contain pop rdi ; ret at all. That's because pop rdi ; ret doesn't appear here as a regular instruction; instead, it comes from jumping into the middle of another instruction, specifically pop r15. Notice that the disassembly goes straight from __libc_csu_init+98 to __libc_csu_init+100, yet pop rdi ; ret sits at __libc_csu_init+99.

Quirk of x86

pop r15 ; ret = 41 5f c3
                   ~~~~~

pop rdi ; ret = 5f c3
                ~~~~~

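The above is how these instructions get assembled. Notice that pop r15 ; ret is one byte longer, but its last 2 bytes are the same as pop rdi ; ret. That's no coincidence: pop reg is encoded as 0x58 plus a 3-bit register number, and rdi is register 7, giving 5f. Since 3 bits can only name 8 registers, r8 to r15 are reached with a REX prefix that has the B bit set (0x41), which adds 8 to the register number, so 41 5f means pop r15. And because x86 instructions are variable-length (unlike fixed-width architectures such as ARM's AArch64), the CPU will happily start decoding from any byte, even one in the middle of an instruction. So we can take the address of pop r15 ; ret, add 1 to skip the REX prefix, and get pop rdi ; ret.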

pwndbg> x/2i 0x00000000004006d2
   0x4006d2 <__libc_csu_init+98>:	pop    r15
   0x4006d4 <__libc_csu_init+100>:	ret
pwndbg> x/2i 0x00000000004006d2+1
   0x4006d3 <__libc_csu_init+99>:	pop    rdi
   0x4006d4 <__libc_csu_init+100>:	ret

So wherever there's a pop r15 ; ret, there's also a pop rdi ; ret. And since __libc_csu_init always ends with pop r15 ; ret, every binary containing __libc_csu_init has pop rdi ; ret!

Where did pop rdi ; ret go?

So if every binary containing __libc_csu_init has the pop rdi gadget, what's going on with the demo binary?

pwndbg> info functions
All defined functions:

Non-debugging symbols:
0x0000000000401000  _init
0x0000000000401030  puts@plt
0x0000000000401040  gets@plt
0x0000000000401050  _start
0x0000000000401080  _dl_relocate_static_pie
0x0000000000401090  deregister_tm_clones
0x00000000004010c0  register_tm_clones
0x0000000000401100  __do_global_dtors_aux
0x0000000000401130  frame_dummy
0x0000000000401136  main
0x0000000000401168  _fini

Aha! In the demo binary, __libc_csu_init is not present! But why is that?

Well, recently, glibc 2.34 introduced a patch that stopped __libc_csu_init from being compiled into binaries. The patch was designed to remove gadgets that made ret2csu possible, and it has the effect of removing pop rdi ; ret from binaries compiled against glibc 2.34+.

Side note on __libc_start_main

This changed a few other things too, such as __libc_start_main, which takes __libc_csu_init as an argument and used to run it. Now that __libc_csu_init doesn't exist, __libc_start_main still takes the argument but does nothing with it. Since its behaviour changed, it had to be given a new symbol version for 2.34. This means you can't run binaries compiled against glibc 2.34+ on older glibc versions; if you try, you get the very annoying error:

/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found

So you have this patch to thank for this :)

Sooo what now?

So has this completely killed ROP on modern binaries? Has our time spent practising and mastering the art of ROP all been for nothing, and we'll have to move on to other types of exploits entirely?

Whoa, slow down! I'd argue that it hasn't. While it has thrown a wrench into how we do ROP, there are still tricks we can use to get around it, which I'll showcase in the following sections.

Other sources of pop rdi ; ret

For one thing, __libc_csu_init isn't our only source of pop rdi ; ret. Recall that wherever there's pop r15 ; ret, there's also pop rdi ; ret. But why does pop r15 ; ret show up in __libc_csu_init in the first place?

Normally, when we compile without optimization, a function's local variables are stored on the stack. But if we compile with optimization flags (or declare a variable with the register keyword), some variables can be kept in registers instead. The registers that matter here are the callee-saved ones from the System V x86-64 ABI: rbx, rbp, r12, r13, r14 and r15. A function that wants to use one of them must save its old value first and restore it before returning, because the caller is allowed to assume these registers survive the call unchanged, and clobbering them would break the caller. In practice, this means pushing them onto the stack at the start of the function and popping them at the end. If you look back at __libc_csu_init's disassembly above, it follows exactly this pattern, because it's compiled with optimization.

So if your binary is compiled with optimization, there's a chance that some function keeps a variable in r15. That means r15 must be pushed and, more importantly, popped, which typically leaves a pop r15 ; ret in the function's epilogue, and therefore a pop rdi ; ret.

Unfortunately, from my testing, r15 seems to be one of the last registers the compiler reaches for, so a function would likely need quite a few register variables before r15 gets used.

glibc itself, on the other hand, practically always contains pop rdi ; ret: it's compiled with optimization and has so many functions with lots of variables stored in registers that r15 is basically guaranteed to be used in at least one of them.

This means that once you have a libc leak, you'll always be able to find a pop rdi ; ret gadget. However, if all you have is an overflow and no leak, the missing gadget is still a problem.

Summary

So if your binary doesn't have pop rdi ; ret, then you have a few approaches:

  • Get a libc leak using a different bug

  • Control rdi some other way

  • Use the overflow itself to leak libc

In the following sections, I'll show some tricks you can use to do the latter 2 approaches :)
