format-string
`format-string` was the easy pwn challenge from this CTF, which I unfortunately only managed to solve an hour after the CTF concluded, because I was focused on another pwn challenge, `corchat_v3`. While the description claims you'll learn nothing, I would argue that the mechanisms that make this challenge solvable at all are interesting, even if they rely on a fair amount of luck.
Running `pwn checksec` reveals we have maximum protections (minus `FORTIFY`).
We're also given the source code and, given the name of the challenge, it's easy to spot that the problem is a `printf` vulnerability in the aptly named `do_printf`.
It uses `scanf("%3s", buf)` to read in the format string, meaning we have at most 3 characters (plus a null byte) to construct a format string. This is barely enough to do anything meaningful, and immediately rules out anything like `%n` overwrites or `%[num]$p` leaks.
We also have `do_call`, which prompts us for the memory address of a function to call with `/bin/sh`. The obvious idea here is to pass it `system` and get our shell, but for that we'd need a libc leak.
So the plan would seem obvious:
1. Use `do_printf` to leak libc
2. Use `do_call` to get a shell
So with the size restriction of 3 characters, what do we have available?
- We only have enough room for one format specifier
- Any of the regular specifiers of the form `%[specifier]` (here's a list of specifiers)
- We can pad specifiers: `%[0-9][specifier]`
- We can use variable width specifiers: `%*[specifier]`
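As a quick sanity check on the list above, here is a minimal sketch that filters a handful of candidate format strings by the 3-character limit imposed by `scanf("%3s", buf)`. The candidate list and the `fits` helper are made up for illustration and are not part of the challenge.

```python
# The 3-character limit comes from scanf("%3s", buf) in the challenge source.
MAX_LEN = 3

# Hypothetical candidates, chosen only to illustrate what does and doesn't fit.
candidates = ["%p", "%s", "%9x", "%*x", "%lf", "%Lf", "%1$p", "%7$n", "%p%p"]


def fits(fmt: str, limit: int = MAX_LEN) -> bool:
    """Return True if the format string fits in the scanf-limited input."""
    return len(fmt) <= limit


usable = [f for f in candidates if fits(f)]
rejected = [f for f in candidates if not fits(f)]

print("usable:  ", usable)
print("rejected:", rejected)
```

Positional specifiers such as `%1$p` need at least 4 characters, which is why they drop out.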
The description tells us that it's a common specifier, so let's try `%p`:
`%p` (and most specifiers that we can use) will access the first argument (besides the format string), so it looks at `rsi`. In our case, it just so happens that `rsi` is set to some stack address after the first `printf("Here: ")`, which also interestingly points to the string `Here: ` (I'm sure that won't be significant later).
So we can get a stack leak at the very least, but this doesn't help much. `NX` is enabled, so we can't jump to the stack, but even if we could, we'd struggle to write any shellcode due to the small buffer size.
Many of the remaining specifiers aren't very useful either, as most of them are just different ways of printing integers, so they'd only print that same stack address back to us in a different form.
`%s` doesn't seem that useful at first either, as it would just print `Here: ` back to us.
One idea I had, however, was floating point specifiers like `%f`. These are interesting because they access different arguments: they don't use `rsi`. To see this, let's have a closer look at printf:
Not much to see here, except for the use of variadic arguments using `va_list`. You'll probably be familiar with the fact that `va_list` allows for an arbitrary number of arguments. It stores the argument registers, plus a pointer to the stack arguments. But it doesn't just store the general purpose (gp) registers (`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`), it can also store the floating point (fp) registers (`xmm0`-`xmm7`):
Here we see that the fp registers are only saved to `[rsp+0x50]` if `al != 0`, because in a call to a variadic function, `al` contains the number of fp registers used as arguments (strictly, an upper bound on it), so if any fp registers are used, we should save them.
Since the program doesn't use floating point arguments, it sets `al` to `0` both times, which means that section of memory is left uninitialised and, if we're lucky, could contain a libc address.
And it seems like we may have been lucky? If we can use a floating point specifier which can cover the full 16 bytes, we may be able to leak it as a floating point number, then convert it back. Unfortunately there is no such specifier.
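To make the "leak it as a floating point number, then convert it back" idea concrete, here is a minimal sketch of the bit-level round trip between a 64-bit value and an IEEE-754 double, using Python's `struct` module. The address is made up, and this only shows the conversion itself, not a working leak.

```python
import struct


def u64_to_double(bits: int) -> float:
    """Reinterpret an unsigned 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def double_to_u64(value: float) -> int:
    """Reinterpret the 8 bytes of an IEEE-754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Made-up userland-style address, purely for illustration.
fake_addr = 0x00007F1234567890

as_double = u64_to_double(fake_addr)
recovered = double_to_u64(as_double)

print(hex(recovered))
print("%f" % as_double)
```

One caveat this surfaces: a userland address has its top 16 bits clear, so the exponent field is zero and the value is a subnormal double. The bit-level round trip is exact, but the text that `%f` would print for such a value is just `0.000000`, so the digits themselves would not be enough to recover the address.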
With `printf`, `%f` and `%lf` both read a `double` (64 bits), since a `float` argument is promoted to `double` in a variadic call, so neither is enough to cover the full 16 bytes. While there is `%Lf`, which is for a `long double`, on x86 this only covers 80 bits, not 128. It also seems that these `long double` arguments are passed on the stack, so it won't be accessing `[rsp+0x50]` anyway; it'll instead use the stack arguments (which overlap with the `do_printf` buffer), which are useless for us:
It turns out that the description wasn't lying when it said it was a widely used specifier (shocker). But doesn't `%s` just leak the string `Here: `?
Well, my teammate ir0nstone found something interesting that happens when you do `%p` then `%s`:
The 2nd `%s` now contains the tail end of the `%p` output, which is quite interesting. It implies that `rsi` points to some internal stack buffer that the output gets copied to.
So what if you now do `%s` followed by some character? In theory, this would append the character to the buffer: the output is the current buffer contents plus the extra character, and that output is itself copied into this buffer.
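The append behaviour can be illustrated with a toy model in pure Python. This is not the exploit and not glibc's actual code: it assumes the stack buffer sits at the same address on every call, that `rsi` points at its start, that output is written from offset 0 with no null terminator, and it uses made-up stale bytes at a made-up offset.

```python
PREFIX = b"Here: "


def cstring(buf: bytearray) -> bytes:
    """Bytes from the start of buf up to (not including) the first null byte."""
    end = buf.find(b"\x00")
    return bytes(buf) if end == -1 else bytes(buf[:end])


def one_round(buf: bytearray, suffix: bytes) -> bytes:
    """Model printf("Here: ") followed by printf("%s" + suffix) with rsi == buf."""
    buf[:len(PREFIX)] = PREFIX      # the first printf's output lands at the buffer start
    out = cstring(buf) + suffix     # %s reads the buffer itself, then the suffix follows
    buf[:len(out)] = out            # the new output is written back, unterminated
    return out


# Toy buffer: zeros, with made-up "stale stack data" at offset 16.
STALE = bytes.fromhex("efbeadde7f")
buf = bytearray(64)
buf[16:16 + len(STALE)] = STALE

rounds = 0
out = b""
while STALE not in out:
    out = one_round(buf, b"A")
    rounds += 1

print(rounds, out)
```

Each round grows the string in the buffer by one byte. Once the padding reaches the stale bytes, `%s` keeps reading straight through them until the next null byte, which is the effect the padding is after.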
And sure enough this works! Now if this is just some stack buffer, there could be uninitialised stack data inside it, which we could pad up to and print back to us. We can verify this with the following script:
This is some weird behaviour, and seems quite lucky for us (which it is). Understandably I had questions.
First of all, what is this stack buffer? And why is there a buffer at all? Isn't `stdout` supposed to be unbuffered?
To answer these, let's look further into `printf`. We saw that it calls vfprintf, so let's start there.
It starts off by doing some sanity checks, but then checks if the file is unbuffered, and if it is, it will call buffered_vfprintf:
It seems that when the file is unbuffered, it creates a "helper file" on the stack that uses a stack buffer, and then it prints to the "helper file". This seems like such a hack (cos it is), but thankfully it helps us out, because `CHAR_T buf[BUFSIZ]` is where the output goes before it's written out.
After it gets printed to the "helper file", the output sits in the stack buffer, and now needs to be written out to stdout. It does this by calling _IO_sputn, which goes on to call the `write` syscall in _IO_new_file_write.
This is where `rsi` would be set to the buffer, and from this point the `rsi` register remains untouched. The main reason for this is that no function calls with 2+ arguments take place after this, so it never gets clobbered.
However there is still an element of "luck", because `rdi` is clobbered by `_IO_funlockfile` (see here), and `rdx` gets clobbered as a scratch register.
In my solution I used the first address I found (the one found by the script) for my leak. Unfortunately this address was an ld address (`_dl_process_pt_note+539`), not libc. However, due to how the libraries are mapped, the offsets between the libraries are deterministic on any given system, so we can convert this into a libc leak.
The offset for my system was `0x1f4000`, but remotely I had to fiddle a bit to find that it was `0x1f6000` (not the cleanest way to do it, but oh well).
Armed with a libc leak, we can now use `do_call` to call `system` and get RCE.
The above is my solution, but there was another approach that was shared on the Discord after the event finished, which involved the variable width specifier: `%*x`.
The variable width specifier allows the user to specify the width for an argument as an argument:
This can be useful in some printf exploits using `%n`, as you can use the lower 32 bits of an address as a width, and print a variable number of bytes depending on that address (here's an example).
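Python's printf-style `%` operator supports the same `*` width syntax, so the behaviour is easy to try without compiling anything. The sketch below also computes the width C's `printf` would read from a 64-bit register value, assuming it takes the low 32 bits as a signed `int`. The `star_width` helper and the addresses are made up for illustration.

```python
def star_width(reg_value: int) -> int:
    """Width taken from a 64-bit register value: the low 32 bits as a signed int."""
    low = reg_value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


# The width (8) is consumed as an argument, then the value (0xff) is printed.
padded = "%*x" % (8, 0xFF)
print(repr(padded))

# Made-up stack-style addresses: only the low 32 bits matter for the width.
small = star_width(0x7FFC00001000)
large = star_width(0x7FFCDEADBEEF)
print(small, large)
```

In C, a negative width is treated as a left-justify flag with the absolute value as the width, so either way the amount of padding printed is roughly the magnitude of those low 32 bits. That is why a process whose address has small low bits makes this approach much faster.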
In our case, however, `%*x` would use `rsi` as the width and, since that's an address, it will print a lot of bytes (most of which are spaces). This is useful because this amount of printing will fill up the buffer, and then when another function is called (like `scanf`), its own stack usage will clobber parts of the stack buffer, leaving behind addresses. Now the buffer has spaces padding up to some address, which in our case happens to be `_IO_2_1_stdin_`, which we can print.
Below is my implementation of this solution (I opted to find a process with a low width, so that the exploit would run quicker remotely).