Mac OS X Expert Challenge 2005.1

Runners Up: Andrew Wellington and Graham Dennis

Analysis

Panpipes uses Mach IPC to call various functions in the kernel. Using these calls, panpipes creates a new Mach task, creates a new thread in that task, and attempts to run it. This causes a panic because the task and thread are created and run entirely through Mach calls, so the kernel never sets the bsd_info element of the new task's task_t structure. The only places where bsd_info is set for this structure are the BSD exec() and fork() calls. When a new Mach task is created, this field is set to 0 (NULL).

When the kernel later tries to read through bsd_info while attempting to run the thread, it dereferences a null pointer and panics.
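The failure mode can be illustrated with a small user-space model. This is only a sketch under stated assumptions: model_task, model_proc, and the functions below are hypothetical stand-ins rather than xnu code, and the actual kernel path that reads bsd_info is not identified in this analysis.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real xnu layout. */
struct model_proc {
    int pid;
};

struct model_task {
    struct model_proc *bsd_info;  /* in this model, set only by the BSD side */
};

/* Models Mach-only task creation: the BSD field is left as NULL. */
static struct model_task model_mach_task_create(void)
{
    struct model_task t = { NULL };
    return t;
}

/* Models what fork()/exec() would do: attach the BSD process to the task. */
static void model_bsd_attach(struct model_task *t, struct model_proc *p)
{
    t->bsd_info = p;
}

/*
 * Models a code path that needs the BSD process.
 * Returns 0 and stores the pid on success, or -1 in the case where an
 * unchecked kernel path would have dereferenced NULL.
 */
static int model_pid_for_task(const struct model_task *t, int *pid_out)
{
    if (t->bsd_info == NULL)
        return -1;
    *pid_out = t->bsd_info->pid;
    return 0;
}
```

A task returned by model_mach_task_create makes model_pid_for_task return -1 until model_bsd_attach has been called on it, which mirrors the gap between the Mach and BSD views of a task.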

How did we figure this out?

The obvious answer: lots of hard work. Doubtless there was an easier way, but we didn't know of one at the time. Basically, we got an assembler dump of the app and traced through it with gdb, following the assembler code.

We quickly found that gdb stopped working once panpipes called task_set_exception_ports, which sets the main exception port for the task so that panpipes intercepts all exceptions the task generates. Since gdb relies on those exceptions to see breakpoints, breakpoints stopped working from that point on. It wasn't until the early hours of Sunday morning that we figured out this was all unnecessary.

The first obstacle was that panpipes modifies some of its own code at run time to write in some syscall instructions. By dumping that memory we were able to read the code and see what was happening. At first the negative syscall numbers confused us (neither of us having significant experience with the Mach side of the xnu kernel), until we found that they were Mach system calls.
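As a rough illustration of the numbering convention, the helper below classifies a call number by its sign. It is a hypothetical sketch that assumes the PowerPC convention of negative numbers for Mach traps and non-negative numbers for BSD system calls, and it ignores any special fast-trap numbers. The example values in the comments are recalled from <mach/syscall_sw.h> and <sys/syscall.h> and should be checked there.

```c
/* Hypothetical classification of the number passed to the syscall instruction. */
enum call_kind {
    CALL_BSD_SYSCALL,
    CALL_MACH_TRAP
};

/* Assumption: negative numbers select Mach traps, all others BSD syscalls. */
static enum call_kind classify_call(int number)
{
    return number < 0 ? CALL_MACH_TRAP : CALL_BSD_SYSCALL;
}

/*
 * Converts a negative Mach trap number to its positive table index,
 * e.g. -31 (believed to be mach_msg_trap) becomes 31.
 * Only meaningful when classify_call(number) == CALL_MACH_TRAP.
 */
static int mach_trap_index(int number)
{
    return -number;
}
```

With this rule, a value such as 4 (write in the BSD table) and a value such as -31 fall into different tables even though both are issued by the same instruction.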

After more reverse engineering we found that these Mach calls were sending messages to various ports, and that these messages carried message IDs which we didn't understand.

Due to our inexperience with the Mach part of the kernel, this was a roadblock for a while, until a lucky Google search turned up some information on what the message IDs used by the IPC mechanism mean. With this list (which we later found is also in various files under /usr/include/mach/) we were able to translate the message IDs to functions and start to make some sense of what was going on.
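The translation step amounts to a table lookup from message ID to routine name. The sketch below shows the idea; the table type and function are hypothetical, and the three IDs are recalled from task.defs (subsystem base 3400 plus routine index), so treat them as assumptions to verify against the files under /usr/include/mach/ rather than as authoritative values.

```c
#include <stddef.h>

/* One entry of a hypothetical message-ID-to-name table. */
struct msg_name {
    int id;
    const char *name;
};

/*
 * Illustrative entries only. ASSUMPTION: these IDs are recalled from
 * task.defs and have not been checked against the headers.
 */
static const struct msg_name k_task_msgs[] = {
    { 3400, "task_create" },
    { 3411, "thread_create" },
    { 3413, "task_set_exception_ports" },
};

/* Returns the routine name for a message ID, or NULL if it is not in the table. */
static const char *msg_id_to_name(int id)
{
    size_t i;
    for (i = 0; i < sizeof(k_task_msgs) / sizeof(k_task_msgs[0]); i++) {
        if (k_task_msgs[i].id == id)
            return k_task_msgs[i].name;
    }
    return NULL;
}
```

A fuller table would be generated from the .defs files instead of typed in by hand, which avoids exactly the kind of recall error hedged above.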

It turned out that the function we had mostly been looking at was called from the dyld module initialiser function. This meant it was called before main, and panpipes had to set this up deliberately by adding an entry to the module initialiser table in the binary. With the names of the functions being called, we could look up the documentation and figure out that calls like task_set_exception_ports were just there to screw with gdb.
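For readers unfamiliar with code that runs before main, GCC and Clang offer a constructor attribute; on Mac OS X, functions marked this way are typically recorded in the binary's module initialiser section and run by dyld. Whether panpipes was built this way or had its table entry added by other means is not something this analysis establishes. The names early_setup and ran_before_main are hypothetical.

```c
/* Flag that lets later code observe that the initialiser has already run. */
static int ran_before_main = 0;

/*
 * The constructor attribute asks the toolchain to register this function
 * so that it is called during program start-up, before main is entered.
 */
__attribute__((constructor))
static void early_setup(void)
{
    ran_before_main = 1;
}
```

By the time main starts, ran_before_main is already 1, which is why stepping through from main in a debugger never shows such a function being called.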

Patching the binary to replace each of those calls with an instruction that sets the return value to zero soon took care of them. But then there was a pthread_create. Realising by now that panpipes had already used a lot of red herrings, we tried disabling the pthread_* calls, and we still had a kernel panic!
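The kind of patch described above can be sketched as follows. This assumes a PowerPC binary (big-endian, fixed 4-byte instructions, return value in r3), where a call is a bl instruction and "li r3, 0" encodes as 0x38600000. The function names are hypothetical, and this is not the exact tool or procedure used during the challenge.

```c
#include <stddef.h>
#include <stdint.h>

#define PPC_LI_R3_0 0x38600000u  /* li r3, 0 (addi r3, 0, 0) */

/* Reads a big-endian 32-bit instruction word. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Writes a big-endian 32-bit instruction word. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* True for a relative branch-and-link: primary opcode 18, AA = 0, LK = 1. */
static int is_ppc_bl(uint32_t insn)
{
    return (insn & 0xFC000003u) == 0x48000001u;
}

/*
 * Replaces the bl at byte offset `offset` with "li r3, 0".
 * Returns 0 on success, or -1 if the offset is misaligned, out of range,
 * or does not hold a bl instruction (the buffer is then left untouched).
 */
static int patch_call_to_return_zero(uint8_t *code, size_t len, size_t offset)
{
    if (offset % 4 != 0 || offset > len || len - offset < 4)
        return -1;
    if (!is_ppc_bl(read_be32(code + offset)))
        return -1;
    write_be32(code + offset, PPC_LI_R3_0);
    return 0;
}
```

Because the replacement is the same size as the original instruction, no other offsets in the binary shift, and the caller sees what looks like a successful (zero) return.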

The only other cloaking measures we found were multiple calls to thread_create (three of them) and the state that was set for the thread in the newly created task. That state was arranged to look as if something bad had happened with the stack (register 1 was set so that *($r1) == 0), with the thread executing at a specific address (0x52b4) and the calling function being one that calls into I/O Kit. The I/O Kit had nothing to do with the panic, even though these were perhaps the only "interesting" symbols linked into panpipes.

Proposed Fix

The problem is caused by the BSD and Mach parts of the kernel not being in sync. The Mach part of the kernel needs to ensure either that task->bsd_info is set to a reasonable value for newly created Mach tasks (via task_create), or that userland cannot create a new task without going through the BSD side of the kernel (that is, through a fork() or exec() call).

A simple (and completely untested) fix would be to set a new task's bsd_info field to that of its parent task. This style of fix may have a huge number of unintended consequences, which someone with a much greater understanding of the xnu kernel would need to evaluate; however, it should resolve this panic.
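In model form, the inheritance idea looks like the sketch below. As with the fix itself, this is untested against the kernel: model_task, model_proc, and model_task_create_inheriting are hypothetical names, and the code only illustrates the intended invariant, not how xnu's task_create would actually be changed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real xnu layout. */
struct model_proc {
    int pid;
};

struct model_task {
    struct model_proc *bsd_info;
};

/*
 * Models the proposed behaviour: a newly created task starts with its
 * parent's bsd_info instead of NULL. With no parent, the field stays NULL.
 */
static struct model_task model_task_create_inheriting(const struct model_task *parent)
{
    struct model_task child;
    child.bsd_info = (parent != NULL) ? parent->bsd_info : NULL;
    return child;
}

/* The invariant the fix aims for: code reading bsd_info never sees NULL. */
static int model_task_has_bsd_info(const struct model_task *t)
{
    return t->bsd_info != NULL;
}
```

One consequence visible even in this model is that parent and child now share a single BSD process record, which is the sort of side effect that would need review by someone who knows the kernel well.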

Time Spent

We spent roughly 25 to 30 hours.

Bios

Andrew and Graham are both Australian National University (ANU) undergraduate students. Andrew is studying Computer Science, and Graham is studying Honours in Physics.

Andrew Wellington and Graham Dennis

Amit: The astute reader will notice some minor discrepancies between this analysis and my description of panpipes. However, I point this out for technical accuracy: given that Andrew and Graham essentially started with a black box, the amount of detail they extracted is remarkable.