Crafting a Tiny Mach-O Executable

The other day I came across this web page in which the author describes his experiment to create a tiny ELF executable that will run on Linux. The result: a 45-byte ELF executable that executes and returns a value. The executable is functionally equivalent to the one generated from compiling the following C program.

  /* tiny.c */
  int main(void) { return 42; }

Apparently, recent Linux kernels check ELF executables more strictly, so the aforementioned 45-byte executable no longer works. A slightly larger, 64-byte version still works at the time of this writing.

Anyway, as far as tiny executables go, ELF on Linux is taken care of. It would be interesting to repeat a similar experiment for Mach-O executables on Mac OS X.

Let us first see how large the executable generated from the C program is on Mac OS X.

$ cat tiny.c
main() { return 42; }
$ sw_vers
...
ProductVersion: 10.5.6
...
$ gcc -Oz -o tiny tiny.c
$ strip tiny
$ ls -las tiny
32 -rwxr-xr-x  1 singh  wheel  12348 Mar 15 17:26 tiny

The following assembly language program can be assembled to generate a 165-byte Mach-O executable that runs on Mac OS X and returns the wisely chosen value 42.

; tiny.asm for Mac OS X (Mach-O Object File Format)
; nasm -f bin -o tiny tiny.asm

BITS 32
        org   0x1000

        db    0xce, 0xfa, 0xed, 0xfe       ; magic
        dd    7                            ; cputype (CPU_TYPE_X86)
        dd    3                            ; cpusubtype (CPU_SUBTYPE_I386_ALL)
        dd    2                            ; filetype (MH_EXECUTE)
        dd    2                            ; ncmds
        dd    _start - _cmds               ; sizeofcmds
        dd    0                            ; flags
_cmds:
        dd    1                            ; cmd (LC_SEGMENT)
        dd    44                           ; cmdsize
        db    "__TEXT"                     ; segname
        db    0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ; segname
        dd    0x1000                       ; vmaddr
        dd    0x1000                       ; vmsize
        dd    0                            ; fileoff
        dd    filesize                     ; filesize
        dd    7                            ; maxprot

        dd    5                            ; cmd (LC_UNIXTHREAD)
        dd    80                           ; cmdsize
        dd    1                            ; flavor (i386_THREAD_STATE)
        dd    16                           ; count (i386_THREAD_STATE_COUNT)
        dd    0, 0, 0, 0, 0, 0, 0, 0       ; state
        dd    0, 0, _start, 0, 0, 0, 0, 0  ; state
_start:
        xor   eax, eax
        inc   eax                          ; eax = 1 (SYS_exit)
        push  byte 42                      ; exit status argument
        sub   esp, 4                       ; placeholder for the return address
        int   0x80                         ; _exit(42)

filesize equ  $ - $$

We can assemble and run this program as follows, verifying that its return value is indeed 42.

$ nasm -f bin -o tiny tiny.asm
$ ls -las tiny
8 -rw-r--r--  1 singh  admin  165 Mar 15 12:21 tiny
$ chmod 755 tiny
$ ./tiny
$ echo $?
42
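
The 165 bytes can be accounted for with a little arithmetic. The following is a minimal sketch in Python, not part of the original experiment: it assumes the 32-bit structure layouts from <mach-o/loader.h>, and the names tiny_layout, SEGMENT_PREFIX, and so on are hypothetical helpers chosen for illustration. Whatever the header and the two load commands do not occupy is left for the code.

```python
import struct

# 32-bit little-endian layouts, following <mach-o/loader.h>.
MACH_HEADER = struct.Struct("<7I")          # magic .. flags
SEGMENT_PREFIX = struct.Struct("<II16s5I")  # cmd .. maxprot (truncated, as in tiny.asm)
THREAD_COMMAND = struct.Struct("<4I16I")    # cmd, cmdsize, flavor, count, 16 registers

BASE_ADDRESS = 0x1000  # the "org" value in tiny.asm
FILE_SIZE = 165        # as reported by ls


def tiny_layout():
    """Return the byte budget of tiny as a dict (hypothetical helper)."""
    code_offset = MACH_HEADER.size + SEGMENT_PREFIX.size + THREAD_COMMAND.size
    return {
        "header": MACH_HEADER.size,
        "segment": SEGMENT_PREFIX.size,
        "thread": THREAD_COMMAND.size,
        "code_offset": code_offset,
        "code_size": FILE_SIZE - code_offset,
        "entry": BASE_ADDRESS + code_offset,
    }


if __name__ == "__main__":
    for name, value in tiny_layout().items():
        print(f"{name:12} {value:5} ({value:#x})")
```

Under these assumptions the header takes 28 bytes, the truncated LC_SEGMENT 44, and LC_UNIXTHREAD 80, which places _start at file offset 152 and at address 0x1098 once the org value is added.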

Some points to note:

  • The executable—Mach-O header, load commands, the text segment—is manually crafted in assembly using the nasm 80x86 assembler. The C compiler toolchain is not involved.
  • The executable is unusual for Mac OS X in that the dynamic link editor (dyld) is not involved in running it. No dynamic libraries are involved either.
  • The program makes a "direct" system call through the int 0x80 interface. This is a big no-no on Mac OS X: production code should not bypass the C library to make system calls, but then hopefully you won't be writing production code using such techniques. A specific caveat is that the system call implementation may behave differently from the user-callable interface in terms of arguments, return values, and so on. The implementation may also change across system revisions, so such code may break with a system update.
  • The int 0x80 system call path is a legacy path on Mac OS X. It may be removed from Mac OS X some day, in which case the program would need to be modified to use newer alternatives such as sysenter.
  • The executable is not a "correct" Mach-O file, even though the kernel can parse and run it in our case. It is incorrect because its load commands have been deliberately made to overlap each other to save some bytes: LC_SEGMENT declares a cmdsize of 44, which is shorter than a full 56-byte segment command, so its trailing fields (initprot, nsects, flags) are read from the start of the LC_UNIXTHREAD command that follows. The otool object-file introspection command is quite unhappy with this executable, as the inconsistencies in the following output show.

$ otool -l tiny
tiny:
Load command 0
      cmd LC_SEGMENT
  cmdsize 44 Inconsistent size
  segname __TEXT
   vmaddr 0x00001000
   vmsize 0x00001000
  fileoff 0
 filesize 165
  maxprot 0x00000007
 initprot 0x00000005
   nsects 80
    flags 0x1
Section
  sectname
   segname  (does not match segment)
      addr 0x00000000
      size 0x00000000
    offset 0
     align 2^4248 (16777216)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
section structure command extends past end of load commands
Section
  sectname
   segname  (does not match segment)
      addr 0x00000000
      size 0x00000000
    offset 0
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
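
The odd values in this output follow directly from the overlap. As a rough illustration (a Python sketch, not how otool itself is implemented), the code below rebuilds the 124 bytes of load commands from tiny.asm and then decodes them the way a tool expecting a full segment command would. The function names are hypothetical, and the entry address 0x1098 is derived from the layout of tiny.asm rather than quoted from the original.

```python
import struct

LC_SEGMENT, LC_UNIXTHREAD = 1, 5
ENTRY = 0x1098  # _start in tiny.asm: 0x1000 + 28 + 44 + 80
EIP_INDEX = 10  # position of eip in the i386 thread state

SEGMENT_FIELDS = ("cmd", "cmdsize", "segname", "vmaddr", "vmsize", "fileoff",
                  "filesize", "maxprot", "initprot", "nsects", "flags")
SECTION_FIELDS = ("sectname", "segname", "addr", "size", "offset", "align",
                  "reloff", "nreloc", "flags", "reserved1", "reserved2")


def tiny_load_commands():
    """Rebuild the 124 bytes of load commands that tiny.asm emits."""
    segment_prefix = struct.pack("<II16s5I", LC_SEGMENT, 44, b"__TEXT",
                                 0x1000, 0x1000, 0, 165, 7)
    registers = [0] * 16
    registers[EIP_INDEX] = ENTRY
    thread = struct.pack("<4I16I", LC_UNIXTHREAD, 80, 1, 16, *registers)
    return segment_prefix + thread


def read_like_otool(blob):
    """Decode a full 56-byte segment command and the section right after it."""
    segment = dict(zip(SEGMENT_FIELDS, struct.unpack_from("<II16s8I", blob, 0)))
    section = dict(zip(SECTION_FIELDS, struct.unpack_from("<16s16s9I", blob, 56)))
    return segment, section


if __name__ == "__main__":
    segment, section = read_like_otool(tiny_load_commands())
    for name in ("initprot", "nsects", "flags"):
        print(f"segment {name:9} {segment[name]}")
    print(f"section align     {section['align']}")
```

The three trailing segment fields are really the cmd (5), cmdsize (80), and flavor (1) of LC_UNIXTHREAD, and the align field of the first bogus section lands on the eip slot of the thread state, which is where the 4248 (0x1098) in the output comes from.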

At the cost of 83 additional bytes (248 versus 165), we can create a "more correct" Mach-O executable that otool will be happy with.

; nicertiny.asm for Mac OS X (Mach-O Object File Format)
; nasm -f bin -o nicertiny nicertiny.asm

BITS 32
        org   0x1000

        db    0xce, 0xfa, 0xed, 0xfe       ; magic
        dd    7                            ; cputype (CPU_TYPE_X86)
        dd    3                            ; cpusubtype (CPU_SUBTYPE_I386_ALL)
        dd    2                            ; filetype (MH_EXECUTE)
        dd    2                            ; ncmds
        dd    _start - _cmds               ; sizeofcmds
        dd    0                            ; flags
_cmds:
        dd    1                            ; cmd (LC_SEGMENT)
        dd    124                          ; cmdsize
        db    "__TEXT"                     ; segname
        db    0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ; segname
        dd    0x1000                       ; vmaddr
        dd    0x1000                       ; vmsize
        dd    0                            ; fileoff
        dd    filesize                     ; filesize
        dd    7                            ; maxprot
        dd    5                            ; initprot
        dd    1                            ; nsects
        dd    0                            ; flags
        db    "__text"                     ; sectname
        db    0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ; sectname
        db    "__TEXT"                     ; segname
        db    0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ; segname
        dd    _start                       ; addr
        dd    _end - _start                ; size
        dd    _start - 0x1000              ; offset
        dd    2                            ; align
        dd    0                            ; reloff
        dd    0                            ; nreloc
        dd    0                            ; flags
        dd    0                            ; reserved1
        dd    0                            ; reserved2

        dd    5                            ; cmd (LC_UNIXTHREAD)
        dd    80                           ; cmdsize
        dd    1                            ; flavor (i386_THREAD_STATE)
        dd    16                           ; count (i386_THREAD_STATE_COUNT)
        dd    0, 0, 0, 0, 0, 0, 0, 0       ; state
        dd    0, 0, _start, 0, 0, 0, 0, 0  ; state
_start:
        xor   eax, eax
        inc   eax                          ; eax = 1 (SYS_exit)
        push  dword 42                     ; exit status argument
        sub   esp, 4                       ; placeholder for the return address
        int   0x80                         ; _exit(42)
_end:
filesize equ  $ - $$

Let us assemble and run this version, and see what otool has to say.

$ nasm -f bin -o nicertiny nicertiny.asm
$ ls -las nicertiny
8 -rw-r--r--  1 singh  admin  248 Mar 15 14:49 nicertiny
$ chmod 755 nicertiny
$ ./nicertiny
$ echo $?
42
$ otool -l nicertiny
nicertiny:
Load command 0
      cmd LC_SEGMENT
  cmdsize 124
  segname __TEXT
   vmaddr 0x00001000
   vmsize 0x00001000
  fileoff 0
 filesize 248
  maxprot 0x00000007
 initprot 0x00000005
   nsects 1
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x000010e8
      size 0x00000010
    offset 232
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Load command 1
        cmd LC_UNIXTHREAD
    cmdsize 80
     flavor i386_THREAD_STATE
      count i386_THREAD_STATE_COUNT
	    eax 0x00000000 ebx    0x00000000 ecx 0x00000000 edx 0x00000000
	    edi 0x00000000 esi    0x00000000 ebp 0x00000000 esp 0x00000000
	    ss  0x00000000 eflags 0x00000000 eip 0x000010e8 cs  0x00000000
	    ds  0x00000000 es     0x00000000 fs  0x00000000 gs  0x00000000
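
What makes this version acceptable is that each load command's cmdsize now covers the whole command, so a tool can walk from one command to the next without surprises. The sketch below shows such a walk in Python, under stated assumptions: it handles only 32-bit little-endian images, it knows only the two command types used here, and list_load_commands and sample_image are hypothetical names. The sample image is a header followed by zero-filled command bodies with the same sizes as nicertiny, not the real file.

```python
import struct

MH_MAGIC = 0xFEEDFACE  # 32-bit Mach-O, stored on disk as ce fa ed fe
HEADER = struct.Struct("<7I")        # magic .. flags
LOAD_COMMAND = struct.Struct("<II")  # cmd, cmdsize
COMMAND_NAMES = {1: "LC_SEGMENT", 5: "LC_UNIXTHREAD"}


def list_load_commands(image):
    """Return (name, file offset, cmdsize) for each load command in image."""
    magic, _, _, _, ncmds, sizeofcmds, _ = HEADER.unpack_from(image, 0)
    if magic != MH_MAGIC:
        raise ValueError("not a 32-bit little-endian Mach-O image")
    commands = []
    offset = HEADER.size
    end = HEADER.size + sizeofcmds
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_COMMAND.unpack_from(image, offset)
        if cmdsize < LOAD_COMMAND.size or offset + cmdsize > end:
            raise ValueError(f"load command at offset {offset} has a bad cmdsize")
        commands.append((COMMAND_NAMES.get(cmd, hex(cmd)), offset, cmdsize))
        offset += cmdsize
    return commands


def sample_image():
    """Header plus zero-filled commands sized like nicertiny (illustrative only)."""
    header = HEADER.pack(MH_MAGIC, 7, 3, 2, 2, 124 + 80, 0)
    segment = LOAD_COMMAND.pack(1, 124).ljust(124, b"\0")
    thread = LOAD_COMMAND.pack(5, 80).ljust(80, b"\0")
    return header + segment + thread


if __name__ == "__main__":
    for name, offset, size in list_load_commands(sample_image()):
        print(f"{name:14} offset {offset:3} cmdsize {size}")
```

With these sizes the commands end at file offset 28 + 124 + 80 = 232, which agrees with the section offset of 232 and the address 0x000010e8 that otool reports above.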

It would be a nice exercise for the reader to try to shrink tiny.asm and nicertiny.asm even further, while retaining the high-level behavior of the corresponding executables. There are plenty of zeros lurking in there.

All contents of this site, unless otherwise noted, are ©1994-2014 Amit Singh. All Rights Reserved.