Advanced Apple Debugging & Reverse Engineering

Fourth Edition · iOS 16, macOS 13.3 · Swift 5.8, Python 3 · Xcode 14

17. Hello, Mach-O
Written by Walter Tyree


Mach-O is the file format used for a compiled program running on any of your Apple operating systems. Knowledge of the format is important for both debugging and reverse engineering, since the layout Mach-O defines is applicable to how the executable is stored on disk as well as how the executable is loaded into memory.

Knowing which area of memory an instruction is referencing is useful on the reverse engineering side, but there are a number of useful hidden treasures on the debugging front when exploring Mach-O. For example:

  • You can introspect an external function call at runtime.
  • You can quickly find the reference to a singleton’s memory address without having to trip a breakpoint.
  • You can inspect and modify variables in your own app or other frameworks.
  • You can perform security audits and make sure no internal, secret messages are being sent out into production in the form of strings or methods.

This chapter introduces the concepts of Mach-O, while the next chapter, Mach-O Fun, shows the amusing things that are possible with this knowledge. Make sure you have caffeine on board, since the theory comes first and the fun follows in the next chapter.

Terminology

Before diving into the weeds with all the different C structs you’re about to view, it’s best to take a high-level view of the Mach-O layout.

This is the layout of every compiled executable; every main program, every framework, every kernel extension, everything that’s compiled on an Apple platform.

At the start of every compiled Apple program is the Mach-O header that gives information about the CPU this program can run on, the type of executable it is (A framework? A standalone program?) as well as how many load commands immediately follow it.

Load commands are instructions on how to load the program and are made up of C structs, which vary in size depending on the type of load command.

Some of the load commands provide instructions about how to load segments. Think of segments as areas of memory that have a specific type of memory protection. For example, executable code should only have read and execute permissions; it doesn’t need write permissions.

Other parts of the program, such as global variables or singletons, need read and write permissions, but not execute permissions. This means that executable code and the addresses of global variables live in separate segments.

Segments can have zero or more subcomponents called sections. These are more finely grained areas bound by the same memory protections as their parent segment.
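To make per-segment memory protection concrete, here is a minimal Python sketch. It assumes the standard Mach VM protection bits from mach/vm_prot.h (read = 1, write = 2, execute = 4), which this chapter has not introduced, and the helper name format_protection is made up for illustration.

```python
# Mach VM protection bits, as defined in <mach/vm_prot.h>.
VM_PROT_READ = 0x1
VM_PROT_WRITE = 0x2
VM_PROT_EXECUTE = 0x4


def format_protection(prot: int) -> str:
    """Render a protection bitmask as an 'rwx'-style string (hypothetical helper)."""
    return "".join(
        letter if prot & bit else "-"
        for bit, letter in (
            (VM_PROT_READ, "r"),
            (VM_PROT_WRITE, "w"),
            (VM_PROT_EXECUTE, "x"),
        )
    )


# Executable code: readable and executable, but not writable.
print(format_protection(VM_PROT_READ | VM_PROT_EXECUTE))  # r-x
# Global variables: readable and writable, but not executable.
print(format_protection(VM_PROT_READ | VM_PROT_WRITE))    # rw-
```

Every section inside a segment shares the protection string of its parent segment.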

Take another look at the above diagram. Segment Command 1 points to an offset in the executable that contains four section commands, while Segment Command 2 points to an offset that contains no section commands. Finally, Segment Command 3 doesn’t point to any offset in the executable.

It’s these sections that can be of profound interest to developers and reverse engineers, since each serves a unique purpose in the program. For example, there’s a specific section to store hard-coded UTF-8 strings, there’s a specific section to store references to statically defined variables and so on.

The ultimate goal of these two Mach-O chapters is to show you some interesting load commands in this chapter, and reveal some interesting sections in the next chapter.

In this chapter, you’ll be seeing a lot of references to system headers. If you see something like mach-o/stab.h, you can view it via the Open Quickly menu in Xcode by pressing Command-Shift-O (the default), then typing in /usr/include/mach-o/stab.h.

I’d recommend adding /usr/include/ to the search query since Xcode isn’t all that smart at times.

If you want to view this header without Xcode, then the physical location will be at:

${PATH_TO_XCODE}/Contents/Developer/Platforms/${SYSTEM_PLATFORM}.platform/Developer/SDKs/${SYSTEM_PLATFORM}.sdk/usr/include/mach-o/stab.h

Where ${SYSTEM_PLATFORM} can be MacOSX, iPhoneOS, iPhoneSimulator, WatchOS, etc.

Now that you’ve had a bird’s-eye overview, it’s time to drop down into the weeds and view all the lovely C structs.

The Mach-O Header

At the beginning of every compiled Apple executable is a special struct that indicates if it’s a Mach-O executable. This struct can be found in mach-o/loader.h.

struct mach_header_64 {
  uint32_t  magic;    /* mach magic number identifier */
  cpu_type_t  cputype;  /* cpu specifier */
  cpu_subtype_t cpusubtype; /* machine specifier */
  uint32_t  filetype; /* type of file */
  uint32_t  ncmds;    /* number of load commands */
  uint32_t  sizeofcmds; /* the size of all the load commands */
  uint32_t  flags;    /* flags */
  uint32_t  reserved; /* reserved */
};
/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /*the 64-bit mach magic number*/
#define MH_CIGAM_64 0xcffaedfe /*NXSwapInt(MH_MAGIC_64)*/
#define MH_OBJECT 0x1   /* relocatable object file */
#define MH_EXECUTE  0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3   /* fixed VM shared library file */
#define MH_CORE   0x4   /* core file */
... // there’s way more below, but omitting for brevity...

Mach-O Header in grep

Open up a Terminal window. I’ll pick on the grep executable, but you can pick on any Terminal command that suits your interests. Type the following:

xxd -l 32 /usr/bin/grep
00000000: cffa edfe 0700 0001 0300 0080 0200 0000  ................
00000010: 1300 0000 4007 0000 8500 2000 0000 0000  ....@..... .....
cffa edfe
cf fa ed fe
fe ed fa cf
xxd -e -l 32 /usr/bin/grep
00000000: feedfacf 01000007 80000003 00000002  ................
00000010: 00000013 00000740 00200085 00000000  ....@..... .....
struct mach_header_64 {
  uint32_t      magic      = 0xfeedfacf
  cpu_type_t    cputype    = 0x01000007
  cpu_subtype_t cpusubtype = 0x80000003
  uint32_t      filetype   = 0x00000002
  uint32_t      ncmds      = 0x00000013
  uint32_t      sizeofcmds = 0x00000740
  uint32_t      flags      = 0x00200085
  uint32_t      reserved   = 0x00000000
};
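The same decoding can be sketched in Python with the standard struct module. This is a minimal illustration, not how any Apple tool does it: the bytes are copied from the xxd output above, the helper name parse_mach_header_64 is hypothetical, and every field is read as an unsigned 32-bit value for display even though cpu_type_t and cpu_subtype_t are signed in the headers.

```python
import struct

# The 32 bytes printed by `xxd -l 32 /usr/bin/grep` above, in on-disk order.
GREP_HEADER = bytes.fromhex(
    "cffaedfe 07000001 03000080 02000000 "
    "13000000 40070000 85002000 00000000"
)

MH_MAGIC_64 = 0xFEEDFACF

FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
          "ncmds", "sizeofcmds", "flags", "reserved")


def parse_mach_header_64(data: bytes) -> dict:
    """Decode a mach_header_64 into a dict (hypothetical helper)."""
    # Try little-endian first, then big-endian. The byte order that makes the
    # first word read as MH_MAGIC_64 is the byte order of the whole header.
    for byte_order in ("<", ">"):
        values = struct.unpack_from(byte_order + "8I", data)
        if values[0] == MH_MAGIC_64:
            return dict(zip(FIELDS, values))
    raise ValueError("not a 64-bit Mach-O header")


for name, value in parse_mach_header_64(GREP_HEADER).items():
    print(f"{name:<10} = 0x{value:08x}")
```

Reading the magic in both byte orders is the same trick as comparing against MH_MAGIC_64 and MH_CIGAM_64: whichever interpretation matches tells you how to read the remaining fields.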
#define CPU_ARCH_ABI64    0x01000000  /* 64 bit ABI */
...
#define CPU_TYPE_X86    ((cpu_type_t) 7)

The Fat Header

Some executables are actually a group of one or more executables “glued” together. For example, many apps compile both a 32-bit and 64-bit executable and place them into a “fat” executable. This “gluing together” of multiple executables is indicated by a fat header, which also has a unique magic value differentiating it from a Mach-O header.

#define FAT_MAGIC 0xcafebabe
#define FAT_CIGAM 0xbebafeca  /* NXSwapLong(FAT_MAGIC) */

struct fat_header {
  uint32_t  magic;    /* FAT_MAGIC or FAT_MAGIC_64 */
  uint32_t  nfat_arch;  /* number of structs that follow */
};

...
#define FAT_MAGIC_64  0xcafebabf
#define FAT_CIGAM_64  0xbfbafeca  /* NXSwapLong(FAT_MAGIC_64) */
struct fat_arch_64 {
  cpu_type_t  cputype;  /* cpu specifier (int) */
  cpu_subtype_t cpusubtype; /* machine specifier (int) */
  uint64_t  offset;   /* file offset to this object file */
  uint64_t  size;   /* size of this object file */
  uint32_t  align;    /* alignment as a power of 2 */
  uint32_t  reserved; /* reserved */
};
struct fat_arch {
  cpu_type_t  cputype;  /* cpu specifier (int) */
  cpu_subtype_t cpusubtype; /* machine specifier (int) */
  uint32_t  offset;   /* file offset to this object file */
  uint32_t  size;   /* size of this object file */
  uint32_t  align;    /* alignment as a power of 2 */
};
file /System/Library/Frameworks/WebKit.framework/Frameworks/libWebKitSwift.dylib
/System/Library/Frameworks/WebKit.framework/Frameworks/libWebKitSwift.dylib: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit dynamically linked shared library x86_64] [arm64e:Mach-O 64-bit dynamically linked shared library arm64e]
/System/Library/Frameworks/WebKit.framework/Frameworks/libWebKitSwift.dylib (for architecture x86_64):	Mach-O 64-bit dynamically linked shared library x86_64
/System/Library/Frameworks/WebKit.framework/Frameworks/libWebKitSwift.dylib (for architecture arm64e):	Mach-O 64-bit dynamically linked shared library arm64e
lldb /usr/bin/grep
(lldb) image list -h dyld
[  0] 0x000000019607e000
(lldb) x/8wx 0x000000019607e000
0x19607e000: 0xfeedfacf 0x0100000c 0x80000002 0x00000007
0x19607e010: 0x0000000f 0x00000838 0x80000085 0x00000000
#define CPU_SUBTYPE_LIB64       0x80000000      /* 64 bit libraries */
#define CPU_SUBTYPE_PTRAUTH_ABI 0x80000000      /* pointer authentication with versioned ABI */
....
#define CPU_SUBTYPE_ARM64E              ((cpu_subtype_t) 2)
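Both cputype and cpusubtype pack flag bits into their top byte, so they need to be masked before they can be compared against the CPU_TYPE_* and CPU_SUBTYPE_* constants. The Python sketch below splits the values seen in this chapter. It assumes CPU_ARCH_MASK, CPU_SUBTYPE_MASK and CPU_TYPE_ARM (12) from mach/machine.h, which are not quoted above, and describe_cpu is a hypothetical helper name.

```python
CPU_ARCH_MASK = 0xFF000000     # top byte of cputype: architecture flags
CPU_ARCH_ABI64 = 0x01000000    # 64 bit ABI
CPU_SUBTYPE_MASK = 0xFF000000  # top byte of cpusubtype: capability flags

CPU_TYPE_X86 = 7
CPU_TYPE_ARM = 12              # assumption: taken from mach/machine.h

FAMILIES = {CPU_TYPE_X86: "x86", CPU_TYPE_ARM: "arm"}


def describe_cpu(cputype: int, cpusubtype: int) -> dict:
    """Split cputype/cpusubtype into family, ABI flag, subtype and flag bits."""
    family_id = cputype & ~CPU_ARCH_MASK
    return {
        "family": FAMILIES.get(family_id, f"unknown({family_id})"),
        "is_64_bit": bool(cputype & CPU_ARCH_ABI64),
        "subtype": cpusubtype & ~CPU_SUBTYPE_MASK,
        "capability_bits": (cpusubtype & CPU_SUBTYPE_MASK) >> 24,
    }


# grep's header on disk: CPU_ARCH_ABI64 | CPU_TYPE_X86, subtype 3.
print(describe_cpu(0x01000007, 0x80000003))
# dyld's header in memory: 64-bit ARM, subtype 2 (CPU_SUBTYPE_ARM64E).
print(describe_cpu(0x0100000C, 0x80000002))
```

The 0x80 in the capability bits is the 0x80000000 flag from the defines above; how to interpret it depends on the architecture.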
xxd -l 68 -e /usr/lib/dyld
00000000: bebafeca 03000000 07000000 03000000   ................
00000010: 00400000 f0570d00 0e000000 07000001   ..@...W.........
00000020: 03000000 00c00d00 d0e80f00 0e000000   ................
00000030: 0c000001 02000080 00c01d00 50620f00   ..............bP
00000040: 0e000000                              ....
xxd -l 68 -g 4 /usr/lib/dyld
00000000: cafebabe 00000003 00000007 00000003  ................
00000010: 00004000 000d57f0 0000000e 01000007  ..@...W.........
00000020: 00000003 000dc000 000fe8d0 0000000e  ................
00000030: 0100000c 80000002 001dc000 000f6250  ..............bP
00000040: 0000000e                             ....
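The 68 bytes above are a fat_header followed by three fat_arch structs, and they can be unpacked with a few lines of Python. This is a sketch under the assumption that the file uses FAT_MAGIC (32-bit offsets); the fat header is stored big-endian on disk, which is why the plain xxd -g 4 output reads naturally. The helper name parse_fat_header is hypothetical.

```python
import struct

# The 68 bytes printed by `xxd -l 68 -g 4 /usr/lib/dyld` above.
DYLD_FAT_HEADER = bytes.fromhex(
    "cafebabe 00000003"
    " 00000007 00000003 00004000 000d57f0 0000000e"
    " 01000007 00000003 000dc000 000fe8d0 0000000e"
    " 0100000c 80000002 001dc000 000f6250 0000000e"
)

FAT_MAGIC = 0xCAFEBABE
FAT_HEADER_SIZE = 8   # magic + nfat_arch
FAT_ARCH_SIZE = 20    # five 32-bit fields


def parse_fat_header(data: bytes) -> list:
    """Return one dict per fat_arch entry (hypothetical helper)."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat header with 32-bit offsets")
    slices = []
    for index in range(nfat_arch):
        start = FAT_HEADER_SIZE + index * FAT_ARCH_SIZE
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, start)
        slices.append({
            "cputype": cputype,
            "cpusubtype": cpusubtype,
            "offset": offset,
            "size": size,
            "alignment": 1 << align,  # align is stored as a power of 2
        })
    return slices


for entry in parse_fat_header(DYLD_FAT_HEADER):
    print(f"cputype 0x{entry['cputype']:08x} at file offset "
          f"{entry['offset']} ({entry['size']} bytes)")
```

Each offset is where that architecture's own Mach-O header begins inside the fat file, so seeking to it and reading a magic value is the next step.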
xxd -l 32 -e -s 16384 /usr/lib/dyld
00004000: feedface 00000007 00000003 00000007   ................
00004010: 00000012 000006d4 00000085 00000001   ................
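Notice that the slice at offset 16384 starts with 0xfeedface rather than 0xfeedfacf. A small Python sketch can classify the first four bytes of any file or slice. It assumes MH_MAGIC (0xfeedface), the 32-bit counterpart of MH_MAGIC_64 from mach-o/loader.h, which is not quoted in this chapter, and classify_magic is a hypothetical helper name.

```python
import struct

MAGICS = {
    0xFEEDFACE: "32-bit Mach-O",          # MH_MAGIC (assumption: from loader.h)
    0xFEEDFACF: "64-bit Mach-O",          # MH_MAGIC_64
    0xCAFEBABE: "fat, 32-bit offsets",    # FAT_MAGIC
    0xCAFEBABF: "fat, 64-bit offsets",    # FAT_MAGIC_64
}


def classify_magic(first_four: bytes) -> str:
    """Describe a file or slice from its first four on-disk bytes."""
    (as_big,) = struct.unpack(">I", first_four[:4])
    (as_little,) = struct.unpack("<I", first_four[:4])
    if as_big in MAGICS:
        return f"{MAGICS[as_big]}, big-endian on disk"
    if as_little in MAGICS:
        return f"{MAGICS[as_little]}, little-endian on disk"
    return "not a Mach-O or fat file"


print(classify_magic(bytes.fromhex("cafebabe")))  # start of /usr/lib/dyld
print(classify_magic(bytes.fromhex("cefaedfe")))  # slice at offset 16384
print(classify_magic(bytes.fromhex("cffaedfe")))  # start of /usr/bin/grep
```

A little-endian match corresponds to the CIGAM constants in the headers: the swapped value is simply what a big-endian reader sees.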

The Load Commands

Immediately following the Mach-O header are the load commands, which provide instructions on how an executable should be loaded into memory, as well as other miscellaneous details. This is where it gets interesting. Each load command is a C struct whose size and fields vary with the type of command.

struct load_command {
  uint32_t cmd;   /* type of load command */
  uint32_t cmdsize; /* total size of command in bytes */
};
#define LC_SEGMENT_64 0x19  /*64-bit segment of this file to be mapped*/
#define LC_ROUTINES_64  0x1a  /* 64-bit image routines */
#define LC_UUID   0x1b  /* the uuid */
/*
 * The uuid load command contains a single 128-bit unique random number that
 * identifies an object produced by the static link editor.
 */
struct uuid_command {
    uint32_t  cmd;    /* LC_UUID */
    uint32_t  cmdsize;  /* sizeof(struct uuid_command) */
    uint8_t uuid[16]; /* the 128-bit uuid */
};
otool -l /usr/bin/grep | grep LC_UUID -A2
    cmd LC_UUID
  cmdsize 24
    uuid F6870A1F-5337-3CF8-B7F5-2573A085C90E
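Because every load command starts with the same cmd and cmdsize pair, a parser can hop from one command to the next without understanding each type. The Python sketch below walks a small synthetic buffer (not real bytes from grep): a zero-filled LC_SEGMENT_64 placeholder followed by an LC_UUID command carrying the UUID shown above. The helper names are hypothetical and little-endian layout is assumed.

```python
import struct
import uuid

LC_SEGMENT_64 = 0x19
LC_UUID = 0x1B


def iter_load_commands(data: bytes, ncmds: int):
    """Yield (cmd, cmdsize, raw_bytes) for each load command in data."""
    offset = 0
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        yield cmd, cmdsize, data[offset:offset + cmdsize]
        # cmdsize covers the whole command, including cmd and cmdsize.
        offset += cmdsize


def parse_uuid_command(raw: bytes) -> uuid.UUID:
    """Decode a uuid_command: cmd, cmdsize, then 16 raw UUID bytes."""
    cmd, cmdsize, raw_uuid = struct.unpack_from("<II16s", raw)
    if cmd != LC_UUID or cmdsize != 24:
        raise ValueError("not an LC_UUID command")
    return uuid.UUID(bytes=raw_uuid)


# Synthetic load command area, built only for this illustration.
grep_uuid = uuid.UUID("F6870A1F-5337-3CF8-B7F5-2573A085C90E")
placeholder = struct.pack("<II", LC_SEGMENT_64, 72) + bytes(64)
uuid_command = struct.pack("<II16s", LC_UUID, 24, grep_uuid.bytes)
commands = placeholder + uuid_command

for cmd, cmdsize, raw in iter_load_commands(commands, ncmds=2):
    if cmd == LC_UUID:
        print("LC_UUID", str(parse_uuid_command(raw)).upper())
    else:
        print(f"cmd 0x{cmd:x}, {cmdsize} bytes (skipped)")
```

In a real file, the buffer would start right after the Mach-O header and ncmds would come from that header.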

Segments

The LC_UUID is a simple load command since it’s self-contained and doesn’t provide offsets into the executable’s segments/sections. It’s now time to turn your attention to segments.

lldb -n SpringBoard
(lldb) image dump sections SpringBoard
Sections for '/Users/wtyree/vmShare/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/System/Library/CoreServices/SpringBoard.app/SpringBoard' (arm64):
  SectID     Type             Load Address                             Perm File Off.  File Size  Flags      Section Name
  ---------- ---------------- ---------------------------------------  ---- ---------- ---------- ---------- ----------------------------
  0x00000100 container        [0x0000000000000000-0x0000000100000000)* ---  0x00000000 0x00000000 0x00000000 SpringBoard.__PAGEZERO
  0x00000200 container        [0x0000000102bc0000-0x0000000102bd0000)  r-x  0x00000000 0x00010000 0x00000000 SpringBoard.__TEXT
  0x00000001 code             [0x0000000102bc15cc-0x0000000102bc15d0)  r-x  0x000015cc 0x00000004 0x80000400 SpringBoard.__TEXT.__text
  ... etc ...
(lldb) image dump objfile SpringBoard

Programmatically Finding Segments and Sections

For the demo part of this chapter, you’ll build a macOS executable that iterates through the loaded modules and prints all the segments and sections found in each module.

import Foundation
import MachO // 1

for i in 0..<_dyld_image_count() { // 2
  let imagePath =
    String(validatingUTF8: _dyld_get_image_name(i))! // 3
  let imageName = (imagePath as NSString).lastPathComponent
  let header = _dyld_get_image_header(i)! // 4
  print("\(i) \(imageName) \(header)")
}

CFRunLoopRun() // 5
8 CoreFoundation 0x00007fff33cf6000
(lldb) x/8wx 0x00007fff33cf6000
0x7fff33cf6000: 0xfeedfacf 0x01000007 0x00000008 0x00000006
0x7fff33cf6010: 0x00000013 0x00001100 0xc2100085 0x00000000
var curLoadCommandIterator = Int(bitPattern: header) +
  MemoryLayout<mach_header_64>.size // 1
for _ in 0..<header.pointee.ncmds {
  let loadCommand =
    UnsafePointer<load_command>(
      bitPattern: curLoadCommandIterator)!.pointee // 2

  if loadCommand.cmd == LC_SEGMENT_64 {
    let segmentCommand =
      UnsafePointer<segment_command_64>(
        bitPattern: curLoadCommandIterator)!.pointee // 3

    print("\t\(segmentCommand.segname)")
  }

  curLoadCommandIterator =
    curLoadCommandIterator + Int(loadCommand.cmdsize) // 4
}
0 MachOPOC 0x0000000100000000
  (95, 95, 80, 65, 71, 69, 90, 69, 82, 79, 0, 0, 0, 0, 0, 0)
  (95, 95, 84, 69, 88, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  (95, 95, 68, 65, 84, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
  (95, 95, 76, 73, 78, 75, 69, 68, 73, 84, 0, 0, 0, 0, 0, 0)
func convertIntTupleToString(name : Any) -> String {
  var returnString = ""
  let mirror = Mirror(reflecting: name)
  for child in mirror.children {
    guard let val = child.value as? Int8,
      val != 0 else {
        break
    }
    returnString.append(Character(UnicodeScalar(UInt8(val))))
  }

  return returnString
}
let segName = convertIntTupleToString(
  name: segmentCommand.segname)
print("\t\(segName)")
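For comparison, the same conversion is short in Python, which can be handy when scripting against raw Mach-O bytes. This sketch assumes the name field arrives as a sequence of 16 byte values in the range 0 to 255, like the tuples printed earlier, and fixed_name_to_str is a hypothetical helper name.

```python
def fixed_name_to_str(raw) -> str:
    """Decode a 16-byte, zero-padded Mach-O name field (hypothetical helper)."""
    data = bytes(raw)
    # Keep everything before the first zero byte. A name that fills all
    # 16 bytes has no terminator, and split() then returns it whole.
    return data.split(b"\x00", 1)[0].decode("ascii")


SEGNAME_TUPLES = [
    (95, 95, 80, 65, 71, 69, 90, 69, 82, 79, 0, 0, 0, 0, 0, 0),
    (95, 95, 84, 69, 88, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    (95, 95, 68, 65, 84, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    (95, 95, 76, 73, 78, 75, 69, 68, 73, 84, 0, 0, 0, 0, 0, 0),
]

for name_tuple in SEGNAME_TUPLES:
    print(fixed_name_to_str(name_tuple))
```

The missing terminator matters in practice: a 16-character section name such as __swift5_typeref fills the field completely, so code that assumes a trailing zero byte would read past the end.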
0 libBacktraceRecording.dylib 0x0000000100118000
  __TEXT
  __DATA_CONST
  __DATA
  __LINKEDIT
1 libMainThreadChecker.dylib 0x0000000100214000
  __TEXT
  __DATA_CONST
  __DATA
  __LINKEDIT
2 MachOSegments 0x0000000100000000
  __PAGEZERO
  __TEXT
  __DATA_CONST
  __DATA
  __LINKEDIT
...
for j in 0..<segmentCommand.nsects { // 1
  let sectionOffset = curLoadCommandIterator +
    MemoryLayout<segment_command_64>.size // 2
  let offset = MemoryLayout<section_64>.size * Int(j) // 3
  let sectionCommand =
    UnsafePointer<section_64>(
      bitPattern: sectionOffset + offset)!.pointee

  let sectionName =
    convertIntTupleToString(name: sectionCommand.sectname) // 4
  print("\t\t\(sectionName)")
}
2 MachOSegments 0x0000000100000000
  __PAGEZERO
  __TEXT
    __text
    __stubs
    __swift5_typeref
    __cstring
    __objc_methname
    __swift5_entry
    __const
    __swift5_builtin
    __swift5_reflstr
    __swift5_fieldmd
    __swift5_types
    __unwind_info
    __eh_frame
  __DATA_CONST
    __got
    __const
    __objc_imageinfo
  __DATA
    __objc_selrefs
    __data
  __LINKEDIT
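The pointer arithmetic in the Swift loop relies on two fixed sizes: a segment_command_64 is 72 bytes and each section_64 that follows it is 80 bytes. The Python sketch below spells out both layouts with the struct module and parses a synthetic segment built in the same script, so no real binary is read. The format strings follow the field order in mach-o/loader.h as I understand it, little-endian is assumed, and all helper names are hypothetical.

```python
import struct

LC_SEGMENT_64 = 0x19

# segment_command_64: cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff,
# filesize, maxprot, initprot, nsects, flags (72 bytes).
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")
# section_64: sectname[16], segname[16], addr, size, offset, align, reloff,
# nreloc, flags, reserved1, reserved2, reserved3 (80 bytes).
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")


def name(raw: bytes) -> str:
    """Decode a zero-padded 16-byte name field."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def parse_segment(data: bytes, offset: int = 0):
    """Return (segment name, [section names]) for one LC_SEGMENT_64."""
    fields = SEGMENT_64.unpack_from(data, offset)
    cmd, segname, nsects = fields[0], fields[2], fields[9]
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    sections = []
    for j in range(nsects):
        # Sections sit immediately after the segment command, back to back.
        start = offset + SEGMENT_64.size + j * SECTION_64.size
        sections.append(name(SECTION_64.unpack_from(data, start)[0]))
    return name(segname), sections


def make_segment(segname: str, sectnames: list) -> bytes:
    """Build a synthetic segment command for this demo (all addresses zero)."""
    body = b"".join(
        SECTION_64.pack(s.encode(), segname.encode(), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
        for s in sectnames
    )
    header = SEGMENT_64.pack(
        LC_SEGMENT_64, SEGMENT_64.size + len(body), segname.encode(),
        0, 0, 0, 0, 3, 3, len(sectnames), 0,
    )
    return header + body


demo = make_segment("__DATA_CONST", ["__got", "__const", "__objc_imageinfo"])
print(parse_segment(demo))
```

Note that cmdsize for a segment includes all of its section structs, which is why advancing by cmdsize in the outer loop skips over the sections as well.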

Key Points

  • All Apple executables and libraries conform to the Mach-O format.
  • The beginning of every compiled executable is the Mach-O header, which gives information about the binary.
  • Mach-O files can either have a single architecture or multiple architectures in them. Files with multiple architectures are called fat.
  • Offsets of modules and sections can be different when a file is loaded into memory than when it’s stored on disk.

Where to Go From Here?

If I haven’t indirectly hinted at it enough, go check out mach-o/loader.h. I’ve read that header many times myself, and each time I read it I still learn something new. There’s a lot there, so don’t get frustrated if this chapter knocked you back into your chair.

© 2024 Kodeco Inc.
