Mach-O File Structure

The source code quoted in this article comes from Apple's open source website.

What is Mach-O

Mach-O is short for Mach Object file format. It is the file format used on iOS and macOS for executables, object code, dynamic libraries, and core dumps.

The Mach-O file format

Apple's official documentation gives the following diagram of the file structure:

[Figure: Mach-O file structure]

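Compile a HelloWorld program and open the resulting .out file in MachOView. The viewer shows that a Mach-O file consists of three parts:

  • Header: identifies the CPU architecture, the file type, the number of Load Commands, and other basic information.
  • Load Commands: describe how each Segment is to be loaded. A Mach-O file can contain multiple Segments, and each Segment can contain zero or more Sections.
  • Data: the actual contents of the Segments, including code and data.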

Header

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved */
};
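  • magic: the magic number. 0xfeedface (MH_MAGIC) marks a 32-bit file and 0xfeedfacf (MH_MAGIC_64) marks a 64-bit file. 0xcefaedfe (MH_CIGAM) and 0xcffaedfe (MH_CIGAM_64) are the same values byte-swapped, which means the file's byte order differs from the host's.
/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC    0xfeedface  /* the mach magic number */
#define MH_CIGAM    0xcefaedfe  /* NXSwapInt(MH_MAGIC) */
/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf  /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe  /* NXSwapInt(MH_MAGIC_64) */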
  • cputype: the CPU type
  • cpusubtype: the specific CPU model (subtype)
  • filetype: the file type, such as executable or library
    The filetype macros are:
#define MH_OBJECT   0x1     /* relocatable object file */
#define MH_EXECUTE  0x2     /* demand paged executable file */
#define MH_FVMLIB   0x3     /* fixed VM shared library file */
#define MH_CORE     0x4     /* core file */
#define MH_PRELOAD  0x5     /* preloaded executable file */
#define MH_DYLIB    0x6     /* dynamically bound shared library */
#define MH_DYLINKER 0x7     /* dynamic link editor */
#define MH_BUNDLE   0x8     /* dynamically bound bundle file */
#define MH_DYLIB_STUB   0x9     /* shared library stub for static */
                    /*  linking only, no section contents */
#define MH_DSYM     0xa     /* companion file with only debug */
                    /*  sections */
#define MH_KEXT_BUNDLE  0xb     /* x86_64 kexts */
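  • ncmds: the number of Load Commands
  • sizeofcmds: the total size in bytes of all Load Commands
  • flags: bit flags that describe further properties of the file
  • reserved: a reserved field present only in the 64-bit header; currently unused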
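
The magic number is the first thing a reader of the file has to check. The following is a minimal sketch of that check, not Apple's implementation: struct magic_info and classify_magic are hypothetical names, and the four constants are redefined locally so the sketch also builds on systems without the Mach-O headers. Universal (fat) binaries use a different magic and are not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values from <mach-o/loader.h>, redefined locally for portability. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/* Hypothetical result type, not part of any Apple API. */
struct magic_info {
    int valid;   /* 1 if the value is a thin Mach-O magic */
    int is_64;   /* 1 for mach_header_64, 0 for mach_header */
    int swapped; /* 1 if the file's byte order differs from the host's */
};

/* Classify the first four bytes of a buffer, read in host byte order. */
static struct magic_info classify_magic(const unsigned char *buf, size_t len)
{
    struct magic_info info = {0, 0, 0};
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof magic)
        return info;                   /* too short to hold a magic */
    memcpy(&magic, buf, sizeof magic); /* avoids unaligned access */

    switch (magic) {
    case MH_MAGIC:
        info.valid = 1;
        break;
    case MH_CIGAM:
        info.valid = 1;
        info.swapped = 1;
        break;
    case MH_MAGIC_64:
        info.valid = 1;
        info.is_64 = 1;
        break;
    case MH_CIGAM_64:
        info.valid = 1;
        info.is_64 = 1;
        info.swapped = 1;
        break;
    default:
        break;
    }
    return info;
}
```

A caller would use is_64 to choose between struct mach_header and struct mach_header_64, and would byte-swap every later field when swapped is set.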

The flags macros are:

#define MH_NOUNDEFS 0x1     /* the object file has no undefined
                       references */
#define MH_INCRLINK 0x2     /* the object file is the output of an
                       incremental link against a base file
                       and can't be link edited again */
#define MH_DYLDLINK 0x4     /* the object file is input for the
                       dynamic linker and can't be staticly
                       link edited again */
#define MH_BINDATLOAD   0x8     /* the object file's undefined
                       references are bound by the dynamic
                       linker when loaded. */
#define MH_PREBOUND 0x10        /* the file has its dynamic undefined
                       references prebound. */
#define MH_SPLIT_SEGS   0x20        /* the file has its read-only and
                       read-write segments split */
#define MH_LAZY_INIT    0x40        /* the shared library init routine is
                       to be run lazily via catching memory
                       faults to its writeable segments
                       (obsolete) */
#define MH_TWOLEVEL 0x80        /* the image is using two-level name
                       space bindings */
#define MH_FORCE_FLAT   0x100       /* the executable is forcing all images
                       to use flat name space bindings */
#define MH_NOMULTIDEFS  0x200       /* this umbrella guarantees no multiple
                       defintions of symbols in its
                       sub-images so the two-level namespace
                       hints can always be used. */
#define MH_NOFIXPREBINDING 0x400    /* do not have dyld notify the
                       prebinding agent about this
                       executable */
#define MH_PREBINDABLE  0x800           /* the binary is not prebound but can
                       have its prebinding redone. only used
                                           when MH_PREBOUND is not set. */
#define MH_ALLMODSBOUND 0x1000      /* indicates that this binary binds to
                                           all two-level namespace modules of
                       its dependent libraries. only used
                       when MH_PREBINDABLE and MH_TWOLEVEL
                       are both set. */ 
#define MH_SUBSECTIONS_VIA_SYMBOLS 0x2000/* safe to divide up the sections into
                        sub-sections via symbols for dead
                        code stripping */
#define MH_CANONICAL    0x4000      /* the binary has been canonicalized
                       via the unprebind operation */
#define MH_WEAK_DEFINES 0x8000      /* the final linked image contains
                       external weak symbols */
#define MH_BINDS_TO_WEAK 0x10000    /* the final linked image uses
                       weak symbols */

#define MH_ALLOW_STACK_EXECUTION 0x20000/* When this bit is set, all stacks 
                       in the task will be given stack
                       execution privilege.  Only used in
                       MH_EXECUTE filetypes. */
#define MH_DEAD_STRIPPABLE_DYLIB 0x400000 /* Only for use on dylibs.  When
                         linking against a dylib that
                         has this bit set, the static linker
                         will automatically not create a
                         LC_LOAD_DYLIB load command to the
                         dylib if no symbols are being
                         referenced from the dylib. */
#define MH_ROOT_SAFE 0x40000           /* When this bit is set, the binary 
                      declares it is safe for use in
                      processes with uid zero */
                                         
#define MH_SETUID_SAFE 0x80000         /* When this bit is set, the binary 
                      declares it is safe for use in
                      processes when issetugid() is true */

#define MH_NO_REEXPORTED_DYLIBS 0x100000 /* When this bit is set on a dylib, 
                      the static linker does not need to
                      examine dependent dylibs to see
                      if any are re-exported */
#define MH_PIE 0x200000         /* When this bit is set, the OS will
                       load the main executable at a
                       random address.  Only used in
                       MH_EXECUTE filetypes. */
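
Because flags is a bitmask, several of these values are usually set at once, and each one is tested with a bitwise AND. The sketch below decodes a few of them into a readable string. describe_flags and the name table are hypothetical helpers written for this article, and the table covers only four of the flags listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A subset of the flag values from the listing above. */
#define MH_NOUNDEFS 0x1u
#define MH_DYLDLINK 0x4u
#define MH_TWOLEVEL 0x80u
#define MH_PIE      0x200000u

struct flag_name {
    uint32_t bit;
    const char *name;
};

static const struct flag_name k_flag_names[] = {
    { MH_NOUNDEFS, "MH_NOUNDEFS" },
    { MH_DYLDLINK, "MH_DYLDLINK" },
    { MH_TWOLEVEL, "MH_TWOLEVEL" },
    { MH_PIE,      "MH_PIE" },
};

/* Write the names of the known bits set in `flags` into `out`, separated
 * by '|'. Returns how many names were written in full. Bits that are not
 * in the table are ignored. */
static int describe_flags(uint32_t flags, char *out, size_t out_len)
{
    size_t used = 0;
    int found = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof k_flag_names / sizeof k_flag_names[0]; i++) {
        if ((flags & k_flag_names[i].bit) == 0)
            continue; /* this bit is not set */
        int n = snprintf(out + used, out_len - used, "%s%s",
                         found ? "|" : "", k_flag_names[i].name);
        if (n < 0 || (size_t)n >= out_len - used)
            break; /* output buffer is full; stop appending */
        used += (size_t)n;
        found++;
    }
    return found;
}
```

For example, a flags value of 0x200085 decomposes into MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL and MH_PIE.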

The Header of the HelloWorld program above can be inspected in the same way in MachOView.
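
The same header fields can also be read programmatically. The following is a sketch under stated assumptions: struct header64 is a local mirror of struct mach_header_64 in which cpu_type_t and cpu_subtype_t are assumed to be 32-bit signed integers, and read_header64 and filetype_name are hypothetical helpers. Only files in the host's byte order are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu
#define MH_OBJECT   0x1u
#define MH_EXECUTE  0x2u
#define MH_DYLIB    0x6u
#define MH_DSYM     0xau

/* Local mirror of struct mach_header_64 (assumed layout, 32 bytes). */
struct header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copy a 64-bit header out of `buf`. Returns 0 on success, or -1 if the
 * buffer is too short or the magic is not MH_MAGIC_64. */
static int read_header64(const unsigned char *buf, size_t len,
                         struct header64 *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return out->magic == MH_MAGIC_64 ? 0 : -1;
}

/* Map a few filetype values from the listing above to their macro names. */
static const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case MH_OBJECT:  return "MH_OBJECT";
    case MH_EXECUTE: return "MH_EXECUTE";
    case MH_DYLIB:   return "MH_DYLIB";
    case MH_DSYM:    return "MH_DSYM";
    default:         return "unknown";
    }
}
```

After a successful read, ncmds and sizeofcmds give the count and total size of the Load Commands that immediately follow the header in the file.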

Load Commands

struct load_command {
    uint32_t cmd;       /* type of load command */
    uint32_t cmdsize;   /* total size of command in bytes */
};
  • cmd: the type of the command
  • cmdsize: the size of the command in bytes, used to compute the offset of the next command
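
Since every command starts with the same two fields, a reader can step through the whole Load Commands area by adding cmdsize to the current offset. The sketch below searches that area for the first command of a given type. find_load_command is a hypothetical helper, and the buffer is assumed to hold the sizeofcmds bytes that follow the header, in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};

/* Return the byte offset, relative to `cmds`, of the first load command
 * whose cmd equals `wanted`. Returns -1 if there is no such command or if
 * a cmdsize would step outside the sizeofcmds bytes. */
static long find_load_command(const unsigned char *cmds, uint32_t ncmds,
                              uint32_t sizeofcmds, uint32_t wanted)
{
    size_t offset = 0;

    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;

        if (sizeofcmds - offset < sizeof lc)
            return -1; /* not enough room for cmd and cmdsize */
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1; /* malformed size */
        if (lc.cmd == wanted)
            return (long)offset;
        offset += lc.cmdsize; /* step to the next command */
    }
    return -1;
}
```

Validating cmdsize before using it matters in practice: a size smaller than the 8-byte prefix would otherwise make the loop stall or read the same bytes again.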

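Common cmd types:

cmd                         Purpose
LC_SEGMENT/LC_SEGMENT_64    Maps a segment's data from the file into memory
LC_SYMTAB                   Symbol table information
LC_DYSYMTAB                 Dynamic symbol table information
LC_DYLD_INFO_ONLY           Compressed dynamic-linking information for dyld (rebase, binding, and export data)
LC_LOAD_DYLINKER            Path of the dynamic linker (dyld) to load
LC_UUID                     Unique identifier of the file
LC_SOURCE_VERSION           Version of the source code the file was built from
LC_MAIN                     Program entry point
LC_LOAD_DYLIB               A dynamic library to load
LC_FUNCTION_STARTS          Table of function start addresses
LC_DATA_IN_CODE             Table of data ranges embedded in code sections
LC_CODE_SIGNATURE           Code signature information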
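
When dumping a file it helps to turn the numeric cmd value back into its name. The sketch below does that for the commands in the table. The numeric values are the ones defined in <mach-o/loader.h> and are not part of the listings in this article; load_command_name and format_load_command are hypothetical helpers. Commands such as LC_MAIN carry the LC_REQ_DYLD bit, which marks a command that the dynamic linker must understand in order to run the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values as defined in <mach-o/loader.h>, redefined locally. */
#define LC_REQ_DYLD        0x80000000u
#define LC_SEGMENT         0x1u
#define LC_SYMTAB          0x2u
#define LC_DYSYMTAB        0xbu
#define LC_LOAD_DYLIB      0xcu
#define LC_LOAD_DYLINKER   0xeu
#define LC_SEGMENT_64      0x19u
#define LC_UUID            0x1bu
#define LC_CODE_SIGNATURE  0x1du
#define LC_DYLD_INFO_ONLY  (0x22u | LC_REQ_DYLD)
#define LC_FUNCTION_STARTS 0x26u
#define LC_MAIN            (0x28u | LC_REQ_DYLD)
#define LC_DATA_IN_CODE    0x29u
#define LC_SOURCE_VERSION  0x2au

/* Return the macro name for a cmd value, or "unknown". */
static const char *load_command_name(uint32_t cmd)
{
    switch (cmd) {
    case LC_SEGMENT:         return "LC_SEGMENT";
    case LC_SYMTAB:          return "LC_SYMTAB";
    case LC_DYSYMTAB:        return "LC_DYSYMTAB";
    case LC_LOAD_DYLIB:      return "LC_LOAD_DYLIB";
    case LC_LOAD_DYLINKER:   return "LC_LOAD_DYLINKER";
    case LC_SEGMENT_64:      return "LC_SEGMENT_64";
    case LC_UUID:            return "LC_UUID";
    case LC_CODE_SIGNATURE:  return "LC_CODE_SIGNATURE";
    case LC_DYLD_INFO_ONLY:  return "LC_DYLD_INFO_ONLY";
    case LC_FUNCTION_STARTS: return "LC_FUNCTION_STARTS";
    case LC_MAIN:            return "LC_MAIN";
    case LC_DATA_IN_CODE:    return "LC_DATA_IN_CODE";
    case LC_SOURCE_VERSION:  return "LC_SOURCE_VERSION";
    default:                 return "unknown";
    }
}

/* Format the name into `out` and note whether LC_REQ_DYLD is set.
 * Returns what snprintf returns: the length of the full text. */
static int format_load_command(uint32_t cmd, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    return snprintf(out, out_len, "%s%s", load_command_name(cmd),
                    (cmd & LC_REQ_DYLD) ? " (LC_REQ_DYLD set)" : "");
}
```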

segment

First, the definition of a segment:

struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};
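  • cmd: the Load Command type described above
  • cmdsize: the size of the Load Command
  • segname[16]: the segment name
    segname       Meaning
    __PAGEZERO    Inaccessible region in executables that traps NULL pointer dereferences
    __TEXT        Code and read-only data
    __DATA        Writable data such as global and static variables
    __LINKEDIT    Symbols, string tables, and other data needed by the dynamic linker
  • vmaddr: the segment's virtual address before any slide is applied; the actual runtime address is vmaddr plus the ASLR slide
  • vmsize: the size of the segment in virtual memory
  • fileoff: the offset of the segment within the file
  • filesize: the size of the segment within the file
    Loading a segment means mapping filesize bytes, starting at file offset fileoff, into virtual memory at vmaddr. If vmsize is larger than filesize, the remainder is zero-filled.
  • nsects: the number of sections in the segment
  • flags: bit flags that describe further properties of the segment
    The flag macros are: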
#define SG_HIGHVM   0x1 /* the file contents for this segment is for
                   the high part of the VM space, the low part
                   is zero filled (for stacks in core files) */
#define SG_FVMLIB   0x2 /* this segment is the VM that is allocated by
                   a fixed VM library, for overlap checking in
                   the link editor */
#define SG_NORELOC  0x4 /* this segment has nothing that was relocated
                   in it and nothing relocated to it, that is
                   it maybe safely replaced without relocation*/
#define SG_PROTECTED_VERSION_1  0x8 /* This segment is protected.  If the
                       segment starts at file offset 0, the
                       first page of the segment is not
                       protected.  All other pages of the
                       segment are protected. */
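
The relationship between vmaddr, fileoff and the ASLR slide comes down to simple arithmetic. The sketch below shows it with a hypothetical trimmed struct, seg_range, which keeps only the four address fields and uses 64-bit integers so that it covers both the 32-bit and the 64-bit segment command. vmaddr_to_fileoff and runtime_address are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trimmed view of a segment command. */
struct seg_range {
    uint64_t vmaddr;   /* unslid virtual address of the segment */
    uint64_t vmsize;   /* size in virtual memory */
    uint64_t fileoff;  /* offset of the segment in the file */
    uint64_t filesize; /* number of bytes backed by the file */
};

/* Translate an unslid virtual address into a file offset. Returns 0 on
 * success, or -1 if the address is outside the segment or falls in the
 * part beyond filesize, which has no bytes in the file. */
static int vmaddr_to_fileoff(const struct seg_range *seg, uint64_t addr,
                             uint64_t *off)
{
    uint64_t delta;

    assert(seg != NULL && off != NULL);
    if (addr < seg->vmaddr || addr - seg->vmaddr >= seg->vmsize)
        return -1; /* not inside this segment */
    delta = addr - seg->vmaddr;
    if (delta >= seg->filesize)
        return -1; /* no file bytes behind this address */
    *off = seg->fileoff + delta;
    return 0;
}

/* The address actually used at run time is the unslid address plus the
 * slide chosen when the image was loaded. */
static uint64_t runtime_address(uint64_t vmaddr, uint64_t slide)
{
    return vmaddr + slide;
}
```

With a segment at vmaddr 0x100000000 and fileoff 0, the address 0x100000f50 corresponds to file offset 0xf50, and with a slide of 0x5000 the segment starts at 0x100005000 in memory.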

section

The definition of a section:

struct section { /* for 32-bit architectures */
    char        sectname[16];   /* name of this section */
    char        segname[16];    /* segment this section goes in */
    uint32_t    addr;       /* memory address of this section */
    uint32_t    size;       /* size in bytes of this section */
    uint32_t    offset;     /* file offset of this section */
    uint32_t    align;      /* section alignment (power of 2) */
    uint32_t    reloff;     /* file offset of relocation entries */
    uint32_t    nreloc;     /* number of relocation entries */
    uint32_t    flags;      /* flags (section type and attributes)*/
    uint32_t    reserved1;  /* reserved (for offset or index) */
    uint32_t    reserved2;  /* reserved (for count or sizeof) */
};
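  • sectname: the section name
  • segname: the name of the segment the section belongs to
    (by convention, segment names are uppercase, such as __TEXT, and section names are lowercase, such as __text)
    sectname           Meaning
    __text             Main program code
    __stubs            Stub code used for calls into dynamic libraries
    __stub_helper      Helper code for lazy binding, which hands unresolved stubs to dyld
    __cstring          C string literals
    __la_symbol_ptr    Lazy symbol pointers, bound on first use
    __data             Initialized mutable variables
  • addr: the address of the section in memory
  • size: the size of the section in bytes
  • offset: the offset of the section within the file
  • align: the alignment of the section, stored as a power-of-2 exponent (an align of 4 means 16-byte alignment)
  • reloff: the file offset of the relocation entries
  • nreloc: the number of relocation entries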
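
Three details of these fields are easy to get wrong, so here is a short sketch of each. First, sectname and segname are fixed 16-byte arrays, and a name that uses all 16 bytes has no terminating NUL. Second, align is an exponent, not a byte count. Third, the section structs follow their segment command directly, so for the 32-bit layouts shown above cmdsize is expected to be 56 + nsects * 68. All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a fixed 16-byte Mach-O name into a NUL-terminated buffer of at
 * least 17 bytes, so that 16-character names are not read past the end. */
static void copy_name16(const char name[16], char out[17])
{
    assert(name != NULL && out != NULL);
    memcpy(out, name, 16);
    out[16] = '\0';
}

/* The align field stores an exponent: the alignment is 2^align bytes. */
static uint64_t section_alignment_bytes(uint32_t align)
{
    assert(align < 64); /* larger shifts would be undefined */
    return (uint64_t)1 << align;
}

/* For the 32-bit structs above, sizeof(struct segment_command) is 56 and
 * sizeof(struct section) is 68, so a segment with nsects sections is
 * expected to have this cmdsize. */
static uint32_t expected_segment_cmdsize(uint32_t nsects)
{
    return 56u + nsects * 68u;
}
```

A parser can compare expected_segment_cmdsize(nsects) with the cmdsize stored in the file as a quick consistency check before reading the section structs.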
