Android Dex File Structure
.dex: Dalvik Executable Format
Copyright © 2007 The Android Open Source Project
This document describes the layout and contents of .dex files, which are used to hold a set of class definitions and their associated adjunct data.
Guide To Types
Name | Description
byte | 8-bit signed int
ubyte | 8-bit unsigned int
short | 16-bit signed int, little-endian
ushort | 16-bit unsigned int, little-endian
int | 32-bit signed int, little-endian
uint | 32-bit unsigned int, little-endian
long | 64-bit signed int, little-endian
ulong | 64-bit unsigned int, little-endian
sleb128 | signed LEB128, variable-length (see below)
uleb128 | unsigned LEB128, variable-length (see below)
uleb128p1 | unsigned LEB128 plus 1, variable-length (see below)
LEB128
LEB128 ("Little-Endian Base 128") is a variable-length encoding for arbitrary signed or unsigned integer quantities. The format was borrowed from the DWARF3 specification. In a .dex file, LEB128 is only ever used to encode 32-bit quantities.
Each LEB128 encoded value consists of one to five bytes, which together represent a single 32-bit value. Each byte has its most significant bit set except for the final byte in the sequence, which has its most significant bit clear. The remaining seven bits of each byte are payload, with the least significant seven bits of the quantity in the first byte, the next seven in the second byte and so on. In the case of a signed LEB128 (sleb128), the most significant payload bit of the final byte in the sequence is sign-extended to produce the final value. In the unsigned case (uleb128), any bits not explicitly represented are interpreted as 0.
Bitwise diagram of a two-byte LEB128 value
First byte | 1 | bit6 | bit5 | bit4 | bit3 | bit2 | bit1 | bit0
Second byte | 0 | bit13 | bit12 | bit11 | bit10 | bit9 | bit8 | bit7
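The decoding rules above can be sketched in Python as follows. This is a minimal illustration, not code from the specification; the function names `decode_uleb128` and `decode_sleb128` and the `(value, bytes_consumed)` return shape are assumptions made for this example.

```python
def decode_uleb128(data, offset=0):
    """Decode an unsigned LEB128 value; return (value, bytes_consumed)."""
    result = 0
    for i in range(5):  # a 32-bit quantity needs at most five bytes
        byte = data[offset + i]
        result |= (byte & 0x7F) << (7 * i)  # low seven bits are payload
        if (byte & 0x80) == 0:  # a clear high bit marks the final byte
            return result, i + 1
    raise ValueError("LEB128 sequence longer than five bytes")


def decode_sleb128(data, offset=0):
    """Decode a signed LEB128 value; return (value, bytes_consumed)."""
    value, length = decode_uleb128(data, offset)
    sign_bit = 1 << (7 * length - 1)  # top payload bit of the final byte
    if value & sign_bit:
        value -= 1 << (7 * length)  # sign-extend from that bit
    return value, length
```

Returning the number of bytes consumed lets a caller advance through a buffer that holds several variable-length values back to back.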
The variant uleb128p1 is used to represent a signed value, where the representation is of the value plus one encoded as a uleb128. This makes the encoding of -1 (alternatively thought of as the unsigned value 0xffffffff), but no other negative number, a single byte, and is useful in exactly those cases where the represented number must either be non-negative or -1 (or 0xffffffff), and where no other negative values are allowed (or where large unsigned values are unlikely to be needed).
Here are some examples of the formats:
Encoded Sequence | As sleb128 | As uleb128 | As uleb128p1
00 | 0 | 0 | -1
01 | 1 | 1 | 0
7f | -1 | 127 | 126
80 7f | -128 | 16256 | 16255
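The table can be cross-checked with a small encoder. The sketch below is an illustration only: the function names are invented for this example, and it assumes the shortest possible encoding is wanted for each value.

```python
def encode_uleb128(value):
    """Encode an integer in [0, 2**32) as unsigned LEB128."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 unsigned bits")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte: high bit clear
            return bytes(out)


def encode_sleb128(value):
    """Encode an integer in [-2**31, 2**31) as signed LEB128."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value does not fit in 32 signed bits")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift preserves the sign
        # Stop once the remaining bits are pure sign extension of bit 6.
        done = (value == 0 and not (byte & 0x40)) or (value == -1 and (byte & 0x40))
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode_uleb128p1(value):
    """Encode -1 (or 0xffffffff) or a non-negative value as uleb128 of value + 1."""
    return encode_uleb128((value + 1) & 0xFFFFFFFF)


# Rows of the examples table: (encoded bytes, sleb128, uleb128, uleb128p1).
EXAMPLES = [
    ("00", 0, 0, -1),
    ("01", 1, 1, 0),
    ("7f", -1, 127, 126),
    ("80 7f", -128, 16256, 16255),
]
for hex_text, as_sleb, as_uleb, as_p1 in EXAMPLES:
    expected = bytes.fromhex(hex_text)
    assert encode_sleb128(as_sleb) == expected
    assert encode_uleb128(as_uleb) == expected
    assert encode_uleb128p1(as_p1) == expected
```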
Overall File Layout
Name | Format | Description
header | header_item | the header
string_ids | string_id_item[] | string identifiers list. These are identifiers for all the strings used by this file, either for internal naming (e.g., type descriptors) or as constant objects referred to by code. This list must be sorted by string contents, using UTF-16 code point values (not in a locale-sensitive manner).
type_ids | type_id_item[] | type identifiers list. These are identifiers for all types (classes, arrays, or primitive types) referred to by this file, whether defined in the file or not. This list must be sorted by string_id index.
proto_ids | proto_id_item[] | method prototype identifiers list. These are identifiers for all prototypes referred to by this file. This list must be sorted in return-type (by type_id index) major order, and then by arguments (also by type_id index).
field_ids | field_id_item[] | field identifiers list. These are identifiers for all fields referred to by this file, whether defined in the file or not. This list must be sorted, where the defining type (by type_id index) is the major order, field name (by string_id index) is the intermediate order, and type (by type_id index) is the minor order.
method_ids | method_id_item[] | method identifiers list. These are identifiers for all methods referred to by this file, whether defined in the file or not. This list must be sorted, where the defining type (by type_id index) is the major order, method name (by string_id index) is the intermediate order, and method prototype (by proto_id index) is the minor order.
class_defs | class_def_item[] | class definitions list. The classes must be ordered such that a given class's superclass and implemented interfaces appear in the list earlier than the referring class.
data | ubyte[] | data area, containing all the support data for the tables listed above. Different items have different alignment requirements, and padding bytes are inserted before each item if necessary to achieve proper alignment.
link_data | ubyte[] | data used in statically linked files. The format of the data in this section is left unspecified by this document; this section is empty in unlinked files, and runtime implementations may use it as they see fit.
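The major/intermediate/minor ordering required for field_ids and method_ids amounts to a lexicographic comparison of three indices. A minimal sketch, assuming each entry has already been parsed into a tuple; the `FieldId` and `MethodId` types and their member names are hypothetical and exist only for this illustration:

```python
from typing import NamedTuple


class FieldId(NamedTuple):
    # Members are listed in major, intermediate, minor order so that
    # plain tuple comparison matches the required sort order.
    defining_type_idx: int  # index into type_ids (major)
    name_idx: int           # index into string_ids (intermediate)
    type_idx: int           # index into type_ids (minor)


class MethodId(NamedTuple):
    defining_type_idx: int  # index into type_ids (major)
    name_idx: int           # index into string_ids (intermediate)
    proto_idx: int          # index into proto_ids (minor)


def is_sorted(ids):
    """Return True if every entry compares >= the entry before it."""
    return all(a <= b for a, b in zip(ids, ids[1:]))
```

For example, `is_sorted([FieldId(0, 1, 2), FieldId(0, 2, 0), FieldId(1, 0, 0)])` holds because the defining type is compared first, then the name, then the type.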
Bitfield, String, and Constant Definitions
DEX_FILE_MAGIC
embedded in header_item
The constant array/string DEX_FILE_MAGIC is the list of bytes that must appear at the beginning of a .dex file in order for it to be recognized as such. The value intentionally contains a newline ("\n" or 0x0a) and a null byte ("\0" or 0x00) in order to help in the detection of certain forms of corruption. The value also encodes a format version number as three decimal digits, which is expected to increase monotonically over time as the format evolves.
ubyte[8] DEX_FILE_MAGIC = { 0x64 0x65 0x78 0x0a 0x30 0x33 0x35 0x00 }
                        = "dex\n035\0"
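A reader might validate the magic and extract the version roughly as follows. This is a sketch under the assumption that any three-digit version is acceptable; the name `parse_dex_magic` is invented for this example.

```python
DEX_MAGIC_PREFIX = b"dex\n"


def parse_dex_magic(data):
    """Return the format version encoded in the first 8 bytes, or raise ValueError."""
    magic = data[:8]
    if len(magic) < 8 or magic[:4] != DEX_MAGIC_PREFIX or magic[7] != 0:
        raise ValueError("bad .dex magic")
    digits = magic[4:7]  # three ASCII decimal digits, e.g. b"035"
    if not digits.isdigit():
        raise ValueError("version is not three decimal digits")
    return int(digits)
```

For the value shown above, `parse_dex_magic(b"dex\n035\0")` returns 35.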
Note: At least a couple earlier versions of the format have been used in widely-available public software releases. For example, version 009 was used for the M3 releases of the Android platform (November-December 2007), and version 013 was used for the M5 releases of the Android platform (February-March 2008). In several respects, these earlier versions of the format differ significantly from the version described in this document.
ENDIAN_CONSTANT and REVERSE_ENDIAN_CONSTANT
embedded in header_item
The constant ENDIAN_CONSTANT is used to indicate the endianness of the file in which it is found. Although the standard .dex format is little-endian, implementations may choose to perform byte-swapping. Should an implementation come across a header whose endian_tag is REVERSE_ENDIAN_CONSTANT instead of ENDIAN_CONSTANT, it would know that the file has been byte-swapped from the expected form.
uint ENDIAN_CONSTANT = 0x12345678;
uint REVERSE_ENDIAN_CONSTANT = 0x78563412;
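The check described above could look like the following sketch. It assumes the caller has already located the four bytes of endian_tag within the header (the position is not given in this section), and the function name and return strings are invented for illustration.

```python
import struct

ENDIAN_CONSTANT = 0x12345678
REVERSE_ENDIAN_CONSTANT = 0x78563412


def classify_endian_tag(tag_bytes):
    """Interpret four endian_tag bytes as a little-endian uint and classify them."""
    (tag,) = struct.unpack("<I", tag_bytes)
    if tag == ENDIAN_CONSTANT:
        return "little-endian"
    if tag == REVERSE_ENDIAN_CONSTANT:
        return "byte-swapped"
    raise ValueError("unrecognized endian_tag: 0x%08x" % tag)
```

A standard file stores the tag as the bytes 78 56 34 12, which read back as 0x12345678 when interpreted as little-endian.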
NO_INDEX
embedded in class_def_item and debug_info_item
The constant NO_INDEX is used to indicate that an index value is absent.
Note: This value isn't defined to be 0, because that is in fact typically a valid index.
Also Note: The chosen value for NO_INDEX is representable as a single byte in the uleb128p1 encoding.
uint NO_INDEX = 0xffffffff; // == -1 if treated as a signed int
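To illustrate the second note: in uleb128p1 the single byte 00 decodes to NO_INDEX. The sketch below reads an optional index and maps NO_INDEX to `None`; the function name and the use of `None` are choices made for this example, not part of the format.

```python
NO_INDEX = 0xFFFFFFFF


def read_uleb128p1_index(data, offset=0):
    """Return (index or None, bytes_consumed) for a uleb128p1-encoded index."""
    raw = 0
    for length in range(1, 6):  # at most five bytes for a 32-bit quantity
        byte = data[offset + length - 1]
        raw |= (byte & 0x7F) << (7 * (length - 1))
        if not (byte & 0x80):  # final byte reached
            value = (raw - 1) & 0xFFFFFFFF  # undo the "plus one"
            return (None if value == NO_INDEX else value), length
    raise ValueError("LEB128 sequence longer than five bytes")
```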
access_flags Definitions
embedded in class_def_item, field_item, method_item, and InnerClass
Bitfields of these flags are used to indicate the accessibility and overall properties of classes and class members.
Name | Value | For Classes (and InnerClass annotations) | For Fields | For Methods
ACC_PUBLIC | 0x1 | public: visible everywhere | public: visible everywhere | public: visible everywhere
ACC_PRIVATE | 0x2 | *private: only visible to defining class | private: only visible to defining class | private: only visible to defining class
ACC_PROTECTED | 0x4 | *protected: visible to package and subclasses | protected: visible to package and subclasses | protected: visible to package and subclasses
ACC_STATIC | 0x8 | *static: is not constructed with an outer this reference | static: global to defining class | static: does not take a this argument
ACC_FINAL | 0x10 | final: not subclassable | final: immutable after construction | final: not overridable
ACC_SYNCHRONIZED | 0x20 | | | synchronized: associated lock automatically acquired around call to this method. Note: This is only valid to set when ACC_NATIVE is also set.
ACC_VOLATILE | 0x40 | | volatile: special access rules to help with thread safety |
ACC_BRIDGE | 0x40 | | | bridge method, added automatically by compiler as a type-safe bridge
ACC_TRANSIENT | 0x80 | | transient: not to be saved by default serialization |
ACC_VARARGS | 0x80 | | | last argument should be treated as a "rest" argument by compiler
ACC_NATIVE | 0x100 | | | native: implemented in native code
ACC_INTERFACE | 0x200 | interface: multiply-implementable abstract class | |
ACC_ABSTRACT | 0x400 | abstract: not directly instantiable | | abstract: unimplemented by this class
ACC_STRICT | 0x800 | | | strictfp: strict rules for floating-point arithmetic
ACC_SYNTHETIC | 0x1000 | not directly defined in source code | not directly defined in source code | not directly defined in source code
ACC_ANNOTATION | 0x2000 | declared as an annotation class | |
ACC_ENUM | 0x4000 | declared as an enumerated type | declared as an enumerated value |
(unused) | 0x8000 | | |
ACC_CONSTRUCTOR | 0x10000 | | | constructor method (class or instance initializer)
ACC_DECLARED_SYNCHRONIZED | 0x20000 | | | declared synchronized. Note: This has no effect on execution (other than in reflection of this flag, per se).
* Only allowed for InnerClass annotations, and must never be set in a class_def_item.
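Because 0x40 and 0x80 mean different things for fields and methods, a flags word can only be named once the kind of item is known. The following sketch shows one way to do that; the table layout, the `describe_access_flags` name, and the `"class"`/`"field"`/`"method"` kind strings are assumptions made for this example. It does not enforce the InnerClass-only restriction from the footnote.

```python
ALL = {"class", "field", "method"}

# (bit value, flag name, kinds of item the flag applies to)
ACCESS_FLAGS = [
    (0x1, "ACC_PUBLIC", ALL),
    (0x2, "ACC_PRIVATE", ALL),
    (0x4, "ACC_PROTECTED", ALL),
    (0x8, "ACC_STATIC", ALL),
    (0x10, "ACC_FINAL", ALL),
    (0x20, "ACC_SYNCHRONIZED", {"method"}),
    (0x40, "ACC_VOLATILE", {"field"}),
    (0x40, "ACC_BRIDGE", {"method"}),
    (0x80, "ACC_TRANSIENT", {"field"}),
    (0x80, "ACC_VARARGS", {"method"}),
    (0x100, "ACC_NATIVE", {"method"}),
    (0x200, "ACC_INTERFACE", {"class"}),
    (0x400, "ACC_ABSTRACT", {"class", "method"}),
    (0x800, "ACC_STRICT", {"method"}),
    (0x1000, "ACC_SYNTHETIC", ALL),
    (0x2000, "ACC_ANNOTATION", {"class"}),
    (0x4000, "ACC_ENUM", {"class", "field"}),
    (0x10000, "ACC_CONSTRUCTOR", {"method"}),
    (0x20000, "ACC_DECLARED_SYNCHRONIZED", {"method"}),
]


def describe_access_flags(flags, kind):
    """List the flag names set in `flags` for a "class", "field", or "method"."""
    if kind not in ALL:
        raise ValueError("kind must be 'class', 'field', or 'method'")
    return [name for value, name, kinds in ACCESS_FLAGS
            if (flags & value) and kind in kinds]
```

For instance, the value 0x40 is reported as ACC_VOLATILE for a field and as ACC_BRIDGE for a method.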
MUTF-8 (Modified UTF-8) Encoding
As a concession to easier legacy support, the .dex format encodes its string data in a de facto standard modified UTF-8 form, hereafter referred to as MUTF-8. This form is identical to standard UTF-8, except:
[*] Only the one-, two-, and three-byte encodings are used.
[*] Code points in the range U+10000 … U+10ffff are encoded as a surrogate pair, each of which is represented as a three-byte encoded value.
[*] The code point U+0000 is encoded in two-byte form.
[*] A plain null byte (value 0) indicates the end of a string, as is the standard C language interpretation.
The first two items above can be summarized as: MUTF-8 is an encoding format for UTF-16, instead of being a more direct encoding format for Unicode characters.
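The four rules can be illustrated with a short encoder. This is a sketch rather than a reference implementation: the name `encode_mutf8` is invented here, and the terminating null byte is left for the caller to append.

```python
def encode_mutf8(text):
    """Encode a str as MUTF-8 bytes, without the terminating null byte."""
    out = bytearray()
    # Going through UTF-16 splits code points above U+FFFF into surrogate pairs.
    utf16 = text.encode("utf-16-le", "surrogatepass")
    for i in range(0, len(utf16), 2):
        unit = utf16[i] | (utf16[i + 1] << 8)  # one 16-bit code unit
        if unit != 0 and unit < 0x80:
            out.append(unit)  # one-byte form
        elif unit < 0x800:
            # Two-byte form; U+0000 lands here so no plain null byte is emitted.
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three-byte form; each half of a surrogate pair is encoded separately.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)
```

With this sketch, U+0000 becomes the two bytes c0 80, and a code point such as U+1F600 becomes six bytes (two three-byte sequences) instead of the four bytes standard UTF-8 would use.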
The final two items above make it simultaneously possible to i