DHC11 - A 68HC11 Disassembler

    This page describes a multi-pass code-seeking disassembler for the Motorola 68HC11 and other compatible processors including the 6800, 6801, 6802, 6803, etc. It includes a number of features to enhance the readability of traditional disassemblies. It has been used by the author for various applications including disassembling GM (including Holden) vehicle ECMs.

    Get latest version of DHC11

The disassembler is described in enough detail to enable anyone familiar with disassembler and/or assembler concepts to begin using it immediately. There is a disassembler tutorial for DHC11. The disassembler's output is designed to be assembled by the author's companion macro assembler, ASHC11, but the output is flexible enough to be used by most good 68HC11 assemblers. The mnemonics chosen are not Motorola standard, but have been designed for readability by a majority of programmers familiar with either the 68xx, Z80, or other manufacturers' microcontrollers. The disassembler's output can be immediately assembled back to the original binary image for verification of the disassembly process.

Features of the Disassembler.

Code Seeker: A code-seeking disassembler will "look for" instructions that modify the target CPU's PC. Target addresses for CALL and BRANCH instructions are tagged as such. On each disassembly pass, new target addresses may be found as more code is "discovered" from the CPU's address space. The CPU's address space (ie. the loaded ROM/code image) is initially assumed to be data, and when a disassembly pass produces no more new target addresses for code, the code seeker has finished.

Flexible Output: A number of options allow the output to be tailored, first for discovering the structure of an unknown binary file, and later for producing output that can be most easily commented and made ready for re-assembly.

Few Limitations: There are very few limitations on the disassembler. It will disassemble binary files of up to 64 kbytes (65536 bytes), with the only real limitation being the size of the user's symbol table: up to 8,192 labels of up to 128 characters each may be defined. Commands that declare labels are Label, Entry, Vectors, and Indirect.

Supporting Tools: The disassembler is a complete package, but a couple of tools round out its use. These include:

The next sections describe the disassembler.

Configuration File

Information about the code is stored in a configuration file, and controls how the disassembler operates. Labels, address tables, and data names can all be stored here, and re-used in subsequent disassemblies. Each file line is made up of a command with optional parameters. The following lists all supported commands, and each is further described below. Case (upper or lower) is not significant except in label names if they are supplied. Optional parameters are shown in square brackets [<optional>].
 Command & Arguments                        Action
 INput <file>                               Specify source file to disassemble.
 OUTput <file>                              Specify where disassembly is written.
 LOad <addr>                                Source file will be loaded to address <addr>.
 Entry <addr> [<name>]                      Provide a code entry point <addr> with optional label <name>.
 Label <addr> <name>                        Assign a label <name> to address <addr>.
 INDEXed <start> <end>                      Define address range in which indexed addresses are calculated.
 Addresses                                  Show addresses in disassembly.
 OPcodes                                    Show opcodes in disassembly.
 ASCii                                      Show byte data as ASCII strings.
 Bytes <addr> <count> [<name>]              Define a byte table at <addr> of <count> length.
 Words <addr> <count> [<name>]              Define a word table at <addr> of <count> length.
 Indirect <addr> [<name>[ <here>]]          Define a pointer to an (indirect) address.
 Vectors <addr> <count> [<name>[ <here>]]   Define a range of indirect addresses.

Command Line Switches: A number of options are set with command line switches. Command line switches override any matching options that are also specified in the configuration file. Only an abbreviated form of each switch is required, eg. -a is enough to specify that the output should contain addresses. The minimum number of characters required for each switch is indicated by the upper case characters in the following description (eg. for OVerwrite, just OV, or ov, is required):

  Switch        Effect
  -INput=       Name of binary file to read.
  -OUTput=      Name of disassembly file produced.
  -OVerwrite    Forces output file to be overwritten if it already exists.
  -LOad=        Specify hex start address to load binary file into memory.
  -Addresses    Show addresses on left of each disassembled line.
  -OPcodes      Show opcodes for instructions disassembled.
  -ASCii        Show data byte ASCII equivalents.
  -@            Use procedure local labels, ie. "@ labels".
  -LColons      Use a colon (:) suffix on labels (default=TRUE).
  -LPrefix=     Prefix string for labels (default=L).
  -HPrefix=     Prefix string for hex constants (default=$).
  -Bitimmediate Display immediate bytes as a bit# [+bit# ..] mask.
  -Defsperline= Maximum number of db, or dw items per line (default 10).
  -FILLminimum= db count of same value to force the fill pseudo-op (default 10).
  -FRagment     Decode a code Fragment, don't relocate it to high memory.
  -Verbose      Show control file information as it's decoded.
  -#            Calculate data addresses from probable IX/IY immediates.
A switch option can be negated with a "-" suffix, or asserted with a "+" suffix (the default), as in: -op- to turn the option off. Switches requiring a parameter must use either an equals "=" or colon ":" separator, as in: LOAD=$c000 to define the load address. Note also that DHC11 does not really need to use the initial "-" when defining a switch.
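
The abbreviation and suffix rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not DHC11's actual parser: the function names are hypothetical, the -@ and -# switches are left out, and treating an ambiguous abbreviation as "no match" is an assumption.

```python
import re

# Switch names from the table above; the leading upper case letters give
# the minimum abbreviation. (-@ and -# are omitted from this sketch.)
SWITCHES = ["INput", "OUTput", "OVerwrite", "LOad", "Addresses", "OPcodes",
            "ASCii", "LColons", "LPrefix", "HPrefix", "Bitimmediate",
            "Defsperline", "FILLminimum", "FRagment", "Verbose"]

def min_length(name):
    """Count the leading upper case characters of a switch name."""
    count = 0
    for ch in name:
        if not ch.isupper():
            break
        count += 1
    return count

def match_switch(text):
    """Return the full switch name for an abbreviation, or None."""
    upper = text.upper()
    hits = [s for s in SWITCHES
            if len(upper) >= min_length(s) and s.upper().startswith(upper)]
    # Assumption: anything that does not match exactly one switch is rejected.
    return hits[0] if len(hits) == 1 else None

def parse_switch(arg):
    """Parse one argument into (name, value); value is a string or a bool."""
    m = re.fullmatch(r"-?([A-Za-z]+)(?:[=:](.*)|([+-]))?", arg)
    if not m:
        return None
    name = match_switch(m.group(1))
    if name is None:
        return None
    if m.group(2) is not None:          # "=" or ":" separator with a parameter
        return name, m.group(2)
    return name, m.group(3) != "-"      # "-" suffix negates, default asserts

print(parse_switch("-a"))          # ('Addresses', True)
print(parse_switch("-op-"))        # ('OPcodes', False)
print(parse_switch("LOAD=$c000"))  # ('LOad', '$c000')
```

Note that a single "l" is rejected here because LOad, LColons and LPrefix all require at least two characters.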

Option and Switch Details

The following describes all commands (as they appear in the control file) and switches (as they appear on the command line). Commands that have equivalent switches are shown with a leading "-".

File Specification -OUTput=, -INput=, -OVerwrite, -LOad=

-OUTput=<filename>
Specifies the name of the file the disassembly will be written to. The disassembler first tests whether the file exists, and exits without any action if it does. To overcome this you can use the -OVerwrite option (described below) to overwrite the old file.

-INput=<filename>
Specifies the input file. It is assumed to be in a BINARY format.

-OVerwrite
This tells the disassembler to overwrite the old output file (which results in the old file's contents being lost).

-LOad=<loadaddress>
Specifies the address at which the binary file image will be loaded. If the binary image is too large, or the selected load address causes the data to overflow the address space, then an error message is generated and the disassembler aborts. Note that the load address is not required, as by default the disassembler assumes the last word of the binary file will be at address $FFFE, which is the HC11's reset vector.
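
As a worked example of the default, an image whose last word sits at $FFFE must end at $FFFF, so the implied load address is $10000 minus the file size. The Python sketch below shows that arithmetic together with the overflow check; the function names are hypothetical and the error handling is an assumption, not DHC11's own code.

```python
ADDRESS_SPACE = 0x10000  # 64 kbytes addressable by the HC11

def default_load_address(size):
    """Load address that places the last word of the image at $FFFE."""
    if not 0 < size <= ADDRESS_SPACE:
        raise ValueError("binary image is empty or larger than 64 kbytes")
    return ADDRESS_SPACE - size

def check_load_address(load, size):
    """Reject a load address that would run the image past $FFFF."""
    if load < 0 or load + size > ADDRESS_SPACE:
        raise ValueError("image overflows the address space")
    return load

# A 16 kbyte ROM image is placed at $C000 by default.
print(f"${default_load_address(0x4000):04X}")  # $C000
```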

Disassembly display: -Addresses, -OPcodes

-Addresses
Displays the instruction/code address at each disassembly line, as in:

  D063            beq     LD071
  D065    LD065:  ldaA    LC008
  D068            cmpA    #$AA
-OPcodes
Displays the opcode bytes for each instruction (note: this does not display data bytes, which are already decoded):

  5F                      clrB
  08                      incX
  18 BC C0 06             cmpY    LC006
Combining the two options -A -OP produces:

  F091    12 2D 40 11     LF091:  brset   L002D, #%01000000, LF0A6
  F095    CE F3 17                ldX     #$F317
  F098    18 1F 00 10 0B          brclr   0, Y, #%00010000, LF0AC

Label Options: -@ (local labels), -LColons, -LPrefix

-@ (local labels)
Specifies that procedure local labels are to be used instead of the default label (described below). Below is an example of code disassembled with the -@ switch. Note that the instruction at label @21 branches to a default style label (LE277). Local labels are bounded by data or entry points that are the target of call instructions.

  @19     brset   L0039, #%0000010, @21    ; @19 and @20 are local labels
          bset    L0039, #%0000010
  @20     ldaA    LC682                    ; LC682 is an entry point
          staA    L00C5                    ; L00C5 is a data label
  @21     brset   L0001, #%0000010, LE277  ; local labels & an entry point

-LColons
Specifies that a colon (:) suffix is to be used on labels. Note that local labels are always shown without a colon. By default colons are used.

-LPrefix=<labelPrefix>
Specifies the prefix string used for non-local labels automatically generated by the disassembler. The default prefix is "L" so the label for address $1A2B would be shown as L1A2B. A prefix string of more than two characters may cause undesirable indenting of the disassembly.

Constant Options: -HPrefix, -Bitimmediate

-HPrefix=<hexPrefixString>
Specifies the prefix string for hex constants. The default is $ and another possible prefix is 0x. The prefix you use may depend on what your assembler will accept. Here's an example using HPrefix=0x, LPrefix=x and LColons-.

          adcA    #0x00                   ; 0x is the hex prefix
          call    xEDDB                   ; x is the label prefix
  xC134   ldX     #0xC69B                 ; xC134 is label for this instruction
          staA    x00C1                   ; x00C1 is a data byte label

-Bitimmediate
Display immediate bytes as either a bit mask or an inverted bit mask. Normally the immediate byte field used for instructions such as ldaA is shown as a binary value %00100010. When this option is enabled, this value would be shown as (bit5+bit1). If more than 4 bits in a mask are set then the inverted form of the mask is used as shown:

          bitB    #bit0                  ; same as  "bitB   #$01"
          xorB    #(bit5+bit4)           ; same as  "xorB   #%00110000"
          andA    #~bit3                 ; same as  "andA   #%11110111"
          brclr   L00C3, #$FF, @22       ; $FF still is used if all bits set
  LEEBF:  andB    #~(bit5+bit0)          ; same as  "andB   #%11011110"
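
The formatting rule can be expressed as a short Python sketch. It reproduces the examples above (including $FF when all bits are set); the function name is hypothetical, and how DHC11 prints a zero mask is not stated here, so the "$00" result is an assumption.

```python
def format_bit_immediate(value):
    """Format an 8-bit immediate as a bit mask or an inverted bit mask."""
    value &= 0xFF
    if value == 0xFF:
        return "$FF"                      # all bits set: plain hex is kept
    set_bits = [b for b in range(7, -1, -1) if value & (1 << b)]
    if not set_bits:
        return "$00"                      # assumption: zero shown as hex

    def join(bits):
        names = [f"bit{b}" for b in bits]
        return names[0] if len(names) == 1 else "(" + "+".join(names) + ")"

    if len(set_bits) > 4:
        # More than 4 bits set: name the clear bits and invert the mask.
        clear_bits = [b for b in range(7, -1, -1) if not value & (1 << b)]
        return "~" + join(clear_bits)
    return join(set_bits)

for mask in (0x01, 0x30, 0xF7, 0xDE):
    print(f"%{mask:08b} -> {format_bit_immediate(mask)}")
```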

Disassembly Control - Label, Entry, Indirect, Vectors

The following commands, all within the control file, tell the disassembler information about the binary image it will process. The more information that can be determined and supplied here, the better the resulting disassembly will be. Optional parameters are enclosed in square [ ] brackets.

Label <addr> <label>
This simply assigns a label to an address. No assumption is made that this address corresponds to data or code. Note that all commands can use optional labels, so this command is generally not required.

Entry <addr> [<entrylabel>]
Tells the disassembler the location of starting points for code. Optionally, a label will be assigned to this entry point. The code seeking algorithm scans memory for these locations, and automatically adds new entry points as branch and call instructions are encountered. When no new entry points are added in a single pass, then the disassembler has completed the code seek phase.
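
The multi-pass code seek described above might look roughly like the following Python sketch. It is not DHC11's implementation: it knows only a handful of 68HC11 opcodes (enough for the example), the names are hypothetical, and stopping a trace at an unknown opcode is an assumption.

```python
# Tiny subset of 68HC11 opcodes: opcode -> (length in bytes, kind)
OPCODES = {
    0x01: (1, "plain"),    # NOP
    0x39: (1, "return"),   # RTS
    0x7E: (3, "jump"),     # JMP extended
    0xBD: (3, "call"),     # JSR extended
    0x20: (2, "rjump"),    # BRA
    0x27: (2, "branch"),   # BEQ
    0x8D: (2, "rcall"),    # BSR
    0x86: (2, "plain"),    # LDAA immediate
}

def seek_code(image, load, entries):
    """Repeat passes until a pass finds no new entry points."""
    end = load + len(image)
    entry_points = set(entries)
    while True:
        code, targets = set(), set()      # everything starts out as data
        for entry in entry_points:
            pc = entry
            while load <= pc < end and pc not in code:
                op = image[pc - load]
                if op not in OPCODES:
                    break                 # assumption: stop at unknown opcode
                length, kind = OPCODES[op]
                if pc + length > end:
                    break
                code.add(pc)
                if kind in ("jump", "call"):
                    targets.add((image[pc + 1 - load] << 8) | image[pc + 2 - load])
                elif kind in ("rjump", "branch", "rcall"):
                    offset = image[pc + 1 - load]
                    if offset >= 0x80:
                        offset -= 0x100   # signed 8-bit displacement
                    targets.add((pc + 2 + offset) & 0xFFFF)
                if kind in ("return", "jump", "rjump"):
                    break                 # control never falls through
                pc += length
        new = {t for t in targets if load <= t < end} - entry_points
        if not new:
            return entry_points, code
        entry_points |= new

# JSR $F006 / BRA $F003 / data byte / LDAA #$AA / BEQ $F00B / NOP / RTS
rom = bytes.fromhex("BDF00620FE5586AA27010139")
entries, code = seek_code(rom, 0xF000, [0xF000])
print(sorted(hex(a) for a in entries))  # ['0xf000', '0xf003', '0xf006', '0xf00b']
```

In this example the byte at $F005 is never reached by any trace, so it stays tagged as data.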

Indirect <addr> [<indirectlabel> [<herelabel>]]
An indirect address is a 16 bit quantity (ie. word, or two bytes) that is used by the processor to form a target (jump or call) address. This is illustrated by the disassembler's output:

  <indirectlabel>:  ldS     #$01FF
                     .
                     .
  <addrlabel>:      dw      <indirectlabel>
The word at memory address <addrlabel> points to an address that is tagged as an entry point, and the optional <indirectlabel> is the label for this address. And finally, <addrlabel> will be tagged as a data word. The ordering of labels was chosen to maintain compatibility with existing disassemblers.

Vectors <addr> <count> [<labelbase>[ <herelabel>]]
The Vectors command describes a list of indirect data words, as would be produced by a jump table, or list of procedure addresses. The number of data words (or vectors) is defined by <count>. The optional <labelbase>, if supplied, is used to create a label, for each indirect address, of the form <labelbase>_NN, where NN starts from 00. The optional <herelabel> labels the address (ie. <addr>) of the word table. Note that the final NN is one less than <count>.

  <labelbase>_00:  ldS     #$01FF
                    .
  <labelbase>_01:  ldX     #$1234
                    .
                    .
  <labelbase>_NN:  xorB    #%00110000
                    .

  <herelabel>:     dw      <labelbase>_00
                   dw      <labelbase>_01
                    .
                   dw      <labelbase>_NN
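
The label numbering can be illustrated with a small Python sketch that reads <count> big-endian words from the table and pairs each target with its <labelbase>_NN label. The function name is hypothetical, and rendering NN as a two-digit decimal number is an assumption; this page does not say whether DHC11 numbers in decimal or hex.

```python
def vector_labels(image, load, addr, count, labelbase):
    """Return (target_address, label) pairs for a table of count words."""
    pairs = []
    for n in range(count):
        offset = addr - load + 2 * n
        # The HC11 stores 16 bit words high byte first.
        target = (image[offset] << 8) | image[offset + 1]
        pairs.append((target, f"{labelbase}_{n:02d}"))  # assumption: decimal NN
    return pairs

# Three vectors at the top of memory: the last label is Vec_02 (count - 1).
table = bytes.fromhex("F000F123F456")
for target, label in vector_labels(table, 0xFFFA, 0xFFFA, 3, "Vec"):
    print(f"{label} = ${target:04X}")
```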

Advanced Disassembly Commands - INDEXed, -# (calculate index addresses)

INDEXed <startaddr> <endaddr>
The INDEXed command specifies a range of addresses for which the disassembler will attempt to use the last known value of the IX and IY registers in indexed address calculations.

-# (calculate index addresses)
Indexed data access calculations will only be made when the -# command line switch is supplied. The purpose of this switch, and the INDEXed command, is to ensure that all data accesses are recorded as well as can be done.

Different Mnemonics

The following mnemonics differ from those specified by Motorola.
  DHC11's Mnemonics           Motorola's     Function Performed
  call                           JSR         Call
  callr                          BSR         Call Relative (short call)
  cmpD, cmpX, cmpY               CP?         Compare (16 bit register)
  decX, decY, decS               DE?         Decrement (16 bit register)
  di                             SEI         Disable Interrupts
  ei                             CLI         Enable Interrupts
  incX, incY, incS               IN?         Increment (16 bit register)
  jr                             BRA         Jump Relative (short jump)
  push, pushB, pushX, pushY      PSH?        Push on to stack
  popA, popB, popX, popY         PUL?        Pop off stack
  ret                            RTS         Return (from subroutine)
  reti                           RTI         Return From Interrupt
  xorA, xorB                     EOR?        eXclusive Or
As you can see, DHC11's mnemonics use, at most, one extra character, but this makes their meaning much clearer, and is closer to a majority of other assembler syntaxes. In addition, the mnemonics are displayed in a mixed case that is designed to highlight the registers used by the instruction. For example, LDA, the Load A instruction is displayed as ldA to emphasise that the A register is used in this ld instruction. The tAB and xgDY are examples of instructions that use two registers in the one mnemonic.
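
If the output has to be fed to an assembler that only accepts Motorola mnemonics, the table above can be turned into a simple lookup. The Python sketch below is a hypothetical helper, not part of DHC11: it lists only the entries spelled out in the table with an explicit register, and assumes every other mnemonic differs from Motorola's only in letter case.

```python
# DHC11 mnemonic -> Motorola mnemonic, taken from the table above.
DHC11_TO_MOTOROLA = {
    "call": "JSR", "callr": "BSR", "jr": "BRA",
    "ret": "RTS", "reti": "RTI", "di": "SEI", "ei": "CLI",
    "cmpD": "CPD", "cmpX": "CPX", "cmpY": "CPY",
    "decX": "DEX", "decY": "DEY", "decS": "DES",
    "incX": "INX", "incY": "INY", "incS": "INS",
    "pushB": "PSHB", "pushX": "PSHX", "pushY": "PSHY",
    "popA": "PULA", "popB": "PULB", "popX": "PULX", "popY": "PULY",
    "xorA": "EORA", "xorB": "EORB",
}

def to_motorola(mnemonic):
    """Translate one DHC11 mnemonic; unlisted ones are just upper-cased."""
    return DHC11_TO_MOTOROLA.get(mnemonic, mnemonic.upper())

for m in ("call", "xorB", "ldaA", "brset"):
    print(m, "->", to_motorola(m))
```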
















Last Updated 29th November 2001 (links)



This document is copyright © 2000, 2001, Tech Edge Pty. Ltd.
Author P. Gargano
