-
Notifications
You must be signed in to change notification settings - Fork 11
/
Copy pathml-help.txt
505 lines (381 loc) · 27.3 KB
/
ml-help.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
CBM-Transfer V1.24 (C)2021 Steve J. Gray
===================================================
Interactive Symbolic Disassembler with Code Tracing
Help File
===================================================
INTRODUCTION
------------
The CBM-Transfer "ASM Viewer" allows you to disassemble 6502-family binary files. This includes ROM code,
or RAM-based programs. It is a symbolic disassembler, which associates meaningful names to memory
locations. It can handle code and non-code segments. Non-code segments can be displayed in various
different formats, and comments can be added to make the resulting output easier to read. Various
Platforms can be selected with important memory location symbols predefined for immediate use. Different
6502 variants are supported as are undefined opcodes. You can easily define your own Platforms or
processor variants and add them to the program. Several code output formats are supported from
interactive-friendly, to final form, and for different target assemblers.
PROCESS OVERVIEW
----------------
To successfully disassemble binary code we need to know:
* What platform it is for?
* What CPU is used?
* Where the binary loads and/or runs?
* What is code and what is non-code (data)?
The first two are the most straight-forward. You probably know what platform it is designed for,
(if not see below) which also tells you what CPU is used. The third one (load/run) may be more
difficult. ROM files may not contain a load address, so you must know where they go. Similarly,
loadable files may load anywhere, or may load at one spot and be relocated to another to run. The last
one (code vs non-code) is going to be the most difficult part. The majority of binary files will
contain both code and non-code. The non-code could be any data in any format and in any location. Code
and non-code could be mixed together, meaning the disassembler could get to a data block and try to
interpret it as code, producing non-sensical instructions and potentially getting out of sync
resulting in an incorrect disassembly.
As part of the disassembly process you will need to go over and over the results, identifying
non-code areas, and defining them so that the disassembler knows when to disassemble code and
when to display data until the entire binary disassembles properly. The Flow Tracer can help here.
STARTING
--------
The disassembler is built into CBM-Transfer's File Viewer. You select a file from the LocalPC or from
within a disk image then click the VIEW button. A window will open with several coloured boxes
along the top. Since CBM files have no set file extensions you must manually choose how to view
the file. The "ASM" button activates the disassembler. If there is an associated "ASM-Proj" file
then the ASM tab will automatically be selected.
The first time you disassemble a file it's almost certain that you will get some code but mixed with
unknown opcodes. This is because most machine code files contain non-code segments such as data tables,
command tables, text strings, vector tables etc. These must be identified before a complete disassembly
is possible.
CBM files can be straight binary or can be "loadable", meaning that the load address is at the start of
the file. It is important that you know which type of file you are disassembling. If the load address is
NOT in the file then you must manually enter it. The load address is vital because the disassembler must
be able to determine if jumps, branches and vectors are within the range of the file, or not, in order to
generate proper labels.
Ok, so we have a file loaded and viewed. If there was an "ASM-Proj" file associated with the viewed file
it will automatically be loaded and the side panel will be made visible. If there is no Project file then
you should click on the ">>" button to open up the side panel. The side panel is where you will be
specifying information to help the disassembler do its thing. There are 9 tabs:
1) Project..... Has settings and buttons to control the overall operation.
2) Tracer...... Flow tracer. Traces code and finds data blocks.
3) Gen Labels... Shows generated labels and will be updated each time the ASM code is refreshed.
4) Ext JSR..... Lists External JSR target addresses, updated on refresh.
5) Entry Pt.... Entry points into the code. Used for the Tracer.
6) Symbols..... Shows Symbolic addresses, names and comments, for known machine locations.
7) Tables...... Shows various data tables.
8) Labels...... Shows user entered labels.
9) Comments.... Shows user entered comments.
The "Gen Labels" and "Ext JSR" tabs show internally generated info which is updated on each refresh.
The Entry Pt, Symbols, Tables, Labels, and Comments tabs show user-inputted entries. For these tabs there
are LOAD and SAVE buttons to load or save the entries from/to a file. There are also ADD, DEL, and FIND
buttons for adding new entries, deleting entries or finding entries in the output list.
Project Tab
-----------
The project tab is the control centre for disassembly. It lets you select options and settings.
Load...................... Loads an "ASM-PROJECT" file containing symbols, tables, labels and comments.
Save...................... Save the project file.
Project Status Box........ Displays status of Project (Green=OK, Red=Changed, White=No Project Loaded).
New....................... Clears all the Lists and starts a new Project.
Clear Lists on Load....... When check, lists are cleared before new data is loaded.
Platform dropdown......... Lets you select pre-configured symbols for a specific machine platform.
CPU....................... Sets which 6502 CPU or variant opcodes are used. There is also an entry for
a normal 6502 CPU "illegal opcodes" set.
View Fmt.................. Set the output format. The first entry is recommended when starting. Other
entries are more suited to final outputs for assemblers.
Target.................... Set the strings used for BYTE/TEXT/WORD special directives.
Label Prefix.............. Sets the prefix string used for generated labels.
Comment Divider length.... Sets the width of a divider comment.
Inline Comment Position... Column position to start inline comments.
Starting Ln# / Increment... Sets the first line and increment when using a format with 'nnn'
Show Equates.............. Adds used symbols to the top of the output.
Show Block Comments....... Block comments can be quite long. Uncheck to hide them.
Add blank line...RTS/RTI... Option to add a blank line after subroutines to make listing more readable.
Add blank line...labels.... Option to add a blank line before labels to make listing more readable.
Include Symbol comments... Option to add comments associated with symbols inline.
Disassembly:
Save.................... Save output to a file.
Copy to Clipboard....... Places output in the clipboard.
Re-Assemble............. Assemble using current view format
Compare................. Check to load Re-ASM binary to HEX viewer
Symbols:
Purge................... Removes symbols not referenced in the code.
Import.................. Imports symbols from comma-delimited or fixed-width text files. See below.
Help...................... Displays this file.
When the ML Disassembler is first selected it will load its configuration info from the file
"ml-config.txt". You may edit this file to add additional platforms, CPUs, or Prefix's. Look at the existing
entries to make sure your new entries are compatible.
There is a Platform called "CBM Identifier". If you are disassembling a CBM binary and you don't know
what Platform it is for, select this one first. This platform contains a collection of callable ROM
routine addresses for multiple platforms, with each entry marked with the platform it is for. These will
be displayed in the "Ext JSR" list. Once you identify the target platform you can select it from the
platform list to get platform-specific symbols.
If you create a new platform and you think it might be useful for others, feel free to email it to me for
inclusion in future CBM-Transfer releases. I will be sure to credit you as well.
The Current Disassembly VIEW FMT can be re-assembled using ACME. Make sure the ACME.EXE file is in the CBM-Transfer
folder. Make sure to select a VIEW FMT that is compatible for ACME (IE: no line numbers or byte values before
the assembly mneumonics. Enable the "Compare" option if you want to load the resulting binary into a SPLIT VIEW
with the HEX Viewer.
Click the Re-Assemble button. The current ASM lising will be saved to the same folder as the source file but with
a ".REASM" extension. The output will have a ".OUT" extension and previous versions will be deleted before ACME runs.
If there are errors in the reassembly a pop-up window will display them. Note any line numbers with errors. Enable
the info line ("v" button) to show the line# of the selected line. Or, click on the box to enter a line# to jump to.
This will let you see where the error is. If there are no errors and you have the Compare option clicked, the
SPLIT VIEW mode, with the HEX viewer on the right side. The .OUT file will be loaded and the comparison report will
be visible.
Tracer Tab
----------
The Flow Tracer is a simple CPU simulator that tries to determine what is code, and what is data, automatically.
As the simulator reads instructions they are marked as "code". Anything not marked is considered "data". For
the tracer to function it is important to add all Entry Points into the EntryPt (EP) list. If an entry point
is missed then the Flow Tracer will not flow and potential code will not be marked, and will be treated as
"data". This process is not perfect but may help in identifying data vs code.
The Flow Tracer starts from the last EP in the list. The EP is deleted, and then code "execution" starts from that
point. If an instruction does not change the flow of execution it is ignored. When an instruction is reached
that changes the flow, each possible path is added to the EP list. When code reaches an instruction that stops
execution in that path, it is done that branch. New entry points are read from the EP list until the list is
empty.
When the list is empty the tracer is done. Any bytes that have not been marked are assumed to be data. The
entire code range is checked and a list of data blocks is built up.
For example, if you were to disassemble the C64 KERNAL ROM you would know that there are 3 fixed entry points
for RESET, IRQ and NMI (FFF7, FFFA, FFFD) at the end. These addresses would be entered into the EP list.
There is also the Kernal Jump table. Each one must be added to the EP list.
Once all your EP's have been added, clicking on START will start the flow tracing. Output from the tracer will
be shown in the ML list window. As new entry points are found they will be added to the EP list window. As
branches are followed they are removed from the EP list until the list is empty, at which point tracing is
complete. The data bytes are then listed. Click the "Add to Tables" button to add each data table block to the
Tables list. Each table block is added as "selected". If "Add Labels" checkbox is enabled there will be a Label
entry added for each data block.
If you find that the data blocks seem very large it's possible that you missed an entry point, or that the code
uses jump tables (additional entry points into code).
To display the disassembly again click the REFRESH button.
Gen Labels Tab
--------------
This tab shows labels that have been automatically generated by the disassembler. Whenever the target
address of a JMP, JSR or Branch instruction falls in the code range a label is generated. A separate
label is generated for each instance so you will likely see multiple identical entries. That will actually
help you see which addresses are used the most, which will help you make permanent named labels.
Generated labels will appear in the ASM listing in the form "{PREFIX}XXXX", where XXXX is the hex value
of the target address. The Prefix can be selected in the Project tab. You can click on the
REMOVE DUPLICATES button, to remove duplicate entries, but remember that next time you refresh they will
be added again.
Clicking the FIND button finds the currently selected entry.
Ext JSR Tab
-----------
This tab lists the target addresses of JSR calls external to the code range. Usually these will be calls
to ROM subroutines (ie: BASIC or KERNAL). This will help you identify platform subroutines used in the
code. If the address is in the symbol table the comment will be shown beside the address. Click on an entry
then the FIND button to find references in the output. You can click on the REMOVE DUPLICATES button, to
remove duplicate entries, but remember that next time you refresh they will be added again.
Clicking the FIND button finds the currently selected entry.
EntryPt, Symbols, Tables, Labels, Comment Tabs - Common Features
----------------------------------------------------------------
These Tabs share common features.
Load.............. Loads a text file containing entries for the selected tab list. The list will be cleared first if
the "Clear Lists on Load" option is checked in the Project tab. If you un-check this option then
new entries will be added to existing entries and will be sorted by address.
Save.............. Saves entries to a text file.
Add............... Adds an entry to the selected tab list. You will be prompted to input the data and shown
the proper data format. All addresses must be in hex.
Del............... Delete the selected entry.
Find.............. Finds the selected entry in the disassembly output.
You can edit an entry in the list by Double-clicking it. See below for details of each tab.
EntryPt Tab
-----------
This tab shows Entry Points. Entry points are addresses that are known to be code that is executed. This could
be the address a SYS command might use to initialize the code, or a known Vector that the CPU has etc. These
are used by the Flow Tracer. Each entry is followed in order to mark addresses as 'code'. In this way the
Tracer can find non-code (Data Tables). EntryPt items are optional.
Symbols Tab
-----------
This tab shows Symbols. Symbols are locations in RAM or ROM, or chip registers etc that are fixed and
have important meaning. For example, in zero page are many operating system storage locations that are
commonly used by programs. ROM contains callable routines that many programs will need to operate.
There are also things like screen ram, video, sound, and IO chips for the target platform. By defining
(or loading) a set of Symbols at the start the disassembler will be able to substitute meaningful names
into the code that will make understanding the code easier.
As the code is disassembled, whenever a Symbol is referenced it will automatically be marked with a check.
This will let you see which Symbols are being used. Only referenced Symbols will be listed when the
"Show Equates" option is enabled.
Symbols have three parts: Hex Address, Symbol Name, and Comment
The Hex Address is required, but you may leave out the name or comment. If you leave out the Name, the 4-digit
Hex Address will be used and the comment will appear on every line that the symbol appears on. If you include
the name but not the comment then obviously no comment will be shown. Leaving out both doesn't make much sense
but is allowed.
Platforms are really just symbol files, as they have the same purpose. After loading a Platform you are free to
add new symbols. They will be mixed in with existing platform symbols. When you save the project, all symbols
are saved to the project file.
Using the LOAD button, you can select symbol files of type SYM, DT, TXT or SY4. SYM and DT extension were used
in earlier versions of CBM-Transfer for SYMbols and DataTables. I recommend you use TXT files for portability.
You can load 'ReGenerator' system files (LABEL.TXT and COMMENT.TXT) into the symbol table. First, clear
all Lists, and uncheck the 'Clear Lists on Load' option. Click the LOAD button and navigate to the
Regenerator system folder you want. Now pick the "labels.txt" file. You will be asked to confirm that it
is a ReGenerator file. Now click LOAD again and load the "comments.txt" file and confirm. The comments
will be merged with the labels. If a comment and label have the same hex address the comment will be added
to the existing label entry.
You can import any comma-delimited or fixed-width text file using the Import button in the Projects TAB.
See below for details.
Tables Tab
----------
This tab shows Tables (or Data Blocks). Tables are ranges within the file that generally contain
non-executing code. These tables can contain almost anything including text strings, mathematical tables,
vectors, sprite data, fonts etc. Finding and identifying the tables in a binary file is key to generating
a complete disassembly. When adding Tables you will need to enter the start and ending address, the table
format/type, and an optional comment in the following format:
SSSS,EEEE,T{num},{COMMENT}
- Where {num} is an optional numeric value that specifies how many entries per line for that table.
Make sure your range is correct and you have the correct block type. For a quicker way, use the top
line D,H,T,R,V,W,Z buttons. See below for table types.
Each entry has a checkbox. Only selected Tables (ones that have a checkmark) are treated as non-code. By
un-checking an entry the bytes in the range will be treated as code. This can help you find code segments
that may not have been identified in the Flow Tracer.
You could also have multiple entries with the same address range but different table types. This would allow
you to determine the best format to view. You could even have different ranges marked as "X" so they are
hidden, letting you concentrate on understanding specific code sections.
Labels Tab
----------
This tab show user-added Labels. Generally you'll want to make permanent named labels for major subroutines
or code sections. Give labels a meaningful name to help you remember the code's function.
Comments Tab
------------
This tab shows Comments. Comments are text added to the disassembly to help document, clarify or divide
the code sections, and/or any individual line. There are 3 types of comments:
Inline.......... Appear after program code.
Standalone...... Are on their own line(s).
Block........... Multi-line comment blocks.
You add comments by using the "Add" button from the Comments tab, or by first selecting a line then
using the top line buttons. Your output format must include addresses!
When adding comments with the "Add" button you need to enter an address containing an opcode byte,
the comment type (See below for CODEs), and optional description. Addresses of Operand bytes will
not display!
The "[ ]" button can be used for block comments. First select the line with the address and use the
"[ ]" button. The ML area will be replaced by the Block Comment editor. You can enter any characters
here and use as many lines as you want. Press ENTER to go to the next line, or up/down etc. Do not
use any ";" characters in your comment as it is used to break lines. To edit a block comment,
double-click it in the comment tab.
You can add one or more divider lines by starting a line with the "/" character followed by the
character you wish to use for the divider. For example:
/=
/-
/*
Always start the line with the "/" and use only a single character after that, with no other characters
on the line. The divider length is set in the project tab.
Click "SAVE" when done. Use "CANCEL" to abort changes. The ML listing will be displayed again.
Remember, the "Show Block Comments" option can be used to hide or show block comments.
Top Line
========
At the top are options for Refreshing the output, quickly specifying blocks, or adding comments.
">>" or "<<"............ Hide or Show the side panel.
Black/White Toggle...... Toggles the output from White on Black to Black on White
Status Box.............. Shows Disassembly errors as yellow, or green when no errors are found. Errors are
illegal opcodes of the 6502. This box will not be relevant if a CPU with completely filled
instruction set is selected.
Refresh Checkbox........ When checked, most actions will cause a refresh. If you plan to make a lot of changes/additions
and your code is large, turn off auto refresh, and then click the Refresh Button when desired.
Refresh Button.......... Refreshes the disassembly.
Find.................... Enter a string to be searched for. Search starts from top of the list.
Lines containing string are hi-lighted. To un-hi-light all lines, manually select any line.
Find All................ Hi-light all lines containing search string.
[> ..................... From TOP DOWN, find line containing string.
> ...................... Find NEXT line containing string.
< ...................... Find PREVIOUS line containing string.
<]...................... From BOTTOM UP, find line containing string.
"v" Button.............. Toggles Info display area. Lets you see a wider display of information in the Tab lists. Click
on an entry and it will be shown here. Also shows the LINE NUMBER and Find string.
/ ...................... Toggles ML listing Split view. Shows two independent views of the ML listing. Note: Only the
Top View can be used for adding labels, comments etc.
Quick Add Buttons
-----------------
Label................... Adds the current line address to the Labels list.
EntryPt................. Adds the current line address to the Entry Points list.
Block Buttons: (first select a block range using SHIFT)
D....................... Decimal - Treats block as decimal numbers.
H....................... Hex - Treats block as hex numbers.
Z....................... Binary - Treats block as 8-bit binary numbers.
T....................... Text String - Display printable text as quoted text strings, or unprintable as
hex values.
R....................... RTS vectors - Treats the block as a list of word vectors to be pushed onto
the stack. Words are assumed to be pointers to addresses in the code and
labels are automatically generated for each entry. Entries are displayed with
a "-1" so that the proper bytes can be pushed to the stack.
V....................... Vectors - Treats the block as vectors to code (as above, without the -1).
W....................... Word - Treats the block as simple hex words.
X....................... Hidden - Hides the block.
Comment Buttons:
BUTTON CODE
[ ] .... [ ............ Adds a block comment.
;C ..... i ............ Adds an inline comment.
C ...... s ............ Adds a standalone comment as a single line.
-C- .... - ............ Adds a standalone comment with "-" divider lines above and below the comment.
=C= .... = ............ Adds a standalone comment with "=" divider lines.
*C* .... * ............ Adds a standalone comment with "*" divider lines.
--- .... - ............ Adds a single "-" divider line.
=== .... = ............ Adds a single "=" divider line.
*** .... * ............ Adds a single "*" divider line.
Pro Tip: You can manually edit the comment CODE and any character that is not "[","i", or "s" will be
used as the divider line character. If you supply a comment then you get a line above and below. If
you do not supply a comment you will get a single line.
And lastly, there is a status message. This will show the code range while you are working. When
disassembling it will display which pass it is working on.
Tips and Tricks
===============
* Start with standard 6502 unless you know there are 'illegal' opcodes in your file.
* Turn on Dual View and have the HEX listing as the second view.
Clicking on a line the ML output window will highlight the matching line in the HEX listing!
* Dual View with BASIC as the second view will show if there is a BASIC loader in front of the ML code.
* Click on the "Gen Labels" Tab. The address with the most duplicate entries is probably important.
* Click on the "Ext JSR" Tab. Can quickly show you routines in BASIC or KERNAL that get called.
* When you are mostly done disassembly you can PURGE the symbols that are not in use. This will also
speed up disassembly.
* Hiding the side panel will let you see long comments or give more room in dual view mode.
Importing Symbols
=================
Having proper symbols is vital when disassembling files. Symbols can help you understand how the code
works, and lets you modify it easily by changing symbol addresses etc. It's handy to be able to bring
in symbols from external sources. Symbols are stored in the program in the following format:
S1,S2,S3
Where: S1 = ADDRESS (4 Hex digits)
S2 = SYMBOL
S3 = COMMENT
You can import from almost any text file containing records with Delimited or Fixed-width fields. Delimited
records contain fields separated by a specific character. Fixed-Width records have fields in specific character
positions, and of set lengths. Lines starting with ";" or ":" will not be processed. You will be asked to enter
the "import control string". The following are supported:
TYPE DESCRIPTION CONTROL STRING WHERE
---- ----------- ------------------- -----
C Comma-delimited C,F1,F2,F3 Fn = Field number (starting at 1) for 'Sn'
T Tab-delimited T,F1,F2,F3 Fn = Field number (starting at 1) for 'Sn'
F Fixed-width F,P1,L1,P2,L2,P3,L3 Pn = Start Position for 'Sn'. Ln = Length for 'Sn'
Examples
--------
Comma-delimited records:
NMI,FFFA,65530,Processor NMI Vector
RESET,FFFC,65532,Processor RESET Vector
IRQ,FFFE,65534,Processor IRQ/BRK Vector
Field#: 1 ,2 ,3 ,4
So, Comma-Delimited is "C",
S1 (ADDRESS) is Field#2,
S2 (SYMBOL) is Field#1,
S3 (COMMENT) is Field#4.
Use control string: C,2,1,4
Fixed-width records:
NMI FFFA 65530 Processor NMI Vector
RESET FFFC 65532 Processor RESET Vector
IRQ FFFE 65534 Processor IRQ/BRK Vector
Pos: 1234567890123456789012345678901...
So, Fixed-width is "F",
S1 (ADDRESS) is at positions 8, with length 4. NOTE: Length should always be 4!
S2 (SYMBOL) is at positions 1, with length 6.
S3 (COMMENT) is at positions 22, with length about 60 (or so).
Use control string: F,8,4,1,6,22,60
When you import it will tell you how many symbols were loaded. Click on the Symbols tab to make sure
that they loaded in as expected. Save the symbol table separately, or as part of the project file.
Feedback
========
I wrote this mostly as a tool for myself. I realize that the average CBM user probably won't make use of it,
but if you try it and have ideas to improve it, please contact me. Of course bug reports and suggestions are
always welcome!
Thank-you's
===========
Thank-you to the following who have provided resources or inspiration:
* Graham of Oxyron.de for nicely organized 6502 opcode info
* DA65 by Ullrich von Bassewitz - I wanted my disassembler to be a kind of GUI version of DA65's info files
* Mike from 6502.org for great 6502 resources and hosting for my project and software webpages.
* Bo from zimmers.net for great CBM resources
* ReGenerator by n0stalgia - Inspired the Platforms feature (you can import its system files too!)
* ACME by Marco Baye - Assembler I use for all my 6502 projects