layout | title | permalink |
---|---|---|
page | QuickAssembler 2.01 Programmer's Guide | /pubs/pc/reference/microsoft/mspl13/masm/qaprog/ |
{% raw %}
Microsoft(R) QuickAssembler Programmer's Guide Version 2.01
════════════════════════════════════════════════════════════════════════════
Information in this document is subject to change without notice and does
not represent a commitment on the part of Microsoft Corporation. The
software described in this document is furnished under a license agreement
or nondisclosure agreement. The software may be used or copied only in
accordance with the terms of the agreement. It is against the law to copy
the software on any medium except as specifically allowed in the license
or nondisclosure agreement. No part of this manual may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying and recording, for any purpose without the express
written permission of Microsoft.
(C)Copyright Microsoft Corporation, 1989. All rights reserved.
Simultaneously published in the U.S. and Canada.
Printed and bound in the United States of America.
Microsoft, MS, MS-DOS, GW-BASIC, QuickC, and XENIX are registered
trademarks of Microsoft Corporation.
IBM is a registered trademark of International Business Machines
Corporation.
Intel is a registered trademark of Intel Corporation.
Document No. LN0114-201-R00-0689
Part No. 06792
────────────────────────────────────────────────────────────────────────────
Table of Contents
Introduction
Chapter 1 The QuickAssembler Interface
1.1 Creating the Program
1.2 Building and Running a Program
1.3 Assembling from the Command Line
1.4 Choosing C or Assembler Defaults
1.5 Using the Quick Advisor (Help)
1.6 Debugging Assembly Code
1.6.1 Debugging .COM Files
1.6.2 Specifying Expressions
1.6.3 Tracing Execution
1.6.4 Modifying Registers and Flags
1.7 Viewing a Listing File
Chapter 2 Introducing 8086 Assembly Language
2.1 Programming the 8086 Family
2.2 Instructions, Directives, and Operands
2.2.1 The Name Field
2.2.2 The Operation Field
2.2.3 The Operand Field
2.2.4 The Comment Field
2.2.5 Entering Numbers in Different Bases
2.2.6 Line-Continuation Character
2.3 8086-Family Instructions
2.3.1 Data-Manipulation Instructions
2.3.1.1 The MOV Instruction
2.3.1.2 The ADD Instruction
2.3.1.3 The SUB Instruction
2.3.1.4 The INC and DEC Instructions
2.3.1.5 The AND Instruction
2.3.1.6 The MUL Instruction
2.3.2 Control-Flow Instructions
2.3.2.1 The JMP Instruction
2.3.2.2 The CMP Instruction
2.3.2.3 The Conditional Jump Instructions
2.4 Declaring Simple Data Objects
2.5 8086-Family Registers
2.5.1 The General-Purpose Registers
2.5.1.1 The AX Register
2.5.1.2 The BX Register
2.5.1.3 The CX Register
2.5.1.4 The DX Register
2.5.2 The Index Registers
2.5.3 The Pointer Registers
2.5.3.1 The BP Register
2.5.3.2 The SP Register
2.5.3.3 The IP Register
2.5.4 The Flags Register
2.6 Addressing Modes
2.6.1 Immediate Operands
2.6.2 Register Operands
2.6.3 Direct Memory Operands
2.6.4 Indirect Memory Operands
2.7 Segmented Addressing and Segment Registers
Chapter 3 Writing Assembly Modules for C Programs
3.1 A Skeleton for Procedure Modules
3.1.1 The .MODEL Directive
3.1.2 The .CODE Directive
3.1.3 The PROC Directive
3.1.4 The ENDP and END Statements
3.2 Instructions Used in This Chapter
3.3 Decimal Conversion Example
3.4 Decimal Conversion with Far Data Pointers
3.4.1 Writing a Model-Independent Procedure
3.4.2 Accessing Far Data through ES
3.5 Hexadecimal Conversion Example
Chapter 4 Writing Stand-Alone Assembly Programs
4.1 A Skeleton for Stand-Alone Programs
4.1.1 The .MODEL Directive
4.1.2 The .STACK, .CODE, and .DATA Directives
4.1.3 The .STARTUP Directive
4.2 Instructions Used in This Chapter
4.3 A Program That Says Hello
4.4 Inside the Stack Segment
4.5 Inside the Data Segment
4.6 Inside the Code Segment
4.7 Making the Program Repeat Itself
4.8 Creating .COM Files
4.9 Creating .COM Files with Full Segment Definitions
Chapter 5 Defining Segment Structure
5.1 Simplified Segment Directives
5.1.1 Understanding Memory Models
5.1.2 Specifying DOS Segment Order
5.1.3 Defining Basic Attributes of the Module
5.1.4 Defining Simplified Segments
5.1.4.1 How to Use Simplified Segments
5.1.4.2 How Simplified Segments Are Implemented
5.1.5 Using Predefined Segment Equates
5.1.6 Simplified Segment Defaults
5.1.7 Default Segment Names
5.2 Full Segment Definitions
5.2.1 Setting the Segment-Order Method
5.2.2 Defining Full Segments
5.2.2.1 Controlling Alignment with Align Type
5.2.2.2 Defining Segment Combinations with Combine Type
5.2.2.3 Controlling Segment Structure with Class Type
5.3 Defining Segment Groups
5.4 Associating Segments with Registers
5.5 Initializing Segment Registers
5.5.1 Initializing the CS and IP Registers
5.5.2 Initializing the DS Register
5.5.3 Initializing the SS and SP Registers
5.5.4 Initializing the ES Register
5.6 Nesting Segments
Chapter 6 Defining Constants, Labels, and Variables
6.1 Constants
6.1.1 Integer Constants
6.1.1.1 Specifying Integers with Radix Specifiers
6.1.1.2 Setting the Default Radix
6.1.2 Packed Binary Coded Decimal Constants
6.1.3 Real-Number Constants
6.1.4 String Constants
6.1.5 Determining Floating-Point Format
6.2 Assigning Names to Symbols
6.3 Using Type Specifiers
6.4 Defining Code Labels
6.4.1 Near-Code Labels
6.4.2 Anonymous Labels
6.4.3 Procedure Labels
6.4.4 Code Labels Defined with the LABEL Directive
6.5 Defining and Initializing Data
6.5.1 Variables
6.5.1.1 Integer Variables
6.5.1.2 Binary Coded Decimal Variables
6.5.1.3 String Variables
6.5.1.4 Real-Number Variables
6.5.2 Arrays and Buffers
6.5.3 Labeling Variables
6.5.4 Pointer Variables
6.6 Setting the Location Counter
6.7 Aligning Data
Chapter 7 Using Structures and Records
7.1 Structures
7.1.1 Declaring Structure Types
7.1.2 Defining Structure Variables
7.1.3 Using Structure Operands
7.2 Records
7.2.1 Declaring Record Types
7.2.2 Defining Record Variables
7.2.3 Using Record Operands and Record Variables
7.2.4 Record Operators
7.2.4.1 The MASK Operator
7.2.4.2 The WIDTH Operator
7.2.5 Using Record-Field Operands
Chapter 8 Creating Programs from Multiple Modules
8.1 Declaring Symbols Public
8.2 Declaring Symbols External
8.3 Using Multiple Modules
8.4 Declaring Symbols Communal
8.5 Specifying Library Files
Chapter 9 Using Operands and Expressions
9.1 Using Operands with Directives
9.2 Using Operators
9.2.1 Calculation Operators
9.2.1.1 Arithmetic Operators
9.2.1.2 Structure-Field-Name Operator
9.2.1.3 Index Operator
9.2.1.4 Shift Operators
9.2.1.5 Bitwise Logical Operators
9.2.2 Relational Operators
9.2.3 Segment-Override Operator
9.2.4 Type Operators
9.2.4.1 PTR Operator
9.2.4.2 SHORT Operator
9.2.4.3 THIS Operator
9.2.4.4 HIGH and LOW Operators
9.2.4.5 SEG Operator
9.2.4.6 OFFSET Operator
9.2.4.7 .TYPE Operator
9.2.4.8 TYPE Operator
9.2.4.9 LENGTH Operator
9.2.4.10 SIZE Operator
9.2.5 Operator Precedence
9.3 Using the Location Counter
9.4 Using Forward References
9.4.1 Forward References to Labels
9.4.2 Forward References to Variables
9.5 Strong Typing for Memory Operands
Chapter 10 Assembling Conditionally
10.1 Using Conditional-Assembly Directives
10.1.1 Testing Expressions with IF and IFE Directives
10.1.2 Testing the Pass with IF1 and IF2 Directives
10.1.3 Testing Symbol Definition with IFDEF and IFNDEF Directives
10.1.4 Verifying Macro Parameters with IFB and IFNB Directives
10.1.5 Comparing Macro Arguments with IFIDN and IFDIF Directives
10.1.6 ELSEIF Directives
10.2 Using Conditional-Error Directives
10.2.1 Generating Unconditional Errors with .ERR, .ERR1, and .
Directives
10.2.2 Testing Expressions with .ERRE or .ERRNZ Directives
10.2.3 Verifying Symbol Definition with .ERRDEF and .ERRNDEF
Directives
10.2.4 Testing for Macro Parameters with .ERRB and .ERRNB
Directives
10.2.5 Comparing Macro Arguments with .ERRIDN and .ERRDIF
Directives
Chapter 11 Using Equates, Macros, and Repeat Blocks
11.1 Using Equates
11.1.1 Redefinable Numeric Equates
11.1.2 Nonredefinable Numeric Equates
11.1.3 String Equates
11.1.4 Predefined Equates
11.2 Using Macros
11.2.1 Defining Macros
11.2.2 Calling Macros
11.2.3 Using Local Symbols
11.2.4 Exiting from a Macro
11.3 Text-Macro String Directives
11.3.1 The SUBSTR Directive
11.3.2 The CATSTR Directive
11.3.3 The SIZESTR Directive
11.3.4 The INSTR Directive
11.3.5 Using String Directives Inside Macros
11.4 Defining Repeat Blocks
11.4.1 The REPT Directive
11.4.2 The IRP Directive
11.4.3 The IRPC Directive
11.5 Using Macro Operators
11.5.1 Substitute Operator
11.5.2 Literal-Text Operator
11.5.3 Literal-Character Operator
11.5.4 Expression Operator
11.5.5 Macro Comments
11.6 Using Recursive, Nested, and Redefined Macros
11.6.1 Using Recursion
11.6.2 Nesting Macro Definitions
11.6.3 Nesting Macro Calls
11.6.4 Redefining Macros
11.6.5 Avoiding Inadvertent Substitutions
11.7 Managing Macros and Equates
11.7.1 Using Include Files
11.7.2 Purging Macros from Memory
Chapter 12 Controlling Assembly Output
12.1 Sending Messages to the Standard Output Device
12.2 Controlling Page Format in Listings
12.2.1 Setting the Listing Title
12.2.2 Setting the Listing Subtitle
12.2.3 Controlling Page Breaks
12.2.4 Naming the Module
12.3 Controlling the Contents of Listings
12.3.1 Suppressing and Restoring Listing Output
12.3.2 Controlling Listing of Conditional Blocks
12.3.3 Controlling Listing of Macros
Chapter 13 Loading, Storing, and Moving Data
13.1 Transferring Data
13.1.1 Copying Data
13.1.2 Exchanging Data
13.1.3 Looking Up Data
13.1.4 Transferring Flags
13.2 Converting between Data Sizes
13.2.1 Extending Signed Values
13.2.2 Extending Unsigned Values
13.3 Loading Pointers
13.3.1 Loading Near Pointers
13.3.2 Loading Far Pointers
13.4 Transferring Data to and from the Stack
13.4.1 Pushing and Popping
13.4.2 Using the Stack
13.4.3 Saving Flags on the Stack
13.4.4 Saving All Registers on the Stack
13.5 Transferring Data to and from Ports
Chapter 14 Doing Arithmetic and Bit Manipulations
14.1 Adding
14.1.1 Adding Values Directly
14.1.2 Adding Values in Multiple Registers
14.2 Subtracting
14.2.1 Subtracting Values Directly
14.2.2 Subtracting with Values in Multiple Registers
14.3 Multiplying
14.4 Dividing
14.5 Calculating with Binary Coded Decimals
14.5.1 Unpacked BCD Numbers
14.5.2 Packed BCD Numbers
14.6 Doing Logical Bit Manipulations
14.6.1 AND Operations
14.6.2 OR Operations
14.6.3 XOR Operations
14.6.4 NOT Operations
14.7 Shifting and Rotating Bits
14.7.1 Multiplying and Dividing by Constants
14.7.2 Moving Bits to the Least-Significant Position
14.7.3 Adjusting Masks
14.7.4 Shifting Multiword Values
Chapter 15 Controlling Program Flow
15.1 Jumping
15.1.1 Jumping Unconditionally
15.1.2 Jumping Conditionally
15.1.2.1 Comparing and Jumping
15.1.2.2 Jumping Based on Flag Status
15.1.2.3 Testing Bits and Jumping
15.2 Looping
15.3 Using Procedures
15.3.1 Calling Procedures
15.3.2 Defining Procedures
15.3.3 Passing Arguments on the Stack
15.3.4 Declaring Parameters with the PROC Directive
15.3.5 Using Local Variables
15.3.6 Creating Locals Automatically
15.3.7 Variable Scope
15.3.8 Setting Up Stack Frames
15.4 Using Interrupts
15.4.1 Calling Interrupts
15.4.2 Defining and Redefining Interrupt Routines
15.5 Checking Memory Ranges
Chapter 16 Processing Strings
16.1 Setting Up String Operations
16.2 Moving Strings
16.3 Searching Strings
16.4 Comparing Strings
16.5 Filling Strings
16.6 Loading Values from Strings
16.7 Transferring Strings to and from Ports
Chapter 17 Calculating with a Math Coprocessor
17.1 Coprocessor Architecture
17.1.1 Coprocessor Data Registers
17.1.2 Coprocessor Control Registers
17.2 Emulation
17.3 Using Coprocessor Instructions
17.3.1 Using Implied Operands in the Classical-Stack Form
17.3.2 Using Memory Operands
17.3.3 Specifying Operands in the Register Form
17.3.4 Specifying Operands in the Register-Pop Form
17.4 Coordinating Memory Access
17.5 Transferring Data
17.5.1 Transferring Data to and from Registers
17.5.1.1 Real Transfers
17.5.1.2 Integer Transfers
17.5.1.3 Packed BCD Transfers
17.5.2 Loading Constants
17.5.3 Transferring Control Data
17.6 Doing Arithmetic Calculations
17.7 Controlling Program Flow
17.7.1 Comparing Operands to Control Program Flow
17.7.1.1 Compare
17.7.1.2 Compare and Pop
17.7.2 Testing Control Flags after Other Instructions
17.8 Using Transcendental Instructions
17.9 Controlling the Coprocessor
Chapter 18 Controlling the Processor
18.1 Controlling Timing and Alignment
18.2 Controlling the Processor
18.3 Processor Directives
Appendix A Mixed-Language Mechanics
A.1 Writing the Assembly Procedure
A.1.1 Setting Up the Procedure
A.1.2 Entering the Procedure
A.1.3 Allocating Local Data (Optional)
A.1.4 Preserving Register Values
A.1.5 Accessing Parameters
A.1.6 Returning a Value (Optional)
A.1.7 Exiting the Procedure
A.2 Calls from Modules Using C Conventions
A.3 Calls from Non-C Modules
A.4 Calling High-Level Languages from Assembly Language
A.5 Using Full Segment Definitions
Appendix B Using Assembler Options with QCL
B.1 Specifying the Segment-Order Method
B.2 Checking Code for Tiny Model
B.3 Selecting Case Sensitivity
B.4 Defining Assembler Symbols
B.5 Displaying Error Lines on the Screen
B.6 Creating Code for a Floating-Point Emulator
B.7 Creating Listing Files
B.8 Enabling One-Pass Assembly
B.9 Listing All Lines of Macro Expansions
B.10 Creating a Pass 1 Listing
B.11 Specifying an Editor-Oriented Listing
B.12 Suppressing Tables in the Listing File
B.13 Adding a Line-Number Index to the Listing
B.14 Listing False Conditionals
B.15 Controlling Display of Assembly Statistics
B.16 Setting the Warning Level
Appendix C Reading Assembly Listings
C.1 Reading Code in a Listing
C.2 Reading a Macro Table
C.3 Reading a Structure and Record Table
C.4 Reading a Segment and Group Table
C.5 Reading a Symbol Table
C.6 Reading Assembly Statistics
C.7 Reading a Pass 1 Listing
Index
────────────────────────────────────────────────────────────────────────────
Introduction
If you're a C programmer who has been wanting to try out the full power of
assembly language, this is the product for you.
Microsoft(R) QuickC(R) with QuickAssembler is a package you install along
with Microsoft QuickC Version 2.0 in order to create a single powerful
environment in which you can develop C, assembly, and mixed-language
programs. What's more, QuickAssembler is an integrated environment,
containing tools for editing, assembling, compiling, and linking.
Integrated tools help you achieve faster development of assembly-language
programs.
Each MS-DOS(R) and IBM(R) PC-DOS computer is driven by one of the
processors in the 8086 family. A processor is the central motor of a
computer. It responds to its own numeric language, called "machine code."
Assembly language is very close to machine code, but it lets you use
meaningful keywords and variable names instead of difficult-to-remember
numeric codes. As a result, assembly language is convenient to use, yet it
gives you the greatest possible control over hardware and code optimization.
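To make the relationship between machine code and assembly language
concrete, the sketch below lists the small beep-and-exit program entered in
Section 1.1, with the bytes each instruction assembles to shown in the
comments. The encodings are the standard 8086 forms for these instructions.
The .STARTUP directive generates additional start-up code whose bytes are
not shown here.

```asm
        .MODEL  small
        .STACK
        .CODE
        .STARTUP                ; start-up code generated here (bytes not shown)
        mov     ah,2            ; B4 02     load 2 into AH
        mov     dl,7            ; B2 07     load 7 into DL
        int     21h             ; CD 21     call DOS
        mov     ax,4c00h        ; B8 00 4C  load 4C00h into AX (low byte first)
        int     21h             ; CD 21     call DOS
        END
```

Each mnemonic and operand pair stands for a short, fixed sequence of bytes.
The assembler's job is to translate one form into the other.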
To support the low-level operations of assembly language, QuickAssembler
expands the general power of the QuickC environment. Increased debugging
capabilities let you change flag settings and modify registers──including
registers of the 8087 math coprocessor. Furthermore, the Quick Advisor
(the on-line Help system) is expanded to provide help on QuickAssembler
keywords as well as DOS and ROM-BIOS services.
A Note about Operating-System Terms
Microsoft documentation uses the term "OS/2" to refer to the OS/2 system──
Microsoft Operating System/2 (MS(R) OS/2) and IBM OS/2. Similarly, the
term "DOS" refers to both the MS-DOS and IBM Personal Computer DOS
operating systems. The name of a specific operating system is used when it
is necessary to note features unique to that system.
General Features
QuickAssembler does not replace the QuickC in-line assembler, which you
can continue to use inside .C files. The joint QuickC/QuickAssembler
environment puts both QuickAssembler and the in-line assembler at your
disposal. But Microsoft QuickAssembler supports a number of features
beyond those supported by the in-line assembler:
■ You can write stand-alone assembly programs. These programs begin and
end with assembly code and do not include the C start-up code. Unlike
programs written from within C modules, useful stand-alone assembly
programs can be 1K (kilobyte) or even smaller.
■ You can use the assembler's rich set of macro-definition capabilities,
which go far beyond the macro capabilities supported by C. An
assembly-language macro can handle variable parameter lists, recursion,
and repeated operations. These macros are roughly as powerful and
flexible as procedure calls, but execute faster.
■ Your assembly modules can be shared by many different programs. Since
an assembly-language module is in its own file, you can write the
module once and link it to any program you want.
■ QuickAssembler is a full implementation of 8086 assembly language. You
can use the full set of the Microsoft Macro Assembler 5.1 directives
and operators.
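As an illustration of the macro capabilities mentioned in the list above,
the following sketch defines a macro that accepts a list of any length and
repeats one instruction for each item. The macro name pushregs and the
dummy name reg are hypothetical choices made for this example. Chapter 11
describes the macro directives themselves.

```asm
; pushregs is a hypothetical macro name chosen for this example.
pushregs MACRO  reglist
        IRP     reg,<reglist>   ; repeat once for each item in the list
        push    reg
        ENDM                    ; ends the IRP block
        ENDM                    ; ends the macro definition

        .MODEL  small
        .STACK
        .CODE
        .STARTUP
        pushregs <ax,bx,cx>     ; expands to push ax / push bx / push cx
        pop     cx              ; restore in reverse order
        pop     bx
        pop     ax
        mov     ax,4c00h        ; exit to DOS with return code 0
        int     21h
        END
```

Because the macro is expanded in-line when the module is assembled, the
three PUSH instructions appear directly in the code, with no call or return
at run time.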
In addition, QuickAssembler provides the best set of keywords yet
available for simplifying tedious programming tasks, such as initializing
registers at the beginning of a program or determining how to access
parameters on the stack. (Part 1 of this manual focuses on the use of
these keywords.)
QuickAssembler for QuickC is a DOS-based product, and it does not include
the following extensions to 8086 assembly language:
■ 80386 extended registers and special instructions
■ 80387 extended instructions
■ OS/2 protected-mode operation
QuickAssembler does support the 80286 extended instruction set, as well as
the 8087 and 80287 coprocessors. The 80386 processor can run all
QuickAssembler programs; the only limitation is that QuickAssembler does
not support extended capabilities of the 80386.
The Microsoft Macro Assembler supports 80386 extended features and
development of protected-mode applications.
System Requirements
In addition to a computer with one of the 8086-family processors, you must
have Version 2.1 or later of the MS-DOS or IBM PC-DOS operating system.
You can also run QuickAssembler in the 3.x compatibility box of OS/2
systems. Your computer system must have approximately 512K of memory. A
hard-disk setup is strongly recommended.
To enable the use of QuickAssembler, you should first choose Full Menus
from the Options menu.
──────────────────────────────────────────────────────────────────────────
NOTE The 8086 family is a set of processors that all support the same
basic instruction set. This family includes the 8088, 8086, 80188, 80186,
80286, and 80386 chips. All of these processors support the entire
instruction set of the 8086 itself; some support additional instructions.
Rather than list the entire set of chips, this manual often discusses the
core instruction set by referring only to the 8086.
──────────────────────────────────────────────────────────────────────────
Installing QuickAssembler
If you purchased QuickC and QuickAssembler together, the installation
procedure described in Up and Running automatically installs both
products. A few of the questions shown in that booklet are reworded in the
install program to make more sense for the joint QuickC/QuickAssembler
installation.
If you purchased QuickAssembler separately, run the installation program
on the QuickAssembler distribution disks. The first screen asks you the
following questions:
Source of assembler files [A:]:
Installing on a hard disk drive [Y]:
Copy QuickAssembler documentation files [Y]:
Copy sample Assembler programs [N]:
Do you want to change any of the above options? [N]
As with the QuickC installation program, the default responses are
indicated in brackets ([]). Each of these questions is accompanied by an
explanation at the bottom of the screen. To accept a default response,
press ENTER. If you enter an incorrect response, just answer no (N) to the
last question.
The second screen asks you the following questions:
Directory for QuickC executable files [C:\QC2\BIN]:
Directory for Sample files [C:\QC2\SAMPLES]:
Do you want to change any of the above options? [N]
The QuickAssembler installation program replaces some of the existing
QuickC files. QuickAssembler must be installed in the directory that
currently contains QC.EXE. Make sure you enter the location of your
current QuickC executable files. If you're not sure, press CTRL+C to stop
the installation and examine your setup.
Getting Information about Assembly Language
The combined paper and on-line documentation supplied with QuickAssembler
gives you a complete reference to the language. This manual provides three
basic kinds of information:
■ Part 1, "Introducing QuickAssembler," provides a basic introduction to
programming in assembly language. Chapter 1 describes how the
interface changes when you install QuickAssembler. Chapter 2 gives a
general background to 8086 architecture and assembly-language concepts.
Chapters 3 and 4 demonstrate how to use special QuickAssembler
keywords to simplify programming. Even if you have used assembly
language before, you should take a look at these chapters.
■ Parts 2 ("Using Directives") and 3 ("Using Instructions") give a
reference to the use of directives and instructions. This material is
much less tutorial than Part 1, but it does illustrate the use of each
directive and instruction in context.
■ The appendixes explain low-level mixed-language techniques, the use of
assembly options with the QCL driver, and how to read listing files.
This manual does not teach systems programming or advanced programming
techniques. Even with the tutorial material provided in this manual, you
may want to purchase other books on assembly language, such as the ones
listed in the next section.
In addition, this manual assumes you understand certain basic concepts of
programming, such as modules, variables, and pointers. If you need more
background in one of these topics, you should first read the appropriate
sections in C For Yourself. Part 1 of this manual often explains concepts
by comparing a language feature to C.
The Quick Advisor (the on-line Help system) is an integral part of the
overall documentation. As explained in Section 1.5, "Using the Quick
Advisor (Help)," QuickAssembler provides help on all keywords──in
particular, you get instant reference information on each instruction,
including timing, encoding, and flag settings. The Help Contents and Index
screens also provide information on each DOS service.
Books on Assembly Language
The following books may be useful in learning to program in assembly
language:
Duncan, Ray. Advanced MS-DOS. Redmond, WA: Microsoft Corporation, 1986.
An intermediate book on writing C and assembly-language programs that
interact with MS-DOS (includes DOS and BIOS function descriptions)
Jourdain, Robert. Programmer's Problem Solver for the IBM PC, XT and AT.
New York: Brady Communications Company, Inc., 1986.
Reference of routines and techniques for interacting with hardware
devices through DOS, BIOS, and ports (high-level routines in BASIC and
low- or medium-level routines in assembler)
Lafore, Robert. Assembly Language Primer for the IBM PC & XT. New York:
Plume/Waite, 1984.
An introduction to assembly language, including some information on DOS
function calls and IBM-type BIOS
Metcalf, Christopher D., and Sugiyama, Marc B. COMPUTE!'s Beginner's Guide
to Machine Language on the IBM PC & PCjr. Greensboro, NC: COMPUTE!
Publications, Inc., 1985.
Beginning discussion of assembly language, including information on the
instruction set and MS-DOS function calls
Microsoft MS-DOS Programmer's Reference. Redmond, WA: Microsoft Press,
1986, 1987.
Reference manual for MS-DOS
Morgan, Christopher, and the Waite Group. Bluebook of Assembly Routines
for the IBM PC. New York: New American Library, 1984.
Sample assembly routines that can be integrated into assembly or
high-level-language programs
Norton, Peter. The Peter Norton Programmer's Guide to the IBM PC. Redmond,
WA: Microsoft Press, 1985.
Information on using IBM-type BIOS and MS-DOS function calls
Scanlon, Leo J. IBM PC Assembly Language: A Guide for Programmers. Bowie,
MD: Robert J. Brady Co., 1983.
An introduction to assembly language, including information on DOS
function calls
Schneider, Al. Fundamentals of IBM PC Assembly Language. Blue Ridge
Summit, PA: Tab Books Inc., 1984.
An introduction to assembly language, including information on DOS
function calls
These books are listed for your convenience only. Microsoft Corporation
does not endorse these books (with the exception of those published by
Microsoft) or recommend them over others on the same subjects.
Document Conventions
The following document conventions are used throughout this manual:
Example of            Description
Convention
──────────────────────────────────────────────────────────────────────────
SAMPLE2.ASM           Uppercase letters indicate file names, segment names,
                      registers, and terms used at the DOS-command level.
.MODEL                Boldface type indicates assembly-language directives,
                      instructions, type specifiers, and predefined equates,
                      as well as keywords in other programming languages.
placeholders          Italic letters indicate placeholders for information
                      you must supply, such as a file name. Italics are also
                      occasionally used for emphasis in the text.
target                This font is used to indicate example programs, user
                      input, and screen output.
SHIFT                 Names of keys on the keyboard appear in small capital
                      letters. Notice that a plus (+) indicates a
                      combination of keys. For example, CTRL+E means to hold
                      down the CTRL key while pressing the E key.
[[argument ]]         Items inside double square brackets are optional.
{register | memory}   Braces and a vertical bar indicate a choice between
                      two or more items. You must choose one of the items
                      unless double square brackets surround the braces.
Repeating             Three dots following an item indicate that more items
elements...           having the same form may appear.
Program               A column of three dots tells you that part of a
  .                   program has been intentionally omitted.
  .
  .
Fragment
"processor flag"      The first time a new term is defined, it is enclosed
                      in quotation marks.
Color Graphics        The first time an acronym is used, it is spelled out.
Adapter (CGA)
Getting Assistance or Reporting Problems
If you need help or feel you have discovered a problem in the software,
please provide the following information to help us locate the problem:
■ The version of DOS you are running (use the DOS VER command)
■ Your system configuration (the type of machine you are using, its total
memory, and its total free memory at assembler execution time, as well
as any other information you think might be useful)
■ The assembly command line used (or the link command line if the problem
occurred during linking)
■ Any object files or libraries you linked with if the problem occurred
at link time
If your program is very large, please try to reduce its size to the
smallest possible program that still produces the problem.
Use the Product Assistance Request form at the back of this manual to send
this information to Microsoft.
If you have comments or suggestions regarding any of the manuals
accompanying this product, please indicate them on the Document Feedback
card at the back of this manual.
If you are not already a registered QuickAssembler owner, you should fill
out and return the Registration Card. This enables Microsoft to keep you
informed of updates and other information about the assembler.
────────────────────────────────────────────────────────────────────────────
PART 1: Using Assembler Programs
Part 1 of the Programmer's Guide (comprising Chapters 1-4) will help
you start using assembly language quickly.
Chapter 1 summarizes all the differences between the standard QuickC
interface and the expanded QuickC/QuickAssembler interface. Read this
chapter to learn how to enter, assemble, and run an assembly-language
program.
Read Chapter 2 if you are new to 8086 assembly language or need to review
basic concepts. Chapter 2 explains the architecture of 8086-family
processors, as well as how to write simple code and data statements.
Whether or not you're new to assembly language, you'll want to read
Chapters 3 and 4, which show the use of QuickAssembler's simplified
keywords in useful examples. These keywords make programming easier.
────────────────────────────────────────────────────────────────────────────
Chapter 1: The QuickAssembler Interface
After you install Microsoft QuickC with QuickAssembler, you'll have a
single environment for both compiling and assembling. You can create C
programs, assembly-language programs, and programs that combine both
languages.
The environment completely supports the standard QuickC features,
including all editing commands as well as mouse, keyboard, and menu
techniques. This manual assumes you have read QuickC Up and Running and
have used the on-line Help system to learn how to use each menu. Refer to
these sources of information for basic help on using the interface.
The combined QuickC/QuickAssembler interface provides some new menu
selections and dialog boxes to support development of assembly-language
programs. This chapter describes the new features, focusing on areas where
the interface adds new functionality: creating a program, building a
program, getting help, debugging, and viewing a listing file. To enable
all the features described in this chapter, you should first choose Full
Menus from the Options menu if you are not already using full menus.
1.1 Creating the Program
Start the environment with the QC command, regardless of whether you're
creating a C or assembly-language source file. You can type QC by itself
or QC followed by the name of a file.
By default, QC assumes that a file name on the command line has a .C
extension. You'll learn how to change this behavior later (by choosing
Display from the Options menu), but for now, make sure you include the
.ASM file extension when you want to create an assembly-language module:
QC SAMPLE.ASM
If the file is new, the QuickC/QuickAssembler environment asks you if you
would like to create the file.
Once inside the QuickC/QuickAssembler environment, you can enter a program
by using all the QuickC editing commands. You can get started by entering
the following stand-alone assembly program. By default, QuickAssembler is
not case sensitive (except for external symbols), so you can enter
statements as uppercase or lowercase.
        .MODEL  small
        .STACK
        .CODE
        .STARTUP
        mov     ah,2
        mov     dl,7
        int     21h
        mov     ax,4c00h
        int     21h
        END
Enter the program above in a file with a .ASM extension. No other modules
and no special assembly or link flags are required. When run, the program
beeps and exits.
For now, you may just want to run the program to see how the
QuickC/QuickAssembler environment works. However, you can read the rest
of this section to get a brief explanation of why the program works.
The first four statements are directives──nonexecutable statements that give
basic structure to the program by declaring a memory model, stack segment,
and code segment.
The next five statements perform the actions of the program. The first
three call a DOS function that prints the beep character: two MOV
instructions select the function and the character, and the INT
instruction makes the call. (The Quick Advisor, which you access through
the Help menu, provides information on each DOS function.) These three
statements are shown below, with comments:
        mov     ah,2            ; Move 2 to AH (select Print function)
        mov     dl,7            ; Move 7 to DL (select Beep character)
        int     21h             ; Call DOS function
The next two statements, shown below with comments, call DOS to exit
gracefully. Unlike C programs, assembly-language programs must make an
explicit function call to exit; otherwise, the processor goes on to execute
meaningless instructions beyond the end of the program.
        mov     ax,4c00h        ; Move 4c00h to AX (select Exit
                                ; function and 0 return code)
        int     21h             ; Call DOS function
The last statement ends the module.
1.2 Building and Running a Program
Once inside the QuickC/QuickAssembler environment, you build an
assembly-language program the same way you build a C program. Choose the
Go command from the Run menu, or press the F5 key.
The environment assembles and links the program if it needs to be built.
Then, if there are no errors, it executes the program. You can also
assemble a program by using the Make menu. The Compile File command
assembles your file rather than compiling it, assuming the current file
has a .ASM extension.
To help you create assembly-language programs, the QuickC/QuickAssembler
interface adds the following extensions to QuickC:
■ A program list can now have .ASM files as well as .C, .OBJ, and .LIB
files if you work with multiple modules.
■ The Make dialog box from the Options menu has a new option button:
Assembler Flags.
■ The Assembler Flags dialog box lets you control how .ASM files are
assembled.
If your program has multiple modules, you can add .ASM files to the
program list as well as other kinds of files. When you build the program,
the environment compiles each .C file module that needs to be built and
assembles each .ASM module that needs to be built.
For example, the program list in Figure 1.1 creates a mixed-language
program with both C and assembly-language source files.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
The environment sets the default file extension by looking at the
extension of the last file loaded. If the last file loaded had a .ASM
extension, the File List field now displays all the .ASM files for the
current directory. If the last file loaded had a .C extension, the File
List field displays all .C files.
You can alter this behavior by choosing Display from the Options menu, as
explained in Section 1.4, "Choosing C or Assembler Defaults." In any
case, you can always control which files are displayed by entering a
wildcard expression, such as *.asm, in the File Name field.
The environment lets you set assembler options as well as compiler
options. When you open the Options menu and choose Make, the dialog box
shown in Figure 1.2 appears.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
This dialog box contains one new field: Assembler Flags. When you choose
this field, a new dialog box, shown in Figure 1.3, appears.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
By setting flags in the Assembler Flags dialog box, you control the action
of the assembler whenever it builds a program. These settings have no
effect on .C modules, but do affect how each .ASM module is assembled.
This dialog box contains a Debug Flags section, which has options that
apply only to Debug builds, and a Global Flags section, which has options
that apply to every build. Choose the Help button for an explanation of
each option.
──────────────────────────────────────────────────────────────────────────
NOTE You control the type of build operation (Debug or Release) by
choosing the appropriate option button in the dialog box shown in Figure
1.2. You can return to that dialog box by choosing the OK or Cancel
command button. By choosing Debug (the default), you can use all of the
QuickC debugging commands while running the program. By choosing Release,
you produce a program that cannot be debugged but is somewhat smaller.
──────────────────────────────────────────────────────────────────────────
The Custom Flags section lets you enter additional options. In the three
Custom Flags text boxes, you can type any of the assembly options accepted
by the QCL driver. See Appendix B for a description of these options. The
next section describes how to use the QCL driver to assemble programs.
1.3 Assembling from the Command Line
You can run QuickAssembler from the command line, just as you can run
QuickC. One utility, QCL, invokes both the assembler and compiler. You can
even use it to compile, assemble, and link mixed-language programs in one
step. However, make sure you use the version of QCL copied during
QuickAssembler installation.
If you type a file name that has a .C extension, QCL invokes the C
compiler. For example, the following command compiles and links the file
SAMPLE1.C:
QCL SAMPLE1.C
If you type a file name that has a .ASM extension, QCL invokes the
QuickAssembler. For example, the following command assembles and links the
file SAMPLE2.ASM:
QCL SAMPLE2.ASM
In any case, QCL links all resulting object files to create a .EXE file,
unless you specify /c on the command line. (You can also create a .COM
file if the program is written entirely in assembly language.) For
example, the following command compiles SAMPLE1.C and assembles
SAMPLE2.ASM, but does not link the resulting object files:
QCL /c SAMPLE1.C /Cl SAMPLE2.ASM
As always, you can specify .LIB files and .OBJ files on the QCL command
line. A file with no extension is assumed to have a .OBJ extension by
default. For example, the following QCL command compiles M1.C, assembles
M2.ASM (with lowercase symbols preserved), and links M1.OBJ, M2.OBJ, and
M3.OBJ. Finally, QCL searches M4.LIB for any unresolved references.
QCL /Cx M1.C M2.ASM M3 M4.LIB
You can specify a number of QuickAssembler options, in addition to the
ones provided specifically for C. See Appendix B, "Using Assembler
Options with QCL," for a description of all these options.
1.4 Choosing C or Assembler Defaults
At all times, you can use the QuickC/QuickAssembler environment to create
either C modules or assembly-language modules. However, there are some
details of operation that make it a little easier to work with one
language or the other.
For example, one consideration is whether the dialog box starts by
displaying all the C files in the directory (*.c) or all the
assembly-language files (*.asm) when you choose the Open command from the
File menu. You can control this behavior by choosing Display from the
Options menu. Figure 1.4 shows the dialog box that appears.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.4 of the manual │
└────────────────────────────────────────────────────────────────────────┘
In the Language section of this dialog box, select either C, Assembler, or
Auto. The Auto selection uses C or Assembler defaults, depending on what
file was last loaded into the active window. For example, if you load the
PROG.ASM file into the source window, all the defaults (described below)
change to assembly-language settings.
──────────────────────────────────────────────────────────────────────────
NOTE When you first use QuickAssembler, the environment starts up in Auto
mode. Thereafter, it looks at the settings in QC.INI to determine what
mode to start in; this feature has the effect of saving display-mode
settings between sessions.
──────────────────────────────────────────────────────────────────────────
The following items change when the display mode changes──either because
you change the mode manually or because you are in Auto mode and load a
different kind of file:
■ For commands on the File menu, the default file name changes to *.c or
*.asm.
■ The Include command on the View menu responds to .H files if the
display mode is C, or .INC files if the display mode is Assembler.
■ The Index and Contents items from the Help menu bring up lists of
topics for either C or Assembly, as determined by the display mode.
Auto display mode assumes C defaults until you load a .ASM file. When you
start the environment with the QC command, QC assumes that file names on
the command line have .C extensions, unless the environment is in
Assembler display mode.
1.5 Using the Quick Advisor (Help)
QuickAssembler extends the number of topics you can get information on,
and updates QCENV.HLP so you can get context-sensitive help on the new
menu items and dialog boxes. In addition, you continue to get help
on all of the C-language topics. The new topics, added for use with
assembly language, are shown below:
■ QuickAssembler instructions
■ QuickAssembler directives and operators
■ DOS and ROM-BIOS services
You can get help on assembly-language topics by using one of two different
methods:
1. Topical Help (press F1)
2. The Help menu
At all times, the expanded environment provides topical Help for both
assembler and C keywords. Place the cursor on the keyword, then press F1.
You can also get topical Help by moving the mouse cursor to the desired
word and clicking the Right mouse button. The display mode (described in
the previous section) determines whether C help files or assembly help
files are searched first.
──────────────────────────────────────────────────────────────────────────
NOTE If the keyword starts with a dot (.), do not place the cursor on the
dot or click on the dot to get topical Help. Place the cursor on the
keyword or click on the keyword.
──────────────────────────────────────────────────────────────────────────
QuickAssembler keywords include instructions, directives, and operators.
Chapter 2, "Introducing 8086 Assembly Language," provides information on
each of these concepts. An "instruction" is a specific action that the
processor executes. Instructions are the primary building blocks of an
assembly-language program.
The Help screens on instructions are particularly useful, because they
provide detailed information on timing, syntax, and processor flags. This
manual features a topical discussion of instructions, but provides only
limited information on timing and flags. To write the most efficient
assembly-language programs, you should refer often to the on-line Help for
instructions.
To get help on DOS or ROM-BIOS services, select Contents or Index from the
Help menu. These menu items give you help on assembly-language topics
rather than C topics whenever the display mode (described in the previous
section) is set to Assembler.
The Help system offers other paths to get to information on DOS and BIOS
functions. Move the mouse cursor to an interrupt number (such as 21H or
33) and click the Right mouse button, or move the cursor to the number and
press F1. The Help system responds by showing a screen listing of all the
functions accessed through that interrupt number. You can then go to the
specific Help screen you want. You can also get help on interrupt
functions by selecting context-sensitive help for the INT keyword.
You call these DOS and BIOS functions by using the INT instruction, as
described in Chapter 4. These services perform basic input and output
functions for you, giving you access to DOS and to hardware.
By default, the Smart Help display option is on. This option makes the
system more flexible by ignoring the presence or absence of a leading
underscore (_) in front of a name. Consequently, pressing F1 while on
_printf gives you help for the printf function.
1.6 Debugging Assembly Code
You can run a Debug build by choosing Debug in the dialog box opened by
the Options menu's Make command. Debug is the default setting, so you
probably won't need to choose it.
You can use all of QuickC's debugging commands with programs written in
assembly language. But keep in mind these considerations:
■ You must use an extra file with a .DBG file extension to debug programs
in .COM format.
■ You must use C syntax to specify expressions to watch or modify, even
when you debug assembly code. In addition, you can use the BY, WO, and
DW memory operators, register names, and the colon (:) operator. The
colon operator helps to specify segmented addresses.
■ When you trace execution of an assembly-language module, the behavior
of the environment changes. Screen swapping is turned on by default,
and the first line of code is never highlighted.
■ You can alter flag values and registers from within the environment.
Sections 1.6.1-1.6.4 discuss each debugging feature in turn.
1.6.1 Debugging .COM Files
Section 4.8, "Creating .COM Files," explains how to use the tiny memory
model, along with a linker flag, to generate a program in the .COM-file
format. A .COM file has a total size limitation of 64K, but is slightly
smaller and loads faster than a similar .EXE file.
When you run a Debug build that creates a .COM file, the linker places
debugging information in a separate file with the same base name as the
program and with a .DBG extension. If you delete the .DBG file, you cannot
debug your program until you run another Debug build.
Otherwise, all the considerations that apply to debugging .EXE files apply
to .COM files as well.
1.6.2 Specifying Expressions
The Debug menu provides two commands──Watch Value and Watchpoint──that let
you specify an expression for the QuickC/QuickAssembler environment to
dynamically update and display. The environment displays the updated
values in the Watch window. When you choose one of these commands, a
dialog box appears, prompting you to enter an expression. Figure 1.5 shows
the dialog box for the Watch Value command.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.6.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
You can enter any combination of variable names, constants, and C-language
syntax. You cannot enter assembly-language keywords. However, the
environment does recognize all valid register names (including names of
both 8-bit and 16-bit registers). See Chapter 2, "Introducing 8086
Assembly Language," for information on registers.
In addition to register names, the expanded environment supports the
optional use of the colon operator (:) for specifying segmented addresses:
segment:offset
In the syntax display above, segment can be a constant or a register;
offset can be any expression. The QuickC/QuickAssembler environment
combines the segment and offset addresses to determine a physical address,
as described in Section 2.7, "Segmented Addressing and Segment
Registers."
The following examples demonstrate the use of the colon in valid
expressions. Note that you use C-language syntax to specify hexadecimal
numbers:
0xb000:0x0000
es:0x0100
es:(array[2])
ss:bp
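As a rough illustration of how a segment:offset pair maps to a physical
address on the 8086, the following C sketch shifts the segment left four
bits and adds the offset. The function name physical_address is
hypothetical, and the 20-bit mask models the address bus of the original
8086; none of this is part of QuickC or QuickAssembler.

```c
#include <stdint.h>

/* Hypothetical helper: combine a 16-bit segment and a 16-bit offset
   into a 20-bit physical address, as an 8086 does in real mode. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;   /* the 8086 has 20 address lines */
}
```

Under this model, 0xb000:0x0000 corresponds to physical address 0xB0000,
and two different pairs such as 0x1234:0x0010 and 0x1235:0x0000 name the
same byte.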
The QuickC/QuickAssembler environment considers a segmented-address
expression to be a pointer to a character, which the Watch window
evaluates by displaying the character pointed to. However, you can use
QuickC type specifiers to alter how an expression is displayed. For
example, the Watch window evaluates the following expression by displaying
the numeric value of the address es:(warray+3):
es:(warray+3),p
You can use the three memory operators──BY, WO, and DW──to view the byte,
word, or doubleword of memory at a given address.
With pointer expressions and registers, BY returns the byte pointed to by
the expression. (Segmented addresses are pointer expressions, as are
procedure parameters declared with PTR.) With nonpointer variables, BY
returns the byte at the same address as the variable. WO and DW work the
same way, but return a word or doubleword, respectively.
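The effect of the three operators can be pictured with a small C model.
This is only a sketch: the functions by, wo, and dw and the mem buffer are
hypothetical stand-ins, not debugger internals. It assumes the 8086
convention of storing the least significant byte at the lowest address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models of the BY, WO, and DW operators: read 1, 2, or 4
   bytes starting at mem[addr], least significant byte first. */
uint8_t by(const uint8_t *mem, size_t addr)
{
    return mem[addr];
}

uint16_t wo(const uint8_t *mem, size_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

uint32_t dw(const uint8_t *mem, size_t addr)
{
    /* Low word first, then the high word two bytes further on. */
    return (uint32_t)wo(mem, addr) | ((uint32_t)wo(mem, addr + 2) << 16);
}
```

If memory holds the bytes 34h, 12h, 78h, 56h starting at address 0, this
model gives 34h for the byte, 1234h for the word, and 56781234h for the
doubleword at that address.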
The rest of this section demonstrates how to use the three memory
operators to specify useful expressions.
To watch the contents of a register, enter just the register's name. To
examine the value that the register points to, enter the BY, WO, or DW
operator followed by the register name.
Example        Value Specified
──────────────────────────────────────────────────────────────────────────
bx             The contents of the BX register
BY bx          The byte pointed to by the BX register
WO bx          The word pointed to by the BX register
DW es:si       The doubleword pointed to by the SI register, relative
               to the segment address in ES
To watch the value of a variable, enter the variable's name. To watch the
byte, word, or doubleword at the same address as the variable, use the BY,
WO, and DW operators. In this context, these operators function as the
QuickAssembler PTR operator does: they change the size of data to be
examined. They are similar, but not identical, to C type casts. In the
following examples, assume that Var is a word variable defined with DW:
Example        Value Specified
──────────────────────────────────────────────────────────────────────────
Var            The variable Var (the word at the address of Var)
BY Var         The byte at the address of Var
DW Var         The doubleword at the address of Var
You can use BY, WO, and DW to specify an array element, but you must
understand that expressions in the Debug window are treated like C
expressions rather than assembler expressions. As a result, the syntax you
use to watch a memory location in the Debug window is often different from
the syntax in your assembly source. For example, assume you have the
following data and code:
warr    DW      1, 2, 3, 4, 5, 6
        .
        .
        .
        mov     bx,0
        mov     cx,5
loop1:  add     ax,warr[bx]
        add     bx,2
        loop    loop1
You cannot watch the assembler expression warr[bx] directly. However, you
can put an equivalent C expression in the Debug window:
WO (char*)&warr + bx
The address-of operator is necessary to make the C debugger look at the
MASM array as a C array──that is, as an address. The value must be cast to
a character pointer because the debugger otherwise treats BX as a scaled C
index rather than an unscaled assembler index. In this case, the assembler
code already adds 2 to BX on each pass to adjust for the variable size, so
you must tell the debugger to ignore its normal word scaling.
Expressions are scaled only when there is a variable in the expression. In
the expression WO BP+6, the 6 is not scaled──the expression means, "look at
the word six bytes beyond the address that is in BP." However, in the
expression WO &warr+6, the 6 is scaled because of the word size of the
variable. Note that the variable size, not the expression type (BY, WO,
or DW), determines the size of scaling.
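The difference between scaled and unscaled offsets follows ordinary C
pointer arithmetic, so it can be reproduced in a short C sketch. The
array below stands in for the warr data shown earlier; the function names
are hypothetical, and the starting value of AX is assumed to be 0 because
the sample code does not set it.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the assembler data: warr DW 1, 2, 3, 4, 5, 6 */
static const uint16_t warr[6] = {1, 2, 3, 4, 5, 6};

/* Unscaled access, like WO (char*)&warr + bx: bx counts bytes. */
uint16_t word_at_byte_offset(uint16_t bx)
{
    uint16_t value;
    memcpy(&value, (const char *)warr + bx, sizeof value);
    return value;
}

/* Scaled access: n counts words, because C pointer arithmetic
   multiplies n by the size of the pointed-to type. */
uint16_t word_at_scaled_index(uint16_t n)
{
    return *(warr + n);
}

/* Model of the sample loop: BX steps by 2 bytes, CX counts 5 passes. */
uint16_t sum_like_loop1(void)
{
    uint16_t ax = 0;          /* assumed starting value of AX */
    uint16_t bx = 0;
    for (int cx = 5; cx != 0; cx--) {
        ax += word_at_byte_offset(bx);
        bx += 2;
    }
    return ax;
}
```

In this model, a byte offset of 4 and a scaled index of 2 both select the
third word of the array, which is why the cast to a character pointer is
needed when the offset comes from a register that counts bytes.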
If you are comfortable with C, you can also use C expressions to look at
assembler expressions. Here are some examples you might find useful:
Example        Value Specified
──────────────────────────────────────────────────────────────────────────
&Var           The address of Var
es:0x81,s      The string at es:[81h] (the DOS command line when a
               program is started)
&Arr[3]        The third element of an array (note that the 3 will be
               scaled)
*(&Arr+3)      Equivalent to the previous expression
1.6.3 Tracing Execution
The Run menu's Trace Into, Animate, and Step Over commands execute one
statement of your program at a time. These commands are fully functional
with assembly-language programs. However, debugging commands behave
differently when you trace execution of an assembly-language module, as
summarized below:
■ By default, screen swapping is on.
■ If the main module of the program is an assembly-language module, the
first line of the program is never highlighted.
■ The Calls menu does not function unless you write your program
according to certain guidelines.
The rest of this section elaborates on these differences.
When you trace execution of an assembly-language module, screen swapping
is turned on. The environment does not support an Auto screen-swapping
mode for assembly-language programs because it cannot detect when a
program writes to the screen. Therefore, when executing a .ASM file, the
environment equates the Auto screen-swapping selection with screen
swapping turned on.
You can always turn screen swapping off manually by choosing the Run/Debug
command from the Options menu. When a dialog box appears, choose the Off
option button in the Screen Swapping field.
Screen swapping causes the environment to switch to a full output screen
each time the program executes code. The effect is particularly noticeable
when you choose the Animate command. Leaving screen swapping on preserves
program output. However, if large portions of your program do not write to
the video display, you may want to turn screen swapping off temporarily.
The second debugging feature that operates differently for assembly-
language programs is current-line highlighting. When you restart a
program, the environment does not highlight the first line of code. The
debugging facility does not know which line of code is the first to be
executed, since this information is stored in the executable-file header.
After you execute a trace, the second program line is highlighted, and
thereafter current-line highlighting works as you would expect.
The third feature that operates differently is the Calls command from the
Debug menu. To ensure that the command works with assembly-language
modules, either use the PROC directive with an argument list or local
variables, as described in Chapter 3, "Writing Assembly Modules for C
Programs," or else set up the frame pointer (the BP register) as described
in Appendix A, "Mixed-Language Mechanics." Both these methods set up a
stack frame for each procedure, using the standard Microsoft methods. The
environment checks stack frames to see what procedures have been called,
and with what parameters.
1.6.4 Modifying Registers and Flags
With the expanded QuickC/QuickAssembler environment, you can get much
greater use from the Registers window. The Registers window displays more
information than it does in the simple QuickC environment, and you can
also use the window to alter register and flag values.
──────────────────────────────────────────────────────────────────────────
NOTE By default, the environment does not display the Registers window.
To open this window, choose the Window command from the View menu. A
dialog box appears that lists all windows. Move the cursor to Registers
and press the ENTER key, or move the mouse cursor to Registers and
double-click the Left mouse button. To close the window, repeat the procedure.
──────────────────────────────────────────────────────────────────────────
The Registers window displays the contents of both 8086 and 8087
registers. You can remove 8087 registers from the Registers window by
choosing Display from the Options menu. When the dialog box appears, turn
the Show 8087 option button off. The environment displays 8087 registers
only if you have a math coprocessor or have a program that calls
floating-point emulator routines from a high-level language.
You can alter values in the window by using either the mouse or the
keyboard. To alter a value, you first select the item you want to change:
■ To alter a value with the mouse, select an item by clicking the Left
mouse button.
■ To alter a value with the keyboard, first place the cursor on an item
in the window. (Press TAB or SHIFT+TAB to cycle quickly through the
items.) Then select the item by pressing the ENTER key. The List field
has no function in this context and should be ignored.
Choosing a flag toggles the flag to the opposite setting. Choosing a
register brings up a dialog box. Type the new value for the register and
press ENTER.
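Toggling a flag amounts to inverting one bit of the 16-bit flags register.
The C sketch below models that operation. The constant and function names
are hypothetical, and the sketch does not describe how the environment
itself is implemented; only the bit positions come from the 8086
architecture.

```c
#include <stdint.h>

/* Bit masks for some 8086 flags within the 16-bit flags register. */
enum {
    FLAG_CF = 0x0001,   /* carry     (bit 0)  */
    FLAG_ZF = 0x0040,   /* zero      (bit 6)  */
    FLAG_SF = 0x0080,   /* sign      (bit 7)  */
    FLAG_DF = 0x0400,   /* direction (bit 10) */
    FLAG_OF = 0x0800    /* overflow  (bit 11) */
};

/* Hypothetical helper: flip one flag to its opposite setting. */
uint16_t toggle_flag(uint16_t flags, uint16_t mask)
{
    return (uint16_t)(flags ^ mask);
}
```

Applying toggle_flag twice with the same mask restores the original value,
which matches the behavior of choosing the same flag twice in the window.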
1.7 Viewing a Listing File
When you assemble a module with the Debug build setting (the default),
QuickAssembler can create a listing file. Choose the type of listing by
using the Assembler Flags dialog box. (To access this dialog box, choose
Make from the Options menu, then choose Assembler Flags.) You should also
make sure that the One Pass Assembly option is not selected.
A QuickAssembler listing file shows precisely how the assembler translated
each line of code during the last program build. Each instruction in the
source code is listed next to its corresponding numeric code (machine
instruction).
Listing files are particularly useful if your program uses macro calls or
include files. The listing file displays each statement generated by a
macro call and each line of code copied from an include file. Tables at
the end of the listing file give information on macros, symbols,
structures, groups, and records. Part 2 of this manual describes each of
these features of assembly language.
To view the listing file, assemble the source code at least once. You can
view the listing file for the current module by choosing the Listing
command from the View menu. You can also view the file with the CTRL+F2
shortcut key.
The listing file is then displayed in the Source window, as shown in
Figure 1.6. You can page through this file by using all the normal
cursor-movement commands. When you want to return to the previous file,
press F2 or use the File menu. You can also leave the listing file by
choosing the Listing command again; this action causes the environment to
switch to the original line of source code that generated the current line
of code. In particular, if you are in a listing file and move the cursor
to a line generated from an include file (.INC), the Listing command
switches directly to that include file.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 1.7 of the manual │
└────────────────────────────────────────────────────────────────────────┘
Normally, you would choose the Listing command when in a .LST file or in a
.ASM file with a corresponding .LST file (previously generated by a
program build). If you are not in either of these types of files, the
environment responds by displaying a dialog box for opening a file; *.lst
is the default file name.
────────────────────────────────────────────────────────────────────────────
Chapter 2: Introducing 8086 Assembly Language
Assembly-language programs control hardware directly, giving you the
ability to write the fastest, smallest programs possible and to execute
any operation. But assembly-language programming also requires an
understanding of the architecture of 8086-family processors.
Assembly language is close to machine code──the processor's numeric
language of 1's and 0's. Each QuickAssembler instruction corresponds to an
8086 instruction but consists of a meaningful name (mnemonic) instead of a
number. For example, the ADD instruction computes the sum of two items.
QuickAssembler translates this instruction to produce a numeric code, such
as 10000010 binary. The processor responds to this code when you run the
program.
This process of translation is called "assembling." Before you can
assemble a program, you need to understand the basic concepts of the
processor and of assembly language. This chapter presents these concepts.
2.1 Programming the 8086 Family
If you have programmed in C, you can get a good grasp of 8086 assembly
language by focusing on the differences between the two languages:
1. A C statement may combine many complex operations, but each line of
assembly language specifies just one limited action called an
"instruction." QuickAssembler also supports a number of nonexecutable
statements called "directives," which provide structure to the program,
declare data objects, and provide other information.
Sections 2.2-2.4 explain the basics of writing instructions and
directives.
2. C programs deal with memory locations (known as variables), but
assembly-language programs must deal with registers as well. A
"register" is a special memory location inside the processor itself,
having a permanent name rather than a numeric address.
Section 2.5, "8086-Family Registers," describes the use of each
register.
3. A data object in a C program can be arbitrarily complicated.
Assembly-language statements work on objects accessed through four
specific modes: immediate, register, direct memory, and indirect
memory. Each mode has specific properties and limitations imposed by
the processor.
Section 2.6, "Addressing Modes," explains each of these four modes and
gives examples.
4. The processor combines two 16-bit addresses to access each memory
location. This mechanism is called "segmented addressing." Assembly
language often requires a more complete understanding of segmented
addressing than C does.
Section 2.7, "Segmented Addressing and Segment Registers," explains
the full implications of segmented addressing.
Of the features listed above, segmented addressing is unique to the 8086
family. The 8086 is further distinguished from other processors by its set
of string operations, which permit fast initialization and copying of
blocks of data. You can read more about the string operations in Chapter
16, "Processing Strings."
2.2 Instructions, Directives, and Operands
The 8086-family processors understand only one kind of statement: an
instruction. QuickAssembler understands two kinds of statements:
instructions and directives.
As explained above, an instruction corresponds to a specific action that
the processor executes at run time. The fundamental task of the assembler
is to correctly translate each of these statements to specific
machine-code instructions.
As nonexecutable statements, directives are not translated to machine
actions. However, they give information to the assembler that affects how
other statements are translated. For example, some of the most important
directives declare data. These directives, in turn, help the assembler
correctly interpret instructions that refer to the data.
The rest of this section explains each part of an assembly-language
statement; the general syntax applies to both instructions and directives.
The section ends by stating the basics of entering numbers in different
radixes.
Syntax
Each line of source code is either blank or contains a statement. Each
statement is an instruction or directive, and can contain as many as 512
characters. Statements can have up to four fields, as shown below:
[[name]] [[operation]] [[operands]] [[;comment]]
Each field (except the comment field) must be separated from the other
fields by a space or tab character. You can enter statements in uppercase
or lowercase letters. By default, QuickAssembler is not case sensitive,
but it does preserve case for external variables──thus providing
compatibility with C, which is case sensitive. You can control case
sensitivity by using the Assembler Flags dialog box.
As a convention, sample code in this manual uses uppercase letters for
directives, hexadecimal letter digits, and segment definitions.
2.2.1 The Name Field
The name field labels the statement with a symbolic name that other parts
of the program can reference. The meaning of the name depends on the type
of statement.
One of the most important uses of this field occurs in data declarations.
These declarations are much like variable declarations in C. The statement
defines the type and initial value. You use the name elsewhere in the
program, when you want to access the data.
QuickAssembler is different from C, however, in that the symbolic name
occurs in the first field. For example, the following DB directive
(Declare Bytes) associates the name string with a series of characters:
string DB "Hello, world"
In instructions, the name field functions like a program label in C: it
provides a target for a jump or call instruction elsewhere in the program.
To label an instruction, follow the name field with a colon (:). You can
place the name on the same line as the rest of the instruction or, to
improve readability, on a separate line. The following example shows the
latter case:
top:                    ; This label marks the top of the loop
        mov     ax,1    ; This is the first instruction in the loop
There are other ways to label instructions. See Section 6.4, "Defining
Code Labels," for more information on how to declare labels.
2.2.2 The Operation Field
The operation field states the action of the statement. This field
determines the fundamental type of the statement──instruction or
directive. It also determines what additional syntax, if any, is required.
Some operations require an entry in the name field; most do not. If the
operation is an instruction, it strictly determines how many and what kind
of operands are legal.
This field contains exactly one item──an instruction or directive
mnemonic. "Mnemonics" are abbreviated, easy-to-remember names that each
symbolize a different operation (for instance, ADD, SUB, and OR). Examples
of directive mnemonics include EQU (Equate) and DB (Declare Bytes).
2.2.3 The Operand Field
The operand field lists the objects on which the statement operates.
Multiple operands are separated by commas. These objects can be registers,
constants, or memory locations. A memory location is typically represented
as a variable, although it can also be expressed as a numeric address or
complex expression.
Registers and constants require no previous declaration. To refer to a
variable, however, you should first declare the name with a data
directive, such as DB (Declare Bytes). The following example declares the
variable count and then uses it in an instruction:
count DB 7 ; Declare count as a byte variable
.
.
.
inc count ; count = count + 1
In the first statement, count appears in the name field and the number 7
appears in the operand field. The DB directive associates count with the
address of a byte initialized to 7. In the second statement, count appears
in the operand field. The INC instruction (increment) adds 1 to count,
thus increasing the value of the data to 8.
The next section gives more information on how to declare memory locations
as data types. Section 2.6, "Addressing Modes," gives a complete
description of all the different methods for specifying operands.
2.2.4 The Comment Field
The comment field lets you add text that appears in source code but is
ignored by the assembler. You can enter any text you want in this field.
Typically, you would use it to document the purpose of the statement. The
purpose of an assembly-language statement is not always self-explanatory,
and for this reason, programs often contain at least one comment for each
instruction.
Single-line comments always begin with a semicolon (;). You can also
create a multiline comment by one of two methods. You can enter successive
comment lines as shown below:
add count,5 ; Add 5 to count.
; ADD is the operation.
; count and 5 are operands.
sub Sum,12 ; Subtract 12 from Sum.
; SUB is the operation.
; Sum and 12 are operands.
You can also use the COMMENT directive, which lets you enter multiline
comments without using the semicolon. This directive has the following
syntax:
COMMENT delimiter [[text]]
text
[[text]] delimiter [[text]]
All text between the first delimiter and the line containing a second
delimiter is ignored by the assembler. The delimiter character is the
first nonblank character after the COMMENT directive. The text includes
the comments up to and including the line containing the next occurrence
of the delimiter.
Example
COMMENT + The plus
sign is the delimiter. The
assembler ignores the statement
following the last delimiter
+ mov ax,1 (ignored)
2.2.5 Entering Numbers in Different Bases
As with C, you can enter assembly-language constants as decimal,
hexadecimal, or octal. You can also enter binary constants. By default,
all constants are decimal, but you can specify a different default with the
.RADIX directive.
Hexadecimal constants appear frequently in assembly-language programs. To
indicate a hexadecimal constant, add an uppercase or lowercase H suffix.
If the first digit is one of the letters A-F, prefix the constant with a
leading 0 to indicate that the number is not a symbolic name.
Examples
100H
10FAh
0be03H
0FFh
You may often want to enter binary constants as well, particularly when
constructing bit masks. To indicate a binary constant, simply add an
uppercase or lowercase B suffix.
For more information on using different bases and using the .RADIX
directive, see Section 6.1.1.2, "Setting the Default Radix."
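The suffix rules above can be sketched in C. This is only an illustration:
parse_constant is a hypothetical helper, not part of QuickAssembler, and it
assumes the default radix of 10. It handles only the H and B suffixes
described in this section.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the value of a constant written with an
   H suffix (hexadecimal), a B suffix (binary), or no suffix (decimal).
   Returns -1 if the text is not a valid constant in the selected base. */
long parse_constant(const char *text)
{
    size_t len = strlen(text);
    int base = 10;
    long value = 0;

    if (len == 0)
        return -1;

    /* The suffix selects the base; it is not one of the digits. */
    char last = (char)toupper((unsigned char)text[len - 1]);
    if (last == 'H') { base = 16; len--; }
    else if (last == 'B') { base = 2; len--; }

    /* A constant must begin with a digit 0-9, which is why hexadecimal
       constants such as FFh need a leading 0. */
    if (len == 0 || !isdigit((unsigned char)text[0]))
        return -1;

    for (size_t i = 0; i < len; i++) {
        int c = toupper((unsigned char)text[i]);
        int digit;
        if (isdigit(c)) digit = c - '0';
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        if (digit >= base)
            return -1;          /* digit not allowed in this base */
        value = value * base + digit;
    }
    return value;
}
```

For instance, parse_constant("0FFh") yields 255, while parse_constant("FFh")
is rejected because the text begins with a letter and would be read as a
symbolic name.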
2.2.6 Line-Continuation Character
You can create program lines that extend over more than one physical line
by using the backslash (\) as a line-continuation character. The backslash
must be the last character on the line. Comments cannot follow it. A
backslash is not considered a continuation character if it occurs in a
comment.
Example
BigProc PROC FAR \
USES DS SI DI, \
IntArg:WORD, \
String:FAR PTR BYTE, \
Ptr:FAR PTR BIGSTRUC, \
Long:DWORD
.
.
.
ret
BigProc ENDP
In this example, the line-continuation character is used to specify
multiple procedure arguments with the extended PROC syntax. All the
arguments must be placed on a single logical line, but they would go past
the edge of the editor screen if not placed on separate lines. The
continuation character is also useful for long macro calls and data
initializations.
2.3 8086-Family Instructions
The 8086-family processors support more than 80 instructions, but you
don't need to memorize the entire instruction set. Once inside the
expanded QuickC environment, you can get instant information on any
instruction. Move the cursor to an instruction keyword on the screen, then
press F1. To find the appropriate instruction for the action you want to
perform, refer to Part 3 of this book, which provides a topical survey of
instructions.
Many programs can be written with just a few of the most common
instructions. Sections 2.3.1 and 2.3.2 introduce some of these
instructions, grouping them into two sets: instructions that manipulate
data and instructions that control program flow. The programs in Chapters
3 and 4 use these same instructions to illustrate basic concepts of 8086
assembly language.
2.3.1 Data-Manipulation Instructions
The first group of instructions manipulates data. Each causes the processor
to copy data or perform a calculation at run time. Some of the simpler C
statements translate directly into a single instruction, so this section
uses C statements for illustration. Here are the seven basic
data-manipulation instructions introduced in this section:
■ MOV (move data)
■ ADD (add second operand to first)
■ SUB (subtract second operand from first)
■ INC (increment operand)
■ DEC (decrement operand)
■ AND (bitwise AND of second operand with first)
■ MUL (integer multiplication)
The processor supports a great many other data-manipulation instructions,
which are covered in Part 3 of this manual.
2.3.1.1 The MOV Instruction
The MOV instruction, probably the most frequently used 8086 instruction,
copies data from one location to another. The instruction leaves the
source data unaffected, so it is more a copy than a move. The MOV
instruction takes two operands:
MOV destination,source
The instruction copies the value of the source to the destination. It
might seem more logical to place the source operand first, until you
consider that C and BASIC assignments use the same order. For example, the
instruction
mov count,5
places the value 5 at the memory location count and thus performs the same
action as the C statement
count = 5;
The destination operand is similar to an "lvalue" in C. Instructions that
have two operands always interpret the leftmost operand as the
destination, or lvalue. The destination is the operand that the
instruction can alter; thus, it can't be a constant. Another limitation on
instructions with two operands is that the operands cannot both be memory
locations.
2.3.1.2 The ADD Instruction
The ADD instruction, like MOV, takes two operands: a destination and a
source. The processor adds the two operands together, storing the result
in the destination (on the left). This action will be familiar to C
programmers, since the instruction
add sum,10
adds 10 to the memory location sum and thus performs the same action as
the C statement
sum += 10;
The 8086 does not perform automatic scaling for pointer addition as C
does. The program itself must perform scaling for all pointer arithmetic.
2.3.1.3 The SUB Instruction
The SUB instruction is the counterpart of ADD: it subtracts the source
operand from the destination operand, storing the result in the
destination (on the left). Thus, the instruction
sub total,7
performs the same action as the C statement
total -= 7;
2.3.1.4 The INC and DEC Instructions
The INC (Increment) and DEC (Decrement) instructions add and subtract 1,
respectively. They are similar to, but faster than, ADD and SUB, and are
provided because adding and subtracting 1 are such common operations.
The instruction
inc count
performs the same action as the C statement
count++;
2.3.1.5 The AND Instruction
The AND instruction is one of several bitwise logic operations supported
by the 8086. AND provides an efficient way to mask out bits. The
instruction
and stuff,0FFF0h
masks out the four lowest bits of stuff, as does the C statement
stuff &= 0x0FFF0;
2.3.1.6 The MUL Instruction
The MUL instruction multiplies two items, but one of these items is an
"implied operand"──that is, an operand you do not specify. For example,
the 16-bit version of the MUL instruction takes one explicit 16-bit
operand:
mul factor
The other operand is the AX register. The processor multiplies factor by
the value of AX, storing the low 16 bits of the result in AX and the high
16 bits in DX. The description of the AX register in Section 2.5.1, "The
General-Purpose Registers," gives more information on MUL.
2.3.2 Control-Flow Instructions
The control-flow instructions enable the program to execute loops and to
make decisions. Some of these instructions transfer control of the program
to a new address. The conditional jump instructions let you provide
program logic: they look at the result of a previous operation, and then
decide whether to jump or not. Here are the five basic control-flow
instructions introduced in this section:
■ JMP (Jump unconditionally)
■ CMP (Compare──subtract without storing result)
■ JE (Jump If Equal)
■ JA (Jump If Above)
■ JB (Jump If Below)
The processor supports a number of other control-flow instructions,
including several conditional jumps. See Section 15.1.2, "Jumping
Conditionally," for a description of these instructions.
2.3.2.1 The JMP Instruction
The JMP instruction causes the processor to jump to a new program address.
Like the C goto statement, JMP takes one operand: a label associated with
another statement. The instruction
jmp begin
jumps to the label begin, and thus performs the same action as the C
statement
goto begin;
2.3.2.2 The CMP Instruction
The CMP instruction, like SUB, performs a subtraction. But CMP doesn't
store the result; instead, it just sets processor flags in preparation for
a conditional jump (such as JE, JA, or JB).
A "processor flag" is a bit that resides in the processor and indicates
whether a specific condition is on or off. For example, the Zero flag
indicates that the result of the last operation produced zero. The JE
instruction (Jump If Equal) checks this one flag only, jumping if it is
set. Other conditional jumps determine a result by checking a combination
of flag settings. See Section 2.5.4, "The Flags Register," for a
description of all the flags.
Many instructions, including SUB, set processor flags. However, some of
these instructions have strong side effects. Use ADD or SUB to prepare for
a conditional jump when convenient. But use CMP when you need to make a
simple comparison without altering data.
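The relationship between CMP and the conditional jumps can be modeled in C.
The sketch below is a simplification: it tracks only the Zero and Carry
flags, and the names Flags, cmp16, je_taken, ja_taken, and jb_taken are
hypothetical. It assumes 16-bit unsigned operands.

```c
#include <stdint.h>

/* Simplified model of the two flags tested by JE, JA, and JB. */
typedef struct {
    int zero;   /* Zero flag: set when the operands are equal */
    int carry;  /* Carry flag: set when the subtraction needs a borrow */
} Flags;

/* Model of "cmp dest,src": subtract, but keep only the flags. */
Flags cmp16(uint16_t dest, uint16_t src)
{
    Flags f;
    f.zero  = (dest == src);
    f.carry = (dest < src);     /* unsigned borrow */
    return f;                   /* dest itself is never modified */
}

int je_taken(Flags f) { return f.zero; }               /* Jump If Equal */
int ja_taken(Flags f) { return !f.carry && !f.zero; }  /* Jump If Above */
int jb_taken(Flags f) { return f.carry; }              /* Jump If Below */
```

Because cmp16 takes its operands by value, the model also reflects the
point made above: the comparison leaves the data untouched.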
2.3.2.3 The Conditional Jump Instructions
The JE, JA, and JB instructions are conditional jumps (meaning Jump If
Equal, Jump If Above, and Jump If Below, respectively). Like JMP, they
each take one argument: a program label to which to jump. Unlike JMP, they
cause the processor to jump only when certain flag settings are detected.
The result is that when you use CMP in combination with a conditional jump
instruction, you create an if-then relationship similar to an if statement
in a high-level language. Consider the following instructions:
cmp sum,10 ; Compare sum to 10
ja top ; If sum > 10, jump to top
This logic is a little different from a C program. The first instruction
makes the comparison. The second states, "If the first operand of the
previous comparison was above the second, then jump." Taken together, these
two instructions perform the same action as the C statement
if( sum > 10 )
goto top;
Of course, most C programmers do not use many goto statements. Typically,
you would test for a condition and execute a series of statements if the
condition is true, as in the following code:
if( sum >= 10 )
{
sum = 1;
count += 2;
delta = 5;
}
To implement this code in assembly language, test for the opposite
condition, then jump past statements if they should not be executed. For
example, the following code executes the three statements inside the if
block only if sum is greater than or equal to 10:
TopOfBlock:
cmp sum,10 ; Compare sum to 10
jb SumNotGreater ; If sum < 10, do NOT do
; next three statements
mov sum,1 ; sum = 1
add count,2 ; count = count + 2
mov delta,5 ; delta = 5
SumNotGreater:
──────────────────────────────────────────────────────────────────────────
NOTE JA (Jump If Above) and JB (Jump If Below) each work properly when
you compare unsigned integers. To compare signed integers, use JG (Jump If
Greater) and JL (Jump If Less Than). See Section 15.1.2, "Jumping
Conditionally," for a complete list of conditional jump instructions.
──────────────────────────────────────────────────────────────────────────
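The difference between the unsigned and signed jumps in the note above can
be seen by comparing the same bit pattern both ways. The following C sketch
is illustrative only; is_above and is_greater are hypothetical names that
stand in for the conditions tested by JA and JG after a CMP of two words.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a signed (two's-complement) value. */
static int32_t as_signed(uint16_t w)
{
    return (w & 0x8000u) ? (int32_t)w - 65536 : (int32_t)w;
}

/* Condition tested by JA: operands treated as unsigned words. */
int is_above(uint16_t a, uint16_t b)
{
    return a > b;
}

/* Condition tested by JG: the same patterns treated as signed words. */
int is_greater(uint16_t a, uint16_t b)
{
    return as_signed(a) > as_signed(b);
}
```

The word 0FFFFh is above 10 when viewed as the unsigned value 65,535, but it
is not greater than 10 when viewed as the signed value -1.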
2.4 Declaring Simple Data Objects
This section describes how to declare global variables──often called
"static" because each corresponds to a fixed memory location.
Programs generally require data. If you wrote a program in machine code,
you'd have to reserve locations in memory for data, determine the address
of each data object, and remember these addresses whenever you operated on
memory. Fortunately, the assembler reserves memory locations for you and
associates each location with a symbolic name.
You use data directives to tell the assembler how to allocate and refer to
memory. The most common data directives for characters and integers are:
Directive Description
──────────────────────────────────────────────────────────────────────────
DB Declare byte (either a small integer or a character)
DW Declare word (2-byte integer)
DD Declare doubleword (4-byte integer)
To use these directives, place the name of the variable first, then enter
the data directive. The third column (operand field) contains one or more
initial values. Use a question mark to indicate an item with no initial
value.
aByte DB 1 ; aByte is a 1-byte integer, initialized to 1
area DW 500 ; area is a 2-byte integer, initialized to 500
population DD ? ; population is a 4-byte integer, no initial value
These directives correspond roughly to the following C statements:
char aByte = 1;
int area = 500;
long population;
Assembly data declarations are different from C declarations, however, in
that assembly variables are not declared as signed or unsigned.
Instead, you must remember whether you intend to treat a variable as
signed or unsigned, and choose the appropriate operations.
Data directives reserve memory in the object file. They also associate
each variable with a name and a size attribute.
The assembler uses this information to correctly assemble instructions
that operate on variables. For example, at the machine-code level, the INC
instruction can be encoded to increment either a byte or a word of data.
The way the assembler encodes the instruction
inc myvar
depends on whether myvar was declared as a byte or word. (If it was
declared a doubleword, the instruction is illegal.) Another important use
of size attributes is in checking the validity of two operands. For
example, the following instruction causes the assembler to print a warning
message, because aByte and bx do not share the size attribute:
mov bx,aByte ; Move aByte into a word register
Moving a byte into a word location is not possible. After issuing the
warning, the assembler adjusts the instruction as if it were written as
follows:
mov bx,WORD PTR aByte ; Move the word at aByte to BX
The PTR operator temporarily modifies the size attribute of the object
that follows it. PTR can be used with a number of different data types, as
shown below:
Keywords Refers to
──────────────────────────────────────────────────────────────────────────
BYTE PTR object The byte at address of object
WORD PTR object The word at address of object
DWORD PTR object The doubleword at address of object
However, this adjustment may not produce the action you really want. The
PTR operator is not quite the same as a type cast in C. The C (int) type
cast manipulates data so that it represents the same value, but in a
different format. WORD PTR does no data manipulation──it simply causes the
instruction to operate on the word at the given address. In the example
above, the use of WORD PTR causes two adjacent bytes of data to be loaded
from memory into BX. If what you really want is to move a single byte of
data to BX, but convert it to a word, use the following code:
mov bl,aByte ; Lower byte of BX = aByte
sub bh,bh ; Higher byte of BX = 0
The example above only works properly when handling unsigned numbers. When
working with signed quantities, use the CBW instruction, as described in
Section 13.2.1, "Extending Signed Values."
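The three behaviors discussed above (WORD PTR, zero extension, and sign
extension) can be contrasted in C. This is a sketch under stated
assumptions: the function names are hypothetical, and memory is modeled as
an array of bytes stored low byte first, as on the 8086.

```c
#include <stdint.h>

/* Model of "mov bx,WORD PTR aByte": load the byte at the address and the
   byte after it as one word. No conversion takes place. */
uint16_t load_word_ptr(const uint8_t *addr)
{
    return (uint16_t)(addr[0] | (addr[1] << 8));
}

/* Model of "mov bl,aByte" followed by "sub bh,bh": the upper byte of the
   word becomes 0 (correct for unsigned values). */
uint16_t zero_extend(uint8_t b)
{
    return b;
}

/* Model of CBW: bit 7 of the byte is copied into all eight upper bits
   (correct for signed values). */
uint16_t sign_extend(uint8_t b)
{
    return (b & 0x80) ? (uint16_t)(0xFF00 | b) : b;
}
```

If aByte holds 1 and the next byte in memory happens to hold 34h,
load_word_ptr returns 3401h, whereas zero_extend returns the intended
value 1.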
By far the most common use of WORD PTR is in operations on objects 32 bits
or longer. An 8086 instruction can operate only on a byte or a word. You
use WORD PTR to tell the assembler to operate on one word at a time. For
example, the following code uses two moves to copy the 32-bit integer X to
a similar integer, Y:
X DD 80000 ; X is a long integer = 80,000
Y DD ? ; Y is a long integer
.
.
.
mov ax, WORD PTR X ; Move word at X to word at Y
mov WORD PTR Y, ax ; (using AX as intermediate register)
mov ax, WORD PTR X[2] ; Move word 2 bytes past X to
mov WORD PTR Y[2], ax ; word 2 bytes past Y
Brackets ([ ]) are used with arrays as well as portions of large data
objects as shown here; they also let you add a displacement to an address.
The use of brackets is further explained in the next few paragraphs.
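The two-step copy shown above can be mirrored in C by treating a doubleword
as two 16-bit words. The sketch below is only a model: Dword, make_dword,
dword_value, and copy_dword are hypothetical names, and the local variable
ax stands in for the AX register.

```c
#include <stdint.h>

/* A doubleword as it sits in 8086 memory: low word first, high word
   2 bytes later. word[0] models WORD PTR X, word[1] models WORD PTR X[2]. */
typedef struct {
    uint16_t word[2];
} Dword;

Dword make_dword(uint32_t value)
{
    Dword d;
    d.word[0] = (uint16_t)(value & 0xFFFF);   /* low 16 bits  */
    d.word[1] = (uint16_t)(value >> 16);      /* high 16 bits */
    return d;
}

uint32_t dword_value(Dword d)
{
    return ((uint32_t)d.word[1] << 16) | d.word[0];
}

/* Copy src to dst one word at a time through a 16-bit temporary. */
void copy_dword(Dword *dst, const Dword *src)
{
    uint16_t ax;
    ax = src->word[0];  dst->word[0] = ax;    /* low word  */
    ax = src->word[1];  dst->word[1] = ax;    /* high word */
}
```

For the value 80,000 (13880h), the low word is 3880h and the high word is 1.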
Assembly language makes almost no distinction between simple variables and
arrays. You refer to the first element of an array just as you would a
simple variable──index brackets are optional. To declare an array or
string, just give a series of initial values:
warray DW ?,?,?,?
xarray DW 1,2,3,4
mystring DB "Hello, there."
To refer to the first element of warray, type warray into your program (no
brackets required). To refer to the next element, use either of these two
forms, each of which refers to the object two bytes past the beginning of
warray:
warray+2
warray[2]
When used with a variable name, the brackets do nothing but add a number
to the address. If warray refers to the address 2400h, then warray[2]
refers to the address 2402h. However, the brackets have an additional
function when used with registers. See Section 2.6.4, "Indirect Memory
Operands," for more information.
In assembly language, array indexes are zero-based, as in C; but unlike C,
they are unscaled. The number inside brackets always represents an
absolute distance in bytes.
In practical terms, the fact that indexes are unscaled means that if the
size of an element is larger than one byte, you must multiply the index of
the element by its size (in this case, 2), then add the result to the
address of the array. Thus, the expression warray[4] represents the third
element, which is 4 bytes past the beginning of the array. Similarly, the
expression warray[6] represents the fourth element.
In general, the numeric offset required to access an array element can be
calculated as shown in the following formula:
Nth element of Array = Array[(N-1) * size of element]
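The formula can be checked with a short C sketch. The helper names
element_offset and word_at are hypothetical, and the byte array in the usage
note below models how xarray DW 1,2,3,4 is laid out in memory (low byte of
each word first).

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the Nth element (N counts from 1) of an array whose
   elements are element_size bytes each. */
size_t element_offset(size_t n, size_t element_size)
{
    return (n - 1) * element_size;
}

/* Read the word stored at a given byte offset from the start of an
   array, as the expression array[offset] does in assembly language. */
uint16_t word_at(const uint8_t *array, size_t byte_offset)
{
    return (uint16_t)(array[byte_offset] | (array[byte_offset + 1] << 8));
}
```

With the bytes 1,0,2,0,3,0,4,0 standing in for xarray, the third element is
word_at(xarray, element_offset(3, 2)), which reads the word at offset 4.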
2.5 8086-Family Registers
A "register" is a special memory location inside the processor itself.
Operations on registers execute faster than operations on main memory. The
processor has a limited number of registers. Moreover, many operations on
the 8086 are impossible without the use of registers at some point. For
example, you cannot copy data between two memory locations without first
moving it into a register.
Figure 2.1 shows the registers common to all the 8086-family processors.
The 8086 registers can be grouped by function into the following sets:
general-purpose registers, index registers, pointer registers, and segment
registers. Each set corresponds to a different ending letter (X, I, P, or
S). The registers in each set are as follows:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 2.5 of the manual │
└────────────────────────────────────────────────────────────────────────┘
■ The four general-purpose registers are AX, BX, CX, and DX. These
registers exist for the general use of the program. You can use these
registers to store temporary values and perform calculations.
■ The two index registers are SI (Source Index) and DI (Destination
Index). These registers can also be used for general storage, but are
less flexible than the general-purpose registers. SI and DI have a
special purpose in string instructions.
■ The pointer registers are IP (Instruction Pointer), SP (Stack Pointer),
and BP (Base Pointer). These registers should not be confused with BX,
which is the register normally used for pointer indirection. IP, SP,
and BP each have a special purpose in conjunction with procedure calls.
SP and BP should be altered with care; IP cannot be altered or
referenced directly at all.
■ The segment registers are CS, DS, SS, and ES. This section does not
describe these registers. You generally don't alter or reference them
except when starting the program or accessing data from multiple
segments. Section 2.7, "Segmented Addressing and Segment Registers,"
describes each segment register and how it is important to programs.
In addition, there is a flags register that indicates the status of the
processor.
2.5.1 The General-Purpose Registers
The general-purpose registers have many important uses in an 8086
assembly-language program, including:
■ Storing the values most frequently used. Operations on registers are
much faster than operations on memory. Therefore, place the program's
principal values in registers. In larger programs, you will probably
have too many variables to place them all into registers. You can,
however, place a value in a register while it is in heavy use.
■ Supporting operations with two or more variables. Direct
memory-to-memory operations are illegal with 8086 processors. To
operate on two memory locations, you need to first load one of the
values into a register.
■ Enabling use of all the instructions. Many instructions require the use
of a particular register. For example, the MUL instruction always works
with the AX register (or AL, if you specify a byte operand).
■ Passing or returning values in a procedure or interrupt call.
Each of the general-purpose registers──AX, BX, CX, and DX──can be accessed
as a single 16-bit register, or as two 8-bit registers. As shown in Figure
2.1, the AH, BH, CH, and DH registers represent the high-order 8 bits of
the corresponding registers. Similarly, AL, BL, CL, and DL represent the
low-order 8 bits.
This design lets you operate directly on two-byte and one-byte objects. It
also lets you load a two-byte object and then manipulate one byte at a
time.
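The relationship between a 16-bit register and its two halves can be
modeled in C with shifts and masks. Reg16 and the four helper functions are
hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Model of one general-purpose register (AX, BX, CX, or DX). */
typedef struct {
    uint16_t x;
} Reg16;

/* Read the high half (AH, BH, CH, DH) or low half (AL, BL, CL, DL). */
uint8_t high_byte(Reg16 r) { return (uint8_t)(r.x >> 8); }
uint8_t low_byte(Reg16 r)  { return (uint8_t)(r.x & 0xFF); }

/* Writing one half leaves the other half unchanged. */
void set_high(Reg16 *r, uint8_t v)
{
    r->x = (uint16_t)((r->x & 0x00FF) | (v << 8));
}

void set_low(Reg16 *r, uint8_t v)
{
    r->x = (uint16_t)((r->x & 0xFF00) | v);
}
```

If the modeled AX holds 1234h, the high half is 12h and the low half is 34h;
storing 0FFh in the low half changes the full register to 12FFh.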
Each of the general-purpose registers has special uses, discussed below.
2.5.1.1 The AX Register
The AX (Accumulator) register is ideal for repeated calculations. It
accumulates totals as well as the results of multiplication and division.
Using AX can add speed to your program, because some instructions have
special encodings optimized for use with AX.
Multiplication instructions always use AX. In the 16-bit version of the
MUL instruction, you specify one 16-bit value. The processor multiplies
this value by the contents of AX and stores the 16 least significant
binary digits of the result in AX. (The 16 most significant digits are
stored in DX.)
The following example multiplies base times height, and stores the result
in area. These instructions are sufficient if the result does not exceed
the limit for two-byte numbers (otherwise, the DX register will contain
the overflow):
base DW 5 ; base is a word, initialized to 5
height DW 3 ; height is a word, initialized to 3
area DW ? ; area stores 16-bit (word) product
.
.
.
mov ax,base ; AX = base
mul height ; AX = AX * height
mov area,ax ; area = result
AX has a similar use in division instructions (DIV and IDIV). See Section
14.4, "Dividing," for examples of division. Also, in port I/O
instructions, AX holds the data to write to a port and receives data read
from a port.
By convention, AX has another special use. Microsoft high-level languages
expect AX to contain a function's return value. If the return value is
longer than four bytes, the high-level languages expect DX:AX to point to
the location of the return value.
2.5.1.2 The BX Register
The BX (Base) register has great importance as a pointer or address
register. All 16-bit registers can hold addresses, but not all registers
can be used to retrieve the contents of an address. In C this operation is
called "pointer dereferencing," or "indirection." The C source code to
implement this action might look like this:
value = *pVar;
The following assembly code achieves the same effect:
mov bx,pVar ; BX = pVar
mov ax,[bx] ; AX = object pointed to by BX
mov value,ax ; value = AX (MOV cannot copy memory to memory)
The brackets around BX in the second instruction direct QuickAssembler to
consider BX a pointer to the actual operand. The item [bx] is an example
of an indirect memory operand. See Section 2.6.4, "Indirect Memory
Operands," for more information.
2.5.1.3 The CX Register
The CX (Count) register has special meaning to instructions with a
repeat-operation feature. The contents of CX indicate how many times to
repeat execution. Loops, string operations, certain jump instructions, and
shifts and rotates all use CX this way.
A common instruction that uses CX to repeat execution is LOOP, which is
analogous to the C for statement. This instruction subtracts one from CX,
then jumps to the given label if CX is not equal to 0. Thus, the following
loop executes 20 times:
mov cx,20
top:
.
.
.
loop top
In the case of shifts and rotates, CL (the lower byte of CX) indicates how
many bit positions to shift. See Section 14.7, "Shifting and Rotating
Bits," for more information. Also, when an instruction has a REP (repeat)
prefix, the value in CX determines how many times the instruction is
executed.
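The behavior of LOOP described above corresponds most closely to a C
do-while statement, because the count is tested only after the body has run
once. The following sketch counts the passes for a given starting value of
CX; loop_iterations is a hypothetical name.

```c
#include <stdint.h>

/* Count how many times the body between "top:" and "loop top" executes
   for a given starting value of CX. */
unsigned long loop_iterations(uint16_t cx)
{
    unsigned long count = 0;
    do {
        count++;            /* the loop body runs */
        cx--;               /* LOOP subtracts 1 from CX ... */
    } while (cx != 0);      /* ... and jumps while CX is not 0 */
    return count;
}
```

One consequence of this model is that a starting value of 0 does not skip
the loop: CX wraps around to 0FFFFh on the first pass, so the body runs
65,536 times.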
2.5.1.4 The DX Register
The DX (Data) register often is used only for storage of temporary values.
However, DX has a special function in some versions of the multiplication,
division, and port instructions. Each of these uses is closely related to
AX. In fact, DX is located next to AX in the actual physical layout of the
8086 chip. (Figure 2.1 places the registers in the order AX, BX, CX, and
DX merely for ease of reference.)
When you multiply 16-bit values with MUL, DX holds the high 16 bits of the
32-bit result. The following example is a variation of the one given for
AX. In this example, area is a 32-bit value (a long integer), and it
stores the entire 32-bit result of the MUL instruction:
base DW 500 ; base is a word, initialized to 500
height DW 300 ; height is a word, initialized to 300
area DD ? ; area stores doubleword product
.
.
.
mov ax,base ; AX = base
mul height ; DX:AX = AX * height
mov WORD PTR area[0],ax ; Store low 16 bits
mov WORD PTR area[2],dx ; Store high 16 bits
By convention, Microsoft high-level languages use both DX and AX to return
four-byte values from procedures. The high 16 bits are placed in DX.
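The way the 16-bit MUL instruction splits its result between DX and AX can
be modeled in C with a 32-bit intermediate value. MulResult and mul16 are
hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* The two registers that receive the product of a 16-bit MUL. */
typedef struct {
    uint16_t ax;    /* low 16 bits of the product  */
    uint16_t dx;    /* high 16 bits of the product */
} MulResult;

/* Model of "mul operand" with the other factor already in AX. */
MulResult mul16(uint16_t ax, uint16_t operand)
{
    uint32_t product = (uint32_t)ax * operand;
    MulResult r;
    r.ax = (uint16_t)(product & 0xFFFF);
    r.dx = (uint16_t)(product >> 16);
    return r;
}
```

For base = 500 and height = 300, the product 150,000 is 249F0h, so the model
leaves 49F0h in AX and 2 in DX. For 5 times 3, DX is 0 and AX alone holds
the result.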
2.5.2 The Index Registers
The two index registers are SI (Source Index) and DI (Destination Index).
These registers are similar to the general-purpose registers, but cannot
be accessed one byte at a time. Index registers are efficient places to
store general data, pointers, array indexes, and pointers to blocks of
memory. They have the following special uses:
■ You can use both SI and DI for pointer indirection, as you can BX and
BP. "Pointer indirection" is the process of retrieving the value that a
pointer points to.
■ You can use SI or DI to hold an array index. Indirect memory operands
can combine this index with a base address stored in BX or BP.
■ You prepare for string instructions, which execute highly efficient
block operations, by loading SI with a source address and DI with a
destination address.
See Chapter 16, "Processing Strings," for information on how to use
string instructions.
When you write a procedure to be called by C, be careful to leave SI and
DI in the same state they were in before C called your procedure.
Microsoft QuickC allocates register variables in SI and DI.
2.5.3 The Pointer Registers
The pointer registers──BP, SP, and IP──are all special-purpose registers
that help implement procedure calls. The processor alters SP (Stack
Pointer) and IP (Instruction Pointer) whenever you call a procedure, and
you can use BP (Base Pointer) to access parameters placed on the stack.
Despite their names, pointer registers are not good places to store
pointer variables or other general program data; you should generally use
BX, SI, and DI for that purpose.
2.5.3.1 The BP Register
You can use BP (Base Pointer) to retrieve the contents pointed to by an
address. However, by default, the BP register points into the stack
segment rather than the data segment. Therefore, BP is typically used to
access items on the stack.
The "stack" is the area of memory that holds parameters, local variables,
and return addresses for each procedure being executed. Although you can
store general data in BP, it is commonly used to access parameters of the
current procedure.
When you use the PROC statement with a parameter list as explained in the
next chapter, avoid altering the value of BP. The PROC directive generates
instructions that set BP to point to the procedure's local stack area, and
then use BP to access parameters and local data. If BP changes, all your
references to parameters will be wrong.
To learn how to set BP yourself, see Section 15.3.3, "Passing Arguments
on the Stack," or Appendix A, "Mixed-Language Mechanics."
2.5.3.2 The SP Register
The SP (Stack Pointer) register points to the current location within the
stack segment. As you add or remove items from the stack, the processor
changes the value of SP, so that SP always points to the top of the stack.
The processor stack works like a stack of dishes: you push items onto the
top of the stack as you need to save them, then pop them off the stack
when you're ready to use them again. The stack is a last-in-first-out
mechanism. You can only remove the item currently at the top of the stack.
Items must be removed in the reverse order they were placed there.
The processor automatically pushes and pops return addresses for you when
you call or return from a procedure. A "return address" is the place a
procedure or routine returns to when done. You can also place other values
on the stack by using the PUSH and POP instructions.
The PUSH instruction saves the value of a register or memory location by
placing it on the stack. POP removes the value from the stack and places
it back in the original location. (You can also pop the contents into some
other location if you wish.) Use these instructions when you need to
preserve a value. In the following example, BX holds an important value,
but the program needs temporary use of BX:
push bx ; Save BX on the stack
mov bx,pointer ; Load pointer into BX
mov ax,[bx] ; AX = *pointer
mov value,ax ; value = AX (MOV cannot copy memory to memory)
pop bx ; Pop old value back into BX
The stack also holds parameters and local variables during procedure
calls. Sections 13.4.2, "Using the Stack," and 15.3.3, "Passing
Arguments on the Stack," provide more information on using the stack.
Appendix A, "Mixed-Language Mechanics," explains how to manipulate the
stack to make room for local variables──one of the few times you should
change the value of SP directly.
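The last-in-first-out behavior of PUSH and POP can be modeled in C. The
sketch below relies on the fact that the 8086 stack grows toward lower
addresses: PUSH subtracts 2 from SP before storing a word, and POP adds 2
after reading one. The names Stack, stack_init, push16, and pop16 are
hypothetical, the 64-byte size is arbitrary, and no overflow checking is
done.

```c
#include <stdint.h>

#define STACK_BYTES 64          /* arbitrary size chosen for this sketch */

typedef struct {
    uint8_t  mem[STACK_BYTES];  /* stands in for the stack segment */
    uint16_t sp;                /* offset of the item at the top of the stack */
} Stack;

/* An empty stack: SP points just past the end of the stack area. */
void stack_init(Stack *s)
{
    s->sp = STACK_BYTES;
}

/* PUSH: SP = SP - 2, then store the word at SP (low byte first). */
void push16(Stack *s, uint16_t value)
{
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(value & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(value >> 8);
}

/* POP: read the word at SP, then SP = SP + 2. */
uint16_t pop16(Stack *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp += 2;
    return value;
}
```

Pushing 1111h and then 2222h and popping twice returns 2222h first and
1111h second, which is the reverse of the order in which the values were
pushed.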
2.5.3.3 The IP Register
You cannot adjust the IP (Instruction Pointer) register directly; it can
only be adjusted indirectly, through control-flow instructions. For this
reason, QuickAssembler does not even recognize IP as a keyword.
The IP register contains the address of the next instruction to execute.
The instructions that control program flow (calls, jumps, loops, and
interrupts) automatically set the instruction pointer to the proper value.
The processor pushes the address of the next instruction onto the stack
when you call a procedure. The processor pops this address back into IP
when the procedure returns. Normally, the processor increments IP to point
to the next instruction in memory.
2.5.4 The Flags Register
The flags register, shown in Figure 2.2, is a 16-bit register made up of
bits that each indicate some specific condition. Most of the flags help
determine the behavior of conditional jump instructions. Many
instructions──most notably CMP──set these flags in a meaningful way. Other
flags (Trap, Interrupt Enable, and Direction) do not affect conditional
jump instructions but control the processor's general operation.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 2.5.4 of the manual │
└────────────────────────────────────────────────────────────────────────┘
The nine flags common to all 8086-family processors are summarized below,
progressing from the low-order to high-order flags. In these descriptions,
the term "set" means the bit value is 1, and "cleared" means the bit value
is 0.
Instructions actively set and clear various flags. For example, if the
result of a SUB or CMP instruction is zero, it sets the Zero flag. This
flag setting can, in turn, affect subsequent instructions──in particular,
conditional jumps. Some instructions do not set the flags at all, or leave
some flags in an undefined state. Consult on-line Help for each instruction
to see precisely how it affects flag settings.
Flag Description
──────────────────────────────────────────────────────────────────────────
Carry Is set if an operation generates a carry to or a
borrow from a destination operand. (Operation viewed
as unsigned.)
Parity Is set if the low-order eight bits of the result of an
operation contain an even number of set bits.
Auxiliary Carry Is set if an operation generates a carry to or a
borrow from the low-order four bits of an operand.
This flag is used for binary coded decimal arithmetic.
Zero Is set if the result of an operation is 0.
Sign Equal to the high-order bit of the result of an
operation (0 is positive, 1 is negative).
Trap If set, the processor generates a single-step
interrupt after each instruction. Debugging programs,
including the QuickC/QuickAssembler debugging
facility, use this feature to execute a program one
instruction at a time.
Interrupt Enable If set, interrupts will be recognized and acted on as
they are received. The bit can be cleared to
temporarily turn off interrupt processing.
Direction Can be set to make string operations process down from
high addresses to low addresses, or can be cleared to
make string operations process up from low addresses
to high addresses.
Overflow Is set if the result of an operation is too large or
small to fit in the destination operand. (Operation
viewed as signed.)
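The flag layout can be written down as C bit masks. This is a sketch, not
part of QuickAssembler: the FLAG_ names and both functions are hypothetical.
The bit positions are the standard 8086 assignments, stated here as outside
knowledge rather than taken from the table above.

```c
#include <stdint.h>

/* Bit masks for the nine flags, from low-order to high-order. */
enum {
    FLAG_CARRY     = 1 << 0,
    FLAG_PARITY    = 1 << 2,
    FLAG_AUX_CARRY = 1 << 4,
    FLAG_ZERO      = 1 << 6,
    FLAG_SIGN      = 1 << 7,
    FLAG_TRAP      = 1 << 8,
    FLAG_INTERRUPT = 1 << 9,
    FLAG_DIRECTION = 1 << 10,
    FLAG_OVERFLOW  = 1 << 11
};

/* Returns 1 if the flag selected by mask is set, 0 if it is cleared. */
int flag_is_set(uint16_t flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* Parity as the processor computes it: 1 if the low-order eight bits
   of the result contain an even number of set bits. */
int parity_flag(uint16_t result)
{
    unsigned bits = result & 0xFFu;
    unsigned count = 0;
    while (bits != 0) {
        count += bits & 1u;
        bits >>= 1;
    }
    return (count % 2u) == 0;
}
```

For example, a flags value of 0041h has the Carry and Zero flags set and
the Sign flag cleared.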
The Carry and Overflow flags are similar, but have one major difference:
the Carry flag is set according to the rules of unsigned operations, and
the Overflow flag is set according to the rules of signed operations. A
signed operation uses two's complement arithmetic to represent negative
numbers. One of the features of this system is that a number is negative
if the most significant bit is set. Unsigned operations do not view any
number as negative.
Thus, the same ADD operation can be viewed as adding FFFFH to FFFEH
(unsigned) or -1 to -2 (signed). This operation would set the Carry flag
(because the unsigned sum exceeds the maximum 16-bit value, FFFFH), but not
the Overflow flag (because the signed result, -3, fits in 16 bits).
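The difference between the two flags can be checked with a small C model of
a 16-bit ADD. The type AddResult and the function add16 are hypothetical
names chosen for this sketch; the processor computes both flags in hardware
as part of the addition.

```c
#include <stdint.h>

typedef struct {
    uint16_t result;    /* low 16 bits of the sum */
    int      carry;     /* 1 if the unsigned sum did not fit in 16 bits */
    int      overflow;  /* 1 if the signed sum did not fit in 16 bits */
} AddResult;

AddResult add16(uint16_t a, uint16_t b)
{
    AddResult r;
    uint32_t wide = (uint32_t)a + (uint32_t)b;

    r.result = (uint16_t)wide;
    r.carry  = wide > 0xFFFFu;
    /* Signed overflow: the operands have the same sign bit,
       and the sign bit of the result differs from it. */
    r.overflow = ((~(a ^ b)) & (a ^ r.result) & 0x8000u) != 0;
    return r;
}
```

Calling add16 with FFFFH and FFFEH gives the result FFFDH with Carry set
and Overflow cleared. Adding 7FFFH and 0001H does the opposite: the unsigned
sum fits, but the signed sum (32767 + 1) does not.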
──────────────────────────────────────────────────────────────────────────
NOTE This manual does not describe the details of two's-complement
arithmetic. For more information, see one of the references listed in the
Introduction.
──────────────────────────────────────────────────────────────────────────
Each of the conditional jump instructions responds to a particular flag or
combination of flags. For example, the JZ (Jump If Zero) instruction jumps
if the Zero flag is set. The JBE (Jump If Below or Equal) jumps if either
the Zero flag or the Carry flag is set. For a description of all the
conditional jump instructions, see Section 15.1.2, "Jumping
Conditionally."
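The relationship between CMP and the two jumps just described can be
sketched in C. The names CmpFlags, cmp16, jz_taken, and jbe_taken are
hypothetical, and only the Zero and Carry flags are modeled.

```c
#include <stdint.h>

typedef struct {
    int zero;   /* Zero flag */
    int carry;  /* Carry flag */
} CmpFlags;

/* CMP destination, source: subtract, discard the result, keep the flags. */
CmpFlags cmp16(uint16_t destination, uint16_t source)
{
    CmpFlags f;
    f.zero  = (destination == source);  /* difference is 0 */
    f.carry = (destination <  source);  /* unsigned borrow occurred */
    return f;
}

/* JZ jumps if the Zero flag is set. */
int jz_taken(CmpFlags f)
{
    return f.zero;
}

/* JBE jumps if either the Zero flag or the Carry flag is set. */
int jbe_taken(CmpFlags f)
{
    return f.zero || f.carry;
}
```

Because Carry follows unsigned rules, comparing FFFFH with 1 does not take
the JBE jump: FFFFH is treated as 65535, not as -1.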
2.6 Addressing Modes
You can specify several kinds of operands: immediate, register, direct
memory, and indirect memory. Each type of operand corresponds to a
different addressing mode. The "addressing mode" is the method that the
processor uses to calculate the actual value of the operand at run time.
You don't specify addressing modes explicitly. You simply give an operand,
and the assembler determines the corresponding addressing mode.
The four types of operands are summarized below, and described at length
in the rest of this section.
Operand Type Description
──────────────────────────────────────────────────────────────────────────
Immediate A constant value contained in the instruction itself
Register A 16-bit or 8-bit register
Direct memory A fixed location in memory
Indirect memory A memory location determined at run time by using the
address stored in one or two registers
Direct memory and indirect memory operands are closely related. Syntax
displays in this manual, as well as in on-line Help, often refer to memory
operands. You can use either type of memory operand wherever memory is
specified. From the processor's viewpoint, the only difference between
these types of operands is how the address is determined. The address
specified in the memory operand is called the "effective address" of the
instruction.
Most two-operand instructions require operands of the same size. When one
of the operands is a register, QuickAssembler adjusts the size of the
other, if possible, to be the size of the register──either 8 or 16 bits.
An instruction that operates on AX and BL is illegal, since these
registers are different sizes.
If the sizes conflict, you can sometimes use the PTR operator to override
the size attribute of an operand.
Sections 2.6.1-2.6.4 discuss each of the four operand types (and
corresponding addressing modes) in detail.
2.6.1 Immediate Operands
An "immediate operand" is a constant value on which the instruction
operates directly. This is the only addressing mode that involves no
further access of registers or memory. The data follows the instruction
right inside the executable code, thus giving rise to the name
"immediate."
Use immediate operands for the same reasons you would use a literal or
symbolic constant in C. The value of an immediate operand never changes.
An immediate operand can be a symbolic constant declared with the EQU
directive. This directive is often used for the same purpose as the C
#define directive. For example, consider the constant declaration:
magic EQU 7243
You could use this the same way as the C statement:
#define magic 7243
Chapter 11, "Using Equates, Macros, and Repeat Blocks," tells more about
defining constants with the EQU or = directive.
An immediate operand can also be an expression made up of constants. For
example, the following code directs QuickAssembler to calculate the
difference between two ASCII values, then use this difference as the
source (rightmost) operand:
mov bigdiff,'a'-'A'
The assembler interprets the one-byte strings 'a' and 'A' as the ASCII
values 97 and 65. The assembler calculates the difference──in this case,
32──and places the resulting value into the object code. At run time, this
value is fixed. Each time the instruction is executed, the processor moves
the value 32 into the memory location bigdiff. This instruction is
precisely equivalent to, but more readable than, the following:
mov bigdiff,32
One-byte and two-byte strings can be immediate operands. Larger strings
cannot be processed by a single 8086 instruction. Chapter 3, "Writing
Assembly Modules for C Programs," explains how to process longer strings,
one character at a time.
The OFFSET and SEG operators turn variable names (which normally are
memory operands) into immediate operands. These operators are similar to
the address operator (&) in C. In Chapter 4, "Writing Stand-Alone
Assembly Programs," you'll see how to use the OFFSET operator to treat an
address as immediate data.
When an instruction has two operands, you cannot place immediate data in
the destination (leftmost) operand. (The OUT instruction is the one
exception.)
Examples
var DW ?
college DW 1636
nine EQU 5+4 ; Declare nine as symbolic constant
.
.
.
mov var,nine ; Move immediate data to memory
mov bx,'ab' ; Move ASCII values for 'a' and 'b'
; into BH and BL
mov college,1701 ; Move immediate data to memory
mov ax,1+2+3+4 ; Move immediate data to AX
mov ax,OFFSET var ; Move address of var to AX
int 21h ; Immediate data is single operand
; 21 hexadecimal (33 decimal)
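For comparison, the following C sketch mirrors two of the ideas above: a
constant expression evaluated before the program runs, and a two-character
constant packed into 16 bits. BIGDIFF and pack2 are hypothetical names, and
the sketch assumes the ASCII character set.

```c
#include <stdint.h>

/* Evaluated by the compiler, much as the assembler evaluates 'a'-'A'. */
enum { BIGDIFF = 'a' - 'A' };

/* Packs two characters the way mov bx,'ab' does: the first character
   goes in the high byte (BH), the second in the low byte (BL). */
uint16_t pack2(char first, char second)
{
    return (uint16_t)(((uint8_t)first << 8) | (uint8_t)second);
}
```

With ASCII, BIGDIFF is 32, and pack2('a', 'b') yields 6162H: 61H ('a') in
the high byte and 62H ('b') in the low byte.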
2.6.2 Register Operands
A register operand consists of one of the 20 register names. The processor
operates directly on the data stored in the register. "Register-direct"
mode refers to the direct use of the value of the register rather than a
memory location. Registers can also be used indirectly, to point to memory
locations as described in Section 2.6.4, "Indirect Memory Operands."
Most instructions can take one or more register operands. You generally
can use any of the general-purpose registers with these instructions,
although some instructions require specific registers. The use of segment
registers (CS, DS, SS, and ES) is restricted. You can refer to segment
registers only under special circumstances.
Table 2.1 shows all the valid register names for 8086 processors. You can
use any of these names as a register-direct operand.
Table 2.1 Register Operands
Register Type               Register Name
──────────────────────────────────────────────────────────────────────────
8-bit high registers        AH BH CH DH
8-bit low registers         AL BL CL DL
16-bit general purpose      AX BX CX DX
16-bit pointer and index    SP BP SI DI
16-bit segment              CS DS SS ES
──────────────────────────────────────────────────────────────────────────
Section 2.5, "8086-Family Registers," discusses registers in more detail.
Limitations on register use for specific instructions are discussed in
sections on the specific instructions throughout Part 3, "Using
Instructions."
Examples
mov ds,ax ; Both operands are register direct
mov stuff,dx ; Source operand is register direct
mov ax,1 ; Destination is register direct
mul bx ; Single operand, register direct
2.6.3 Direct Memory Operands
A direct memory operand specifies a fixed address in main memory
containing the data to operate on. At the machine level, a direct memory
operand is a numeric address. In your QuickAssembler source code, you
usually represent a direct memory operand by entering a symbolic name
previously declared with a data directive such as DB (Declare Bytes).
A direct memory operand is similar to a simple variable in C or an array
element with a constant index. Any object in memory can be a direct memory
operand as long as the exact location is fixed in the executable code. The
data at the location can change, but the location itself is the same each
time the processor executes the instruction. This fact gives direct memory
operands a static character. For more dynamic operations, use indirect
memory operands.
Examples
mov ax,count ; Source operand is direct memory
mov count,ax ; Destination operand is direct memory
inc total ; Single operand is direct memory
Typically, a direct memory operand is a simple label. As with immediate
operands, you can specify a direct memory operand by entering an
expression. As long as the address can be determined at assembly time, the
operand is direct memory.
──────────────────────────────────────────────────────────────────────────
NOTE Technically, a program address is not determined until link time (in
the case of near addresses) or load time (in the case of segment
addresses). These adjustments are necessary to support multiple modules
and to enable the program to run anywhere in memory. However, you can
ignore these details. If the assembler can determine the operand's address
relative to the rest of the module, the operand is direct memory.
──────────────────────────────────────────────────────────────────────────
The following example uses an expression that translates to a direct
memory operand. This example could be used to store the value of DX in
the second element of an array of words, since each word occupies two
bytes. QuickAssembler considers area[2] as equivalent to area+2.
mov area[2],dx ; Move DX to memory location 2 bytes
; past the address of "area"
In the statement above, the assembler calculates an address by adding 2 to
the address of area. The resulting address will be the same no matter what
values are stored in registers. At run time, the address is fixed. Thus,
the operand is direct memory.
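The address arithmetic can be modeled in C by treating the data segment as
an array of bytes. Everything named here is an assumption made for
illustration: the 16-byte data_segment array, the offset AREA chosen for
area, and the helper functions. The sketch also relies on the fact that the
8086 stores the low byte of a word at the lower address.

```c
#include <stdint.h>

/* Hypothetical 16-byte data segment; AREA is an assumed offset for "area". */
uint8_t data_segment[16];
enum { AREA = 4 };

/* Stores a 16-bit value at a fixed byte offset, low byte first. */
void store_word(uint16_t offset, uint16_t value)
{
    data_segment[offset]     = (uint8_t)(value & 0xFF);
    data_segment[offset + 1] = (uint8_t)(value >> 8);
}

/* Model of: mov area[2],dx */
void store_dx_at_area_plus_2(uint16_t dx)
{
    /* AREA + 2 is a constant, so no register is consulted. */
    store_word(AREA + 2, dx);
}
```

Storing 1234H this way places 34H at offset 6 and 12H at offset 7, whatever
the registers contain at the time.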
You can use a numeric constant as a direct memory operand. Normally,
QuickAssembler interprets a numeric constant as an immediate operand. To
ensure interpretation as a memory operand, prefix the number with a
segment register and colon (:). Brackets are optional. The following
instructions each load AX with the contents of memory address 100
hexadecimal in the data segment:
mov ax,ds:[100h]
mov ax,ds:100h
Section 2.7, "Segmented Addressing and Segment Registers," provides more
information on segment registers and the use of the colon (:). By default,
the processor assumes that data references lie in the segment pointed to
by DS.
2.6.4 Indirect Memory Operands
With indirect memory operands, the processor calculates the address of the
data at execution time, by referring to the contents of one or two
registers. Since values in the registers can change at run time, indirect
memory operands provide the most dynamic method for accessing data.
Indirect memory operands make possible run-time operations such as pointer
indirection, dynamic indexing of array elements──including indexing of
multi-dimensional arrays──and dynamic accessing of members of a structure.
All these operations are similar to operations in high-level languages.
The major difference is that assembly language requires you to use one of
several specific registers: BX, BP, SI, and DI.
You indicate an indirect memory operand by using at least one pair of
brackets. Use of the index operator ([ ]) is explained in more detail in
Section 9.2.1.3.
When you place a register name in brackets, the processor uses the data
pointed to by the register. For example, the following instruction
accesses the data at the address contained in BX, and then moves this data
into AX:
mov ax,[bx]
When you specify more than one register, the processor adds the contents
together to determine the effective address (the address of the data to
operate on). One register must be a base register (BX or BP), and the
other must be an index register (SI or DI):
mov ax,[bx+si]
You can specify one or more displacements. A "displacement" is a constant
value that the processor adds to the register contents to form the
effective address. A simple use of a displacement is to add a base address
to a register:
mov ax,table[si]
In the example above, the displacement table is the address of an array;
SI holds an index to an array element. (Unlike C, an assembly-language
index always indicates the distance in bytes between the beginning of the
array and the element.) Each time the instruction executes, it may load a
different element into AX. The value of SI determines which array element
to load.
Each displacement can be an address or numeric constant. If there is more
than one displacement, the assembler adds them all together at assembly
time, and places the total displacement into the executable code. For
example, in the statement
mov ax,table[bx][di]+6
both table and 6 are displacements. The assembler adds the value of table
to 6 to get the total displacement.
Table 2.2 shows the modes in which registers can be used to specify
indirect memory operands.
Table 2.2 Indirect Addressing Modes
Mode                 Syntax                    Description
──────────────────────────────────────────────────────────────────────────
Register indirect    [BX] [BP] [DI] [SI]       Effective address is contents
                                               of register
──────────────────────────────────────────────────────────────────────────
Based or indexed     displacement[BX]          Effective address is contents
                     displacement[BP]          of register plus displacement
                     displacement[DI]
                     displacement[SI]
──────────────────────────────────────────────────────────────────────────
Based indexed        [BX][DI] [BP][DI]         Effective address is contents
                     [BX][SI] [BP][SI]         of base register plus contents
                                               of index register
──────────────────────────────────────────────────────────────────────────
Based indexed with   displacement[BX][DI]      Effective address is the sum
displacement         displacement[BP][DI]      of base register, index
                     displacement[BX][SI]      register, and displacement
                     displacement[BP][SI]
──────────────────────────────────────────────────────────────────────────
You can enclose each register in its own pair of brackets, or you can
place the registers in the same pair of brackets separated by a plus sign
(+). The period (.) is normally used with structures, but it also
indicates addition. The following statements are equivalent:
mov ax,table[bx][di]
mov ax,table[bx+di]
mov ax,[table+bx+di]
mov ax,[bx][di].table
mov ax,[bx][di]+table
mov ax,table[di][bx]
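All of these forms produce the same effective address, which can be modeled
in C as a 16-bit sum. The function names effective_address and word_index
are hypothetical. The sketch assumes the standard 8086 behavior that an
offset wraps around at 64K.

```c
#include <stdint.h>

/* Effective address = base register + index register + displacement,
   truncated to 16 bits. Pass 0 for any part that is not used. */
uint16_t effective_address(uint16_t base, uint16_t index,
                           uint16_t displacement)
{
    return (uint16_t)(base + index + displacement);
}

/* An assembly-language index counts bytes, so element i of an array
   of 16-bit words is reached with an index of i * 2. */
uint16_t word_index(uint16_t element)
{
    return (uint16_t)(element * 2u);
}
```

For example, with BX = 0100H, DI = 0004H, and a total displacement of
0206H, the effective address is 030AH. To reach element 3 of a word array,
the index register must hold 6, not 3.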
2.7 Segmented Addressing and Segment Registers
"Segmented addressing" is the internal mechanism that enables the
processor to address up to one megabyte of main memory. This mechanism
accesses each physical memory location by combining two 16-bit addresses.
The two addresses can be represented in source code as follows:
segment:offset
The first 16-bit address is the "segment address." The second 16-bit
address is the "offset address." In effect, the segment address selects a
64K region of memory, and the offset address selects a byte within this
region. Here's how it works:
1. The processor shifts the segment address left by four places, producing
a 20-bit address ending in four zeros. This operation has the effect of
multiplying the segment address by 16.
2. The processor adds this 20-bit address to the 16-bit offset address.
The offset address is not shifted.
3. The processor uses the resulting 20-bit address, often called the
"physical address," to access an actual location in the one-megabyte
address space.
Figure 2.3 illustrates this process. The 8086-family processors were
developed to use this mechanism because 16 bits (the size of an 8086
register) can only address 64K at a time. However, the combined 20-bit
address is sufficient to address a full megabyte. Note that DOS and ROM
BIOS reserve part of this area, so that no more than 640K is available for
program addresses.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 2.7 of the manual │
└────────────────────────────────────────────────────────────────────────┘
A "segment" consists of a series of addresses that share the same segment
address, but different offsets. Segments can be no more than 64K in size.
To create large programs, you need to divide your program into multiple
segments. Even with smaller programs, it is convenient to have separate
code, data, and stack segments. (With tiny-model programs, the linker
combines these segments into a single physical segment.)
The following example helps illustrate segmented-address calculations
further. The processor calculates the address 53C2:107A by multiplying the
segment portion of the address by 16 (10H), and then adding the offset
portion, as shown below:
53C20h Segment times 10h
+ 107Ah Offset
54C9Ah Physical address
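The same calculation can be expressed as a short C function. The name
physical_address is hypothetical. Masking the sum to 20 bits models a
processor with 20 address lines, which is an assumption of this sketch.

```c
#include <stdint.h>

/* Shift the segment left four bits (multiply by 16), add the unshifted
   offset, and keep the low 20 bits of the sum. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

physical_address(0x53C2, 0x107A) returns 54C9AH, matching the calculation
above. Different segment:offset pairs can name the same byte: 0000:0400 and
0040:0000 both yield physical address 00400H.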
The use of segmented architecture doesn't mean that you have to specify
two addresses every time you access memory. The 8086-family processors use
four segment registers, which simplify programming in the following ways:
■ Normally, you don't specify a segment address when you access data.
Every data reference is relative to one of the four segment
registers──CS, DS, SS, or ES──so the segment address is implied.
■ Most of the time, you don't need to tell the processor which segment
register to use. By default, the processor uses CS for code addresses,
DS for data addresses, and SS for stack addresses, except where
otherwise noted in this section.
■ You initialize segment registers at the beginning of your program. Once
initialized, you can continue to use the segment addresses stored in
those registers.
If the program uses medium, large, huge, or compact model, you may need to
periodically reload one or more of the segment registers. These memory
models let you use more than 64K of code or 64K of data.
However, if the program uses small or tiny model, you never reload a
segment register except in the following situations: to access a special
hardware-defined location in memory, such as the video-display area, or to
access far memory allocated to the program by DOS function 48H.
Although each memory operand has a default segment register (usually DS,
unless the operand uses BP), you can specify another segment register by
using the segment override operator (:). The following example loads the
variable far_away residing in the segment pointed to by ES:
mov ax,es:far_away
For more information on this operator, see Section 9.2.3,
"Segment-Override Operator."
The CS Register
The processor always uses the CS (Code Segment) register as the segment
address of the next instruction to execute; IP (Instruction Pointer) holds
the offset address. CS:IP represents the full address of the next
instruction.
Near jumps and procedure calls alter the value of IP. Far jumps and
procedure calls alter both CS and IP. You never alter CS directly because
the far jump and call instructions do so automatically. Furthermore, DOS
initializes CS for you at the beginning of the program.
The DS Register
By default, the processor uses the DS (Data Segment) register as the
segment address for program data. String instructions and indirect memory
operands present some exceptions to this rule. With indirect memory
operands, the use of BP anywhere in the operand causes SS to be the
default segment register. Otherwise, DS is the default.
All the Microsoft standard memory models place the most frequently used
data in an area pointed to by DS. This area is commonly called the
"default data area," and it can be no larger than 64K. These memory models
use the ES register to access data outside the default data area. Your own
programs can either use this technique, or else reload DS whenever you
enter a new module. The standard method has the advantage of providing
fast access to the most frequently used data.
The SS Register
When the processor accesses data on the stack, it uses the SS (Stack
Segment) register as the segment register. (See the description of SP in
Section 2.5.3 for more information about the stack.) Thus, SS:SP always
points to the current stack position. Indirect memory operands involving
BP also use SS as the default segment register.
The Microsoft standard memory models set SS equal to DS. This setting
makes some programming tasks easier. In particular, it lets you address
stack or data addresses with either register. If you have to reload DS,
you can always access items in the default data area by using an SS
override.
The ES Register
The ES (Extra Segment) register is convenient for accessing data outside
of the default data area. As demonstrated in Section 3.4, "Decimal
Conversion with Far Data Pointers," you access far data by loading ES with
the desired segment address, and then giving a segment override. Section
13.3.2, "Loading Far Pointers," provides further information.
ES also plays a role in string instructions. With these instructions, the
DI (Destination Index) register is always relative to the segment address
in ES.
────────────────────────────────────────────────────────────────────────────
Chapter 3: Writing Assembly Modules for C Programs
As a C programmer, you can take advantage of the superior speed and
compactness of assembly-language routines. You can write most of your
program in C, then write time-critical routines in assembly language.
This chapter presents QuickAssembler programming techniques for
interfacing to C. You can use similar techniques to interface with other
languages. By using C with assembly language, however, you gain the
advantage of being able to develop the entire program from within the
integrated environment.
If you've read Chapter 2, read this chapter to see how to use assembly
language in a complete example module. If you skipped over Chapter 2, you
may want to refer to it occasionally for basic concepts, such as
instructions and registers.
3.1 A Skeleton for Procedure Modules
Let's start by looking at the skeleton of a module with one procedure. The
"skeleton" consists of statements that give basic structure to the module.
Within this structure, you can supply most any instructions you want.
Later sections of this chapter flesh out the skeleton by supplying useful
code.
The following skeleton assumes that the module is called by a small-model
C program, and consists of one procedure which takes a single parameter, a
pointer to a byte:
.MODEL small,c
.CODE
dectoint PROC Array:PTR BYTE
;
; (supply executable code here)
;
dectoint ENDP
END
Some features of the skeleton change when you write different procedures.
Other parts may remain the same. In particular, you'll need to add a PROC
and ENDP statement each time you add another procedure to the module.
Before looking at a full program example, let's examine each part of the
skeleton.
3.1.1 The .MODEL Directive
The .MODEL directive gives general information about the module. It uses
the following syntax:
.MODEL memorymodel [[,langtype [[,stacktype]]]]
The last two fields are optional. Commas are field separators and are only
required if you use more than one field. Usually, you'll want to enter
values in the first two fields.
The memorymodel and langtype fields correspond to the memory model and
language, respectively, of the calling module. If your C program declares
your procedure to be of type pascal or fortran, use Pascal, BASIC, or
FORTRAN in the langtype field. These keywords specify the use of the non-C
calling and naming conventions. Otherwise, specify C as the langtype.
Although the langtype field is optional, you should supply it since the
PROC features described later in this chapter require it.
Don't use the stacktype field unless the calling C program is compiled
with SS not equal to DS, in which case you should type in farStack.
(QuickC does not generate code that sets SS not equal to DS, but other
versions of Microsoft C do support this option.) The default is nearStack,
which assumes SS is equal to DS.
3.1.2 The .CODE Directive
The .CODE directive marks the beginning of the code segment, which is the
section of your program that contains the actual steps to execute:
.CODE
Statements that follow this directive are considered part of the code
segment. The segment continues to the end of the module or the next
segment directive. Typically, the code segment consists of instructions
and procedure definitions. It can also contain macro calls.
Some procedures work with static data. In Chapter 4, "Writing Stand-Alone
Assembly Programs," you'll see how to declare a data segment in which you
can place data declarations.
3.1.3 The PROC Directive
Use the PROC directive to define a procedure. The name of the procedure
appears in the first column:
dectoint PROC Array:PTR BYTE
Because the .MODEL statement specified C-language conventions, the
assembler prefixes the name dectoint with an underscore (_), and places
the name into object code as a public code label.
If your procedure alters registers that should be preserved, the optional
USES keyword automatically generates code to push the value of these
registers on the stack and pop them when the procedure returns. Procedures
called by C should not corrupt the value of SI, DI, or the segment
registers CS, DS, or SS. (The value of BP is automatically preserved.) The
following example shows how to preserve SI and DI for a procedure that
changes these registers:
dectoint PROC USES si di, Array:PTR BYTE
The last part of the statement declares one or more parameters. In this
case, the procedure declares a single parameter, Array, as a pointer to a
byte. The most common parameter types you can declare are listed below:
Declaration Meaning
──────────────────────────────────────────────────────────────────────────
WORD Word (two bytes)
DWORD Doubleword (four bytes)
PTR BYTE Pointer to a byte; most commonly, a pointer to a
character string
PTR WORD Pointer to a word; typically, the address of an array
of integers
PTR DWORD Pointer to a doubleword
For example, the following procedure definition declares a procedure named
MidStr, which takes as parameters two pointers to character strings and
one integer:
MidStr PROC Str1:PTR BYTE, Str2:PTR BYTE, Index:WORD
References to parameters are really references to locations on the stack.
C modules pass parameters by pushing them on the stack just before calling
the procedure. The BP register serves as a framepointer (a pointer to the
procedure's stack area), and each parameter is an offset from BP. The
exact offset of each parameter depends on the memory model and calling
convention, both established by the .MODEL directive.
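The offset arithmetic that the assembler performs can be sketched in C. The
sketch makes several assumptions not stated above: the procedure begins
with the usual push bp / mov bp,sp sequence, the C calling convention is in
effect (so the first parameter lies at the lowest address), a near return
address occupies two bytes, and a far return address occupies four. The
names NEAR_CALL, FAR_CALL, and param_offset are hypothetical.

```c
/* Size in bytes of the return address pushed by the call. */
enum { NEAR_CALL = 2, FAR_CALL = 4 };

/* Offset from BP of a parameter, given the size of the return address
   and the total size in bytes of the parameters declared before it. */
unsigned param_offset(unsigned return_address_bytes, unsigned bytes_before)
{
    return 2u /* saved BP */ + return_address_bytes + bytes_before;
}
```

Under these assumptions, the first parameter of a small-model procedure is
at BP+4, and a second word-sized parameter follows at BP+6. In a model with
far calls, the first parameter is at BP+6.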
When you use QuickAssembler procedure definitions, the assembler automates
the work of referring to parameters. Instead of setting up the
framepointer or calculating parameter offsets, you simply refer to
parameters by name. You can also use these names with debugging commands.
Appendix A, "Mixed-Language Mechanics," shows the actual code that
establishes BP as the framepointer. It also shows how to calculate
parameter offsets.
Section 6.4.3, "Procedure Labels," gives the complete syntax and rules
for using the PROC statement.
3.1.4 The ENDP and END Statements
The module ends with two statements: ENDP, which declares the end of a
procedure, and END, which declares the end of the module:
dectoint ENDP
END
You can place any number of procedures in the same module. Each time you
end a procedure, use ENDP. However, END should only occur once, at the end
of the module.
3.2 Instructions Used in This Chapter
The instructions below were introduced in Chapter 2, "Introducing 8086
Assembly Language." They are summarized here briefly for review. The first
group of instructions manipulates data:
Instruction Description
──────────────────────────────────────────────────────────────────────────
MOV destination, source Copies value of source to destination
ADD destination, source Adds source to destination, storing result in
destination
SUB destination, source Subtracts source from destination, storing
result in destination
INC destination Increment──adds 1 to destination
DEC destination Decrement──subtracts 1 from destination
MUL source Multiplies source by AX (if operand is 16 bits),
storing high 16 bits in DX and low 16 bits in AX
The second group of instructions controls the flow of program execution:
Instruction              Description
──────────────────────────────────────────────────────────────────────────
CMP destination, source  Compare──subtracts source from destination,
                         ignoring result but setting processor flags
                         appropriately
JE label                 Jumps to label if the Zero flag is set (after CMP,
                         destination and source were equal)
JAE label                Jumps to label if the Carry flag is clear (after
                         CMP, destination was above or equal to source,
                         treating both as unsigned)
JMP label                Jumps unconditionally to label
3.3 Decimal Conversion Example
This section uses a decimal-conversion example to illustrate the use of
some basic instructions and directives. It features an assembly module
that takes a pointer to a null-terminated string of characters as input
and returns the corresponding integer value. This chapter assumes that the
value is unsigned.
You can compute the value of a decimal string by multiplying each digit by
a power of 10:
2035 = 2 x 10 cubed + 0 x 10 squared + 3 x 10 + 5
One way to calculate the value of the number is to calculate each power of
10 separately, then multiply each digit by the corresponding power. For
example, you can calculate 10 cubed, and then multiply by 2.
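As a point of comparison, the separate-powers method can be sketched in C
roughly as follows. This is only an illustration, not part of the example
program: the name dectoint_naive is hypothetical, and the sketch assumes
that the string contains nothing but the characters 0-9.

```c
#include <stddef.h>
#include <string.h>

/* Naive decimal conversion (hypothetical helper): compute each power
   of 10 separately, then multiply each digit by its power.
   Assumes the string holds only the characters '0'-'9'. */
unsigned int dectoint_naive(const char *digits)
{
    size_t len = strlen(digits);
    unsigned int total = 0;
    size_t i, j;

    for (i = 0; i < len; i++) {
        unsigned int power = 1;

        /* Rebuild the power of 10 from scratch for every digit */
        for (j = i + 1; j < len; j++)
            power *= 10;
        total += (unsigned int)(digits[i] - '0') * power;
    }
    return total;
}
```

For the string 2035, the inner loop runs three times for the first digit,
twice for the second, and once for the third. This repeated work is what a
combined calculation avoids.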
A much more efficient algorithm combines the calculations for powers of
10. The algorithm adds each digit to a running total, then multiplies the
total by 10 after every digit but the last. The following pseudo-code
represents this algorithm, and assumes that the first character in the
string is the most significant digit:
initialize total to 0
while there's another digit
    add value of digit to total
    advance to next digit
    if no more digits
        we're done
    else
        multiply total by 10
A simple C program that calls the procedure might look like this:
#include <stdio.h>

extern unsigned int dectoint( char * );

main()
{
    char digits[81];

    gets( digits );
    printf( "Numeric value is: %u", dectoint( digits ) );
}
The procedure itself could be written in C as:
unsigned int dectoint( char *Array )
{
    unsigned int total = 0;        /* Initialize total */

    while( *Array != '\0' )        /* While there's another digit */
    {
        total += *Array - '0';     /* Add value to total */
        Array++;                   /* Advance to next digit */
        if( *Array == '\0' )       /* If no more digits, */
            break;                 /* we're done */
        total *= 10;               /* Else, multiply by 10 */
    }
    return( total );
}
This chapter shows how to write the same procedure in assembly language.
The assembly-language version will be faster because it can make strategic
use of registers and choose optimal instructions. You can write a main
module with C code, place the assembly routine in a separate module with a
.ASM extension, then link them together by creating a program list.
──────────────────────────────────────────────────────────────────────────
NOTE You can build mixed-language programs by placing both .C and .ASM
files in a program list. Place the main module first. In the Assembler
Flags dialog box, make sure that you select either Preserve Case or
Preserve Extrn (the default). From the QCL command line, use the /Cl
(preserve case) or /Cx (preserve case of external symbols) option. QC
calls the linker with case sensitivity on, so C and assembler symbols must
match exactly.
──────────────────────────────────────────────────────────────────────────
Before writing the assembly procedure, we first need to develop a strategy
for using registers.
The AX (Accumulator) register is ideal for keeping the running total. The
algorithm changes this total through both addition and multiplication. The
MUL instruction requires the use of AX. By keeping the total in AX at all
times, the procedure avoids having to constantly reload this register.
The BX register should be used to access the individual digits. The
procedure receives the address of the digit string, and then retrieves
each ASCII byte through pointer indirection. BX is one of the few
registers that supports this operation. SI and DI could also be used this
way, but C-generated code requires that SI and DI be preserved. BX can be
freely altered.
The procedure needs to allocate two more registers: one for holding the
multiplication factor (10), and another for adjusting the binary value of
the digit. The procedure uses CX and DX for these purposes. In this case,
CX and DX are interchangeable. However, we use CX for multiplication now,
because in the hex conversion example, CX will be needed for a special
kind of multiplication──shifting bits. We use DX as an intermediate
location to receive a byte and then add a word to AX.
The complete assembly-language module is shown below:
            .MODEL  small,c
            .CODE
dectoint    PROC    Array:PTR BYTE
            sub     ax,ax                ; ax = 0
            mov     bx,Array             ; bx = Array
            mov     cx,10                ; factor = CX = 10
            sub     dx,dx                ; dx = 0
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; If byte=0 we're done
top:
            mov     dl,BYTE PTR [bx]     ; Get next digit
            sub     dl,'0'               ; Convert numeral
            add     ax,dx                ; Add to total
            inc     bx                   ; Point to next byte
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; If byte=0 we're done
            mul     cx                   ; AX = AX * 10
            jmp     SHORT top            ; Goto top of loop
done:
            ret                          ; Exit procedure
dectoint    ENDP
            END
We'll examine each section of the module in turn. The first three
statements are directives that form part of the module's skeleton. The
PROC directive, when used with one or more parameters as it is here,
generates code to set the framepointer (BP) properly so that you can
access parameters.
.MODEL small,c
.CODE
dectoint PROC Array:PTR BYTE
The rest of the module consists of instructions──the actual core of the
program. The first four instructions initialize the registers AX, BX, CX,
and DX. Note that when initializing a register to 0, the procedure uses
SUB in preference to MOV. Any value subtracted from itself leaves zero in
the destination operand. Although the result is the same, the SUB
instruction is smaller and faster because it involves no immediate data.
sub ax,ax ; ax = 0
mov bx,Array ; bx = Array
mov cx,10 ; factor = CX = 10
sub dx,dx ; dx = 0
The next two instructions handle a special case──that of a string
containing no digits at all. Recall that the procedure is passed a
null-terminated string. The operand BYTE PTR [bx] is a memory operand
referring to the byte pointed to by BX. If the string is empty, Array
points to a null byte. The two instructions test for a 0 (null) value and
jump to the end of the procedure if 0 is detected:
cmp BYTE PTR [bx],0 ; Compare byte to NULL
je done ; If byte=0 we're done
In the CMP instruction above, the BYTE PTR operator is strictly required,
because otherwise the assembler would have no way of knowing whether to
compare 0 to the byte or a word pointed to by BX. However, when one of the
operands is a register (as is the case with the MOV instruction below),
the BYTE PTR operator is optional.
The next eight instructions consist of a loop executed once for every
digit character in the string. The label top indicates the top of the
loop, and the first three instructions add the value of the digit to AX:
top:
mov dl,BYTE PTR [bx] ; Get next digit
sub dl,'0' ; Convert numeral
add ax,dx ; Add to total
The first instruction above retrieves the digit. The next instruction
converts the digit's ASCII value to the numeric value by subtracting the
value of the character '0' (48 decimal). This statement works because the
ASCII character set places all digit characters in sequence from 0 to 9.
Finally, the procedure adds the resulting value to the running total
stored in AX. Note that the operands in each case are the same size. The
first two instructions above access DL, the low byte of DX.
The next three instructions advance to the next byte in the string, and
test it for equality to zero. Getting the next byte is just a matter of
adding the value 1 to BX (with the INC instruction), so that BX points to
the next byte. The other two instructions are identical to previous
instructions that tested for zero value.
inc bx ; Point to next byte
cmp BYTE PTR [bx],0 ; Compare byte to NULL
je done ; If byte=0 we're done
If the next byte is a null byte, the processor jumps to the end of the
procedure. Otherwise, the processor continues executing the bottom of the
loop, which multiplies the current total by the factor 10 stored in CX,
and then jumps back to the top:
mul cx ; AX = AX * 10
jmp SHORT top ; Goto top of loop
Notice the operator SHORT used with the jmp instruction. This optional
operator makes the encoded instruction smaller and faster, but it can be
used only if the destination of the jump is less than 128 bytes away.
SHORT is explained in more detail in Section 9.2.4.2.
The loop is now complete. The rest of the module exits the procedure and
marks the end of the procedure and of the module. The RET statement causes
the assembler to generate instructions to do the following: restore the
stack, restore the framepointer (BP), and return properly for the memory
model (small) and calling convention (C).
done:
ret ; Exit procedure
dectoint ENDP
END
Microsoft high-level languages always look for function return values in
AX, if two bytes long, or in DX and AX, if four bytes long. If the return
value is longer than four bytes, DX:AX points to the value returned. If
the return value is one byte, AL contains the value.
The C module that calls this procedure looks in AX for the return
value──as does all high-level-language code that calls a function
returning a two-byte value. In this case, AX already contains the results
of the calculation. No further action is required.
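The division of a four-byte return value between DX and AX can be modeled
in C as shown below. This is a sketch for illustration only; the function
names are hypothetical, and the fixed-width types from stdint.h stand in
for the 16-bit registers.

```c
#include <stdint.h>

/* Hypothetical helpers: model how a four-byte return value is divided
   between DX (high word) and AX (low word). */
uint16_t high_word(uint32_t value)
{
    return (uint16_t)(value >> 16);      /* the part returned in DX */
}

uint16_t low_word(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);  /* the part returned in AX */
}

/* Reassemble the value the way a caller reads DX and AX together. */
uint32_t join_words(uint16_t dx, uint16_t ax)
{
    return ((uint32_t)dx << 16) | ax;
}
```

A two-byte result such as the one dectoint returns needs only the low word,
which is why the procedure can leave its total in AX and simply return.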
3.4 Decimal Conversion with Far Data Pointers
This section uses the same basic algorithm introduced in the last section,
but presents coding techniques for different memory models.
The .MODEL directive resolves all differences in the size of code
addresses. However, when you use memory models that use far data pointers
(compact, large, and huge), you must make some additional adjustments.
The program below shows the module rewritten for large memory model. This
example works for compact model if large in the first line is replaced
with compact.
            .MODEL  large,c
            .CODE
dectoint    PROC    USES ds, Array:PTR BYTE
            sub     ax,ax                ; ax = 0
            lds     bx,Array             ; ds:bx = Array
            mov     cx,10                ; factor = CX = 10
            sub     dx,dx                ; dx = 0
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; If byte=0 we're done
top:
            mov     dl,BYTE PTR [bx]     ; Get next digit
            sub     dl,'0'               ; Convert numeral
            add     ax,dx                ; Add to total
            inc     bx                   ; Point to next byte
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; If byte=0 we're done
            mul     cx                   ; AX = AX * 10
            jmp     SHORT top            ; Goto top of loop
done:
            ret                          ; Exit procedure
dectoint    ENDP
            END
This procedure is the same as the one in the last section, except for two
lines. The PROC directive now includes a USES clause, and the LDS
instruction replaces the first MOV instruction.
The procedure loads the DS register with the segment address of Array,
thus causing subsequent data references to be relative to the new segment
address. However, procedures called from C must preserve DS. The PROC
statement, therefore, includes USES ds, which generates code to place DS
on the stack.
The LDS instruction (Load Data Segment) does the actual loading of the DS
register. This instruction is similar to the MOV instruction:
            mov     bx,Array             ; bx = Array
                                         ; 2-byte data pointer
            lds     bx,Array             ; ds:bx = Array
                                         ; 4-byte data pointer
The LDS instruction accomplishes two moves. First, it loads the offset
portion of the pointer into the specified register (BX). Second, it loads
the segment portion of the pointer into DS.
──────────────────────────────────────────────────────────────────────────
NOTE For the LDS and LES instructions to work properly, the segment
portion must be stored in the upper word of the four-byte (far) pointer. C
meets this requirement by always pushing the segment portion of the
pointer on the stack first. (The stack grows downward.) In your own
programs, you declare far pointers with the DD directive. You initialize
them by loading a segment address into the upper word of the pointer
variable and an offset address into the lower word.
──────────────────────────────────────────────────────────────────────────
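The layout that LDS and LES expect can be modeled in C as follows. This is
a sketch, not code from the example: the function names are hypothetical,
and a 32-bit integer stands in for the four-byte pointer variable.

```c
#include <stdint.h>

/* Build the four-byte value stored for a far pointer:
   segment in the upper word, offset in the lower word. */
uint32_t make_far(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 16) | offset;
}

/* The word that LDS or LES loads into the general register (BX here). */
uint16_t far_offset(uint32_t far_ptr)
{
    return (uint16_t)(far_ptr & 0xFFFFu);
}

/* The word that LDS loads into DS, or LES loads into ES. */
uint16_t far_segment(uint32_t far_ptr)
{
    return (uint16_t)(far_ptr >> 16);
}
```

If the two words were stored the other way around, the instruction would
load the segment into BX and the offset into DS, and every later memory
reference would be wrong.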
3.4.1 Writing a Model-Independent Procedure
In the case of this procedure, the use of the LDS instruction is most
convenient. Once DS is loaded with the new segment address, all subsequent
memory references are automatically correct. No further adjustments are
needed.
The simplicity of this technique makes it easy to write a module that is
completely independent of memory models. This module can then be linked
with any C program. To change the memory model, you simply change the
.MODEL directive and reassemble. In fact, the memory model itself can be
specified with a command-line definition, so that the source code never
needs to change.
The model-independent version contains only a few lines different from the
previous example:
%           .MODEL  mem,c
            .CODE
dectoint    PROC    USES ds, Array:PTR BYTE
            sub     ax,ax                ; ax = 0
            IF      @DataSize
            lds     bx,Array             ; ds:bx = Array
            ELSE
            mov     bx,Array             ; bx = Array
            ENDIF
The .MODEL directive operates on an undefined variable, mem. You define
this variable on the QCL command line or in the Assembler Flags dialog
box. For example, to assemble with QCL in compact model, enter the
following text in the defines text box:
/Dmem=compact
The IF, ELSE, and ENDIF directives cause conditional assembly. The
@DataSize predefined macro is equal to 1 (true) if the memory model uses
far data pointers, and 0 (false) otherwise. The statement IF @DataSize
begins a conditional-assembly block that assembles the LDS instruction if
the memory model uses far data pointers; it assembles the MOV instruction
otherwise.
For more information on conditional assembly, see Chapter 10, "Assembling
Conditionally."
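C programmers may find it helpful to compare conditional assembly with the
C preprocessor, which selects code in a similar way before compilation.
The sketch below is only an analogy: FAR_DATA is a hypothetical symbol
that plays the role of @DataSize, and data_pointer_size is a hypothetical
function name.

```c
/* FAR_DATA is a hypothetical symbol standing in for @DataSize.
   Define it on the compiler command line (for example, -DFAR_DATA=1);
   it defaults to 0 here, which models a near-data memory model. */
#ifndef FAR_DATA
#define FAR_DATA 0
#endif

/* Returns the size in bytes of a data pointer in the modeled memory
   model. Only one of the two return statements is ever compiled. */
unsigned int data_pointer_size(void)
{
#if FAR_DATA
    return 4;   /* segment and offset, loaded with LDS */
#else
    return 2;   /* offset only, loaded with MOV */
#endif
}
```

As with IF and ELSE in the assembly module, the branch that is not
selected contributes nothing to the object file.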
The USES clause is retained for all memory models, since even with small
model it does no harm. However, to increase efficiency, you may wish to
include the PROC statement inside conditional-assembly blocks.
3.4.2 Accessing Far Data through ES
The LDS instruction is inconvenient if you need to access items in the
default data segment, because you have no guarantee that DS still points
to that area of memory. Therefore, it's sometimes more efficient to leave
DS alone and use the ES register to access far data.
The standard C memory models all use the LES instruction to access far
data. You can also use this method, but it is not required, since it has
no effect on the interface between modules. Use the LES instruction to
load a far data pointer; it loads the ES register with the new segment
address. Then use the ES override whenever you refer to data in the far
segment. This method requires altering every instruction that accesses
the string data:
les bx,Array ; es:bx = Array
.
.
.
cmp es:BYTE PTR [bx],0 ; Compare byte to NULL
Once ES is loaded with the segment address of far data, access objects in
the default data area (the segment containing near data) as you normally
would. Use the ES override to access the far data.
3.5 Hexadecimal Conversion Example
The following example builds on the decimal example in Section 3.3,
adding the additional logic needed to convert hexadecimal rather than
decimal strings.
Hexadecimal conversion can use an algorithm similar to the one used
earlier for decimal conversion, with these adjustments made:
■ The procedure multiplies the running total by 16, not 10.
■ The procedure converts the letters A-F to numeric values, in addition
to converting the numerals 0-9.
You could make the first adjustment by loading CX with 16 instead of 10. A
much more efficient method is to use the SHL (Shift Left) instruction to
shift an object's bits left by four places. This effectively multiplies
the object by 16.
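The equivalence between shifting and multiplying can be checked with a
short C sketch. The function name times16 is hypothetical; the C operator
<< corresponds to the SHL instruction.

```c
/* Multiply a running total by 16 by shifting it left four bit
   positions (hypothetical helper for illustration). */
unsigned int times16(unsigned int total)
{
    return total << 4;
}
```

Each one-bit shift doubles the value, so four shifts multiply it by
2 x 2 x 2 x 2, or 16.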
The second adjustment requires more complex logic. Hexadecimal digits can
consist of either letters or numerals. The procedure must consider three
different cases──one for each sequence of hexadecimal characters:
Range of Characters    Conversion Required
──────────────────────────────────────────────────────────────────────────
0-9                    Convert to face value. Subtract ASCII value of '0'.
A-F, and a-f           Convert to values 10-15. Convert all letters to
                       uppercase, then subtract ASCII value of 'A' and add
                       10.
We convert all letters to uppercase in an optimized fashion by taking
advantage of the ASCII coding sequence. Uppercase letters are coded as 41H
onward. Lowercase letters are coded as 61H onward. Consequently, each
lowercase letter differs from the corresponding uppercase letter by
exactly one bit. We use the AND instruction, with the immediate operand
0DFH, to mask out this bit. This operation has the effect of setting the
third highest bit to 0.
         0110 0001   61h = 'a'         0100 0001   41h = 'A'
AND      1101 1111   DFh               1101 1111   DFh
         ======================        ======================
result   0100 0001   41h = 'A'         0100 0001   41h = 'A'

         0110 0010   62h = 'b'         0100 0010   42h = 'B'
AND      1101 1111   DFh               1101 1111   DFh
         ======================        ======================
result   0100 0010   42h = 'B'         0100 0010   42h = 'B'
The beauty of the operation is that it converts lowercase letters to
uppercase, but leaves uppercase letters alone. If the third highest bit is
already 0 (as is the case with uppercase letters), doing an AND operation
with 0DFH has no effect. This operation removes the need to handle
lowercase letters as a separate case.
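The same masking operation can be written in C with the & operator. The
sketch below is for illustration; fold_to_upper is a hypothetical name.

```c
/* Clear bit 5 (20h) of an ASCII character, as AND dl,0DFh does
   (hypothetical helper for illustration). */
unsigned char fold_to_upper(unsigned char c)
{
    return (unsigned char)(c & 0xDF);
}
```

Note that the mask is meaningful only for letters. Applied to the numeral
'1' (31h), it yields 11h, which is not a printable character. The mask
must therefore be applied only after a character is known not to be a
numeral.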
The revised algorithm does the following:
initialize total to zero
while there's another digit
    move byte to temporary location
    if ascii value < 'A'
        Subtract '0'
    else
        Convert lowercase to uppercase
        Subtract 'A'-10
    add byte value to total
    advance to next digit
    if no more digits
        we're done
    else
        shift total left by four bits
The assembly-language code below implements this algorithm. The code tests
the range of each character and performs a different conversion for each
case. Note the use of JAE (Jump If Above or Equal), which jumps to the
specified label if the previous comparison found the first operand to be
greater than or equal to the second, treating both as unsigned values.
            .MODEL  small,c
            .CODE
hextoint    PROC    Array:PTR BYTE
            sub     ax,ax                ; ax = 0
            mov     bx,Array             ; bx = Array
            mov     cl,4                 ; Prepare to shift left by 4
            sub     dx,dx                ; dx = 0
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; if byte=0 we're done
top:
            mov     dl,BYTE PTR [bx]     ; Move byte to DL
            cmp     dl,'A'               ; ASCII value >= 'A'?
            jae     isletter             ; If so, goto isletter
            sub     dl,'0'               ; Convert ascii to numeric
            jmp     addbyte              ; Go add value of byte
isletter:
            and     dl,0DFh              ; Convert to uppercase
            sub     dl,'A'-10            ; Convert ascii to numeric
addbyte:
            add     ax,dx                ; Add value to total
            inc     bx                   ; Point to next byte
            cmp     BYTE PTR [bx],0      ; Compare byte to NULL
            je      done                 ; If byte=0 we're done
            shl     ax,cl                ; AX = AX * 16
            jmp     SHORT top            ; Goto top of loop
done:
            ret
hextoint    ENDP
            END
The beginning of the procedure initializes the CL register to 4. This step
is necessary, because you can use the SHL instruction (Shift Left) in only
two ways: you can shift by exactly one bit, or you can shift by the number
of places indicated in CL. Clearly, using CL is more efficient than a
sequence of four shift instructions.
The main loop reads a character, tests it, and makes one basic decision:
is the character a letter or not? This test takes advantage of the ASCII
coding sequence. If the value of the character is equal to or greater than
'A', it cannot be one of the digits 0-9. The procedure uses the JAE
instruction (Jump If Above or Equal) to test for this condition.
top:
mov dl,BYTE PTR [bx] ; Move byte to DL
cmp dl,'A' ; ASCII value >= 'A'?
jae isletter ; If so, goto isletter
If the character is a letter, the procedure first converts the letter to
uppercase──using an AND instruction that converts lowercase letters but
leaves uppercase letters unchanged. The following instruction can then
properly handle all letters the same way, regardless of their original
case:
isletter:
and dl,0DFh ; Convert to uppercase
sub dl,'A'-10 ; Convert ascii to numeric
For simplicity, the procedure accepts invalid letters. You could easily
enhance it to verify that the letters are hexadecimal.
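One way to add that verification is sketched below in C rather than
assembly language. This is an illustration under stated assumptions, not
the procedure from this chapter: the name hextoint_checked and its
two-value interface (a success flag plus a result pointer) are
hypothetical. The sketch also shifts the total before adding each digit,
which produces the same value as shifting after every digit but the last.

```c
/* Hypothetical variant of hextoint that rejects characters that are
   not hexadecimal digits. Returns 1 and stores the value in *result
   on success; returns 0 and leaves *result unchanged on failure. */
int hextoint_checked(const char *digits, unsigned int *result)
{
    unsigned int total = 0;

    for (; *digits != '\0'; digits++) {
        unsigned char c = (unsigned char)*digits;
        unsigned int value;

        if (c >= '0' && c <= '9') {
            value = c - '0';            /* numeral: face value */
        } else {
            c &= 0xDF;                  /* fold lowercase to uppercase */
            if (c < 'A' || c > 'F')
                return 0;               /* not a hexadecimal digit */
            value = c - 'A' + 10;       /* letter: 10 through 15 */
        }
        total = (total << 4) + value;   /* make room, then add digit */
    }
    *result = total;
    return 1;
}
```

In the assembly version, the equivalent check would be two more
comparisons after the AND instruction, each followed by a conditional jump
to an error exit.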
────────────────────────────────────────────────────────────────────────────
Chapter 4: Writing Stand-Alone Assembly Programs
With QuickAssembler, you can write stand-alone assembly programs to
produce small, efficient utilities. For example, you might write a utility
in assembly language to count the number of lines or paragraphs in a file.
These programs start and end with assembly code and generally do not
involve any links to high-level languages.
Stand-alone assembly programs can yield remarkably small .EXE files. They
require relatively little space, because they do not include the start-up
code for a high-level language. And often you can make your assembly
program even smaller by converting it to a .COM file as shown in this
chapter. Some useful .COM files take up less than 100 bytes of memory.
This chapter first describes the directives you need to write stand-alone
assembly programs, reviews instructions used in the chapter's examples,
and then presents a simple stand-alone program. Next, Sections 4.4-4.6
look closely at each segment of the program: stack, data, and code.
Finally, the chapter describes how to create a program in the .COM format.
4.1 A Skeleton for Stand-Alone Programs
This chapter uses the simplified segment directives described in the
previous chapter, and introduces three more directives──.STACK, .DATA, and
.STARTUP. The simplified segment directives produce programs using the
Microsoft standard segment format.
This format is not required, since your stand-alone program need not be
compatible with a high-level-language module. However, the standard format
is convenient because you can specify a number of different memory models,
and you are freed from having to specify segment names, attributes, and
register assumptions.
──────────────────────────────────────────────────────────────────────────
NOTE Occasionally, you may need a customized segment structure. Linking
assembly code to a non-Microsoft language is the most common situation
that requires customized segments. QuickAssembler lets you use full
segment definitions any time you need to customize segments. However, you
should find that simplified segment directives support the vast majority
of assembly-language programming you do──even when you write .COM files.
──────────────────────────────────────────────────────────────────────────
The skeleton for the programs in this chapter includes a stack, data, and
code segment. Note that one of the directives, .MODEL, will change when
you alter the memory model. The other statements remain the same.
            .MODEL  small       ; Use small memory model
            .STACK  100h        ; Declare 256-byte stack
            .DATA
;
; (place data declarations here)
;
            .CODE
            .STARTUP            ; Set up DS, SS, and SP registers
;
; (place executable code here)
;
            END
Sections 4.1.1-4.1.3 examine each of the statements in this skeleton more
closely.
4.1.1 The .MODEL Directive
The .MODEL directive performs the same role that it did in the previous
chapter; it defines the overall attributes of the module. Note, however,
that with a stand-alone program, a language type is not always required. A
language type is useful when a module contains one or more procedures.
Otherwise, you need only type .MODEL followed by a memory model:
.MODEL small ; Use small memory model
The memory model can be TINY, SMALL, MEDIUM, COMPACT, LARGE, or HUGE. Most
of these memory models may be familiar to you if you have used QuickC. For
a complete description of each memory model, see Section 5.1.1.
The TINY memory model is new; it alone results in the creation of a .COM
file rather than a .EXE file. Section 4.8, "Creating .COM Files," gives a
complete example featuring the use of tiny memory model.
Generally, to change memory model you change the .MODEL directive. You
also change the way you load and use data pointers, as described in
Chapter 3, "Writing Assembly Modules for C Programs." With these changes
made, many programs can readily be reassembled for a new memory model.
(However, as you'll see in Chapter 5, "Defining Segment Structure," you
cannot use .FARDATA segments in tiny, small, or medium model, and this may
require further revision of code in some cases.)
4.1.2 The .STACK, .CODE, and .DATA Directives
Each of the segment directives──.STACK, .CODE, and .DATA──declares the
beginning of a segment.
The code and data segments begin with .CODE and .DATA, respectively. Each
of these segments continues to the next segment directive or the end of
the program. The data segment contains data and symbolic constant
declarations. The code segment contains instructions.
However, the stack segment consists of only one line:
.STACK [[size]]
QuickAssembler interprets size according to the current radix, which is
decimal by default. You can specify a hexadecimal constant by using the H
suffix. (Example: 200h.) The size argument is optional. If you leave it
out, the assembler creates a stack 1024 bytes long.
Unless the program is written in tiny memory model, you should always
declare a stack segment in your main module. Section 4.4, "Inside the
Stack Segment," explains the purpose of this segment.
4.1.3 The .STARTUP Directive
Unlike C programs, assembly-language programs have to initialize register
values. Specifically, the program has to initialize DS, the Data Segment
register; CS and IP, which point to the first instruction to execute; and
SS and SP, the stack registers.
By far the easiest way to initialize all these registers is to just
include .STARTUP, a simple directive that takes no arguments:
.STARTUP ; Set up DS, SS, and SP registers
When you use this directive, the assembler generates code to initialize
your registers the way Microsoft high-level languages do. The generated
code is similar to some of the instructions in the C start-up code. The
directive takes care of minimal start-up, but many programs will need to
do additional start-up tasks, such as releasing unused memory.
──────────────────────────────────────────────────────────────────────────
NOTE The start-up sequence adjusts SS and SP so that SS is equal to DS.
This starting condition gives you some advantages. If you later have to
alter the value of DS, you can always access a data object as an indirect
operand using BP, or through an SS segment override. To avoid this
starting sequence, so that the stack and data are separate physical
segments, use the farStack keyword with the .MODEL directive, as described
in Section 5.1.3.
──────────────────────────────────────────────────────────────────────────
4.2 Instructions Used in This Chapter
This section summarizes the instructions used in this chapter. Because the
program examples are simple, only a very few of the 80-odd instructions of
the 8086 are featured here.
This chapter features four instructions:
Instruction                 Description
──────────────────────────────────────────────────────────────────────────
MOV destination, source     Moves source to destination
INT number                  Generates the indicated interrupt signal,
                            causing processor to call a memory-resident
                            interrupt routine
DEC destination             Decrement──subtracts 1 from destination
JNZ label                   Jump If Not Zero──jumps to label if result of
                            last operation was not zero
Most of the instructions above were introduced in Chapter 2, "Introducing
8086 Assembly Language." The new instruction is INT.
The INT instruction generates a software interrupt signal, causing the
processor to call an interrupt service routine usually residing in a DOS
or ROM-BIOS memory area. This call is much like a procedure call; the
processor executes a specific function and returns to the program when the
routine is complete.
There are two major differences between an interrupt call and a procedure
call. First, instead of calling a procedure you have written, an INT
instruction calls a DOS system routine or ROM-BIOS service. These
low-level routines carry out a variety of basic operations, such as
reading the keyboard, writing to the screen, or using the file system.
Most DOS services are accessed through interrupt 21H (33 decimal).
The second major difference is syntactic. You follow the INT keyword by an
interrupt number (in the range 0 to 255), rather than a procedure name. In
many cases, you further specify the interrupt routine by loading AH with a
function number.
4.3 A Program That Says Hello
The following sample program prints the message "Hello, world." and then
exits back to DOS with an exit code of 0. You can use this program as a
template and insert your own code and data.
            .MODEL  small                    ; Use small model
            .STACK  100h                     ; Allocate 256-byte stack
            .DATA
message     DB      "Hello, world.",13,10    ; Message to print
lmessage    EQU     $ - message              ; Determine length of message
            .CODE
            .STARTUP                         ; Use standard startup code
            mov     bx,1                     ; Load 1 - file handle
                                             ; for standard output
            mov     cx,lmessage              ; Load length of message
            mov     dx,OFFSET message        ; Load address of message
            mov     ah,40h                   ; Load no. of DOS Write function
            int     21h                      ; Call interrupt 21H (DOS)
            mov     ax,4c00h                 ; Load no. of DOS Exit function
                                             ; in AH, and 0 exit code in AL
            int     21h                      ; Call interrupt 21H (DOS)
            END
The first statement determines the memory model of the program:
.MODEL small ; Use small model
This statement specifies small memory model, which places code and data in
two separate segments, each of which cannot exceed 64K.
The next few sections consider the rest of this program──stack, data, and
code.
4.4 Inside the Stack Segment
The stack segment is the easiest to create, because with simplified
segment directives you enter only one statement:
.STACK 100h ; Allocate 256-byte stack
Each procedure or interrupt call uses up stack space. The stack stores
return addresses, parameters, and local variables for each procedure
called. When a procedure or interrupt routine returns, the stack space it
used is restored. The more procedure calls your program makes without
returning, the more stack area it requires. Programs that nest many
procedures or use recursion (in which a procedure calls itself repeatedly)
may require large stacks. Unfortunately, there is no formula for
determining how large a stack is needed.
A 256-byte stack (100 hexadecimal) is adequate for most small programs.
For this sample program, which makes two interrupt calls but no procedure
calls, 256 bytes provides an ample margin for error.
You can also create a stack by using full segment definitions. See Section
5.2, "Full Segment Definitions," for more information.
4.5 Inside the Data Segment
A single keyword declares the beginning of the data segment:
.DATA
QuickAssembler considers all statements following this line to lie in the
data segment, up until the next segment declaration or END directive. The
END directive marks the end of the source file.
The next two statements are directives that declare a string of characters
and a symbolic constant:
message DB "Hello, world.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
The first statement above declares a series of bytes. The label message is
a symbolic name that QuickAssembler associates with the string's starting
address.
The assembler allocates 15 bytes in the data segment, and initializes
these bytes to the ASCII values for H, e, l, l, o, and so forth. The
values 13 and 10 indicate a carriage return and line feed, respectively,
causing the program to move the cursor to the beginning of the next line
when it prints the string.
The second directive in the data segment declares a symbolic constant
equal to the length of the string:
lmessage EQU $ - message ; Determine length of message
Again, the item in the first column, lmessage, is the label of the
statement. The EQU directive equates the label with the value of the
operand itself. EQU does not allocate memory.
The operand field contains $ - message, which in this case equals 15. We
could just as easily have entered 15 in the operand field. However, the
item $ - message is guaranteed to be equal to the length of the string,
even if you later rewrite the initial string value.
The dollar sign ($) is the "location counter," which represents the
current address of the statement. QuickAssembler translates the full
expression as "Take the current address ($) and subtract the address of
message." The current address is one byte after the end of the string.
Thus, $ - message is automatically equal to the length of the string.
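C offers a comparable way to let the translator compute a length. The
sketch below is an analogy only; the names message_text, LMESSAGE, and
message_length are hypothetical, and the sizeof operator stands in for the
expression $ - message.

```c
#include <stddef.h>

/* The string and its trailing carriage return and line feed,
   corresponding to the DB directive. */
static const char message_text[] = "Hello, world.\r\n";

/* sizeof counts the terminating null byte that C appends to a string
   literal, so subtract 1 to match the value of $ - message. */
enum { LMESSAGE = sizeof message_text - 1 };

/* Returns the length computed at compile time (hypothetical helper). */
size_t message_length(void)
{
    return LMESSAGE;
}
```

As with EQU, the constant occupies no memory of its own, and it stays
correct if you later change the text of the string.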
4.6 Inside the Code Segment
A single keyword declares the beginning of the code segment:
.CODE
The code segment consists of all statements between .CODE and the END
statement, which marks the end of the source code. In this example, all
the statements in the code segment, aside from .STARTUP, are instructions.
The program has three basic tasks. Each instruction helps carry out one of
these operations:
1. Initialize registers
2. Call a DOS function to print the message
3. Call a DOS function to exit the program gracefully
The .STARTUP directive initializes registers. If you write a main module
without this directive, you must explicitly initialize DS, CS, and IP.
Furthermore, if you want SS to equal DS (which gives some programming
advantages), you must adjust both SS and SP.
To see how to initialize registers without the use of .STARTUP, see
Chapter 5, "Defining Segment Structure."
After registers are initialized, a series of five instructions makes the
call to DOS that prints the message:
mov bx,1 ; Load 1 - file handle for
; standard output
mov cx,lmessage ; Load length of message
mov dx,OFFSET message ; Load address of message
mov ah,40h ; Load no. of DOS Write function
int 21h ; Call interrupt 21H (DOS)
The first four instructions prepare for the DOS call. Interrupt calls
generally use registers to receive parameters. Unlike procedure calls,
they do not reference the stack for this information. The DOS Write
function uses the following registers to receive data:
Register    Data
──────────────────────────────────────────────────────────────────────────
AH          Selects the DOS function. 40H is the Write function.
BX          File handle to which to write. The number 1 is a
            reserved file handle that always corresponds to
            standard output. "Standard output" is normally
            synonymous with the computer screen, unless you
            redirect program output. If you were writing to a
            file, you would first open the file and use the file
            handle returned by the DOS open-file function.
CX          Length of the message. The second statement in the
            data segment determined this length.
DS:DX       The beginning address of the actual message text.
            Remember that DS was loaded earlier with the address
            of the data segment, so it does not need to be
            reloaded now.
This procedure uses the OFFSET operator to load DX with the address of the
message. Although variables are translated to addresses, the processor
normally interprets a variable address as a memory operand──that is, the
processor operates on the data at the address, not the address itself.
The OFFSET operator extracts the offset portion of the address and turns
it into an immediate operand. If the OFFSET operator were not used, the
DOS routine would not receive the address of message, but would instead
receive the data stored at that address. The OFFSET operator is similar to the
address operator (&) in C. Use it whenever you need to pass an address
rather than a value.
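The difference can be sketched with instructions that name the same
variable. The register choices are only illustrative, and the LEA
instruction on the last line is an alternative way to obtain an offset
that the sample program does not use.
          mov     dx,OFFSET message   ; DX = offset of message (immediate)
          mov     al,message          ; AL = byte stored at message ('H')
          lea     dx,message          ; DX = offset, computed at run time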
After the interrupt service returns, the AX register contains the number
of bytes written. The programs in this chapter do not use this return
value, but a more sophisticated program might. In particular, if AX
(number of bytes written) is less than CX (number of bytes requested to be
written), then an error has occurred.
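A hedged sketch of such a check follows. It assumes the DOS convention
that function 40H sets the carry flag on failure and returns an error code
in AX. The labels failed and partial are hypothetical; the program would
have to define them.
          mov     ah,40h              ; Load no. of DOS Write function
          int     21h                 ; Call interrupt 21H (DOS)
          jc      failed              ; Carry set: AX holds a DOS error code
          cmp     ax,lmessage         ; Bytes written versus bytes requested
          jb      partial             ; Fewer written (for example, disk full)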
Each DOS function has its own conventions for receiving data in different
registers. Consult the Microsoft MS-DOS Programmer's Reference for a
complete description of each function. The Assembler Contents selection
from the Help menu also describes the major DOS functions.
──────────────────────────────────────────────────────────────────────────
NOTE Each DOS function has conventions for getting and returning values
in registers and flags. Bear in mind that values placed in any of these
registers may change. If you need to preserve register values before
making a DOS call, use the PUSH and POP instructions. See Section 13.4.1,
"Pushing and Popping," for more information on how to preserve register
values.
──────────────────────────────────────────────────────────────────────────
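For example, the fragment below is a minimal sketch that assumes the
program holds a value in AX that it still needs after the Write call.
Because function 40H returns the byte count in AX, the value is saved on
the stack first. Note that restoring AX discards the byte count.
          push    ax                  ; Save value that DOS would overwrite
          mov     ah,40h              ; Load no. of DOS Write function
          int     21h                 ; Call interrupt 21H (DOS)
          pop     ax                  ; Restore saved value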
The INT instruction makes the actual call to DOS. The interrupt number for
the majority of DOS functions is 21H. You use different interrupt numbers
to call ROM-BIOS services.
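As an illustration only, the following sketch calls a ROM-BIOS video
service through interrupt 10H instead of DOS. Function 0EH (write
character in teletype mode) is a standard BIOS service, but it is not
described in this chapter; consult a ROM-BIOS reference before relying on
it.
          mov     ah,0Eh              ; BIOS video function: teletype output
          mov     al,'A'              ; Character to display
          mov     bh,0                ; Display page 0
          int     10h                 ; Call interrupt 10H (ROM-BIOS video)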
The final two instructions cause the program to terminate operation and
return control to DOS. High-level language programmers can ignore the need
to exit a program explicitly, if they like. But when you write a
stand-alone assembly program, you don't have this luxury. The program must
exit explicitly. Otherwise, the processor continues to execute random
instructions after the end of the program, making the system appear to
crash.
The DOS Exit function (service 4CH) is the preferred method for exiting
back to DOS. This function uses two register values:
Register    Data
──────────────────────────────────────────────────────────────────────────
AH          Selects the DOS function. 4CH is the Exit function.
AL          Exit code. Batch files can use this exit code as an
            "errorlevel" indicator. An exit code of 0 usually
            indicates no error.
A single instruction loads both registers:
mov ax,4c00h ; Load no. of DOS Exit function
; in AH, and 0 exit code in AL
A single MOV instruction actually moves data into two registers──AH and
AL. AH is loaded with 4CH, the function number for the DOS exit function,
and AL is loaded with 0, an exit code indicating no error.
Finally, another INT instruction calls DOS.
int 21h ; Call interrupt 21H (DOS)
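To report a failure instead, load a nonzero value into AL. The sketch
below uses an exit code of 1, an arbitrary choice for illustration; a
batch file could then test it with IF ERRORLEVEL 1.
          mov     ax,4c01h            ; 4CH (Exit) in AH, exit code 1 in AL
          int     21h                 ; Call interrupt 21H (DOS)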
4.7 Making the Program Repeat Itself
Once you understand the template for writing stand-alone programs, you can
alter the sample program given above and generate your own code. This
section alters the sample program so that it prints out a different
message, and prints it ten times.
The new sample program is listed below:
.MODEL small ; Use small model
.STACK 100h ; Allocate 256-byte stack
.DATA
message DB "Hello, ten times.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
count DW 10
.CODE
.STARTUP ; Use standard startup code
mov bx,1 ; Load 1 - file handle for
; standard output
mov cx,lmessage ; Load length of message
mov dx,OFFSET message ; Load address of message
printit: mov ah,40h ; Load no. of DOS Write function
int 21h ; Call interrupt 21H (DOS)
dec count ; count = count-1
jnz printit ; if count > 0, print again
mov ax,4c00h ; Load DOS 4C function number
; in AH, and 0 exit code in AL
int 21h ; Call interrupt 21H (DOS)
END
Note the following changes:
■ The string data is different.
■ The data segment includes a new variable, count.
■ One of the instructions is now labeled printit.
■ Two additional instructions decrement count, then loop back to the
label printit if count is greater than zero.
The string data is longer than before, and QuickAssembler must allocate
more bytes than in the previous version of the program. However, the EQU
statement that follows guarantees that the assembler still calculates
string length correctly:
message DB "Hello, ten times.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
The new variable is actually a memory location of word size (two bytes).
QuickAssembler allocates another two bytes in the data segment, and
initializes these bytes:
count DW 10
The label count becomes associated with the address of the data, and the
number 10 is the initial value placed at this memory location. However,
the value can change.
The instruction mov ah,40h now has a label, because the program needs to
return here to repeat the print operation. Not all instructions need a
label──only those that the program may need to jump to directly.
The two new instructions cause the program to repeat the print operation
ten times:
dec count ; count = count-1
jnz printit ; if count > 0, print again
The DEC instruction subtracts 1 from the memory location count, and sets
processor flags according to the result of the operation. JNZ then jumps
to the specified label if the result was not zero. The combined effect of
these two instructions is to repeat the previous instructions (from
printit onward) ten times. To change the number of repetitions, initialize
count with a different value.
Note that the DOS Write function returns a value in the register
AX──specifically, the number of bytes written. Because this return value
overwrites AH, the program jumps back to printit (rather than to the INT
instruction) so that AH is reloaded with 40H before each call to DOS.
You can optimize this program further by using a register instead of the
memory location count. For example, to use the register SI as the counter,
follow these steps:
■ Remove the declaration of count.
■ Initialize SI to 10 at the beginning of the program with the
instruction mov si,10.
■ Decrement SI instead of count near the bottom of the loop.
With this program, it's safe to use SI as the counter, since SI is not
needed for any other purpose. However, some programs make special use of
SI. In these cases, it may be more efficient to place the count in a
variable.
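Following those steps, the revised program might look like the sketch
below. It is an untested rewrite of the listing above, with SI taking the
place of the count variable, and it relies on DOS leaving SI unchanged
across the interrupt call.
          .MODEL  small               ; Use small model
          .STACK  100h                ; Allocate 256-byte stack
          .DATA
message   DB      "Hello, ten times.",13,10  ; Message to print
lmessage  EQU     $ - message         ; Determine length of message
          .CODE
          .STARTUP                    ; Use standard startup code
          mov     si,10               ; SI = number of repetitions
          mov     bx,1                ; Load 1 - file handle for
                                      ;  standard output
          mov     cx,lmessage         ; Load length of message
          mov     dx,OFFSET message   ; Load address of message
printit:  mov     ah,40h              ; Load no. of DOS Write function
          int     21h                 ; Call interrupt 21H (DOS)
          dec     si                  ; SI = SI-1
          jnz     printit             ; if SI > 0, print again
          mov     ax,4c00h            ; Load DOS 4C function number
                                      ;  in AH, and 0 exit code in AL
          int     21h                 ; Call interrupt 21H (DOS)
          END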
4.8 Creating .COM Files
You can use QuickAssembler to produce .COM files as well as .EXE files.
(However, these programs cannot contain any C modules.) Most of the memory
models, ranging from small to large, produce a .EXE file. The tiny memory
model is special because it alone supports creation of a .COM file.
──────────────────────────────────────────────────────────────────────────
NOTE To produce a .COM file, you must not only use tiny memory model, but
also select Generate COM File from the Linker Flags dialog box (choose
Make from the Options menu), or else give the /TINY linker option on the
QCL command line.
──────────────────────────────────────────────────────────────────────────
Each .COM file has only one physical segment and is limited in size to a
total of 64K. A .COM file has no executable-file header or
relocation-table entries. Because DOS doesn't have to examine a file
header or adjust relocatable segment addresses, it loads the .COM file
slightly faster.
DOS initializes all segment registers (including DS) to point to the first
available memory address. The Stack Pointer, SP, is set to 64K above the
start of the program. Unlike .EXE files, .COM files have no definite stack
area. Instead, the stack starts at offset address FFFE hexadecimal and
continues to grow downward until it overlaps code and data areas. At that
point, program failure is likely.
Simplified segment directives in QuickAssembler now provide direct support
for .COM files. The template is, in fact, smaller than the template for a
.EXE file.
The code below shows the example in Section 4.3, "A Program That Says
Hello," revised to produce a .COM file:
.MODEL tiny ; Produce a .COM file
.DATA
message DB "Hello, world.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
.CODE
.STARTUP
mov bx,1 ; Load 1 - file handle for
; standard output
mov cx,lmessage ; Load length of message
mov dx,OFFSET message ; Load address of message
mov ah,40h ; Load no. of DOS Write function
int 21h ; Call interrupt 21H (DOS)
mov ax,4c00h ; Load no. of DOS Exit function
; in AH, and 0 exit code in AL
int 21h ; Call interrupt 21H (DOS)
END
A tiny-model program could be produced by simply taking the small-model
version from earlier in the chapter, and changing the first line to the
following:
.MODEL tiny
The code would then run correctly. However, the sample code in this
section takes advantage of tiny model by eliminating the stack segment.
DOS initializes the SS (Stack Segment) register and SP (Stack Pointer)
register for you, so you need not declare a stack. The assembler ignores
stack segments in tiny model.
The program still includes the .STARTUP directive. With tiny model, all
this directive does is generate the statement ORG 100h.
──────────────────────────────────────────────────────────────────────────
NOTE The statement ORG 100h is necessary for programs in the .COM
format, and must appear just before the first line of executable code. ORG
100h starts the location counter at 100 hexadecimal, reflecting the way
that DOS loads .COM files into memory. (DOS reserves the first 256 bytes
for the Program Segment Prefix (PSP).) See Section 6.6, "Setting the
Location Counter," for more information on the ORG directive.
──────────────────────────────────────────────────────────────────────────
With tiny-model programs, QuickAssembler lets you define separate code and
data segments, but combines these segments into a single physical segment,
called a "group." QuickAssembler places the code segment first regardless
of how you write your source code. The resulting .COM file assumes a
single segment address for the whole program (as required by the structure
of a .COM file), and execution automatically begins at the proper address.
Finally, QuickAssembler directs the linker to output a file in the .COM
format rather than the .EXE format.
──────────────────────────────────────────────────────────────────────────
NOTE "Groups" are a standard concept in 8086 assembly language. You can
place a series of segments into a group. The total size must not exceed
64K. The linker responds by combining all the segments into a single
physical segment in which all addresses share the same segment address.
For a fuller explanation of groups and segments, see Chapter 5.
──────────────────────────────────────────────────────────────────────────
When you write .COM files, you must observe one principal restriction: you
cannot refer to program-defined segment addresses. This includes the
predefined segment equates, such as @data and @code.
Because .COM files lack relocation-table entries, DOS cannot adjust
segment addresses at load time. The program must use absolute segment
addresses or else assume the loading segment address that DOS assigns.
Therefore, memory references can be of three kinds:
1. Any memory location within the 64K program area. For these memory
references, you do not load a new value into any of the segment
registers.
2. Hard-coded locations in memory that have special meaning at the system
or hardware level. A video-page address, such as B800:0000, is such a
special segment address.
3. An address returned to you by a DOS or ROM-BIOS function. For example,
DOS function 48H, Allocate Memory, returns a pointer to a block of
dynamically allocated memory.
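The second kind of reference can be sketched as follows. The fragment
assumes a color text-mode display whose video memory begins at segment
B800 hexadecimal; the character and attribute values are arbitrary.
Because the segment value is a constant, no relocation is required.
          mov     ax,0B800h           ; Hard-coded video segment address
          mov     es,ax               ; ES cannot take an immediate value
          mov     BYTE PTR es:[0],'A' ; Character at top-left of screen
          mov     BYTE PTR es:[1],07h ; Attribute: white on black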
4.9 Creating .COM Files with Full Segment Definitions
You don't generally need to use full segment definitions to create .COM
files. However, when you do use these directives with programs written in
.COM format, you need to follow certain rules. The assembler automatically
follows most of these rules when you use simplified segment directives.
The guidelines for .COM format are listed below:
■ Place the entire program into one physical segment. It's possible to
divide your program into separate logical segments, then group them
into one physical segment with the GROUP directive. Simplified segment
directives, in fact, use this technique with tiny model.
However, you must ensure that code, not data, appears at the beginning
of the .COM file. A number of different factors affect segment
ordering, so it may be hard to ensure that the code segment appears
first. Thus, creating just one segment is the more reliable method.
In contrast, when you use simplified segment directives with tiny
model, the assembler always places the code segment at the beginning of
the .COM file.
■ Use the ASSUME directive to inform the assembler that all segment
registers will point to the beginning of the segment. At load time, DOS
sets all segment registers to this address. The ASSUME directive
informs the assembler of this fact so that it can correctly calculate
offset addresses. This directive is not necessary when you use
simplified segment directives.
■ Use the ORG directive to set the location counter. At load time, DOS
sets the starting address to 100H. The first 100H bytes are reserved
for the Program Segment Prefix (PSP). The statement ORG 100h is
necessary for the assembler to assign addresses in a way consistent
with run-time conditions. Otherwise, jump instructions and data
references will be wrong.
When you use simplified segment directives with tiny model, the
assembler automatically sets the location counter to 100H.
■ Give the END statement one argument: a starting address. This
argument is not necessary if you use the .STARTUP simplified segment
directive, because the program automatically begins execution wherever
you place .STARTUP.
The modified procedure is shown below:
_TEXT SEGMENT 'CODE' ; Define code segment
ASSUME cs:_TEXT,ds:_TEXT,ss:_TEXT
ORG 100h
start: jmp begin
message DB "Hello, world.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
begin: mov bx,1 ; Load 1 - file handle
; for standard output
mov cx,lmessage ; Load length of message
mov dx,OFFSET message ; Load address of message
mov ah,40h ; Load no. of DOS Write function
int 21h ; Call interrupt 21H (DOS)
mov ax,4c00h ; Load no. of DOS Exit function
; in AH, and 0 exit code in AL
int 21h ; Call interrupt 21H (DOS)
_TEXT ENDS
END start
The first three statements are new. The SEGMENT statement defines the
beginning of a segment named _TEXT. (Instead of using the name _TEXT, you
can choose any other valid symbolic name.) The ASSUME statement then
informs the assembler that the CS, DS, and SS segment registers will all
point to the beginning of this segment at run time. Finally, the ORG
statement informs the assembler that the instruction pointer will be set
to 100H.
_TEXT SEGMENT 'CODE' ; Define code segment
ASSUME cs:_TEXT,ds:_TEXT,ss:_TEXT
ORG 100h
The body of the procedure now includes code and data together in the same
segment. The first item in the segment must be an instruction, because
.COM files always begin execution at the start of the file. Attempting to
execute data would almost certainly cause program failure. Since there is
no separate data segment, the first instruction jumps around the data
declarations.
start: jmp begin
message DB "Hello, world.",13,10 ; Message to print
lmessage EQU $ - message ; Determine length of message
begin: mov bx,1 ; Load 1 - file handle for
; standard output
Another way to write a program for .COM format is to place data
declarations after the end of the instructions. However, the assembler
often produces better results if you place data declarations early in the
source file. That way, you avoid forward references to data.
The source file ends by giving an argument to the END statement. This
statement is necessary because the program does not use the .STARTUP
directive. The argument to END must be the label of the first instruction
executed:
END start
────────────────────────────────────────────────────────────────────────────
PART 2: Using Directives
Part 2 of the Programmer's Guide (comprising Chapters 5-12) describes
the directives and operators recognized by the Microsoft QuickAssembler.
Directives are nonexecutable statements that give general information to
the assembler. Some of the more important directives declare program
structure, define data, and create macros. Operators indicate calculations
to be performed at assembly time.
Chapters 5-8 present the basic directives you need to write a program,
including segment, data, multimodule, and structure directives. Chapter
9 deals specifically with operators. Chapter 10 describes conditional
assembly, and Chapter 11 presents macros, a technique for replacing a
series of frequently used instructions with a single statement. The
directives that control your output are covered in Chapter 12.
────────────────────────────────────────────────────────────────────────────
Chapter 5: Defining Segment Structure
A segment is an area in memory up to 64K in size, in which all locations
share the same segment address. Assembly-language modules for the 8086 use
segments for two reasons:
■ Segments provide a convenient means for dividing a program into its
major divisions──code, data, constant data, and stack.
■ The architecture of the 8086 requires some use of segments. Every
reference to memory must be relative to one of the four segment
registers, as described in Section 2.7, "Segmented Addressing and
Segment Registers." Segment definitions make it possible for
QuickAssembler to assume the use of the same segment register for a
large number of different addresses.
You can define segments by using simplified segment directives or full
segment definitions.
In most cases, simplified segment directives are a better choice. They are
easier to use and more consistent, yet you seldom sacrifice any
functionality by using them. Simplified segment directives automatically
define the segment structure required when combining assembler modules
with modules prepared with Microsoft high-level languages.
Although more difficult to use, full segment definitions give more
complete control over segments. A few complex programs may require full
segment definitions in order to get unusual segment orders and types.
This chapter describes both methods. If you choose to use simplified
segment directives, you will probably not need to read about full segment
definitions.
5.1 Simplified Segment Directives
Simplified segment directives provide an easy way to write
assembly-language programs. They handle some of the difficult aspects of
segment definition automatically, and assume the same conventions adopted
by Microsoft high-level languages.
When you write stand-alone assembler programs, the simplified segment
directives make programming easier. The Microsoft conventions are flexible
enough to work for most kinds of programs.
When you write assembler routines to be linked with Microsoft high-level
languages, the simplified segment directives ensure against mistakes that
would make your modules incompatible. The names are automatically defined
consistently and correctly.
The simplified segment directives automatically generate the same ASSUME
and GROUP statements used by Microsoft high-level languages. You can learn
more about the ASSUME and GROUP directives in Sections 5.3 and 5.4.
However, for most programs you do not need to understand these directives.
Simply use the simplified segment directives in the format shown in the
examples.
5.1.1 Understanding Memory Models
To use simplified segment directives, you must declare a memory model for
your program. The memory model specifies the default size of data and code
used in a program.
Microsoft high-level languages require that each program have a default
size (or memory model). Any assembly-language routine called from a
high-level language program should have the same memory model as the
calling program. The C compiler provided with QuickAssembler supports all
models except tiny. If you use assembly modules with a different compiler,
the compiler documentation should tell what memory models are supported.
The most commonly used memory models are described below:
Model       Description
──────────────────────────────────────────────────────────────────────────
Tiny        All data and code fit in a single physical segment
            (group). Tiny-model programs can be converted to
            .COM-file format with the Generate COM File option in
            the Linker Flags dialog box (or the linker /TINY
            option used with QCL). Tiny-model programs have
            restrictions described in Chapter 4, "Writing
            Stand-Alone Assembly Programs."
Small       All data fits within a single 64K segment, and all
            code fits within a 64K segment. Therefore, all code
            and data can be accessed as near. This is the most
            common model for stand-alone assembler programs. C is
            the only Microsoft language that supports this model.
Medium      All data fits within a single 64K segment, but code
            may be greater than 64K. Therefore, data is near, but
            code is far. Most recent versions of Microsoft
            high-level languages support this model.
Compact     All code fits within a single 64K segment, but the
            total amount of data may be greater than 64K (although
            no array can be larger than 64K). Therefore, code is
            near, but data is far. C is the only Microsoft
            high-level language that supports this model.
Large       Both code and data may be greater than 64K (although
            no array can be larger than 64K). Therefore, both code
            and data are far. All Microsoft high-level languages
            support this model.
Huge        Both code and data may be greater than 64K. In
            addition, any individual data array can be larger than
            64K. From the standpoint of QuickAssembler, this
            memory model is almost equivalent to large model (the
            only exception is the meaning of the predefined equate
            @DataSize). If you want to support arrays larger than
            64K, you must provide the program logic to support
            these arrays.
Stand-alone assembler programs can have any model. Tiny and small model
are adequate for most programs written entirely in assembly language.
Since near data or code can be accessed more quickly, the smallest memory
model that can accommodate your code and data is usually the most
efficient.
Mixed-model programs use the default size for most code and data but
override the default for particular data items. Stand-alone assembler
programs can be written as mixed-model programs by making specific
procedures or variables near or far. Some Microsoft high-level languages
have NEAR, FAR, and HUGE keywords that enable you to override the default
size of individual data or code items.
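As a rough sketch of a mixed-model module, the fragment below overrides
the small-model default for one procedure. The names farhelper and
nearhelper are hypothetical. Because farhelper is declared FAR, the
assembler encodes its RET as a far return, and callers must reach it with
a far call.
          .MODEL  small
          .CODE
farhelper PROC    FAR                 ; Override the default (near) size
          ret                         ; Assembled as a far return
farhelper ENDP
nearhelper PROC                       ; Default size: near in small model
          ret                         ; Assembled as a near return
nearhelper ENDP
          END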
5.1.2 Specifying DOS Segment Order
The DOSSEG directive specifies that segments be ordered according to the
DOS segment-order convention. This is the convention used by Microsoft
high-level-language compilers.
Syntax
DOSSEG
Using the DOSSEG directive enables you to maintain a consistent, logical
segment order without actually defining segments in that order in your
source file. Without this directive, the final segment order of the
executable file depends on a variety of factors, such as segment order,
class name, and order of linking. These factors are described in Section
5.2, "Full Segment Definitions."
Since segment order is not crucial to the proper functioning of most
stand-alone assembler programs, you can simply use the DOSSEG directive
and ignore the whole issue of segment order.
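For instance, in the sketch below the source file declares the code
segment first, then the stack, then the data. With DOSSEG in effect, the
executable file should still follow the DOS segment-order convention,
with the stack placed last. The variable name value is hypothetical, and
the program does nothing except exit.
          DOSSEG                      ; Request DOS segment order
          .MODEL  small
          .CODE                       ; Code declared first in the source
          .STARTUP
          mov     ax,4c00h            ; Load no. of DOS Exit function
          int     21h                 ; Call interrupt 21H (DOS)
          .STACK  100h                ; Stack declared before data
          .DATA
value     DW      0                   ; Placeholder data item
          END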
──────────────────────────────────────────────────────────────────────────
NOTE Using the DOSSEG directive (or the /DOSSEG linker option) has two
side effects. The linker generates symbols called _end and _edata. You
should not use these names in programs that contain the DOSSEG directive.
Also, the linker increases the offset of the first byte of the code
segment by 16 bytes in small and compact models. This is to give proper
alignment to executable files created with Microsoft compilers.
──────────────────────────────────────────────────────────────────────────
If you want to use the DOS segment-order convention in stand-alone
assembler programs, you should use the DOSSEG directive in the main module.
Modules called from the main module need not use the DOSSEG directive.
You do not need to use the DOSSEG directive for modules called from
Microsoft high-level languages, since the compiler already defines DOS
segment order.
Under the DOS segment-order convention, segments have the following order:
1. All segments having the class name 'CODE'
2. Any segments that do not have class name 'CODE' and are not part of the
group DGROUP
3. Segments that are part of DGROUP, in the following order:
a. Any segments of class BEGDATA (this class name is reserved for
Microsoft use)
b. Any segments not of class BEGDATA, BSS, or STACK
c. Segments of class BSS
d. Segments of class STACK
Using the DOSSEG directive has the same effect as using the /DOSSEG linker
option.
The directive works by writing to the comment record of the object file.
The Intel(R) title for this record is COMENT. If the linker detects a
certain sequence of bytes in this record, it automatically puts segments
in the DOS order.
5.1.3 Defining Basic Attributes of the Module
The .MODEL directive defines attributes that affect the entire module:
memory model, default calling and naming conventions, and stack type. This
directive should appear before any other simplified segment directive.
Syntax
.MODEL memorymodel[[,language[[,stacktype]]]]
Each of the three fields defines a basic attribute. The memorymodel field
defines the segment structure of the module. The language field defines
the default calling and naming conventions assumed by PROC statements.
These conventions correspond to the high-level language you specify. The
stacktype field determines whether or not the assembler assumes that the
SS register is equal to the DS register.
The memorymodel field can be TINY, SMALL, MEDIUM, COMPACT, LARGE, or HUGE.
The assembler defines segments the same way for large and huge models, but
the @DataSize equate (explained in Section 5.1.5, "Using Predefined
Segment Equates") gives a different value for these two models.
If you write an assembler routine for a high-level language, the
memorymodel field should match the memory model used by the compiler or
interpreter. If you write a stand-alone assembler program, you can use any
model. Section 5.1.1 describes each memory model.
The optional language field tells the assembler to follow the naming,
calling, and return conventions appropriate to the indicated language. In
addition, if you use the language argument, the assembler automatically
makes all procedure names public. You can use C, Pascal, FORTRAN, or BASIC
as the language argument. The last three are equivalent, since these
languages share the same naming and calling conventions.
Note that although the language field is optional, you will not be able to
use the high-level language features of the PROC directive if you do not
give it. Normally, you should specify a language with .MODEL. If you use C
for the language argument, all public and external names are by default
prefixed with an underscore (_) in the .OBJ file. Specifying any other
language has no effect on the names.
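A minimal sketch of a C-callable routine follows. The procedure name
addtwo and the parameter names arg1 and arg2 are hypothetical. With C as
the language argument, the public name written to the .OBJ file would be
_addtwo, and a C caller might declare the routine as
int addtwo(int arg1, int arg2). Whether the lowercase spelling survives
depends on the case-preservation flags in effect.
          .MODEL  small,c
          .CODE
addtwo    PROC    arg1:WORD, arg2:WORD ; Made public automatically
          mov     ax,arg1             ; Load first parameter from the stack
          add     ax,arg2             ; Add second parameter
          ret                         ; Sum is returned in AX
addtwo    ENDP
          END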
──────────────────────────────────────────────────────────────────────────
NOTE The assembler does not truncate names in order to match the
conventions of specific languages, such as FORTRAN or Pascal. Moreover,
using the C type specifier does not cause the assembler to preserve case.
To preserve lowercase names in public symbols, choose one of the assembler
flags that preserves case (Preserve Extrn or Preserve Case), or assemble
with /Cx or /Cl on the QCL command line. Within the environment, the
Preserve Extrn flag is on by default.
──────────────────────────────────────────────────────────────────────────
See Appendix A for an explanation of how the different calling
conventions are implemented. You should also note that each language has
different defaults for passing parameters by value or by reference.
Depending on which method is used, a high-level language passes a
parameter either as a value or as a pointer to the value.
The optional stacktype field determines whether or not the assembler
assumes that SS is equal to DS. The default value is nearStack, which
assumes that the stack is part of the default data area, so that SS is
equal to DS, and SP is set to the top of the data area. You can also use
farStack,
which assumes that the stack segment is in a separate physical segment
from the default data area.
If you write a module called from QuickC, you should always use the
default (in other words, just leave the field blank), since QuickC always
assumes DS equals SS. If you write modules for a compiler (such as the
Microsoft Optimized C Compiler) that supports customized memory models,
use farStack for models in which SS does not equal DS. If you write a
stand-alone assembler program, you can choose either setting. If you use
the .STARTUP directive, the assembler automatically generates the proper
code for setting up the indicated stack type.
If you write a stand-alone module without using .STARTUP, you should
exercise caution. If you initialize DS but do not adjust SS and SP (as
described in Section 5.5.3, "Initializing the SS and SP Registers"), use
the farStack keyword. If you do adjust SS and SP as described in Section
5.5.3, you can use the default value, nearStack.
Example 1
DOSSEG
.MODEL small,c
These statements define default segments for small-model programs and
create the ASSUME and GROUP statements used by small-model programs. The
segments are automatically ordered according to the Microsoft convention.
The example statements might be used at the start of the main (or only)
module of a stand-alone assembler program.
Example 2
.MODEL large,pascal
This statement defines default segments for large-model programs and
creates the ASSUME and GROUP statements used by large-model programs. It
does not automatically order segments according to the Microsoft
convention. The example statement might be used at the start of an
assembly module that would be called from a large-model Pascal program or
a C program in which the Pascal calling convention was specified.
Example 3
.MODEL small,c,farStack
This statement defines default segments for a small-model program and
creates the appropriate ASSUME and GROUP statements. In addition, this
statement makes all procedures public, and directs the assembler to prefix
an underscore to the beginning of each public name, so that the naming
convention is compatible with C. If you later use the PROC statement to
declare parameters, the assembler will assume that the parameters are
placed on the stack in the order specified by the C calling convention. In
addition, the statement uses farStack, indicating that SS is not equal to
DS.
The last example would be appropriate for a module called by a C module
with a customized memory model, compiled with a setting that did not
assume SS equal to DS. Note that QuickC does not support customized memory
models.
──────────────────────────────────────────────────────────────────────────
NOTE The assembler does not normally display the code generated by the
high-level-language support features. You can see the code produced by
these features by using the .LALL directive or the /LA command-line
option.
──────────────────────────────────────────────────────────────────────────
To write procedures for use with more than one language and memory model,
you can use text macros for the memory model and language arguments, and
define the values from the command line or in the Assembler Flags dialog
box. For example, the following .MODEL directive uses text macros for the
memorymodel and language arguments:
% .MODEL memmodel,lang ; Use % to evaluate memmodel, lang
The values of the two text macros can be defined from the command line
using the /D switch:
QCL /Dmemmodel=MEDIUM /Dlang=C /AM /Cx main.c proc.asm
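If the module may also be assembled without the /D switches, the source
can supply default values. The following is a minimal sketch, assuming that
symbols defined with /D can be tested with IFNDEF and that text macros are
defined with EQU and angle brackets. The defaults SMALL and C are chosen
only for illustration:

```asm
IFNDEF memmodel
memmodel    EQU     <SMALL>         ; Default used when /Dmemmodel is absent
ENDIF
IFNDEF lang
lang        EQU     <C>             ; Default used when /Dlang is absent
ENDIF
%           .MODEL  memmodel,lang   ; Use % to evaluate memmodel, lang
```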
5.1.4 Defining Simplified Segments
Each of the directives .CODE, .STACK, .DATA, .DATA?, .CONST, .FARDATA,
.FARDATA?, and .STARTUP indicates the start of a segment. Each also ends
the immediately preceding segment definition.
Syntax
.CODE [[name]]          Code segment
.STACK [[size]]         Stack segment
.DATA                   Initialized near-data segment
.DATA?                  Uninitialized near-data segment
.CONST                  Constant-data segment
.FARDATA [[name]]       Initialized far-data segment
.FARDATA? [[name]]      Uninitialized far-data segment
.STARTUP                Code to initialize segment registers
For segments that take an optional name, the base file name of the source
module is used if you do not specify a value yourself.
Each new segment directive ends the previous segment. The END directive
closes the last segment in the source file.
5.1.4.1 How to Use Simplified Segments
The .CODE, .DATA, and .STACK directives create the three basic segments
that programs generally need to have. Chapter 4, "Writing Stand-Alone
Assembly Programs," demonstrates how to use these directives to write
code, data, and stack segments. Chapter 4 also explains the purpose of
each of these segments.
The .STARTUP directive initializes segment registers to the appropriate
segment values. Chapter 4 describes the use of .STARTUP, and Section
5.5 tells more about how .STARTUP works and what code it generates.
When you write a mixed-language program, you generally don't need to
declare a stack segment, because the start-up code in the C main module
creates a stack for you. When you write a stand-alone program, you should
declare a stack segment in the main module only.
Your programs can also use the .DATA? and .CONST directives to create
segments for uninitialized and constant data, respectively. With
stand-alone assembler programs, the use of these directives is optional,
because you can place all data in the segment defined by .DATA if you
want. With mixed-language programs, use .DATA? and .CONST to ensure
compatibility with the way C handles uninitialized and constant data. Once
you define these segments, it is up to you to place the appropriate data
in each segment.
If your program is written in compact, large, or huge model, you can use
the .FARDATA and .FARDATA? directives to define additional data segments.
All the data in the other data segments (defined by .DATA, .DATA?, and
.CONST) must not exceed a total of 64K across all modules. In addition,
the stack segment is also placed into this 64K area unless you specify
farStack with the .MODEL directive.
Data in the .FARDATA and .FARDATA? segments takes slightly longer to
access. However, there is generally much more room in these segments for
data definitions. For each module, the .FARDATA and .FARDATA? directives
each create a separate physical segment that can be up to 64K in size. The
recommended procedure is to use .FARDATA for initialized data, and
.FARDATA? for uninitialized data, although this is optional.
With medium, large, and huge model, you can use the name attribute to
create multiple code segments within a source module. With compact, large,
and huge model, you can also use the name attribute to create multiple
far-data segments.
Example 1
DOSSEG
.MODEL small,c
.STACK 100h
.DATA
ivariable DB 5
iarray DW 50 DUP (5)
string DB "This is a string"
uarray DW 50 DUP (?)
EXTRN xvariable:WORD
.CODE
.STARTUP
EXTRN xprocedure:NEAR
call xprocedure
.
.
.
END
This code uses simplified segment directives for a small-model,
stand-alone assembler program. Notice that initialized data, uninitialized
data, and a string constant are all defined in the same data segment. See
Section 5.1.7, "Default Segment Names," for an equivalent version that
uses full segment definitions.
Example 2
.MODEL large,c
.FARDATA?
fuarray DW 10 DUP (?) ; Far uninitialized data
.CONST
string DB "This is a string" ; String constant
.DATA
niarray DB 100 DUP (5) ; Near initialized data
.FARDATA
EXTRN xvariable:FAR
fiarray DW 100 DUP (10) ; Far initialized data
.CODE TASK
EXTRN xprocedure:PROC
task PROC
.
.
.
ret
task ENDP
END
This example uses simplified segment directives to create a module that
might be called from a large-model, high-level-language program. Notice
that different types of data are put in different segments to conform to
Microsoft compiler conventions. See Section 5.1.7, "Default Segment
Names," for an equivalent version using full segment definitions.
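The name attribute of the .CODE directive can be sketched as follows for a
medium-model module that contains two code segments. The names INIT and
WORK and the two procedure names are hypothetical, and each procedure body
is reduced to a single ret instruction:

```asm
            .MODEL  medium,c
            .CODE   INIT            ; Start the first code segment
setup       PROC                    ; FAR by default in medium model
            ret
setup       ENDP
            .CODE   WORK            ; End the first code segment, start a second
compute     PROC
            ret
compute     ENDP
            END
```

The segment and group table at the end of the listing file shows the names
actually assigned to the two segments.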
5.1.4.2 How Simplified Segments Are Implemented
When you use the simplified segment directives described above, the
assembler defines segments in a way compatible with Microsoft high-level
languages.
This section makes a number of references to groups and ASSUME statements.
Both of these concepts arise from the need to deal with the 8086 segmented
architecture. A "group" consists of one or more segments, totaling no more
than 64K. When multiple segments are placed into a group, the linker
combines these segments into a single physical segment. All addresses in
the physical segment are adjusted so that they share the same segment
address. Use of groups is convenient because it removes the need to
constantly reload the DS register.
The ASSUME directive is described at greater length in Section 5.4,
"Associating Segments with Registers." This directive informs the
assembler where a segment register will point to at run time so that the
assembler can correctly calculate offset addresses relative to the value
in the appropriate segment register.
Unless you use tiny model, the code segment (defined with .CODE) is placed
in its own physical segment, separate from all the data and stack
segments. With medium, large, or huge model, you can define multiple code
segments within one source module by using .CODE repeatedly, each time with
a different name attribute. When you use this technique, each .CODE
directive generates a new ASSUME statement so that the assembler knows
where CS points to at run time.
Segments defined with the .STACK, .CONST, .DATA, or .DATA? directives are
placed in a group called DGROUP. Segments defined with the .FARDATA or
.FARDATA? directives are not placed in any group. See Section 5.3 for
more information on segment groups. When initializing the DS register to
access data in a group-associated segment, the value of DGROUP should be
loaded into DS. The .STARTUP directive does this initialization
automatically.
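If you do not use .STARTUP, the initialization of DS can be sketched as
shown here. This is a minimal stand-alone program, not the code that
.STARTUP generates. It assumes farStack so that SS and SP can be left as
DOS sets them, and the variable name message is hypothetical:

```asm
            DOSSEG
            .MODEL  small,c,farStack ; farStack: SS and SP are not adjusted
            .STACK  100h
            .DATA
message     DB      "Hello",13,10,"$"
            .CODE
start:      mov     ax,DGROUP       ; Load the group address, not _DATA
            mov     ds,ax
            mov     dx,OFFSET message ; Offset is relative to DGROUP
            mov     ah,9            ; DOS function 09h: display string
            int     21h
            mov     ax,4C00h        ; DOS function 4Ch: exit, return code 0
            int     21h
            END     start
```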
The .MODEL directive generates ASSUME statements to inform the assembler
that at run time, DS, SS, and ES will all point to the beginning of
DGROUP. You don't need to write these ASSUME statements yourself.
If you specify farStack with the .MODEL directive, the stack is placed in
a separate physical segment and the .MODEL directive generates an ASSUME
statement to inform the assembler that SS does not point to the same
segment address that DS does.
5.1.5 Using Predefined Segment Equates
Several equates are predefined for you. You can use the equate names at
any point in your code to represent the equate values. You should not
assign equates having these names. The predefined equates are listed
below:
Name Value
──────────────────────────────────────────────────────────────────────────
@CodeSize and If the .MODEL directive has been used, the value of
@DataSize @CodeSize is 0 for the models that use near-code
labels (tiny, small, and compact) or 1 for models that
use far-code labels (medium, large, and huge). The
value of @DataSize is 0 for models that use near-data
labels (tiny, small, and medium), 1 for compact and
large models, and 2 for huge models. These values can
be used in conditional-assembly statements.
IF @DataSize
les bx,pointer ; Load far pointer
mov ax,es:WORD PTR [bx]
ELSE
mov bx,WORD PTR pointer ; Load near pointer
mov ax,WORD PTR [bx]
ENDIF
@CurSeg This equate holds the name of the current segment.
This value may be convenient for ASSUME statements,
segment overrides, or other cases in which you need to
access the current segment. It can also be used to end
a segment.
@FileName This value represents the base name of the current
source file. For example, if the current source file
is TASK.ASM, the value of @FileName is TASK. This
value can be used in any name you would like to change
if the file name changes. For example, it can be used
as a procedure name:
@FileName PROC
.
.
.
@FileName ENDP
@Model As with the @CodeSize and @DataSize predefined
equates, you must first use the .MODEL directive
before using the @Model equate. The value of @Model is
1 for tiny model, 2 for small, 3 for compact, 4 for
medium, 5 for large, and 6 for huge. @Model can be
used in conditional-assembly statements.
Segment equates For each of the primary segment directives, there is a
corresponding equate with the same name, except that
the equate starts with an "at sign" (@) instead of a
period. For example, the @code equate represents the
segment name defined by the .CODE directive.
Similarly, @fardata represents the .FARDATA segment
name and @fardata? represents the .FARDATA? segment
name. The @data equate represents the group name
shared by all the near-data segments. It can be used
to access the segments created by the .DATA, .DATA?,
.CONST, and .STACK directives.
These equates can be used in ASSUME statements and at
any other time a segment must be referred to by name.
──────────────────────────────────────────────────────────────────────────
NOTE Although predefined equates are part of the simplified segment
system, the @CurSeg and @FileName equates are also available when using
full segment definitions. If you use the /Cl option or set Preserve Case
in the Assembler Flags dialog box, predefined equates will be case
sensitive with the exact names shown above.
──────────────────────────────────────────────────────────────────────────
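As a further illustration of @CodeSize, the sketch below selects the stack
offset of an argument according to the size of the return address. It
assumes no language argument is given to .MODEL, so the procedure sets up
its own stack frame. The names arg1 and getarg are hypothetical:

```asm
            .MODEL  medium
IF @CodeSize
arg1        EQU     <[bp+6]>        ; Far call: 4-byte return address
ELSE
arg1        EQU     <[bp+4]>        ; Near call: 2-byte return address
ENDIF
            .CODE
            PUBLIC  getarg
getarg      PROC                    ; FAR here because the model is medium
            push    bp
            mov     bp,sp
            mov     ax,arg1         ; Word argument nearest the return address
            pop     bp
            ret
getarg      ENDP
            END
```

Changing the memory model to small or compact selects the near-call offset
without any other change to the source.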
5.1.6 Simplified Segment Defaults
Although your program can combine full segment definitions and simplified
segment directives, the .MODEL directive enables certain features of
simplified segment directives that change defaults. Defaults that change
are listed below:
■ If you do not use the .MODEL directive, the default size for the PROC
directive is always NEAR. If you use the .MODEL directive, the PROC
directive is associated with the specified memory model: NEAR for tiny,
small, and compact models and FAR for medium, large, and huge models.
See Section 6.4.3, "Procedure Labels," for further discussion of the
PROC directive.
■ If you use the .MODEL directive, the OFFSET operator returns an offset
relative to the beginning of a group, whenever a data item is defined
within a group. If you do not use the .MODEL directive, the OFFSET
operator always returns an offset relative to the beginning of the
segment. The simplified segment directives .DATA, .DATA?, .CONST, and
.STACK all create segments that are part of the group DGROUP.
For example, assume the variable test1 was declared in a segment
defined with the .DATA directive and test2 was declared in a segment
defined with the .FARDATA directive. The statement
mov ax,OFFSET test1
loads the address of test1 relative to DGROUP. The statement
mov ax,OFFSET test2
loads the address of test2 relative to the segment defined by the
.FARDATA directive. See Section 5.3 for more information on groups.
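The difference between the two offsets can be sketched in a complete
module. This example assumes large model and uses the SEG operator to
obtain the segment address of the far-data item. The procedure name demo
is hypothetical:

```asm
            .MODEL  large
            .DATA
test1       DB      1
            .FARDATA
test2       DB      2
            .CODE
            PUBLIC  demo
demo        PROC
            mov     bx,OFFSET test1 ; Offset relative to DGROUP
            mov     dl,[bx]         ; DS is assumed to point to DGROUP
            mov     ax,SEG test2    ; Segment address of the far-data segment
            mov     es,ax
            mov     bx,OFFSET test2 ; Offset relative to that segment
            mov     dh,es:[bx]
            ret
demo        ENDP
            END
```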
5.1.7 Default Segment Names
If you use the simplified segment directives by themselves, you do not
need to know the names assigned for each segment. However, it is possible
to mix full segment definitions with simplified segment directives.
Therefore, some programmers may wish to know the actual names assigned to
all segments.
Table 5.1 shows the default segment names created by each directive.
Table 5.1 Default Segments and Types for Standard Memory Models
Model     Directive   Name        Align   Combine   Class        Group
──────────────────────────────────────────────────────────────────────────
Tiny      .CODE       _TEXT       WORD    PUBLIC    'CODE'       DGROUP
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
──────────────────────────────────────────────────────────────────────────
Small     .CODE       _TEXT       WORD    PUBLIC    'CODE'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Medium    .CODE       name_TEXT   WORD    PUBLIC    'CODE'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Compact   .CODE       _TEXT       WORD    PUBLIC    'CODE'
          .FARDATA    FAR_DATA    PARA    private   'FAR_DATA'
          .FARDATA?   FAR_BSS     PARA    private   'FAR_BSS'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Large or  .CODE       name_TEXT   WORD    PUBLIC    'CODE'
huge      .FARDATA    FAR_DATA    PARA    private   'FAR_DATA'
          .FARDATA?   FAR_BSS     PARA    private   'FAR_BSS'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
The name used as part of far-code segment names is the file name of the
module. The default name associated with the .CODE directive can be
overridden in medium, large, and huge models. The default names for the
.FARDATA and .FARDATA? directives can always be overridden.
The segment and group table at the end of listings always shows the actual
segment names. However, the GROUP and ASSUME statements generated by the
.MODEL directive are not shown in listing files. For a program that uses
all possible segments, group statements equivalent to the following would
be generated:
DGROUP GROUP _DATA,CONST,_BSS,STACK
For tiny model, the following would be generated:
ASSUME cs:DGROUP,ds:DGROUP,ss:DGROUP
For small and compact models, the following would be generated:
ASSUME cs:_TEXT,ds:DGROUP,ss:DGROUP
For medium, large, and huge models, the following statement is given:
ASSUME cs:name_TEXT,ds:DGROUP,ss:DGROUP
Example 1
EXTRN xvariable:WORD
EXTRN xprocedure:NEAR
DGROUP GROUP _DATA,STACK
ASSUME cs:_TEXT,ds:DGROUP,ss:DGROUP
_TEXT SEGMENT WORD PUBLIC 'CODE'
start: mov ax,DGROUP ; Initialize data segment
mov ds,ax
cli
mov ss,ax ; Move DGROUP into SS
add sp,OFFSET STACK ; Adjust SP to top of stack
sti
.
.
.
_TEXT ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
ivariable DB 5
iarray DW 50 DUP (5)
string DB "This is a string"
uarray DW 50 DUP (?)
_DATA ENDS
STACK SEGMENT PARA STACK 'STACK'
DB 100h DUP (?)
STACK ENDS
END start
This example is equivalent to Example 1 in Section 5.1.4, "Defining
Simplified Segments." Notice that the segment order must be different in
this version to achieve the segment order specified by using the DOSSEG
directive in the first Section 5.1.4 example. The external variables are
declared at the start of the source code in this example. With simplified
segment directives, external variables can be declared in the segment in
which they are used. The code generated by .STARTUP is discussed in more
detail in Section 5.5.3.
Example 2
DGROUP GROUP _DATA,CONST,STACK
ASSUME cs:TASK_TEXT,ds:FAR_DATA,ss:STACK
EXTRN xprocedure:FAR
EXTRN xvariable:FAR
FAR_BSS SEGMENT PARA 'FAR_BSS'
fuarray DW 10 DUP (?) ; Far uninitialized data
FAR_BSS ENDS
CONST SEGMENT WORD PUBLIC 'CONST'
string DB "This is a string" ; String constant
CONST ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
niarray DB 100 DUP (5) ; Near initialized data
_DATA ENDS
FAR_DATA SEGMENT PARA 'FAR_DATA'
fiarray DW 100 DUP (10)
FAR_DATA ENDS
TASK_TEXT SEGMENT WORD PUBLIC 'CODE'
task PROC FAR
.
.
.
ret
task ENDP
TASK_TEXT ENDS
END
This example is equivalent to Example 2 in Section 5.1.4, "Defining
Simplified Segments." Notice that the segment order is the same in both
versions. The segment order shown here is written to the object file, but
it is different in the executable file. The segment order specified by the
compiler (the DOS segment order) overrides the segment order in the module
object file.
5.2 Full Segment Definitions
If you need complete control over segments, you may want to give complete
segment definitions. The section below explains all aspects of segment
definitions, including how to order segments and how to define all the
segment types.
5.2.1 Setting the Segment-Order Method
The order in which QuickAssembler writes segments to the object file can
be either sequential or alphabetical. If the sequential method is
specified, segments are written in the order in which they appear in the
source code. If the alphabetical method is specified, segments are written
in the alphabetical order of their segment names.
The default is sequential. If no segment-order directive or option is
given, segments are ordered sequentially. The segment-order method is only
one factor in determining the final order of segments in memory. The
DOSSEG directive (see Section 5.1.2, "Specifying DOS Segment Order") and
class type (see Section 5.2.2.3, "Controlling Segment Structure with
Class Type") can also affect segment order.
The ordering method can be set by using the .ALPHA or .SEQ directive in
the source code. The method can also be set using the /s (sequential) or
/a (alphabetical) assembler options (see Appendix B, Section B.1,
"Specifying the Segment-Order Method"). The directives have precedence
over the options. For example, if the source code contains the .ALPHA
directive, but the /s option is given on the command line, the segments
are ordered alphabetically.
Changing the segment order is an advanced technique. In most cases, you
can simply leave the default sequential order in effect. If you are
linking with high-level-language modules, the compiler automatically sets
the segment order. The DOSSEG directive also overrides any segment-order
directives or options.
──────────────────────────────────────────────────────────────────────────
NOTE Some previous versions of the IBM Macro Assembler ordered segments
alphabetically by default. If you have trouble assembling and linking
source-code listings from books or magazines, try using the /a option.
Listings written for previous IBM versions of the assembler may not work
without this option. The distinction between ENDS as the end of a segment
and ENDS as the end of a structure is also made by the content of the
program.
──────────────────────────────────────────────────────────────────────────
Example 1
.SEQ
DATA SEGMENT WORD PUBLIC 'DATA'
DATA ENDS
CODE SEGMENT WORD PUBLIC 'CODE'
CODE ENDS
Example 2
.ALPHA
DATA SEGMENT WORD PUBLIC 'DATA'
DATA ENDS
CODE SEGMENT WORD PUBLIC 'CODE'
CODE ENDS
In Example 1, the DATA segment is written to the object file first because
it appears first in the source code. In Example 2, the CODE segment is
written to the object file first because its name comes first
alphabetically.
5.2.2 Defining Full Segments
The beginning of a program segment is defined with the SEGMENT directive,
and the end of the segment is defined with the ENDS directive.
Syntax
name SEGMENT [[align]] [[combine]] [[use]] [['class']]
statements
name ENDS
The name defines the name of the segment. This name can be unique, or it
can be the same name given to other segments in the program. Segments with
identical names are treated as the same segment. For example, if it is
convenient to put different portions of a single segment in different
source modules, the segment is given the same name in both modules.
The optional align, combine, use, and 'class' types give the linker and
the assembler instructions on how to set up and combine segments. Types
can be specified in any order; it is not necessary to enter all types, or
any type, for a given segment.
Defining segment types is an advanced technique. Beginning
assembly-language programmers might try using the simplified segment
directives discussed in Section 5.1.
──────────────────────────────────────────────────────────────────────────
NOTE Don't confuse the PAGE align type and the PUBLIC combine type with
the PAGE and PUBLIC directives. The distinction should be clear from
context since the align and combine types are only used on the same line
as the SEGMENT directive.
──────────────────────────────────────────────────────────────────────────
5.2.2.1 Controlling Alignment with Align Type
The optional align type defines the range of memory addresses from which a
starting address for the segment can be selected. The align type can be
any one of the following:
Align Type Meaning
──────────────────────────────────────────────────────────────────────────
BYTE Uses the next available byte address
WORD Uses the next available word address (2 bytes per
word)
DWORD Uses the next available doubleword address (4 bytes
per doubleword)
PARA Uses the next available paragraph address (16 bytes
per paragraph)
PAGE Uses the next available page address (256 bytes per
page)
If no align type is given, PARA is used by default.
The linker uses the alignment information to determine the relative start
address for each segment. DOS uses the information to calculate the actual
start address when the program is loaded.
Align types are illustrated in Figure 5.1 in the next section.
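The effect of each align type can be worked through with a sample address.
The address 12345h below is hypothetical, as are the segment and variable
names. Each align type rounds the start address up to the next multiple of
its unit:

```asm
; Suppose the first free byte after the preceding segment is at 12345h.
; The start address chosen for the next segment would then be:
;   BYTE    12345h   (no padding)
;   WORD    12346h   (next multiple of 2)
;   DWORD   12348h   (next multiple of 4)
;   PARA    12350h   (next multiple of 16)
;   PAGE    12400h   (next multiple of 256)
BUFSEG      SEGMENT PAGE PUBLIC 'DATA'
buffer      DB      256 DUP (?)     ; Starts on a 256-byte boundary
BUFSEG      ENDS
```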
5.2.2.2 Defining Segment Combinations with Combine Type
The optional combine type defines how to combine segments having the same
name. The combine type can be any one of the following:
Combine Type Meaning
──────────────────────────────────────────────────────────────────────────
PUBLIC Concatenates all segments having the same name to form
a single, contiguous segment. The total size of the
resulting segment is equal to the sum of all
contributing segments.
All instruction and data addresses in the new segment
are relative to a single segment register, and all
offsets are adjusted to represent the distance from
the beginning of the segment.
STACK Concatenates all segments having the same name to form
a single, contiguous segment. This combine type is the
same as the PUBLIC combine type, except that all
addresses in the new segment are relative to the SS
segment register. The total size of the resulting
segment is equal to the sum of all contributing
segments.
The Stack Pointer (SP) register is initialized to the
length of the segment. The stack segment of your
program should normally use the STACK type, since this
automatically initializes the SS register, as
described in Section 5.5.3. If you create a stack
segment and do not use the STACK type, you must give
instructions to initialize the SS and SP registers.
For each individual segment, all initialized data is
placed at the high end of the resulting stack segment.
Consequently, if more than one stack segment contains
initialized data, the linker overwrites this data as
it links in each segment. Note that stack data cannot
be initialized with simplified segment directives.
COMMON Creates overlapping segments by placing the start of
all segments having the same name at the same address.
The length of the resulting area is the length of the
longest segment. All addresses in the segments are
relative to the same base address. If variables are
initialized in more than one segment having the same
name and COMMON type, the most recently initialized
data replaces any previously initialized data.
MEMORY Concatenates all segments having the same name to form
a single, contiguous segment.
The Microsoft Overlay Linker treats MEMORY segments
exactly the same as PUBLIC segments. QuickAssembler
allows you to use MEMORY type even though LINK does
not recognize a separate MEMORY type. This feature is
compatible with other linkers that may support a
combine type conforming to the Intel definition of
MEMORY type.
AT address Causes all label and variable addresses defined in the
segment to be relative to address.
The address can be any valid expression but must not
contain a forward reference──that is, a reference to a
symbol defined later in the source file. An AT segment
typically contains no code or initialized data.
Instead, it represents an address template that can be
placed over code or data already in memory, such as a
screen buffer or other absolute memory locations
defined by hardware. The linker will not generate any
code or data for AT segments, but existing code or
data can be accessed by name if it is given a label in
an AT segment. Section 6.6, "Setting the Location
Counter," shows an example of a segment with AT
combine type.
If no combine type is given, the segment has private type. Segments having
the same name are not combined. Instead, each segment receives its own
physical segment when loaded into memory.
──────────────────────────────────────────────────────────────────────────
NOTE Although a given segment name can be used more than once in a
source file, each segment definition using that name must have either
exactly the same attributes, or attributes that do not conflict. If types
are given for an initial segment definition, subsequent definitions for
that segment need not specify any types.
Normally, you should provide at least one stack segment (having STACK
combine type) in a program. If no stack segment is declared, LINK displays
a warning message. You can ignore this message if you have a specific
reason for not declaring a stack segment. For example, you would not have
a separate stack segment in a program in the .COM format.
──────────────────────────────────────────────────────────────────────────
Example
The following source-code shell illustrates one way in which the combine
and align types can be used. Figure 5.1 shows the way LINK would load the
sample program into memory.
NAME module_1
ASEG SEGMENT BYTE PUBLIC 'CODE'
start: .
.
.
ASEG ENDS
BSEG SEGMENT WORD COMMON 'DATA'
.
.
.
BSEG ENDS
CSEG SEGMENT PARA STACK 'STACK'
.
.
.
CSEG ENDS
DSEG SEGMENT AT 0B800H
.
.
.
DSEG ENDS
END start
NAME module_2
ASEG SEGMENT BYTE PUBLIC 'CODE'
.
.
.
ASEG ENDS
BSEG SEGMENT WORD COMMON 'DATA'
.
.
.
BSEG ENDS
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 5.2.2.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
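The AT combine type described above can be sketched as follows. The
example assumes a color text-mode display adapter, whose screen buffer
begins at segment address 0B800h. The names SCREEN, vidbuf, and show are
hypothetical:

```asm
SCREEN      SEGMENT AT 0B800h       ; Assumes a color text-mode adapter
vidbuf      LABEL   WORD            ; Names the first character cell
SCREEN      ENDS

CODE        SEGMENT WORD PUBLIC 'CODE'
            ASSUME  cs:CODE,es:SCREEN
show        PROC    NEAR
            mov     ax,SCREEN       ; Segment address given after AT
            mov     es,ax
            mov     vidbuf,0741h    ; 'A' (41h) with attribute 07h
            ret
show        ENDP
CODE        ENDS
            END
```

No data is generated for the SCREEN segment. The label vidbuf only gives a
name to memory that already exists.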
5.2.2.3 Controlling Segment Structure with Class Type
Class type is a means of associating segments that have different names,
but similar purposes. It can be used to control segment order and to
identify the code segment.
The class name must be enclosed in single quotation marks ('). Class names
are not case sensitive unless the /Cl or /Cx option is used during
assembly.
All segments belong to a class. Segments for which no class name is
explicitly stated have the null class name. LINK imposes no restriction on
the number or size of segments in a class. The total size of all segments
in a class can exceed 64K.
──────────────────────────────────────────────────────────────────────────
NOTE The names assigned for class types of segments should not be used
for other symbol definitions in the source file. For example, if you give
a segment the class name 'CONSTANT', you should not give the name constant
to variables or labels in the source file.
──────────────────────────────────────────────────────────────────────────
The linker expects segments having the class name CODE or a class name
with the suffix CODE to contain program code. You should always assign
this class name to segments containing code.
Class type is one of two factors that control the final order of segments
in an executable file. The other factor is the order of the segments in
the source file (with the /s option or the .SEQ directive) or the
alphabetical order of segments (with the /a option or the .ALPHA
directive).
These factors control different internal behavior, but both affect the
final order of segments in the executable file. The sequential or
alphabetical order of segments in the source file determines the order in
which the assembler writes segments to the object file. The class type can
affect the order in which the linker writes segments from object files to
the executable file.
Segments having the same class type are loaded into memory together,
regardless of their sequential or alphabetical order in the source file.
──────────────────────────────────────────────────────────────────────────
NOTE The DOSSEG directive (see Section 5.1.2, "Specifying DOS Segment
Order") overrides all other factors in determining segment order.
──────────────────────────────────────────────────────────────────────────
Example
A_SEG SEGMENT 'SEG_1'
A_SEG ENDS
B_SEG SEGMENT 'SEG_2'
B_SEG ENDS
C_SEG SEGMENT 'SEG_1'
C_SEG ENDS
When QuickAssembler assembles the preceding program fragment, it writes
the segments to the object file in sequential or alphabetical order,
depending on whether the /a option or the .ALPHA directive was used. In
the example above, the sequential and alphabetical order are the same, so
the order will be A_SEG, B_SEG, C_SEG in either case.
When the linker writes the segments to the executable file, it first
checks to see if any segments have the same class type. If they do, it
writes them to the executable file together. Thus, A_SEG and C_SEG are
placed together because they both have class type 'SEG_1'. The final order
in memory is A_SEG, C_SEG, B_SEG.
Since LINK processes modules in the order it receives them on the command
line, you may not always be able to easily specify the order in which you
want segments to be loaded. For example, assume your program has four
segments that you want loaded in the following order: _TEXT, _DATA, CONST,
and STACK.
The _TEXT, CONST, and STACK segments are defined in the first module of
your program, but the _DATA segment is defined in the second module. LINK
will not put the segments in the proper order because it first loads the
segments encountered in the first module.
You can avoid this problem by starting your program with dummy segment
definitions in the order you wish to load your real segments. The dummy
segments can either go at the start of the first module, or they can be
placed in a separate include file that is called at the start of the first
module. You can then put the actual segment definitions in any order or
any module you find convenient.
For example, you might call the following include file at the start of the
first module of your program:
_TEXT SEGMENT WORD PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT WORD PUBLIC 'CONST'
CONST ENDS
STACK SEGMENT PARA STACK 'STACK'
STACK ENDS
The DOSSEG directive may be more convenient for defining segment order if
you are willing to accept the DOS segment-order conventions.
Once a segment has been defined, you do not need to specify the align,
combine, use, and class types on subsequent definitions. For example, if
your code defined dummy segments as shown above, you could define an
actual data segment with the following statements:
_DATA SEGMENT
.
.
.
_DATA ENDS
5.3 Defining Segment Groups
A group is a collection of segments associated with the same starting
address. You may wish to use a group if you want several types of data to
be organized in separate segments in your source code, but want them all
to be accessible from a single, common segment register at run time.
Syntax
name GROUP segment [[,segment]]...
The name is the symbol assigned to the starting address of the group. All
labels and variables defined within the segments of the group are relative
to the start of the group, rather than to the start of the segments in
which they are defined.
The segment can be any previously defined segment or a SEG expression (see
Section 9.2.4.5).
Segments can be added to a group one at a time. For example, you can
define one segment and add it to the group, and then define another
segment and add it to the same group with a second GROUP statement.
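As a minimal sketch of building a group incrementally (the names DATAGRP,
VARS, and MSGS are hypothetical and chosen only for illustration), each
GROUP statement below adds one more segment to the same group:

```asm
; Hypothetical names: DATAGRP, VARS, MSGS (illustration only)
VARS    SEGMENT WORD PUBLIC 'DATA'
count   DW      0
VARS    ENDS
DATAGRP GROUP   VARS            ; DATAGRP now contains VARS

MSGS    SEGMENT WORD PUBLIC 'DATA'
hello   DB      "Hello",13,10,"$"
MSGS    ENDS
DATAGRP GROUP   MSGS            ; DATAGRP now contains VARS and MSGS
        END
```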
The GROUP directive does not affect the order in which segments of a group
are loaded. Loading order depends on each segment's class, or on the order
in which object modules are given to the linker.
Segments in a group need not be contiguous. Segments that do not belong to
the group can be loaded between segments that do. The only restriction is
that the distance (in bytes) between the first byte in the first segment
of the group and the last byte in the last segment must not exceed 65,535
bytes.
──────────────────────────────────────────────────────────────────────────
NOTE When the .MODEL directive is used, the offset of a group-relative
segment refers to the ending address of the segment, not the beginning.
For example, the expression OFFSET STACK evaluates to the end of the stack
segment.
──────────────────────────────────────────────────────────────────────────
Group names can be used with the ASSUME directive (discussed in Section
5.4, "Associating Segments with Registers") and as an operand prefix with
the segment-override operator (discussed in Section 9.2.3).
Example
DGROUP GROUP ASEG,CSEG
ASSUME ds:DGROUP
ASEG SEGMENT WORD PUBLIC 'DATA'
.
asym .
.
ASEG ENDS
BSEG SEGMENT WORD PUBLIC 'DATA'
.
bsym .
.
BSEG ENDS
CSEG SEGMENT WORD PUBLIC 'DATA'
.
csym .
.
CSEG ENDS
END
Figure 5.2 shows the order of the example segments in memory. They are
loaded in the order in which they appear in the source code (or in
alphabetical order if the .ALPHA directive or /a option is specified).
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 5.3 of the manual │
└────────────────────────────────────────────────────────────────────────┘
Since ASEG and CSEG are declared part of the same group, they have the
same base despite their separation in memory. This means that the symbols
asym and csym have offsets from the beginning of the group, which is also
the beginning of ASEG. The offset of bsym is from the beginning of BSEG,
since it is not part of the group. This sample illustrates the way LINK
organizes segments in a group. It is not intended as a typical use of a
group.
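In code, a group-relative offset can be requested explicitly by prefixing
the symbol with the group name. The following is a sketch only: it assumes
full segment definitions, and the stack segment, data values, and exit
code are additions made so that the fragment is complete.

```asm
; Sketch only: stack segment and data values added for completeness
DGROUP  GROUP   ASEG,CSEG
ASEG    SEGMENT WORD PUBLIC 'DATA'
asym    DB      1
ASEG    ENDS
BSEG    SEGMENT WORD PUBLIC 'DATA'
bsym    DB      2
BSEG    ENDS
CSEG    SEGMENT WORD PUBLIC 'DATA'
csym    DB      3
CSEG    ENDS
STACK   SEGMENT PARA STACK 'STACK'
        DB      100h DUP (?)
STACK   ENDS
_TEXT   SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  cs:_TEXT,ds:DGROUP,ss:STACK
start:  mov     ax,DGROUP               ; Segment address of the group
        mov     ds,ax
        mov     bx,OFFSET DGROUP:csym   ; Offset of csym from start of DGROUP
        mov     al,[bx]                 ; Load csym through DS:BX
        mov     ah,4Ch                  ; Exit to DOS, return code in AL
        int     21h
_TEXT   ENDS
        END     start
```

Writing the group name in front of the symbol makes the intended base
unambiguous, whichever segment of the group the symbol is defined in.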
5.4 Associating Segments with Registers
Many of the assembler instructions assume a default segment. For example,
JMP instructions assume the segment associated with the CS register; PUSH
and POP instructions assume the segment associated with the SS register;
MOV instructions assume the segment associated with the DS register.
When the assembler needs to reference an address, it must know what
segment the address is in. It does this by using the default segment or
group addresses assigned with the ASSUME directive.
──────────────────────────────────────────────────────────────────────────
NOTE Using the ASSUME directive to tell the assembler which segment to
associate with a segment register is not the same as telling the
processor. The ASSUME directive only affects assembly-time assumptions.
You may need to use instructions to change run-time assumptions.
Initializing segment registers at run time is discussed in Section 5.5.
──────────────────────────────────────────────────────────────────────────
Syntax
ASSUME segmentregister:name [[,segmentregister:name]]...
ASSUME segmentregister:NOTHING
ASSUME NOTHING
The name must be the name of the segment or group that is to be associated
with segmentregister. When a subsequent instruction references a label or
variable through segmentregister by default, the assembler assumes that
the label or variable is in the name segment or group.
The ASSUME directive can define a segment for each of the segment
registers. The segmentregister can be CS, DS, ES, or SS. The name must be
one of the following:
■ The name of a segment defined in the source file with the SEGMENT
directive
■ The name of a group defined in the source file with the GROUP directive
■ The keyword NOTHING
■ A SEG expression (see Section 9.2.4.5, "SEG Operator")
■ A string equate that evaluates to a segment or group name (but not a
string equate that evaluates to a SEG expression)
The keyword NOTHING cancels the current segment selection. For example,
the statement ASSUME NOTHING cancels all register selections made by
previous ASSUME statements.
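The effect of canceling a single assumption can be sketched as follows.
This is an illustration only; it assumes the large model and a far data
variable named d2, as in the example later in this section.

```asm
; Sketch only: d2 is an illustrative far data variable
        DOSSEG
        .MODEL  large
        .STACK  100h
        .FARDATA
d2      DW      9
        .CODE
start:  mov     ax,@fardata     ; Tell the processor
        mov     es,ax
        ASSUME  es:@fardata     ; Tell the assembler
        mov     cx,d2           ; Assembler supplies the ES override
        ASSUME  es:NOTHING      ; Cancel the ES assumption only
        mov     cx,es:d2        ; Explicit override is now required
        mov     ax,4C00h        ; DOS function: exit, return code 0
        int     21h
        END     start
```

After ASSUME es:NOTHING, the assumptions for CS, DS, and SS made by .MODEL
remain in effect; only the ES association is removed.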
Usually, a single ASSUME statement defines all four segment registers at
the start of the source file. However, you can use the ASSUME directive at
any point to change segment assumptions.
Using the ASSUME directive to change segment assumptions is often
equivalent to changing assumptions with the segment-override operator (:)
(see Section 9.2.3). The segment-override operator is more convenient for
one-time overrides, whereas the ASSUME directive may be more convenient if
previous assumptions must be overridden for a sequence of instructions.
Example
DOSSEG
.MODEL large ; DS automatically assumed to @data
.STACK 100h
.DATA
d1 DW 7
.FARDATA
d2 DW 9
.CODE
start: mov ax,@data ; Initialize near data
mov ds,ax
mov ax,@fardata ; Initialize far data
mov es,ax
.
.
.
; Method 1 for series of instructions that need override
; Use segment override for each statement
mov ax,es:d2
.
.
.
mov es:d2,bx
; Method 2 for series of instructions that need override
; Use ASSUME at beginning of series of instructions
ASSUME es:@fardata
mov cx,d2
.
.
.
mov d2,dx
5.5 Initializing Segment Registers
Assembly-language programs must initialize segment values for each segment
register before instructions that reference the segment register can be
used in the source program.
Initializing segment registers is different from assigning default values
for segment registers with the ASSUME statement. The ASSUME directive
tells the assembler what segments to use at assembly time. Initializing
segments gives them an initial value that will be used at run time.
The .STARTUP directive generates all the initialization code described in
this section. This directive must be preceded by the .MODEL directive. If
the .MODEL directive was followed by the farStack attribute, .STARTUP does
not adjust SS and SP. Otherwise, it assumes the nearStack default, which
sets SS equal to DS as described in Section 5.5.3, "Initializing the SS
and SP Registers." When you use this default, the combined stack and near
data must not exceed 64K.
If you use .STARTUP, you don't need to enter any of the code in this
section, except for the END statement. (However, if you use .STARTUP, you
don't need to specify a starting address.) Make sure that you place the
.STARTUP directive at the point you want your program to start executing,
because the assembler automatically initializes CS:IP to point to the
beginning of the code generated by .STARTUP.
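A minimal sketch of a program that relies on .STARTUP is shown below. The
message text and the variable name msg are hypothetical; the DOS calls
(function 09h to display a string, function 4Ch to exit) are used only to
make the example complete.

```asm
; Sketch only; msg and its text are hypothetical
        DOSSEG
        .MODEL  small
        .STACK  100h
        .DATA
msg     DB      "Hello",13,10,"$"
        .CODE
        .STARTUP                ; Execution begins here; DS, SS, SP set up
        mov     dx,OFFSET msg
        mov     ah,09h          ; DOS function: display string
        int     21h
        mov     ax,4C00h        ; DOS function: exit, return code 0
        int     21h
        END                     ; No start address needed with .STARTUP
```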
5.5.1 Initializing the CS and IP Registers
The CS and IP registers are initialized by specifying a starting address
with the END directive.
Syntax
END [[startaddress]]
The startaddress is a label or expression identifying the address where
you want execution to begin when the program is loaded. Normally, a label
for the start address should be placed at the address of the first
instruction in the code segment.
The CS register is initialized to the segment address of startaddress, and
the IP register is initialized to the offset of startaddress within that
segment. Because the start label is normally placed at the beginning of
the code segment, IP is normally initialized to 0. You can change the
initial value of the IP register by using the ORG directive (see Section
6.6, "Setting the Location Counter") just before the startaddress label.
For example, programs in the .COM format use ORG 100h to initialize the IP
register to 256 (100 hexadecimal).
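The following sketch shows how ORG 100h and the start label fit together
in a .COM-format source file. It assumes full segment definitions; the
variable name msg and its text are hypothetical, and the program must
still be built as a .COM file for the layout to be valid.

```asm
; Sketch of a .COM-format program; msg and its text are hypothetical
_TEXT   SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  cs:_TEXT,ds:_TEXT,ss:_TEXT
        ORG     100h                    ; Location counter = 100h
start:  mov     dx,OFFSET msg           ; DS already equals CS in a .COM file
        mov     ah,09h                  ; DOS function: display string
        int     21h
        mov     ax,4C00h                ; DOS function: exit, return code 0
        int     21h
msg     DB      "Hello",13,10,"$"
_TEXT   ENDS
        END     start                   ; IP is initialized to 100h
```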
If a program consists of a single source module, the start address is
required for that module. If a program has several modules, all modules
must terminate with an END directive, but only one of them can define a
start address.
──────────────────────────────────────────────────────────────────────────
WARNING One, and only one, module must define a start address. If you do
not specify a start address, none is assumed. Neither QuickAssembler nor
LINK will generate an error message, but your program will probably start
execution at the wrong address.
──────────────────────────────────────────────────────────────────────────
Example
; Module 1
.CODE
start: . ; First executable instruction
.
.
EXTRN task:NEAR
call task
.
.
.
END start ; Starting address defined in main module
; Module 2
PUBLIC task
.CODE
task PROC
.
.
.
task ENDP
END ; No starting address in secondary module
If Module 1 and Module 2 are linked into a single program, it is essential
that only the calling module define a starting address.
5.5.2 Initializing the DS Register
The DS register must be initialized to the address of the segment that
will be used for data.
The address of the segment or group for the initial data segment must be
loaded into the DS register. This is done in two statements because a
segment or group address is an immediate value, and an immediate value
cannot be loaded directly into a segment register. The segment-setup lines
typically appear at the start or very near the start of the code segment.
Example 1
_DATA SEGMENT WORD PUBLIC 'DATA'
.
.
.
_DATA ENDS
_TEXT SEGMENT BYTE PUBLIC 'CODE'
ASSUME cs:_TEXT,ds:_DATA
start: mov ax,_DATA ; Load start of data segment
mov ds,ax ; Transfer to DS register
.
.
.
_TEXT ENDS
END start
If you are using the Microsoft naming convention and segment order, the
address loaded into the DS register is not a segment address but the
address of DGROUP, as shown in Example 2. With simplified segment
directives, the address of DGROUP is represented by the predefined equate
@data.
Example 2
DOSSEG
.MODEL SMALL
.DATA
.
.
.
.CODE
start: mov ax,@data ; Load start of DGROUP (@data)
mov ds,ax ; Transfer to DS register
.
.
.
END start
5.5.3 Initializing the SS and SP Registers
At load time, DOS sets SS to the segment address of the last segment
having combine type STACK, and SP to the size of the stack. (The linker
actually determines the value of SS:SP and places this value in the
executable-file header. DOS sets SS and SP as indicated in the file
header.)
If you use a stack segment with combine type STACK or use the .STACK
directive, the program automatically loads with SS and SP initialized, as
described above.
However, this basic initialization does not set SS equal to DS. If the
program contains the statement ASSUME SS:DGROUP, it will be prone to
errors. The following code resets SS and SP so that SS has the same value
as DS. The code then adjusts SP upward so that SS:SP points to the
same physical address it did before. Since hardware interrupts use
the same stack as the program, you should turn off interrupts while
changing the stack. Most 8086-family processors turn off interrupts
automatically when you adjust SS or SP, but early versions of the 8088 do
not.
Example 1
.MODEL small
.STACK 100h ; Initialize "STACK"
.DATA
.
.
.
.CODE
start: mov ax,@data ; Load segment location
mov ds,ax ; into DS register
cli ; Turn off interrupts
mov ss,ax ; Load same value as DS into SS
mov sp,OFFSET STACK ; Point SP at end of stack (new top of stack)
sti ; Turn interrupts back on
.
.
.
This example reinitializes SS so that it has the same value as DS, and it
adjusts SP to reflect the new stack offset. Microsoft high-level-language
compilers do this so that stack variables in near procedures can be
accessed relative to either SS or DS.
However, this code only works correctly if you use .MODEL and you declare
a stack segment in just one module. The following code handles the more
general case. The .STARTUP directive generates this code:
Example 2
start_label:
mov dx,DGROUP ; Move DGROUP into DS and DX
mov ds,dx
mov bx,ss ; BX = STACK - DGROUP
sub bx,dx ;
shl bx,1 ; Multiply difference by 16
shl bx,1 ; and leave result in BX
shl bx,1
shl bx,1
cli
mov ss,dx ; Move DGROUP into SS
add sp,bx ; Adjust SP upward by
sti ; (STACK - DGROUP) * 16
The code above sets SS and SP so that SS equals DS. This code works
correctly no matter how many modules declare a stack segment.
5.5.4 Initializing the ES Register
The ES register is not automatically initialized. If your program uses the
ES register, you must initialize it by moving the appropriate segment
value into the register.
Example
ASSUME es:@fardata ; Tell the assembler
mov ax,@fardata ; Tell the processor
mov es,ax
5.6 Nesting Segments
Segments can be nested. When QuickAssembler encounters a nested segment,
it temporarily suspends assembly of the enclosing segment and begins
assembly of the nested segment. When the nested segment has been
assembled, QuickAssembler continues assembly of the enclosing segment.
Nesting of segments makes it possible to mix segment definitions in
programs that use simplified segment directives for most segment
definitions. When a full segment definition is given, the new segment is
nested in the simplified segment in which it is defined.
Example 1
; Macro to print message on the screen
; Uses full segment definitions - segments nested
message MACRO text
LOCAL symbol
_DATA SEGMENT WORD PUBLIC 'DATA'
symbol DB &text
DB 13,10,"$"
_DATA ENDS
mov ah,09h
mov dx,OFFSET symbol
int 21h
ENDM
_TEXT SEGMENT BYTE PUBLIC 'CODE'
.
.
.
message "Please insert disk"
In the example above, a macro called from inside of the code segment
(_TEXT) allocates a variable within a nested data segment (_DATA). This
has the effect of allocating more data space on the end of the data
segment each time the macro is called. The macro can be used for messages
appearing only once in the source code.
Example 2
; Macro to print message on the screen
; Uses simplified segment directives - segments not nested
message MACRO text
LOCAL symbol
.DATA
symbol DB &text
DB 13,10,"$"
.CODE
mov ah,09h
mov dx,OFFSET symbol
int 21h
ENDM
.CODE
.
.
.
message "Please insert disk"
Although Example 2 has the same practical effect as Example 1,
QuickAssembler handles the two macros differently. In Example 1, assembly
of the outer (code) segment is suspended rather than terminated. In
Example 2, assembly of the code segment terminates, assembly of the data
segment starts and terminates, and then assembly of the code segment is
restarted.
────────────────────────────────────────────────────────────────────────────
Chapter 6: Defining Constants, Labels, and Variables
This chapter explains how to define constants, labels, variables, and
other symbols that refer to instruction and data locations within
segments.
Constants are important in QuickAssembler, just as they are in other
languages. You can use constants as immediate operands in instructions and
as initial values in data declarations. QuickAssembler supports a number
of useful radixes (including binary and hexadecimal), as described in
Section 6.1.
QuickAssembler lets you use symbols as well as constants. Sections 6.2,
"Assigning Names to Symbols," and 6.3, "Using Type Specifiers," present
the basic principles of generating symbolic names.
Most symbols are either code labels or variable names. Section 6.4,
"Defining Code Labels," and Section 6.5, "Defining and Initializing
Data," describe how to define these symbols.
This chapter tells you how to assign labels and most kinds of variables.
(Multifield variables, such as structures and records, are discussed in
Chapter 7, "Using Structures and Records.") Chapter 6 also discusses
related directives, including those that control the location counter
directly. The assembler uses the location counter to assign addresses to
symbols.
6.1 Constants
Constants can be used in source files to specify numbers or strings that
are set or initialized at assembly time. The assembler recognizes four
types of constant values:
1. Integers
2. Packed binary coded decimals
3. Real numbers
4. Strings
6.1.1 Integer Constants
Integer constants represent integer values. They can be used in a variety
of contexts in assembly-language source code. For example, they can be
used in data declarations and equates, or as immediate operands.
Packed decimal integers are a special kind of integer constant that can
only be used to initialize binary coded decimal (BCD) variables. They are
described in Sections 6.1.2, "Packed Binary Coded Decimal Constants," and
6.5.1.2, "Binary Coded Decimal Variables."
Integer constants can be specified in binary, octal, decimal, or
hexadecimal values. Table 6.1 shows the legal digits for each of these
radixes. For hexadecimal radix, the digits can be either uppercase or
lowercase letters.
Table 6.1 Digits Used with Each Radix
Radix Base Digits
──────────────────────────────────────────────────────────────────────────
Binary 2 0 1
Octal 8 0 1 2 3 4 5 6 7
Decimal 10 0 1 2 3 4 5 6 7 8 9
Hexadecimal 16 0 1 2 3 4 5 6 7 8 9 A B C D E F
──────────────────────────────────────────────────────────────────────────
The radix for an integer can be defined for a specific integer by using
radix specifiers, or a default radix can be defined globally with the
.RADIX directive.
6.1.1.1 Specifying Integers with Radix Specifiers
The radix for an integer constant can be given by putting one of the
following radix specifiers after the last digit of the number:
Radix Specifier
──────────────────────────────────────────────────────────────────────────
Binary B
Octal Q or O
Decimal D
Hexadecimal H
Radix specifiers can be given in either uppercase or lowercase letters;
sample code in this manual uses lowercase letters.
Hexadecimal numbers must always start with a decimal digit (0-9). If
necessary, put a leading 0 at the left of the number to distinguish
between symbols and hexadecimal numbers that start with a letter. For
example, 0ABCh is interpreted as a hexadecimal number, but ABCh is
interpreted as a symbol. The hexadecimal digits A through F can be either
uppercase or lowercase letters. Sample code in this manual uses uppercase
letters.
If no radix is given, the assembler interprets the integer by using the
current default radix. The initial default radix is decimal, but you can
change the default with the .RADIX directive.
Examples
n360 EQU 01011010b + 132q + 5Ah + 90d ; 4 * 90
n60 EQU 00001111b + 17o + 0Fh + 15d ; 4 * 15
6.1.1.2 Setting the Default Radix
The .RADIX directive sets the default radix for integer constants in the
source file.
Syntax
.RADIX expression
The expression must evaluate to a number in the range 2-16. It defines
whether the numbers are binary, octal, decimal, hexadecimal, or numbers of
some other base.
Numbers given in expression are always considered decimal, regardless of
the current default radix. The initial default radix is decimal.
Note that the .RADIX directive does not affect real numbers initialized as
variables with the DD, DQ, or DT directive. Initial values for real-number
variables declared with these directives are always evaluated as decimal
unless a radix specifier is appended.
Also, the .RADIX directive does not affect the optional radix specifiers,
B and D, used with integer numbers. When the letters B or D appear at the
end of any integer, they are always considered to be a radix specifier
even if the current radix is 16.
For example, if the input radix is 16, the number 0ABCD will be
interpreted as 0ABC decimal, an illegal number, instead of as 0ABCD
hexadecimal, as intended. Type 0ABCDh to specify 0ABCD in hexadecimal.
Similarly, the number 11B will be treated as 11 binary, a legal number,
but not as 11B hexadecimal as intended. Type 11Bh to specify 11B in
hexadecimal.
Examples
.RADIX 16 ; Set default radix to hexadecimal
.RADIX 2 ; Set default radix to binary
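The interaction between a hexadecimal default radix and the B and D
specifiers, described above, can be sketched as follows. The segment and
variable names are hypothetical.

```asm
; Sketch only; segment and variable names are hypothetical
RDATA   SEGMENT WORD PUBLIC 'DATA'
        .RADIX  16              ; Default radix is now hexadecimal
hexval  DW      0FF             ; 255 decimal (no specifier needed)
binval  DW      11B             ; 3 decimal: B is read as binary specifier
hexval2 DW      11Bh            ; 283 decimal: h forces hexadecimal
decval  DW      10D             ; 10 decimal: D is read as decimal specifier
        .RADIX  10              ; Restore the decimal default
RDATA   ENDS
        END
```

Appending h to every hexadecimal constant, even when the default radix is
16, avoids the ambiguity entirely.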
6.1.2 Packed Binary Coded Decimal Constants
When an integer constant is used with the DT directive, the number is
interpreted by default as a packed binary coded decimal (BCD) number. You
can use the D radix specifier to override the default and initialize
10-byte integers as binary-format integers.
The syntax for specifying binary coded decimals is exactly the same as for
other integers. However, the assembler encodes binary coded decimals in a
completely different way. See Section 6.5.1.2, "Binary Coded Decimal
Variables," for complete information on storage of binary coded decimals.
Examples
positive DT 1234567890 ; Encoded as 00000000001234567890h
negative DT -1234567890 ; Encoded as 80000000001234567890h
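As a further sketch, the D radix specifier changes how the same digits are
stored. The segment and variable names below are hypothetical, and the
encodings in the comments are written most significant byte first, as in
the examples above.

```asm
; Sketch only; segment and variable names are hypothetical
BDATA   SEGMENT WORD PUBLIC 'DATA'
packed  DT      1234567890      ; Packed BCD: 00000000001234567890h
binary  DT      1234567890d     ; Binary integer: 000000000000499602D2h
BDATA   ENDS
        END
```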
6.1.3 Real-Number Constants
A real number is a number consisting of an integer part, a fractional
part, and an exponent. Real numbers are usually represented in decimal
format.
Syntax
[[+ | -]] integer.fraction[[E[[+ | -]]exponent]]
The integer and fraction parts combine to form the value of the number.
This value is stored internally as a unit and is called the mantissa. It
may be signed. The optional exponent follows the exponent indicator (E).
It represents the magnitude of the value and is stored internally as a
unit. If no exponent is given, an exponent of 0 (that is, a multiplier of
1) is assumed. If an exponent is given, it may be signed.
During assembly, the assembler converts real-number constants given in
decimal format to a binary format. The sign, exponent, and mantissa of the
real number are encoded as bit fields within the number. See Section
6.5.1.4, "Real-Number Variables," for an explanation of how real numbers
are encoded.
You can specify the encoded format directly using hexadecimal digits (0-9
or A-F). The number must begin with a decimal digit (0-9) and cannot be
signed. It must be followed by the real-number designator (R). This
designator is used the same as a radix designator except it specifies that
the given hexadecimal number should be interpreted as a real number.
Real numbers can only be used to initialize variables with the DD, DQ, and
DT directives. They cannot be used in expressions. The maximum number of
digits in the number and the maximum range of exponent values depend on
the directive. The number of digits for encoded numbers used with DD, DQ,
and DT must be 8, 16, and 20 digits, respectively. (If a leading 0 is
supplied, the number must be 9, 17, or 21 digits.) See Section 6.5.1.4,
"Real-Number Variables," for an explanation of how real numbers are
encoded.
──────────────────────────────────────────────────────────────────────────
NOTE Real numbers will be encoded differently depending upon whether you
use the .MSFLOAT directive. By default, real numbers are encoded in the
IEEE format. The .MSFLOAT directive overrides the default and specifies
Microsoft Binary format. See Section 6.5.1.4, "Real-Number Variables,"
for a description of these formats.
──────────────────────────────────────────────────────────────────────────
Example
; Real numbers
shrt DD 25.23
long DQ 2.523E1
ten_byte DT 2523.0E-2
; Assumes .MSFLOAT
mbshort DD 81000000r ; 1.0 as Microsoft Binary short
mblong DQ 8100000000000000r ; 1.0 as Microsoft Binary long
; Assumes default IEEE format
ieeeshort DD 3F800000r ; 1.0 as IEEE short
ieeelong DQ 3FF0000000000000r ; 1.0 as IEEE long
; The same regardless of processor directives
temporary DT 3FFF8000000000000000r ; 1.0 as 10-byte temporary real
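When the encoded value begins with a letter, the leading 0 described above
is required. The following sketch assumes the default IEEE format; the
segment and variable names are hypothetical.

```asm
; Sketch only; names are hypothetical; assumes the default IEEE format
FDATA   SEGMENT WORD PUBLIC 'DATA'
negtwo  DD      0C0000000r          ; -2.0 as IEEE short (9 digits)
negone  DQ      0BFF0000000000000r  ; -1.0 as IEEE long (17 digits)
FDATA   ENDS
        END
```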
6.1.4 String Constants
A string constant consists of one or more ASCII characters enclosed in
single or double quotation marks. Strings are interpreted as lists of
characters having the ASCII values of the characters in the string.
Syntax
'characters'
"characters"
String constants are case sensitive. A string constant consisting of a
single character is sometimes called a character constant.
Single quotation marks must be encoded twice when used literally within
string constants that are also enclosed by single quotation marks.
Similarly, double quotation marks must be encoded twice when used in
string constants that are also enclosed by double quotation marks.
Examples
char DB 'a'
char2 DB "a"
message DB "This is a message."
warn DB 'Can''t find file.' ; Can't find file.
warn2 DB "Can't find file." ; Can't find file.
string DB "This ""value"" not found." ; This "value" not found.
string2 DB 'This "value" not found.' ; This "value" not found.
6.1.5 Determining Floating-Point Format
The .MSFLOAT directive disables all coprocessor instructions and specifies
that initialized real-number variables be encoded in the Microsoft Binary
format. Without this directive, initialized real-number variables are
encoded in the IEEE format. This is a change from Versions 4.0 and earlier
of the Microsoft Macro Assembler, which used Microsoft Binary format by
default and required a coprocessor directive or the /R option to specify
IEEE format. .MSFLOAT must be used for programs that require real-number
data in the Microsoft Binary format. Section 6.5.1.4, "Real-Number
Variables," describes real-number data formats and the factors to consider
in choosing a format.
6.2 Assigning Names to Symbols
A symbol is a name that represents a value. Symbols are one of the most
important elements of assembly-language programs. Elements that must be
represented symbolically in assembly-language source code include
variables, address labels, macros, segments, procedures, records, and
structures. Constants, expressions, and strings can also be represented
symbolically.
Symbol names are combinations of letters (both uppercase and lowercase),
digits, and special characters. QuickAssembler recognizes the following
character set:
A-Z a-z 0-9
? @ _ $ : . [ ] ( ) < > { } + - / *
& % ! ' ~ | \ = # ^ ; , ` "
Letters, digits, and some special characters can be used in symbol names,
subject to the following restrictions:
■ A name can have any combination of uppercase and lowercase letters.
Within the QC integrated environment, the default behavior (Preserve
Extrn) is for the assembler to convert all symbol names to uppercase
unless they are public or external. When you use simplified segment
directives, all procedure labels declared with PROC are automatically
public.
When you use QCL, all lowercase letters are converted to uppercase by
the assembler, unless you give the /Cl assembly option, or you declare
the name with a PROC, PUBLIC, or EXTRN directive and you give the /Cx
option. The /Cl and /Cx options correspond to the assembler flags
Preserve Case and Preserve Extrn, respectively, within the QC
environment.
■ Digits may be used within a name, but not as the first character.
■ A name can be given any number of characters, but only the first 31 are
used. All other characters are ignored.
■ The following characters may be used at the beginning of a name or
within a name: underscore (_), question mark (?), dollar sign ($), and
at sign (@).
■ The period (.) is an operator and cannot be used within a name, but it
can be used as the first character of a name.
■ A name may not be the same as any reserved name. Note that two special
characters, the question mark (?) and the dollar sign ($), are reserved
names and therefore can't stand alone as symbol names.
A reserved name is any name with a special, predefined meaning to the
assembler. Reserved names include instruction and directive mnemonics,
register names, and operator names. All uppercase and lowercase letter
combinations of these names are treated as the same name.
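The naming rules above can be sketched with a few declarations. All names
here are hypothetical. The illegal forms are left as comments so that the
fragment still assembles.

```asm
; Hypothetical names, for illustration only
SYMS    SEGMENT WORD PUBLIC 'DATA'
total1  DW      0               ; Letters followed by a digit
_count  DW      0               ; Begins with an underscore
?flag   DB      0               ; Begins with a question mark
$total  DW      0               ; Begins with a dollar sign
@index  DW      0               ; Begins with an at sign
; 1total DW     0               ; Illegal: begins with a digit
; tot.al DW     0               ; Illegal: period within the name
; $      DW     0               ; Illegal: $ alone is a reserved name
; offset DW     0               ; Illegal: OFFSET is a reserved name
SYMS    ENDS
        END
```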
The following is a list of names that are always reserved by the
assembler. Using any of these names for a symbol results in an error.
$ DWORD GE %OUT
* ELSE GROUP PAGE
+ ELSEIF GT PROC
- ELSEIF1 HIGH PTR
. ELSEIF2 IF PUBLIC
/ ELSEIFB IF1 PURGE
= ELSEIFDEF IF2 QWORD
? ELSEIFDIF IFB .RADIX
[] ELSEIFDIFI IFDEF RECORD
.186 ELSEIFE IFDIF REPT
.286 ELSEIFIDN IFE .SALL
.286P ELSEIFIDNI IFIDN SEG
.287 ELSEIFNB IFNB SEGMENT
.386 ELSEIFNDEF IFNDEF .SEQ
.386P END INCLUDE .SFCOND
.387 ENDIF INCLUDELIB SHL
.8086 ENDM IRP SHORT
.8087 ENDP IRPC SHR
ALIGN ENDS LABEL SIZE
.ALPHA EQ .LALL SIZESTR
AND EQU LE .STACK
ASSUME .ERR LENGTH .STARTUP
BYTE .ERR1 .LFCOND STRUC
CATSTR .ERR2 .LIST SUBSTR
.CODE .ERRB LOCAL SUBTTL
COMM .ERRDEF LOW TBYTE
COMMENT .ERRDIF LT .TFCOND
.CONST .ERRE MACRO THIS
.CREF .ERRIDN MASK TITLE
.DATA .ERRNB MOD TYPE
.DATA? .ERRNDEF .MODEL .TYPE
DB .ERRNZ NAME WIDTH
DD EVEN NE WORD
DOSSEG EXITM NEAR .XALL
DQ EXTRN NOT .XCREF
DS FAR OFFSET .XLIST
DT .FARDATA OR XOR
In addition to the names listed above, instruction mnemonics and register
names are considered reserved names. Instructions can vary depending on
the processor directives given in the source file. For example, ENTER is
recognized as a reserved word if you have enabled 286 instructions with
the .286 directive. Section 18.3 describes processor directives.
Instruction mnemonics for each processor are listed in the on-line Help
system. Register names are listed in Section 2.6.2, "Register Operands."
6.3 Using Type Specifiers
Some statements require type specifiers to give the size or type of an
operand. There are two kinds of type specifiers: those that specify the
size of a variable or other memory operand, and those that specify the
distance of a label.
The type specifiers that give the size of a memory operand are listed
below with the number of bytes specified by each:
Specifier Number of Bytes
──────────────────────────────────────────────────────────────────────────
BYTE 1
WORD 2
DWORD 4
QWORD 8
TBYTE 10
In some contexts, ABS can also be used as a type specifier that indicates
an operand is a constant rather than a memory operand.
The type specifiers that give the distance of a label are listed below:
Specifier Description
──────────────────────────────────────────────────────────────────────────
FAR The label references both the segment and offset of
the label.
NEAR The label references only the offset of the label.
PROC The label has the default type (NEAR or FAR) of the
current memory model. The default type is always NEAR
if you use full segment definitions. If you use
simplified segment directives (see Section 5.1), the
default type is NEAR for small and compact models or
FAR for medium, large, and huge models.
Directives that use type specifiers include LABEL, PROC, EXTRN, and COMM.
Operators that use type specifiers include PTR and THIS.
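A short sketch of size specifiers used with the LABEL directive and the
PTR operator follows. The variable names wbuf and bbuf and the data values
are hypothetical.

```asm
; Sketch only; wbuf, bbuf, and the data values are hypothetical
        DOSSEG
        .MODEL  small
        .STACK  100h
        .DATA
wbuf    LABEL   WORD                ; Same address as bbuf, but WORD type
bbuf    DB      34h,12h
        .CODE
start:  mov     ax,@data
        mov     ds,ax
        mov     al,bbuf             ; AL = 34h (BYTE access)
        mov     dx,wbuf             ; DX = 1234h (WORD access via LABEL)
        mov     cx,WORD PTR bbuf    ; CX = 1234h (WORD access via PTR)
        mov     bx,OFFSET bbuf
        mov     BYTE PTR [bx],0     ; PTR supplies the size for [bx]
        mov     ax,4C00h            ; DOS function: exit, return code 0
        int     21h
        END     start
```

The last MOV needs a specifier because neither operand has a size of its
own: [bx] is an untyped memory reference and 0 is a constant.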
6.4 Defining Code Labels
Code labels give symbolic names to the addresses of instructions in the
code segment. These labels can be used as the operands to jump, call, and
loop instructions to transfer program control to a new instruction.
6.4.1 Near-Code Labels
Near-label definitions create instruction labels that have NEAR type.
These instruction labels can be used to access the address of the label
from other statements.
Syntax
name:
The name must be followed by a colon (:). The segment containing the
definition must be the one that the assembler currently associates with
the CS register. The ASSUME directive is used to associate a segment with
a segment register (see Section 5.4, "Associating Segments with
Registers"). A near label can appear on a line by itself or on a line with
an instruction.
Near-code labels have different behavior depending on whether they are
used in a procedure with the extended PROC syntax. When the extended PROC
feature is used (which requires that .MODEL and a language be
specified), near labels are local to the procedure. This functionality is
explained in Section 15.3.7, "Variable Scope."
If full segment definitions are used or if the language argument is not supplied
to the .MODEL directive, near labels are known throughout the module in
which they occur. The same label name can be used in different modules as
long as each label is only referenced by instructions in its own module.
If a label must be referenced by instructions in another module, it must
be given a unique name and declared with the PUBLIC and EXTRN directives,
as described in Chapter 8, "Creating Programs from Multiple Modules."
Examples
cmp ax,5 ; Compare with 5
ja bigger
jb smaller
. ; Instructions if AX = 5
.
.
jmp done
bigger: . ; Instructions if AX > 5
.
.
jmp done
smaller: . ; Instructions if AX < 5
.
.
done:
6.4.2 Anonymous Labels
The assembler provides a way to generate automatic labels for jump
instructions. To define a label, use two at signs (@@) followed by a colon
(:). To jump to the nearest preceding anonymous label, use @B (back) in
the jump instruction's operand field; to jump to the nearest following
anonymous label, use @F (forward) in the operand field.
You can use two at signs (@@) to define any number of anonymous labels in
your program. The items @B and @F always refer to the nearest occurrences
of @@, so there is never any conflict between different anonymous labels.
Anonymous labels are best used for conditionally executing a few lines of
code. The advantage is that you do not need to continually think up new
label names. The disadvantage is that they do not provide a meaningful
name. You should mark major divisions of a program with actual named
labels.
The following example shows a typical sequence of code with a
jump-to-label instruction:
; DX is 20, unless CX is less than -20, then make DX 30
mov dx,20
cmp cx,-20
jge greatequ
mov dx,30
greatequ:
Here are the same lines rewritten to use an anonymous label:
; DX is 20, unless CX is less than -20, then make DX 30
mov dx,20
cmp cx,-20
jge @F
mov dx,30
@@:
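The following sketch combines @B and @F in one short program: the first anonymous label marks the top of a loop, and the second marks the end of a conditional block. It assumes simplified segment directives and the small model; the names values and result, and the test data, are hypothetical.

```asm
            DOSSEG
            .MODEL  small
            .STACK  100h
            .DATA
values      DW      3,-8,12,-20,5       ; Hypothetical test data (sum is -8)
result      DW      ?
            .CODE
start:      mov     ax,@data
            mov     ds,ax
            xor     ax,ax               ; Running total
            mov     cx,5                ; Number of words to add
            mov     si,OFFSET values
@@:         add     ax,[si]             ; Add the next word
            add     si,2
            loop    @B                  ; Back to nearest preceding @@
            cmp     ax,0
            jge     @F                  ; Forward to nearest following @@
            neg     ax                  ; Total was negative: make it positive
@@:         mov     result,ax           ; Store absolute value of the total
            mov     ax,4C00h            ; Exit to DOS
            int     21h
            END     start
```

Because each reference resolves to the nearest @@ in the given direction, inserting another anonymous label between a jump and its target would silently change the destination. This is one reason to keep such blocks short.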
6.4.3 Procedure Labels
The easiest way to declare a procedure is to use the PROC and ENDP
directives. The former declares the beginning of the procedure, and the
latter declares the end.
The PROC directive has the following syntax:
label PROC [[NEAR|FAR]]
statements
RET [[constant]]
label ENDP
The label assigns a symbol to the procedure. The distance can be NEAR or
FAR. Any RET instructions within the procedure automatically have the same
distance (NEAR or FAR) as the procedure.
The syntax shown here is always available. In addition, there is an
extended PROC syntax available if you use .MODEL and specify a language.
The extended PROC syntax is explained in Section 15.3.4, "Declaring
Parameters with the PROC Directive."
The ENDP directive labels the address where the procedure ends. Every
procedure label must have a matching ENDP label to mark the end of the
procedure. QuickAssembler generates an error message if it does not find
an ENDP directive to match each PROC directive.
When the PROC label definition is encountered, the assembler sets the
label's value to the current value of the location counter and sets its
type to NEAR or FAR. If the label has FAR type, the assembler also sets
its segment value to that of the enclosing segment. If you have specified
full segment definitions, the default distance is NEAR. If you are using
simplified segment directives, the default distance is the distance
associated with the declared memory model──that is, NEAR for small and
compact models or FAR for medium, large, and huge models.
The procedure label can be used in a CALL instruction to direct execution
control to the first instruction of the procedure. Control can be
transferred to a NEAR procedure label from any address in the same segment
as the label. Control can be transferred to a FAR procedure label from an
address in any segment.
Procedure labels must be declared with the PUBLIC and EXTRN directives if
they are located in one module but called from another module, as
described in Chapter 8, "Creating Programs from Multiple Modules."
Example
call task ; Call procedure
.
.
.
task PROC NEAR ; Start of procedure
.
.
.
ret
task ENDP ; End of procedure
6.4.4 Code Labels Defined with the LABEL Directive
The LABEL directive provides an alternative method of defining code
labels.
Syntax
name LABEL distance
The name is the symbol name assigned to the label. The distance can be a
type specifier, such as NEAR, FAR, or PROC. PROC means NEAR or FAR,
depending on the default memory model, as described in Section 5.1.3,
"Defining Basic Attributes of the Module." You can use the LABEL directive
to define a second entry point into a procedure. FAR code labels can also
be the destination of far jumps or of far calls that return with the RETF
instruction (see Section 15.3.2, "Defining Procedures").
Example
task PROC FAR ; Main entry point
.
.
.
task1 LABEL FAR ; Secondary entry point
.
.
.
ret
task ENDP ; End of procedure
6.5 Defining and Initializing Data
The data-definition directives enable you to allocate memory for data. At
the same time, you can specify the initial values for the allocated data.
Data can be specified as numbers, strings, or expressions that evaluate to
constants. The assembler translates these constant values into binary
bytes, words, or other units of data. The encoded data is written to the
object file at assembly time.
6.5.1 Variables
Variables consist of one or more named data objects of a specified size.
Syntax
[[name]] directive initializer [[,initializer]]...
The name is the symbol name assigned to the variable. If no name is
assigned, the data is allocated, but the starting address of the variable
has no symbolic name.
The size of the variable is determined by directive. The directives that
can be used to define single-item data objects are listed below:
Directive Meaning
──────────────────────────────────────────────────────────────────────────
DB Defines byte
DW Defines word (2 bytes)
DD Defines doubleword (4 bytes)
DQ Defines quadword (8 bytes)
DT Defines 10-byte variable
Each initializer can be a constant, an expression that evaluates
to a constant, or a question mark (?). The question mark is the symbol
indicating that the value of the variable is undefined. You can define
multiple values by using multiple initializers separated by commas, or by
using the DUP operator, as explained in Section 6.5.2, "Arrays and
Buffers."
Simple data types can allocate memory for integers, strings, addresses, or
real numbers.
6.5.1.1 Integer Variables
When defining an integer variable, you can specify an initial value as an
integer constant or as a constant expression. QuickAssembler generates an
error if you specify an initial value too large for the specified
variable.
Integer values for all sizes except 10-byte variables are stored in binary
form. They can be interpreted as either signed or unsigned numbers. For
instance, the hexadecimal value 0FFCDh can be interpreted either as the
signed number -51 or the unsigned number 65,485.
The processor cannot tell the difference between signed and unsigned
numbers, although some instructions are designed specifically for signed numbers.
It is the programmer's responsibility to decide whether a value is to be
interpreted as signed or unsigned, and then to use the appropriate
instructions to handle the value correctly.
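The sketch below shows one way the same word can give different results depending on the instruction chosen. It compares 0FFCDh with 100, first with an unsigned conditional jump (JBE) and then with a signed one (JGE). It assumes simplified segment directives and the small model; the names value, unsdone, and sgndone are hypothetical.

```asm
            DOSSEG
            .MODEL  small
            .STACK  100h
            .DATA
value       DW      0FFCDh              ; -51 signed, 65,485 unsigned
            .CODE
start:      mov     ax,@data
            mov     ds,ax
            xor     bx,bx               ; BX collects the results
            mov     ax,value
            cmp     ax,100
            jbe     unsdone             ; Unsigned: 65,485 > 100, so no jump
            or      bx,1                ; Bit 0 set: "above 100" as unsigned
unsdone:    cmp     ax,100
            jge     sgndone             ; Signed: -51 < 100, so no jump
            or      bx,2                ; Bit 1 set: "less than 100" as signed
sgndone:    mov     ax,4C00h            ; Exit to DOS (BX = 3 at this point)
            int     21h
            END     start
```

The CMP instruction is identical in both tests; only the conditional jump that follows decides whether the flags are read as a signed or an unsigned comparison.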
Figure 6.1 shows various integer storage formats.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 6.5.1.1 of the manual │
└────────────────────────────────────────────────────────────────────────┘
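Since the figure is not reproduced here, the following data-only sketch illustrates the byte order used by 8086-family processors: the low byte of a word, and the low word of a doubleword, is stored at the lowest address. The segment and variable names are hypothetical.

```asm
DATA        SEGMENT WORD PUBLIC 'DATA'
w1          DW      1234h               ; Stored in memory as 34 12
w2          DB      34h,12h             ; The same two bytes written by hand
d1          DD      12345678h           ; Stored in memory as 78 56 34 12
d2          DB      78h,56h,34h,12h     ; The same four bytes written by hand
DATA        ENDS
            END
```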
The directives for defining integer variables are listed below with the
sizes of integer they can define:
Directive Size of Integer
──────────────────────────────────────────────────────────────────────────
DB (bytes) Allocates unsigned numbers from 0 to 255 or signed
numbers from -128 to 127.
DW (words) Allocates unsigned numbers from 0 to 65,535 or signed
numbers from -32,768 to 32,767. The bytes of a word
integer are stored in the format shown in Figure 6.1.
Note that in assembler listings and in most debuggers
the bytes of a word are shown in the opposite
order──high byte first──since this is the way most
people think of numbers.
Word values can be used directly in 8086-family
instructions. They can also be loaded, used in
calculations, and stored with 8087-family
instructions.
DD (doublewords) Allocates unsigned numbers from 0 to 4,294,967,295 or
signed numbers from -2,147,483,648 to 2,147,483,647.
The words of a doubleword integer are stored in the
format shown in Figure 6.1.
These 32-bit values (called long integers) can be
loaded, used in calculations, and stored with
8087-family instructions. Some calculations can be
done on these numbers directly with 16-bit 8086-family
processors; others involve an indirect method of doing
calculations on each word separately (see Chapter 14,
"Doing Arithmetic and Bit Calculations").
DQ (quadwords) Allocates 64-bit integers. The doublewords of a
quadword integer are stored in the format shown in
Figure 6.1.
These values can be loaded, used in calculations, and
stored with 8087-family instructions. You must write
your own routines to use them with 16-bit 8086-family
processors.
DT Allocates 10-byte (80-bit) integers if the D radix
specifier is used.
By default, DT allocates packed binary coded decimal
(BCD) numbers, as described in Section 6.5.1.2,
"Binary Coded Decimal Variables." If you define binary
10-byte integers, you must write your own routines to
use them in calculations.
Example
integer DB 16 ; Initialize byte to 16
expression DW 4*3 ; Initialize word to 12
empty DQ ? ; Allocate uninitialized quadword integer
DB 1,2,3,4,5,6 ; Initialize six unnamed bytes
long DD 4294967295 ; Initialize double word to 4,294,967,295
tb DT 2345d ; Initialize 10-byte binary integer
6.5.1.2 Binary Coded Decimal Variables
Binary coded decimal (BCD) numbers provide a method of doing calculations
on large numbers without rounding errors. They are sometimes used in
financial applications. There are two kinds: packed and unpacked.
Unpacked BCD numbers are stored one digit to a byte, with the value in the
lower four bits. They can be defined with the DB directive. For example,
an unpacked BCD number could be defined and initialized as shown below:
unpackedr DB 1,5,8,2,5,2,9 ; Initialized to 9,252,851
unpackedf DB 9,2,5,2,8,5,1 ; Initialized to 9,252,851
Whether the least-significant digit comes first or last depends on
how you write the calculation routines that handle the numbers.
Calculations with unpacked BCD numbers are discussed in Section 14.5.1.
Packed BCD numbers are stored two digits to a byte, with one digit in the
lower four bits and one in the upper four bits. The leftmost bit holds the
sign (0 for positive or 1 for negative).
Packed BCD variables can be defined with the DT directive as shown below:
packed DT 9252851 ; Allocate 9,252,851
The 8087-family processors can do fast calculations with packed BCD
numbers, as described in Chapter 17, "Calculating with a Math
Coprocessor." The 8086-family processors can also do some calculations
with packed BCD numbers, but the process is slower and more complicated.
See Section 14.5.2 for details.
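As a sketch of the packed layout described above, the following data-only module defines 9,252,851 with DT and then writes out, byte by byte, the ten bytes that the packed format is expected to occupy: two digits per byte with the least-significant pair first, and the sign in the high bit of the last byte. The segment and variable names are hypothetical.

```asm
DATA        SEGMENT WORD PUBLIC 'DATA'
unpacked    DB      1,5,8,2,5,2,9       ; 9,252,851, one digit per byte,
                                        ;   least-significant digit first
packed      DT      9252851             ; 9,252,851 as packed BCD
byhand      DB      51h,28h,25h,09h     ; Digit pairs 51, 28, 25, 09
            DB      5 DUP (0)           ; Unused digit pairs are zero
            DB      00h                 ; Sign byte: 00h positive, 80h negative
DATA        ENDS
            END
```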
6.5.1.3 String Variables
Strings are normally initialized with the DB directive. The initializing
value is specified as a string constant. Strings can also be initialized
by specifying each value in the string. For example, the following
definitions are equivalent:
version1 DB 97,98,99 ; As ASCII values
version2 DB 'a','b','c' ; As characters
version3 DB "abc" ; As a string
One- and two-character strings can also be initialized with any of the
other data-definition directives. The last (or only) character in the
string is placed in the byte with the lowest address. Either 0 or the
first character is placed in the next byte. The unused portion of such
variables is filled with zeros.
Examples
function9 DB 'Hello',13,10,'$' ; Use with DOS INT 21h
; function 9
asciiz DB "\ASM\TEST.ASM",0 ; Use as ASCIIZ string
message DB "Enter file name: " ; Use with DOS INT 21h
l_message EQU $-message ; function 40h
a_message EQU OFFSET message
str1 DB "ab" ; Stored as 61 62
str2 DD "ab" ; Stored as 62 61 00 00
str3 DD "a" ; Stored as 61 00 00 00
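The comments in the examples above mention two DOS functions. The following sketch shows how such strings might be passed to them, assuming simplified segment directives and the small model. Function 9 displays a string terminated by a dollar sign, and function 40h writes a given number of bytes to a file handle (handle 1 is standard output).

```asm
            DOSSEG
            .MODEL  small
            .STACK  100h
            .DATA
function9   DB      'Hello',13,10,'$'   ; '$' marks the end for function 9
message     DB      "Enter file name: "
l_message   EQU     $-message           ; Length in bytes (17)
            .CODE
start:      mov     ax,@data
            mov     ds,ax
            mov     dx,OFFSET function9 ; DS:DX = address of string
            mov     ah,9                ; DOS function 9: display string
            int     21h
            mov     bx,1                ; Handle 1 = standard output
            mov     cx,l_message        ; Number of bytes to write
            mov     dx,OFFSET message   ; DS:DX = address of data
            mov     ah,40h              ; DOS function 40h: write to handle
            int     21h
            mov     ax,4C00h            ; Exit to DOS
            int     21h
            END     start
```

Computing the length with $-message immediately after the string keeps the count correct if the text is later edited.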
6.5.1.4 Real-Number Variables
Real numbers must be stored in binary format. However, when initializing
variables, you can specify decimal or hexadecimal constants and let the
assembler automatically encode them into their binary equivalents.
QuickAssembler can use two different binary formats for real numbers: IEEE
or Microsoft Binary. You can specify the format by using directives (IEEE
is the default).
This section tells you how to initialize real-number variables, describes
the two binary formats, and explains real-number encoding.
Initializing and Allocating Real-Number Variables
Real numbers can be defined by initializing them either with real-number
constants or with encoded hexadecimal constants. The real-number
designator (R) must follow numbers specified in encoded format.
The directives for defining real numbers are listed below with the sizes
of the numbers they can allocate:
Directive Size
──────────────────────────────────────────────────────────────────────────
DD Allocates short (32-bit) real numbers in either the
IEEE or Microsoft Binary format.
DQ Allocates long (64-bit) real numbers in either the
IEEE or Microsoft Binary format.
DT Allocates temporary or 10-byte (80-bit) real numbers.
The format of these numbers is similar to the IEEE
format. They are always encoded the same regardless of
the real-number format. Their size is nonstandard and
incompatible with most Microsoft high-level languages.
Temporary-real format is provided for those who want
to initialize real numbers in the format used
internally by 8087-family processors.
The 8086-family microprocessors do not have any instructions for handling
real numbers. You must write your own routines, use a library that
includes real-number calculation routines, or use a coprocessor. The
8087-family coprocessors can load real numbers in the IEEE format; they
can also use the values in calculations and store the results back to
memory, as explained in Chapter 17, "Calculating with a Math
Coprocessor."
Examples
shrt DD 98.6 ; QuickAsm automatically encodes
long DQ 5.391E-4 ; in current format
ten_byte DT -7.31E7
eshrt DD 87453333r ; 98.6 encoded in Microsoft
; Binary format
elong DQ 3F41AA4C6F445B7Ar ; 5.391E-4 encoded in IEEE format
The real-number designator (R) used to specify encoded numbers is
explained in Section 6.1.3, "Real-Number Constants."
Selecting a Real-Number Format
QuickAssembler can encode four-byte and eight-byte real numbers in two
different formats: IEEE and Microsoft Binary. Your choice depends on the
type of program you are writing. The four primary alternatives are listed
below:
1. If your program requires a coprocessor for calculations, you must use
the IEEE format.
2. Most high-level languages use the IEEE format. If you are writing
modules that will be called from such a language, your program should
use the IEEE format. All versions of the C, FORTRAN, and Pascal
compilers sold by Microsoft and IBM use the IEEE format.
3. If you are writing a module that will be called from early versions of
Microsoft or IBM BASIC, your program should use the Microsoft Binary
format. Versions that support only the Microsoft Binary format include:
■ Microsoft QuickBASIC through Version 2.01
■ Microsoft BASIC Compiler through Version 5.3
■ IBM BASIC Compiler through Version 2.0
■ Microsoft GW-BASIC(R) interpreter (all versions)
■ IBM BASICA interpreter (all versions)
Microsoft QuickBASIC Version 3.0 supported both the Microsoft Binary
and IEEE formats as options. Current and future versions of Microsoft
QuickBASIC and the Microsoft and IBM BASIC compilers support only the
IEEE format.
4. If you are creating a stand-alone program that does not use a
coprocessor, you can choose either format. The IEEE format is better
for overall compatibility with high-level languages. The Microsoft
Binary format may be necessary for compatibility with existing source
code.
──────────────────────────────────────────────────────────────────────────
NOTE When you interface assembly-language modules with high-level
languages, the real-number format only matters if you initialize
real-number variables in the assembly module. If your assembly module does
not use real numbers, or if all real numbers are initialized in the
high-level-language module, the real-number format does not make any
difference.
──────────────────────────────────────────────────────────────────────────
By default, QuickAssembler assembles real-number data in the IEEE format.
If you wish to use the Microsoft Binary format, you must put the .MSFLOAT
directive at the start of your source file before initializing any
real-number variables.
Real-Number Encoding
The IEEE format for encoding four- and eight-byte real numbers is
illustrated in Figure 6.2. Although this figure places the
most-significant bit first for illustration, low bytes actually appear
first in memory.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 6.5.1.4 of the manual │
└────────────────────────────────────────────────────────────────────────┘
The parts of the real numbers are described below:
1. Sign bit (0 for positive or 1 for negative) in the upper bit of the
first byte.
2. Exponent in the next bits in sequence (8 bits for short real number or
11 bits for long real number).
3. All except the first set bit of mantissa in the remaining bits of the
variable. Since the first significant bit is known to be set, it need
not be actually stored. The length is 23 bits for short real numbers
and 52 bits for long real numbers.
The Microsoft Binary format for encoding real numbers is illustrated in
Figure 6.3.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 6.5.1.4 of the manual │
└────────────────────────────────────────────────────────────────────────┘
The three parts of real numbers are described below:
1. Biased exponent (8 bits) in the high-address byte. The bias is 81H for
both short and long real numbers.
2. Sign bit (0 for positive or 1 for negative) in the upper bit of the
second-highest byte.
3. All except the first set bit of mantissa in the remaining 7 bits of the
second-highest byte and in the remaining bytes of the variable. Since
the first significant bit is known to be set, it need not be actually
stored. The length is 23 bits for short real numbers and 55 bits for
long real numbers.
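As a worked sketch of the two short-real layouts, consider 98.6, the value used in the earlier examples. Written in binary scientific form, 98.6 is 1.540625 x 2^6, and the 23 stored mantissa bits (with the leading 1 dropped) round to 453333h. The IEEE exponent bias of 7Fh is a property of the IEEE short format and is not stated elsewhere in this section. The segment and variable names are hypothetical.

```asm
DATA        SEGMENT WORD PUBLIC 'DATA'
; 98.6 = 1.540625 x 2^6
; Stored mantissa (23 bits, leading 1 dropped) = 453333h
;
; IEEE short real:
;   sign 0 | exponent 6+7Fh = 85h | mantissa 453333h
ieee986     DD      42C53333r           ; Bytes in memory: 33 33 C5 42
;
; Microsoft Binary short real:
;   exponent 6+81h = 87h | sign 0 | mantissa 453333h
msb986      DD      87453333r           ; Bytes in memory: 33 33 45 87
DATA        ENDS
            END
```

The mantissa bits are the same in both formats. Only the position of the sign bit and the exponent, and the exponent bias, differ.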
QuickAssembler also supports the 10-byte temporary-real format used
internally by 8087-family coprocessors. This format is similar to IEEE
format. The size is nonstandard and is not used by Microsoft compilers or
interpreters. Since the coprocessors can load and automatically convert
numbers in the more standard 4-byte and 8-byte formats, the 10-byte format
is seldom used in assembly-language programming.
The temporary-real format for encoding real numbers is illustrated in
Figure 6.4.
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 6.5.1.4 of the manual │
└────────────────────────────────────────────────────────────────────────┘
The four parts of the real numbers are described below:
1. Sign bit (0 for positive or 1 for negative) in the upper bit of the
first byte.
2. Exponent in the next bits in sequence (15 bits for 10-byte real).
3. The integer part of mantissa in the next bit in sequence (bit 63).
4. Remaining bits of mantissa in the remaining bits of the variable. The
length is 63 bits.
Notice that the 10-byte temporary-real format stores the integer part of
the mantissa. This differs from the 4-byte and 8-byte formats, in which
the integer part is implicit.
6.5.2 Arrays and Buffers
Arrays, buffers, and other data structures consisting of multiple data
objects of the same size can be defined with the DUP operator. This
operator can be used with any of the data-definition directives described
in this chapter.
Syntax
count DUP (initialvalue[[,initialvalue]]...)
The count sets the number of times to define initialvalue. The initial
value can be any expression that evaluates to an integer value, a
character constant, or another DUP operator. It can also be the undefined
symbol (?) if there is no initial value.
Multiple initial values must be separated by commas. If multiple values
are specified within the parentheses, the sequence of values is allocated
count times. For example, the statement
DB 5 DUP ("Text ")
allocates the string "Text " five times for a total of 25 bytes.
DUP operators can be nested up to 17 levels. The initial value (or values)
must always be placed within parentheses.
Examples
array DD 10 DUP (1) ; 10 doublewords
; initialized to 1
buffer DB 256 DUP (?) ; 256 byte buffer
masks DB 20 DUP (040h,020h,04h,02h) ; 80 byte buffer
; with bit masks
DB 32 DUP ("I am here ") ; 320 byte buffer with
; signature for debugging
three_d DD 5 DUP (5 DUP (5 DUP (0))) ; 125 doublewords
; initialized to 0
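As a further sketch, the location counter ($) can be used to confirm how many bytes a DUP expression allocates, in the same way l_message is computed in Section 6.5.1.3. The segment and symbol names here are hypothetical.

```asm
DATA        SEGMENT WORD PUBLIC 'DATA'
table       DB      2 DUP (1, 3 DUP (0)) ; Allocates 1,0,0,0,1,0,0,0
l_table     EQU     $-table             ; 8 bytes
grid        DW      4 DUP (3 DUP (?))   ; 4 x 3 = 12 words
l_grid      EQU     $-grid              ; 24 bytes
DATA        ENDS
            END
```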
Note that QuickAssembler sometimes generates different object code when
the DUP operator is used than when multiple values are given. For
example, the statement
test1 DB ?,?,?,?,? ; Indeterminate
is "indeterminate." It causes QuickAssembler to write five zero-value
bytes to the object file. The statement
test2 DB 5 DUP (?) ; Undefined
is "undefined." It causes QuickAssembler to increase the offset of the
next record in the object file by five bytes. Therefore, an object file
created with the first statement will be larger than one created with the
second statement.
In most cases, the distinction between indeterminate and undefined
definitions is trivial. The linker adjusts the offsets so that the same
executable file is generated in either case. However, the difference is
significant in segments with the COMMON combine type. If COMMON segments
in two modules contain definitions for the same variable, one with an
indeterminate value and one with an explicit value, the actual value in
the executable file varies depending on link order. If the module with the
indeterminate value is linked last, the 0 initialized for it overrides the
explicit value. You can prevent this by always using undefined rather than
indeterminate values in COMMON segments. For example, use the first of the
following statements:
test3 DB 1 DUP (?) ; Undefined - doesn't initialize
test4 DB ? ; Indeterminate - initializes 0
If you use the undefined definition, the explicit value is always used in
the executable file regardless of link order.
6.5.3 Labeling Variables
The LABEL directive can be used to define a variable of a given size at a
specified location. It is useful if you want to refer to the same data as
variables of different sizes.
Syntax
name LABEL type
The name is the symbol assigned to the variable, and type is the variable
size. The type can be any one of the following type specifiers: BYTE,
WORD, DWORD, QWORD, or TBYTE. It can also be the name of a previously
defined structure.
Examples
warray LABEL WORD ; Access array as 50 words
darray LABEL DWORD ; Access same array as 25 doublewords
barray DB 100 DUP(?) ; Access same array as 100 bytes
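The sketch below shows how two such names might be used on the same storage: two bytes are stored through the byte-sized name and then read back as one word through the word-sized name. It assumes simplified segment directives and the small model.

```asm
            DOSSEG
            .MODEL  small
            .STACK  100h
            .DATA
warray      LABEL   WORD                ; Word-sized name
barray      DB      100 DUP (?)         ; Byte-sized name, same address
            .CODE
start:      mov     ax,@data
            mov     ds,ax
            mov     barray,34h          ; Store the low byte
            mov     barray+1,12h        ; Store the high byte
            mov     ax,warray           ; Read both as a word: AX = 1234h
            mov     ax,4C00h            ; Exit to DOS
            int     21h
            END     start
```

Because each name carries its own size, no PTR override is needed on any of the instructions.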
6.5.4 Pointer Variables
The assembler lets you explicitly allocate pointers. A pointer (address)
variable is either two or four bytes in size; consequently, any word or
doubleword data definition can create a pointer variable. However,
declaring pointer variables explicitly gives more accurate debugging
information to the environment.
Pointer-variable definitions have the following form:
symbol [[DW | DD]] type PTR initialvalue
The type can consist of the name of a record, structure, or one of the
standard types described in Section 6.3, "Using Type Specifiers."
Example
string DB "from swerve of shore to bend of bay", 0
pstring DW BYTE PTR string ; Declares a near pointer to string.
fpstring DD BYTE PTR string ; Declares a far pointer to string.
In this example, near (two-byte) and far (four-byte) pointers are declared
and initialized to the address of a null-terminated string. This is the
string format used in the C language, and the pointer variables in the
example could be used in C functions that process strings.
Using an explicit pointer declaration generates debugging information,
causing the variable to be viewed as a pointer during debugging.
Consequently, the environment properly interprets the variable when you
enter it as a Watch expression. No special syntax is required.
This use of PTR is distinct from the use of PTR to alter the type of a
variable during an instruction. The assembler uses the context of the
program to determine which way you are using the PTR keyword.
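The following sketch shows one way pointer variables like these might be used at run time: the near pointer is loaded into a base register, and the far pointer is loaded into a segment:offset register pair with LES. It assumes simplified segment directives and the small model; the short string is hypothetical test data.

```asm
            DOSSEG
            .MODEL  small
            .STACK  100h
            .DATA
string      DB      "abc",0
pstring     DW      BYTE PTR string     ; Near pointer (offset only)
fpstring    DD      BYTE PTR string     ; Far pointer (segment and offset)
            .CODE
start:      mov     ax,@data
            mov     ds,ax
            mov     bx,pstring          ; BX = offset of string
            mov     al,[bx]             ; AL = 'a'
            les     di,fpstring         ; ES:DI = far address of string
            mov     ah,es:[di+1]        ; AH = 'b'
            mov     ax,4C00h            ; Exit to DOS
            int     21h
            END     start
```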
6.6 Setting the Location Counter
As the assembler encounters labels and data declarations, it needs to know
what address to assign. This function is fulfilled by the location
counter, which indicates the offset address corresponding to the current
line of source code. This value is generally equal to the value that IP
will have at run time.
The assembler increments the location counter as it processes each
statement. However, you can set the location counter directly by using the
ORG directive.
Syntax
ORG expression
Subsequent code and data offsets begin at the new offset specified by
expression. The expression must resolve to a constant number. In other
words, all symbols used in the expression must be known on the first pass
of the assembler.
──────────────────────────────────────────────────────────────────────────
NOTE The value of the location counter, represented by the dollar sign
($), can be used in the expression, as described in Section 9.3, "Using
the Location Counter."
──────────────────────────────────────────────────────────────────────────
Example 1
; Labeling absolute addresses
STUFF SEGMENT AT 0 ; Segment has constant value 0
ORG 410h ; Offset has constant value 410h
equipment LABEL WORD ; Value at 0000:0410 labeled "equipment"
ORG 417h ; Offset has constant value 417h
keyboard LABEL WORD ; Value at 0000:0417 labeled "keyboard"
STUFF ENDS
.CODE
.
.
.
ASSUME ds:STUFF ; Tell the assembler
mov ax,STUFF ; Tell the processor
mov ds,ax
mov dx,equipment
mov keyboard,ax
Example 1 illustrates one way of assigning symbolic names to absolute
addresses.
Example 2
; Format for .COM files
_TEXT SEGMENT
ASSUME cs:_TEXT,ds:_TEXT,ss:_TEXT,es:_TEXT
ORG 100h ; Skip 100h bytes of DOS header
entry: jmp begin ; Jump over data
variable DW ? ; Put more data here
.
begin: . ; First line of code
. ; Put more code here
_TEXT ENDS
END entry
Example 2 illustrates how the ORG directive is used to initialize the
starting execution point in .COM files.
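As a further sketch, the location counter symbol ($) can appear in the ORG expression to skip a number of bytes relative to the current position, rather than moving to an absolute offset. The segment and variable names are hypothetical, and the offsets in the comments assume that the segment begins at offset 0 in this module.

```asm
DATA        SEGMENT WORD PUBLIC 'DATA'
header      DB      "HDR"               ; Offsets 0-2
            ORG     $+5                 ; Skip 5 bytes: offsets 3-7
body        DB      1,2,3               ; Starts at offset 8
DATA        ENDS
            END
```

The skipped bytes are not given initial values; ORG only moves the location counter.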
6.7 Aligning Data
Some operations are more efficient when the variable used in the operation
is lined up on a boundary of a particular size. The EVEN and ALIGN
directives can be used to pad the object file so that the next variable is
aligned on a specified boundary.
Syntax 1
EVEN
Syntax 2
ALIGN number
The EVEN directive always aligns on the next even byte. The ALIGN
directive aligns on the next byte that is a multiple of number. The number
must be a power of 2. For example, use ALIGN 2 or EVEN to align on word
boundaries, or use ALIGN 4 to align on doubleword boundaries.
If the value of the location counter is not on the specified boundary when
an ALIGN directive is encountered, the location counter is incremented to
a value on the boundary. If the location counter is already on the
boundary, the directive has no effect.
When the assembler increments the location counter, it also pads the
skipped memory locations by inserting certain values. If the segment is a
data segment, the assembler always pads these locations with zeros. If the
segment is a code segment, the assembler pads skipped memory locations
with a two-byte no-op instruction (an instruction that performs no
operation) where possible:
xchg bx,bx
This instruction, which assembles as 87D8 hexadecimal, does nothing, but
it executes faster than two NOP instructions. Where there is no room for
the two-byte no-op, the assembler inserts the one-byte NOP instruction.
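The padding rules above can be sketched for a data segment as follows. The segment is declared with the PARA align type so that ALIGN 4 does not exceed the alignment of the segment. The names are hypothetical, and the offsets in the comments assume that the segment begins at offset 0 in this module.

```asm
DATA        SEGMENT PARA PUBLIC 'DATA'  ; PARA align type permits ALIGN 4
flag        DB      1                   ; Offset 0
            ALIGN   4                   ; Pads offsets 1-3 with zeros
total       DD      0                   ; Offset 4
status      DB      2                   ; Offset 8
            EVEN                        ; Pads offset 9 with a zero
count       DW      0                   ; Offset 10
            ALIGN   2                   ; Already on a word boundary: no effect
limit       DW      0                   ; Offset 12
DATA        ENDS
            END
```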
The ALIGN and EVEN directives give no efficiency improvements on
processors that have an 8-bit data bus (such as the 8088). These
processors always fetch data one byte at a time, regardless of the
alignment. However, using EVEN can speed certain operations on processors
that have a 16-bit data bus (such as the 8086), since the processor can
fetch a word in one memory access if the data is word aligned, but must do
two memory fetches if the data is not word aligned.
──────────────────────────────────────────────────────────────────────────
NOTE The ALIGN directive is a new feature of recent versions of Microsoft
assemblers, starting with 5.0. In previous versions, data could be word
aligned by using the EVEN directive, but other alignments could not be
specified.
The EVEN directive should not be used in segments with BYTE align type.
Similarly, the number specified with the ALIGN directive should be no
greater than the size of the align type of the current segment (thus
ensuring that number is a divisor of the align type of the segment).
──────────────────────────────────────────────────────────────────────────
Example
DOSSEG
.MODEL small,c
.STACK 100h
.DATA
.
.
.
ALIGN 2 ; For faster data access
stuff DW 66,124,573,99,75
.
.
.
ALIGN 2 ; For faster data access
evenstuff DW ?,?,?,?,?
.CODE
start: mov ax,@data ; Load segment location
mov ds,ax ; into DS
mov es,ax ; and ES registers
mov cx,5 ; Load count
mov si,OFFSET stuff ; Point to source
mov di,OFFSET evenstuff ; and destination
ALIGN 2 ; Align for faster loop access
mloop: lodsw ; Load a word
inc ax ; Make it even by incrementing
and ax,NOT 1 ; and turning off first bit
stosw ; Store
loop mloop ; Again
.
.
.
In this example, the words at stuff and evenstuff are forced to word
boundaries. This makes access to the data faster with processors that have
a 16-bit data bus. Without this alignment, the initial data might start on
an odd boundary and the processor would have to fetch half of each word at
a time.
Similarly, the alignment in the code segment speeds up repeated access to
the code at the start of the loop. The sample code sacrifices program size
in order to achieve moderate improvements on the 8086 and 80286. There is
no speed advantage on the 8088.
────────────────────────────────────────────────────────────────────────────
Chapter 7: Using Structures and Records
QuickAssembler can define and use two kinds of multifield variables:
structures and records.
"Structures" are templates for data objects made up of smaller data
objects. A structure can be used to define structure variables, which are
made up of smaller variables called fields. Fields within a structure can
be different sizes, and each can be accessed individually.
"Records" are templates for data objects whose bits can be described as
groups of bits called fields. A record can be used to define record
variables. Each bit field in a record variable can be used separately in
constant operands or expressions. The processor cannot access bits
individually at run time, but bit fields can be used with logical bit
instructions to change bits indirectly.
This chapter describes structures and records and tells how to use them.
7.1 Structures
A structure variable is a collection of data objects that can be accessed
symbolically as a single data object. Objects within the structure can
have different sizes and can be accessed symbolically.
There are two steps in using structure variables:
1. Declare a structure type. A structure type is a template for data. It
declares the sizes and, optionally, the initial values for objects in
the structure. By itself the structure type does not define any data.
The structure type is used by QuickAssembler during assembly but is not
saved as part of the object file.
2. Define one or more variables having the structure type. For each
variable defined, memory is allocated to the object file in the format
declared by the structure type.
The structure variable can then be used as an operand in assembler
statements. The structure variable can be accessed as a whole by using the
structure name, or individual fields can be accessed by using structure
and field names.
7.1.1 Declaring Structure Types
The STRUC and ENDS directives mark the beginning and end of a type
declaration for a structure.
Syntax
name STRUC
fielddeclarations
name ENDS
The name declares the name of the structure type. It must be unique. The
fielddeclarations declare the fields of the structure. Any number of field
declarations may be given. They must follow the form of data definitions
described in Section 6.5, "Defining and Initializing Data." Default
initial values may be declared individually or with the DUP operator.
The names given to fields must be unique within the source file where they
are declared. When variables are defined, the field names will represent
the offset from the beginning of the structure to the corresponding field.
When declaring strings in a structure type, make sure the initial values
are long enough to accommodate the largest possible string. Strings
smaller than the field size can be placed in the structure variable, but
larger strings will be truncated.
A structure declaration can contain both field declarations and comments.
Conditional-assembly statements are allowed in structure declarations; no
other kinds of statements are allowed. Since the STRUC directive is not
allowed inside structure declarations, structures cannot be nested.
──────────────────────────────────────────────────────────────────────────
NOTE The ENDS directive that marks the end of a structure has the same
mnemonic as the ENDS directive that marks the end of a segment. The
assembler recognizes the meaning of the directive from context. Make sure
each SEGMENT directive and each STRUC directive has its own ENDS
directive.
──────────────────────────────────────────────────────────────────────────
Example
student STRUC ; Structure for student records
id DW ? ; Field for identification #
sname DB "Last, First Middle    " ; 22-byte field for name
scores DB 10 DUP (100) ; Field for 10 scores
student ENDS
Within the sample structure student, the fields id, sname, and scores have
the offset values 0, 2, and 24, respectively.
7.1.2 Defining Structure Variables
A structure variable is a variable with one or more fields of different
sizes. The sizes and initial values of the fields are determined by the
structure type with which the variable is defined.
Syntax
[[name]] structurename <[[initialvalue [[,initialvalue]]...]]>
The name is the name assigned to the variable. If no name is given, the
assembler allocates space for the variable, but does not give it a
symbolic name. The structurename is the name of a structure type
previously declared by using the STRUC and ENDS directives.
An initialvalue can be given for each field in the structure. Its type
must be compatible with the type of the corresponding field. The
angle brackets (< >) are required even if no initial value is given. If
initial values are given for more than one field, the values must be
separated by commas.
If the DUP operator (see Section 6.5.2, "Arrays and Buffers") is used to
initialize multiple structure variables, only the angle brackets and
initial values, if given, need to be enclosed in parentheses. For example,
you can define an array of structure variables as shown below:
war date 365 DUP (<,,1940>)
You need not initialize all fields in a structure. If an initial value is
left blank, the assembler automatically uses the default initial value of
the field, which was originally determined by the structure type. If there
is no default value, the field is undefined.
Examples
The following examples use the student type declared in the example in
Section 7.1.1, "Declaring Structure Types":
s1 student <> ; Uses default values of type
s2 student <1467,"White, Robert D.",>
; Override default values of first two
; fields--use default value of third
sarray student 100 DUP (<>) ; Declare 100 student variables
; with default initial values
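An array such as sarray is usually processed one element at a time. The
following fragment is a minimal sketch, not part of the original examples.
It assumes that the TYPE operator returns the size of the structure in
bytes (34 for student, if sname occupies 22 bytes as the offsets in Section
7.1.1 indicate). The label setid is hypothetical.

```asm
            .DATA
sarray      student 100 DUP (<>)        ; 100 student variables

            .CODE
            mov     bx,OFFSET sarray    ; BX points to first element
            mov     cx,100              ; Number of elements
setid:      mov     [bx].id,cx          ; Store count in the id field
            add     bx,TYPE student     ; Advance to next element
            loop    setid               ; Repeat for all elements
```

Using TYPE student instead of a literal 34 keeps the loop correct if fields
are later added to the structure type.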
Note that you cannot initialize any structure field that has multiple
values if this field was given a default initial value when the structure
was declared. For example, assume the following structure declaration:
stuff STRUC
buffer DB 100 DUP (?) ; Can't override
crlf DB 13,10 ; Can't override
query DB 'Filename: ' ; String of 10 bytes or fewer can override
endmark DB 36 ; Can override
stuff ENDS
The buffer and crlf fields cannot be overridden by initial values in the
structure definition because they have multiple values. The query field
can be overridden as long as the overriding string is no longer than query
(10 bytes). A longer string would generate an error. The endmark field can
be overridden by any byte value.
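The following definitions sketch which overrides the stuff type accepts
under the rules above. The variable names ask1, ask2, and ask3 are
hypothetical.

```asm
            .DATA
ask1        stuff   <>                  ; All fields take default values
ask2        stuff   <,,'Name: ',>       ; Override query (6 bytes fit in 10)
ask3        stuff   <,,'Save as: ',0>   ; Override query and endmark
```

The first two positions are left blank in every definition because buffer
and crlf cannot be overridden.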
7.1.3 Using Structure Operands
Like other variables, structure variables can be accessed by name. Fields
within structure variables can also be accessed by using the syntax shown
below:
Syntax
variable.field
The variable must be the name of a structure (or an operand that resolves
to the address of a structure). The field must be the name of a field
within that structure. The variable is separated from the field by a
period. The period is discussed as a structure-field-name operator in
Section 9.2.1.2.
The address of a structure operand is the sum of the offsets of variable
and field. The address is relative to the segment or group in which the
variable is declared.
Examples
date STRUC ; Declare structure
month DB ?
day DB ?
year DW ?
date ENDS
.DATA
yesterday date <9,30,1987> ; Declare structure
today date <10,1,1987> ; variables
tomorrow date <10,2,1987>
.CODE
.
.
.
mov al,yesterday.day ; Use structure variables
mov ah,today.month ; as operands
mov tomorrow.year,dx
mov bx,OFFSET yesterday ; Load structure address
mov al,[bx].month ; Use as indirect operand
.
.
.
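Because a field name represents an offset from the start of the structure,
each structure operand has an equivalent written with an explicit offset.
The pairs below are a sketch that assumes the date type and the variables
defined in the example above:

```asm
            mov     al,yesterday.day        ; Field operand
            mov     al,BYTE PTR yesterday+1 ; Same byte: day is at offset 1

            mov     bx,OFFSET today         ; Load structure address
            mov     dx,[bx].year            ; Field operand through BX
            mov     dx,[bx+2]               ; Same word: year is at offset 2
```

The field-name forms are preferable, since they remain correct if the
layout of the structure type changes.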
7.2 Records
A record variable is a byte or word variable in which specific bit fields
can be accessed symbolically. Bit fields within the record can have
different sizes.
There are two steps in declaring record variables:
1. Declare a record type. A record type is a template for data. It
declares the sizes and, optionally, the initial values for bit fields
in the record. By itself, the record type does not define any data. The
record type is used by QuickAssembler during assembly but is not saved
as part of the object file.
2. Define one or more variables having the record type. For each variable
defined, memory is allocated to the object file in the format declared
by the type.
The record variable can then be used as an operand in assembler
statements. The record variable can be accessed as a whole by using the
record name, or individual fields can be specified by using the record
name and a field name combined with the field-name operator. A record type
can also be used as a constant (immediate data).
7.2.1 Declaring Record Types
The RECORD directive declares a record type for an 8-bit or 16-bit record
that contains one or more bit fields.
Syntax
recordname RECORD field [[,field]]...
The recordname is the name of the record type to be used when creating the
record. The field declares the name, width, and initial value for the
field.
The syntax for each field is shown below:
Syntax
fieldname:width[[=expression]]
The fieldname is the name of a field in the record, width is the number of
bits in the field, and expression is the initial (or default) value for
the field.
Any number of field combinations can be given for a record, as long as
each is separated from its predecessor by a comma. The sum of the widths
for all fields must not exceed 16 bits.
The width must be a constant. If the total width of all declared fields is
larger than eight bits, the assembler uses two bytes. Otherwise, only one
byte is used.
If expression is given, it declares the initial value for the field. An
error message is generated if an initial value is too large for the width
of its field. If the field is at least seven bits wide, you can use an
ASCII character for expression. The expression must not contain a forward
reference to any symbol.
In all cases, the first field you declare goes into the most significant
bits of the record. Successively declared fields are placed in the
succeeding bits to the right. If the fields you declare do not total
exactly 8 bits or exactly 16 bits, the entire record is shifted right so
that the last bit of the last field is the lowest bit of the record.
Unused bits in the high end of the record are initialized to 0.
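The following sketch shows the right shift for a record whose fields total
fewer than 8 bits. The record type devstat, its fields, and the variable
state are hypothetical names used only for illustration.

```asm
devstat     RECORD  ready:1,mode:2,fault:2  ; 5 bits total: one byte is used
                                            ; Bit 4     = ready
                                            ; Bits 3-2  = mode
                                            ; Bits 1-0  = fault
                                            ; Bits 7-5  = unused (always 0)
            .DATA
state       devstat <1,2,1>                 ; 000 1 10 01 = 19h
```

The last field, fault, ends at bit 0, and the three unused bits fall at the
high end of the byte.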
Example 1
color RECORD blink:1,back:3,intense:1,fore:3
Example 1 creates a byte record type color having four fields: blink,
back, intense, and fore. The contents of the record type are shown below:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 7.2.1 of the manual │
└────────────────────────────────────────────────────────────────────────┘
Since no initial values are given, all bits are set to 0. Note that this
is only a template maintained by the assembler. No data is created.
Example 2
cw RECORD r1:3=0,ic:1=0,rc:2=0,pc:2=3,r2:2=1,masks:6=63
Example 2 creates a record type cw having six fields. Each record declared
by using this type occupies 16 bits of memory. The bit diagram below shows
the contents of the record type:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 7.2.1 of the manual │
└────────────────────────────────────────────────────────────────────────┘
Default values are given for each field. They can be used when data is
declared for the record.
7.2.2 Defining Record Variables
A record variable is an 8-bit or 16-bit variable whose bits are divided
into one or more fields.
Syntax
[[name]] recordname <[[initialvalue [[,initialvalue]]...]]>
The name is the symbolic name of the variable. If no name is given, the
assembler allocates space for the variable, but does not give it a
symbolic name. The recordname is the name of a record type that was
previously declared by using the RECORD directive.
An initialvalue for each field in the record can be given as an integer, a
character constant, or an expression that resolves to a value compatible
with the size of the field. Angle brackets (< >) are required even if no
initial value is given. If initial values for more than one field are
given, the values must be separated by commas.
If the DUP operator (see Section 6.5.2, "Arrays and Buffers") is used to
initialize multiple record variables, only the angle brackets and initial
values, if given, need to be enclosed in parentheses. For example, you can
define an array of record variables as shown below:
xmas color 50 DUP (<1,2,0,4>)
You do not have to initialize all fields in a record. If an initial value
is left blank, the assembler automatically uses the default initial value
of the field. This is declared by the record type. If there is no default
value, each bit in the field is cleared.
Sections 7.2.3, "Using Record Operands and Record Variables," and 7.2.4,
"Record Operators," illustrate ways to use record data after it has been
declared.
Example 1
color RECORD blink:1,back:3,intense:1,fore:3 ; Record declaration
warning color <1,0,1,4> ; Record definition
The definition above creates a variable named warning whose type is given
by the record type color. The initial values of the fields in the variable
are set to the values given in the record definition. The initial values
would override the default record values, had any been given in the
declaration. The contents of the record variable are shown below:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 7.2.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
Example 2
color RECORD blink:1,back:3,intense:1,fore:3 ; Record declaration
colors color 16 DUP (<>) ; Record definition
Example 2 creates an array named colors containing 16 variables of type
color. Since no initial values are given in either the declaration or the
definition, every bit of each variable is cleared to 0.
Example 3
cw RECORD r1:3=0,ic:1=0,rc:2=0,pc:2=3,r2:2=1,masks:6=63
newcw cw <,,2,,,>
Example 3 creates a variable named newcw with type cw. The default values
set in the type declaration are used for all fields except the rc field.
This field is set to 2. The contents of the variable are shown below:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 7.2.2 of the manual │
└────────────────────────────────────────────────────────────────────────┘
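The bit patterns can be worked out from the field widths. The sketch below
places the default value of cw next to the value of newcw. The name defcw
is hypothetical.

```asm
cw          RECORD  r1:3=0,ic:1=0,rc:2=0,pc:2=3,r2:2=1,masks:6=63

            .DATA
                                    ; r1  ic rc pc r2 masks
defcw       cw      <>              ; 000 0  00 11 01 111111 = 037Fh
newcw       cw      <,,2,,,>        ; 000 0  10 11 01 111111 = 0B7Fh
```

Only the two bits of the rc field differ between the two words.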
7.2.3 Using Record Operands and Record Variables
A record operand refers to the value of a record type. It should not be
confused with a record variable. A record operand is a constant; a record
variable is a value stored in memory. A record operand can be used with
the following syntax:
Syntax
recordname <[[value[[,value]]...]]>
The recordname must be the name of a record type declared in the source
file. The optional value is the value of a field in the record. If more
than one value is given, each value must be separated by a comma. Values
can include expressions or symbols that evaluate to constants. The
enclosing angle brackets (<>) are required, even if no value is given. If
no value for a field is given, the default value for that field is used.
Example
.DATA
color RECORD blink:1,back:3,intense:1,fore:3 ; Record declaration
window color <0,6,1,6> ; Record definition
.CODE
.
.
.
mov ah,color <0,3,0,2> ; Load record operand
; (constant value 32h)
mov bh,window ; Load record variable
; (memory value 6Eh)
In this example, the record operand color <0,3,0,2> and the record
variable window are loaded into registers. The contents of the values are
shown below:
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 7.2.3 of the manual │
└────────────────────────────────────────────────────────────────────────┘
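Since a record operand is a constant, it can appear wherever an immediate
value is allowed. The fragment below is a sketch that assumes the color
type and the window variable from the example above. The label same is
hypothetical.

```asm
            mov     window,color <0,1,0,7>  ; Store constant 17h (0 001 0 111)
            cmp     bh,color <0,6,1,6>      ; Compare BH with constant 6Eh
            je      same                    ; Branch if BH matches
```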
7.2.4 Record Operators
The WIDTH and MASK operators are used exclusively with records to return
constant values representing different aspects of previously declared
records.
7.2.4.1 The MASK Operator
The MASK operator returns a bit mask for the bit positions in a record
occupied by the given record field. A bit in the mask contains a 1 if that
bit corresponds to a field bit. All other bits contain 0.
Syntax
MASK {recordfieldname | record}
The recordfieldname may be the name of any field in a previously defined
record. The record may be the name of any previously defined record. The
NOT operator is sometimes used with the MASK operator to reverse the bits
of a mask.
Example
.DATA
color RECORD blink:1,back:3,intense:1,fore:3
message color <0,5,1,1>
.CODE
.
.
.
mov ah,message ; Load initial 0101 1001
and ah,NOT MASK back ; Turn off AND 1000 1111
; "back" ---------
; 0000 1001
or ah,MASK blink ; Turn on OR 1000 0000
; "blink" ---------
; 1000 1001
xor ah,MASK intense ; Toggle XOR 0000 1000
; "intense" ---------
; 1000 0001
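MASK can also be combined with a shifted constant to test a field or to
give it a new value. The sketch below assumes the color type and the
message variable from the example above. The label blinking is
hypothetical.

```asm
            mov     ah,message          ; Load            0101 1001
            test    ah,MASK blink       ; Is the blink bit set?
            jnz     blinking            ; Yes, leave the value alone
            and     ah,NOT MASK back    ; Clear "back"    0000 1001
            or      ah,2 SHL back       ; Insert value 2  0010 1001
blinking:
```

Here the field name back supplies the shift count (4), so the constant 2 is
moved into the bit positions that the field occupies.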
7.2.4.2 The WIDTH Operator
The WIDTH operator returns the width (in bits) of a record or record
field.
Syntax
WIDTH {recordfieldname | record}
The recordfieldname may be the name of any field defined in any record.
The record may be the name of any defined record.
Note that the width of a field is the number of bits assigned for that
field; the value of the field is the starting position (from the right) of
the field.
Examples
.DATA
color RECORD blink:1,back:3,intense:1,fore:3
wblink EQU WIDTH blink ; "wblink" = 1 "blink" = 7
wback EQU WIDTH back ; "wback" = 3 "back" = 4
wintense EQU WIDTH intense ; "wintense" = 1 "intense" = 3
wfore EQU WIDTH fore ; "wfore" = 3 "fore" = 0
wcolor EQU WIDTH color ; "wcolor" = 8
prompt color <1,5,1,1>
.CODE
.
.
.
IF (WIDTH color) GT 8 ; If color is 16 bit, load
mov ax,prompt ; into 16-bit register
ELSE ; else
mov al,prompt ; load into low 8-bit register
xor ah,ah ; and clear high 8-bit register
ENDIF
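WIDTH is also convenient for computing the largest value a field can hold.
The fragment below is a sketch. The names maxback and toobig are
hypothetical, and toobig is assumed to be a label defined elsewhere in the
program.

```asm
color       RECORD  blink:1,back:3,intense:1,fore:3
maxback     EQU     (1 SHL (WIDTH back)) - 1    ; (1 SHL 3) - 1 = 7

            .CODE
            cmp     al,maxback      ; Does AL fit in the "back" field?
            ja      toobig          ; No, reject the value
```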
7.2.5 Using Record-Field Operands
Record-field operands represent the location of a field in its
corresponding record. The operand evaluates to the bit position of the
low-order bit in the field and can be used as a constant operand. The
field name must be from a previously declared record.
Record-field operands are often used with the MASK and WIDTH operators, as
described in Sections 7.2.4.1 and 7.2.4.2.
Example
.DATA
color RECORD blink:1,back:3,intense:1,fore:3 ; Record declaration
cursor color <1,5,1,1> ; Record definition
.CODE
.
.
.
; Rotate "back" of "cursor" without changing other values
mov al,cursor ; Load value from memory
mov ah,al ; Save a copy for work 1101 1001
and al,NOT MASK back ; Mask out old bits AND 1000 1111
; to save old cursor ---------
; 1000 1001
mov cl,back ; Load bit position
shr ah,cl ; Shift to right 0000 1101
inc ah ; Increment 0000 1110
shl ah,cl ; Shift left again 1110 0000
and ah,MASK back ; Mask off extra bits AND 0111 0000
; to get new cursor ---------
; 0110 0000
or ah,al ; Combine old and new OR 1000 1001
; ---------
mov cursor,ah ; Write back to memory 1110 1001
This example illustrates several ways in which record fields can be used
as operands and in expressions.
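A simpler, related use is reading the value of a single field. The sketch
below assumes the color type and the cursor variable from the example
above:

```asm
            mov     al,cursor       ; Load value from memory   1101 1001
            and     al,MASK back    ; Isolate the field        0101 0000
            mov     cl,back         ; Load bit position (4)
            shr     al,cl           ; Right-justify the field  0000 0101
```

After the shift, AL contains 5, the value given to back when cursor was
defined.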
────────────────────────────────────────────────────────────────────────────
Chapter 8: Creating Programs from Multiple Modules
Most medium and large assembly-language programs are created from several
source files or modules. When several modules are used, the scope of
symbols becomes important. This chapter discusses the scope of symbols and
explains how to declare global symbols that can be accessed from any
module. It also tells you how to specify a module that will be accessed
from a library.
Symbols, such as labels and variable names, can be either local or global
in scope. By default, all symbols are local; they are specific to the
source file in which they are defined. Symbols must be declared global if
they must be accessed from modules other than the one in which they are
defined.
To make a symbol global, declare it public in the source module in which
it is defined, and declare it external in any module that must access
it. If the symbol represents
uninitialized data, it can be declared communal──meaning that the symbol
is both public and external. The PUBLIC, EXTRN, and COMM directives are
used to declare symbols public, external, and communal, respectively.
──────────────────────────────────────────────────────────────────────────
NOTE The term "local" often has a different meaning in assembly language
than in many high-level languages. Local symbols in compiled languages are
symbols that are known only within a procedure (called a function,
routine, subprogram, or subroutine, depending on the language). You can
use QuickAssembler to generate these kinds of variables, as explained in
Section 15.3.6, "Creating Locals Automatically."
By default, the assembler converts all lowercase letters in names declared
with the PUBLIC, EXTRN, and COMM directives to uppercase letters before
copying the name to the object file. To preserve lowercase names in public
symbols, choose Preserve Case or Preserve Extrn from the Assembler Flags
dialog box, or assemble with /Cx or /Cl on the QCL command line. This
should be done when preparing assembler modules to be linked with modules
from C and other case-sensitive languages.
──────────────────────────────────────────────────────────────────────────
8.1 Declaring Symbols Public
The PUBLIC directive is used to declare symbols public so that they can be
accessed from other modules. If a symbol is not declared public, the
symbol name is not written to the object file. The symbol has the value of
its offset address during assembly, but the name and address are not
available to the linker.
If the symbol is declared public, its name is associated with its offset
address in the object file. During linking, symbols in different
modules──but with the same name──are resolved to a single address.
Public symbol names are also used by some symbolic debuggers (such as
SYMDEB) to associate addresses with symbols.
Syntax
PUBLIC declaration [[,declaration]]...
Each declaration has the following syntax:
[[lang]] name
The optional lang field contains a language specifier that overrides the
language specified by the .MODEL directive. With this statement, the
language specifier determines naming conventions for the variable that it
precedes. The specifier can be C, FORTRAN, Pascal, or BASIC. The C naming
convention prefixes each variable with an underscore (_); the other
conventions do not. If you specify lang with the .MODEL directive, all
procedures are automatically public. However, you must use the PUBLIC
directive for any data that you want to access from other modules.
Note that using the C type specifier does not preserve case. You must
choose one of the assembler flags or options that preserve case.
The name must be the name of a variable, label, or numeric equate defined
within the current source file. PUBLIC declarations can be placed anywhere
in the source file. Equate names, if given, can only represent one-byte or
two-byte integer or string values. Text macros (or text equates) cannot be
declared public.
Note that although absolute symbols can be declared public, aliases for
public symbols may cause errors. For example, the following statements are
illegal:
PUBLIC lines ; Declare absolute symbol public
lines EQU rows ; Declare alias for lines
rows EQU 25 ; Illegal - Assign value to alias
Example
.MODEL small,c
PUBLIC true,status,first,clear
true EQU -1 ; Public constant
.DATA
status DB 1 ; Public variable
.CODE
.
.
.
first LABEL FAR ; Public label
clear PROC ; Procedure names are automatically public
. ; with .MODEL model, lang
.
.
clear ENDP
8.2 Declaring Symbols External
If instructions in a module must access a symbol that is not defined in
that module, the symbol must be declared with the EXTRN directive.
This directive tells the assembler not to generate an error, even though
the symbol is not in the current module. The assembler assumes that the
symbol occurs in another module. However, the symbol must actually exist
and must be declared public in some module. Otherwise, the linker
generates an error.
Syntax
EXTRN declaration [[,declaration]]...
Each declaration has the following syntax:
[[lang]] name:type
The optional lang field contains a language specifier that overrides the
language specified by the .MODEL directive. With this statement, the
language specifier determines naming conventions for the variable that it
precedes. The specifier can be C, FORTRAN, Pascal, or BASIC. The C naming
convention prefixes each variable with an underscore (_); the other
conventions do not.
Note that using the C type specifier does not preserve case. You must
choose one of the assembler flags or options that preserve case.
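The sketch below shows a language specifier overriding the .MODEL language
in matching PUBLIC and EXTRN declarations. It shows two separate source
files, and the names count and total are hypothetical. The comments about
underscores restate the naming rule described above and are not taken from
a listing.

```asm
; ----- Defining module (first source file) -----
            .MODEL  small,c
            PUBLIC  count               ; C naming from .MODEL: underscore
            PUBLIC  Pascal total        ; Override: no underscore
            .DATA
count       DW      0
total       DW      0
            END

; ----- Using module (second source file) -----
            .MODEL  small,c
            .DATA
            EXTRN   count:WORD          ; Must match C naming of definition
            EXTRN   Pascal total:WORD   ; Must match Pascal naming
            END
```

The specifier given for a symbol in the EXTRN declaration should agree with
the one used in the PUBLIC declaration, so that both modules refer to the
same name.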
The EXTRN directive defines an external variable, label, or symbol of the
specified name and type. The type must match the type given to the item in
its actual definition in some other module. It can be any one of the
following:
Description Types
──────────────────────────────────────────────────────────────────────────
Distance specifier NEAR, FAR, or PROC
Size specifier BYTE, WORD, DWORD, QWORD, or TBYTE
Absolute ABS
The ABS type is for symbols that represent constant numbers, such as
equates declared with the EQU and = directives (see Section 11.1, "Using
Equates").
The PROC type represents the default type for a procedure. For programs
that use simplified segment directives, the type of an external symbol
declared with PROC will be NEAR for small or compact model, or FAR for
medium, large, or huge model. Section 5.1.3, "Defining Basic Attributes
of the Model," tells you how to declare the memory model using the .MODEL
directive. If full segment definitions are used, the default type
represented by PROC is always NEAR.
Although the actual address of an external symbol is not determined until
link time, the assembler assumes a default segment for the item, based on
where the EXTRN directive is placed in the source code. Placement of EXTRN
directives should follow these rules:
■ NEAR code labels (such as procedures) must be declared in the code
segment from which they are accessed.
■ FAR code labels can be declared anywhere in the source code. It may be
convenient to declare them in the code segment from which they are
accessed if the label may be FAR in one context or NEAR in another.
■ Data must be declared in the segment in which it occurs. This may
require that you define a dummy data segment for the external
declaration.
■ Absolute symbols can be declared anywhere in the source code.
Example 1
EXTRN max:ABS,act:FAR ; Constant or FAR label anywhere
DOSSEG
.MODEL small,c
.STACK 100h
.DATA
EXTRN nvar:BYTE ; NEAR variable in near data
.FARDATA
EXTRN fvar:WORD ; FAR variable in far data
.CODE
.STARTUP
EXTRN task:PROC ; PROC or NEAR in near code
ASSUME es:SEG fvar ; Tell assembler
mov ax,SEG fvar ; Tell processor that ES
mov es,ax ; has far data segment
.
.
.
mov ah,nvar ; Load external NEAR variable
mov bx,fvar ; Load external FAR variable
mov cx,max ; Load external constant
call task ; Call procedure (NEAR or FAR)
jmp act ; Jump to FAR label
END
The example above shows how each type of external symbol could be declared
and used in a small-model program that uses simplified segment directives.
Notice the use of the PROC type specifier to make the declaration of the
external procedure independent of the memory model. The jump and its
external declaration are written so that they will be FAR regardless of
the memory model. Using these techniques, you can change the memory model
without breaking code.
Example 2
EXTRN max:ABS,act:FAR ; Constant or FAR label anywhere
STACK SEGMENT PARA STACK 'STACK'
DB 100h DUP (?)
STACK ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
EXTRN nvar:BYTE ; NEAR variable in near data
_DATA ENDS
FAR_DATA SEGMENT PARA 'FAR_DATA'
EXTRN fvar:WORD ; FAR variable in far data
FAR_DATA ENDS
DGROUP GROUP _DATA,STACK
_TEXT SEGMENT BYTE PUBLIC 'CODE'
EXTRN task:NEAR ; NEAR procedure in near code
ASSUME cs:_TEXT,ds:DGROUP,ss:STACK
start: mov ax,DGROUP ; Load segment
mov ds,ax ; into DS
ASSUME es:SEG fvar ; Tell assembler
mov ax,SEG fvar ; Tell processor that ES
mov es,ax ; has far data segment
.
.
.
mov ah,nvar ; Load external NEAR variable
mov bx,fvar ; Load external FAR variable
mov cx,max ; Load external constant
call task ; Call NEAR procedure
jmp act ; Jump to FAR label
_TEXT ENDS
END start
Example 2 shows a fragment similar to the one in Example 1, but with full
segment definitions. Notice that the types of code labels must be declared
specifically. If you wanted to change the memory model, you would have to
specifically change each external declaration and each call or jump.
8.3 Using Multiple Modules
The following source files illustrate a program that uses public and
external declarations to access instruction labels. The program consists
of two modules called hello and display.
The hello module is the program's initializing module. Execution starts at
the instruction labeled start in the hello module. After initializing the
data segment, the program calls the procedure display in the display
module, where a DOS call is used to display a message on the screen.
Execution then returns to the address after the call in the hello module.
The hello module is shown below:
TITLE hello
DOSSEG
.MODEL small,c
.STACK 256
.DATA
PUBLIC message, lmessage
message DB "Hello, world.",13,10
lmessage EQU $ - message
.CODE
EXTRN display:PROC ; Declare in near code segment
.STARTUP
call display ; Call other module
mov ax,04C00h ; Terminate with exit code 0
int 21h ; Call DOS
END
The display module is shown below:
TITLE display
EXTRN lmessage:ABS ; Declare anywhere
.MODEL small,c
.DATA
EXTRN message:BYTE ; Declare in near data segment
.CODE
display PROC
mov bx,1 ; File handle for standard output
mov cx,lmessage ; Message length
mov dx,OFFSET message ; Message address
mov ah,40h ; Write function
int 21h ; Call DOS
ret
display ENDP
END
The sample program is a variation of the HELLO.ASM program used in the
examples in Chapter 4, "Writing Stand-Alone Assembly Programs," except
that it uses an external procedure to display to the screen. Notice that
all symbols defined in one module but used in another are declared PUBLIC
in the defining module and declared EXTRN in the using module.
For instance, message and lmessage are declared PUBLIC in the program
HELLO.ASM and declared EXTRN in DISPLAY.ASM. The procedure display is
declared EXTRN in HELLO.ASM. The symbol display is automatically public in
the simplified segment version, but you would have to specifically declare
it PUBLIC if you used full segments.
To create an executable file for these modules, you can add both files to
the environment's Program List dialog box. You can also assemble the
modules with the following command line:
QCL hello.asm display.asm
The output is placed in the executable file HELLO.EXE.
For each source module, QuickAssembler writes a module name to the object
file. The module name is used by some debuggers and by the linker when it
displays error messages. With QuickAssembler, the module name is always
the base name of the source module file.
For compatibility, QuickAssembler recognizes the NAME directive. However,
NAME has no effect. Arguments to the directive are ignored.
8.4 Declaring Symbols Communal
Communal variables are uninitialized variables that are both public and
external. They are often declared in include files.
If a variable must be used by several assembly routines, you can declare
the variable communal in an include file, and then include the file in
each of the assembly routines. Although the variable is declared in each
source module, it exists at only one address. Using a communal variable in
an include file and including it in several source modules is an
alternative to defining the variable and declaring it public in one source
module and then declaring it external in other modules.
If a variable is declared communal in one module and public in another,
the public declaration takes precedence and the communal declaration has
the same effect as an external declaration.
Syntax
COMM definition[[,definition]]...
Each definition has the following syntax:
[[NEAR | FAR]] [[lang]] label:size[[:count]]
A communal variable can be NEAR or FAR. If neither is specified, the type
will be that of the default memory model. If you use simplified segment
directives, the default type is NEAR for small and medium models, or FAR
for compact, large, and huge models. If you use full segment definitions,
the default type is NEAR.
The optional lang field can be C, BASIC, FORTRAN, or Pascal. The use of
the C keyword turns on the C naming convention──the assembler prefixes the
name of the variable with an underscore (_). The use of any of the other
language types turns off the C naming convention, even if you specified C
with the .MODEL directive. Note that the use of C does not preserve case.
You must choose one of the assembler flags or options that preserve case.
The label is the name of the variable. The size can be BYTE, WORD, DWORD,
QWORD, TBYTE, or the name of a structure. The count is the number of
elements. If no count is given, one element is assumed. Multiple variables
can be defined with one COMM statement by separating each definition with
a comma.
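The forms of the COMM statement described above can be sketched as follows. This is only an illustration; the variable names (flag, samples, bigbuf, total, xcoord, ycoord) are hypothetical and do not appear elsewhere in this guide.

```asm
        .MODEL  small
        .DATA
        COMM    flag:BYTE                   ; One byte; NEAR by default in small model
        COMM    samples:WORD:64             ; 64 words (count field given)
        COMM    FAR bigbuf:BYTE:4096        ; Explicit FAR overrides the model default
        COMM    NEAR C total:WORD           ; C naming convention: name becomes _total
        COMM    xcoord:WORD, ycoord:WORD    ; Two definitions in one statement
```

Because the default distance depends on the memory model, giving NEAR or FAR explicitly is one way to keep an include file valid across models.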
──────────────────────────────────────────────────────────────────────────
NOTE C variables declared outside functions (except static variables) are
communal unless explicitly initialized; they are the same as
assembly-language communal variables. If you are writing assembly-language
modules for C, you can declare the same communal variables in C include
files and in QuickAssembler include files.
──────────────────────────────────────────────────────────────────────────
QuickAssembler cannot tell whether a communal variable has been used in
another module. Allocation of communal variables is handled by LINK. As a
result, communal variables have the following limitations that other
variables declared in assembly language do not have:
■ Communal variables cannot be initialized. Under DOS, initial values are
not guaranteed to be 0 or any other value. The variables can be used
for data, such as file buffers, that is not given a value until run
time.
■ Communal variables are not guaranteed to be allocated in the sequence
in which they are declared. Assembly-language techniques that depend on
the sequence and position in which data is defined should not be used
with communal variables. For example, the following statements do not
work:
COMM wbuffer:WORD:128
lbuffer EQU $ - wbuffer ; "lbuffer" won't have desired value
bbuffer LABEL BYTE ; "bbuffer" won't have desired address
COMM wbuffer:WORD:40
■ If a communal variable references a variable that is allocated and
declared public inside a module, the variable has the segment of the
allocated instance. If all references to the variable are communal, the
variable will be placed in one of the segments described below.
Near communal variables are placed in a segment called c_common, which
is part of DGROUP. This group is created and initialized automatically
if you use simplified segment directives. If you use full segment
definitions, you must create a group called DGROUP and use the ASSUME
directive to associate it with the DS register.
Far communal variables are placed in a segment called FAR_BSS. This
segment has combine type private and class type 'FAR_BSS'. This means
that multiple segments with the same name can be created. Such segments
cannot be accessed by name. They must be initialized indirectly using
the SEG operator. For example, if a far communal variable (with word
size) is called comvar, its segment can be initialized with the
following lines:
ASSUME ds:SEG comvar ; Tell the assembler
mov ax,SEG comvar ; Tell the processor
mov ds,ax
mov bx,comvar ; Use the variable
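For near communal variables used with full segment definitions, the DGROUP requirement described above might be met as in the following sketch. The names counter and start are hypothetical, and the fragment omits the stack segment and program exit that a complete program would need.

```asm
_DATA   SEGMENT WORD PUBLIC 'DATA'
_DATA   ENDS
DGROUP  GROUP   _DATA                   ; Create the group named DGROUP
        COMM    NEAR counter:WORD       ; Near communal word (hypothetical name)

_TEXT   SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:DGROUP     ; Tell the assembler
start:  mov     ax,DGROUP               ; Tell the processor
        mov     ds,ax
        inc     counter                 ; Addressed relative to DGROUP
_TEXT   ENDS
        END     start
```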
Example 1
.DATA
COMM temp:BYTE:128
ASCIIZ MACRO address ;; Name of address for string
mov temp,128 ;; Insert maximum length
mov dx,OFFSET temp ;; Address of string buffer
mov ah,0Ah ;; Get string
int 21h
mov dl,temp[1] ;; Get length of string
xor dh,dh
mov bx,dx
mov temp[bx+2],0 ;; Overwrite CR with null
address EQU OFFSET temp+2
ENDM
Example 1 shows an include file that declares a buffer for temporary data.
The buffer is then used in a macro in the same include file. An example of
how the macro could be used in a source file is shown below:
DOSSEG
.MODEL small,c
INCLUDE communal.inc
.STACK
.DATA
message DB "Enter file name: $"
.CODE
.STARTUP
.
.
.
mov dx,OFFSET message ; Load offset of file prompt
mov ah,09h ; Display prompt
int 21h
ASCIIZ place ; Get file name and
; return address as "place"
mov al,00000010b ; Load access code
mov dx,place ; Load address of ASCIIZ string
mov ah,3Dh ; Open the file
int 21h
.
.
.
Note that once the macro is written, the user does not need to know the
name of the temporary buffer or how it is used in the macro.
Example 2
date STRUC
month DB ?
day DB ?
year DB ?
date ENDS
.DATA
COMM today:date
.
.
.
The example above uses the COMM directive to make the structure variable
today a communal variable.
8.5 Specifying Library Files
The INCLUDELIB directive instructs the linker to link with a specified
library file. If you are writing a program that calls library routines,
you can use this directive to specify the library file in the assembly
source file rather than in the LINK command line.
Syntax
INCLUDELIB libraryname
The libraryname is written to the comment record of the object file. The
Intel title for this record is COMENT. At link time, the linker reads this
record and links with the specified library file.
The libraryname must be a file name rather than a complete file
specification. If you do not specify an extension, the default extension
.LIB is assumed. LINK searches directories for the library file in the
following order:
1. The current directory
2. Any directories given in the library field of the LINK command line
3. Any directories listed in the LIB environment variable
Example
INCLUDELIB graphics
This statement passes a message from QuickAssembler telling LINK to use
library routines from the file GRAPHICS.LIB. If this statement is included
in a source file called DRAW.ASM, the program might be linked with the
following command line:
LINK draw;
Without the INCLUDELIB directive, the program would have to be linked with
the following command line:
LINK draw,,,graphics;
────────────────────────────────────────────────────────────────────────────
Chapter 9: Using Operands and Expressions
Operands are the arguments that define values to be acted on by
instructions or directives. Operands can be constants, variables,
expressions, or keywords, depending on the instruction or directive and
the context of the statement.
A common type of operand is an expression. An expression consists of
several operands that are combined to describe a value or memory location.
Operators indicate the operations to be performed when combining the
operands of an expression.
Expressions are evaluated at assembly time. By using expressions, you can
instruct the assembler to calculate values that would be difficult or
inconvenient to calculate when you are writing source code.
This chapter discusses operands, expressions, and operators as they are
evaluated at assembly time. See Section 2.6, "Addressing Modes," for a
discussion of the addressing modes that can be used to calculate operand
values at run time. This chapter also discusses the location-counter
operand, forward references, and strong typing of operands.
9.1 Using Operands with Directives
Each directive requires a specific type of operand. Most directives take
string or numeric constants, or symbols or expressions that evaluate to
such constants.
The type of operand varies for each directive, but the operand must always
evaluate to a value that is known at assembly time. This differs from
instructions, whose operands may not be known at assembly time and may
vary at run time. Operands used with instructions are discussed in Section
2.6, "Addressing Modes."
Some directives, such as those used in data declarations, accept labels or
variables as operands. When a symbol that refers to a memory location is
used as an operand to a directive, the symbol represents the address of
the symbol rather than its contents. This is because the contents may
change at run time and are therefore not known at assembly time.
Example 1
ORG 100h ; Set address to 100h
var DB 10h ; Address of "var" is 100h
; Value of "var" is 10h
pvar DW var ; Address of "pvar" is 101h
; Value of "pvar" is
; address of "var" (100h)
In Example 1, the operand of the DW directive in the third statement
represents the address of var (100h) rather than its contents (10h). The
address is relative to the start of the segment in which var is defined.
Example 2
TITLE doit ; String
_TEXT SEGMENT BYTE PUBLIC 'CODE' ; Key words
INCLUDE \include\bios.inc ; Pathname
.RADIX 16 ; Numeric constant
tst DW a / b ; Numeric expression
PAGE + ; Special character
sum EQU x * y ; Numeric expression
here LABEL WORD ; Type specifier
Example 2 illustrates the different kinds of values that can be used as
directive operands.
9.2 Using Operators
The assembler provides a variety of operators for combining, comparing,
changing, or analyzing operands. Some operators work with integer
constants, some with memory values, and some with both. Operators cannot
be used with floating-point constants since QuickAssembler does not
recognize real numbers in expressions.
It is important to understand the difference between operators and
instructions. Operators handle calculations of constant values that are
known at assembly time. Instructions handle calculations of values that
may not be known until run time. For example, the addition operator (+)
handles assembly-time addition, while the ADD and ADC instructions handle
run-time addition.
This section describes the different kinds of operators used in
assembly-language statements and gives examples of expressions formed with
them. In addition to the operators described in this chapter, you can use
the DUP operator (Section 6.5.2, "Arrays and Buffers"), the record
operators (Section 7.2.4, "Record Operators"), and the macro operators
(Section 11.4, "Using Macro Operators").
9.2.1 Calculation Operators
QuickAssembler provides the common arithmetic operators as well as several
other operators for adding, shifting, or doing bit manipulations. The
sections below describe operators that can be used for doing numeric
calculations.
9.2.1.1 Arithmetic Operators
QuickAssembler recognizes a variety of arithmetic operators for common
mathematical operations. Table 9.1 lists the arithmetic operators.
Table 9.1 Arithmetic Operators
Operator Syntax Meaning
──────────────────────────────────────────────────────────────────────────
+ +expression Positive (unary)
- -expression Negative (unary)
* expression1 * expression2 Multiplication
/ expression1 / expression2 Integer division
MOD expression1 MOD expression2 Remainder (modulus)
+ expression1 + expression2 Addition
- expression1 - expression2 Subtraction
──────────────────────────────────────────────────────────────────────────
For all arithmetic operators except the addition operator (+) and the
subtraction operator (-), the expressions operated on must be integer
constants.
The addition and subtraction operators can be used to add or subtract an
integer constant and a memory operand. The result can be used as a memory
operand.
The subtraction operator can also be used to subtract one memory operand
from another, but only if the operands refer to locations within the same
segment. The result will be a constant, not a memory operand.
──────────────────────────────────────────────────────────────────────────
NOTE The unary plus and minus operators (used to designate positive or
negative numbers) are not the same as the binary plus and minus operators
(used to designate addition or subtraction). The unary plus and minus
operators have a higher level of precedence, as described in Section
9.2.5, "Operator Precedence."
──────────────────────────────────────────────────────────────────────────
Example 1
intgr = 14 * 3 ; = 42
intgr = intgr / 4 ; 42 / 4 = 10
intgr = intgr MOD 4 ; 10 mod 4 = 2
intgr = intgr + 4 ; 2 + 4 = 6
intgr = intgr - 3 ; 6 - 3 = 3
intgr = -intgr - 8 ; -3 - 8 = -11
intgr = -intgr - intgr ; 11 - -11 = 22
Example 1 illustrates arithmetic operators used in integer expressions.
Example 2
ORG 100h
a DB ? ; Address is 100h
b DB ? ; Address is 101h
mem1 EQU a + 5 ; mem1 = 100h + 5 = 105h
mem2 EQU a - 5 ; mem2 = 100h - 5 = 0FBh
const EQU b - a ; const = 101h - 100h = 1
Example 2 illustrates arithmetic operators used in memory expressions.
9.2.1.2 Structure-Field-Name Operator
The structure-field-name operator (.) indicates addition. It is used to
designate a field within a structure.
Syntax
variable.field
The variable is a memory operand (usually a previously declared structure
variable), and field is the name of a field within the structure. See
Section 7.1, "Structures," for more information.
Example
.DATA
date STRUC ; Declare structure
month DB ?
day DB ?
year DW ?
date ENDS
yesterday date <12,31,1987> ; Define structure variables
today date <1,1,1988>
.CODE
.
.
.
mov bh,yesterday.day ; Load structure variable
mov bx,OFFSET today ; Load structure variable address
inc [bx].year ; Use in indirect memory operand
9.2.1.3 Index Operator
The index operator ([ ]) indicates addition. It is similar to the addition
(+) operator. When used with a register, the index operator also indicates
that the operand is an indirect memory operand rather than a
register-direct operand.
Syntax
[[expression1]][expression2]
In most cases expression1 is simply added to expression2. The limitations
of the addition operator for adding memory operands also apply to the
index operator. For example, two direct memory operands cannot be added.
The expression label1[label2] is illegal if both are memory operands.
The index operator has an extended function in specifying indirect memory
operands. Section 2.6.4 explains the use of indirect memory operands. The
index brackets must be outside the register or registers that specify the
indirect displacement. However, any of the three operators that indicate
addition (the addition operator, the index operator, or the
structure-field-name operator) may be used for multiple additions within
the expression.
For example, the following statements are equivalent:
mov ax,table[bx][di]
mov ax,table[bx+di]
mov ax,[table+bx+di]
mov ax,[table][bx][di]
The following statements are illegal because the index operator does not
enclose the registers that specify indirect displacement:
mov ax,table+bx+di ; Illegal - no index operator
mov ax,[table]+bx+di ; Illegal - registers not
; inside index operator
The index operator is typically used to index elements of a data object,
such as variables in an array or characters in a string.
Example 1
mov al,string[3] ; Get 4th element of string
add ax,array[4] ; Add word at byte offset 4 of array
mov string[7],al ; Load into 8th element of string
Example 1 illustrates the index operator used with direct memory operands.
Example 2
mov ax,[bx] ; Get element BX points to
add ax,array[si] ; Add element SI points to
mov string[di],al ; Load element DI points to
cmp cx,table[bx][di] ; Compare to element BX and DI point to
Example 2 illustrates the index operator used with indirect memory
operands.
9.2.1.4 Shift Operators
The SHR and SHL operators can be used to shift bits in constant values.
Both perform logical shifts. Bits on the right for SHL and on the left for
SHR are zero-filled as their contents are shifted out of position.
Syntax
expression SHR count
expression SHL count
The expression is shifted right or left by count number of bits. Bits
shifted off either end of the expression are lost. If count is greater
than or equal to 16, the result is 0.
Do not confuse the SHR and SHL operators with the processor instructions
having the same names. The operators work on integer constants only at
assembly time. The processor instructions work on register or memory
values at run time. The assembler can tell the difference between
instructions and operators from context.
Examples
mov ax,01110111b SHL 3 ; Load 01110111000b
mov ah,01110111b SHR 3 ; Load 01110b
9.2.1.5 Bitwise Logical Operators
The bitwise operators perform logical operations on each bit of an
expression. The expressions must resolve to constant values. Table 9.2
lists the logical operators and their meanings.
Table 9.2 Logical Operators
Operator Syntax Meaning
──────────────────────────────────────────────────────────────────────────
NOT NOT expression Bitwise complement
AND expression1 AND expression2 Bitwise AND
OR expression1 OR expression2 Bitwise inclusive OR
XOR expression1 XOR expression2 Bitwise exclusive OR
──────────────────────────────────────────────────────────────────────────
Do not confuse the NOT, AND, OR, and XOR operators with the processor
instructions having the same names. The operators work on integer
constants only at assembly time. The processor instructions work on
register or memory values at run time. The assembler can tell the
difference between instructions and operators from context.
──────────────────────────────────────────────────────────────────────────
NOTE Although calculations on expressions using the AND, OR, and XOR
operators are done using 17-bit numbers, the results are truncated to 16
bits.
──────────────────────────────────────────────────────────────────────────
Examples
mov ax,NOT 11110000b ; Load 1111111100001111b
mov ah,NOT 11110000b ; Load 00001111b
mov ah,01010101b AND 11110000b ; Load 01010000b
mov ah,01010101b OR 11110000b ; Load 11110101b
mov ah,01010101b XOR 11110000b ; Load 10100101b
9.2.2 Relational Operators
The relational operators compare two expressions and return true (-1) if
the condition specified by the operator is satisfied, or false (0) if it
is not. The expressions must resolve to constant values. Relational
operators are typically used with conditional directives. Table 9.3 lists
the operators and the values they return if the specified condition is
satisfied.
Table 9.3 Relational Operators
Operator Syntax Returned Value
──────────────────────────────────────────────────────────────────────────
EQ expression1 EQ expression2 True if expressions are equal
NE expression1 NE expression2 True if expressions are not
equal
LT expression1 LT expression2 True if left expression is
less than right
LE expression1 LE expression2 True if left expression is
less than or equal to right
GT expression1 GT expression2 True if left expression is
greater than right
GE expression1 GE expression2 True if left expression is
greater than or equal to right
──────────────────────────────────────────────────────────────────────────
Note that the EQ and NE operators treat their arguments as 16-bit numbers.
Numbers specified with the 16th bit set are considered negative. For
example, the expression -1 EQ 0FFFFh is true, but the expression -1 NE
0FFFFh is false.
The LT, LE, GT, and GE operators treat their arguments as 17-bit numbers,
in which the 17th bit specifies the sign. For example, 0FFFFh is 65,535,
not -1. The expression 1 GT -1 is true, but the expression 1 GT 0FFFFh is
false.
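The difference between the 16-bit and 17-bit comparisons can be seen by writing the expressions from the two paragraphs above as equates. The symbol names t1 through t4 are hypothetical; the results in the comments are the ones stated in the text.

```asm
t1      EQU     -1 EQ 0FFFFh            ; True (-1): both are the same 16-bit value
t2      EQU     -1 NE 0FFFFh            ; False (0)
t3      EQU     1 GT -1                 ; True (-1): -1 is negative as a 17-bit number
t4      EQU     1 GT 0FFFFh             ; False (0): 0FFFFh is 65,535 in this comparison
```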
Examples
mov ax,4 EQ 3 ; Load false (0)
mov ax,4 NE 3 ; Load true (-1)
mov ax,4 LT 3 ; Load false (0)
mov ax,4 LE 3 ; Load false (0)
mov ax,4 GT 3 ; Load true (-1)
mov ax,4 GE 3 ; Load true (-1)
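Since relational operators are typically used with conditional directives, a use of that kind might look like the following sketch. The names bufsize and buffer are hypothetical.

```asm
bufsize EQU     1024                    ; Hypothetical buffer size
        .DATA
IF      bufsize GT 512
buffer  DB      bufsize DUP (?)         ; Assembled: 1024 GT 512 is true
ELSE
buffer  DB      512 DUP (?)             ; Skipped in this case
ENDIF
```

Changing the single equate is then enough to select the other branch at assembly time.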
9.2.3 Segment-Override Operator
The segment-override operator (:) forces the address of a variable or
label to be computed relative to a specific segment.
Syntax
segment:expression
The segment can be specified in several ways. It can be one of the segment
registers: CS, DS, SS, or ES. It can also be a segment or group name. In
this case, the name must have been previously defined with a SEGMENT or
GROUP directive and assigned to a segment register with an ASSUME
directive. The expression can be a constant, expression, or a SEG
expression. See Section 9.2.4.5 for more information on the SEG operator.
Note that when a segment override is given with an indexed operand, the
segment must be specified outside the index operators. For example,
es:[di] is correct, but [es:di] generates an error.
Examples
mov ax,ss:[bx+4] ; Override default assume (DS)
mov al,es:082h ; Load from ES
ASSUME ds:FAR_DATA ; Tell the assembler and
mov bx,FAR_DATA:count ; load from a far segment
As shown in the last two statements, a segment override with a segment
name is not enough if no segment register is assumed for the segment name.
You must use the ASSUME directive to assign a segment register, as
explained in Section 5.4, "Associating Segments with Registers."
9.2.4 Type Operators
This section describes the assembler operators that specify or analyze the
types of memory operands and other expressions.
9.2.4.1 PTR Operator
The PTR operator specifies the type for a variable or label.
Syntax
type PTR expression
The operator forces expression to be treated as having type. The
expression can be any operand. The type can be BYTE, WORD, DWORD, QWORD,
or TBYTE for memory operands. It can be NEAR, FAR, or PROC for labels.
The PTR operator is typically used with forward references to define
explicitly what size or distance a reference has. If it is not used, the
assembler assumes a default size or distance for the reference. See
Section 9.4 for more information on forward references.
The PTR operator is also used to enable instructions to access variables
in ways that would otherwise generate errors. For example, you could use
the PTR operator to access the high-order byte of a WORD size variable.
The PTR operator is required for FAR calls and jumps to forward-referenced
labels.
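The high-order byte access mentioned above can be sketched as follows, assuming a hypothetical word variable named wvalue. Without the PTR operator, moving a WORD variable into a byte register would be a size mismatch.

```asm
        .DATA
wvalue  DW      1234h                   ; Stored low byte first: 34h, then 12h
        .CODE
        mov     al,BYTE PTR wvalue      ; Load low-order byte (34h)
        mov     ah,BYTE PTR wvalue[1]   ; Load high-order byte (12h)
```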
Example 1
.DATA
stuff DD ?
buffer DB 20 DUP (?)
.CODE
.
.
.
call FAR PTR task ; Call a far procedure
jmp FAR PTR place ; Jump far
mov bx,WORD PTR stuff[0] ; Load a word from a
; doubleword variable
add ax,WORD PTR buffer[bx] ; Add a word from a byte variable
The PTR operator can be used to specify the size of an indirect register
operand for a CALL or JMP instruction. However, the size cannot be
specified with NEAR or FAR. Use WORD or DWORD instead. Examples are shown
below:
Example 2
jmp WORD PTR [bx] ; Legal near jump
call NEAR PTR [bx] ; Illegal near call
call DWORD PTR [bx] ; Legal far call
jmp FAR PTR [bx] ; Illegal far jump
This limitation only applies to indirect register operands. NEAR or FAR
can be applied to operands associated with labels, as shown in Example 1.
You can also use NEAR or FAR with an indirect operand that combines a
register with a label.
9.2.4.2 SHORT Operator
The SHORT operator sets the type of a specified label to SHORT. Short
labels can be used in JMP instructions whenever the distance from the
label to the instruction is less than 128 bytes.
Syntax
SHORT label
Instructions using short labels are a byte smaller than identical
instructions using the default near labels. See Section 9.4.1, "Forward
References to Labels," for information on using the SHORT operator with
jump instructions.
Example
jmp again ; Jump 128 bytes or more
.
.
.
jmp SHORT again ; Jump less than 128 bytes
.
.
.
again:
9.2.4.3 THIS Operator
The THIS operator creates an operand whose offset and segment values are
equal to the current location-counter value and whose type is specified by
the operator.
Syntax
THIS type
The type can be BYTE, WORD, DWORD, QWORD, or TBYTE for memory operands. It
can be NEAR, FAR, or PROC for labels.
The THIS operator is typically used with the EQU or equal-sign (=)
directive to create labels and variables. The result is similar to using
the LABEL directive.
Example
tag1 EQU THIS BYTE ; Both represent the same variable
tag2 LABEL BYTE
check1 EQU THIS NEAR ; All represent the same address
check2 LABEL NEAR
check3:
check4 PROC NEAR
check4 ENDP
9.2.4.4 HIGH and LOW Operators
The HIGH and LOW operators return the high and low bytes, respectively, of
an expression.
Syntax
HIGH expression
LOW expression
The HIGH operator returns the high-order eight bits of expression; the LOW
operator returns the low-order eight bits. The expression must evaluate to
a constant. You cannot use the HIGH and LOW operators on the contents of a
memory operand since the contents may change at run time.
Examples
stuff EQU 0ABCDh
mov ah,HIGH stuff ; Load 0ABh
mov al,LOW stuff ; Load 0CDh
The HIGH and LOW operators work reliably only with constants and with
offsets to external symbols. HIGH and LOW operations are not supported for
offsets to local symbols.
9.2.4.5 SEG Operator
The SEG operator returns the segment address of an expression.
Syntax
SEG expression
The expression can be any label, variable, segment name, group name, or
other memory operand. The SEG operator cannot be used with constant
expressions. The returned value can be used as a memory operand.
Example
.DATA
var DB ?
.CODE
.
.
.
ASSUME ds:SEG var ; Assume segment of variable
mov ax,SEG var ; Get address of segment
mov ds,ax ; where variable is declared
9.2.4.6 OFFSET Operator
The OFFSET operator returns the offset address of an expression.
Syntax
OFFSET expression
The expression can be any label, variable, or other direct memory operand.
Constant expressions return meaningless values. The value returned by the
OFFSET operator is an immediate (constant) operand.
If the .MODEL directive is used, the value returned by the OFFSET operator
is relative to a group, whenever the data item is declared in a segment
that is part of a group. The OFFSET operator returns the number of bytes
between the beginning of the group and the data object. If the object is
declared in a segment not part of a group, OFFSET returns the number of
bytes between the beginning of the segment and the data object.
If the .MODEL directive is not used, OFFSET returns a value relative to the
beginning of the segment, regardless of whether the segment is part of a
group.
If full segment definitions are given, the returned value is a memory
operand equal to the number of bytes between the item and the beginning of
the segment in which it is defined.
The segment-override operator (:) can be used to force OFFSET to return
the number of bytes between the item in expression and the beginning of a
named segment or group. This is the method used to generate valid offsets
for items in a group when full segment definitions are used. For example,
the statement
mov bx,OFFSET DGROUP:array
is not the same as
mov bx,OFFSET array
if array is not in the first segment in DGROUP.
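A sketch of the situation just described, using full segment definitions, is shown below. The segment layout and the names first and array are hypothetical; the actual group-relative offset depends on how LINK combines the segments of all modules, so no value is given for it.

```asm
_DATA   SEGMENT WORD PUBLIC 'DATA'
first   DB      16 DUP (?)              ; Occupies the start of _DATA
_DATA   ENDS
_BSS    SEGMENT WORD PUBLIC 'BSS'
array   DW      8 DUP (?)               ; First item in _BSS
_BSS    ENDS
DGROUP  GROUP   _DATA, _BSS

_TEXT   SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:DGROUP
        mov     bx,OFFSET array         ; Relative to the start of _BSS
        mov     bx,OFFSET DGROUP:array  ; Relative to the start of DGROUP
_TEXT   ENDS
```

With DS loaded with DGROUP, only the second form yields an offset that addresses array correctly through DS.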
Example
.DATA
string DB "This is it."
.CODE
.
.
.
mov dx,OFFSET string ; Load offset of variable
9.2.4.7 .TYPE Operator
The .TYPE operator returns a byte that defines the mode and scope of an
expression.
Syntax
.TYPE expression
If expression is not valid, .TYPE returns 0. Otherwise, .TYPE returns a
byte having the bit setting shown in Table 9.4. The .TYPE operator sets
all bits except bit 6. Future versions of the assembler may reserve a use
for this bit.
Table 9.4 .TYPE Operator and Variable Attributes
Bit Position If Bit = 0 If Bit = 1
──────────────────────────────────────────────────────────────────────────
0 Not program related Program related
1 Not data related Data related
2 Not a constant value Constant value
3 Addressing mode is not direct Addressing mode is direct
4 Not a register Expression is a register
5 Not defined Defined
7 Local or public scope External scope
──────────────────────────────────────────────────────────────────────────
The .TYPE operator is typically used in macros in which different kinds of
arguments may need to be handled differently.
Example
display MACRO string
IF ((.TYPE string) SHL 14) NE 8000h
IF2
%OUT Argument must be a variable
ENDIF
ENDIF
mov dx,OFFSET string
mov ah,09h
int 21h
ENDM
This macro checks to see if the argument passed to it is data related (a
variable). It does this by shifting the .TYPE value left 14 bits, which
discards every bit except bits 1 and 0 and moves them to bit positions 15
and 14. The result is 8000h only if the data bit (bit 1) is set and the
program bit (bit 0) is clear; otherwise, an error message is generated.
9.2.4.8 TYPE Operator
The TYPE operator returns a number that represents the type of an
expression.
Syntax
TYPE expression
If expression evaluates to a variable, the operator returns the number of
bytes in each data object in the variable. Each byte in a string is
considered a separate data object, so the TYPE operator returns 1 for
strings.
If expression evaluates to a structure or structure variable, the operator
returns the number of bytes in the structure. If the expression is a
label, the operator returns 0FFFFH for NEAR labels and 0FFFEH for FAR
labels. If the expression is a constant, the operator returns 0.
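The values described above might be seen in a fragment such as the following. The names are hypothetical, and the loop is only one plausible use of TYPE: stepping an index register through an array by the size of each element.

```asm
        .DATA
bval    DB      ?
wlist   DW      10 DUP (?)
msg     DB      "text"
tbval   EQU     TYPE bval               ; 1
twlist  EQU     TYPE wlist              ; 2 (bytes per element, not 20)
tmsg    EQU     TYPE msg                ; 1 (each character is one object)
tconst  EQU     TYPE 5                  ; 0 (constant)
        .CODE
        xor     si,si                   ; Start at the first element
        mov     cx,LENGTH wlist         ; Number of elements (10)
clear:  mov     wlist[si],0             ; Clear one element
        add     si,TYPE wlist           ; Advance by the element size
        loop    clear
```

If wlist were later redeclared with DD, the ADD instruction would adjust automatically, although the MOV instruction would then need a doubleword-sized store.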
9.2.4.9 LENGTH Operator
The LENGTH operator returns the number of data elements in an array or
other variable defined with the DUP operator.
Syntax
LENGTH variable
The returned value is the number of elements of the declared size in
variable. If the variable was declared with nested DUP operators, only the
value given for the outer DUP operator is returned. If the variable was
not declared with the DUP operator, the value returned is always 1.
Example
array DD 100 DUP(0FFFFFFh)
table DW 100 DUP(1,10 DUP(?))
string DB 'This is a string'
var DT ?
larray EQU LENGTH array ; 100 - number of elements
ltable EQU LENGTH table ; 100 - inner DUP not counted
lstring EQU LENGTH string ; 1 - string is one element
lvar EQU LENGTH var ; 1
.
.
.
mov cx,LENGTH array ; Load number of elements
again: . ; Perform some operation on
. ; each element
.
loop again
9.2.4.10 SIZE Operator
The SIZE operator returns the total number of bytes allocated for an array
or other variable defined with the DUP operator.
Syntax
SIZE variable
The returned value is equal to the value of LENGTH variable times the
value of TYPE variable. If the variable was declared with nested DUP
operators, only the value given for the outside DUP operator is
considered. If the variable was not declared with the DUP operator, the
value returned is always TYPE variable.
Example
array DD 100 DUP(1)
table DW 100 DUP(1,10 DUP(?))
string DB 'This is a string'
var DT ?
sarray EQU SIZE array ; 400 - elements times size
stable EQU SIZE table ; 200 - inner DUP ignored
sstring EQU SIZE string ; 1 - string is one element
svar EQU SIZE var ; 10 - bytes in variable
.
.
.
mov cx,SIZE array ; Load number of bytes
again: . ; Perform some operation on
. ; each byte
.
loop again
9.2.5 Operator Precedence
Expressions are evaluated according to the following rules:
■ Operations of highest precedence are performed first.
■ Operations of equal precedence are performed from left to right.
■ The order of evaluation can be overridden by using parentheses.
Operations in parentheses are always performed before any adjacent
operations.
The order of precedence for all operators is listed in Table 9.5.
Operators on the same line have equal precedence.
Table 9.5 Operator Precedence
Precedence Operators
──────────────────────────────────────────────────────────────────────────
(Highest)
1 LENGTH, SIZE, WIDTH, MASK, (), [], <>
2 . (structure-field-name operator)
3 :
4 PTR, OFFSET, SEG, TYPE, THIS
5 HIGH, LOW
6 +, - (unary)
7 *, /, MOD, SHL, SHR
8 +, - (binary)
9 EQ, NE, LT, LE, GT, GE
10 NOT
11 AND
12 OR, XOR
13 SHORT, .TYPE
(Lowest)
──────────────────────────────────────────────────────────────────────────
Examples
a EQU 8 / 4 * 2 ; Equals 4
b EQU 8 / (4 * 2) ; Equals 1
c EQU 8 + 4 * 2 ; Equals 16
d EQU (8 + 4) * 2 ; Equals 24
e EQU 8 OR 4 AND 2 ; Equals 8
f EQU (8 OR 4) AND 3 ; Equals 0
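Precedence also matters for the keyword operators. The following sketch applies the levels in Table 9.5 to HIGH and NOT; the symbol names p1 through p4 are hypothetical, and the results assume 16-bit values as described earlier in this chapter.

```asm
p1      EQU     HIGH 1234h + 1          ; (HIGH 1234h) + 1 = 13h
p2      EQU     HIGH (1234h + 1)        ; HIGH 1235h = 12h
p3      EQU     NOT 0 EQ 1              ; NOT (0 EQ 1) = NOT 0 = 0FFFFh
p4      EQU     (NOT 0) EQ 1            ; 0FFFFh EQ 1 = 0 (false)
```

HIGH binds more tightly than binary addition, while NOT binds less tightly than the relational operators, so parentheses are the safest way to state the intended grouping.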
9.3 Using the Location Counter
The location counter is a special operand that, during assembly,
represents the address of the statement currently being assembled. At
assembly time, the location counter keeps changing, but when used in
source code, it resolves to a constant representing an address.
The location counter has the same attributes as a near label. It
represents an offset that is relative to the current segment and is equal
to the number of bytes generated for the segment to that point.
Example 1
string DB "Who wants to count every byte in a string, "
DB "especially if you might change it later."
lstring EQU $-string ; Let the assembler do it
Example 1 shows one way of using the location-counter operand in
expressions relating to data.
Example 2
cmp ax,bx
jl shortjump ; If ax < bx, go to "shortjump"
. ; else if ax >= bx, continue
.
shortjump: .
cmp ax,bx
jge $+5 ; If ax >= bx, continue
jmp longjump ; else if ax < bx, go to "longjump"
. ; This is "$+5"
.
longjump: .
Example 2 illustrates how you can use the location counter to do
conditional jumps of more than 128 bytes. The first part shows the normal
way of coding jumps of less than 128 bytes, and the second part shows how
to code the same jump when the label is more than 128 bytes away.
9.4 Using Forward References
The assembler permits you to refer to labels, variable names, segment
names, and other symbols before they are declared in the source code. Such
references are called forward references.
The assembler handles forward references by making assumptions about them
on the first pass and then attempting to correct the assumptions, if
necessary, on the second pass. Checking and correcting assumptions on the
second pass takes processing time, so source code with forward references
assembles more slowly than source code with no forward references.
In addition, the assembler may make incorrect assumptions that it cannot
correct, or that it can correct only at a cost in program efficiency.
9.4.1 Forward References to Labels
Forward references to labels may result in incorrect or inefficient code.
In the statement below, the label target is a forward reference:
jmp target ; Generates 3 bytes in 16-bit segment
.
.
.
target:
Since the assembler processes source files sequentially, target is unknown
when it is first encountered. It could be one of three types: short (-128
to 127 bytes from the jump), near (-32,768 to 32,767 bytes from the jump),
or far (in a different segment than the jump). QuickAssembler assumes that
target is a near label, and assembles the number of bytes necessary to
specify a near label: one byte for the instruction and two bytes for the
operand.
If, on the second pass, the assembler learns that target is a short label,
it will need only two bytes: one for the instruction and one for the
operand. However, it will not be able to change its previous assembly and
the three-byte version of the assembly will stand. If the assembler learns
that target is a far label, it will need five bytes. Since it can't make
this adjustment, it will generate a phase error.
You can override the assembler's assumptions by specifying the exact size
of the jump. For example, if you know that a JMP instruction refers to a
label less than 128 bytes from the jump, you can use the SHORT operator,
as shown below:
jmp SHORT target ; Generates 2 bytes
.
.
.
target:
Using the SHORT operator makes the code smaller and slightly faster. If
the assembler has to use the three-byte form when the two-byte form would
be acceptable, it will generate a warning message if the warning level is
2. (The warning level can be set with the /W option, as described in
Appendix B, Section B.16.) You can ignore the warning, or you can go
back to the source code and change the code to eliminate the forward
references.
──────────────────────────────────────────────────────────────────────────
NOTE The SHORT operator in the example above would not be needed if
target were located before the jump. The assembler would have already
processed target and would be able to make adjustments based on its
distance.
──────────────────────────────────────────────────────────────────────────
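As a sketch of the point made in the note (the label again is
hypothetical, and it is assumed to lie less than 128 bytes before the
jump), a backward reference needs no SHORT operator because the distance
is already known when the jump is assembled:
again: .
.
.
jmp again ; Backward reference: 2-byte form is used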
If you use the SHORT operator when the label being jumped to is more than
128 bytes away, QuickAssembler generates an error message. You can either
remove the SHORT operator, or try to reorganize your program to reduce the
distance.
If a far jump to a forward-referenced label is required, you must override
the assembler's assumptions with the FAR and PTR operators, as shown
below:
jmp FAR PTR target ; Generates 5 bytes
.
.
.
target: ; In different segment
If the type of a label has been established earlier in the source code
with an EXTRN directive, the type does not need to be specified in the
jump statement.
9.4.2 Forward References to Variables
When QuickAssembler encounters code referencing variables that have not
yet been defined in pass 1, it makes assumptions about the segment where
the variable will be defined. If on pass 2 the assumptions turn out to be
wrong, an error will occur.
These problems usually occur with complex segment structures that do not
follow the Microsoft segment conventions. The problems never appear if
simplified segment directives are used.
By default, QuickAssembler assumes that variables are referenced to the DS
register. If a statement must access a variable in a segment not
associated with the DS register, and if the variable has not been defined
earlier in the source code, you must use the segment-override operator to
specify the segment.
The situation is different if neither the variable nor the segment in
which it is defined has been defined earlier in the source code. In this
case, you must assign the segment to a group earlier in the source code.
QuickAssembler will then know about the existence of the segment even
though it has not yet been defined.
9.5 Strong Typing for Memory Operands
The assembler carries out strict syntax checks for all instruction
statements, including strong typing for operands that refer to memory
locations. This means that when an instruction uses two operands with
implied data types, the operand types must match. Warning messages are
generated for nonmatching types.
For example, in the following fragment, the variable string is incorrectly
used in a move instruction:
.DATA
string DB "A message."
.CODE
.
.
.
mov ax,string[1]
The ax register has WORD type, but string has BYTE type. Therefore, the
statement generates the following warning message:
Operand types must match
To avoid all ambiguity and prevent the warning, use the PTR operator
to override the variable's type, as shown below:
mov ax,WORD PTR string[1]
You can ignore the warnings if you are willing to trust the assembler's
assumptions. When a register and memory operand are mixed, the assembler
assumes that the register operand is always the correct size. For example,
in the statement
mov ax,string[1]
the assembler assumes that the programmer wishes the word size of the
register to override the byte size of the variable. A word starting at
string[1] will be moved into AX. In the statement
mov string[1],ax
the assembler assumes that the programmer wishes to move the word value in
AX into the word starting at string[1]. However, the assembler's
assumptions are not always as clear as in these examples. You should not
ignore warnings about type mismatches unless you are sure you understand
how your code will be assembled.
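The PTR operator works in the other direction as well. In the following
sketch (the variable count is hypothetical), a byte register is loaded
from a variable declared as a word, and BYTE PTR states that intent
explicitly so that no warning is generated:
.DATA
count DW 1234h
.CODE
.
.
.
mov al,BYTE PTR count ; Low byte of count (34h)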
──────────────────────────────────────────────────────────────────────────
NOTE Some assemblers (including early versions of the IBM Macro
Assembler) do not do strict type checking. For compatibility with these
assemblers, type errors are warnings rather than severe errors. Many
assembly-language program listings in books and magazines are written for
assemblers with weak type checking. Such programs may produce warning
messages, but assemble correctly. You can use the /W option to turn off
type warnings if you are sure the code is correct.
──────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────
Chapter 10: Assembling Conditionally
QuickAssembler provides two types of conditional directives:
conditional-assembly and conditional-error directives.
Conditional-assembly directives test for a specified condition and
assemble a block of statements if the condition is true. Conditional-error
directives test for a specified condition and generate an assembly error
if the condition is true.
Both kinds of conditional directives test assembly-time conditions. They
cannot test run-time conditions. Only expressions that evaluate to
constants during assembly can be compared or tested.
Since macros and conditional-assembly directives are often used together,
you may need to refer to Chapter 11, "Using Equates, Macros, and Repeat
Blocks," to understand some of the examples in this chapter. In
particular, conditional directives are frequently used with the operators
described in Section 11.5, "Using Macro Operators."
10.1 Using Conditional-Assembly Directives
The conditional-assembly directives include the following:
ELSE      IFB       IFIDN
ENDIF     IFDEF     IFIDNI
IF        IFDIF     IFNB
IF1       IFDIFI    IFNDEF
IF2       IFE
The IF directives and the ENDIF and ELSE directives can be used to enclose
the statements to be considered for conditional assembly.
Syntax
IFcondition
statements
[[ELSEIFcondition
statements]]
.
.
.
[[ELSE
statements]]
ENDIF
The statements following the IF directive can be any valid statements,
including other conditional blocks. The ELSEIF and ELSE blocks are
optional. The conditional block can contain any number of ELSEIF blocks.
(The ELSEIF directives are listed in Section 10.1.6.) ENDIF ends the
block.
The statements following the IF directive are assembled only if the
corresponding condition is true. If the condition is not true and an
ELSEIF directive is used, the assembler checks to see if the corresponding
condition is true. If so, it assembles the statements following the ELSEIF
directive. If no IF or ELSEIF conditions are satisfied, the statements
following the ELSE directive are assembled.
IF statements can be nested up to 20 levels. A nested ELSE or ELSEIF
directive always belongs to the nearest preceding IF statement that does
not have its own ELSE directive.
10.1.1 Testing Expressions with IF and IFE Directives
The IF and IFE directives test the value of an expression and grant
assembly based on the result.
Syntax
IF expression
IFE expression
The IF directive grants assembly if the value of expression is true
(nonzero). The IFE directive grants assembly if the value of expression is
false (0). The expression must evaluate to a constant value and must not
contain forward references.
Example
IF debug GT 20
push debug
call adebug
ELSEIF debug GT 10
call bdebug
ELSE
call cdebug
ENDIF
In this example, a different debug routine will be called, depending on
the value of debug.
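The IFE directive is convenient when a symbol is used as an on/off switch.
In the following sketch, debug is assumed to have been defined earlier
(for example, with an equate or the /D option), and the message text is
illustrative:
IFE debug ; True when debug equals 0
%OUT Debugging calls omitted
ENDIF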
10.1.2 Testing the Pass with IF1 and IF2 Directives
The IF1 and IF2 directives test the current assembly pass and grant
assembly only on the pass specified by the directive. Multiple passes of
the assembler are discussed in Appendix C, Section C.7, "Reading a Pass
1 Listing."
Syntax
IF1
IF2
The IF1 directive grants assembly only on pass 1. The IF2 directive grants
assembly only on pass 2. The directives take no arguments. If you turn on
the One-Pass Assembly option, the IF2 directive produces an error.
Macros usually only need to be processed once. You can enclose blocks of
macros in IF1 blocks to prevent them from being reprocessed on the second
pass.
Example
IF1 ; Define on first pass only
dostuff MACRO argument
.
.
.
ENDM
ENDIF
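The same technique can limit an assembly-time message to a single
appearance. In the following sketch (the message text is illustrative),
the %OUT directive would otherwise display its text on each pass:
IF1 ; Display on first pass only
%OUT Assembling test module
ENDIF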
10.1.3 Testing Symbol Definition with IFDEF and IFNDEF Directives
The IFDEF and IFNDEF directives test whether a symbol has been defined and
grant assembly based on the result.
Syntax
IFDEF name
IFNDEF name
The IFDEF directive grants assembly only if name is a defined label,
variable, or symbol. The IFNDEF directive grants assembly if name has not
yet been defined.
The name can be any valid name. Note that if name is a forward reference,
it is considered undefined on pass 1, but defined on pass 2.
Example
IFDEF buffer
buff DB buffer DUP(?)
ENDIF
In this example, buff is allocated only if buffer has been previously
defined.
One way to use this conditional block is to leave buffer undefined in the
source file and define it if needed by using the /Dsymbol option (see
Appendix B, Section B.4, "Defining Assembler Symbols") when you start
QuickAssembler. For example, if the conditional block is in TEST.ASM, you
could start the assembler with the following command line:
QCL /Dbuffer=1024 test.asm
This command line defines the symbol buffer. As a result, the
conditional-assembly block allocates buff. You could also define the
symbol by entering buffer=1024 in the Defines field of the Assembler Flags
dialog box. However, if you didn't need buff, you could use the following
command line:
QCL test.asm
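The IFNDEF directive can be used in a similar way to supply a default
value when the symbol is not defined on the command line. The following
is a minimal sketch that assumes a default of 512 bytes is acceptable (the
value is illustrative):
IFNDEF buffer
buffer = 512 ; Default if not defined with /D
ENDIF
buff DB buffer DUP(?)
With this arrangement, buff is always allocated, and the /D option only
changes its size.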
10.1.4 Verifying Macro Parameters with IFB and IFNB Directives
The IFB and IFNB directives test to see if a specified argument was passed
to a macro and grant assembly based on the result.
Syntax
IFB <argument>
IFNB <argument>
These directives are always used inside macros, and they always test
whether a real argument was passed for a specified dummy argument. The IFB
directive grants assembly if argument is blank. The IFNB directive grants
assembly if argument is not blank. The arguments can be any name, number,
or expression. Angle brackets (< >) are required.
Example
Write MACRO buffer,bytes,handle
IFNB <handle>
mov bx,handle ; (1=stdout,2=stderr,3=aux,4=printer)
ELSE
mov bx,1 ; Default standard out
ENDIF
mov dx,OFFSET buffer ; Address of buffer to write from
mov cx,bytes ; Number of bytes to write
mov ah,40h
int 21h
ENDM
In this example, a default value is used if no value is specified for the
third macro argument.
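Calls to the macro might look like the following sketch, which assumes
that message and lmessage (hypothetical names) are a string and its length
defined in the data segment:
Write message,lmessage ; Handle omitted: mov bx,1 is assembled
Write message,lmessage,2 ; Handle given: mov bx,2 is assembled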
10.1.5 Comparing Macro Arguments with IFIDN and IFDIF Directives
The IFIDN and IFDIF directives compare two macro arguments and grant
assembly based on the result.
Syntax
IFIDN[[I]] <argument1>,<argument2>
IFDIF[[I]] <argument1>,<argument2>
These directives are always used inside macros, and they always test
whether real arguments passed for two specified arguments are the same.
The IFIDN directive grants assembly if argument1 and argument2 are
identical. The IFDIF directive grants assembly if argument1 and argument2
are different. The arguments can be names, numbers, or expressions. They
must be enclosed in angle brackets and separated by a comma.
The optional I at the end of the directive name specifies that the
directive is case insensitive. Arguments that are spelled the same will be
evaluated the same, regardless of case. If the I is not given, the
directive is case sensitive.
Example
divide8 MACRO numerator,denominator
IFDIFI <numerator>,<al> ;; If numerator isn't AL
mov al,numerator ;; make it AL
ENDIF
xor ah,ah
div denominator
ENDM
In this example, a macro uses the IFDIFI directive to check one of the
arguments and take a different action, depending on the text of the
string. The sample macro could be enhanced further by checking for other
values that would require adjustment (such as a denominator passed in AL
or passed in AH).
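For illustration, the following calls would expand differently (the
variable num is hypothetical and is assumed to be a byte variable):
divide8 al,bl ; No mov assembled; AL already holds the numerator
divide8 AL,bl ; Same expansion, since IFDIFI ignores case
divide8 num,bl ; Assembles mov al,num first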
10.1.6 ELSEIF Directives
The assembler includes an ELSEIF conditional-assembly directive
corresponding to each of the IF directives. The ELSEIF directives provide
a more compact and better structured way of writing some sequences of ELSE
and IF directives. QuickAssembler supports the following directives:
ELSEIF      ELSEIFDEF     ELSEIFIDN
ELSEIF1     ELSEIFDIF     ELSEIFIDNI
ELSEIF2     ELSEIFDIFI    ELSEIFNB
ELSEIFB     ELSEIFE       ELSEIFNDEF
The following macro contains nested IF and ELSE blocks:
; Macro to load register for high-level-language return
FuncRet MACRO arg,length
LOCAL tmploc
IF length EQ 1
mov al,arg
ELSE
IF length EQ 2
mov ax,arg
ELSE
IF length EQ 4
.DATA
tmploc DW ?
DW ?
.CODE
mov ax,WORD PTR arg
mov tmploc,ax
mov ax,WORD PTR arg+2
mov tmploc+2,ax
mov dx,SEG tmploc
mov ax,OFFSET tmploc
ELSE
%OUT Error in FuncRet expansion
.ERR
ENDIF
ENDIF
ENDIF
ENDM
The macro can be rewritten as follows, using the ELSEIF directives:
FuncRet MACRO arg,length
LOCAL tmploc
IF length EQ 1
mov al,arg
ELSEIF length EQ 2
mov ax,arg
ELSEIF length EQ 4
.DATA
tmploc DW ?
DW ?
.CODE
mov ax,WORD PTR arg
mov tmploc,ax
mov ax,WORD PTR arg+2
mov tmploc+2,ax
mov dx,SEG tmploc
mov ax,OFFSET tmploc
ELSE
%OUT Error in FuncRet expansion
.ERR
ENDIF
ENDM
10.2 Using Conditional-Error Directives
Conditional-error directives can be used to debug programs and check for
assembly-time errors. By inserting a conditional-error directive at a key
point in your code, you can test assembly-time conditions at that point.
You can also use conditional-error directives to test for boundary
conditions in macros.
The conditional-error directives and the error messages they produce are
listed in Table 10.1.
Table 10.1 Conditional-Error Directives
Directive        Number    Message
──────────────────────────────────────────────────────────────────────────
.ERR1            2087      Forced error - pass1
.ERR2            2088      Forced error - pass2
.ERR             2089      Forced error
.ERRE            2090      Forced error - expression equals 0
.ERRNZ           2091      Forced error - expression not equal 0
.ERRNDEF         2092      Forced error - symbol not defined
.ERRDEF          2093      Forced error - symbol defined
.ERRB            2094      Forced error - string blank
.ERRNB           2095      Forced error - string not blank
.ERRIDN [[I]]    2096      Forced error - strings identical
.ERRDIF [[I]]    2097      Forced error - strings different
──────────────────────────────────────────────────────────────────────────
Like other severe errors, those generated by conditional-error directives
cause the assembler to return exit code 7. If a severe error is
encountered during assembly, QuickAssembler will delete the object module.
All conditional-error directives except .ERR1 generate severe errors.
10.2.1 Generating Unconditional Errors with .ERR, .ERR1, and .ERR2 Directives
The .ERR, .ERR1, and .ERR2 directives force an error where the directives
occur in the source file. The error is generated unconditionally when the
directive is encountered, but the directives can be placed within
conditional-assembly blocks to limit the errors to certain situations.
Syntax
.ERR
.ERR1
.ERR2
The .ERR directive forces an error regardless of the pass. The .ERR1 and
.ERR2 directives force the error only on their respective passes. The
error forced by .ERR1 appears on the screen or in the listing file only if
you use the /D option to request a pass 1 listing.
You can place these directives within conditional-assembly blocks or
macros to see which blocks are being expanded.
Example
IFDEF dos
.
.
.
ELSEIFDEF xenix
.
.
.
ELSE
.ERR
%OUT dos or xenix must be defined
ENDIF
This example makes sure that either the symbol dos or the symbol xenix is
defined. If neither is defined, the ELSE block is assembled and an error
message is generated. Since the .ERR directive is used, an error would be
generated on each pass. You could use .ERR1 or .ERR2 instead if you want
the error to be generated only on the corresponding pass.
10.2.2 Testing Expressions with .ERRE or .ERRNZ Directives
The .ERRE and .ERRNZ directives test the value of an expression and
conditionally generate an error based on the result.
Syntax
.ERRE expression
.ERRNZ expression
The .ERRE directive generates an error if expression is false (0). The
.ERRNZ directive generates an error if expression is true (nonzero). The
expression must evaluate to a constant value and must not contain forward
references.
Example
buffer MACRO count,bname
.ERRE count LE 128 ;; Allocate memory, but
bname DB count DUP(0) ;; no more than 128 bytes
ENDM
.
.
.
buffer 128,buf1 ; Data allocated - no error
buffer 129,buf2 ; Error generated
In this example, the .ERRE directive is used to check the boundaries of a
parameter passed to the macro buffer. If count is less than or equal to
128, the expression being tested by the error directive will be true
(nonzero) and no error will be generated. If count is greater than 128,
the expression will be false (0) and the error will be generated.
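The .ERRNZ directive can express the same check with the test reversed.
In the following sketch, the rest of the macro is unchanged:
buffer MACRO count,bname
.ERRNZ count GT 128 ;; Error if more than 128 bytes
bname DB count DUP(0)
ENDM
Here the expression is true (nonzero) only when count exceeds 128, so the
error is generated in exactly the same cases as before.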
10.2.3 Verifying Symbol Definition with .ERRDEF and .ERRNDEF Directives
The .ERRDEF and .ERRNDEF directives test whether a symbol is defined and
conditionally generate an error based on the result.
Syntax
.ERRDEF name
.ERRNDEF name
The .ERRDEF directive produces an error if name is defined as a label,
variable, or symbol. The .ERRNDEF directive produces an error if name has
not yet been defined. If name is a forward reference, it is considered
undefined on pass 1, but defined on pass 2.
Example
.ERRNDEF publevel
IF publevel LE 2
PUBLIC var1, var2
ELSE
PUBLIC var1, var2, var3
ENDIF
In this example, the .ERRNDEF directive at the beginning of the
conditional block makes sure that a symbol being tested in the block
actually exists.
10.2.4 Testing for Macro Parameters with .ERRB and .ERRNB Directives
The .ERRB and .ERRNB directives test whether a specified argument was
passed to a macro and conditionally generate an error based on the result.
Syntax
.ERRB <argument>
.ERRNB <argument>
These directives are always used inside macros, and they always test
whether a real argument was passed for a specified dummy argument. The
.ERRB directive generates an error if argument is blank. The .ERRNB
directive generates an error if argument is not blank. The argument can be
any name, number, or expression. Angle brackets (<>) are required.
Example
work MACRO realarg,testarg
.ERRB <realarg> ;; Error if no parameters
.ERRNB <testarg> ;; Error if more than one parameter
.
.
.
ENDM
In this example, error directives are used to make sure that one, and only
one, argument is passed to the macro. The .ERRB directive generates an
error if no argument is passed to the macro. The .ERRNB directive
generates an error if more than one argument is passed to the macro.
10.2.5 Comparing Macro Arguments with .ERRIDN and .ERRDIF Directives
The .ERRIDN and .ERRDIF directives compare two macro arguments and
conditionally generate an error based on the result.
Syntax
.ERRIDN[[I]] <argument1>,<argument2>
.ERRDIF[[I]] <argument1>,<argument2>
These directives are always used inside macros, and they always compare
the real arguments specified for two parameters. The .ERRIDN directive
generates an error if the arguments are identical. The .ERRDIF directive
generates an error if the arguments are different. The arguments can be
names, numbers, or expressions. They must be enclosed in angle brackets
and separated by a comma.
The optional I at the end of the directive name specifies that the
directive is case insensitive. Arguments that are spelled the same will be
evaluated the same regardless of case. If the I is not given, the
directive is case sensitive.
Example
addem MACRO ad1,ad2,sum
.ERRIDNI <ax>,<ad2> ;; Error if ad2 is "ax"
mov ax,ad1 ;; Would overwrite if ad2 were AX
add ax,ad2
mov sum,ax ;; Sum must be register or memory
ENDM
In this example, the .ERRIDNI directive is used to protect against passing
the AX register as the second parameter, since this would cause the macro
to fail.
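As a sketch of how the check behaves, the following calls could be made
(the choice of registers is illustrative):
addem bx,cx,dx ; Assembles normally: DX = BX + CX
addem bx,AX,dx ; Forced error - strings identical
The second call is rejected even though AX is written in uppercase,
because the .ERRIDNI form of the directive ignores case.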
────────────────────────────────────────────────────────────────────────────
Chapter 11: Using Equates, Macros, and Repeat Blocks
This chapter explains how to use equates, macros, and repeat blocks.
"Equates" are constant values assigned to symbols so that the symbol can
be used in place of the value. "Macros" are a series of statements that
are assigned a symbolic name (and, optionally, parameters) so that the
symbol can be used in place of the statements. "Repeat blocks" are a
special form of macro used to repeat statements.
Both equates and macros are processed at assembly time. They can simplify
writing source code by allowing the user to substitute mnemonic names for
constants and repetitive code. By changing a macro or equate, a programmer
can change the effect of statements throughout the source code.
In exchange for these conveniences, the programmer loses some
assembly-time efficiency. Assembly may be slightly slower for a program
that uses macros and equates extensively than for the same program written
without them. However, the program without macros and equates usually
takes longer to write and is more difficult to maintain.
11.1 Using Equates
The equate directives enable you to use symbols that represent numeric or
string constants. QuickAssembler recognizes three kinds of equates:
1. Redefinable numeric equates
2. Nonredefinable numeric equates
3. String equates (also called text macros)
11.1.1 Redefinable Numeric Equates
Redefinable numeric equates are used to assign a numeric constant to a
symbol. The value of the symbol can be redefined at any point during
assembly time. Although the value of a redefinable equate may be different
at different points in the source code, a constant value will be assigned
for each use, and that value will not change at run time.
Redefinable equates are often used for assembly-time calculations in
macros and repeat blocks.
Syntax
name=expression
The equal-sign (=) directive creates or redefines a constant symbol by
assigning the numeric value of expression to name. No storage is allocated
for the symbol. The symbol can be used in subsequent statements as an
immediate operand having the assigned value. It can be redefined at any
time.
The expression can be an integer, a constant expression, a one- or
two-character string constant, or an expression that evaluates to an
address. The name must be either a unique name or a name previously
defined by using the equal-sign (=) directive.
──────────────────────────────────────────────────────────────────────────
NOTE Redefinable equates must be assigned numeric values. String
constants longer than two characters cannot be used.
──────────────────────────────────────────────────────────────────────────
Example
counter = 0 ; Initialize counter
array LABEL BYTE ; Label array of increasing numbers
REPT 100 ; Repeat 100 times
DB counter ; Initialize number
counter = counter + 1 ; Increment counter
ENDM
This example redefines equates inside a repeat block to declare an array
initialized to increasing values from 0 to 99. The equal-sign directive
is used to increment the counter symbol for each loop. See Section 11.4
for more information on repeat blocks.
11.1.2 Nonredefinable Numeric Equates
Nonredefinable numeric equates are used to assign a numeric constant to a
symbol. The value of the symbol cannot be redefined.
Nonredefinable numeric equates are often used for assigning mnemonic names
to constant values. This can make the code more readable and easier to
maintain. If a constant value used in numerous places in the source code
needs to be changed, the equate can be changed in one place rather than
throughout the source code.
Syntax
name EQU expression
The EQU directive creates constant symbols by assigning expression to
name. The assembler replaces each subsequent occurrence of name with the
value of expression. Once a numeric equate has been defined with the EQU
directive, it cannot be redefined. Attempting to do so generates an error.
──────────────────────────────────────────────────────────────────────────
NOTE String constants can also be defined with the EQU directive, but the
syntax is different, as described in Section 11.1.3, "String Equates."
──────────────────────────────────────────────────────────────────────────
No storage is allocated for the symbol. Symbols defined with numeric
values can be used in subsequent statements as immediate operands having
the assigned value.
Examples
column EQU 80 ; Numeric constant 80
row EQU 25 ; Numeric constant 25
screenful EQU column * row ; Numeric constant 2000
line EQU row ; Alias for "row"
.DATA
buffer DW screenful
.CODE
.
.
.
mov cx,column
mov bx,line
11.1.3 String Equates
String equates (or text macros) are used to assign a string constant to a
symbol. String equates can be used in a variety of contexts, including
defining aliases and string constants.
Syntax
name EQU {string | <string>}
The EQU directive creates constant symbols by assigning string to name.
The assembler replaces each subsequent occurrence of name with string.
Symbols defined to represent strings with the EQU directive can be
redefined to new strings. Symbols cannot be defined to represent strings
with the equal-sign (=) directive.
An alias is a special kind of string equate. It is a symbol that is
equated to another symbol or keyword.
If you want an equate to be a string equate, you should use angle brackets
to force the assembler to evaluate it as a string. If you do not use angle
brackets, the assembler will try to guess from context whether a numeric or
string equate is appropriate. This can lead to unexpected results. For
example, the statement
rt EQU run-time
would be evaluated as run minus time, even though the user might intend to
define the string run-time. If run and time were not already defined as
numeric equates, the statement would generate an error. Using angle
brackets solves this problem. The statement
rt EQU <run-time>
is evaluated as the string run-time.
Examples
; String equate definitions
pi EQU <3.1415> ; String constant "3.1415"
prompt EQU <'Type Name: '> ; String constant "'Type Name: '"
WPT EQU <WORD PTR> ; String constant for "WORD PTR"
arg1 EQU <[bp+4]> ; String constant for "[bp+4]"
; Use of string equates
.DATA
message DB prompt ; Allocate string "Type Name: "
pie DQ pi ; Allocate real number 3.1415
.CODE
.
.
.
inc WPT arg1 ; Increment word value of
; argument passed on stack
Section 11.3, "Text-Macro String Directives," describes directives that
enable you to manipulate strings. They are particularly powerful when you
use them from within macros and repeat blocks, described later.
11.1.4 Predefined Equates
The assembler includes several predefined equates. The ones related to
segments are described in Section 5.1.5, "Using Predefined Segment
Equates." In addition, the following equates are available: @WordSize,
@Cpu, and @Version.
The @WordSize equate returns the size of a word for the current segment.
With QuickAssembler, this value is always equal to 2. However, other
versions of the assembler can assign a different value to @WordSize when
working with 80386 extended features.
──────────────────────────────────────────────────────────────────────────
NOTE If you set the Preserve Case assembler flag or use the /Cl option,
QuickAssembler considers predefined equates to be case-sensitive. The
case-sensitive names of predefined equates are @WordSize, @Cpu, @Version,
@CurSeg, @FileName, @CodeSize, @DataSize, @Model, @data, @data?, @fardata,
@fardata?, and @code.
──────────────────────────────────────────────────────────────────────────
The @Cpu equate returns a 16-bit value containing information about the
selected processor. You select a processor by using one of the processor
directives, such as the .286 directive. You can use the @Cpu text macro to
control assembly of processor-specific code. Individual bits in the value
returned by @Cpu indicate information about the selected processor.
Bit    If Bit = 1
──────────────────────────────────────────────────────────────────────────
0      8086 processor
1      80186 processor
2      80286 processor
8      8087 coprocessor instructions enabled
10     80287 coprocessor instructions enabled
Because the processors are upwardly compatible, selecting a
higher-numbered processor automatically sets the bits indicating
lower-numbered processors. For example, selecting an 80286 processor
automatically sets the 80186 and 8086 bits.
Bits 4 through 6, 9, and 12 through 15 are reserved for future use and
should be masked off when testing. Bits 3, 7, and 11 have special meaning
to Versions 5.1 and later of the Microsoft Macro Assembler: bit 3
indicates an 80386 processor, bit 7 indicates privilege mode enabled, and
bit 11 indicates that 80387 coprocessor instructions are enabled.
──────────────────────────────────────────────────────────────────────────
NOTE The @Cpu equate only provides information about the processor
selected during assembly by one of the processor directives. It does not
provide information about the processor actually used when a program is
run.
──────────────────────────────────────────────────────────────────────────
The following example uses the @Cpu text macro to select more efficient
instructions available only on the 80186 processor and above:
; Use the 186/286/386 pusha instruction if possible
P186 EQU (@Cpu AND 0002h) ; Only test 186 bit--286 and
; 386 set 186 bit as well
.
.
.
IF P186 ; Non-zero if 186 processor
pusha ; or above
ELSE
push ax ; Do what the single
push cx ; pusha instruction does
push dx
push bx
push sp
push bp
push si
push di
ENDIF
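Other bits can be tested in the same way. The following sketch checks the
8087 bit (bit 8, mask 0100h); the symbol P87 and the message text are
illustrative:
P87 EQU (@Cpu AND 0100h) ; Nonzero if 8087 instructions enabled
IF P87
%OUT Coprocessor instructions enabled
ELSE
%OUT Coprocessor instructions not enabled
ENDIF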
The @Version equate returns the version of the assembler in use. With the
@Version equate, you can write macros that take appropriate actions for
different versions of the assembler. Currently, the @Version equate
returns 520 as a string of three characters.
──────────────────────────────────────────────────────────────────────────
NOTE Although the version number of QuickAssembler is 2.01, the @Version
equate returns 520 rather than 201. The number 520 was selected because
QuickAssembler is an enhancement of Version 5.1 of the Microsoft Macro
Assembler. The @Version equate was first introduced in Version 5.1.
──────────────────────────────────────────────────────────────────────────
You can use conditional-assembly directives such as IFNDEF, IF, and
ELSEIF to test for different versions of the assembler and to assemble
different code depending on the version.
IFNDEF @Version
%OUT MASM 5.0 or earlier has no extended PROC or .STARTUP
ELSEIF @Version EQ 510
%OUT MASM 5.1 has extended PROC, but not .STARTUP
ELSEIF @Version EQ 520
%OUT QuickAssembler 2.01 has extended PROC and .STARTUP
ELSE
%OUT Future assembler
ENDIF
11.2 Using Macros
Macros enable you to assign a symbolic name to a block of source
statements and then to use that name in your source file to represent the
statements. Parameters can also be defined to represent arguments passed
to the macro.
Macro expansion is a text-processing function that occurs at assembly
time. Each time QuickAssembler encounters the text associated with a macro
name, it replaces that text with the text of the statements in the macro
definition. Similarly, the text of parameter names is replaced with the
text of the corresponding actual arguments.
A macro can be defined any place in the source file as long as the
definition precedes the first source line that calls the macro. Macros and
equates are often kept in a separate file and made available to the
program through an INCLUDE directive (see Section 11.7.1, "Using Include
Files") at the start of the source code.
Often a task can be done by using either a macro or procedure. For
example, the addup procedure shown in Section 15.3.3, "Passing Arguments
on the Stack," does the same thing as the addup macro in Section 11.2.1,
"Defining Macros." Macros are expanded on every occurrence of the macro
name, so they can increase the length of the executable file if called
repeatedly. Procedures are coded only once in the executable file, but the
increased overhead of saving and restoring addresses and parameters can
make them slower.
The section below tells how to define and call macros. Repeat blocks, a
special form of macro for doing repeated operations, are discussed
separately in Section 11.4.
11.2.1 Defining Macros
The MACRO and ENDM directives are used to define macros. MACRO designates
the beginning of the macro block, and ENDM designates the end of the macro
block.
Syntax
name MACRO [[parameter [[,parameter]]...]]
statements
ENDM
The name must be unique and a valid symbol name. It can be used later in
the source file to invoke the macro.
The parameters (sometimes called dummy parameters) are names that act as
placeholders for values to be passed as arguments to the macro when it is
called. Any number of parameters can be specified, but they must all fit
on one line. If you give more than one parameter, you must separate them
with commas, spaces, or tabs. Commas can always be used as separators;
spaces and tabs may cause ambiguity if the arguments are expressions.
──────────────────────────────────────────────────────────────────────────
NOTE This manual uses the term "parameter" to refer to a placeholder for
a value that will be passed to a macro or procedure. Parameters appear in
macro or procedure definitions. The term "argument" is used to refer to an
actual value passed to the macro or procedure when it is called.
──────────────────────────────────────────────────────────────────────────
Any valid assembler statements may be placed within a macro, including
statements that call or define other macros. Any number of statements can
be used. The parameters can be used any number of times in the statements.
Macros can be nested, redefined, or used recursively, as explained in
Section 11.6, "Using Recursive, Nested, and Redefined Macros."
QuickAssembler assembles the statements in a macro only if the macro is
called and only at the point in the source file from which it is called.
The macro definition itself is never assembled.
A macro definition can include the LOCAL directive, which lets you define
labels used only within a macro, or the EXITM directive, which allows you
to exit from a macro before all the statements in the block are expanded.
These directives are discussed in Sections 11.2.3, "Using Local Symbols,"
and 11.2.4, "Exiting from a Macro." Macro operators can also be used in
macro definitions, as described in Section 11.5, "Using Macro Operators."
Example
addup MACRO ad1,ad2,ad3
mov ax,ad1 ;; First parameter in AX
add ax,ad2 ;; Add next two parameters
add ax,ad3 ;; and leave sum in AX
ENDM
This example defines a macro named addup, which uses three parameters to
add three values and leave their sum in the AX register. The three
parameters will be replaced with arguments when the macro is called.
11.2.2 Calling Macros
A macro call directs QuickAssembler to copy the statements of the macro to
the point of the call and to replace any parameters in the macro
statements with the corresponding actual arguments.
Syntax
name [[argument [[,argument]]...]]
The name must be the name of a macro defined earlier in the source file.
The arguments can be any text. For example, symbols, constants, and
registers are often given as arguments. Any number of arguments can be
given, but they must all fit on one logical line. You can use the
continuation character (\) to continue long macro calls on multiple
physical lines. Multiple arguments must be separated by commas, spaces, or
tabs.
QuickAssembler replaces the first parameter with the first argument, the
second parameter with the second argument, and so on. If a macro call has
more arguments than the macro has parameters, the extra arguments are
ignored. If a call has fewer arguments than the macro has parameters, any
remaining parameters are replaced with a null (empty) string.
You can use conditional statements to enable macros to check for null
strings or other types of arguments. The macro can then take appropriate
action to adjust to different kinds of arguments. See Chapter 10,
"Assembling Conditionally," for more information on using
conditional-assembly and conditional-error directives to test macro
arguments.
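As a sketch of such a test, the following variation on the addup macro
uses the IFNB directive to skip the statements for any argument that was
not supplied. The macro name addopt is hypothetical and is used here only
for illustration.

```asm
; addopt is a hypothetical macro name used only for illustration.
addopt  MACRO   ad1,ad2,ad3
        mov     ax,ad1          ;; First argument is required
        IFNB    <ad2>           ;; Add second argument only if given
        add     ax,ad2
        ENDIF
        IFNB    <ad3>           ;; Add third argument only if given
        add     ax,ad3
        ENDIF
        ENDM

        addopt  bx              ; Generates: mov ax,bx
        addopt  bx,2            ; Generates: mov ax,bx
                                ;            add ax,2
```

Without the IFNB tests, a call with fewer than three arguments would
generate an ADD instruction with a missing operand.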
Example
addup MACRO ad1,ad2,ad3 ; Macro definition
mov ax,ad1 ;; First parameter in AX
add ax,ad2 ;; Add next two parameters
add ax,ad3 ;; and leave sum in AX
ENDM
.
.
.
addup bx,2,count ; Macro call
When the addup macro is called, QuickAssembler replaces the parameters
with the arguments given in the macro call. In the example above, the
assembler would expand the macro call to the following code:
mov ax,bx
add ax,2
add ax,count
This code could be shown in an assembler listing, depending on whether the
.LALL, .XALL, or .SALL directive was in effect (see Section 12.3,
"Controlling the Contents of Listings").
11.2.3 Using Local Symbols
The LOCAL directive can be used within a macro to define symbols that are
available only within the defined macro.
──────────────────────────────────────────────────────────────────────────
NOTE In this context, the term "local" is not related to the public
availability of a symbol, as described in Chapter 8, "Creating Programs
from Multiple Modules," or to variables that are defined to be local to a
procedure, as described in Section 15.3.5, "Using Local Variables." Local
simply means that the symbol is not known outside the macro where it is
defined.
──────────────────────────────────────────────────────────────────────────
Syntax
LOCAL localname [[,localname]]...
The localname is a temporary symbol name that is to be replaced by a
unique symbol name when the macro is expanded. At least one local name is
required for each LOCAL directive. If more than one local symbol is given,
the names must be separated with commas. Once declared, a local name can
be used in any statement within the macro definition.
QuickAssembler creates a new actual name for localname each time the macro
is expanded. The actual name has the following form:
??number
The number is a hexadecimal number in the range 0000 to 0FFFF. You should
not give other symbols names in this format, since doing so may produce a
symbol with multiple definitions. In listings, the local name is shown in
the macro definition, but the actual name is shown in expansions of macro
calls.
Nonlocal labels may be used in a macro; but if the macro is used more than
once, the same label will appear in both expansions, and QuickAssembler
will display an error message, indicating that the file contains a symbol
with multiple definitions. To avoid this problem, use only local labels
(or redefinable equates) in macros.
──────────────────────────────────────────────────────────────────────────
NOTE The LOCAL directive in macro definitions must precede all other
statements in the definition. If you place another statement (such as a
comment directive) before the LOCAL directive, an error will be generated.
──────────────────────────────────────────────────────────────────────────
Example
power MACRO factor,exponent ;; Use for unsigned only
LOCAL again,gotzero ;; Declare symbols for macro
xor dx,dx ;; Clear DX
mov cx,exponent ;; Exponent is count for loop
mov ax,1 ;; Multiply by 1 first time
jcxz gotzero ;; Get out if exponent is zero
mov bx,factor
again: mul bx ;; Multiply until done
loop again
gotzero:
ENDM
In this example, the LOCAL directive defines the local names again and
gotzero as labels to be used within the power macro. These local names
will be replaced with unique names each time the macro is expanded. For
example, the first time the macro is called, again will be assigned the
name ??0000 and gotzero will be assigned ??0001. The second time through,
again will be assigned ??0002 and gotzero will be assigned ??0003, and so
on.
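To make the renaming concrete, the following sketch shows the code that
two calls to power would be expected to generate. It is a hand-written
illustration of generated code, not source to be typed in, and it assumes
that these are the first two expansions in the source file that use local
symbols, so that numbering starts at ??0000.

```asm
; Code generated by:  power 2,8   (first call)
        xor     dx,dx
        mov     cx,8
        mov     ax,1
        jcxz    ??0001          ; gotzero became ??0001
        mov     bx,2
??0000: mul     bx              ; again became ??0000
        loop    ??0000
??0001:

; Code generated by:  power 3,4   (second call)
        xor     dx,dx
        mov     cx,4
        mov     ax,1
        jcxz    ??0003          ; gotzero became ??0003
        mov     bx,3
??0002: mul     bx              ; again became ??0002
        loop    ??0002
??0003:
```

Because each expansion receives its own pair of names, the two copies of
the loop do not conflict.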
11.2.4 Exiting from a Macro
Normally, QuickAssembler processes all the statements in a macro
definition and then continues with the next statement after the macro
call. However, you can use the EXITM directive to tell the assembler to
terminate macro expansion before all the statements in the macro have been
assembled.
When the EXITM directive is encountered, the assembler exits the macro or
repeat block immediately. Any remaining statements in the macro or repeat
block are not processed. If EXITM is encountered in a nested macro or
repeat block, QuickAssembler returns to expanding the outer block.
The EXITM directive is typically used with conditional directives to skip
the last statements in a macro under specified conditions. Often macros
using the EXITM directive contain repeat blocks or are called recursively.
Example
allocate MACRO times ; Macro definition
x = 0
REPT times ;; Repeat up to 256 times
IF x GT 0FFh ;; Is x > 255 yet?
EXITM ;; If so, quit
ELSE
DB x ;; Else allocate x
ENDIF
x = x + 1 ;; Increment x
ENDM
ENDM
This example defines a macro that allocates a variable amount of data, but
no more than 256 bytes (the values 0 through 255). The macro contains an
IF directive that checks the expression x GT 0FFh. When this expression is
true (x is greater than 255), the EXITM directive is processed and
expansion of the repeat block stops.
11.3 Text-Macro String Directives
The assembler includes four text-macro string directives that let you
manipulate literal strings or text-macro values. You use the four
directives in much the same way you use the equal-sign (=) directive. For
example, the following line assigns the first three characters (abc) of
the literal string to the label three by using the SUBSTR directive:
three SUBSTR <abcdefghijklmnopqrstuvwxyz>,1,3
Each of the directives assigns its value──depending on the directive──to a
numeric label or a text macro. The following list summarizes the four
directives and the type of label that the directives should be used with:
Directive Description
──────────────────────────────────────────────────────────────────────────
SUBSTR Returns a substring of its text macro or literal
string argument. SUBSTR requires a text-macro label.
CATSTR Concatenates a variable number of strings (text macros
or literal strings) to form a single string. CATSTR
requires a text-macro label.
SIZESTR Returns the length, in characters, of its argument
string. SIZESTR requires a numeric label.
INSTR Returns an index indicating the starting position of a
substring within another string. INSTR requires a
numeric label.
Strings used as arguments in the directives must be text enclosed in angle
brackets (< >), previously defined text macros, or expressions starting
with a percent sign (%). Numeric arguments can be numeric constants or
expressions that evaluate to constants during assembly.
The next four sections describe the directives in more detail.
11.3.1 The SUBSTR Directive
The SUBSTR directive returns a substring from a given string.
Syntax
textlabel SUBSTR string,start[[, length]]
The SUBSTR directive takes the following arguments:
Argument Description
──────────────────────────────────────────────────────────────────────────
textlabel The text label the result is assigned to.
string The string the substring is extracted from.
start The starting position of the substring. The first
character in the string has a position of one.
length The number of characters to extract. If omitted,
SUBSTR returns all characters to the right of position
start, including the character at position start.
In the following lines, the text macro freg is assigned the first two
characters of the text macro reglist:
reglist EQU <ax,bx,cx,dx>
.
.
.
freg SUBSTR reglist,1,2 ; freg = ax
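The optional length argument can also be omitted. The following sketch
reuses reglist to show both forms; the labels tailregs and nextreg are
hypothetical names chosen for illustration.

```asm
reglist  EQU     <ax,bx,cx,dx>

; tailregs and nextreg are hypothetical labels.
tailregs SUBSTR  reglist,4       ; No length: tailregs = bx,cx,dx
nextreg  SUBSTR  reglist,4,2     ; Two characters: nextreg = bx
```

Position 4 is the character after the first comma, since positions are
counted from one.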
11.3.2 The CATSTR Directive
The CATSTR directive concatenates a series of strings.
Syntax
textlabel CATSTR string[[, string]]...
The CATSTR directive takes the following arguments:
Argument Description
──────────────────────────────────────────────────────────────────────────
textlabel The text label the result is assigned to
string The string or strings concatenated and assigned to
textlabel
The following lines concatenate the two literal strings and assign the
result to the text macro lstring:
lstring CATSTR <a b c>,<d e f> ; lstring = a b cd e f
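CATSTR also accepts previously defined text macros, and it joins its
arguments with no separator added between them. The following sketch
builds a comma-separated list in the same way that the SaveRegs macro in
Section 11.3.5 does; the labels firstreg and regpair are hypothetical.

```asm
; firstreg and regpair are hypothetical labels.
firstreg EQU     <ax>
regpair  CATSTR  firstreg,<,>,<bx>   ; regpair = ax,bx
```

Any separator, such as the comma here, must be supplied as a string of
its own.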
11.3.3 The SIZESTR Directive
The SIZESTR directive assigns the length of its argument string to a
numeric label.
Syntax
numericlabel SIZESTR string
The SIZESTR directive takes the following arguments:
Argument Description
──────────────────────────────────────────────────────────────────────────
numericlabel The numeric label that the assembler assigns the
string length to
string The string whose length is returned
The following lines set slength to 8──the length of the text macro
tstring:
tstring EQU <ax bx cx>
.
.
.
slength SIZESTR tstring ; slength = 8
A null string has a length of zero.
11.3.4 The INSTR Directive
The INSTR directive returns the position of a string within another
string. The directive returns 0 if the string is not found. The first
character in a string has a position of one.
Syntax
numericlabel INSTR [[start,]]string1, string2
The INSTR directive takes the following arguments:
Argument Description
──────────────────────────────────────────────────────────────────────────
numericlabel The numeric label the substring's position is assigned
to.
start The starting position for the search. When omitted,
the INSTR directive starts searching at the first
character. The first character in the string has a
position of one.
string1 The string being searched.
string2 The string to look for.
The following lines set colpos to the character position of the first
colon in segarg:
segarg EQU <ES:AX>
.
.
.
colpos INSTR segarg,<:> ; colpos = 3
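INSTR is often combined with SUBSTR to split a string at a delimiter. The
following sketch separates segarg into its segment and offset parts and
also shows the optional start argument. The labels segpart, offpart, and
nocolon are hypothetical names used only for illustration.

```asm
segarg  EQU     <ES:AX>
colpos  INSTR   segarg,<:>              ; colpos = 3

; segpart, offpart, and nocolon are hypothetical labels.
segpart SUBSTR  segarg,1,colpos-1       ; segpart = ES
offpart SUBSTR  segarg,colpos+1         ; offpart = AX
nocolon INSTR   colpos+1,segarg,<:>     ; Search from position 4:
                                        ; nocolon = 0 (not found)
```

The numeric arguments here are expressions that evaluate to constants
during assembly, as described in Section 11.3.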
11.3.5 Using String Directives Inside Macros
The following example uses the text-macro string directives CATSTR, INSTR,
SIZESTR, and SUBSTR. It defines two macros, SaveRegs and RestRegs, that
save and restore registers on the stack. The macros are written so that
RestRegs restores only the most recently saved group of registers.
The SaveRegs macro uses a text macro, regpushed, to keep track of the
registers pushed onto the stack. The RestRegs macro uses this string to
restore the proper registers. Each time the SaveRegs macro is invoked, it
adds a pound sign (#) to the string to mark the start of a new group of
registers. The RestRegs macro restores the most recently saved group by
finding the first pound sign in the string, creating a substring
containing the saved register names, and then looping and generating POP
instructions.
; Initialize regpushed to the null string
regpushed EQU <>
; SaveRegs
; Loops and generates a push for each argument register.
; Saves each register name in regpushed.
SaveRegs MACRO r1,r2,r3,r4,r5,r6,r7,r8,r9
regpushed CATSTR <#>,regpushed ;; Mark a new group of regs
IRP reg,<r1,r2,r3,r4,r5,r6,r7,r8,r9>
IFNB <reg>
push reg ;; Push and record a register
regpushed CATSTR <reg>,<,>,regpushed
ELSE
EXITM ;; Quit on blank argument
ENDIF
ENDM
ENDM
; RestRegs
; Generates a pop for each register in the most recently saved group
RestRegs MACRO
numloc INSTR regpushed,<#> ;; Find location of #
reglist SUBSTR regpushed,1,numloc-1 ;; Get list of registers to pop
reglen SIZESTR regpushed ;; Adjust numloc if # is not last
IF reglen GT numloc ;; item in the string
numloc = numloc + 1
ENDIF
regpushed SUBSTR regpushed,numloc ;; Remove list from regpushed
% IRP reg,<reglist> ;; Generate pop for each register
IFNB <reg>
pop reg
ENDIF
ENDM
ENDM
The following lines from a listing file show the sample code that the
macros would generate (a 2 marks lines generated by the macros):
SaveRegs ax,bx
2 push ax ;
2 push bx ;
SaveRegs cx
2 push cx ;
SaveRegs dx
2 push dx ;
RestRegs
2 pop dx
RestRegs
2 pop cx
RestRegs
2 pop bx
2 pop ax
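It can help to follow the value of regpushed through the calls shown in
the listing. The trace below was worked out by hand from the macro
definitions above and is given as comments; it assumes that regpushed is
the null string before the first call.

```asm
; Call              Value of regpushed after the call
; ----------------  ---------------------------------
; (initial)         (null string)
; SaveRegs ax,bx    bx,ax,#
; SaveRegs cx       cx,#bx,ax,#
; SaveRegs dx       dx,#cx,#bx,ax,#
; RestRegs          cx,#bx,ax,#        (reglist = dx,)
; RestRegs          bx,ax,#            (reglist = cx,)
; RestRegs          #                  (reglist = bx,ax,)
```

Each register name is added to the front of the string, so every group
is stored in reverse order of pushing, which is the order needed for the
POP instructions. The trailing comma in reglist produces a blank argument
that the IFNB test in RestRegs skips.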
11.4 Defining Repeat Blocks
Repeat blocks are a special form of macro that allows you to create blocks
of repeated statements. They differ from macros in that they are not
named, and thus cannot be called. However, like macros, they can have
parameters that are replaced by actual arguments during assembly. Macro
operators, symbols declared with the LOCAL directive, and the EXITM
directive can be used in repeat blocks. Like macros, repeat blocks are
always terminated by an ENDM directive.
Repeat blocks are frequently placed in macros in order to repeat some of
the statements in the macro. They can also be used independently, usually
for declaring arrays with repeated data elements.
Repeat blocks are processed at assembly time and should not be confused
with the REP instruction, which causes string instructions to be repeated
at run time, as explained in Chapter 16, "Processing Strings."
Three different kinds of repeat blocks can be defined by using the REPT,
IRP, and IRPC directives. The difference between them is in how the number
of repetitions is specified.
11.4.1 The REPT Directive
The REPT directive is used to create repeat blocks in which the number of
repetitions is specified with a numeric argument.
Syntax
REPT expression
statements
ENDM
The expression must evaluate to a numeric constant (a 16-bit unsigned
number). It specifies the number of repetitions. Any valid assembler
statements may be placed within the repeat block.
Example
alphabet LABEL BYTE
x = 0 ;; Initialize
REPT 26 ;; Specify 26 repetitions
DB 'A' + x ;; Allocate ASCII code for letter
x = x + 1 ;; Increment
ENDM
This example repeats the equal-sign (=) and DB directives to initialize
ASCII values for each uppercase letter of the alphabet.
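The same pattern can build any table whose entries can be computed from a
counter at assembly time. The following sketch is illustrative only; the
names squares and n are hypothetical.

```asm
; squares and n are hypothetical names.
squares LABEL   WORD
n       =       0               ; Initialize counter
        REPT    10              ;; Specify 10 repetitions
        DW      n * n           ;; Allocate the square of n
n       =       n + 1           ;; Increment counter
        ENDM
```

This generates ten words holding the squares of 0 through 9. The values
are computed during assembly, so no instructions are executed at run time
to fill the table.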
11.4.2 The IRP Directive
The IRP directive is used to create repeat blocks in which the number of
repetitions, as well as parameters for each repetition, is specified in a
list of arguments.
Syntax
IRP parameter,<argument[[,argument]]...>
statements
ENDM
The assembler statements inside the block are repeated once for each
argument in the list enclosed by angle brackets (< >). The parameter is a
name for a placeholder to be replaced by the current argument. Each
argument can be text, such as a symbol, string, or numeric constant. Any
number of arguments can be given. If multiple arguments are given, they
must be separated by commas. The angle brackets (< >) around the argument
list are required. The parameter can be used any number of times in the
statements.
When QuickAssembler encounters an IRP directive, it makes one copy of the
statements for each argument in the enclosed list. While copying the
statements, it substitutes the current argument for all occurrences of
parameter in these statements. If a null argument (< >) is found in the
list, the parameter is replaced with a blank value. If the argument list
is empty, the IRP directive is ignored and no statements are copied.
Example
numbers LABEL BYTE
IRP x,<0,1,2,3,4,5,6,7,8,9>
DB 10 DUP(x)
ENDM
This example repeats the DB directive 10 times, allocating 10 bytes for
each number in the list. The resulting statements create 100 bytes of
data, starting with 10 zeros, followed by 10 ones, and so on.
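IRP can generate instructions as well as data. The following sketch,
given only as an illustration, generates one PUSH instruction for each
register named in the list.

```asm
        IRP     reg,<ax,bx,cx,dx>
        push    reg             ;; One push per listed register
        ENDM

; Generates:
;       push    ax
;       push    bx
;       push    cx
;       push    dx
```

To restore the registers, a matching block of POP instructions would list
them in the reverse order.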
11.4.3 The IRPC Directive
The IRPC directive is used to create repeat blocks in which the number of
repetitions, as well as arguments for each repetition, is specified in a
string.
Syntax
IRPC parameter,string
statements
ENDM
The asse