---
layout: page
title: Advanced MS-DOS Programming
permalink: /pubs/pc/reference/microsoft/mspl13/msdos/advdos/
---
{% raw %}
Advanced MS-DOS Programming
════════════════════════════════════════════════════════════════════════════
Advanced MS-DOS Programming
The Microsoft(R) Guide for Assembly Language and C Programmers
By Ray Duncan
════════════════════════════════════════════════════════════════════════════
PUBLISHED BY
Microsoft Press
A Division of Microsoft Corporation
16011 NE 36th Way, Box 97017, Redmond, Washington 98073-9717
Copyright (C) 1986, 1988 by Ray Duncan
Published 1986. Second edition 1988.
All rights reserved. No part of the contents of this book may be
reproduced or transmitted in any form or by any means without the written
permission of the publisher.
Library of Congress Cataloging in Publication Data
Duncan, Ray, 1952-
Advanced MS-DOS programming.
Rev. ed. of: Advanced MS-DOS. (C)1986.
Includes index.
1. MS-DOS (Computer operating system) 2. Assembler language
(Computer program language) 3. C (Computer program language)
I. Duncan, Ray, 1952- Advanced MS-DOS. II. Title.
QA76.76.O63D858 1988 005.4'46 88-1251
ISBN 1-55615-157-8
Printed and bound in the United States of America.
1 2 3 4 5 6 7 8 9 FGFG 3 2 1 0 9 8
Distributed to the book trade in the United States by Harper & Row.
Distributed to the book trade in Canada by General Publishing Company,
Ltd.
Penguin Books Ltd., Harmondsworth, Middlesex, England
Penguin Books Australia Ltd., Ringwood, Victoria, Australia
Penguin Books N.Z. Ltd., 182-190 Wairau Road, Auckland 10, New Zealand
British Cataloging in Publication Data available
IBM(R), PC/AT(R), and PS/2(R) are registered trademarks of International
Business Machines Corporation. CodeView(R), Microsoft(R), MS-DOS(R), and
XENIX(R) are registered trademarks and InPort(TM) is a trademark of
Microsoft Corporation.
──────────────────────────────────────────────────────────────────────────
Technical Editor: Mike Halvorson
Production Editor: Mary Ann Jones
──────────────────────────────────────────────────────────────────────────
Dedication
For Carolyn
────────────────────────────────────────────────────────────────────────────
Contents
Road Map to Figures and Tables
Acknowledgments
Introduction
SECTION 1 PROGRAMMING FOR MS-DOS
Chapter 1 Genealogy of MS-DOS
Chapter 2 MS-DOS in Operation
Chapter 3 Structure of MS-DOS Application Programs
Chapter 4 MS-DOS Programming Tools
Chapter 5 Keyboard and Mouse Input
Chapter 6 Video Display
Chapter 7 Printer and Serial Port
Chapter 8 File Management
Chapter 9 Volumes and Directories
Chapter 10 Disk Internals
Chapter 11 Memory Management
Chapter 12 The EXEC Function
Chapter 13 Interrupt Handlers
Chapter 14 Installable Device Drivers
Chapter 15 Filters
Chapter 16 Compatibility and Portability
SECTION 2 MS-DOS FUNCTIONS REFERENCE
SECTION 3 IBM ROM BIOS AND MOUSE FUNCTIONS REFERENCE
SECTION 4 LOTUS/INTEL/MICROSOFT EMS FUNCTIONS REFERENCE
Index
────────────────────────────────────────────────────────────────────────────
Road Map to Figures and Tables
MS-DOS versions and release dates
MS-DOS memory map
Structure of program segment prefix (PSP)
Structure of .EXE load module
Register conditions at program entry
Segments, groups, and classes
Macro Assembler switches
C Compiler switches
Linker switches
MAKE switches
ANSI escape sequences
Video attributes
Structure of normal file control block (FCB)
Structure of extended file control block
MS-DOS error codes
Structure of boot sector
Structure of directory entry
Structure of fixed-disk master block
LIM EMS error codes
Intel 80x86 internal interrupts (faults)
Intel 80x86, MS-DOS, and ROM BIOS interrupts
Device-driver attribute word
Device-driver command codes
Structure of BIOS parameter block (BPB)
Media descriptor byte
────────────────────────────────────────────────────────────────────────────
Acknowledgments
My renewed thanks to the outstanding editors and production staff at
Microsoft Press, who make beautiful books happen, and to the talented
Microsoft developers, who create great programs to write books about.
Special thanks to Mike Halvorson, Jeff Hinsch, Mary Ann Jones, Claudette
Moore, Dori Shattuck, and Mark Zbikowski; if this book has anything unique
to offer, these people deserve most of the credit.
────────────────────────────────────────────────────────────────────────────
Introduction
Advanced MS-DOS Programming is written for the experienced C or
assembly-language programmer. It provides all the information you need to
write robust, high-performance applications under the MS-DOS operating
system. Because I believe that working, well-documented programs are
unbeatable learning tools, I have included detailed programming examples
throughout──including complete utility programs that you can adapt to your
own needs.
This book is both a tutorial and a reference and is divided into four
sections, so that you can find information more easily. Section 1
discusses MS-DOS capabilities and services by functional group in the
context of common programming issues, such as user input, control of the
display, memory management, and file handling. Special classes of
programs, such as interrupt handlers, device drivers, and filters, have
their own chapters.
Section 2 provides a complete reference guide to MS-DOS function calls,
organized so that you can see the calling sequence, results, and version
dependencies of each function at a glance. I have also included notes,
where relevant, about quirks and special uses of functions as well as
cross-references to related functions. An assembly-language example is
included for each entry in Section 2.
Sections 3 and 4 are references to IBM ROM BIOS, Microsoft Mouse driver,
and Lotus/Intel/Microsoft Expanded Memory Specification functions. The
entries in these two sections have the same form as in Section 2, except
that individual programming examples have been omitted.
The programs in this book were written with the marvelous Brief editor
from Solution Systems and assembled or compiled with Microsoft Macro
Assembler version 5.1 and Microsoft C Compiler version 5.1. They have been
tested under MS-DOS versions 2.1, 3.1, 3.3, and 4.0 on an 8088-based IBM
PC, an 80286-based IBM PC/AT, and an 80386-based IBM PS/2 Model 80. As far
as I am aware, they do not contain any software or hardware dependencies
that will prevent them from running properly on any IBM PC─compatible
machine running MS-DOS version 2.0 or later.
Changes from the First Edition
Readers who are familiar with the first edition will find many changes in
the second edition, but the general structure of the book remains the
same. Most of the material comparing MS-DOS to CP/M and UNIX/XENIX has
been removed; although these comparisons were helpful a few years ago,
MS-DOS has become its own universe and deserves to be considered on its
own terms.
The previously monolithic chapter on character devices has been broken
into three more manageable chapters focusing on the keyboard and mouse,
the display, and the serial port and printer. Hardware-dependent video
techniques have been de-emphasized; although this topic is more important
than ever, it has grown so complex that it requires a book of its own. A
new chapter discusses compatibility and portability of MS-DOS applications
and also contains a brief introduction to Microsoft OS/2, the new
multitasking, protected-mode operating system.
A road map to vital figures and tables has been added, following the Table
of Contents, to help you quickly locate the layouts of the program segment
prefix, file control block, and the like.
The reference sections at the back of the book have been extensively
updated and enlarged and are now complete through MS-DOS version 4.0, the
IBM PS/2 Model 80 ROM BIOS and the VGA video adapter, the Microsoft Mouse
driver version 6.0, and the Lotus/Intel/Microsoft Expanded Memory
Specification version 4.0.
In the two years since Advanced MS-DOS Programming was first published,
hundreds of readers have been kind enough to send me their comments, and I
have tried to incorporate many of their suggestions in this new edition.
As before, please feel free to contact me via MCI Mail (user name LMI),
CompuServe (user ID 72406,1577), or BIX (user name rduncan).
Ray Duncan
Los Angeles, California
September 1988
────────────────────────────────────────────────────────────────────────────
SECTION 1 PROGRAMMING FOR MS-DOS
────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────
Chapter 1 Genealogy of MS-DOS
In only seven years, MS-DOS has evolved from a simple program loader into
a sophisticated, stable operating system for personal computers that are
based on the Intel 8086 family of microprocessors (Figure 1-1). MS-DOS
supports networking, graphical user interfaces, and storage devices of
every description; it serves as the platform for thousands of application
programs; and it has over 10 million licensed users──dwarfing the combined
user bases of all of its competitors.
The progenitor of MS-DOS was an operating system called 86-DOS, which was
written by Tim Paterson for Seattle Computer Products in mid-1980. At that
time, Digital Research's CP/M-80 was the operating system most commonly
used on microcomputers based on the Intel 8080 and Zilog Z-80
microprocessors, and a wide range of application software (word
processors, database managers, and so forth) was available for use with
CP/M-80.
To ease the process of porting 8-bit CP/M-80 applications into the new
16-bit environment, 86-DOS was originally designed to mimic CP/M-80 in
both available functions and style of operation. Consequently, the
structures of 86-DOS's file control blocks, program segment prefixes, and
executable files were nearly identical to those of CP/M-80. Existing
CP/M-80 programs could be converted mechanically (by processing their
source-code files through a special translator program) and, after
conversion, would run under 86-DOS either immediately or with very little
hand editing.
Because 86-DOS was marketed as a proprietary operating system for Seattle
Computer Products' line of S-100 bus, 8086-based microcomputers, it made
very little impact on the microcomputer world in general. Other vendors of
8086-based microcomputers were understandably reluctant to adopt a
competitor's operating system and continued to wait impatiently for the
release of Digital Research's CP/M-86.
In October 1980, IBM approached the major microcomputer-software houses in
search of an operating system for the new line of personal computers it
was designing. Microsoft had no operating system of its own to offer
(other than a stand-alone version of Microsoft BASIC) but paid a fee to
Seattle Computer Products for the right to sell Paterson's 86-DOS. (At
that time, Seattle Computer Products received a license to use and sell
Microsoft's languages and all 8086 versions of Microsoft's operating
system.) In July 1981, Microsoft purchased all rights to 86-DOS, made
substantial alterations to it, and renamed it MS-DOS. When the first IBM
PC was released in the fall of 1981, IBM offered MS-DOS (referred to as
PC-DOS 1.0) as its primary operating system.
IBM also selected Digital Research's CP/M-86 and Softech's P-system as
alternative operating systems for the PC. However, they were both very
slow to appear at IBM PC dealers and suffered the additional disadvantages
of higher prices and lack of available programming languages. IBM threw
its considerable weight behind PC-DOS by releasing all the IBM-logo PC
application software and development tools to run under it. Consequently,
most third-party software developers targeted their products for PC-DOS
from the start, and CP/M-86 and P-system never became significant factors
in the IBM PC─compatible market.
In spite of some superficial similarities to its ancestor CP/M-80, MS-DOS
version 1.0 contained a number of improvements over CP/M-80, including the
following:
■ An improved disk-directory structure that included information about a
file's attributes (such as whether it was a system or a hidden file),
its exact size in bytes, and the date that the file was created or last
modified
■ A superior disk-space allocation and management method, allowing
extremely fast sequential or random record access and program loading
■ An expanded set of operating-system services, including
hardware-independent function calls to set or read the date and time, a
filename parser, multiple-block record I/O, and variable record sizes
■ An AUTOEXEC.BAT batch file to perform a user-defined series of commands
when the system was started or reset
IBM was the only major computer manufacturer (sometimes referred to as an
OEM, for original equipment manufacturer) to ship MS-DOS version 1.0 (as
PC-DOS 1.0) with its products. MS-DOS version 1.25 (equivalent to IBM
PC-DOS 1.1) was released in June 1982 to fix a number of bugs and also to
support double-sided disks and improved hardware independence in the DOS
kernel. This version was shipped by several vendors besides IBM, including
Texas Instruments, COMPAQ, and Columbia, who all entered the personal
computer market early. Due to rapid decreases in the prices of RAM and
fixed disks, MS-DOS version 1 is no longer in common use.
MS-DOS version 2.0 (equivalent to PC-DOS 2.0) was first released in March
1983. It was, in retrospect, a new operating system (though great care was
taken to maintain compatibility with MS-DOS version 1). It contained many
significant innovations and enhanced features, including the following:
■ Support for both larger-capacity floppy disks and hard disks
■ Many UNIX/XENIX-like features, including a hierarchical file structure,
file handles, I/O redirection, pipes, and filters
■ Background printing (print spooling)
■ Volume labels, plus additional file attributes
■ Installable device drivers
■ A user-customizable system-configuration file that controlled the
loading of additional device drivers, the number of system disk
buffers, and so forth
■ Maintenance of environment blocks that could be used to pass
information between programs
■ An optional ANSI display driver that allowed programs to position the
cursor and control display characteristics in a hardware-independent
manner
■ Support for the dynamic allocation, modification, and release of memory
by application programs
■ Support for customized user command interpreters (shells)
■ System tables to assist application software in modifying its currency,
time, and date formats (known as international support)
MS-DOS version 2.11 was subsequently released to improve international
support (table-driven currency symbols, date formats, decimal-point
symbols, currency separators, and so forth), to add support for 16-bit
Kanji characters throughout, and to fix a few minor bugs. Version 2.11
rapidly became the base version shipped for 8086/8088-based personal
computers by every major OEM, including Hewlett-Packard, Wang, Digital
Equipment Corporation, Texas Instruments, COMPAQ, and Tandy.
MS-DOS version 2.25, released in October 1985, was distributed in the Far
East but was never shipped by OEMs in the United States and Europe. In
this version, the international support for Japanese and Korean character
sets was extended even further, additional bugs were repaired, and many of
the system utilities were made compatible with MS-DOS version 3.0.
MS-DOS version 3.0 was introduced by IBM in August 1984 with the release
of the 80286-based PC/AT machines. It represented another major rewrite of
the entire operating system and included the following important new
features:
■ Direct control of the print spooler by application software
■ Further expansion of international support for currency formats
■ Extended error reporting, including a code that suggests a recovery
strategy to the application program
■ Support for file and record locking and sharing
■ Support for larger fixed disks
MS-DOS version 3.1, which was released in November 1984, added support for
the sharing of files and printers across a network. Beginning with version
3.1, a new operating-system module called the redirector intercepts an
application program's requests for I/O and filters out the requests that
are directed to network devices, passing these requests to another machine
for processing.
Since version 3.1, the changes to MS-DOS have been evolutionary rather
than revolutionary. Version 3.2, which appeared in 1986, generalized the
definition of device drivers so that new media types (such as 3.5-inch
floppy disks) could be supported more easily. Version 3.3 was released in
1987, concurrently with the new IBM line of PS/2 personal computers, and
drastically expanded MS-DOS's multilanguage support for keyboard mappings,
printer character sets, and display fonts. Version 4.0, delivered in 1988,
was enhanced with a visual shell as well as support for very large file
systems.
While MS-DOS has been evolving, Microsoft has also put intense efforts
into the areas of user interfaces and multitasking operating systems.
Microsoft Windows, first shipped in 1985, provides a multitasking,
graphical user "desktop" for MS-DOS systems. Windows has won widespread
support among developers of complex graphics applications such as desktop
publishing and computer-aided design because it allows their programs to
take full advantage of whatever output devices are available without
introducing any hardware dependence.
Microsoft Operating System/2 (MS OS/2), released in 1987, represents a new
standard for application developers: a protected-mode, multitasking,
virtual-memory system specifically designed for applications requiring
high-performance graphics, networking, and interprocess communications.
Although MS OS/2 is a new product and is not a derivative of MS-DOS, its
user interface and file system are compatible with MS-DOS and Microsoft
Windows, and it offers the ability to run one real-mode (MS-DOS)
application alongside MS OS/2 protected-mode applications. This
compatibility allows users to move between the MS-DOS and OS/2
environments with a minimum of difficulty.
┌─────────────┐
│ MS-DOS 1.0  │ 1981: First operating system on IBM PC
│ PC-DOS 1.0  │
└──────┬──────┘
       │
┌──────▼──────┐
│ MS-DOS 1.25 │ Double-sided disk support and bug fixes added:
│ PC-DOS 1.1  │ widely distributed by OEMs other than IBM
└──────┬──────┘
       │
┌──────▼──────┐ 1983: Introduced with IBM PC/XT;
│ MS-DOS 2.0  │ support for UNIX/XENIX-like hierarchical
│ PC-DOS 2.0  │ file structure and hard disks added
└──────┬──────┘
       ├──────────────────────────────────────┐
┌──────▼──────┐                        ┌──────▼──────┐
│ MS-DOS 2.01 │ 2.0 with international │ PC-DOS 2.1  │ Introduced with PCjr
└──────┬──────┘ support                └─────────────┘ 2.0 with bug fixes
       │
┌──────▼──────┐
│ MS-DOS 2.11 │ 2.01 with bug fixes
└──────┬──────┘
       ├──────────────────────────────────────┐
┌──────▼──────┐ 1984: Introduced with  ┌──────▼──────┐ 1985: Far East OEMs;
│ MS-DOS 3.0  │ PC/AT; support for     │ MS-DOS 2.25 │ support for extended
│ PC-DOS 3.0  │ 1.2 MB floppy disk,    └─────────────┘ character sets
└──────┬──────┘ larger hard disk added
       │
┌──────▼──────┐
│ MS-DOS 3.1  │ Support for Microsoft  ┌─────────────┐ 1985: Graphical
│ PC-DOS 3.1  │ Networks added         │   Windows   │ user interface
└──────┬──────┘                        │     1.0     │ for MS-DOS
       │                               └──────┬──────┘
┌──────▼──────┐                               │
│ MS-DOS 3.2  │ 1986: Support for 3.5-        │
│ PC-DOS 3.2  │ inch disks added              │
└──────┬──────┘                               │
       │                               ┌──────▼──────┐ 1987: Compatibility
┌──────▼──────┐ 1987: Introduced with  │   Windows   │ with OS/2
│ MS-DOS 3.3  │ IBM PS/2; generalized  │     2.0     │ Presentation Manager
│ PC-DOS 3.3  │ code-page (font)       └─────────────┘
└──────┬──────┘ support
       │
┌──────▼──────┐ 1988: Support for
│ MS-DOS 4.0  │ logical volumes larger
│ PC-DOS 4.0  │ than 32 MB; visual shell
└─────────────┘
Figure 1-1. The evolution of MS-DOS.
What does the future hold for MS-DOS? Only the long-range planning teams
at Microsoft and IBM know for sure. But it seems safe to assume that
MS-DOS, with its relatively small memory requirements, adaptability to
diverse hardware configurations, and enormous base of users, will remain
important to programmers and software publishers for years to come.
────────────────────────────────────────────────────────────────────────────
Chapter 2 MS-DOS in Operation
It is unlikely that you will ever be called upon to configure the MS-DOS
software for a new model of computer. Still, an acquaintance with the
general structure of MS-DOS can often be very helpful in understanding the
behavior of the system as a whole. In this chapter, we will discuss how
MS-DOS is organized and how it is loaded into memory when the computer is
turned on.
The Structure of MS-DOS
MS-DOS is partitioned into several layers that serve to isolate the kernel
logic of the operating system, and the user's perception of the system,
from the hardware it is running on. These layers are
■ The BIOS (Basic Input/Output System)
■ The DOS kernel
■ The command processor (shell)
We'll discuss the functions of each of these layers separately.
The BIOS Module
The BIOS is specific to the individual computer system and is provided by
the manufacturer of the system. It contains the default resident
hardware-dependent drivers for the following devices:
■ Console display and keyboard (CON)
■ Line printer (PRN)
■ Auxiliary device (AUX)
■ Date and time (CLOCK$)
■ Boot disk device (block device)
The MS-DOS kernel communicates with these device drivers through I/O
request packets; the drivers then translate these requests into the proper
commands for the various hardware controllers. In many MS-DOS systems,
including the IBM PC, the most primitive parts of the hardware drivers are
located in read-only memory (ROM) so that they can be used by stand-alone
applications, diagnostics, and the system startup program.
The terms resident and installable are used to distinguish between the
drivers built into the BIOS and the drivers installed during system
initialization by DEVICE commands in the CONFIG.SYS file. (Installable
drivers will be discussed in more detail later in this chapter and in
Chapter 14.)
The BIOS is read into random-access memory (RAM) during system
initialization as part of a file named IO.SYS. (In PC-DOS, the file is
called IBMBIO.COM.) This file is marked with the special attributes hidden
and system.
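The hidden and system attributes are individual bits in the attribute byte
of a file's directory entry. The following is a minimal C sketch of testing
for them; the bit values are those of the standard MS-DOS directory entry,
but the macro and function names are invented here for illustration.

```c
/* Attribute bits in an MS-DOS directory entry (names are illustrative). */
#define ATTR_READONLY 0x01
#define ATTR_HIDDEN   0x02
#define ATTR_SYSTEM   0x04
#define ATTR_VOLUME   0x08
#define ATTR_SUBDIR   0x10
#define ATTR_ARCHIVE  0x20

/* Return nonzero if both the hidden and the system bits are set. */
int is_hidden_system(unsigned char attr)
{
    unsigned char mask = ATTR_HIDDEN | ATTR_SYSTEM;
    return (attr & mask) == mask;
}
```

A file marked only as hidden (02H) fails this test, whereas an attribute
byte of 06H or 07H passes it.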
The DOS Kernel
The DOS kernel implements MS-DOS as it is seen by application programs.
The kernel is a proprietary program supplied by Microsoft Corporation and
provides a collection of hardware-independent services called system
functions. These functions include the following:
■ File and record management
■ Memory management
■ Character-device input/output
■ Spawning of other programs
■ Access to the real-time clock
Programs can access system functions by loading registers with
function-specific parameters and then transferring to the operating system
by means of a software interrupt.
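The calling convention can be modeled in portable C without touching real
hardware. In the sketch below, the structure regs and the routine
mock_int21 are hypothetical stand-ins for the CPU registers and for the
kernel's Int 21H entry point; only the convention itself (function number
in AH, results returned in registers) reflects MS-DOS. Function 30H, Get
MS-DOS Version Number, serves as the example and is hard-wired here to
report version 3.30.

```c
/* Simplified register set; a real 8086 has more registers than this. */
struct regs {
    unsigned char ah, al;   /* high and low halves of AX */
    unsigned char bh, bl;   /* high and low halves of BX */
};

/* Hypothetical stand-in for the kernel's Int 21H entry point.
   It dispatches on the function number the caller placed in AH. */
void mock_int21(struct regs *r)
{
    switch (r->ah) {
    case 0x30:              /* Get MS-DOS version number */
        r->al = 3;          /* major version (this mock pretends 3.30) */
        r->ah = 30;         /* minor version */
        break;
    default:
        r->al = 0;          /* this sketch answers unknown functions with 0 */
        break;
    }
}

/* Caller side: load registers, "issue the interrupt", read the results. */
unsigned get_dos_version(void)
{
    struct regs r = {0, 0, 0, 0};
    r.ah = 0x30;
    mock_int21(&r);
    return (unsigned)r.al * 100u + r.ah;   /* major * 100 + minor */
}
```

A real program would replace mock_int21 with an actual software interrupt,
issued either directly in assembly language or through a compiler-supplied
library routine.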
The DOS kernel is read into memory during system initialization from the
MSDOS.SYS file on the boot disk. (The file is called IBMDOS.COM in
PC-DOS.) This file is marked with the attributes hidden and system.
The Command Processor
The command processor, or shell, is the user's interface to the operating
system. It is responsible for parsing and carrying out user commands,
including the loading and execution of other programs from a disk or other
mass-storage device.
The default shell that is provided with MS-DOS is found in a file called
COMMAND.COM. Although COMMAND.COM prompts and responses constitute the
ordinary user's complete perception of MS-DOS, it is important to realize
that COMMAND.COM is not the operating system, but simply a special class
of program running under the control of MS-DOS.
COMMAND.COM can be replaced with a shell of the programmer's own design by
simply adding a SHELL directive to the system-configuration file
(CONFIG.SYS) on the system startup disk. The product COMMAND-PLUS from ESP
Systems is an example of such an alternative shell.
More about COMMAND.COM
The default MS-DOS shell, COMMAND.COM, is divided into three parts:
■ A resident portion
■ An initialization section
■ A transient module
The resident portion is loaded in lower memory, above the DOS kernel and
its buffers and tables. It contains the routines to process Ctrl-C and
Ctrl-Break, critical errors, and the termination (final exit) of other
transient programs. This part of COMMAND.COM issues error messages and is
responsible for the familiar prompt
Abort, Retry, Ignore?
The resident portion also contains the code required to reload the
transient portion of COMMAND.COM when necessary.
The initialization section of COMMAND.COM is loaded above the resident
portion when the system is started. It processes the AUTOEXEC.BAT batch
file (the user's list of commands to execute at system startup), if one is
present, and is then discarded.
The transient portion of COMMAND.COM is loaded at the high end of memory,
and its memory can also be used for other purposes by application
programs. The transient module issues the user prompt, reads the commands
from the keyboard or batch file, and causes them to be executed. When an
application program terminates, the resident portion of COMMAND.COM does a
checksum of the transient module to determine whether it has been
destroyed and fetches a fresh copy from the disk if necessary.
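The idea behind the checksum test can be sketched in a few lines of C. The
text does not specify which algorithm COMMAND.COM uses, so the simple
additive 16-bit sum below is purely an assumption, and both function names
are invented for illustration.

```c
#include <stddef.h>

/* Sum every byte of the region into a 16-bit value (an assumed algorithm). */
unsigned short region_checksum(const unsigned char *p, size_t len)
{
    unsigned short sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (unsigned short)(sum + p[i]);
    return sum;
}

/* Nonzero means the region no longer matches the checksum recorded when
   it was loaded, so a fresh copy should be read from disk. */
int needs_reload(const unsigned char *p, size_t len, unsigned short saved)
{
    return region_checksum(p, len) != saved;
}
```

An additive sum is cheap but cannot detect every change (swapping two bytes
leaves it unaltered, for example); it is adequate only as a quick check for
gross overwriting.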
The user commands that are accepted by COMMAND.COM fall into three
categories:
■ Internal commands
■ External commands
■ Batch files
Internal commands, sometimes called intrinsic commands, are those carried
out by code embedded in COMMAND.COM itself. Commands in this category
include COPY, REN(AME), DIR(ECTORY), and DEL(ETE). The routines for the
internal commands are included in the transient part of COMMAND.COM.
External commands, sometimes called extrinsic commands or transient
programs, are the names of programs stored in disk files. Before these
programs can be executed, they must be loaded from the disk into the
transient program area (TPA) of memory. (See "How MS-DOS Is Loaded" in
this chapter.) Familiar examples of external commands are CHKDSK, BACKUP,
and RESTORE. As soon as an external command has completed its work, it is
discarded from memory; hence, it must be reloaded from disk each time it
is invoked.
Batch files are text files that contain lists of other intrinsic,
extrinsic, or batch commands. These files are processed by a special
interpreter that is built into the transient portion of COMMAND.COM. The
interpreter reads the batch file one line at a time and carries out each
of the specified operations in order.
In order to interpret a user's command, COMMAND.COM first looks to see if
the user typed the name of a built-in (intrinsic) command that it can
carry out directly. If not, it searches for an external command
(executable program file) or batch file by the same name. The search is
carried out first in the current directory of the current disk drive and
then in each of the directories specified in the most recent PATH command.
In each directory inspected, COMMAND.COM first tries to find a file with
the extension .COM, then .EXE, and finally .BAT. If the search fails for
all three file types in all of the possible locations, COMMAND.COM
displays the familiar message
Bad command or file name
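The search order just described (directories in the outer loop, the three
extensions in the inner loop) can be sketched in portable C. This is an
illustration only, not COMMAND.COM's actual code: the check for internal
commands is omitted, the callback type exists_fn and both function names
are hypothetical, and demo_exists consults a fixed list rather than a real
disk. Each directory string is assumed to end with a backslash, with the
empty string standing for the current directory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns nonzero if the named file exists. */
typedef int (*exists_fn)(const char *path);

/* Search dirs[0..ndirs-1] in order (current directory first, then the
   PATH directories); within each, try .COM, then .EXE, then .BAT.
   On success, leave the full name in out and return 1; else return 0. */
int find_command(const char *name, const char *const dirs[], int ndirs,
                 exists_fn exists, char *out, size_t outsize)
{
    static const char *const exts[] = { ".COM", ".EXE", ".BAT" };
    int d, e;

    for (d = 0; d < ndirs; d++) {
        for (e = 0; e < 3; e++) {
            int n = snprintf(out, outsize, "%s%s%s", dirs[d], name, exts[e]);
            if (n < 0 || (size_t)n >= outsize)
                continue;           /* name does not fit in the buffer */
            if (exists(out))
                return 1;
        }
    }
    if (outsize > 0)
        out[0] = '\0';
    return 0;                       /* caller reports the error message */
}

/* Demo predicate backed by a fixed list instead of a real disk. */
int demo_exists(const char *path)
{
    static const char *const files[] = {
        "EDIT.BAT", "C:\\DOS\\EDIT.EXE", "C:\\DOS\\FORMAT.COM"
    };
    size_t i;

    for (i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(path, files[i]) == 0)
            return 1;
    return 0;
}
```

With the demo file list, a search for EDIT finds EDIT.BAT in the current
directory before it ever looks at C:\DOS\EDIT.EXE, because every extension
is tried in one directory before the next directory is inspected.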
If a .COM file or a .EXE file is found, COMMAND.COM uses the MS-DOS EXEC
function to load and execute it. The EXEC function builds a special data
structure called a program segment prefix (PSP) above the resident portion
of COMMAND.COM in the transient program area. The PSP contains various
linkages and pointers needed by the application program. Next, the EXEC
function loads the program itself, just above the PSP, and performs any
relocation that may be necessary. Finally, it sets up the registers
appropriately and transfers control to the entry point for the program.
(Both the PSP and the EXEC function will be discussed in more detail in
Chapters 3 and 12.) When the transient program has finished its job, it
calls a special MS-DOS termination function that releases the transient
program's memory and returns control to the program that caused the
transient program to be loaded (COMMAND.COM, in this case).
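Because the PSP is 256 (100H) bytes long, "just above the PSP" means 10H
paragraphs past the PSP's segment address. A small sketch of that
arithmetic follows; the macro and function names are invented for
illustration.

```c
#define PSP_BYTES      0x100u   /* size of the program segment prefix */
#define PARAGRAPH_SIZE 16u      /* bytes per 8086 paragraph */

/* Segment of the first paragraph that follows a PSP located at psp_seg. */
unsigned segment_above_psp(unsigned psp_seg)
{
    return (psp_seg + PSP_BYTES / PARAGRAPH_SIZE) & 0xFFFFu;
}
```

For a PSP at segment 1000H, the program image therefore begins at segment
1010H. A .COM program reaches the same physical location as offset 100H
within the PSP's own segment.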
A transient program has nearly complete control of the system's resources
while it is executing. The only other tasks that are accomplished are
those performed by interrupt handlers (such as the keyboard input driver
and the real-time clock) and operations that the transient program
requests from the operating system. MS-DOS does not support sharing of the
central processor among several tasks executing concurrently, nor can it
wrest control away from a program when it crashes or executes for too
long. Such capabilities are the province of MS OS/2, which is a
protected-mode system with preemptive multitasking (time-slicing).
How MS-DOS Is Loaded
When the system is started or reset, program execution begins at address
0FFFF0H. This is a feature of the 8086/8088 family of microprocessors and
has nothing to do with MS-DOS. Systems based on these processors are
designed so that address 0FFFF0H lies within an area of ROM and contains a
jump machine instruction to transfer control to system test code and the
ROM bootstrap routine (Figure 2-1).
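The startup address follows from the way the 8086 forms physical
addresses: after a reset the processor loads CS with 0FFFFH and IP with
0000H, and a segment:offset pair maps to (segment x 16) + offset. A
minimal C sketch of that calculation is shown here; the function name is
illustrative.

```c
/* Convert an 8086 segment:offset pair to a 20-bit physical address.
   The result wraps at 1 MB, as it does on the 8086 and 8088. */
unsigned long phys_addr(unsigned seg, unsigned off)
{
    unsigned long base = (unsigned long)(seg & 0xFFFFu) << 4;
    return (base + (off & 0xFFFFu)) & 0xFFFFFUL;
}
```

Calling phys_addr(0xFFFF, 0x0000) yields 0FFFF0H, only 16 bytes below the
top of the 1 MB address space, which is why the reset location holds
nothing more than a jump instruction.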
The ROM bootstrap routine reads the disk bootstrap routine from the first
sector of the system startup disk (the boot sector) into memory at some
arbitrary address and then transfers control to it (Figure 2-2). (The
boot sector also contains a table of information about the disk format.)
The disk bootstrap routine checks to see if the disk contains a copy of
MS-DOS. It does this by reading the first sector of the root directory and
determining whether the first two files are IO.SYS and MSDOS.SYS (or
IBMBIO.COM and IBMDOS.COM), in that order. If these files are not present,
the user is prompted to change disks and strike any key to try again.
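The directory check can be sketched in C as follows. The sketch assumes the
standard directory layout (32-byte entries whose first 11 bytes hold the
blank-padded name and extension) and a caller that has already read the
first root-directory sector into a buffer; the function and macro names
are invented, and a real disk bootstrap is of course written in assembly
language.

```c
#include <string.h>

#define DIR_ENTRY_SIZE 32   /* bytes per directory entry */
#define DIR_NAME_SIZE  11   /* 8-character name + 3-character extension */

/* Return nonzero if the first two entries in a root-directory sector
   are the two system files, in order, under either naming scheme. */
int has_system_files(const unsigned char *sector)
{
    const unsigned char *first  = sector;
    const unsigned char *second = sector + DIR_ENTRY_SIZE;

    if (memcmp(first,  "IO      SYS", DIR_NAME_SIZE) == 0 &&
        memcmp(second, "MSDOS   SYS", DIR_NAME_SIZE) == 0)
        return 1;                       /* MS-DOS names */
    if (memcmp(first,  "IBMBIO  COM", DIR_NAME_SIZE) == 0 &&
        memcmp(second, "IBMDOS  COM", DIR_NAME_SIZE) == 0)
        return 1;                       /* PC-DOS names */
    return 0;
}
```

Note that the order matters: a disk whose first entry is MSDOS.SYS and
whose second is IO.SYS fails the test.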
       ┌───────────────────────────────────────────────┐
       │             ROM bootstrap routine             │
       ├───────────────────────────────────────────────┤
       │                                               │
       ├───────────────────────────────────────────────┤ ◄ Top of RAM
       │                                               │
       │                                               │
       └──────────────────────┐                        │
       ┌────────────────────┐ └────────────────────────┘
       │                    └──────────────────────────┐
       │                                               │
       │                                               │
       │                                               │
00400H ├───────────────────────────────────────────────┤
       │               Interrupt vectors               │
00000H └───────────────────────────────────────────────┘
Figure 2-1. A typical 8086/8088-based computer system immediately after
system startup or reset. Execution begins at location 0FFFF0H, which
contains a jump instruction that directs program control to the ROM
bootstrap routine.
       ┌───────────────────────────────────────────────┐
       │             ROM bootstrap routine             │
       ├───────────────────────────────────────────────┤
       │                                               │
       ├───────────────────────────────────────────────┤ ◄ Top of RAM
       │                                               │
       ├───────────────────────────────────────────────┤
       │            Disk bootstrap routine             │
       ├───────────────────────────────────────────────┤ ◄ Arbitrary
       │                                               │   load location
       │                                               │
       └──────────────────────┐                        │
       ┌────────────────────┐ └────────────────────────┘
       │                    └──────────────────────────┐
       │                                               │
       │                                               │
00400H ├───────────────────────────────────────────────┤
       │               Interrupt vectors               │
00000H └───────────────────────────────────────────────┘
Figure 2-2. The ROM bootstrap routine loads the disk bootstrap routine
into memory from the first sector of the system startup disk and then
transfers control to it.
If the two system files are found, the disk bootstrap reads them into
memory and transfers control to the initial entry point of IO.SYS (Figure
2-3). (In some implementations, the disk bootstrap reads only IO.SYS into
memory, and IO.SYS in turn loads the MSDOS.SYS file.)
The IO.SYS file that is loaded from the disk actually consists of two
separate modules. The first is the BIOS, which contains the linked set of
resident device drivers for the console, auxiliary port, printer, block,
and clock devices, plus some hardware-specific initialization code that is
run only at system startup. The second module, SYSINIT, is supplied by
Microsoft and linked into the IO.SYS file, along with the BIOS, by the
computer manufacturer.
SYSINIT is called by the manufacturer's BIOS initialization code. It
determines the amount of contiguous memory present in the system and then
relocates itself to high memory. Then it moves the DOS kernel, MSDOS.SYS,
from its original load location to its final memory location, overlaying
the original SYSINIT code and any other expendable initialization code
that was contained in the IO.SYS file (Figure 2-4).
Next, SYSINIT calls the initialization code in MSDOS.SYS. The DOS kernel
initializes its internal tables and work areas, sets up the interrupt
vectors 20H through 2FH, and traces through the linked list of resident
device drivers, calling the initialization function for each. (See Chapter
14.)
┌───────────────────────────────────────────────┐
│ ROM bootstrap routine │
├───────────────────────────────────────────────┤
│ │
├───────────────────────────────────────────────┤ ◄ Top of RAM
│ │
├───────────────────────────────────────────────┤
│ Disk bootstrap routine │
├───────────────────────────────────────────────┤
│ │
└──────────────────────┐ │
┌────────────────────┐ └────────────────────────┘
│ └──────────────────────────┐
│ │
├───────────────────────────────────────────────┤
│ DOS kernel (from MSDOS.SYS) │
├───────────────────────────────────────────────┤ ◄ In temporary
│ SYSINIT (from IO.SYS) │ location
├───────────────────────────────────────────────┤
│ BIOS (from IO.SYS) │
├───────────────────────────────────────────────┤
│ │
00400H ├───────────────────────────────────────────────┤
│ Interrupt vectors │
00000H └───────────────────────────────────────────────┘
Figure 2-3. The disk bootstrap reads the file IO.SYS into memory. This
file contains the MS-DOS BIOS (resident device drivers) and the SYSINIT
module. Either the disk bootstrap or the BIOS (depending upon the
manufacturer's implementation) then reads the DOS kernel into memory from
the MSDOS.SYS file.
These driver functions determine the equipment status, perform any
necessary hardware initialization, and set up the vectors for any external
hardware interrupts the drivers will service.
As part of the initialization sequence, the DOS kernel examines the
disk-parameter blocks returned by the resident block-device drivers,
determines the largest sector size that will be used in the system, builds
some drive-parameter blocks, and allocates a disk sector buffer. Control
then returns to SYSINIT.
When the DOS kernel has been initialized and all resident device drivers
are available, SYSINIT can call on the normal MS-DOS file services to open
the CONFIG.SYS file. This optional file can contain a variety of commands
that enable the user to customize the MS-DOS environment. For instance,
the user can specify additional hardware device drivers, the number of
disk buffers, the maximum number of files that can be open at one time,
and the filename of the command processor (shell).
If it is found, the entire CONFIG.SYS file is loaded into memory for
processing. All lowercase characters are converted to uppercase, and the
file is interpreted one line at a time to process the commands. Memory is
allocated for the disk buffer cache and the internal file control blocks
used by the handle file and record system functions. (See Chapter 8.) Any
device drivers indicated in the CONFIG.SYS file are sequentially loaded
into memory, initialized by calls to their init modules, and linked into
the device-driver list. The init function of each driver tells SYSINIT how
much memory to reserve for that driver.
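The line-at-a-time handling described above can be pictured with a small C sketch. It is only an illustration of the idea (fold to uppercase, then split a command from its argument at the equal sign); the function name config_split is hypothetical, and SYSINIT's real parser is not documented here.

```c
#include <ctype.h>
#include <string.h>

/* Uppercase one CONFIG.SYS line in place and split it at the first '='.
   Returns 1 and sets *key and *value on success, 0 if the line has no '='.
   Hypothetical helper, not SYSINIT's actual code. */
int config_split(char *line, char **key, char **value)
{
    char *p, *eq;

    line[strcspn(line, "\r\n")] = '\0';          /* drop the line ending */
    for (p = line; *p; p++)
        *p = (char)toupper((unsigned char)*p);   /* lowercase -> uppercase */

    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';
    *key = line;
    *value = eq + 1;
    return 1;
}
```

Given the line device=c:\dos\ansi.sys, the sketch yields the key DEVICE and the value C:\DOS\ANSI.SYS.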
┌───────────────────────────────────────────────┐
│ ROM bootstrap routine │
├───────────────────────────────────────────────┤
│ │
├───────────────────────────────────────────────┤ ◄ Top of RAM
│ SYSINIT module │
├───────────────────────────────────────────────┤
│ │
└──────────────────────┐ │
┌────────────────────┐ └────────────────────────┘
│ └──────────────────────────┐
│ │
├───────────────────────────────────────────────┤
│ Installable drivers │
├───────────────────────────────────────────────┤
│ File control blocks │
├───────────────────────────────────────────────┤
│ Disk buffer cache │
├───────────────────────────────────────────────┤
│ DOS kernel │
├───────────────────────────────────────────────┤ ◄ In final
│ BIOS │ location
├───────────────────────────────────────────────┤
│ │
├───────────────────────────────────────────────┤
00400H ├───────────────────────────────────────────────┤
│ Interrupt vectors │
00000H └───────────────────────────────────────────────┘
Figure 2-4. SYSINIT moves itself to high memory and relocates the DOS
kernel, MSDOS.SYS, downward to its final address. The MS-DOS disk buffer
cache and file control block areas are allocated, and then the installable
device drivers specified in the CONFIG.SYS file are loaded and linked into
the system.
After all installable device drivers have been loaded, SYSINIT closes all
file handles and reopens the console (CON), printer (PRN), and auxiliary
(AUX) devices as the standard input, standard output, standard error,
standard list, and standard auxiliary devices. This allows a
user-installed character-device driver to override the BIOS's resident
drivers for the standard devices.
Finally, SYSINIT calls the MS-DOS EXEC function to load the command
interpreter, or shell. (The default shell is COMMAND.COM, but another
shell can be substituted by means of the CONFIG.SYS file.) Once the shell
is loaded, it displays a prompt and waits for the user to enter a command.
MS-DOS is now ready for business, and the SYSINIT module is discarded
(Figure 2-5).
┌───────────────────────────────────────────────┐
│ ROM bootstrap routine │
├───────────────────────────────────────────────┤
│ │
├───────────────────────────────────────────────┤ ◄ Top of RAM
│ Transient part of COMMAND.COM │
├───────────────────────────────────────────────┤
└──────────────────────┐ │
┌────────────────────┐ └────────────────────────┘
│ └──────────────────────────┐
│ Transient program area │
├───────────────────────────────────────────────┤
│ Resident part of COMMAND.COM │
├───────────────────────────────────────────────┤
│ Installable drivers │
├───────────────────────────────────────────────┤
│ File control blocks │
├───────────────────────────────────────────────┤
│ Disk buffer cache │
├───────────────────────────────────────────────┤
│ DOS kernel │
├───────────────────────────────────────────────┤
│ BIOS │
├───────────────────────────────────────────────┤
│ │
00400H ├───────────────────────────────────────────────┤
│ Interrupt vectors │
00000H └───────────────────────────────────────────────┘
Figure 2-5. The final result of the MS-DOS startup process for a typical
system. The resident portion of COMMAND.COM lies in low memory, above the
DOS kernel. The transient portion containing the batch-file interpreter
and intrinsic commands is placed in high memory, where it can be overlaid
by extrinsic commands and application programs running in the transient
program area.
────────────────────────────────────────────────────────────────────────────
Chapter 3 Structure of MS-DOS Application Programs
Programs that run under MS-DOS come in two basic flavors: .COM programs,
which have a maximum size of approximately 64 KB, and .EXE programs, which
can be as large as available memory. In Intel 8086 parlance, .COM programs
fit the tiny model, in which all segment registers contain the same value;
that is, the code and data are mixed together. In contrast, .EXE programs
fit the small, medium, or large model, in which the segment registers
contain different values; that is, the code, data, and stack reside in
separate segments. .EXE programs can have multiple code and data segments,
which are respectively addressed by long calls and by manipulation of the
data segment (DS) register.
A .COM-type program resides on the disk as an absolute memory image, in a
file with the extension .COM. The file does not have a header or any other
internal identifying information. A .EXE program, on the other hand,
resides on the disk in a special type of file with a unique header, a
relocation map, a checksum, and other information that is (or can be) used
by MS-DOS.
Both .COM and .EXE programs are brought into memory for execution by the
same mechanism: the EXEC function, which constitutes the MS-DOS loader.
EXEC can be called with the filename of a program to be loaded by
COMMAND.COM (the normal MS-DOS command interpreter), by other shells or
user interfaces, or by another program that was previously loaded by EXEC.
If there is sufficient free memory in the transient program area, EXEC
allocates a block of memory to hold the new program, builds the program
segment prefix (PSP) at its base, and then reads the program into memory
immediately above the PSP. Finally, EXEC sets up the segment registers and
the stack and transfers control to the program.
When it is invoked, EXEC can be given the addresses of additional
information, such as a command tail, file control blocks, and an
environment block; if supplied, this information will be passed on to the
new program. (The exact procedure for using the EXEC function in your own
programs is discussed, with examples, in Chapter 12.)
.COM and .EXE programs are often referred to as transient programs. A
transient program "owns" the memory block it has been allocated and has
nearly total control of the system's resources while it is executing. When
the program terminates, either because it is aborted by the operating
system or because it has completed its work and systematically performed a
final exit back to MS-DOS, the memory block is then freed (hence the term
transient) and can be used by the next program in line to be loaded.
The Program Segment Prefix
A thorough understanding of the program segment prefix is vital to
successful programming under MS-DOS. It is a reserved area, 256 bytes
long, that is set up by MS-DOS at the base of the memory block allocated
to a transient program. The PSP contains some linkages to MS-DOS that can
be used by the transient program, some information MS-DOS saves for its
own purposes, and some information MS-DOS passes to the transient
program──to be used or not, as the program requires (Figure 3-1).
Offset
0000H ┌────────────────────────────────────────────────────────┐
│ Int 20H │
0002H ├────────────────────────────────────────────────────────┤
│ Segment, end of allocation block │
0004H ├────────────────────────────────────────────────────────┤
│ Reserved │
0005H ├────────────────────────────────────────────────────────┤
│ Long call to MS-DOS function dispatcher │
000AH ├────────────────────────────────────────────────────────┤
│ Previous contents of termination handler │
│ interrupt vector (Int 22H) │
000EH ├────────────────────────────────────────────────────────┤
│ Previous contents of Ctrl-C interrupt vector (Int 23H) │
0012H ├────────────────────────────────────────────────────────┤
│ Previous contents of critical-error handler │
│ interrupt vector (Int 24H) │
0016H ├────────────────────────────────────────────────────────┤
│ Reserved │
002CH ├────────────────────────────────────────────────────────┤
│ Segment address of environment block │
002EH ├────────────────────────────────────────────────────────┤
│ Reserved │
005CH ├────────────────────────────────────────────────────────┤
│ Default file control block #1 │
006CH ├────────────────────────────────────────────────────────┤
│ Default file control block #2 │
│ (overlaid if FCB #1 opened) │
0080H ├────────────────────────────────────────────────────────┤
└──────────────────────────┐ │
┌────────────────────────┐ └─────────────────────────────┘
│ └───────────────────────────────┐
│ Command tail and default disk transfer area (buffer) │
00FFH └────────────────────────────────────────────────────────┘
Figure 3-1. The structure of the program segment prefix.
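For readers who prefer C, the offsets in Figure 3-1 can be written down as constants and read from a 256-byte copy of the PSP. This is a minimal sketch: the constant names and both helper functions are invented for illustration and do not come from any MS-DOS header file.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets from Figure 3-1; the names are hypothetical. */
enum {
    PSP_INT20      = 0x00,   /* Int 20H instruction                  */
    PSP_ALLOC_END  = 0x02,   /* segment, end of allocation block     */
    PSP_DOS_CALL   = 0x05,   /* long call to function dispatcher     */
    PSP_PREV_INT22 = 0x0A,   /* saved termination-handler vector     */
    PSP_PREV_INT23 = 0x0E,   /* saved Ctrl-C vector                  */
    PSP_PREV_INT24 = 0x12,   /* saved critical-error vector          */
    PSP_ENV_SEG    = 0x2C,   /* segment address of environment block */
    PSP_FCB1       = 0x5C,   /* default file control block #1        */
    PSP_FCB2       = 0x6C,   /* default file control block #2        */
    PSP_TAIL_LEN   = 0x80,   /* length byte of command tail          */
    PSP_TAIL       = 0x81,   /* command tail text                    */
    PSP_SIZE       = 0x100
};

/* Read a little-endian word from a copy of the PSP. */
uint16_t psp_word(const uint8_t *psp, size_t offset)
{
    return (uint16_t)(psp[offset] | (psp[offset + 1] << 8));
}

/* The Int 20H instruction at offset 0000H is encoded as CD 20. */
int psp_starts_with_int20(const uint8_t *psp)
{
    return psp[PSP_INT20] == 0xCD && psp[PSP_INT20 + 1] == 0x20;
}
```

Reading fields byte by byte, rather than overlaying a C structure, avoids any dependence on the compiler's structure packing or the host's byte order.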
In the first versions of MS-DOS, the PSP was designed to be compatible
with a control area that was built beneath transient programs under
Digital Research's venerable CP/M operating system, so that programs could
be ported to MS-DOS without extensive logical changes. Although MS-DOS has
evolved considerably since those early days, the structure of the PSP is
still recognizably similar to its CP/M equivalent. For example, offset
0000H in the PSP contains a linkage to the MS-DOS process-termination
handler, which cleans up after the program has finished its job and
performs a final exit. Similarly, offset 0005H in the PSP contains a
linkage to the MS-DOS function dispatcher, which performs disk operations,
console input/output, and other such services at the request of the
transient program. Thus, calls to PSP:0000 and PSP:0005 have the same
effect as CALL 0000 and CALL 0005 under CP/M. (These linkages are not the
"approved" means of obtaining these services, however.)
The word at offset 0002H in the PSP contains the segment address of the
top of the transient program's allocated memory block. The program can use
this value to determine whether it should request more memory to do its
job or whether it has extra memory that it can release for use by other
processes.
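As a sketch of the arithmetic involved, the size of the program's memory block follows from the difference between the word at offset 0002H and the segment address of the PSP itself, counted in 16-byte paragraphs. The function and parameter names below are hypothetical, and the sketch assumes the word at 0002H is the first segment beyond the block.

```c
#include <stdint.h>

/* Paragraphs (16-byte units) from the PSP to the end of the block. */
uint16_t block_paragraphs(uint16_t psp_seg, uint16_t end_seg)
{
    return (uint16_t)(end_seg - psp_seg);
}

/* The same span expressed in bytes. */
uint32_t block_bytes(uint16_t psp_seg, uint16_t end_seg)
{
    return (uint32_t)block_paragraphs(psp_seg, end_seg) * 16u;
}
```

For example, a PSP at segment 1000H with an end-of-block segment of A000H owns 9000H paragraphs, or 90000H bytes.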
Offsets 000AH through 0015H in the PSP contain the previous contents of
the interrupt vectors for the termination, Ctrl-C, and critical-error
handlers. If the transient program alters these vectors for its own
purposes, MS-DOS restores the original values saved in the PSP when the
program terminates.
The word at PSP offset 002CH holds the segment address of the environment
block, which contains a series of ASCIIZ strings (sequences of ASCII
characters terminated by a null, or zero, byte). The environment block is
inherited from the program that called the EXEC function to load the
currently executing program. It contains such information as the current
search path used by COMMAND.COM to find executable programs, the location
on the disk of COMMAND.COM itself, and the format of the user prompt used
by COMMAND.COM.
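Searching the environment block is a simple walk over the ASCIIZ strings. The sketch below assumes the usual NAME=value form for each string and an extra zero byte after the last string to mark the end of the block; env_lookup is a hypothetical name, and the block is passed as an ordinary pointer rather than a segment address.

```c
#include <stddef.h>
#include <string.h>

/* Look up `name` in a block of "NAME=value" ASCIIZ strings that ends
   with an empty string. Returns a pointer to the value, or NULL. */
const char *env_lookup(const char *env, const char *name)
{
    size_t n = strlen(name);

    while (*env != '\0') {                 /* empty string ends the block */
        if (strncmp(env, name, n) == 0 && env[n] == '=')
            return env + n + 1;
        env += strlen(env) + 1;            /* skip string and its null    */
    }
    return NULL;
}
```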
The command tail──the remainder of the command line that invoked the
transient program, after the program's name──is copied into the PSP
starting at offset 0081H. The length of the command tail, not including
the return character at its end, is placed in the byte at offset 0080H.
Redirection or piping parameters and their associated filenames do not
appear in the portion of the command line (the command tail) that is
passed to the transient program, because redirection is transparent to
applications.
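Extracting the command tail from a copy of the PSP might look like the following sketch. The function name is hypothetical; the only facts it relies on are the length byte at 0080H and the text starting at 0081H.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the command tail from a 256-byte PSP image into a C string.
   Returns the number of characters copied (the return character at
   the end of the tail is not counted and not copied). */
size_t psp_command_tail(const uint8_t *psp, char *out, size_t out_size)
{
    size_t len = psp[0x80];

    if (out_size == 0)
        return 0;
    if (len > 127)                 /* tail area is 0081H through 00FFH */
        len = 127;
    if (len > out_size - 1)
        len = out_size - 1;
    memcpy(out, &psp[0x81], len);
    out[len] = '\0';
    return len;
}
```

Note that the tail typically begins with the blank that separated the program name from its first parameter, so a caller will usually want to skip leading whitespace.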
To provide compatibility with CP/M, MS-DOS parses the first two parameters
in the command tail into two default file control blocks (FCBs) at
PSP:005CH and PSP:006CH, under the assumption that they may be filenames.
However, if the parameters are filenames that include a path
specification, only the drive code will be valid in these default FCBs,
because FCB-type file- and record-access functions do not support
hierarchical file structures. Although the default FCBs were an aid in
earlier years, when compatibility with CP/M was more of a concern, they
are essentially useless in modern MS-DOS application programs that must
provide full path support. (File control blocks are discussed in detail in
Chapter 8 and hierarchical file structures are discussed in Chapter 9.)
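The following sketch shows why a path defeats the default FCBs. It fills in the first three fields of an FCB (drive code, 8-character name, 3-character extension) from one parameter. It is deliberately simplified: it performs no wildcard expansion and treats only the backslash as a separator, so it should be read as an illustration rather than a model of the MS-DOS parser. The FcbName type and fcb_parse function are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t drive;     /* 0 = default drive, 1 = A, 2 = B, ... */
    char    name[8];   /* uppercase, padded with blanks        */
    char    ext[3];    /* uppercase, padded with blanks        */
} FcbName;

/* Simplified parse of "d:name.ext"; stops at the first backslash. */
void fcb_parse(const char *arg, FcbName *fcb)
{
    size_t i;

    fcb->drive = 0;
    memset(fcb->name, ' ', sizeof fcb->name);
    memset(fcb->ext, ' ', sizeof fcb->ext);

    if (isalpha((unsigned char)arg[0]) && arg[1] == ':') {
        fcb->drive = (uint8_t)(toupper((unsigned char)arg[0]) - 'A' + 1);
        arg += 2;
    }
    for (i = 0; *arg && *arg != '.' && *arg != '\\'; arg++)
        if (i < sizeof fcb->name)
            fcb->name[i++] = (char)toupper((unsigned char)*arg);
    if (*arg == '.') {
        arg++;
        for (i = 0; *arg && *arg != '\\'; arg++)
            if (i < sizeof fcb->ext)
                fcb->ext[i++] = (char)toupper((unsigned char)*arg);
    }
}
```

With the parameter b:hello.c the sketch produces drive code 2, the name HELLO, and the extension C. With C:\DOS\X.TXT it produces drive code 3 and a blank name, because parsing stops at the first backslash.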
The 128-byte area from 0080H through 00FFH in the PSP also serves as the
default disk transfer area (DTA), which is set by MS-DOS before passing
control to the transient program. If the program does not explicitly
change the DTA, any file read or write operations requested with the FCB
group of function calls automatically use this area as a data buffer. This
is rarely useful and is another facet of MS-DOS's handling of the PSP that
is present only for compatibility with CP/M.
──────────────────────────────────────────────────────────────────────────
WARNING
Programs must not alter any part of the PSP below offset 005CH.
──────────────────────────────────────────────────────────────────────────
Introduction to .COM Programs
Programs of the .COM persuasion are stored in disk files that hold an
absolute image of the machine instructions to be executed. Because the
files contain no relocation information, they are more compact, and are
loaded for execution slightly faster, than equivalent .EXE files. Note
that MS-DOS does not attempt to ascertain whether a .COM file actually
contains executable code (there is no signature or checksum, as in the
case of a .EXE file); it simply brings any file with the .COM extension
into memory and jumps to it.
Because .COM programs are loaded immediately above the program segment
prefix and do not have a header that can specify another entry point, they
must always have an origin of 0100H, which is the length of the PSP.
Location 0100H must contain an executable instruction. The maximum length
of a .COM program is 65,536 bytes, minus the length of the PSP (256 bytes)
and a mandatory word of stack (2 bytes).
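The size limit works out as follows. This is only a restatement of the arithmetic in the preceding paragraph; the macro and function names are made up for the example.

```c
#define SEGMENT_BYTES    65536L   /* one 8086 segment             */
#define PSP_BYTES          256L   /* program segment prefix       */
#define STACK_WORD_BYTES     2L   /* mandatory word of stack      */

/* Largest .COM image that fits in one segment with its PSP. */
long com_max_image_bytes(void)
{
    return SEGMENT_BYTES - PSP_BYTES - STACK_WORD_BYTES;
}

/* Nonzero if a file of the given size could be a loadable .COM image. */
int com_file_fits(long file_bytes)
{
    return file_bytes > 0 && file_bytes <= com_max_image_bytes();
}
```

The result is 65,278 bytes.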
When control is transferred to the .COM program from MS-DOS, all of the
segment registers point to the PSP (Figure 3-2). The stack pointer
register contains 0FFFEH if memory allows; otherwise, it is set as high as
possible in memory minus 2 bytes. (MS-DOS pushes a zero word on the stack
before entry.)
SS:SP ┌────────────────────────────────────────────────────────┐
│ │
│ Stack grows downward from top of segment │
│ │ │
│ ▼ │
│ │
│ │ │
│ Program code and data │
│ │
CS:0100H ├────────────────────────────────────────────────────────┤
│ Program segment prefix │
CS:0000H └────────────────────────────────────────────────────────┘
DS:0000H
ES:0000H
SS:0000H
Figure 3-2. A memory image of a typical .COM-type program after loading.
The contents of the .COM file are brought into memory just above the
program segment prefix. Program code and data are mixed together in the
same segment, and all segment registers contain the same value.
Although the size of an executable .COM file can't exceed 64 KB, the
current versions of MS-DOS allocate all of the transient program area to
.COM programs when they are loaded. Because many such programs date from
the early days of MS-DOS and are not necessarily "well-behaved" in their
approach to memory management, the operating system simply makes the
worst-case assumption and gives .COM programs everything that is
available. If a .COM program wants to use the EXEC function to invoke
another process, it must first shrink down its memory allocation to the
minimum memory it needs in order to continue, taking care to protect its
stack. (This is discussed in more detail in Chapter 12.)
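Memory-block sizes are expressed in paragraphs, so a program that wants to shrink its allocation must first convert the number of bytes it needs to keep into paragraphs, rounding up. The sketch below assumes the program knows the offset just past the last byte it needs (PSP, code, data, and stack included); the function name is hypothetical, and the actual resize request is covered in Chapter 12.

```c
#include <stdint.h>

/* Paragraphs to keep for a program whose PSP, code, data, and stack
   end at end_offset (exclusive), measured from the start of the PSP. */
uint16_t paragraphs_to_keep(uint32_t end_offset)
{
    return (uint16_t)((end_offset + 15u) / 16u);   /* round up */
}
```

A program that ends at offset 0400H keeps 40H paragraphs; one that ends a single byte later, at 0401H, needs 41H.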
When a .COM program finishes executing, it can return control to MS-DOS by
several means. The preferred method is Int 21H Function 4CH, which allows
the program to pass a return code back to the program, shell, or batch
file that invoked it. However, if the program is running under MS-DOS
version 1, it must exit by means of Int 20H, Int 21H Function 0, or a
NEAR RETURN. (Because a word of zero was pushed onto the stack at entry, a
NEAR RETURN causes a transfer to PSP:0000, which contains an Int 20H
instruction.)
A .COM-type application can be linked together from many separate object
modules. All of the modules must use the same code-segment name and class
name, and the module with the entry point at offset 0100H within the
segment must be linked first. In addition, all of the procedures within a
.COM program should have the NEAR attribute, because all executable code
resides in one segment.
When linking a .COM program, the linker will display the message
Warning: no stack segment
This message can be ignored. The linker output is a .EXE file, which must
be converted into a .COM file with the MS-DOS EXE2BIN utility before
execution. You can then delete the .EXE file. (An example of this process
is provided in Chapter 4.)
An Example .COM Program
The HELLO.COM program listed in Figure 3-3 demonstrates the structure of
a simple assembly-language program that is destined to become a .COM file.
(You may find it helpful to compare this listing with the HELLO.EXE
program later in this chapter.) Because this program is so short and
simple, a relatively high proportion of the source code is actually
assembler directives that do not result in any executable code.
The NAME statement simply provides a module name for use during the
linkage process. This aids understanding of the map that the linker
produces. In MASM versions 5.0 and later, the module name is always the
same as the filename, and the NAME statement is ignored.
The PAGE command, when used with two operands, as in line 2, defines the
length and width of the page. These default respectively to 66 lines and
80 characters. If you use the PAGE command without any operands, a
formfeed is sent to the printer and a heading is printed. In larger
programs, use the PAGE command liberally to place each of your subroutines
on separate pages for easy reading.
The TITLE command, in line 3, specifies the text string (limited to 60
characters) that is to be printed at the upper left corner of each page.
The TITLE command is optional and cannot be used more than once in each
assembly-language source file.
──────────────────────────────────────────────────────────────────────────
  1:          name      hello
  2:          page      55,132
  3:          title     HELLO.COM--print hello on terminal
  4:
  5:  ;
  6:  ; HELLO.COM:    demonstrates various components
  7:  ;               of a functional .COM-type assembly-
  8:  ;               language program, and an MS-DOS
  9:  ;               function call.
 10:  ;
 11:  ; Ray Duncan, May 1988
 12:  ;
 13:
 14:  stdin   equ     0               ; standard input handle
 15:  stdout  equ     1               ; standard output handle
 16:  stderr  equ     2               ; standard error handle
 17:
 18:  cr      equ     0dh             ; ASCII carriage return
 19:  lf      equ     0ah             ; ASCII linefeed
 20:
 21:
 22:  _TEXT   segment word public 'CODE'
 23:
 24:          org     100h            ; .COM files always have
 25:                                  ; an origin of 100h
 26:
 27:          assume  cs:_TEXT,ds:_TEXT,es:_TEXT,ss:_TEXT
 28:
 29:  print   proc    near            ; entry point from MS-DOS
 30:
 31:          mov     ah,40h          ; function 40h = write
 32:          mov     bx,stdout       ; handle for standard output
 33:          mov     cx,msg_len      ; length of message
 34:          mov     dx,offset msg   ; address of message
 35:          int     21h             ; transfer to MS-DOS
 36:
 37:          mov     ax,4c00h        ; exit, return code = 0
 38:          int     21h             ; transfer to MS-DOS
 39:
 40:  print   endp
 41:
 42:
 43:  msg     db      cr,lf           ; message to display
 44:          db      'Hello World!',cr,lf
 45:
 46:  msg_len equ     $-msg           ; length of message
 47:
 48:
 49:  _TEXT   ends
 50:
 51:          end     print           ; defines entry point
──────────────────────────────────────────────────────────────────────────
Figure 3-3. The HELLO.COM program listing.
Dropping down past a few comments and EQU statements, we come to a
declaration of a code segment that begins in line 22 with a SEGMENT
command and ends in line 49 with an ENDS command. The label in the
leftmost field of line 22 gives the code segment the name _TEXT. The
operand fields at the right end of the line give the segment the
attributes WORD, PUBLIC, and 'CODE'. (You might find it helpful to read
the Microsoft Macro Assembler manual for detailed explanations of each
possible segment attribute.)
Because this program is going to be converted into a .COM file, all of its
executable code and data areas must lie within one code segment. The
program must also have its origin at offset 0100H (immediately above the
program segment prefix), which is taken care of by the ORG statement
in line 24.
Following the ORG directive, we encounter an ASSUME statement on line
27. The concept of ASSUME often baffles new assembly-language programmers.
In a way, ASSUME doesn't "do" anything; it simply tells the assembler
which segment registers you are going to use to point to the various
segments of your program, so that the assembler can provide segment
overrides when they are necessary. It's important to notice that the
ASSUME statement doesn't take care of loading the segment registers with
the proper values; it merely notifies the assembler of your intent to do
that within the program. (Remember that, in the case of a .COM program,
MS-DOS initializes all the segment registers before entry to point to the
PSP.)
Within the code segment, we come to another type of block declaration that
begins with the PROC command on line 29 and closes with ENDP on line 40.
These two directives declare the beginning and end of a procedure, a
block of executable code that performs a single distinct function. The
label in the leftmost field of the PROC statement (in this case, print)
gives the procedure a name. The operand field gives it an attribute. If
the procedure carries the NEAR attribute, only other code in the same
segment can call it, whereas if it carries the FAR attribute, code located
anywhere in the CPU's memory-addressing space can call it. In .COM
programs, all procedures carry the NEAR attribute.
For the purposes of this example program, I have kept the print procedure
ridiculously simple. It calls MS-DOS Int 21H Function 40H to send the
message Hello World! to standard output (normally the video screen), and
calls Int 21H Function 4CH to terminate the program.
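For comparison, here is a rough C counterpart of the print procedure. It is an analogy rather than a translation: fwrite stands in for Int 21H Function 40H, and the compile-time sizeof expression plays the part of the assembler's $-msg calculation in line 46. The names msg, MSG_LEN, and hello_print are chosen for the example.

```c
#include <stdio.h>

/* Same 16 bytes as lines 43-44 of the listing: CR, LF, text, CR, LF. */
static const char msg[] = "\r\nHello World!\r\n";

/* Like "msg_len equ $-msg"; the -1 drops the null byte C appends. */
enum { MSG_LEN = sizeof msg - 1 };

/* Write the message to the given stream; returns the bytes written. */
size_t hello_print(FILE *out)
{
    return fwrite(msg, 1, MSG_LEN, out);
}
```

MSG_LEN evaluates to 16, which agrees with the count loaded into CX by the assembled program (the bytes B9 10 00 in Figure 3-6).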
The END statement in line 51 tells the assembler that it has reached the
end of the source file and also specifies the entry point for the program.
If the entry point is not a label located at offset 0100H, the .EXE file
resulting from the assembly and linkage of this source program cannot be
converted into a .COM file.
Introduction to .EXE Programs
We have just discussed a program that was written in such a way that it
could be assembled into a .COM file. Such a program is simple in
structure, so a programmer who needs to put together this kind of quick
utility can concentrate on the program logic and do a minimum amount of
worrying about control of the assembler. However, .COM-type programs have
some definite disadvantages, and so most serious assembly-language efforts
for MS-DOS are written to be converted into .EXE files.
Whereas .COM programs are effectively restricted to a total size of 64 KB
for machine code, data, and stack combined, .EXE programs can be
practically unlimited in size (up to the limit of the computer's available
memory). .EXE programs also place the code, data, and stack in separate
parts of the file. Although the normal MS-DOS program loader does not take
advantage of this feature of .EXE files, the ability to load different
parts of large programs into several separate memory fragments, as well as
the opportunity to designate a "pure" code portion of your program that
can be shared by several tasks, is very significant in multitasking
environments such as Microsoft Windows.
The MS-DOS loader always brings a .EXE program into memory immediately
above the program segment prefix, although the order of the code, data,
and stack segments may vary (Figure 3-4). The .EXE file has a header, or
block of control information, with a characteristic format (Figures 3-5
and 3-6). The size of this header varies according to the number of
program instructions that need to be relocated at load time, but it is
always a multiple of 512 bytes.
Before MS-DOS transfers control to the program, the initial values of the
code segment (CS) register and instruction pointer (IP) register are
calculated from the entry-point information in the .EXE file header and
the program's load address. This information derives from an END statement
in the source code for one of the program's modules. The data segment (DS)
and extra segment (ES) registers are made to point to the PSP so that the
program can access the environment-block pointer, command tail, and other
useful information contained there.
SS:SP ┌────────────────────────────────────────────────────────┐
│ │
│ Stack segment: │
│ stack grows downward from top of segment │
│ │ │
│ ▼ │
SS:0000H ├────────────────────────────────────────────────────────┤
│ Data segment │
├────────────────────────────────────────────────────────┤
│ Program code │
CS:0000H ├────────────────────────────────────────────────────────┤
│ Program segment prefix │
DS:0000H └────────────────────────────────────────────────────────┘
ES:0000H
Figure 3-4. A memory image of a typical .EXE-type program immediately
after loading. The contents of the .EXE file are relocated and brought
into memory above the program segment prefix. Code, data, and stack reside
in separate segments and need not be in the order shown here. The entry
point can be anywhere in the code segment and is specified by the END
statement in the main module of the program. When the program receives
control, the DS (data segment) and ES (extra segment) registers point to
the program segment prefix; the program usually saves this value and then
resets the DS and ES registers to point to its data area.
The initial contents of the stack segment (SS) and stack pointer (SP)
registers come from the header. This information derives from the
declaration of a segment with the attribute STACK somewhere in the
program's source code. The memory space allocated for the stack may be
initialized or uninitialized, depending on the stack-segment definition;
many programmers like to initialize the stack memory with a recognizable
data pattern so that they can inspect memory dumps and determine how much
stack space is actually used by the program.
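The register setup described in the last few paragraphs can be summarized in a short C sketch. It assumes the usual case in which the load image starts in the paragraph immediately above the 256-byte PSP, that is, 10H paragraphs beyond the PSP's segment address. The EntryRegs type and the function name are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t cs, ip;   /* entry point            */
    uint16_t ss, sp;   /* initial stack          */
    uint16_t ds, es;   /* both point to the PSP  */
} EntryRegs;

/* Combine the header's segment displacements with the load address. */
EntryRegs exe_entry_regs(uint16_t psp_seg,
                         uint16_t hdr_cs, uint16_t hdr_ip,
                         uint16_t hdr_ss, uint16_t hdr_sp)
{
    uint16_t start_seg = (uint16_t)(psp_seg + 0x10);  /* PSP is 16 paragraphs */
    EntryRegs r;

    r.cs = (uint16_t)(start_seg + hdr_cs);
    r.ip = hdr_ip;
    r.ss = (uint16_t)(start_seg + hdr_ss);
    r.sp = hdr_sp;
    r.ds = psp_seg;
    r.es = psp_seg;
    return r;
}
```

With the header values of the HELLO.EXE file in Figure 3-6 (CS displacement 0, IP 0, SS displacement 3, SP 0080H) and an arbitrary PSP at segment 2000H, the program would start at 2010:0000 with its stack at 2013:0080.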
When a .EXE program finishes processing, it should return control to
MS-DOS through Int 21H Function 4CH. Other methods are available, but
they offer no advantages and are considerably less convenient (because
they usually require the CS register to point to the PSP).
Byte
offset
0000H ┌────────────────────────────────────────────────────────┐
│ First part of .EXE file signature (4DH) │
0001H ├────────────────────────────────────────────────────────┤
│ Second part of .EXE file signature (5AH) │
0002H ├────────────────────────────────────────────────────────┤
│ Length of file MOD 512 │
0004H ├────────────────────────────────────────────────────────┤
│ Size of file in 512-byte pages, including header │
0006H ├────────────────────────────────────────────────────────┤
│ Number of relocation-table items │
0008H ├────────────────────────────────────────────────────────┤
│ Size of header in paragraphs (16-byte units) │
000AH ├────────────────────────────────────────────────────────┤
│ Minimum number of paragraphs needed above program │
000CH ├────────────────────────────────────────────────────────┤
│ Maximum number of paragraphs desired above program │
000EH ├────────────────────────────────────────────────────────┤
│ Segment displacement of stack module │
0010H ├────────────────────────────────────────────────────────┤
│ Contents of SP register at entry │
0012H ├────────────────────────────────────────────────────────┤
│ Word checksum │
0014H ├────────────────────────────────────────────────────────┤
│ Contents of IP register at entry │
0016H ├────────────────────────────────────────────────────────┤
│ Segment displacement of code module │
0018H ├────────────────────────────────────────────────────────┤
│ Offset of first relocation item in file │
001AH ├────────────────────────────────────────────────────────┤
│ Overlay number (0 for resident part of program) │
001CH ├────────────────────────────────────────────────────────┤
│ Variable reserved space │
├────────────────────────────────────────────────────────┤
│ Relocation table │
├────────────────────────────────────────────────────────┤
│ Variable reserved space │
├────────────────────────────────────────────────────────┤
│ Program and data segments │
├────────────────────────────────────────────────────────┤
│ Stack segment │
└────────────────────────────────────────────────────────┘
Figure 3-5. The format of a .EXE load module.
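The fixed part of the header in Figure 3-5 can be decoded with a few lines of C. The sketch reads each field as a little-endian word from a buffer holding the start of the file; the structure and function names are hypothetical, and the file-length calculation assumes that the page count at offset 0004H includes a partial last page whenever the word at offset 0002H is nonzero.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t last_page_bytes;    /* 0002H: length of file MOD 512      */
    uint16_t pages;              /* 0004H: size in 512-byte pages      */
    uint16_t reloc_items;        /* 0006H: relocation-table items      */
    uint16_t header_paragraphs;  /* 0008H: header size in paragraphs   */
    uint16_t min_alloc;          /* 000AH: minimum extra paragraphs    */
    uint16_t max_alloc;          /* 000CH: maximum extra paragraphs    */
    uint16_t ss;                 /* 000EH: stack segment displacement  */
    uint16_t sp;                 /* 0010H: SP at entry                 */
    uint16_t checksum;           /* 0012H: word checksum               */
    uint16_t ip;                 /* 0014H: IP at entry                 */
    uint16_t cs;                 /* 0016H: code segment displacement   */
    uint16_t reloc_offset;       /* 0018H: first relocation item       */
    uint16_t overlay;            /* 001AH: overlay number              */
} ExeHeader;

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 if buf starts with a .EXE header, 0 otherwise. */
int exe_parse_header(const uint8_t *buf, size_t len, ExeHeader *h)
{
    if (len < 0x1C || buf[0] != 0x4D || buf[1] != 0x5A)
        return 0;
    h->last_page_bytes   = le16(buf + 0x02);
    h->pages             = le16(buf + 0x04);
    h->reloc_items       = le16(buf + 0x06);
    h->header_paragraphs = le16(buf + 0x08);
    h->min_alloc         = le16(buf + 0x0A);
    h->max_alloc         = le16(buf + 0x0C);
    h->ss                = le16(buf + 0x0E);
    h->sp                = le16(buf + 0x10);
    h->checksum          = le16(buf + 0x12);
    h->ip                = le16(buf + 0x14);
    h->cs                = le16(buf + 0x16);
    h->reloc_offset      = le16(buf + 0x18);
    h->overlay           = le16(buf + 0x1A);
    return 1;
}

/* Total file length implied by the two size fields. */
uint32_t exe_file_bytes(const ExeHeader *h)
{
    if (h->pages == 0)
        return 0;
    if (h->last_page_bytes == 0)
        return (uint32_t)h->pages * 512u;
    return (uint32_t)(h->pages - 1) * 512u + h->last_page_bytes;
}
```

Applied to the first 32 bytes of the HELLO.EXE dump in Figure 3-6, the sketch reports one relocation item, a header of 20H paragraphs (512 bytes), and a file length of 552 bytes.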
The input to the linker for a .EXE-type program can be many separate
object modules. Each module can use a unique code-segment name, and the
procedures can carry either the NEAR or the FAR attribute, depending on
naming conventions and the size of the executable code. The programmer
must take care that the modules linked together contain only one segment
with the STACK attribute and only one entry point defined with an END
assembler directive. The output from the linker is a file with a .EXE
extension. This file can be executed immediately.
──────────────────────────────────────────────────────────────────────────
C>DUMP HELLO.EXE
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 4D 5A 28 00 02 00 01 00 20 00 09 00 FF FF 03 00 MZ(..... .......
0010 80 00 20 05 00 00 00 00 1E 00 00 00 01 00 01 00 .. .............
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
.
.
.
0200 B8 01 00 8E D8 B4 40 BB 01 00 B9 10 00 90 BA 08 ......@.........
0210 00 CD 21 B8 00 4C CD 21 0D 0A 48 65 6C 6C 6F 20 ..!..L.!..Hello
0220 57 6F 72 6C 64 21 0D 0A World!..
──────────────────────────────────────────────────────────────────────────
Figure 3-6. A hex dump of the HELLO.EXE program, demonstrating the
contents of a simple .EXE load module. Note the following interesting
values: the .EXE signature in bytes 0000H and 0001H, the number of
relocation-table items in bytes 0006H and 0007H, the minimum extra memory
allocation (MIN_ALLOC) in bytes 000AH and 000BH, the maximum extra memory
allocation (MAX_ALLOC) in bytes 000CH and 000DH, and the initial IP
(instruction pointer) register value in bytes 0014H and 0015H. See also
Figure 3-5.
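To make the header layout concrete, the following C sketch decodes the fixed part of a .EXE header from the first 32 bytes of the HELLO.EXE dump in Figure 3-6. The ExeHeader structure and the function names are invented here for illustration and are not part of MS-DOS or of any Microsoft library; the two fields at offsets 0002H and 0004H are assumed to follow the usual .EXE header layout (bytes used in the last page, and file size in 512-byte pages).

```c
#include <assert.h>
#include <stddef.h>

/* Fixed part of the .EXE header; names are illustrative only. */
typedef struct {
    unsigned signature;      /* 0000H: 5A4DH, the letters "MZ"          */
    unsigned last_page_len;  /* 0002H: bytes used in last 512-byte page */
    unsigned pages;          /* 0004H: file size in 512-byte pages      */
    unsigned reloc_items;    /* 0006H: number of relocation-table items */
    unsigned header_paras;   /* 0008H: header size in paragraphs        */
    unsigned min_alloc;      /* 000AH: minimum paragraphs above program */
    unsigned max_alloc;      /* 000CH: maximum paragraphs above program */
    unsigned init_ss;        /* 000EH: segment displacement of stack    */
    unsigned init_sp;        /* 0010H: SP at entry                      */
    unsigned checksum;       /* 0012H: word checksum                    */
    unsigned init_ip;        /* 0014H: IP at entry                      */
    unsigned init_cs;        /* 0016H: segment displacement of code     */
    unsigned reloc_offset;   /* 0018H: file offset of relocation table  */
    unsigned overlay;        /* 001AH: overlay number                   */
} ExeHeader;

/* First 32 bytes of HELLO.EXE, copied from the dump in Figure 3-6. */
const unsigned char hello_header[32] = {
    0x4D, 0x5A, 0x28, 0x00, 0x02, 0x00, 0x01, 0x00,
    0x20, 0x00, 0x09, 0x00, 0xFF, 0xFF, 0x03, 0x00,
    0x80, 0x00, 0x20, 0x05, 0x00, 0x00, 0x00, 0x00,
    0x1E, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00
};

/* Read a little-endian 16-bit word at byte offset off. */
static unsigned word_at(const unsigned char *p, size_t off)
{
    return (unsigned)p[off] | ((unsigned)p[off + 1] << 8);
}

/* Decode the header; returns 1 if the buffer starts with "MZ", else 0. */
int exe_header_parse(const unsigned char *buf, size_t len, ExeHeader *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 0x1C)
        return 0;                       /* too short to hold the header */
    h->signature     = word_at(buf, 0x00);
    h->last_page_len = word_at(buf, 0x02);
    h->pages         = word_at(buf, 0x04);
    h->reloc_items   = word_at(buf, 0x06);
    h->header_paras  = word_at(buf, 0x08);
    h->min_alloc     = word_at(buf, 0x0A);
    h->max_alloc     = word_at(buf, 0x0C);
    h->init_ss       = word_at(buf, 0x0E);
    h->init_sp       = word_at(buf, 0x10);
    h->checksum      = word_at(buf, 0x12);
    h->init_ip       = word_at(buf, 0x14);
    h->init_cs       = word_at(buf, 0x16);
    h->reloc_offset  = word_at(buf, 0x18);
    h->overlay       = word_at(buf, 0x1A);
    return h->signature == 0x5A4D;
}

/* File size implied by the page count and the last-page length. */
unsigned long exe_file_size(const ExeHeader *h)
{
    assert(h != NULL);
    if (h->pages == 0)
        return 0UL;
    if (h->last_page_len == 0)
        return (unsigned long)h->pages * 512UL;
    return (unsigned long)(h->pages - 1) * 512UL + h->last_page_len;
}
```

Applied to hello_header, this sketch reports one relocation item, a header of 20H paragraphs (512 bytes), a MIN_ALLOC of 9, a MAX_ALLOC of 0FFFFH, and a file size of 228H bytes, which agrees with the last offset shown in the dump.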
An Example .EXE Program
The HELLO.EXE program in Figure 3-7 demonstrates the fundamental
structure of an assembly-language program that is destined to become a
.EXE file. At minimum, it should have a module name, a code segment, a
stack segment, and a primary procedure that receives control of the
computer from MS-DOS after the program is loaded. The HELLO.EXE program
also contains a data segment to provide a more complete example.
The NAME, TITLE, and PAGE directives were covered in the HELLO.COM example
program and are used in the same manner here, so we'll move to the first
new item of interest. After a few comments and EQU statements, we come to
a declaration of a code segment that begins on line 21 with a SEGMENT
command and ends on line 41 with an ENDS command. As in the HELLO.COM
example program, the label in the leftmost field of the line gives the
code segment the name _TEXT. The operand fields at the right end of the
line give the attributes WORD, PUBLIC, and 'CODE'.
Following the code-segment instruction, we find an ASSUME statement on
line 23. Notice that, unlike the equivalent statement in the HELLO.COM
program, the ASSUME statement in this program specifies several different
segment names. Again, remember that this statement has no direct effect on
the contents of the segment registers but affects only the operation of
the assembler itself.
──────────────────────────────────────────────────────────────────────────
1: name hello
2: page 55,132
3: title HELLO.EXE--print Hello on terminal
4: ;
5: ; HELLO.EXE: demonstrates various components
6: ; of a functional .EXE-type assembly-
7: ; language program, use of segments,
8: ; and an MS-DOS function call.
9: ;
10: ; Ray Duncan, May 1988
11: ;
12:
13: stdin equ 0 ; standard input handle
14: stdout equ 1 ; standard output handle
15: stderr equ 2 ; standard error handle
16:
17: cr equ 0dh ; ASCII carriage return
18: lf equ 0ah ; ASCII linefeed
19:
20:
21: _TEXT segment word public 'CODE'
22:
23: assume cs:_TEXT,ds:_DATA,ss:STACK
24:
25: print proc far ; entry point from MS-DOS
26:
27: mov ax,_DATA ; make our data segment
28: mov ds,ax ; addressable...
29:
30: mov ah,40h ; function 40h = write
31: mov bx,stdout ; standard output handle
32: mov cx,msg_len ; length of message
33: mov dx,offset msg ; address of message
34: int 21h ; transfer to MS-DOS
35:
36: mov ax,4c00h ; exit, return code = 0
37: int 21h ; transfer to MS-DOS
38:
39: print endp
40:
41: _TEXT ends
42:
43:
44: _DATA segment word public 'DATA'
45:
46: msg db cr,lf ; message to display
47: db 'Hello World!',cr,lf
48:
49: msg_len equ $-msg ; length of message
50:
51: _DATA ends
52:
53:
54: STACK segment para stack 'STACK'
55:
56: db 128 dup (?)
57:
58: STACK ends
59:
60: end print ; defines entry point
──────────────────────────────────────────────────────────────────────────
Figure 3-7. The HELLO.EXE program listing.
Within the code segment, the main print procedure is declared by the PROC
command on line 25 and closed with ENDP on line 39. Because the procedure
resides in a .EXE file, we have given it the FAR attribute as an example,
but the attribute is really irrelevant because the program is so small and
the procedure is not called by anything else in the same program.
The print procedure first initializes the DS register, as indicated in the
earlier ASSUME statement, loading it with a value that causes it to point
to the base of the data area. (MS-DOS automatically sets up the CS and SS
registers.) Next, the procedure uses MS-DOS Int 21H Function 40H to
display the message Hello World! on the screen, just as in the HELLO.COM
program. Finally, the procedure exits back to MS-DOS with an Int 21H
Function 4CH on lines 36 and 37, passing a return code of zero (which by
convention means success).
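The operand of the MOV AX,_DATA instruction on line 27 is the one item in HELLO.EXE that has to be fixed up when the program is loaded. In the dump in Figure 3-6 it is the word 0001H that follows the B8H opcode at file offset 0200H, and the single relocation-table entry at offset 001EH (offset 0001H, segment 0000H) points to it. The C sketch below imitates that fix-up under stated assumptions: the RelocItem structure and the apply_relocations function are hypothetical names, and the start segment of 2000H used in the usage note is an arbitrary value, not one MS-DOS is known to choose.

```c
#include <assert.h>
#include <stddef.h>

/* One relocation-table entry: the segment:offset, relative to the start
   of the load image, of a word that holds a segment value. */
typedef struct {
    unsigned offset;
    unsigned segment;
} RelocItem;

/* The 40-byte HELLO.EXE load image (file offsets 0200H-0227H in
   Figure 3-6): 24 bytes of code followed by the 16-byte message. */
unsigned char hello_image[40] = {
    0xB8, 0x01, 0x00, 0x8E, 0xD8, 0xB4, 0x40, 0xBB,
    0x01, 0x00, 0xB9, 0x10, 0x00, 0x90, 0xBA, 0x08,
    0x00, 0xCD, 0x21, 0xB8, 0x00, 0x4C, 0xCD, 0x21,
    0x0D, 0x0A, 'H',  'e',  'l',  'l',  'o',  ' ',
    'W',  'o',  'r',  'l',  'd',  '!',  0x0D, 0x0A
};

/* Add start_seg to every word named by a relocation item, wrapping at
   16 bits, the way a loader adjusts segment references after reading
   the image into memory. */
void apply_relocations(unsigned char *image, size_t image_len,
                       const RelocItem *items, size_t count,
                       unsigned start_seg)
{
    size_t i;

    assert(image != NULL && (items != NULL || count == 0));
    for (i = 0; i < count; i++) {
        size_t pos = (size_t)items[i].segment * 16 + items[i].offset;
        unsigned value;

        assert(pos + 1 < image_len);    /* word must lie inside image */
        value = (unsigned)image[pos] | ((unsigned)image[pos + 1] << 8);
        value = (value + start_seg) & 0xFFFFu;
        image[pos]     = (unsigned char)(value & 0xFFu);
        image[pos + 1] = (unsigned char)(value >> 8);
    }
}
```

Calling apply_relocations with the one item {0001H, 0000H} and a start segment of 2000H changes the first instruction from B8 01 00 to B8 01 20, so that AX, and then DS, receives 2001H: one paragraph above the start of the image, which is where the _DATA segment frame begins. The message itself sits at offset 0008H in that frame, matching the BA 08 00 (MOV DX,0008H) instruction in the dump.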
Lines 44 through 51 declare a data segment named _DATA, which contains the
variables and constants the program will use. If the various modules of a
program contain multiple data segments with the same name, the linker will
collect them and place them in the same physical memory segment.
Lines 54 through 58 establish a stack segment; PUSH and POP instructions
will access this area of scratch memory. Before MS-DOS transfers control
to a .EXE program, it sets up the SS and SP registers according to the
declared size and location of the stack segment. Be sure to allow enough
room for the maximum stack depth that can occur at runtime, plus a safe
number of extra words for registers pushed onto the stack during an MS-DOS
service call. If the stack overflows, it may damage your other code and
data segments and cause your program to behave strangely or even to crash
altogether!
The END statement on line 60 winds up our brief HELLO.EXE program, telling
the assembler that it has reached the end of the source file and providing
the label of the program's point of entry from MS-DOS.
The differences between .COM and .EXE programs are summarized in Figure
3-8.
                  .COM program               .EXE program
──────────────────────────────────────────────────────────────────────────
Maximum size      65,536 bytes minus 256     No limit
                  bytes for PSP and 2 bytes
                  for stack
Entry point       PSP:0100H                  Defined by END statement
AL at entry       00H if default FCB #1 has  Same
                  valid drive, 0FFH if
                  invalid drive
AH at entry       00H if default FCB #2 has  Same
                  valid drive, 0FFH if
                  invalid drive
CS at entry       PSP                        Segment containing module
                                             with entry point
IP at entry       0100H                      Offset of entry point within
                                             its segment
DS at entry       PSP                        PSP
ES at entry       PSP                        PSP
SS at entry       PSP                        Segment with STACK attribute
SP at entry       0FFFEH or top word in      Size of segment defined with
                  available memory,          STACK attribute
                  whichever is lower
Stack at entry    Zero word                  Initialized or uninitialized
Stack size        65,536 bytes minus 256     Defined in segment with
                  bytes for PSP and size of  STACK attribute
                  executable code and data
Subroutine calls  Usually NEAR               NEAR or FAR
Exit method       Int 21H Function 4CH       Int 21H Function 4CH
                  preferred, NEAR RET if     preferred
                  MS-DOS version 1
Size of file      Exact size of program      Size of program plus header
                                             (multiple of 512 bytes)
──────────────────────────────────────────────────────────────────────────
Figure 3-8. Summary of the differences between .COM and .EXE programs,
including their entry conditions.
More About Assembly-Language Programs
Now that we've looked at working examples of .COM and .EXE
assembly-language programs, let's backtrack and discuss their elements a
little more formally. The following discussion is based on the Microsoft
Macro Assembler, hereafter referred to as MASM. If you are familiar with
MASM and are an experienced assembly-language programmer, you may want to
skip this section.
MASM programs can be thought of as having three structural levels:
■ The module level
■ The segment level
■ The procedure level
Modules are simply chunks of source code that can be independently
maintained and assembled. Segments are physical groupings of like items
(machine code or data) within a program and a corresponding segregation of
dissimilar items. Procedures are functional subdivisions of an executable
program──routines that carry out a particular task.
Program Modules
Under MS-DOS, the module-level structure consists of files containing the
source code for individual routines. Each source file is translated by the
assembler into a relocatable object module. An object module can reside
alone in an individual file or with many other object modules in an
object-module library of frequently used or related routines. The
Microsoft Object Linker (LINK) combines object-module files, often with
additional object modules extracted from libraries, into an executable
program file.
Using modules and object-module libraries reduces the size of your
application source files (and vastly increases your productivity), because
these files need not contain the source code for routines they have in
common with other programs. This technique also allows you to maintain the
routines more easily, because you need to alter only one copy of their
source code stored in one place, instead of many copies stored in
different applications. When you improve (or fix) one of these routines,
you can simply reassemble it, put its object module back into the library,
relink all of the programs that use the routine, and voilà: instant
upgrade.
Program Segments
The term segments refers to two discrete programming concepts: physical
segments and logical segments.
Physical segments are 64 KB blocks of memory. The Intel 8086/8088 and
80286 microprocessors have four segment registers, which are essentially
used as pointers to these blocks. (The 80386 has six segment registers,
which are a superset of those found on the 8086/8088 and 80286.) Each
segment register can point to the bottom of a different 64 KB area of
memory. Thus, a program can address any location in memory by appropriate
manipulation of the segment registers, but the maximum amount of memory
that it can address simultaneously is 256 KB.
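The arithmetic behind these figures can be sketched in a few lines of C. The sketch assumes the real-mode addressing rule of the 8086/8088, in which the processor multiplies the segment value by 16 (one paragraph) and adds the offset, with the result wrapping at 20 bits; the function names are made up for this illustration.

```c
#include <assert.h>

/* Physical address for a segment:offset pair, using the 8086/8088 rule:
   segment times 16 plus offset, truncated to 20 bits. */
unsigned long linear_address(unsigned segment, unsigned offset)
{
    assert(segment <= 0xFFFFu && offset <= 0xFFFFu);
    return (((unsigned long)segment << 4) + offset) & 0xFFFFFUL;
}

/* Bytes reachable without reloading a segment register, assuming each
   register points to a different, non-overlapping 64 KB block. */
unsigned long simultaneous_bytes(unsigned segment_registers)
{
    return (unsigned long)segment_registers * 65536UL;
}
```

For example, linear_address(0x1234, 0x0010) yields 12350H, and simultaneous_bytes(4) yields 262,144 bytes, the 256 KB limit for four segment registers.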
As we discussed earlier in the chapter, .COM programs assume that all four
segment registers always point to the same place──the bottom of the
program. Thus, they are limited to a maximum size of 64 KB. .EXE programs,
on the other hand, can address many different physical segments and can
reset the segment registers to point to each segment as it is needed.
Consequently, the only practical limit on the size of a .EXE program is
the amount of available memory. The example programs throughout the
remainder of this book focus on .EXE programs.
Logical segments are the program components. A typical .EXE program
declares at least three logical segments: a code segment, a data segment,
and a stack segment (a very small program can omit the data segment, as
the HELLO.EXE discussion noted). Programs with more than 64 KB of code or
data have more than one code or data segment. The routines or data that
are used most frequently are put into the primary code and data segments
for speed, and routines or data that are used less frequently are put into
secondary code and data segments.
Segments are declared with the SEGMENT and ENDS directives in the
following form:
name SEGMENT attributes
.
.
.
name ENDS
The attributes of a segment include its align type (BYTE, WORD, or PARA),
combine type (PUBLIC, PRIVATE, COMMON, or STACK), and class type. The
segment attributes are used by the linker when it is combining logical
segments to create the physical segments of an executable program. Most of
the time, you can get by just fine using a small selection of attributes
in a rather stereotypical way. However, if you want to use the full range
of attributes, you might want to read the detailed explanation in the MASM
manual.
Programs are classified into one memory model or another based on the
number of their code and data segments. The most commonly used memory
model for assembly-language programs is the small model, which has one
code and one data segment, but you can also use the medium, compact, and
large models (Figure 3-9). (Two additional models exist that we will not
discuss further: the tiny model, which consists
of intermixed code and data in a single segment──for example, a .COM file
under MS-DOS; and the huge model, which is supported by the Microsoft C
Optimizing Compiler and which allows use of data structures larger than 64
KB.)
Model        Code segments     Data segments
──────────────────────────────────────────────────────────────────────────
Small        One               One
Medium       Multiple          One
Compact      One               Multiple
Large        Multiple          Multiple
──────────────────────────────────────────────────────────────────────────
Figure 3-9. Memory models commonly used in assembly-language and C
programs.
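The classification in Figure 3-9 is mechanical enough to state as a small C function. This is only a restatement of the table; the enumeration and the function name are hypothetical and do not correspond to anything in MASM or the C compiler.

```c
#include <assert.h>

typedef enum {
    MODEL_SMALL,     /* one code segment, one data segment            */
    MODEL_MEDIUM,    /* multiple code segments, one data segment      */
    MODEL_COMPACT,   /* one code segment, multiple data segments      */
    MODEL_LARGE      /* multiple code segments, multiple data segments */
} MemoryModel;

/* Pick the memory model from the number of code and data segments,
   following the four rows of Figure 3-9. */
MemoryModel classify_model(int code_segments, int data_segments)
{
    assert(code_segments >= 1 && data_segments >= 1);
    if (code_segments == 1)
        return data_segments == 1 ? MODEL_SMALL : MODEL_COMPACT;
    return data_segments == 1 ? MODEL_MEDIUM : MODEL_LARGE;
}
```

The tiny and huge models are deliberately left out, because they are not distinguished by segment counts alone.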
For each memory model, Microsoft has established certain segment and class
names that are used by all its high-level-language compilers (Figure
3-10). Because segment names are arbitrary, you may as well adopt the
Microsoft conventions. Their use will make it easier for you to integrate
your assembly-language routines into programs written in languages such as
C, or to use routines from high-level-language libraries in your
assembly-language programs.
Another important Microsoft high-level-language convention is to use the
GROUP directive to name the near data segment (the segment the program
expects to address with offsets from the DS register) and the stack
segment as members of DGROUP (the automatic data group), a special name
recognized by the linker and also by the program loaders in Microsoft
Windows and Microsoft OS/2. The GROUP directive causes logical segments
with different names to be combined into a single physical segment so that
they can be addressed using the same segment base address. In C programs,
DGROUP also contains the local heap, which is used by the C runtime
library for dynamic allocation of small amounts of memory.
Memory    Segment       Align    Combine    Class       Group
model     name          type     type       type
──────────────────────────────────────────────────────────────────────────
Small     _TEXT         WORD     PUBLIC     CODE
          _DATA         WORD     PUBLIC     DATA        DGROUP
          STACK         PARA     STACK      STACK       DGROUP
Medium    module_TEXT   WORD     PUBLIC     CODE
          .
          .
          .
          _DATA         WORD     PUBLIC     DATA        DGROUP
          STACK         PARA     STACK      STACK       DGROUP
Compact   _TEXT         WORD     PUBLIC     CODE
          data          PARA     PRIVATE    FAR_DATA
          .
          .
          .
          _DATA         WORD     PUBLIC     DATA        DGROUP
          STACK         PARA     STACK      STACK       DGROUP
Large     module_TEXT   WORD     PUBLIC     CODE
          .
          .
          .
          data          PARA     PRIVATE    FAR_DATA
          .
          .
          .
          _DATA         WORD     PUBLIC     DATA        DGROUP
          STACK         PARA     STACK      STACK       DGROUP
──────────────────────────────────────────────────────────────────────────
Figure 3-10. Segments, groups, and classes for the standard memory models
as used with assembly-language programs. The Microsoft C Optimizing
Compiler and other high-level-language compilers use a superset of these
segments and classes.
For pure assembly-language programs that will run under MS-DOS, you can
ignore DGROUP. However, if you plan to integrate assembly-language
routines and programs written in high-level languages, you'll want to
follow the Microsoft DGROUP convention. For example, if you are planning
to link routines from a C library into an assembly-language program, you
should include the line
DGROUP group _DATA,STACK
near the beginning of the program.
The final Microsoft convention of interest in creating .EXE programs is
segment order. The high-level compilers assume that code segments always
come first, followed by far data segments, followed by the near data
segment, with the stack and heap last. This order won't concern you much
until you begin integrating assembly-language code with routines from
high-level-language libraries, but it is easiest to learn to use the
convention right from the start.
Program Procedures
The procedure level of program structure is partly real and partly
conceptual. Procedures are basically just a fancy guise for subroutines.
Procedures within a program are declared with the PROC and ENDP directives
in the following form:
name PROC attribute
.
.
.
RET
name ENDP
The attribute carried by a PROC declaration, which is either NEAR or FAR,
tells the assembler what type of call you expect to use to enter the
procedure──that is, whether the procedure will be called from other
routines in the same segment or from routines in other segments. When the
assembler encounters a RET instruction within the procedure, it uses the
attribute information to generate the correct opcode for either a near
(intra-segment) or far (inter-segment) return.
Each program should have a main procedure that receives control from
MS-DOS. You specify the entry point for the program by including the name
of the main procedure in the END statement in one of the program's source
files. The main procedure's attribute (NEAR or FAR) is really not too
important, because the program returns control to MS-DOS with a function
call rather than a RET instruction. However, by convention, most
programmers assign the main procedure the FAR attribute anyway.
You should break the remainder of the program into procedures in an
orderly way, with each procedure performing a well-defined single
function, returning its results to its caller, and avoiding actions that
have global effects within the program. Ideally, procedures invoke each
other only by CALL instructions, have only one entry point and one exit
point, and always exit by means of a RET instruction, never by jumping to
some other location within the program.
For ease of understanding and maintenance, a procedure should not exceed
one page (about 60 lines); if it is longer than a page, it is probably too
complex and you should delegate some of its function to one or more
subsidiary procedures. You should preface the source code for each
procedure with a detailed comment that states the procedure's calling
sequence, results returned, registers affected, and any data items
accessed or modified. The effort invested in making your procedures
compact, clean, flexible, and well-documented will be repaid many times
over when you reuse the procedures in other programs.
────────────────────────────────────────────────────────────────────────────
Chapter 4 MS-DOS Programming Tools
Preparing a new program to run under MS-DOS is an iterative process with
four basic steps:
■ Use of a text editor to create or modify an ASCII source-code file
■ Use of an assembler or high-level-language compiler (such as the
Microsoft Macro Assembler or the Microsoft C Optimizing Compiler) to
translate the source file into relocatable object code
■ Use of a linker to transform the relocatable object code into an
executable MS-DOS load module
■ Use of a debugger to methodically test and debug the program
Additional utilities the MS-DOS software developer may find necessary or
helpful include the following:
■ LIB, which creates and maintains object-module libraries
■ CREF, which generates a cross-reference listing
■ EXE2BIN, which converts .EXE files to .COM files
■ MAKE, which compares dates of files and carries out operations based on
the result of the comparison
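The date comparison that MAKE performs can be pictured with a short C sketch. It assumes the usual rule that a target file is rebuilt when it is missing or older than any file it depends on; the needs_rebuild function is a hypothetical helper written for this illustration, not part of MAKE, and it takes the file times as arguments rather than reading them from a directory.

```c
#include <assert.h>
#include <time.h>

/* Returns 1 if the target should be rebuilt: it does not exist, or at
   least one of the files it depends on has a later time stamp. */
int needs_rebuild(int target_exists, time_t target_time,
                  const time_t *source_times, int count)
{
    int i;

    assert(count >= 0);
    if (!target_exists)
        return 1;
    for (i = 0; i < count; i++) {
        /* difftime(a, b) is positive when a is later than b. */
        if (difftime(source_times[i], target_time) > 0.0)
            return 1;
    }
    return 0;
}
```

In a typical use, the target would be HELLO.OBJ and the single source HELLO.ASM; when the function returns 1, the operation to carry out is a new run of the assembler.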
This chapter gives an operational overview of the Microsoft programming
tools for MS-DOS, including the assembler, the C compiler, the linker, and
the librarian. In general, the information provided here also applies to
the IBM programming tools for MS-DOS, which are really the Microsoft
products with minor variations and different version numbers. Even if your
preferred programming language is not C or assembly language, you will
need at least a passing familiarity with these tools because all of the
examples in the IBM and Microsoft DOS reference manuals are written in one
of these languages.
The survey in this chapter, together with the example programs and
reference section elsewhere in the book, should provide the experienced
programmer with sufficient information to immediately begin writing useful
programs. Readers who do not have a background in C, assembly language, or
the Intel 80x86 microprocessor architecture should refer to the tutorial
and reference works listed at the end of this chapter.
File Types
The MS-DOS programming tools can create and process many different file
types. The following extensions are used by convention for these files:
Extension   File type
──────────────────────────────────────────────────────────────────────────
.ASM        Assembly-language source file
.C          C source file
.COM        MS-DOS executable load module that does not require relocation
            at runtime
.CRF        Cross-reference information file produced by the assembler for
            processing by CREF.EXE
.DEF        Module-definition file describing a program's segment behavior
            (MS OS/2 and Microsoft Windows programs only; not relevant to
            normal MS-DOS applications)
.EXE        MS-DOS executable load module that requires relocation at
            runtime
.H          C header file containing C source code for constants, macros,
            and functions; merged into another C program with the #include
            directive
.INC        Include file for assembly-language programs, typically
            containing macros and/or equates for systemwide values such as
            error codes
.LIB        Object-module library file made up of one or more .OBJ files;
            indexed and manipulated by LIB.EXE
.LST        Program listing, produced by the assembler, that includes
            memory locations, machine code, the original program text, and
            error messages
.MAP        Listing of symbols and their locations within a load module;
            produced by the linker
.OBJ        Relocatable-object-code file produced by an assembler or
            compiler
.REF        Cross-reference listing produced by CREF.EXE from the
            information in a .CRF file
──────────────────────────────────────────────────────────────────────────
The Microsoft Macro Assembler
The Microsoft Macro Assembler (MASM) is distributed as the file MASM.EXE.
When beginning a program translation, MASM needs the following
information:
■ The name of the file containing the source program
■ The filename for the object program to be created
■ The destination of the program listing
■ The filename for the information that is later processed by the
cross-reference utility (CREF.EXE)
You can invoke MASM in two ways. If you enter the name of the assembler
alone, it prompts you for the names of each of the various input and
output files. The assembler supplies reasonable defaults for all the
responses except the source filename, as shown in the following example:
C>MASM <Enter>
Microsoft (R) Macro Assembler Version 5.10
Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.
Source filename [.ASM]: HELLO <Enter>
Object filename [HELLO.OBJ]: <Enter>
Source listing [NUL.LST]: <Enter>
Cross-reference [NUL.CRF]: <Enter>
49006 Bytes symbol space free
0 Warning Errors
0 Severe Errors
C>
You can use a logical device name (such as PRN or COM1) at any of the MASM
prompts to send that output of the assembler to a character device rather
than a file. Note that the default for the listing and cross-reference
files is the NUL device──that is, no file is created. If you end any
response with a semicolon, MASM assumes that the remaining responses are
all to be the default.
A more efficient way to use MASM is to supply all parameters in the
command line, as follows:
MASM [options] source,[object],[listing],[crossref]
For example, the following command lines are equivalent to the preceding
interactive session:
C>MASM HELLO,,NUL,NUL <Enter>
or
C>MASM HELLO; <Enter>
These commands use the file HELLO.ASM as the source, generate the
object-code file HELLO.OBJ, and send the listing and cross-reference files
to the bit bucket.
MASM accepts several optional switches in the command line to control
code generation and output files. Figure 4-1 lists the switches accepted
by MASM version 5.1. As shown in the following example, you can put
frequently used options in a MASM environment variable, where they will be
found automatically by the assembler:
C>SET MASM=/T /Zi <Enter>
The switches in the environment variable will be overridden by any that
you enter in the command line.
In other versions of the Microsoft Macro Assembler, additional or fewer
switches may be available. For exact instructions, see the manual for the
version of MASM that you are using.
Switch Meaning
──────────────────────────────────────────────────────────────────────────
/A Arrange segments in alphabetic order.
/Bn Set size of source-file buffer (in KB).
/C Force creation of a cross-reference (.CRF) file.
/D Produce listing on both passes (to find phase errors).
/Dsymbol Define symbol as a null text string (symbol can be referenced
by conditional assembly directives in file).
/E Assemble for 80x87 numeric coprocessor emulator using IEEE
real-number format.
/Ipath Set search path for include files.
/L Force creation of a program-listing file.
/LA Force listing of all generated code.
/ML Preserve case sensitivity in all names (uppercase names
distinct from their lowercase equivalents).
/MX Preserve lowercase in external names only (names defined with
PUBLIC or EXTRN directives).
/MU Convert all lowercase names to uppercase.
/N Suppress generation of tables of macros, structures, records,
segments, groups, and symbols at the end of the listing.
/P Check for impure code in 80286/80386 protected mode.
/S Arrange segments in order of occurrence (default).
/T "Terse" mode; suppress all messages unless errors are
encountered during the assembly.
/V "Verbose" mode; report number of lines and symbols at end of
assembly.
/Wn Set error display (warning) level; n = 0-2.
/X Force listing of false conditionals.
/Z Display source lines containing errors on the screen.
/Zd Include line-number information in .OBJ file.
/Zi Include line-number and symbol information in .OBJ file.
──────────────────────────────────────────────────────────────────────────
Figure 4-1. Microsoft Macro Assembler version 5.1 switches.
MASM allows you to override the default extensions on any file──a feature
that can be rather dangerous. For example, if in the preceding example you
had responded to the Object filename prompt with HELLO.ASM, the assembler
would have accepted the entry without comment and destroyed your source
file. This is not too likely to happen in the interactive command mode,
but you must be very careful with file extensions when MASM is used in a
batch file.
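One way to guard against this mistake in a build script or front-end program is to compare the two names after the default extensions have been applied. The C sketch below is a hypothetical helper, not a feature of MASM: it assumes the defaults described above (.ASM for the source, .OBJ for the object file), folds names to uppercase because MS-DOS filenames are not case sensitive, and compares the names as text only, so it will not notice two different paths that lead to the same file.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy name into out, appending default_ext (for example ".OBJ") when
   the final path component has no extension, then fold to uppercase. */
void apply_default_ext(const char *name, const char *default_ext,
                       char *out, size_t out_size)
{
    const char *base = name;
    const char *p;
    size_t i;

    assert(name != NULL && default_ext != NULL);
    assert(out != NULL && out_size > 0);

    /* Find the start of the final path component. */
    for (p = name; *p != '\0'; p++) {
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;
    }
    if (strchr(base, '.') != NULL)
        snprintf(out, out_size, "%s", name);
    else
        snprintf(out, out_size, "%s%s", name, default_ext);

    for (i = 0; out[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)out[i]);
}

/* Returns 1 if the object filename names the same file as the source,
   so that assembling would overwrite the source file. */
int object_clobbers_source(const char *source, const char *object)
{
    char src[128];
    char obj[128];

    apply_default_ext(source, ".ASM", src, sizeof src);
    apply_default_ext(object, ".OBJ", obj, sizeof obj);
    return strcmp(src, obj) == 0;
}
```

With this helper, a source of HELLO and an object name of HELLO.ASM are flagged as a collision, while the default case (source HELLO, object HELLO) is not.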
The Microsoft C Optimizing Compiler
The Microsoft C Optimizing Compiler consists of three executable files──
C1.EXE, C2.EXE, and C3.EXE──that implement the C preprocessor, language
translator, code generator, and code optimizer. An additional control
program, CL.EXE, executes the three compiler files in order, passing each
the necessary information about filenames and compilation options.
Before using the C compiler and the linker, you need to set up four
environment variables:
Variable        Action
──────────────────────────────────────────────────────────────────────────
PATH=path       Specifies the location of the three executable C
                compiler files (C1, C2, and C3) if they are not
                in the current directory; used by CL.EXE.
INCLUDE=path    Specifies the location of #include files (default
                extension .H) that are not found in the current
                directory.
LIB=path        Specifies the location(s) for object-code
                libraries that are not found in the current
                directory.
TMP=path        Specifies the location for temporary working
                files created by the C compiler and linker.
──────────────────────────────────────────────────────────────────────────
CL.EXE does not support an interactive mode or response files. You always
invoke it with a command line of the following form:
CL [options] file [file ...]
You may list any number of files──if a file has a .C extension, it will be
compiled into a relocatable-object-module (.OBJ) file. Ordinarily, if the
compiler encounters no errors, it automatically passes all resulting .OBJ
files and any additional .OBJ files specified in the command line to the
linker, along with the names of the appropriate runtime libraries.
The C compiler has many optional switches controlling its memory models,
output files, code generation, and code optimization. These are summarized
in Figure 4-2. The C compiler's arcane switch syntax is derived largely
from UNIX/XENIX, so don't expect it to make any sense.
Switch Meaning
──────────────────────────────────────────────────────────────────────────
/Ax Select memory model:
C = compact model
H = huge model
L = large model
M = medium model
S = small model (default)
/c Compile only; do not invoke linker.
/C Do not strip comments.
/D<name>[=text] Define macro.
/E Send preprocessor output to standard output.
/EP Send preprocessor output to standard output
without line numbers.
/F<n> Set stack size (in hexadecimal bytes).
/Fa [filename] Generate assembly listing.
/Fc [filename] Generate mixed source/object listing.
/Fe [filename] Force executable filename.
/Fl [filename] Generate object listing.
/Fm [filename] Generate map file.
/Fo [filename] Force object-module filename.
/FPx Select floating-point control:
a = calls with alternate math library
c = calls with emulator library
c87 = calls with 8087 library
i = in-line with emulator (default)
i87 = in-line with 8087
/Fs [filename] Generate source listing.
/Gx Select code generation:
0 = 8086 instructions (default)
1 = 186 instructions
2 = 286 instructions
c = Pascal style function calls
s = no stack checking
t[n] = data size threshold
/H<n> Specify external name length.
/I<path> Specify additional #include path.
/J Specify default char type as unsigned.
/link [options] Pass switches and library names to linker.
/Ox Select optimization:
a = ignore aliasing
d = disable optimizations
i = enable intrinsic functions
l = enable loop optimizations
n = disable "unsafe" optimizations
p = enable precision optimizations
r = disable in-line return
s = optimize for space
t = optimize for speed (default)
w = ignore aliasing except across function
calls
x = enable maximum optimization (equivalent to
/Oailt /Gs)
/P Send preprocessor output to file.
/Sx Select source-listing control:
l<columns> = set line width
p<lines> = set page length
s<string> = set subtitle string
t<string> = set title string
/Tc<file> Compile file without .C extension.
/u Remove all predefined macros.
/U<name> Remove specified predefined macro.
/V<string> Set version string.
/W<n> Set warning level (0─3).
/X Ignore "standard places" for include files.
/Zx Select miscellaneous compilation control:
a = disable extensions
c = make Pascal functions case-insensitive
d = include line-number information
e = enable extensions (default)
g = generate declarations
i = include symbolic debugging information
l = remove default library info
p<n> = pack structures on n-byte boundary
s = check syntax only
──────────────────────────────────────────────────────────────────────────
Figure 4-2. Microsoft C Optimizing Compiler version 5.1 switches.
The Microsoft Object Linker
The object module produced by MASM from a source file is in a form that
contains relocation information and may also contain unresolved references
to external locations or subroutines. It is written in a common format
that is also produced by the various high-level compilers (such as FORTRAN
and C) that run under MS-DOS. The computer cannot execute object modules
without further processing.
The Microsoft Object Linker (LINK), distributed as the file LINK.EXE,
accepts one or more of these object modules, resolves external references,
includes any necessary routines from designated libraries, performs any
necessary offset relocations, and writes a file that can be loaded and
executed by MS-DOS. The output of LINK is always in .EXE load-module
format. (See Chapter 3.)
As with MASM, you can give LINK its parameters interactively or by
entering all the required information in a single command line. If you
enter the name of the linker alone, the following type of dialog ensues:
C>LINK <Enter>
Microsoft (R) Overlay Linker Version 3.61
Copyright (C) Microsoft Corp 1983-1987. All rights reserved.
Object Modules [.OBJ]: HELLO <Enter>
Run File [HELLO.EXE]: <Enter>
List File [NUL.MAP]: HELLO <Enter>
Libraries [.LIB]: <Enter>
C>
If you are using LINK version 4.0 or later, the linker also asks for the
name of a module-definition (.DEF) file. Simply press the Enter key in
response to such a prompt. Module-definition files are used when building
Microsoft Windows or MS OS/2 "new .EXE" executable files but are not
relevant in normal MS-DOS applications.
The input file for this example was HELLO.OBJ; the output files were
HELLO.EXE (the executable program) and HELLO.MAP (the load map produced by
the linker after all references and addresses were resolved). Figure 4-3
shows the load map.
──────────────────────────────────────────────────────────────────────────
 Start  Stop   Length Name                   Class
 00000H 00017H 00018H _TEXT                  CODE
 00018H 00027H 00010H _DATA                  DATA
 00030H 000AFH 00080H STACK                  STACK
 000B0H 000BBH 0000CH $$TYPES                DEBTYP
 000C0H 000D6H 00017H $$SYMBOLS              DEBSYM

  Address         Publics by Name

  Address         Publics by Value

Program entry point at 0000:0000
──────────────────────────────────────────────────────────────────────────
Figure 4-3. Map produced by the Microsoft Object Linker (LINK) during the
generation of the HELLO.EXE program from Chapter 3. The program contains
one CODE, one DATA, and one STACK segment. The first instruction to be
executed lies in the first byte of the CODE segment. The $$TYPES and
$$SYMBOLS segments contain information for the CodeView debugger and are
not part of the program; these segments are ignored by the normal MS-DOS
loader.
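The arithmetic behind the Start, Stop, and Length columns of Figure 4-3 can be sketched in a few lines of C. The helper names below are hypothetical, and the use of paragraph (16-byte) alignment for the STACK and $$SYMBOLS segments is an assumption inferred from the addresses in the figure rather than something the map itself states.

```c
/* Length of a segment, given the first and last addresses shown in the
   Start and Stop columns of the load map. */
unsigned long seg_length(unsigned long start, unsigned long stop)
{
    return stop - start + 1;
}

/* Round an address up to the next multiple of align, which must be a
   power of 2. An align value of 16 corresponds to paragraph alignment. */
unsigned long align_up(unsigned long addr, unsigned long align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

With the values from the figure, seg_length(0x00000, 0x00017) yields 18H for _TEXT, and align_up(0x00028, 16) yields 00030H, the address at which STACK begins after _DATA ends at 00027H.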
You can obtain the same result more quickly by entering all parameters in
the command line, in the following form:
LINK options objectfile, [exefile], [mapfile], [libraries]
Thus, the command-line equivalent to the preceding interactive session is
C>LINK HELLO,HELLO,HELLO,, <Enter>
or
C>LINK HELLO,,HELLO; <Enter>
If you enter a semicolon as the last character in the command line, LINK
assumes the default values for all further parameters.
A third method of commanding LINK is with a response file. A response file
contains lines of text that correspond to the responses you would give the
linker interactively. You specify the name of the response file in the
command line with a leading @ character, as follows:
LINK @filename
You can also enter the name of a response file at any prompt. If the
response file is not complete, LINK will prompt you for the missing
information.
When entering linker commands, you can specify multiple object files with
the + operator or with spaces, as in the following example:
C>LINK HELLO+VMODE+DOSINT,MYPROG,,GRAPHICS; <Enter>
This command would link the files HELLO.OBJ, VMODE.OBJ, and DOSINT.OBJ,
searching the library file GRAPHICS.LIB to resolve any references to
symbols not defined in the specified object files, and would produce a
file named MYPROG.EXE. LINK uses the current drive and directory when they
are not explicitly included in a filename; it will not automatically use
the same drive and directory you specified for a previous file in the same
command line.
By using the + operator or space characters in the libraries field, you
can specify up to 32 library files to be searched. Each high-level-
language compiler provides default libraries that are searched
automatically during the linkage process if the linker can find them
(unless they are explicitly excluded with the /NOD switch). LINK looks for
libraries first in the current directory of the default disk drive, then
along any paths that were provided in the command line, and finally along
the path(s) specified by the LIB variable if it is present in the
environment.
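The search order just described can be modeled as a short C routine. This is only a sketch: the function names and limits are hypothetical, and it assumes that both the command-line paths and the LIB variable are supplied as semicolon-separated lists, which is a simplification of how LINK actually receives them.

```c
#include <string.h>

#define MAX_PATH_LEN 128   /* hypothetical limit for this sketch */

/* Append each semicolon-separated directory in list to dirs[]. */
static int append_dirs(const char *list, char dirs[][MAX_PATH_LEN],
                       int count, int max_dirs)
{
    while (list != NULL && *list != '\0' && count < max_dirs) {
        size_t len = strcspn(list, ";");
        if (len > 0 && len < MAX_PATH_LEN) {
            memcpy(dirs[count], list, len);
            dirs[count][len] = '\0';
            count++;
        }
        list += len;
        if (*list == ';')
            list++;                     /* skip the separator */
    }
    return count;
}

/* Build the ordered list of directories to search for a library:
   current directory, then command-line paths, then LIB paths.
   Returns the number of entries stored in dirs[]. */
int build_search_order(const char *cmdline_paths, const char *lib_env,
                       char dirs[][MAX_PATH_LEN], int max_dirs)
{
    int count = 0;
    if (max_dirs > 0)
        strcpy(dirs[count++], ".");     /* current directory comes first */
    count = append_dirs(cmdline_paths, dirs, count, max_dirs);
    count = append_dirs(lib_env, dirs, count, max_dirs);
    return count;
}
```

A caller would typically pass the result of getenv("LIB") as the lib_env argument; a NULL pointer simply contributes no directories.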
LINK accepts several optional switches as part of the command line or at
the end of any interactive prompt. Figure 4-4 lists these switches. The
number of switches available and their actions vary among different
versions of LINK. See your Microsoft Object Linker instruction manual for
detailed information about your particular version.
Switch Full form Meaning
──────────────────────────────────────────────────────────────────────────
/A:n /ALIGNMENT:n Set segment sector alignment factor.
n must be a power of 2 (default =
512). Not related to logical-segment
alignment (BYTE, WORD, PARA, PAGE,
and so forth). Relevant to segmented
executable files (Microsoft Windows
and MS OS/2) only.
/B /BATCH Suppress linker prompt if a library
cannot be found in the current
directory or in the locations
specified by the LIB environment
variable.
/CO /CODEVIEW Include symbolic debugging
information in the .EXE file for use
by CodeView.
/CP /CPARMAXALLOC Set the field in the .EXE file header
controlling the amount of memory
allocated to the program in addition
to the memory required for the
program's code, stack, and
initialized data.
/DO /DOSSEG Use standard Microsoft segment naming
and ordering conventions.
/DS /DSALLOCATE Load data at high end of the data
segment. Relevant to real-mode
programs only.
/E /EXEPACK Pack executable file by removing
sequences of repeated bytes and
optimizing relocation table.
/F /FARCALLTRANSLATION Optimize far calls to labels within
the same physical segment for speed
by replacing them with near calls and
NOPs.
/HE /HELP Display information about available
options.
/HI /HIGH Load program as high in memory as
possible.
/I /INFORMATION Display information about progress of
linking, including pass numbers and
the names of object files being
linked.
/INC /INCREMENTAL Force production of .SYM and .ILK
files for subsequent use by ILINK
(incremental linker). May not be used
with /EXEPACK. Relevant to segmented
executable files (Microsoft Windows
and MS OS/2) only.
/LI /LINENUMBERS Write address of the first
instruction that corresponds to each
source-code line to the map file. Has
no effect if the compiler does not
include line-number information in
the object module. Also forces
creation of a map file.
/M[:n] /MAP[:n] Force creation of a .MAP file listing
all public symbols, sorted by name
and by location. The optional value n
is the maximum number of symbols that
can be sorted (default = 2048); when
n is supplied, the alphabetically
sorted list is omitted.
/NOD /NODEFAULTLIBRARYSEARCH Skip search of any default compiler
libraries specified in the .OBJ file.
/NOE /NOEXTENDEDDICTSEARCH Ignore extended library dictionary
(if it is present). The extended
dictionary ordinarily provides the
linker with information about
inter-module dependencies, to speed
up linking.
/NOF /NOFARCALLTRANSLATION Disable optimization of far calls to
labels within the same segment.
/NOG /NOGROUPASSOCIATION Ignore group associations when
assigning addresses to data and code
items.
/NOI /NOIGNORECASE Do not ignore case in names during
linking.
/NON /NONULLSDOSSEG Arrange segments as for /DOSSEG but
do not insert 16 null bytes at start
of _TEXT segment.
/NOP /NOPACKCODE Do not pack contiguous logical code
segments into a single physical
segment.
/O:n /OVERLAYINTERRUPT:n Use interrupt number n with the
overlay manager supplied with some
Microsoft high-level languages.
/PAC[:n] /PACKCODE[:n] Pack contiguous logical code segments
into a single physical code segment.
The optional value n is the maximum
size for each packed physical code
segment (default = 65,536 bytes).
Segments in different groups are not
packed.
/PADC:n /PADCODE:n Add n filler bytes to end of each
code module so that a larger module
can be inserted later with ILINK.
Relevant to segmented executable
files (Windows and MS OS/2) only.
/PADD:n /PADDATA:n Add n filler bytes to end of each
data module so that a larger module
can be inserted later with ILINK.
Relevant to segmented executable
files (Microsoft Windows and MS OS/2)
only.
/PAU /PAUSE Pause during linking, allowing a
change of disks before .EXE file is
written.
/SE:n /SEGMENTS:n Set maximum number of segments in
linked program (default = 128).
/ST:n /STACK:n Set stack size of program in bytes;
ignore stack segment size
declarations within object modules
and definition file.
/W /WARNFIXUP Display warning messages for offsets
relative to a segment base that is
not the same as the group base.
Relevant to segmented executable
files (Microsoft Windows and MS OS/2)
only.
──────────────────────────────────────────────────────────────────────────
Figure 4-4. Switches accepted by the Microsoft Object Linker (LINK)
version 5.0. Earlier versions use a subset of these switches. Note that
any abbreviation for a switch is acceptable as long as it is sufficient to
specify the switch uniquely.
The EXE2BIN Utility
The EXE2BIN utility (EXE2BIN.EXE) transforms a .EXE file created by LINK
into an executable .COM file, if the program meets the following
prerequisites:
■ It cannot contain more than one declared segment and cannot
define a stack.
■ It must be less than 64 KB in length.
■ It must have an origin at 0100H.
■ The first location in the file must be specified as the entry point
in the source code's END directive.
Although .COM files are somewhat more compact than .EXE files, you should
avoid using them. Programs that use separate segments for code, data, and
stack are much easier to port to protected-mode environments such as MS
OS/2; in addition, .COM files do not support the symbolic debugging
information used by CodeView.
Another use for the EXE2BIN utility is to convert an installable device
driver──after it is assembled and linked into a .EXE file──into a
memory-image .BIN or .SYS file with an origin of zero. This conversion is
required in MS-DOS version 2, which cannot load device drivers as .EXE
files. The process of writing an installable device driver is discussed in
more detail in Chapter 14.
Unlike most of the other programming utilities, EXE2BIN does not have an
interactive mode. It always takes its source and destination filenames,
separated by spaces, from the MS-DOS command line, as follows:
EXE2BIN sourcefile [destinationfile]
If you do not supply the source-file extension, it defaults to .EXE; the
destination-file extension defaults to .BIN. If you do not specify a name
for the destination file, EXE2BIN gives it the same name as the source
file, with a .BIN extension.
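The default naming rule for the destination file can be expressed as a small C function. This is a sketch only; default_bin_name is a hypothetical helper, not part of EXE2BIN, and it handles just the case described above in which no destination name is given.

```c
#include <string.h>

/* Build the default destination name for a source filename: the same
   name with its extension (if any) replaced by .BIN. Returns 1 on
   success, or 0 if the result does not fit in dest. */
int default_bin_name(const char *source, char *dest, size_t dest_size)
{
    size_t base_len = strlen(source);
    size_t i;

    /* Scan backward for an extension, stopping at any path separator. */
    for (i = base_len; i > 0; i--) {
        char c = source[i - 1];
        if (c == '\\' || c == '/' || c == ':')
            break;                      /* reached the path: no extension */
        if (c == '.') {
            base_len = i - 1;           /* drop the existing extension */
            break;
        }
    }

    if (base_len + 5 > dest_size)       /* base + ".BIN" + terminating NUL */
        return 0;
    memcpy(dest, source, base_len);
    memcpy(dest + base_len, ".BIN", 5);
    return 1;
}
```

Given HELLO.EXE or simply HELLO, the function produces HELLO.BIN; a period that appears in a directory name is left alone because the scan stops at the first path separator it meets.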
For example, to convert the file HELLO.EXE into HELLO.COM, you would use
the following command line:
C>EXE2BIN HELLO.EXE HELLO.COM <Enter>
The EXE2BIN program also has other capabilities, such as pure binary
conversion with segment fixup for creating program images to be placed in
ROM; but because these features are rarely used during MS-DOS application
development, they will not be discussed here.
The CREF Utility
The CREF cross-reference utility (CREF.EXE) processes a .CRF file produced
by MASM, creating an ASCII text file with the default extension .REF. The
file contains a cross-reference listing of all symbols declared in the
program and the line numbers in which they are referenced. (See Figure
4-5.) Such a listing is very useful when debugging large
assembly-language programs with many interdependent procedures and
variables.
CREF may be supplied with its parameters interactively or in a single
command line. If you enter the utility name alone, CREF prompts you for
the input and output filenames, as shown in the following example:
C>CREF <Enter>
Microsoft (R) Cross-Reference Utility Version 5.10
Copyright (C) Microsoft Corp 1981-1985, 1987. All rights reserved.
Cross-reference [.CRF]: HELLO <Enter>
Listing [HELLO.REF]: <Enter>
15 Symbols
C>
──────────────────────────────────────────────────────────────────────────
Microsoft Cross-Reference Version 5.10 Thu May 26 11:09:34 1988
HELLO.EXE --- print Hello on terminal
Symbol Cross-Reference (# definition, + modification) Cref-1
@CPU . . . . . . . . . . . . . . 1#
@VERSION . . . . . . . . . . . . 1#
CODE . . . . . . . . . . . . . . 21
CR . . . . . . . . . . . . . . . 17# 46 47
DATA . . . . . . . . . . . . . . 44
LF . . . . . . . . . . . . . . . 18# 46 47
MSG. . . . . . . . . . . . . . . 33 46#
MSG_LEN. . . . . . . . . . . . . 32 49#
PRINT. . . . . . . . . . . . . . 25# 39 60
STACK. . . . . . . . . . . . . . 23 54# 54 58
STDERR . . . . . . . . . . . . . 15#
STDIN. . . . . . . . . . . . . . 13#
STDOUT . . . . . . . . . . . . . 14# 31
_DATA. . . . . . . . . . . . . . 23 27 44# 51
_TEXT. . . . . . . . . . . . . . 21# 23 41
15 Symbols
──────────────────────────────────────────────────────────────────────────
Figure 4-5. Cross-reference listing HELLO.REF produced by the CREF
utility from the file HELLO.CRF, for the HELLO.EXE program example from
Chapter 3. The symbols declared in the program are listed on the left in
alphabetic order. To the right of each symbol is a list of all the lines
where that symbol is referenced. The number with a # sign after it denotes
the line where the symbol is declared. Numbers followed by a + sign
indicate that the symbol is modified at the specified line. The line
numbers given in the cross-reference listing correspond to the line
numbers generated by the assembler in the program-listing (.LST) file, not
to any physical line count in the original source file.
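If you need to process a .REF listing with a program of your own, the suffix convention described in the caption is easy to decode. The following C sketch uses hypothetical names (ref_kind, parse_ref) and assumes each reference has already been isolated as a separate token such as 46#, 12+, or 47.

```c
#include <stdlib.h>

enum ref_kind { REF_USE, REF_DEFINITION, REF_MODIFICATION };

/* Decode one line-number token from a CREF listing. The line number is
   stored in *line; the return value tells whether the token marks a
   definition (#), a modification (+), or an ordinary reference. */
enum ref_kind parse_ref(const char *token, long *line)
{
    char *end;

    *line = strtol(token, &end, 10);
    if (*end == '#')
        return REF_DEFINITION;
    if (*end == '+')
        return REF_MODIFICATION;
    return REF_USE;
}
```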
The parameters may also be entered in the command line in the following
form:
CREF CRF_file, listing_file
For example, the command-line equivalent to the preceding interactive
session is:
C>CREF HELLO,HELLO <Enter>
If CREF cannot find the specified .CRF file, it displays an error message.
Otherwise, it leaves the cross-reference listing in the specified file on
the disk. You can send the file to the printer with the COPY command, in
the following form:
COPY listing_file PRN:
You can also send the cross-reference listing directly to a character
device as it is generated by responding to the Listing prompt with the
name of the device.
The Microsoft Library Manager
Although the object modules that are produced by MASM or by high-level-
language compilers can be linked directly into executable load modules,
they can also be collected into special files called object-module
libraries. The modules in a library are indexed by name and by the public
symbols they contain, so that they can be extracted by the linker to
satisfy external references in a program.
The Microsoft Library Manager (LIB) is distributed as the file LIB.EXE.
LIB creates and maintains program libraries, adding, updating, and
deleting object files as necessary. LIB can also check a library file for
internal consistency or print a table of its contents (Figure 4-6).
LIB follows the command conventions of most other Microsoft programming
tools. You must supply it with the name of a library file to work on, one
or more operations to perform, the name of a listing file or device, and
(optionally) the name of the output library. If you do not specify a name
for the output library, LIB gives it the same name as the input library
and changes the extension of the input library to .BAK.
The LIB operations are simply the names of object files, with a prefix
character that specifies the action to be taken:
Prefix Meaning
──────────────────────────────────────────────────────────────────────────
- Delete an object module from the library.
* Extract a module and place it in a separate .OBJ file.
+ Add an object module or the entire contents of another library
to the library.
──────────────────────────────────────────────────────────────────────────
You can combine command prefixes. For example, -+ replaces a module, and
*- extracts a module into a new file and then deletes it from the library.
──────────────────────────────────────────────────────────────────────────
_abort............abort _abs..............abs
_access...........access _asctime..........asctime
_atof.............atof _atoi.............atoi
_atol.............atol _bdos.............bdos
_brk..............brk _brkctl...........brkctl
_bsearch..........bsearch _calloc...........calloc
_cgets............cgets _chdir............dir
_chmod............chmod _chsize...........chsize
.
.
.
_exit Offset: 00000010H Code and data size: 44H
__exit
_filbuf Offset: 00000160H Code and data size: BBH
__filbuf
_file Offset: 00000300H Code and data size: CAH
__iob __iob2 __lastiob
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 4-6. Extract from the table-of-contents listing produced by the
Microsoft Library Manager (LIB) for the Microsoft C library SLIBC.LIB. The
first part of the listing is an alphabetic list of all public names
declared in all of the modules in the library. Each name is associated
with the object module to which it belongs. The second part of the listing
is an alphabetic list of the object-module names in the library, each
followed by its offset within the library file and the actual size of the
module in bytes. The entry for each module is followed by a summary of the
public names that are declared within it.
When you invoke LIB with its name alone, it requests the other information
it needs interactively, as shown in the following example:
C>LIB <Enter>
Microsoft (R) Library Manager Version 3.08
Copyright (C) Microsoft Corp 1983-1987. All rights reserved.
Library name: SLIBC <Enter>
Operations: +VIDEO <Enter>
List file: SLIBC.LST <Enter>
Output library: SLIBC2 <Enter>
C>
In this example, LIB added the object module VIDEO.OBJ to the library
SLIBC.LIB, wrote a library table of contents into the file SLIBC.LST, and
named the resulting new library SLIBC2.LIB.
The Library Manager can also be run with a command line of the following
form:
LIB library [commands],[list],[newlibrary]
For example, the following command line is equivalent to the preceding
interactive session:
C>LIB SLIBC +VIDEO,SLIBC.LST,SLIBC2; <Enter>
As with the other Microsoft utilities, a semicolon at the end of the
command line causes LIB to use the default responses for any parameters
that are omitted.
Like LINK, LIB can also accept its commands from a response file. The
contents of the file are lines of text that correspond exactly to the
responses you would give LIB interactively. You specify the name of the
response file in the command line with a leading @ character, as follows:
LIB @filename
LIB has only three switches: /I (/IGNORECASE), /N (/NOIGNORECASE), and
/PAGESIZE:number. The /IGNORECASE switch is the default. The /NOIGNORECASE
switch causes LIB to regard as distinct any symbols that differ only in
the case of their component letters. You should place the /PAGESIZE
switch, which defines the size of a unit of allocation space for a given
library, immediately after the library filename. The library page size is
in bytes and must be a power of 2 between 16 and 32,768 (16, 32, 64, and
so forth); the default is 16 bytes. Because the index to a library is
always a fixed number of pages, setting a larger page size allows you to
store more object modules in that library; on the other hand, it will
result in more wasted space within the file.
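The page-size rule and the resulting trade-off can be illustrated with two short C functions. The names are hypothetical, and the second function rests on the assumption that each module occupies a whole number of pages, which is how the "wasted space" mentioned above would arise.

```c
/* Nonzero if n is a legal /PAGESIZE value: a power of 2 between
   16 and 32,768 inclusive. */
int is_valid_page_size(unsigned long n)
{
    return n >= 16 && n <= 32768UL && (n & (n - 1)) == 0;
}

/* Space occupied by a module of module_size bytes when it is rounded
   up to a whole number of pages (an assumption of this sketch). */
unsigned long padded_size(unsigned long module_size, unsigned long page_size)
{
    return ((module_size + page_size - 1) / page_size) * page_size;
}
```

For the 44H-byte _exit module shown in Figure 4-6, padded_size returns 80 bytes with the default 16-byte page but 512 bytes with a 512-byte page.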
The MAKE Utility
The MAKE utility (MAKE.EXE) compares dates of files and carries out
commands based on the result of that comparison. Because of this single,
rather basic capability, MAKE can be used to maintain complex programs
built from many modules. The dates of source, object, and executable files
are simply compared in a logical sequence; the assembler, compiler,
linker, and other programming tools are invoked as appropriate.
The MAKE utility processes a plain ASCII text file called, as you might
expect, a make file. You start the utility with a command-line entry in
the following form:
MAKE makefile [options]
By convention, a make file has the same name as the executable file that
is being maintained, but without an extension. The available MAKE switches
are listed in Figure 4-7.
A simple make file contains one or more dependency statements separated by
blank lines. Each dependency statement can be followed by a list of MS-DOS
commands, in the following form:
targetfile : sourcefile ...
     command
     command
     .
     .
     .
If the date and time of any source file are later than those of the target
file, the accompanying list of commands is carried out. You may use
comment lines, which begin with a # character, freely in a make file. MAKE
can also process inference rules and macro definitions. For further
details on these advanced capabilities, see the Microsoft or IBM
documentation.
Switch Meaning
──────────────────────────────────────────────────────────────────────────
/D Display last modification date of each file as it is processed.
/I Ignore exit (return) codes returned by commands and programs
executed as a result of dependency statements.
/N Display commands that would be executed as a result of
dependency statements but do not execute those commands.
/S Do not display commands as they are executed.
/X <filename> Direct error messages from MAKE, or any program that MAKE
runs, to the specified file. If filename is a hyphen (-), direct
error messages to the standard output.
──────────────────────────────────────────────────────────────────────────
Figure 4-7. Switches for the MAKE utility.
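The date comparison that MAKE performs for each dependency statement amounts to very little code. The C sketch below is an illustration under stated assumptions, not MAKE's actual implementation: needs_rebuild is a hypothetical name, and file dates and times are represented as plain numbers in which a larger value means a later time.

```c
#include <stddef.h>

/* Return 1 if any source file is later than the target, meaning that
   the commands for this dependency statement should be carried out;
   return 0 if the target is up to date. */
int needs_rebuild(long target_time, const long *source_times, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (source_times[i] > target_time)
            return 1;                   /* this source is newer */
    }
    return 0;
}
```

A source whose date and time equal those of the target does not trigger the commands, because the test is for a strictly later time.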
A Complete Example
Let's put together everything we've learned about using the MS-DOS
programming tools so far. Figure 4-8 shows a sketch of the overall
process of building an executable program.
Assume that we have the source code for the HELLO.EXE program from Chapter
3 in the file HELLO.ASM. To assemble the source program into the
relocatable object module HELLO.OBJ with symbolic debugging information
included, also producing a program listing in the file HELLO.LST and a
cross-reference data file HELLO.CRF, we would enter
C>MASM /C /L /Zi /T HELLO; <Enter>
To convert the cross-reference raw-data file HELLO.CRF into a
cross-reference listing in the file HELLO.REF, we would enter
C>CREF HELLO,HELLO <Enter>
┌───────────────┐             ┌───────────────┐
│ MASM          │             │ C or other    │
│ source-code   │             │ HLL source-   │
│ file          │             │ code file     │
└───┬───────────┘             └───┬───────────┘
    │       ┌─────────────────────┘ Compiler
┌───▼───────▼───┐
│ Relocatable   │
│ object-module ├────┐
│ file (.OBJ)   │    │
└───┬───────────┘    │
    │ LIB            │
┌───▼───────────┐    │        ┌───────────────┐
│ Object-module │    ▼ LINK   │ Executable    │
│ libraries     ├─────────────► program       │
│ (.LIB)        │    │        │ (.EXE)        │
└───────────────┘    │        └───┬───────────┘
                     │            │ EXE2BIN
┌───────────────┐    │        ┌───▼───────────┐
│ HLL           │    │        │ Executable    │
│ runtime       ├────┘        │ program       │
│ libraries     │             │ (.COM)        │
└───────────────┘             └───────────────┘
Figure 4-8. Creation of an MS-DOS application program, from source code
to executable file.
To convert the relocatable object file HELLO.OBJ into the executable file
HELLO.EXE, creating a load map in the file HELLO.MAP and appending
symbolic debugging information to the executable file, we would enter
C>LINK /MAP /CODEVIEW HELLO; <Enter>
We could also automate the entire process just described by creating a
make file named HELLO (with no extension) and including the following
instructions:
hello.obj : hello.asm
     masm /C /L /Zi /T hello;
     cref hello,hello

hello.exe : hello.obj
     link /MAP /CODEVIEW hello;
Then, when we have made some change to HELLO.ASM and want to rebuild the
executable HELLO.EXE file, we need only enter
C>MAKE HELLO <Enter>
Programming Resources and References
The literature on IBM PC─compatible personal computers, the Intel 80x86
microprocessor family, and assembly-language and C programming is vast.
The list below contains a selection of those books that I have found to be
useful and reliable. The list should not be construed as an endorsement by
Microsoft Corporation.
MASM Tutorials
Assembly Language Primer for the IBM PC and XT, by Robert Lafore. New
American Library, New York, NY, 1984. ISBN 0-452-25711-5.
8086/8088/80286 Assembly Language, by Leo Scanlon. Brady Books, Simon and
Schuster, New York, NY, 1988. ISBN 0-13-246919-7.
C Tutorials
Microsoft C Programming for the IBM, by Robert Lafore. Howard W. Sams &
Co., Indianapolis, IN, 1987. ISBN 0-672-22515-8.
Proficient C, by Augie Hansen. Microsoft Press, Redmond, WA, 1987. ISBN
1-55615-007-5.
Intel 80x86 Microprocessor References
iAPX 88 Book. Intel Corporation, Literature Department SV3-3, 3065 Bowers
Ave., Santa Clara, CA 95051. Order no. 210200.
iAPX 286 Programmer's Reference Manual. Intel Corporation, Literature
Department SV3-3, 3065 Bowers Ave., Santa Clara, CA 95051. Order no.
210498.
iAPX 386 Programmer's Reference Manual. Intel Corporation, Literature
Department SV3-3, 3065 Bowers Ave., Santa Clara, CA 95051. Order no.
230985.
PC, PC/AT, and PS/2 Architecture
The IBM Personal Computer from the Inside Out (Revised Edition), by Murray
Sargent and Richard L. Shoemaker. Addison-Wesley Publishing Company,
Reading, MA, 1986. ISBN 0-201-06918-0.
Programmer's Guide to PC & PS/2 Video Systems, by Richard Wilton.
Microsoft Press, Redmond, WA, 1987. ISBN 1-55615-103-9.
Personal Computer Technical Reference. IBM Corporation, IBM Technical
Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6322507.
Personal Computer AT Technical Reference. IBM Corporation, IBM Technical
Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6280070.
Options and Adapters Technical Reference. IBM Corporation, IBM Technical
Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6322509.
Personal System/2 Model 30 Technical Reference. IBM Corporation, IBM
Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2201.
Personal System/2 Model 50/60 Technical Reference. IBM Corporation, IBM
Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2224.
Personal System/2 Model 80 Technical Reference. IBM Corporation, IBM
Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2256.
────────────────────────────────────────────────────────────────────────────
Chapter 5 Keyboard and Mouse Input
The fundamental means of user input under MS-DOS is the keyboard. This
follows naturally from the MS-DOS command-line interface, whose lineage
can be traced directly to minicomputer operating systems with Teletype
consoles. During the first few years of MS-DOS's existence, when
8088/8086-based machines were the norm, nearly every popular application
program used key-driven menus and text-mode displays.
However, as high-resolution graphics adapters (and 80286/80386-based
machines with enough power to drive them) have become less expensive,
programs that support windows and a graphical user interface have steadily
grown more popular. Such programs typically rely on a pointing device such
as a mouse, stylus, joystick, or light pen to let the user navigate in a
"point-and-shoot" manner, reducing keyboard entry to a minimum. As a
result, support for pointing devices has become an important consideration
for all software developers.
Keyboard Input Methods
Applications running under MS-DOS on IBM PC─compatible machines can use
several methods to obtain keyboard input:
■ MS-DOS handle-oriented functions
■ MS-DOS traditional character functions
■ IBM ROM BIOS keyboard-driver functions
These methods offer different degrees of flexibility, portability, and
hardware independence.
The handle, or stream-oriented, functions are philosophically derived from
UNIX/XENIX and were first introduced in MS-DOS version 2.0. A program uses
these functions by supplying a handle, or token, for the desired device,
plus the address and length of a buffer.
When a program begins executing, MS-DOS supplies it with predefined
handles for certain commonly used character devices, including the
keyboard:
Handle    Device name                    Opened to
──────────────────────────────────────────────────────────────────────────
0         Standard input (stdin)         CON
1         Standard output (stdout)       CON
2         Standard error (stderr)        CON
3         Standard auxiliary (stdaux)    AUX
4         Standard printer (stdprn)      PRN
──────────────────────────────────────────────────────────────────────────
These handles can be used for read and write operations without further
preliminaries. A program can also obtain a handle for a character device
by explicitly opening the device for input or output using its logical
name (as though it were a file). The handle functions support I/O
redirection, allowing a program to take its input from another device or
file instead of the keyboard, for example. Redirection is discussed in
detail in Chapter 15.
The traditional character-input functions are a superset of the character
I/O functions that were present in CP/M. Originally included in MS-DOS
simply to facilitate the porting of existing applications from CP/M, they
are still widely used. In MS-DOS versions 2.0 and later, most of the
traditional functions also support I/O redirection (although not as well
as the handle functions do).
Use of the IBM ROM BIOS keyboard functions presupposes that the program is
running on an IBM PC─compatible machine. The ROM BIOS keyboard driver
operates at a much more primitive level than the MS-DOS functions and
allows a program to circumvent I/O redirection or MS-DOS's special
handling of certain control characters. Programs that use the ROM BIOS
keyboard driver are inherently less portable than those that use the
MS-DOS functions and may interfere with the proper operation of other
programs; many of the popular terminate-and-stay-resident (TSR) utilities
fall into this category.
Keyboard Input with Handles
The principal MS-DOS function for keyboard input using handles is Int 21H
Function 3FH (Read File or Device). The parameters for this function are
a handle, the segment and offset of a buffer, and the length of the
buffer. (For a more detailed explanation of this function, see Section
II of this book, "MS-DOS Functions Reference.")
As an example, let's use the predefined standard input handle (0) and Int
21H Function 3FH to read a line from the keyboard:
──────────────────────────────────────────────────────────────────────────
buffer db 80 dup (?) ; keyboard input buffer
.
.
.
mov ah,3fh ; function 3fh = read file or device
mov bx,0 ; handle for standard input
mov cx,80 ; maximum bytes to read
mov dx,seg buffer ; DS:DX = buffer address
mov ds,dx
mov dx,offset buffer
int 21h ; transfer to MS-DOS
jc error ; jump if error detected
.
.
.
──────────────────────────────────────────────────────────────────────────
When control returns from Int 21H Function 3FH, the carry flag is clear if
the function was successful, and AX contains the number of characters
read. If there was an error, the carry flag is set and AX contains an
error code; however, this should never occur when reading the keyboard.
The standard input is redirectable, so the code just shown is not a
foolproof way of obtaining input from the keyboard. Depending upon whether
a redirection parameter was included in the command line by the user,
program input might be coming from the keyboard, a file, another character
device, or even the bit bucket (NUL device). To bypass redirection and be
absolutely certain where your input is coming from, you can ignore the
predefined standard input handle and open the console as though it were a
file, using the handle obtained from that open operation to perform your
keyboard input, as in the following example:
──────────────────────────────────────────────────────────────────────────
buffer db 80 dup (?) ; keyboard input buffer
fname db 'CON',0 ; keyboard device name
handle dw 0 ; keyboard device handle
.
.
.
mov ah,3dh ; function 3dh = open
mov al,0 ; mode = read
mov dx,seg fname ; DS:DX = device name
mov ds,dx
mov dx,offset fname
int 21h ; transfer to MS-DOS
jc error ; jump if open failed
mov handle,ax ; save handle for CON
.
.
.
mov ah,3fh ; function 3fh = read file or device
mov bx,handle ; BX = handle for CON
mov cx,80 ; maximum bytes to read
mov dx,offset buffer ; DS:DX = buffer address
int 21h ; transfer to MS-DOS
jc error ; jump if error detected
.
.
.
──────────────────────────────────────────────────────────────────────────
When a program uses Int 21H Function 3FH to read from the keyboard, the
exact result depends on whether MS-DOS regards the handle as being in ASCII
mode or binary mode (sometimes known as cooked mode and raw mode). ASCII
mode is the default, although binary mode can be selected with Int 21H
Function 44H (IOCTL) when necessary.
In ASCII mode, MS-DOS initially places characters obtained from the
keyboard in a 128-byte internal buffer, and the user can edit the input
with the Backspace key and the special function keys. MS-DOS automatically
echoes the characters to the standard output, expanding tab characters to
spaces (although they are left as the ASCII code 09H in the buffer). The
Ctrl-C, Ctrl-S, and Ctrl-P key combinations receive special handling, and
the Enter key is translated to a carriage return─linefeed pair. When the
user presses Enter or Ctrl-Z, MS-DOS copies the requested number of
characters (or the actual number of characters entered, if less than the
number requested) out of the internal buffer into the calling program's
buffer.
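Because ASCII-mode input hands back the carriage return and linefeed along with the text, a caller usually trims them before treating the buffer as a string. The following C fragment is a minimal sketch of that step and is not part of the original listings; the function name trim_line is hypothetical, and nread stands for the count returned in AX by Int 21H Function 3FH.

```c
#include <stddef.h>

/* Hypothetical helper: given the buffer filled by Int 21H Function 3FH
   and the byte count returned in AX, return the length of the text
   with one trailing linefeed and one trailing carriage return removed
   (either may be absent if the buffer was too small to hold it). */
size_t trim_line(const char *buf, size_t nread)
{
    if (nread > 0 && buf[nread - 1] == '\n')   /* linefeed, if present */
        nread--;
    if (nread > 0 && buf[nread - 1] == '\r')   /* carriage return, if present */
        nread--;
    return nread;
}
```

With this helper, a 5-byte read containing DIR followed by the carriage return and linefeed yields a text length of 3, and a 2-byte read yields 0, meaning the user simply pressed Enter.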
In binary mode, MS-DOS never echoes input characters. It passes the
Ctrl-C, Ctrl-S, Ctrl-P, and Ctrl-Z key combinations and the Enter key
through to the application unchanged, and Int 21H Function 3FH does not
return control to the application until the exact number of characters
requested has been received.
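Switching a handle between the two modes comes down to one bit of the device information word that Int 21H Function 44H reads (Subfunction 00H) and writes (Subfunction 01H). The C sketch below only computes the word to write back; it does not issue the call. It assumes the documented layout in which bit 5 selects binary mode, bit 7 marks a character device, and the upper byte must be zero when the word is set. The function and macro names are hypothetical.

```c
#define DEV_IS_CHAR  0x0080u   /* bit 7: handle refers to a character device */
#define DEV_BINARY   0x0020u   /* bit 5: 1 = binary (raw), 0 = ASCII (cooked) */

/* Return the device information word to pass in DX to Int 21H
   Function 44H Subfunction 01H, given the word obtained from
   Subfunction 00H.  The upper byte is cleared because the set
   call requires DH = 0. */
unsigned set_binary_mode(unsigned info, int binary)
{
    info &= 0x00FFu;
    if (binary)
        info |= DEV_BINARY;
    else
        info &= ~DEV_BINARY;
    return info;
}

/* Nonzero if the word describes a character device in binary mode. */
int is_binary_device(unsigned info)
{
    return (info & DEV_IS_CHAR) != 0 && (info & DEV_BINARY) != 0;
}
```

A program that places a device in binary mode would normally save the original word and write it back before terminating, because the mode belongs to the device rather than to the process.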
Ctrl-C checking is discussed in more detail at the end of this chapter.
For now, simply note that the application programmer can substitute a
custom handler for the default MS-DOS Ctrl-C handler and thereby avoid
having the application program lose control of the machine when the user
enters a Ctrl-C or Ctrl-Break.
Keyboard Input with Traditional Calls
The MS-DOS traditional keyboard functions offer a variety of character and
line-oriented services with or without echo and Ctrl-C detection. These
functions are summarized in the following table.
Int 21H Function    Action                          Ctrl-C checking
──────────────────────────────────────────────────────────────────────────
01H                 Keyboard input with echo        Yes
06H                 Direct console I/O              No
07H                 Keyboard input without echo     No
08H                 Keyboard input without echo     Yes
0AH                 Buffered keyboard input         Yes
0BH                 Input-status check              Yes
0CH                 Input-buffer reset and input    Varies
──────────────────────────────────────────────────────────────────────────
In MS-DOS versions 2.0 and later, redirection of the standard input
affects all these functions. In other words, they act as though they were
special cases of an Int 21H Function 3FH call using the predefined
standard input handle (0).
The character-input functions (01H, 06H, 07H, and 08H) all return a
character in the AL register. For example, the following sequence waits
until a key is pressed and then returns it in AL:
──────────────────────────────────────────────────────────────────────────
mov ah,1 ; function 01h = read keyboard
int 21h ; transfer to MS-DOS
──────────────────────────────────────────────────────────────────────────
The character-input functions differ in whether the input is echoed to the
screen and whether they are sensitive to Ctrl-C interrupts. Although
MS-DOS provides no pure keyboard-status function that is immune to Ctrl-C,
a program can read keyboard status (somewhat circuitously) without
interference by using Int 21H Function 06H. Extended keys, such as the
IBM PC keyboard's special function keys, require two calls to a
character-input function.
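The two-call convention for extended keys can be modeled in portable C. This is a sketch under stated assumptions rather than a DOS binding: the bytes array stands for the values that successive character-input calls (for example, Int 21H Function 08H) returned in AL, the function name decode_key is hypothetical, and reporting an extended key as 256 plus its second byte is an arbitrary choice made for illustration.

```c
#include <stddef.h>

/* Decode one keystroke from the bytes that successive character-input
   calls returned.  *used receives how many bytes (calls) the keystroke
   consumed: 1 for an ordinary key, 2 for an extended key.  Returns the
   character code for ordinary keys, 256 + second byte for extended
   keys, or -1 if the bytes run out before the keystroke is complete. */
int decode_key(const unsigned char *bytes, size_t n, size_t *used)
{
    *used = 0;
    if (n < 1)
        return -1;
    if (bytes[0] != 0) {            /* ordinary key: one call is enough */
        *used = 1;
        return bytes[0];
    }
    if (n < 2)                      /* zero byte seen, second call pending */
        return -1;
    *used = 2;
    return 256 + bytes[1];
}
```

For example, the F1 key arrives as a zero byte followed by 3BH, so decode_key reports 256 + 3BH and consumes two bytes, whereas the letter A consumes one.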
As an alternative to single-character input, a program can use
buffered-line input (Int 21H Function 0AH) to obtain an entire line from
the keyboard in one operation. MS-DOS builds up buffered lines in an
internal buffer and does not pass them to the calling program until the
user presses the Enter key. While the line is being entered, all the usual
editing keys are active and are handled by the MS-DOS keyboard driver. You
use Int 21H Function 0AH as follows:
──────────────────────────────────────────────────────────────────────────
buff db 81 ; maximum length of input
db 0 ; actual length (from MS-DOS)
db 81 dup (0) ; receives keyboard input
.
.
.
mov ah,0ah ; function 0ah = read buffered line
mov dx,seg buff ; DS:DX = buffer address
mov ds,dx
mov dx,offset buff
int 21h ; transfer to MS-DOS
.
.
.
──────────────────────────────────────────────────────────────────────────
Int 21H Function 0AH differs from Int 21H Function 3FH in several
important ways. First, the maximum length is passed in the first byte of
the buffer, rather than in the CX register. Second, the actual length is
returned in the second byte of the structure, rather than in the AX
register. Finally, when the user has entered one less than the specified
maximum number of characters, MS-DOS ignores all subsequent characters and
sounds a warning beep until the Enter key is pressed.
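The buffer layout used by Int 21H Function 0AH can be unpacked as in the following C sketch. It is an illustration only: the function name line_from_0a is hypothetical, and the code relies on the documented convention that the length byte stored by MS-DOS does not count the terminating carriage return.

```c
#include <stddef.h>
#include <string.h>

/* Copy the text out of a buffer filled by Int 21H Function 0AH.
   Layout: byte 0       = maximum length (set by the caller),
           byte 1       = number of characters entered, excluding the
                          terminating carriage return (set by MS-DOS),
           byte 2 onward = the characters, followed by the CR.
   dst receives a NUL-terminated C string; the return value is its
   length, truncated if necessary to fit dstsize. */
size_t line_from_0a(const unsigned char *buff, char *dst, size_t dstsize)
{
    size_t len = buff[1];
    if (dstsize == 0)
        return 0;
    if (len > dstsize - 1)
        len = dstsize - 1;
    memcpy(dst, buff + 2, len);
    dst[len] = '\0';
    return len;
}
```

Applied to a buffer whose first bytes are 81, 3, 'D', 'I', 'R', 0DH, the function produces the string DIR with length 3.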
For detailed information about each of the traditional keyboard-input
functions, see Section II of this book, "MS-DOS Functions Reference."
Keyboard Input with ROM BIOS Functions
Programmers writing applications for IBM PC compatibles can bypass the
MS-DOS keyboard functions and choose from two hardware-dependent
techniques for keyboard input.
The first method is to call the ROM BIOS keyboard driver using Int 16H.
For example, the following sequence reads a single character from the
keyboard input buffer and returns it in the AL register:
──────────────────────────────────────────────────────────────────────────
mov ah,0 ; function 0=read keyboard
int 16h ; transfer to ROM BIOS
──────────────────────────────────────────────────────────────────────────
Int 16H Function 00H also returns the keyboard scan code in the AH
register, allowing the program to detect key codes that are not ordinarily
returned by MS-DOS. Other Int 16H services return the keyboard status
(that is, whether a character is waiting) or the keyboard shift state
(from the ROM BIOS data area 0000:0417H). For a more detailed explanation
of ROM BIOS keyboard functions, see Section III of this book, "IBM ROM
BIOS and Mouse Functions Reference."
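The values returned by these services are easy to take apart in C. The sketch below assumes the standard IBM PC conventions: Int 16H Function 00H returns the scan code in AH and the character in AL, and the shift-state byte uses the bit assignments shown. The macro and function names are hypothetical, and the code only interprets values that a program has already obtained.

```c
/* Bits of the keyboard shift-state byte (the byte the ROM BIOS keeps
   at 0000:0417H and returns from its shift-state service). */
#define KB_RIGHT_SHIFT  0x01
#define KB_LEFT_SHIFT   0x02
#define KB_CTRL         0x04
#define KB_ALT          0x08
#define KB_SCROLL_LOCK  0x10
#define KB_NUM_LOCK     0x20
#define KB_CAPS_LOCK    0x40
#define KB_INSERT       0x80

/* Split the AX value returned by Int 16H Function 00H. */
unsigned key_scan_code(unsigned ax) { return (ax >> 8) & 0xFF; }  /* AH */
unsigned key_ascii(unsigned ax)     { return ax & 0xFF; }         /* AL */

/* Nonzero if either Shift key is down. */
int shift_down(unsigned flags)
{
    return (flags & (KB_RIGHT_SHIFT | KB_LEFT_SHIFT)) != 0;
}
```

An AX value of 3B00H, for instance, has scan code 3BH (the F1 key) and a character code of zero, which is how a program recognizes a key that has no ASCII equivalent.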
You should consider carefully before building ROM BIOS dependence into an
application. Although this technique allows you to bypass any I/O
redirection that may be in effect, ways exist to do this without
introducing dependence on the ROM BIOS. And there are real disadvantages
to calling the ROM BIOS keyboard driver:
■ It always bypasses I/O redirection, which sometimes may not be
desirable.
■ It is dependent on IBM PC compatibility and does not work correctly,
unchanged, on some older machines such as the Hewlett-Packard
TouchScreen or the Wang Professional Computer.
■ It may introduce complicated interactions with TSR utilities.
The other and more hardware-dependent method of keyboard input on an IBM
PC is to write a new handler for ROM BIOS Int 09H and service the keyboard
controller's interrupts directly. This involves translation of scan codes
to ASCII characters and maintenance of the type-ahead buffer. In ordinary
PC applications, there is no reason to take over keyboard I/O at this
level; therefore, I will not discuss this method further here. If you are
curious about the techniques that would be required, the best reference is
the listing for the ROM BIOS Int 09H handler in the IBM PC or PC/AT
technical reference manual.
Ctrl-C and Ctrl-Break Handlers
In the discussion of keyboard input with the MS-DOS handle and traditional
functions, I made some passing references to the fact that Ctrl-C entries
can interfere with the expected behavior of those functions. Let's look at
this subject in more detail now.
During most character I/O operations, MS-DOS checks for a Ctrl-C (ASCII
code 03H) waiting at the keyboard and executes an Int 23H if one is
detected. If the system break flag is on, MS-DOS also checks for a Ctrl-C
entry during certain other operations (such as file reads and writes).
Ordinarily, the Int 23H vector points to a routine that simply terminates
the currently active process and returns control to the parent process──
usually the MS-DOS command interpreter.
In other words, if your program is executing and you enter a Ctrl-C,
accidentally or intentionally, MS-DOS simply aborts the program. Any files
the program has opened using file control blocks will not be closed
properly, any interrupt vectors it has altered may not be restored
correctly, and if it is performing any direct I/O operations (for example,
if it contains an interrupt driver for the serial port), all kinds of
unexpected events may occur.
Although you can use a number of partially effective methods to defeat
Ctrl-C checking, such as performing keyboard input with Int 21H Functions
06H and 07H, placing all character devices into binary mode, or turning
off the system break flag with Int 21H Function 33H, none of these is
completely foolproof. The simplest and most elegant way to defeat Ctrl-C
checking is simply to substitute your own Int 23H handler, which can take
some action appropriate to your program. When the program terminates,
MS-DOS automatically restores the previous contents of the Int 23H vector
from information saved in the program segment prefix. The following
example shows how to install your own Ctrl-C handler (which in this case
does nothing at all):
──────────────────────────────────────────────────────────────────────────
push ds ; save data segment
; set int 23h vector...
mov ax,2523h ; function 25h = set interrupt
; int 23h = vector for
; Ctrl-C handler
mov dx,seg handler ; DS:DX = handler address
mov ds,dx
mov dx,offset handler
int 21h ; transfer to MS-DOS
pop ds ; restore data segment
.
.
.
handler: ; a Ctrl-C handler
iret ; that does nothing
──────────────────────────────────────────────────────────────────────────
The first part of the code (which alters the contents of the Int 23H
vector) would be executed in the initialization part of the application.
The handler receives control whenever MS-DOS detects a Ctrl-C at the
keyboard. (Because this handler consists only of an interrupt return, the
Ctrl-C will remain in the keyboard input stream and will be passed to the
application when it requests a character from the keyboard, appearing on
the screen as ^C.)
When an Int 23H handler is called, MS-DOS is in a stable state. Thus, the
handler can call any MS-DOS function. It can also reset the segment
registers and the stack pointer and transfer control to some other point
in the application without ever returning control to MS-DOS with an IRET.
On IBM PC compatibles, an additional interrupt handler must be taken into
consideration. Whenever the ROM BIOS keyboard driver detects the key
combination Ctrl-Break, it calls a handler whose address is stored in the
vector for Int 1BH. The default ROM BIOS Int 1BH handler does nothing.
MS-DOS alters the Int 1BH vector to point to its own handler, which sets a
flag and returns; the net effect is to remap the Ctrl-Break into a Ctrl-C
that is forced ahead of any other characters waiting in the keyboard
buffer.
Taking over the Int 1BH vector in an application is somewhat tricky but
extremely useful. Because the keyboard is interrupt driven, a press of
Ctrl-Break lets the application regain control under almost any
circumstance──often, even if the program has crashed or is in an endless
loop.
You cannot, in general, use the same handler for Int 1BH that you use for
Int 23H. The Int 1BH handler is more limited in what it can do, because it
has been called as a result of a hardware interrupt and MS-DOS may have
been executing a critical section of code at the time the interrupt was
serviced. Thus, all registers except CS:IP are in an unknown state; they
may have to be saved and then modified before your interrupt handler can
execute. Similarly, the depth of the stack in use when the Int 1BH handler
is called is unknown, and if the handler is to perform stack-intensive
operations, it may have to save the stack segment and the stack pointer
and switch to a new stack that is known to have sufficient depth.
In normal application programs, you should probably have an Int 1BH
handler return with an IRET rather than retain control. Because of
subtle differences among non-IBM ROM BIOSes, it is difficult to predict
the state of the keyboard controller and the 8259 Programmable Interrupt
Controller (PIC) when the Int 1BH handler begins executing. Also, MS-DOS
itself may not be in a stable state at the point of interrupt, a situation
that can manifest itself in unexpected critical errors during subsequent
I/O operations. Finally, MS-DOS versions 3.2 and later allocate a stack
from an internal pool for use by the Int 09H handler. If the Int 1BH
handler never returns, the Int 09H handler never returns either, and
repeated entries of Ctrl-Break will eventually exhaust the stack pool,
halting the system.
Because Int 1BH is a ROM BIOS interrupt and not an MS-DOS interrupt,
MS-DOS does not restore the previous contents of the Int 1BH vector when a
program exits. If your program modifies this vector, it must save the
original value and restore it before terminating. Otherwise, the vector
will be left pointing to some random area in the next program that runs,
and the next time the user presses Ctrl-Break a system crash is the best
you can hope for.
Ctrl-C and Ctrl-Break Handlers and High-Level Languages
Capturing the Ctrl-C and Ctrl-Break interrupts is straightforward when you
are programming in assembly language. The process is only slightly more
difficult with high-level languages, provided you know enough about the
language's calling conventions to link in a small assembly-language
routine as part of the program.
The BREAK.ASM listing in Figure 5-1 contains source code for a Ctrl-Break
handler that can be linked with small-model Microsoft C programs running
on an IBM PC compatible. The short C program in Figure 5-2 demonstrates
use of the handler. (This code should be readily portable to other C
compilers.)
──────────────────────────────────────────────────────────────────────────
page 55,132
title Ctrl-C & Ctrl-Break Handlers
name break
;
; Ctrl-C and Ctrl-Break handler for Microsoft C
; programs running on IBM PC compatibles
;
; by Ray Duncan
;
; Assemble with: C>MASM /Mx BREAK;
;
; This module allows C programs to retain control
; when the user enters a Ctrl-Break or Ctrl-C.
; It uses Microsoft C parameter-passing conventions
; and assumes the C small memory model.
;
; The procedure _capture is called to install
; a new handler for the Ctrl-C and Ctrl-Break
; interrupts (1bh and 23h). _capture is passed
; the address of a static variable, which will be
; set to true by the handler whenever a Ctrl-C
; or Ctrl-Break is detected. The C syntax is:
;
; static int flag;
; capture(&flag);
;
; The procedure _release is called by the C program
; to restore the original Ctrl-Break and Ctrl-C
; handler. The C syntax is:
; release();
;
; The procedure ctrlbrk is the actual interrupt
; handler. It receives control when a software
; int 1bh is executed by the ROM BIOS or int 23h
; is executed by MS-DOS. It simply sets the C
; program's variable to true (1) and returns.
;
args equ 4 ; stack offset of arguments,
; C small memory model
cr equ 0dh ; ASCII carriage return
lf equ 0ah ; ASCII linefeed
_TEXT segment word public 'CODE'
assume cs:_TEXT
public _capture
_capture proc near ; take over Ctrl-Break
; and Ctrl-C interrupt vectors
push bp ; set up stack frame
mov bp,sp
push ds ; save registers
push di
push si
; save address of
; calling program's "flag"
mov ax,word ptr [bp+args]
mov word ptr cs:flag,ax
mov word ptr cs:flag+2,ds
; save address of original
mov ax,3523h ; int 23h handler
int 21h
mov word ptr cs:int23,bx
mov word ptr cs:int23+2,es
mov ax,351bh ; save address of original
int 21h ; int 1bh handler
mov word ptr cs:int1b,bx
mov word ptr cs:int1b+2,es
push cs ; set DS:DX = address
pop ds ; of new handler
mov dx,offset _TEXT:ctrlbrk
mov ax,02523h ; set int 23h vector
int 21h
mov ax,0251bh ; set int 1bh vector
int 21h
pop si ; restore registers
pop di
pop ds
pop bp ; discard stack frame
ret ; and return to caller
_capture endp
public _release
_release proc near ; restore original Ctrl-C
; and Ctrl-Break handlers
push bp ; save registers
push ds
push di
push si
lds dx,cs:int1b ; get address of previous
; int 1bh handler
mov ax,251bh ; set int 1bh vector
int 21h
lds dx,cs:int23 ; get address of previous
; int 23h handler
mov ax,2523h ; set int 23h vector
int 21h
pop si ; restore registers
pop di ; and return to caller
pop ds
pop bp
ret
_release endp
ctrlbrk proc far ; Ctrl-C and Ctrl-Break
; interrupt handler
push bx ; save registers
push ds
lds bx,cs:flag ; get address of C program's
; "flag variable"
; and set the flag "true"
mov word ptr ds:[bx],1
pop ds ; restore registers
pop bx
iret ; return from handler
ctrlbrk endp
flag dd 0 ; far pointer to caller's
; Ctrl-Break or Ctrl-C flag
int23 dd 0 ; address of original
; Ctrl-C handler
int1b dd 0 ; address of original
; Ctrl-Break handler
_TEXT ends
end
──────────────────────────────────────────────────────────────────────────
Figure 5-1. BREAK.ASM: A Ctrl-C and Ctrl-Break interrupt handler that can
be linked with Microsoft C programs.
──────────────────────────────────────────────────────────────────────────
/*
TRYBREAK.C
Demo of BREAK.ASM Ctrl-Break and Ctrl-C
interrupt handler, by Ray Duncan
To create the executable file TRYBREAK.EXE, enter:
MASM /Mx BREAK;
CL TRYBREAK.C BREAK.OBJ
*/
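#include <stdio.h>
#include <conio.h> /* kbhit, getch, putch */
void capture(int *); /* routines in BREAK.ASM */
void release(void);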
main(int argc, char *argv[])
{
int hit = 0; /* flag for key press */
int c = 0; /* character from keyboard */
static int flag = 0; /* true if Ctrl-Break
or Ctrl-C detected */
puts("\n*** TRYBREAK.C running ***\n");
puts("Press Ctrl-C or Ctrl-Break to test handler,");
puts("Press the Esc key to exit TRYBREAK.\n");
capture(&flag); /* install new Ctrl-C and
Ctrl-Break handler and
pass address of flag */
puts("TRYBREAK has captured interrupt vectors.\n");
while(1)
{
hit = kbhit(); /* check for key press */
/* (MS-DOS sees Ctrl-C
when keyboard polled) */
if(flag != 0) /* if flag is true, an */
{ /* interrupt has occurred */
puts("\nControl-Break detected.\n");
flag = 0; /* reset interrupt flag */
}
if(hit != 0) /* if any key waiting */
{
c = getch(); /* read key, exit if Esc */
if( (c & 0x7f) == 0x1b) break;
putch(c); /* otherwise display it */
}
}
release(); /* restore original Ctrl-C
and Ctrl-Break handlers */
puts("\n\nTRYBREAK has released interrupt vectors.");
}
──────────────────────────────────────────────────────────────────────────
Figure 5-2. TRYBREAK.C: A simple Microsoft C program that demonstrates
use of the interrupt handler BREAK.ASM from Figure 5-1.
In the example handler, the procedure named capture is called with the
address of an integer variable within the C program. It saves the address
of the variable, points the Int 1BH and Int 23H vectors to its own
interrupt handler, and then returns.
When MS-DOS detects a Ctrl-C or Ctrl-Break, the interrupt handler sets the
integer variable within the C program to true (1) and returns. The C
program can then poll this variable at its leisure. Of course, to detect
more than one Ctrl-C, the program must reset the variable to zero again.
The procedure named release simply restores the Int 1BH and Int 23H
vectors to their original values, thereby disabling the interrupt handler.
Although it is not strictly necessary for release to do anything about Int
23H, this action does give the C program the option of restoring the
default handler for Int 23H without terminating.
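Standard C offers a related, more portable mechanism in the signal function. The sketch below applies the same set-a-flag-and-poll pattern with SIGINT. It is a substitute technique, not the BREAK.ASM method: whether SIGINT corresponds to Int 23H, and whether Ctrl-Break is caught as well, depends on the compiler's run-time library and is an assumption to verify. The function names are hypothetical.

```c
#include <signal.h>

/* Set by the handler, polled by the main program (the same idea as
   the flag variable whose address is passed to capture). */
static volatile sig_atomic_t break_flag = 0;

static void on_break(int sig)
{
    signal(sig, on_break);   /* some libraries reset the handler; reinstall */
    break_flag = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int capture_sigint(void)
{
    return signal(SIGINT, on_break) == SIG_ERR ? -1 : 0;
}

/* Return 1 if a break was seen since the last call, and clear the flag. */
int break_pending(void)
{
    int seen = break_flag;
    break_flag = 0;
    return seen;
}

/* Restore the default action for SIGINT. */
void release_sigint(void)
{
    signal(SIGINT, SIG_DFL);
}
```

A main loop would call capture_sigint once, test break_pending on each pass in place of the flag test in TRYBREAK.C, and call release_sigint before exiting.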
Pointing Devices
Device drivers for pointing devices are supplied by the hardware
manufacturer and are loaded with a DEVICE statement in the CONFIG.SYS
file. Although the hardware characteristics of the available pointing
devices differ greatly, nearly all of their drivers present the same
software interface to application programs: the Int 33H protocol used by
the Microsoft Mouse driver. Version 6 of the Microsoft Mouse driver (which
was current as this was written) offers the following functions:
Function    Meaning
──────────────────────────────────────────────────────────────────────────
00H         Reset mouse and get status.
01H         Show mouse pointer.
02H         Hide mouse pointer.
03H         Get button status and pointer position.
04H         Set pointer position.
05H         Get button-press information.
06H         Get button-release information.
07H         Set horizontal limits for pointer.
08H         Set vertical limits for pointer.
09H         Set graphics pointer type.
0AH         Set text pointer type.
0BH         Read mouse-motion counters.
0CH         Install interrupt handler for mouse events.
0DH         Turn on light pen emulation.
0EH         Turn off light pen emulation.
0FH         Set mickeys to pixel ratio.
10H         Set pointer exclusion area.
13H         Set double-speed threshold.
14H         Swap mouse-event interrupt routines.
15H         Get buffer size for mouse-driver state.
16H         Save mouse-driver state.
17H         Restore mouse-driver state.
18H         Install alternate handler for mouse events.
19H         Get address of alternate handler.
1AH         Set mouse sensitivity.
1BH         Get mouse sensitivity.
1CH         Set mouse interrupt rate.
1DH         Select display page for pointer.
1EH         Get display page for pointer.
1FH         Disable mouse driver.
20H         Enable mouse driver.
21H         Reset mouse driver.
22H         Set language for mouse-driver messages.
23H         Get language number.
24H         Get driver version, mouse type, and IRQ number.
──────────────────────────────────────────────────────────────────────────
Although this list of mouse functions may appear intimidating, the average
application will only need a few of them.
A program first calls Int 33H Function 00H to initialize the mouse driver
for the current display mode and to check its status. At this point, the
mouse is "alive" and the application can obtain its state and position;
however, the pointer does not become visible until the process calls Int
33H Function 01H.
The program can then call Int 33H Functions 03H, 05H, and 06H to
monitor the mouse position and the status of the mouse buttons.
Alternatively, the program can register an interrupt handler for mouse
events, using Int 33H Function 0CH. This latter technique eliminates the
need to poll the mouse driver; the driver will notify the program by
calling the interrupt handler whenever the mouse is moved or a button is
pressed or released.
When the application is finished with the mouse, it can call Int 33H
Function 02H to hide the mouse pointer. If the program has registered an
interrupt handler for mouse events, it should disable further calls to the
handler by resetting the mouse driver again with Int 33H Function 00H.
For a complete description of the mouse-driver functions, see Section
III of this book, "IBM ROM BIOS and Mouse Functions Reference." Figure
5-3 shows a small demonstration program that polls the mouse continually,
to display its position and status.
──────────────────────────────────────────────────────────────────────────
/*
Simple Demo of Int 33H Mouse Driver
(C) 1988 Ray Duncan
Compile with: CL MOUDEMO.C
*/
#include <stdio.h>
#include <dos.h>
#include <stdlib.h> /* exit */
union REGS regs;
void cls(void); /* function prototypes */
void gotoxy(int, int);
main(int argc, char *argv[])
{
int x,y,buttons; /* some scratch variables */
/* for the mouse state */
regs.x.ax = 0; /* reset mouse driver */
int86(0x33, &regs, &regs); /* and check status */
if(regs.x.ax == 0) /* exit if no mouse */
{ printf("\nMouse not available\n");
exit(1);
}
cls(); /* clear the screen */
gotoxy(45,0); /* and show help info */
puts("Press Both Mouse Buttons To Exit");
regs.x.ax = 1; /* display mouse cursor */
int86(0x33, &regs, &regs);
do {
regs.x.ax = 3; /* get mouse position */
int86(0x33, &regs, &regs); /* and button status */
buttons = regs.x.bx & 3;
x = regs.x.cx;
y = regs.x.dx;
gotoxy(0,0); /* display mouse position */
printf("X = %3d Y = %3d", x, y);
} while(buttons != 3); /* exit if both buttons down */
regs.x.ax = 2; /* hide mouse cursor */
int86(0x33, &regs, &regs);
cls(); /* display message and exit */
gotoxy(0,0);
puts("Have a Mice Day!");
}
/*
Clear the screen
*/
void cls(void)
{
regs.x.ax = 0x0600; /* ROM BIOS video driver */
regs.h.bh = 7; /* int 10h function 06h */
regs.x.cx = 0; /* initializes a window */
regs.h.dh = 24;
regs.h.dl = 79;
int86(0x10, &regs, &regs);
}
/*
Position cursor to (x,y)
*/
void gotoxy(int x, int y)
{
regs.h.dl = x; /* ROM BIOS video driver */
regs.h.dh = y; /* int 10h function 02h */
regs.h.bh = 0; /* positions the cursor */
regs.h.ah = 2;
int86(0x10, &regs, &regs);
}
──────────────────────────────────────────────────────────────────────────
Figure 5-3. MOUDEMO.C: A simple Microsoft C program that polls the mouse
and continually displays the coordinates of the mouse pointer in the upper
left corner of the screen. The program uses the ROM BIOS video driver,
which is discussed in Chapter 6, to clear the screen and position the
text cursor.
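The values that Int 33H Function 03H returns need a little interpretation before a text-mode program can use them. The following C sketch shows one way to do it; the names are hypothetical, and the conversion assumes an 80-by-25 text mode, in which the Microsoft Mouse driver reports positions on a 640-by-200 virtual screen with 8-by-8 character cells. Other display modes use different scaling.

```c
#define MOUSE_LEFT   0x01   /* bit 0 of BX: left button down  */
#define MOUSE_RIGHT  0x02   /* bit 1 of BX: right button down */

/* Convert the pointer position returned in CX (horizontal) and DX
   (vertical) by Int 33H Function 03H to a text column and row.
   Assumes an 80-by-25 text mode with 8-by-8 virtual-screen cells. */
void mouse_to_cell(unsigned x, unsigned y, unsigned *col, unsigned *row)
{
    *col = x / 8;
    *row = y / 8;
}

/* Nonzero if both buttons are down (the exit test used in Figure 5-3). */
int both_buttons(unsigned bx)
{
    return (bx & (MOUSE_LEFT | MOUSE_RIGHT)) == (MOUSE_LEFT | MOUSE_RIGHT);
}
```

Under these assumptions, a reported position of (632,192) corresponds to column 79, row 24, the lower right corner of the screen.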
────────────────────────────────────────────────────────────────────────────
Chapter 6 Video Display
The visual presentation of an application program is one of its most
important elements. Users frequently base their conclusions about a
program's performance and "polish" on the speed and attractiveness of its
displays. Therefore, a feel for the computer system's display facilities
and capabilities at all levels, from MS-DOS down to the bare hardware, is
important to you as a programmer.
Video Display Adapters
The video display adapters found in IBM PC─compatible computers have a
hybrid interface to the central processor. The overall display
characteristics, such as vertical and horizontal resolution, background
color, and palette, are controlled by values written to I/O ports whose
addresses are hardwired on the adapter, whereas the appearance of each
individual character or graphics pixel on the display is controlled by a
specific location within an area of memory called the regen buffer or
refresh buffer. Both the CPU and the video controller access this memory;
the software updates the display by simply writing character codes or bit
patterns directly into the regen buffer. (This is called memory-mapped
I/O.)
The following adapters are in common use as this book is being written:
■ Monochrome/Printer Display Adapter (MDA). Introduced with the original
IBM PC in 1981, this adapter supports 80-by-25 text display on a green
(monochrome) screen and has no graphics capabilities at all.
■ Color/Graphics Adapter (CGA). Also introduced by IBM in 1981, this
adapter supports 40-by-25 and 80-by-25 text modes and 320-by-200,
4-color or 640-by-200, 2-color graphics (all-points-addressable, or
APA) modes on composite or digital RGB monitors.
■ Enhanced Graphics Adapter (EGA). Introduced by IBM in 1985 and upwardly
compatible from the CGA, this adapter adds support for 640-by-350,
16-color graphics modes on digital RGB monitors. It also supports an
MDA-compatible text mode.
■ Multi-Color Graphics Array (MCGA). Introduced by IBM in 1987 with the
Personal System/2 (PS/2) models 25 and 30, this adapter is partially
compatible with the CGA and EGA and supports 640-by-480, 2-color or
320-by-200, 256-color graphics on analog RGB monitors.
■ Video Graphics Array (VGA). Introduced by IBM in 1987 with the PS/2
models 50, 60, and 80, this adapter is upwardly compatible from the EGA
and supports 640-by-480, 16-color or 320-by-200, 256-color graphics on
analog RGB monitors. It also supports an MDA-compatible text mode.
■ Hercules Graphics Card, Graphics CardPlus, and InColor Cards. These are
upwardly compatible from the MDA for text display but offer graphics
capabilities that are incompatible with all of the IBM adapters.
The locations of the regen buffers for the various IBM PC─compatible
adapters are shown in Figure 6-1.
       ┌───────────────────────────────────────────────────────┐
       │ ROM BIOS                                              │
FE000H ├───────────────────────────────────────────────────────┤
       │ System ROM, Stand-alone BASIC, etc.                   │
F4000H ├───────────────────────────────────────────────────────┤
       │ Reserved for BIOS extensions                          │
       │ (hard-disk controller, etc.)                          │
C0000H ├───────────────────────────────────────────────────────┤
       │ Reserved                                              │
BC000H ├───────────────────────────────────────────────────────┤
       │ 16 KB regen buffer for CGA, EGA, MCGA, and VGA        │
       │ in text modes and 200-line graphics modes             │
B8000H ├───────────────────────────────────────────────────────┤
       │ Reserved                                              │
B1000H ├───────────────────────────────────────────────────────┤
       │ 4 KB Monochrome Adapter regen buffer                  │
B0000H ├───────────────────────────────────────────────────────┤
       │ Regen buffer area for EGA, MCGA, and VGA              │
       │ in 350-line or 480-line graphics modes                │
A0000H ├───────────────────────────────────────────────────────┤
       │ Transient part of COMMAND.COM                         │
       ├───────────────────────────────────────────────────────┤
       │ Transient program area                                │
varies ├───────────────────────────────────────────────────────┤
       │ MS-DOS and its buffers,                               │
       │ tables, and device drivers                            │
00400H ├───────────────────────────────────────────────────────┤
       │ Interrupt vectors                                     │
00000H └───────────────────────────────────────────────────────┘
Figure 6-1. Memory diagram of an IBM PC─compatible personal computer,
showing the locations of the regen buffers for various adapters.
Support Considerations
MS-DOS offers several functions to transfer text to the display. Version 1
supported only Teletype-like output capabilities; version 2 added an
optional ANSI console driver to allow the programmer to clear the screen,
position the cursor, and select colors and attributes with standard escape
sequences embedded in the output. Programs that use only the MS-DOS
functions will operate properly on any computer system that runs MS-DOS,
regardless of the level of IBM hardware compatibility.
On IBM PC─compatible machines, the ROM BIOS contains a video driver that
programs can invoke directly, bypassing MS-DOS. The ROM BIOS functions
allow a program to write text or individual pixels to the screen or to
select display modes, video pages, palette, and foreground and background
colors. These functions are relatively efficient (compared with the MS-DOS
functions, at least), although the graphics support is primitive.
Unfortunately, the display functions of both MS-DOS and the ROM BIOS were
designed around the model of a cursor-addressable terminal and therefore
do not fully exploit the capabilities of the memory-mapped, high-bandwidth
display adapters used on IBM PC─compatible machines. As a result, nearly
every popular interactive application with full-screen displays or
graphics capability ignores both MS-DOS and the ROM BIOS and writes
directly to the video controller's registers and regen buffer.
Programs that control the hardware directly are sometimes called
"ill-behaved," because they are performing operations that are normally
reserved for operating-system device drivers. These programs are a severe
management problem in multitasking real-mode environments such as DesqView
and Microsoft Windows, and they are the main reason why such environments
are not used more widely. It could be argued, however, that the blame for
such problematic behavior lies not with the application programs but with
the failure of MS-DOS and the ROM BIOS──even six years after the first
appearance of the IBM PC──to provide display functions of adequate range
and power.
MS-DOS Display Functions
Under MS-DOS versions 2.0 and later, the preferred method for sending text
to the display is to use handle-based Int 21H Function 40H (Write File or
Device). When an application program receives control, MS-DOS has already
assigned it handles for the standard output (1) and standard error (2)
devices, and these handles can be used immediately. For example, the
following sequence writes the message hello to the display using the
standard output handle.
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message to display
msg_len equ $-msg ; length of message
.
.
.
mov ah,40h ; function 40h = write file or device
mov bx,1 ; BX = standard output handle
mov cx,msg_len ; CX = message length
mov dx,seg msg ; DS:DX = address of message
mov ds,dx
mov dx,offset msg
int 21h ; transfer to MS-DOS
jc error ; jump if error detected
.
.
.
──────────────────────────────────────────────────────────────────────────
If there is no error, the function returns the carry flag cleared and the
number of characters actually transferred in register AX. Unless a Ctrl-Z
is embedded in the text or the standard output is redirected to a disk
file and the disk is full, this number should equal the number of
characters requested.
As in the case of keyboard input, the user's ability to specify
command-line redirection parameters that are invisible to the application
means that if you use the predefined standard output handle, you can't
always be sure where your output is going. However, to ensure that your
output actually goes to the display, you can use the predefined standard
error handle, which is always opened to the CON (logical console) device
and is not redirectable.
As an alternative to the standard output and standard error handles, you
can bypass any output redirection and open a separate channel to CON,
using the handle obtained from that open operation for character output.
For example, the following code opens the console display for output and
then writes the string hello to it:
──────────────────────────────────────────────────────────────────────────
fname db 'CON',0 ; name of CON device
handle dw 0 ; handle for CON device
msg db 'hello' ; message to display
msg_len equ $-msg ; length of message
.
.
.
mov ax,3d02h ; AH = function 3dh = open
; AL = mode = read/write
mov dx,seg fname ; DS:DX = device name
mov ds,dx
mov dx,offset fname
int 21h ; transfer to MS-DOS
jc error ; jump if open failed
mov handle,ax ; save handle for CON
.
.
.
mov ah,40h ; function 40h = write
mov cx,msg_len ; CX = message length
mov dx,seg msg ; DS:DX = address of message
mov ds,dx
mov dx,offset msg
mov bx,handle ; BX = CON device handle
int 21h ; transfer to MS-DOS
jc error ; jump if error detected
.
.
.
──────────────────────────────────────────────────────────────────────────
As with the keyboard input functions, MS-DOS also supports traditional
display functions that are upwardly compatible from the corresponding CP/M
output calls:
■ Int 21H Function 02H sends the character in the DL register to the
standard output device. It is sensitive to Ctrl-C interrupts, and it
handles carriage returns, linefeeds, bell codes, and backspaces
appropriately.
■ Int 21H Function 06H transfers the character in the DL register to the
standard output device, but it is not sensitive to Ctrl-C interrupts.
You must take care when using this function, because it can also be
used for input and for status requests.
■ Int 21H Function 09H sends a string to the standard output device. The
string is terminated by the $ character.
With MS-DOS version 2 or later, these three traditional functions are
converted internally to handle-based writes to the standard output and
thus are susceptible to output redirection.
The following sequence sounds a warning beep by sending an ASCII bell
code (07H) to the display driver using the traditional character-output
call Int 21H Function 02H.
──────────────────────────────────────────────────────────────────────────
.
.
.
mov dl,7 ; 07h = ASCII bell code
mov ah,2 ; function 02h = display character
int 21h ; transfer to MS-DOS
.
.
.
──────────────────────────────────────────────────────────────────────────
The following sequence uses the traditional string-output call Int 21H
Function 09H to display a string:
──────────────────────────────────────────────────────────────────────────
msg db 'hello$'
.
.
.
mov dx,seg msg ; DS:DX = message address
mov ds,dx
mov dx,offset msg
mov ah,9 ; function 09h = write string
int 21h ; transfer to MS-DOS
.
.
.
──────────────────────────────────────────────────────────────────────────
Note that MS-DOS detects the $ character as a terminator and does not
display it on the screen.
Screen Control with MS-DOS Functions
With version 2.0 or later, if MS-DOS loads the optional device driver
ANSI.SYS in response to a DEVICE directive in the CONFIG.SYS file,
programs can clear the screen, control the cursor position, and select
foreground and background colors by embedding escape sequences in the text
output. Escape sequences are so called because they begin with an escape
character (1BH), which alerts the driver to intercept and interpret the
subsequent characters in the sequence. When the ANSI driver is not loaded,
MS-DOS simply passes the escape sequence to the display like any other
text, usually resulting in a chaotic screen.
The escape sequences that can be used with the ANSI driver for screen
control are a subset of those defined in the ANSI X3.64─1979 Standard.
These standard sequences are summarized in Figure 6-2. Note that case is
significant for the last character in an escape sequence and that numbers
must always be represented as ASCII digit strings, not as their binary
values. (A separate set of escape sequences supported by ANSI.SYS, but not
compatible with the ANSI standard, may be used for reprogramming and
remapping the keyboard.)
Escape sequence   Meaning
──────────────────────────────────────────────────────────────────────────
Esc[2J            Clear screen; place cursor in upper left corner (home
                  position).
Esc[K             Clear from cursor to end of line.
Esc[row;colH      Position cursor. (Row is the y coordinate in the range
                  1─25 and col is the x coordinate in the range 1─80 for
                  80-by-25 text display modes.) Escape sequences
                  terminated with the letter f instead of H have the same
                  effect.
Esc[nA            Move cursor up n rows.
Esc[nB            Move cursor down n rows.
Esc[nC            Move cursor right n columns.
Esc[nD            Move cursor left n columns.
Esc[s             Save current cursor position.
Esc[u             Restore cursor to saved position.
Esc[6n            Return current cursor position on the standard input
                  handle in the format Esc[row;colR.
Esc[nm            Select character attributes:
                   0 = no special attributes
                   1 = high intensity
                   2 = low intensity
                   3 = italic
                   4 = underline
                   5 = blink
                   6 = rapid blink
                   7 = reverse video
                   8 = concealed text (no display)
                  30 = foreground black
                  31 = foreground red
                  32 = foreground green
                  33 = foreground yellow
                  34 = foreground blue
                  35 = foreground magenta
                  36 = foreground cyan
                  37 = foreground white
                  40 = background black
                  41 = background red
                  42 = background green
                  43 = background yellow
                  44 = background blue
                  45 = background magenta
                  46 = background cyan
                  47 = background white
Esc[=nh           Select display mode:
                   0 = 40-by-25, 16-color text (color burst off)
                   1 = 40-by-25, 16-color text
                   2 = 80-by-25, 16-color text (color burst off)
                   3 = 80-by-25, 16-color text
                   4 = 320-by-200, 4-color graphics
                   5 = 320-by-200, 4-color graphics (color burst off)
                   6 = 640-by-200, 2-color graphics
                  14 = 640-by-200, 16-color graphics (EGA and VGA,
                       MS-DOS 4.0)
                  15 = 640-by-350, 2-color graphics (EGA and VGA,
                       MS-DOS 4.0)
                  16 = 640-by-350, 16-color graphics (EGA and VGA,
                       MS-DOS 4.0)
                  17 = 640-by-480, 2-color graphics (MCGA and VGA,
                       MS-DOS 4.0)
                  18 = 640-by-480, 16-color graphics (VGA, MS-DOS 4.0)
                  19 = 320-by-200, 256-color graphics (MCGA and VGA,
                       MS-DOS 4.0)
                  Escape sequences terminated with l instead of h have
                  the same effect.
Esc[=7h           Enable line wrap.
Esc[=7l           Disable line wrap.
──────────────────────────────────────────────────────────────────────────
Figure 6-2. The ANSI escape sequences supported by the MS-DOS ANSI.SYS
driver. Programs running under MS-DOS 2.0 or later may use these
functions, if ANSI.SYS is loaded, to control the appearance of the display
in a hardware-independent manner. The symbol Esc indicates an ASCII escape
code──a character with the value 1BH. Note that cursor positions in ANSI
escape sequences are one-based, unlike the cursor coordinates used by the
IBM ROM BIOS, which are zero-based. Numbers embedded in an escape sequence
must always be represented as a string of ASCII digits, not as their
binary values.
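As a rough illustration of the one-based, ASCII-digit conventions just
described, the following C sketch formats a cursor-positioning sequence
from zero-based (ROM BIOS style) coordinates. The helper name ansi_goto
and its buffer arguments are hypothetical, chosen only for this example,
and the sequence has no effect unless ANSI.SYS is loaded.

```c
#include <stdio.h>

#define ESC 0x1B                    /* ASCII escape code */

/* Build Esc[row;colH in buf from zero-based (x,y) coordinates.
   Returns the length of the sequence, or -1 if buf is too small.
   (ansi_goto is a hypothetical helper, not an MS-DOS function.) */
int ansi_goto(char *buf, size_t size, int x, int y)
{
    /* ANSI positions are one-based, so add 1 to each coordinate;
       %d emits each number as a string of ASCII digits */
    int n = snprintf(buf, size, "%c[%d;%dH", ESC, y + 1, x + 1);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A program would then send the resulting bytes to the standard output, for
instance with Int 21H Function 40H, to move the cursor.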
Binary Output Mode
Under MS-DOS version 2 or later, you can substantially increase display
speeds for well-behaved application programs without sacrificing hardware
independence by selecting binary (raw) mode for the standard output. In
binary mode, MS-DOS does not check between each character it transfers to
the output device for a Ctrl-C waiting at the keyboard, nor does it filter
the output string for certain characters such as Ctrl-Z.
Bit 5 in the device information word associated with a device handle
controls binary mode. Programs access the device information word by using
Subfunctions 00H and 01H of the MS-DOS IOCTL function (I/O Control, Int
21H Function 44H). For example, the following sequence places the
standard output handle into binary mode.
──────────────────────────────────────────────────────────────────────────
; get device information...
mov bx,1 ; standard output handle
mov ax,4400h ; function 44h subfunction 00h
int 21h ; transfer to MS-DOS
mov dh,0 ; set upper byte of DX = 0
or dl,20h ; set binary mode bit in DL
; write device information...
; (BX still has handle)
mov ax,4401h ; function 44h subfunction 01h
int 21h ; transfer to MS-DOS
──────────────────────────────────────────────────────────────────────────
Note that if a program changes the mode of any of the standard handles, it
should restore those handles to ASCII (cooked) mode before it exits.
Otherwise, subsequent application programs may behave in unexpected ways.
For more detailed information on the IOCTL function, see Section II of
this book, "MS-DOS Functions Reference."
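The bit manipulation in the preceding sequence can also be expressed in C.
The sketch below covers only the arithmetic on the device information
word; the IOCTL calls themselves are omitted because they depend on the
compiler's interrupt interface. The function names are hypothetical.

```c
#define DEV_BINARY 0x0020u      /* bit 5 of the device information word */

/* Value to pass in DX to IOCTL subfunction 01H to select binary (raw)
   mode: upper byte cleared, bit 5 set, other low-byte bits preserved. */
unsigned raw_mode_word(unsigned info)
{
    return (info & 0x00FFu) | DEV_BINARY;
}

/* Value to pass in DX to return the handle to ASCII (cooked) mode,
   as a program should do before it exits. */
unsigned cooked_mode_word(unsigned info)
{
    return (info & 0x00FFu) & ~DEV_BINARY;
}
```

Here info stands for the word returned in DX by subfunction 00H.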
The ROM BIOS Display Functions
You can somewhat improve the display performance of programs that are
intended for use only on IBM PC─compatible machines by using the ROM BIOS
video driver instead of the MS-DOS output functions. Accessed by means of
Int 10H, the ROM BIOS driver supports the following functions for all of
the currently available IBM display adapters:
Function    Action
──────────────────────────────────────────────────────────────────────────
Display mode control
00H         Set display mode.
0FH         Get display mode.
Cursor control
01H         Set cursor size.
02H         Set cursor position.
03H         Get cursor position and size.
Writing to the display
09H         Write character and attribute at cursor.
0AH         Write character-only at cursor.
0EH         Write character in teletype mode.
Reading from the display
08H         Read character and attribute at cursor.
Graphics support
0CH         Write pixel.
0DH         Read pixel.
Scroll or clear display
06H         Scroll up or initialize window.
07H         Scroll down or initialize window.
Miscellaneous
04H         Read light pen.
05H         Select display page.
0BH         Select palette/set border color.
──────────────────────────────────────────────────────────────────────────
Additional ROM BIOS functions are available on the EGA, MCGA, VGA, and
PCjr to support the enhanced features of these adapters, such as
programmable palettes and character sets (fonts). Some of the functions
are valid only in certain display modes.
Each display mode is characterized by the number of colors it can display,
its vertical resolution, its horizontal resolution, and whether it
supports text or graphics memory mapping. The ROM BIOS identifies it with
a unique number. Section III of this book, "IBM ROM BIOS and Mouse
Functions Reference," documents all of the ROM BIOS Int 10H functions and
display modes.
As you can see from the preceding list, the ROM BIOS offers several
desirable capabilities that are not available from MS-DOS, including
initialization or scrolling of selected screen windows, modification of
the cursor shape, and reading back the character being displayed at an
arbitrary screen location. These functions can be used to isolate your
program from the hardware on any IBM PC─compatible adapter. However, the
ROM BIOS functions do not suffice for the needs of a high-performance,
interactive, full-screen program such as a word processor. They do not
support the rapid display of character strings at an arbitrary screen
position, and they do not implement graphics operations at the level
normally required by applications (for example, bit-block transfers and
rapid drawing of lines, circles, and filled polygons). And, of course,
they are of no use whatsoever in non-IBM display modes such as the
monochrome graphics mode of the Hercules Graphics Card.
Let's look at a simple example of a call to the ROM BIOS video driver. The
following sequence writes the string hello to the screen:
──────────────────────────────────────────────────────────────────────────
msg db 'hello'
msg_len equ $-msg
.
.
.
mov si,seg msg ; DS:SI = message address
mov ds,si
mov si,offset msg
mov cx,msg_len ; CX = message length
cld
next: lodsb ; get AL = next character
push si ; save message pointer
mov ah,0eh ; int 10h function 0eh = write
; character in teletype mode
mov bh,0 ; assume video page 0
mov bl,color ; (use in graphics modes only)
int 10h ; transfer to ROM BIOS
pop si ; restore message pointer
loop next ; loop until message done
.
.
.
──────────────────────────────────────────────────────────────────────────
(Note that the SI and DI registers are not necessarily preserved across a
call to a ROM BIOS video function.)
Memory-mapped Display Techniques
Display performance is best when an application program takes over
complete control of the video adapter and the refresh buffer. Because the
display is memory-mapped, the speed at which characters can be put on the
screen is limited only by the CPU's ability to copy bytes from one
location in memory to another. The trade-off for this performance is that
such programs are highly sensitive to hardware compatibility and do not
always function properly on "clones" or even on new models of IBM video
adapters.
Text Mode
Direct programming of the IBM PC─compatible video adapters in their text
display modes (sometimes also called alphanumeric display modes) is
straightforward. The character set is the same for all, and the cursor
home position──(x,y) = (0,0)──is defined to be the upper left corner of
the screen (Figure 6-3). The MDA uses 4 KB of memory starting at segment
B000H as a regen buffer, and the various adapters with both text and
graphics capabilities (CGA, EGA, MCGA, and VGA) use 16 KB of memory
starting at segment B800H. (See Figure 6-1.) In the latter case, the 16
KB is divided into "pages" that can be independently updated and
displayed.
(0,0)┌─────────────────────────────────┐(79,0)
│ │
│ │
│ │
│ │
│ │
│ │
│ │
(0,24)└─────────────────────────────────┘(79,24)
Figure 6-3. Cursor addressing for 80-by-25 text display modes (IBM ROM
BIOS modes 2, 3, and 7).
Each character-display position is allotted 2 bytes in the regen buffer.
The first byte (even address) contains the ASCII code of the character,
which is translated by a special hardware character generator into a
dot-matrix pattern for the screen. The second byte (odd address) is the
attribute byte. Several bit fields in this byte control such features as
blinking, intensity (highlighting), and reverse video, depending on the
adapter type and display mode (Figures 6-4 and 6-5). Figure 6-6 shows a
hex and ASCII dump of part of the video map for the MDA.
Display               Background   Foreground
──────────────────────────────────────────────────────────────────────────
No display (black)    000          000
No display (white)☼   111          111
Underline             000          001
Normal video          000          111
Reverse video         111          000
──────────────────────────────────────────────────────────────────────────
Figure 6-4. Attribute byte for 80-by-25 monochrome text display mode on
the MDA, Hercules cards, EGA, and VGA (IBM ROM BIOS mode 7).
Value Color
──────────────────────────────────────────────────────────────────────────
0 Black
1 Blue
2 Green
3 Cyan
4 Red
5 Magenta
6 Brown
7 White
8 Gray
9 Light blue
10 Light green
11 Light cyan
12 Light red
13 Light magenta
14 Yellow
15 Intense white
──────────────────────────────────────────────────────────────────────────
Figure 6-5. Attribute byte for the 40-by-25 and 80-by-25 text display
modes on the CGA, EGA, MCGA, and VGA (IBM ROM BIOS modes 0─3). The table
of color values assumes default palette programming and that the B or I
bit controls intensity.
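The bit fields of the attribute byte can be combined in code. The
following C sketch assumes the customary color text-mode layout (bit 7 =
blink, bits 6─4 = background, bit 3 = intensity, bits 2─0 = foreground),
which the tables here do not spell out; make_attr is a hypothetical
helper name.

```c
/* Compose a color text-mode attribute byte.
   fg:    foreground color 0-15 (bit 3 of fg is the intensity bit)
   bg:    background color 0-7
   blink: nonzero to set bit 7 (assumed here to select blinking) */
unsigned char make_attr(unsigned fg, unsigned bg, int blink)
{
    return (unsigned char)((blink ? 0x80u : 0x00u)
                           | ((bg & 0x07u) << 4)
                           | (fg & 0x0Fu));
}
```

Under these assumptions, make_attr(7, 0, 0) yields 07H (normal video) and
make_attr(14, 1, 0) yields 1EH (yellow on blue).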
──────────────────────────────────────────────────────────────────────────
B000:0000 3e 07 73 07 65 07 6c 07 65 07 63 07 74 07 20 07
B000:0010 74 07 65 07 6d 07 70 07 20 07 20 07 20 07 20 07
B000:0020 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0030 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0040 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0050 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0060 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0070 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0080 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
B000:0090 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07
──────────────────────────────────────────────────────────────────────────
Figure 6-6. Example dump of the first 160 bytes of the MDA's regen
buffer. These bytes correspond to the first visible line on the screen.
Note that ASCII character codes are stored in even bytes and their
respective character attributes in odd bytes; all the characters in this
example line have the attribute normal video.
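To make the even/odd byte layout concrete, the following C sketch pulls
the character codes out of a run of character-attribute pairs. The helper
regen_text is hypothetical, and the array simply copies the first 24 bytes
of the Figure 6-6 dump so that the code can run without a display adapter.

```c
#include <stddef.h>

/* Copy the character codes (even bytes) out of a run of
   character-attribute pairs and terminate the result with a null.
   out must have room for pairs + 1 bytes.
   Returns the number of characters copied. */
size_t regen_text(const unsigned char *regen, size_t pairs, char *out)
{
    size_t i;
    for (i = 0; i < pairs; i++)
        out[i] = (char)regen[i * 2];    /* skip attribute in odd byte */
    out[pairs] = '\0';
    return pairs;
}

/* First 12 character-attribute pairs of the Figure 6-6 dump */
static const unsigned char fig_6_6[] = {
    0x3e, 0x07, 0x73, 0x07, 0x65, 0x07, 0x6c, 0x07,
    0x65, 0x07, 0x63, 0x07, 0x74, 0x07, 0x20, 0x07,
    0x74, 0x07, 0x65, 0x07, 0x6d, 0x07, 0x70, 0x07
};
```

Applied to fig_6_6 with a count of 12, the function recovers the text
">select temp".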
You can calculate the memory offset of any character on the display as the
line number (y coordinate) times 80 characters per line times 2 bytes per
character, plus the column number (x coordinate) times 2 bytes per
character, plus (for the text/graphics adapters) the page number times the
size of the page (4 KB per page in 80-by-25 modes; 2 KB per page in
40-by-25 modes). In short, the formula for the offset of the
character-attribute pair for a given screen position (x,y) in 80-by-25
text modes is
offset = ((y * 50H + x) * 2) + (page * 1000H)
In 40-by-25 text modes, each line holds only 40 (28H) characters, so the
formula is
offset = ((y * 28H + x) * 2) + (page * 0800H)
Of course, the segment register being used to address the video buffer
must be set appropriately, depending on the type of display adapter.
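The offset calculation can be sketched in C as follows. The function name
text_offset and its cols parameter are hypothetical; the page sizes are
those given above (4 KB in 80-column modes, 2 KB in 40-column modes).

```c
/* Byte offset of the character-attribute pair at (x,y) on a given
   video page. cols is the number of characters per line (80 or 40). */
unsigned text_offset(unsigned x, unsigned y, unsigned page, unsigned cols)
{
    unsigned page_size = (cols == 80) ? 0x1000u : 0x0800u;

    /* 2 bytes per character position, plus the start of the page */
    return (y * cols + x) * 2 + page * page_size;
}
```

For the lower right corner of page 0 in an 80-by-25 mode, (x,y) = (79,24),
this gives 3998, the last character-attribute pair within the first 4000
bytes of the buffer.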
As a simple example, assume that the character to be displayed is in the
AL register, the desired attribute byte for the character is in the AH
register, the x coordinate (column) is in the BX register, and the y
coordinate (row) is in the CX register. The following code stores the
character and attribute byte into the MDA's video refresh buffer at the
proper location:
──────────────────────────────────────────────────────────────────────────
push ax ; save char and attribute
mov ax,160
mul cx ; DX:AX = Y * 160
shl bx,1 ; multiply X by 2
add bx,ax ; BX = (Y*160) + (X*2)
mov ax,0b000h ; ES = segment of monochrome
mov es,ax ; adapter refresh buffer
pop ax ; restore char and attribute
mov es:[bx],ax ; write them to video buffer
──────────────────────────────────────────────────────────────────────────
More frequently, we wish to move entire strings into the refresh buffer,
starting at a given coordinate. In the next example, assume that the DS:SI
registers point to the source string, the ES:DI registers point to the
starting position in the video buffer (calculated as shown in the previous
example), the AH register contains the attribute byte to be assigned to
every character in the string, and the CX register contains the length of
the string. The following code moves the entire string into the refresh
buffer:
──────────────────────────────────────────────────────────────────────────
xfer: lodsb ; fetch next character
stosw ; store char + attribute
loop xfer ; until all chars moved
──────────────────────────────────────────────────────────────────────────
Of course, the video drivers written for actual application programs must
take into account many additional factors, such as checking for special
control codes (linefeeds, carriage returns, tabs), line wrap, and
scrolling.
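For C programmers, the same string transfer might look like the following
sketch. The name put_string is hypothetical. On a PC, dest would be a
pointer into the regen buffer; here it can be any byte array, which lets
the routine be tried without the hardware.

```c
#include <stddef.h>

/* Store len characters from src as character-attribute pairs starting
   at dest, giving every character the same attribute byte. */
void put_string(unsigned char *dest, const char *src, size_t len,
                unsigned char attr)
{
    while (len--) {
        *dest++ = (unsigned char)*src++;    /* even byte: character */
        *dest++ = attr;                     /* odd byte: attribute  */
    }
}
```

Like the assembly-language loop, this sketch makes no attempt to handle
control codes, line wrap, or scrolling.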
Programs that write characters directly to the CGA regen buffer in text
modes must deal with an additional complicating factor──they must examine
the video controller's status port and access the refresh buffer only
during the horizontal retrace or vertical retrace intervals. (A retrace
interval is the period when the electron beam that illuminates the screen
phosphors is being repositioned to the start of a new scan line.)
Otherwise, the contention for memory between the CPU and the video
controller is manifest as unsightly "snow" on the display. (If you are
writing programs for any of the other IBM PC─compatible video adapters,
such as the MDA, EGA, MCGA, or VGA, you can ignore the retrace intervals;
snow is not a problem with these video controllers.)
A program can detect the occurrence of a retrace interval by monitoring
certain bits in the video controller's status register. For example,
assume that the offset for the desired character position has been
calculated as in the preceding example and placed in the BX register, the
segment for the CGA's refresh buffer is in the ES register, and an ASCII
character code to be displayed is in the CL register. The following code
waits for the beginning of a new horizontal retrace interval and then
writes the character into the buffer:
──────────────────────────────────────────────────────────────────────────
mov dx,03dah ; DX = video controller's
; status port address
cli ; disable interrupts
; if retrace is already
; in progress, wait for
; it to end...
wait1: in al,dx ; read status port
and al,1 ; check if retrace bit on
jnz wait1 ; yes, wait
; wait for new retrace
; interval to start...
wait2: in al,dx ; read status port
and al,1 ; retrace bit on yet?
jz wait2 ; jump if not yet on
mov es:[bx],cl ; write character to
; the regen buffer
sti ; enable interrupts again
──────────────────────────────────────────────────────────────────────────
The first wait loop "synchronizes" the code to the beginning of a
horizontal retrace interval. If only the second wait loop were used (that
is, if a character were written when a retrace interval was already in
progress), the write would occasionally begin so close to the end of a
horizontal retrace "window" that it would partially miss the retrace,
resulting in scattered snow at the left edge of the display. Notice that
the code also disables interrupts during accesses to the video buffer, so
that service of a hardware interrupt won't disrupt the synchronization
process.
Because of the retrace-interval constraints just outlined, the rate at
which you can update the CGA in text modes is severely limited when the
updating is done one character at a time. You can obtain better results by
calculating all the relevant addresses and setting up the appropriate
registers, disabling the video controller by writing to register 3D8H,
moving the entire string to the buffer with a REP MOVSW operation, and
then reenabling the video controller. If the string is of reasonable
length, the user won't even notice a flicker in the display. Of course,
this procedure introduces additional hardware dependence into your code
because it requires much greater knowledge of the 6845 controller.
Luckily, snow is not a problem in CGA graphics modes.
Graphics Mode
Graphics-mode memory-mapped programming for IBM PC─compatible adapters is
considerably more complicated than text-mode programming. Each bit or
group of bits in the regen buffer corresponds to an addressable point, or
pixel, on the screen. The mapping of bits to pixels differs for each of
the available graphics modes, with their differences in resolution and
number of supported colors. The newer adapters (EGA, MCGA, and VGA) also
use the concept of bit planes, where bits of a pixel are segregated into
multiple banks of memory mapped at the same address; you must manipulate
these bit planes by a combination of memory-mapped I/O and port
addressing.
IBM-video-systems graphics programming is a subject large enough for a
book of its own, but we can use the 640-by-200, 2-color graphics display
mode of the CGA (which is also supported by all subsequent IBM
text/graphics adapters) to illustrate a few of the techniques involved.
This mode is simple to deal with because each pixel is represented by a
single bit. The pixels are assigned (x,y) coordinates in the range (0,0)
through (639,199), where x is the horizontal displacement, y is the
vertical displacement, and the home position (0,0) is the upper left
corner of the display. (See Figure 6-7.)
(0,0)┌─────────────────────────────────┐(639,0)
│ │
│ │
│ │
│ │
│ │
│ │
│ │
(0,199)└─────────────────────────────────┘(639,199)
Figure 6-7. Point addressing for 640-by-200, 2-color graphics modes on
the CGA, EGA, MCGA, and VGA (IBM ROM BIOS mode 6).
Each successive group of 80 bytes (640 bits) represents one horizontal
scan line. Within each byte, the bits map one-for-one onto pixels, with
the most significant bit corresponding to the leftmost displayed pixel of
a set of eight pixels and the least significant bit corresponding to the
rightmost displayed pixel of the set. The memory map is set up so that all
the even y coordinates are scanned as a set and all the odd y coordinates
are scanned as a set; this mapping is referred to as the memory interlace.
To find the regen buffer offset for a particular (x,y) coordinate, you
would use the following formula:
offset = ((y AND 1) * 2000H) + (y/2 * 50H) + (x/8)
The assembly-language implementation of this formula is as follows:
──────────────────────────────────────────────────────────────────────────
; assume AX = Y, BX = X
shr bx,1 ; divide X by 8
shr bx,1
shr bx,1
push ax ; save copy of Y
shr ax,1 ; find (Y/2) * 50h
mov cx,50h ; with product in DX:AX
mul cx
add bx,ax ; add product to X/8
pop ax ; add (Y AND 1) * 2000h
and ax,1
jz label1
add bx,2000h
label1: ; now BX = offset into
; video buffer
──────────────────────────────────────────────────────────────────────────
After calculating the correct byte address, you can use the following
formula to calculate the bit position for a given pixel coordinate:
bit = 7 - (x MOD 8)
where bit 7 is the most significant bit and bit 0 is the least significant
bit. It is easiest to build an 8-byte table, or array of bit masks, and
use the operation X AND 7 to extract the appropriate entry from the table:
(X AND 7)   Bit mask      (X AND 7)   Bit mask
──────────────────────────────────────────────────────────────────────────
0           80H           4           08H
1           40H           5           04H
2           20H           6           02H
3           10H           7           01H
──────────────────────────────────────────────────────────────────────────
The assembly-language implementation of this second calculation is as
follows:
──────────────────────────────────────────────────────────────────────────
table db 80h ; X AND 7 = offset 0
db 40h ; X AND 7 = offset 1
db 20h ; X AND 7 = offset 2
db 10h ; X AND 7 = offset 3
db 08h ; X AND 7 = offset 4
db 04h ; X AND 7 = offset 5
db 02h ; X AND 7 = offset 6
db 01h ; X AND 7 = offset 7
.
.
.
; assume BX = X coordinate
and bx,7 ; isolate 0─7 offset
mov al,[bx+table]
; now AL = mask from table
.
.
.
──────────────────────────────────────────────────────────────────────────
The program can then use the mask, together with the byte offset
previously calculated, to set or clear the appropriate bit in the video
controller's regen buffer.
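The two calculations can be combined into a small pixel-plotting routine.
The following C sketch is one possible arrangement; the function names are
hypothetical, and the mask is computed with a shift rather than a table
lookup. The regen argument may be any 16 KB byte array laid out like the
640-by-200, 2-color buffer.

```c
/* Byte offset of pixel (x,y) in 640-by-200, 2-color mode:
   odd scan lines start 2000H bytes into the buffer. */
unsigned pixel_offset(unsigned x, unsigned y)
{
    return (y & 1) * 0x2000u + (y / 2) * 0x50u + x / 8;
}

/* Bit mask for pixel x within its byte (bit 7 = leftmost pixel) */
unsigned char pixel_mask(unsigned x)
{
    return (unsigned char)(0x80u >> (x & 7));
}

/* Set (on != 0) or clear (on == 0) a single pixel */
void put_pixel(unsigned char *regen, unsigned x, unsigned y, int on)
{
    unsigned char *p = regen + pixel_offset(x, y);

    if (on)
        *p |= pixel_mask(x);
    else
        *p &= (unsigned char)~pixel_mask(x);
}
```

Because put_pixel reads the byte before writing it back, the seven
neighboring pixels that share the byte are left unchanged.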
────────────────────────────────────────────────────────────────────────────
Chapter 7 Printer and Serial Port
MS-DOS supports printers, plotters, modems, and other hard-copy output or
communication devices with device drivers for parallel ports and serial
ports. Parallel ports are so named because they transfer a byte──8 bits──
in parallel to the destination device over eight separate physical paths
(plus additional status and handshaking signals). The serial port, on the
other hand, communicates with the CPU with bytes but sends data to or
receives data from its destination device serially──a bit at a time──over
a single physical connection.
Parallel ports are typically used for high-speed output devices, such as
line printers, over relatively short distances (less than 50 feet). They
are rarely used for devices that require two-way communication with the
computer. Serial ports are used for lower-speed devices, such as modems
and terminals, that require two-way communication (although some printers
also have serial interfaces). A serial port can drive its device reliably
over much greater distances (up to 1000 feet) over as few as three wires──
transmit, receive, and ground.
The most commonly used type of serial interface follows a standard called
RS-232. This standard specifies a 25-wire interface with certain
electrical characteristics, the use of various handshaking signals, and a
standard DB-25 connector. Other serial-interface standards exist──for
example, the RS-422, which is capable of considerably higher speeds than
the RS-232──but these are rarely used in personal computers (except for
the Apple Macintosh) at this time.
MS-DOS has built-in device drivers for three parallel adapters, and for
two serial adapters on the PC or PC/AT and three serial adapters on the
PS/2. The logical names for these devices are LPT1, LPT2, LPT3, COM1,
COM2, and COM3. The standard printer (PRN) and standard auxiliary (AUX)
devices are normally aliased to LPT1 and COM1, but you can redirect PRN to
one of the serial ports with the MS-DOS MODE command.
As with keyboard and video display I/O, you can manage printer and
serial-port I/O at several levels that offer different degrees of
flexibility and hardware independence:
■ MS-DOS handle-oriented functions
■ MS-DOS traditional character functions
■ IBM ROM BIOS driver functions
In the case of the serial port, direct control of the hardware by
application programs is also common. I will discuss each of these I/O
methods briefly, with examples, in the following pages.
Printer Output
The preferred method of printer output is to use the handle write function
(Int 21H Function 40H) with the predefined standard printer handle (4).
For example, you could write the string hello to the printer as follows:
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message for printer
msg_len equ $-msg ; length of message
.
.
.
mov ah,40h ; function 40h = write file or device
mov bx,4 ; BX = standard printer handle
mov cx,msg_len ; CX = length of string
mov dx,seg msg ; DS:DX = string address
mov ds,dx
mov dx,offset msg
int 21h ; transfer to MS-DOS
jc error ; jump if error
.
.
.
──────────────────────────────────────────────────────────────────────────
If there is no error, the function returns the carry flag cleared and the
number of characters actually transferred to the list device in register
AX. Under normal circumstances, this number should always be the same as
the length requested and the carry flag indicating an error should never
be set. However, the output will terminate early if your data contains an
end-of-file mark (Ctrl-Z).
You can write independently to several list devices (for example, LPT1,
LPT2) by issuing a specific open request (Int 21H Function 3DH) for each
device and using the handles returned to access the printers individually
with Int 21H Function 40H. You have already seen this general approach in
Chapters 5 and 6.
An alternative method of printer output is to use the traditional Int 21H
Function 05H, which transfers the character in the DL register to the
printer. (This function is sensitive to Ctrl-C interrupts.) For example,
the following assembly-language code sequence writes the string hello to
the line printer:
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message for printer
msg_len equ $-msg ; length of message
.
.
.
mov bx,seg msg ; DS:BX = string address
mov ds,bx
mov bx,offset msg
mov cx,msg_len ; CX = string length
next: mov dl,[bx] ; get next character
mov ah,5 ; function 05h = printer output
int 21h ; transfer to MS-DOS
inc bx ; bump string pointer
loop next ; loop until string done
.
.
.
──────────────────────────────────────────────────────────────────────────
Programs that run on IBM PC─compatible machines can obtain improved
printer throughput by bypassing MS-DOS and calling the ROM BIOS printer
driver directly by means of Int 17H. Section III of this book, "IBM ROM
BIOS and Mouse Functions Reference," documents the Int 17H functions in
detail. Use of the ROM BIOS functions also allows your program to test
whether the printer is off line or out of paper, a capability that MS-DOS
does not offer.
For example, the following sequence of instructions calls the ROM BIOS
printer driver to send the string hello to the line printer:
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message for printer
msg_len equ $-msg ; length of message
.
.
.
mov bx,seg msg ; DS:BX = string address
mov ds,bx
mov bx,offset msg
mov cx,msg_len ; CX = string length
mov dx,0 ; DX = printer number
next: mov al,[bx] ; AL = character to print
mov ah,0 ; function 00h = printer output
int 17h ; transfer to ROM BIOS
inc bx ; bump string pointer
loop next ; loop until string done
.
.
.
──────────────────────────────────────────────────────────────────────────
Note that the printer numbers used by the ROM BIOS are zero-based, whereas
the printer numbers in MS-DOS logical-device names are one-based. For
example, ROM BIOS printer 0 corresponds to LPT1.
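The status byte that the ROM BIOS printer driver returns in AH can be decoded
with a few bit tests. The C sketch below assumes the conventional IBM PC bit
assignments for that byte (bit 5 = out of paper, bit 4 = printer selected,
bit 7 = not busy, and so on). The constant and function names are
hypothetical, and Section III remains the authority on the exact layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments of the Int 17H printer status byte (AH). */
enum {
    PRN_TIMEOUT  = 0x01,   /* printer timed out          */
    PRN_IO_ERROR = 0x08,   /* I/O error                  */
    PRN_SELECTED = 0x10,   /* printer selected (on line) */
    PRN_NO_PAPER = 0x20,   /* out of paper               */
    PRN_ACK      = 0x40,   /* printer acknowledge        */
    PRN_NOT_BUSY = 0x80    /* printer not busy           */
};

/* True if the status byte reports an out-of-paper condition. */
bool printer_out_of_paper(uint8_t status)
{
    return (status & PRN_NO_PAPER) != 0;
}

/* True if the printer is not selected, that is, off line. */
bool printer_off_line(uint8_t status)
{
    return (status & PRN_SELECTED) == 0;
}

/* True if the printer is selected, idle, and reports no fault. */
bool printer_ready(uint8_t status)
{
    return (status & PRN_NOT_BUSY) != 0
        && (status & PRN_SELECTED) != 0
        && (status & (PRN_NO_PAPER | PRN_IO_ERROR | PRN_TIMEOUT)) == 0;
}
```

A program would typically obtain the status byte from the ROM BIOS before
starting a long print job and call printer_ready on it.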
Finally, the most hardware-dependent technique of printer output is to
access the printer controller directly. Considering the functionality
already provided in MS-DOS and the IBM ROM BIOS, as well as the speeds of
the devices involved, I cannot see any justification for using direct
hardware control in this case. The disadvantage of introducing such
extreme hardware dependence for such a low-speed device would far outweigh
any small performance gains that might be obtained.
The Serial Port
MS-DOS support for serial ports (often referred to as the auxiliary device
in MS-DOS manuals) is weak compared with its keyboard, video-display, and
printer support. This is one area where the application programmer is
justified in making programs hardware dependent to extract adequate
performance.
Programs that restrict themselves to MS-DOS functions to ensure
portability can use the handle read and write functions (Int 21H Functions
3FH and 40H) with the predefined standard auxiliary handle (3) to
access the serial port. For example, the following code writes the string
hello to the serial port that is currently defined as the AUX device:
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message for serial port
msg_len equ $-msg ; length of message
.
.
.
mov ah,40h ; function 40h = write file or device
mov bx,3 ; BX = standard aux handle
mov cx,msg_len ; CX = string length
mov dx,seg msg ; DS:DX = string address
mov ds,dx
mov dx,offset msg
int 21h ; transfer to MS-DOS
jc error ; jump if error
.
.
.
──────────────────────────────────────────────────────────────────────────
The standard auxiliary handle gives access to only the first serial port
(COM1). If you want to read or write COM2 and COM3 using the handle calls,
you must issue an open request (Int 21H Function 3DH) for the desired
serial port and use the handle returned by that function with Int 21H
Functions 3FH and 40H.
Some versions of MS-DOS have a bug in character-device handling that
manifests itself as follows: If you issue a read request with Int 21H
Function 3FH for the exact number of characters that are waiting in the
driver's buffer, the length returned in the AX register is the number of
characters transferred minus one. You can circumvent this problem by
always requesting more characters than you expect to receive or by placing
the device handle into binary mode using Int 21H Function 44H.
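Placing a handle into binary mode is a read-modify-write operation on the
device-information word. The C sketch below shows only the bit arithmetic,
under the assumption that bit 7 of the word marks a character device, bit 5
selects binary (raw) mode, and the upper byte must be zero when the word is
written back with Int 21H Function 44H. The function names are illustrative,
and the actual IOCTL calls are omitted.

```c
#include <stdint.h>

#define DEV_IS_DEVICE 0x0080u   /* bit 7: handle refers to a character device */
#define DEV_RAW_MODE  0x0020u   /* bit 5: binary ("raw") mode                 */

/* Nonzero if the device-information word describes a character device. */
int is_char_device(uint16_t info)
{
    return (info & DEV_IS_DEVICE) != 0;
}

/* Build the word to write back: keep the low byte, set the raw-mode bit,
   and clear the upper byte (assumed to be required when setting). */
uint16_t raw_mode_word(uint16_t info)
{
    return (uint16_t)((info & 0x00FFu) | DEV_RAW_MODE);
}
```

A caller would first read the device-information word for the handle, confirm
with is_char_device that the handle is not a disk file, and then write back
the value returned by raw_mode_word.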
MS-DOS also supports two traditional functions for serial-port I/O. Int
21H Function 03H inputs a character from COM1 and returns it in the AL
register; Int 21H Function 04H transmits the character in the DL register
to COM1. Like the other traditional calls, these two are direct
descendants of the CP/M auxiliary-device functions.
For example, the following code sends the string hello to COM1 using the
traditional Int 21H Function 04H:
──────────────────────────────────────────────────────────────────────────
msg db 'hello' ; message for serial port
msg_len equ $-msg ; length of message
.
.
.
mov bx,seg msg ; DS:BX = string address
mov ds,bx
mov bx,offset msg
mov cx,msg_len ; CX = length of string
next: mov dl,[bx] ; get next character
mov ah,4 ; function 04h = aux output
int 21h ; transfer to MS-DOS
inc bx ; bump pointer to string
loop next ; loop until string done
.
.
.
──────────────────────────────────────────────────────────────────────────
MS-DOS translates the traditional auxiliary-device functions into calls on
the same device driver used by the handle calls. Therefore, it is
generally preferable to use the handle functions in the first place,
because they allow very long strings to be read or written in one
operation, they give access to serial ports other than COM1, and they are
symmetrical with the handle video-display, keyboard, printer, and file I/O
methods described elsewhere in this book.
Although the handle or traditional serial-port functions allow you to
write programs that are portable to any machine running MS-DOS, they have
a number of disadvantages:
■ The built-in MS-DOS serial-port driver is slow and is not interrupt
driven.
■ MS-DOS serial-port I/O is not buffered.
■ Determining the status of the auxiliary device requires a separate call
to the IOCTL function (Int 21H Function 44H)──if you request input and
no characters are ready, your program will simply hang.
■ MS-DOS offers no standardized function to configure the serial port
from within a program.
For programs that are going to run on the IBM PC or compatibles, a more
flexible technique for serial-port I/O is to call the IBM ROM BIOS
serial-port driver by means of Int 14H. You can use this driver to
initialize the serial port to a desired configuration and baud rate,
examine the status of the controller, and read or write characters.
Section III of this book, "IBM ROM BIOS and Mouse Functions Reference,"
documents the functions available from the ROM BIOS serial-port driver.
For example, the following sequence sends the character X to the first
serial port (COM1):
──────────────────────────────────────────────────────────────────────────
.
.
.
mov ah,1 ; function 01h = send character
mov al,'X' ; AL = character to transmit
mov dx,0 ; DX = serial-port number
int 14h ; transfer to ROM BIOS
and ah,80h ; did transmit fail?
jnz error ; jump if transmit error
.
.
.
──────────────────────────────────────────────────────────────────────────
As with the ROM BIOS printer driver, the serial-port numbers used by the
ROM BIOS are zero-based, whereas the serial-port numbers in MS-DOS
logical-device names are one-based. In this example, serial port 0
corresponds to COM1.
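Initializing the port through the ROM BIOS requires a single parameter byte
that packs the baud rate, parity, stop bits, and word length. The C sketch
below builds that byte, assuming the conventional IBM PC encoding (baud-rate
code in bits 7-5, parity in bits 4-3, stop bits in bit 2, word length in bits
1-0). The names are hypothetical, and Section III should be consulted for the
definitive layout.

```c
#include <stdint.h>

/* Assumed parity field values (bits 4-3 of the parameter byte). */
enum { PAR_NONE = 0x00, PAR_ODD = 0x08, PAR_EVEN = 0x18 };

/* Return the Int 14H initialization byte, or -1 if a setting cannot be
   expressed in this encoding. */
int com_init_byte(long baud, int parity, int stop_bits, int data_bits)
{
    static const long rates[8] = {
        110, 150, 300, 600, 1200, 2400, 4800, 9600
    };
    int code = -1;
    int i;

    for (i = 0; i < 8; i++) {
        if (rates[i] == baud)
            code = i;                 /* index doubles as the 3-bit code */
    }
    if (code < 0)
        return -1;
    if (parity != PAR_NONE && parity != PAR_ODD && parity != PAR_EVEN)
        return -1;
    if (stop_bits != 1 && stop_bits != 2)
        return -1;
    if (data_bits != 7 && data_bits != 8)
        return -1;

    return (code << 5) | parity | ((stop_bits - 1) << 2) | (data_bits - 5);
}
```

For instance, com_init_byte(9600, PAR_NONE, 1, 8) yields 0E3H, the value a
program would load into AL for 9600 baud, no parity, 1 stop bit, and 8 data
bits.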
Unfortunately, like the MS-DOS auxiliary-device driver, the ROM BIOS
serial-port driver is not interrupt driven. Although it will support
higher transfer speeds than the MS-DOS functions, at rates greater than
2400 baud it may still lose characters. Consequently, most programmers
writing high-performance applications that use a serial port (such as
telecommunications programs) take complete control of the serial-port
controller and provide their own interrupt driver. The built-in functions
provided by MS-DOS, and by the ROM BIOS in the case of the IBM PC, are
simply not adequate.
Writing such programs requires a good understanding of the hardware. In
the case of the IBM PC, the chips to study are the INS8250 Asynchronous
Communications Controller and the Intel 8259A Programmable Interrupt
Controller. The IBM technical reference documentation for these chips is a
bit disorganized, but most of the necessary information is there if you
look for it.
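As orientation, the port addresses and interrupt numbers that a program needs
follow a simple pattern. The C sketch below derives them from a base port
address and an IRQ level, assuming the standard PC arrangement: the 8250's
interrupt-enable, modem-control, and line-status registers lie at offsets 1,
4, and 5 from the data port, and the 8259A maps IRQ0 through IRQ7 to Int 08H
through 0FH. The structure and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t data;      /* receive/transmit data register */
    uint16_t ier;       /* interrupt-enable register      */
    uint16_t mcr;       /* modem-control register         */
    uint16_t lsr;       /* line-status register           */
    uint8_t  vector;    /* interrupt number for this IRQ  */
    uint8_t  pic_bit;   /* this IRQ's bit in the 8259A mask */
} com_hw_t;

/* Derive register addresses and interrupt values for one serial port. */
com_hw_t com_hw(uint16_t base, unsigned irq)
{
    com_hw_t hw;

    assert(irq < 8);                  /* only the first 8259A is modeled */
    hw.data    = base;
    hw.ier     = (uint16_t)(base + 1);
    hw.mcr     = (uint16_t)(base + 4);
    hw.lsr     = (uint16_t)(base + 5);
    hw.vector  = (uint8_t)(0x08 + irq);
    hw.pic_bit = (uint8_t)(1u << irq);
    return hw;
}

/* Clearing a bit in the 8259A mask enables that interrupt level. */
uint8_t pic_unmask_irq(uint8_t mask, uint8_t bit)
{
    return (uint8_t)(mask & (uint8_t)~bit);
}

/* Setting the bit disables the interrupt level again. */
uint8_t pic_mask_irq(uint8_t mask, uint8_t bit)
{
    return (uint8_t)(mask | bit);
}
```

With a base of 03F8H and IRQ4 this gives the values used for COM1 (interrupt
0CH, mask bit 10H). With 02F8H and IRQ3 it gives those for COM2.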
The TALK Program
The simple terminal-emulator program TALK.ASM (Figure 7-1) is an example
of a useful program that performs screen, keyboard, and serial-port I/O.
This program recapitulates all of the topics discussed in Chapters 5
through 7. TALK uses the IBM PC's ROM BIOS video driver to put characters
on the screen, to clear the display, and to position the cursor; it uses
the MS-DOS character-input calls to read the keyboard; and it contains its
own interrupt driver for the serial-port controller.
──────────────────────────────────────────────────────────────────────────
name talk
page 55,132
.lfcond ; List false conditionals too
title TALK--Simple terminal emulator
;
; TALK.ASM--Simple IBM PC terminal emulator
;
; Copyright (c) 1988 Ray Duncan
;
; To assemble and link this program into TALK.EXE:
;
; C>MASM TALK;
; C>LINK TALK;
;
stdin equ 0 ; standard input handle
stdout equ 1 ; standard output handle
stderr equ 2 ; standard error handle
cr equ 0dh ; ASCII carriage return
lf equ 0ah ; ASCII linefeed
bsp equ 08h ; ASCII backspace
escape equ 1bh ; ASCII escape code
dattr equ 07h ; display attribute to use
; while in emulation mode
bufsiz equ 4096 ; size of serial-port buffer
echo equ 0 ; 0 = full-duplex, -1 = half-duplex
true equ -1
false equ 0
com1 equ true ; use COM1 if nonzero
com2 equ not com1 ; use COM2 if nonzero
pic_mask equ 21h ; 8259 interrupt mask port
pic_eoi equ 20h ; 8259 EOI port
if com1
com_data equ 03f8h ; port assignments for COM1
com_ier equ 03f9h
com_mcr equ 03fch
com_sts equ 03fdh
com_int equ 0ch ; COM1 interrupt number
int_mask equ 10h ; IRQ4 mask for 8259
endif
if com2
com_data equ 02f8h ; port assignments for COM2
com_ier equ 02f9h
com_mcr equ 02fch
com_sts equ 02fdh
com_int equ 0bh ; COM2 interrupt number
int_mask equ 08h ; IRQ3 mask for 8259
endif
_TEXT segment word public 'CODE'
assume cs:_TEXT,ds:_DATA,es:_DATA,ss:STACK
talk proc far ; entry point from MS-DOS
mov ax,_DATA ; make data segment addressable
mov ds,ax
mov es,ax
; initialize display for
; terminal emulator mode...
mov ah,15 ; get display width and
int 10h ; current display mode
dec ah ; save display width for use
mov columns,ah ; by the screen-clear routine
cmp al,7 ; enforce text display mode
je talk2 ; mode 7 ok, proceed
cmp al,3
jbe talk2 ; modes 0-3 ok, proceed
mov dx,offset msg1
mov cx,msg1_len
jmp talk6 ; print error message and exit
talk2: mov bh,dattr ; clear screen and home cursor
call cls
call asc_enb ; capture serial-port interrupt
; vector and enable interrupts
mov dx,offset msg2 ; display message
mov cx,msg2_len ; 'terminal emulator running'
mov bx,stdout ; BX = standard output handle
mov ah,40h ; function 40h = write file or device
int 21h ; transfer to MS-DOS
talk3: call pc_stat ; keyboard character waiting?
jz talk4 ; nothing waiting, jump
call pc_in ; read keyboard character
cmp al,0 ; is it a function key?
jne talk32 ; not function key, jump
call pc_in ; function key, discard 2nd
; character of sequence
jmp talk5 ; then terminate program
talk32: ; keyboard character received
if echo
push ax ; if half-duplex, echo
call pc_out ; character to PC display
pop ax
endif
call com_out ; write char to serial port
talk4: call com_stat ; serial-port character waiting?
jz talk3 ; nothing waiting, jump
call com_in ; read serial-port character
cmp al,20h ; is it control code?
jae talk45 ; jump if not
call ctrl_code ; control code, process it
jmp talk3 ; check keyboard again
talk45: ; noncontrol char received,
call pc_out ; write it to PC display
jmp talk4 ; see if any more waiting
talk5: ; function key detected,
; prepare to terminate...
mov bh,07h ; clear screen and home cursor
call cls
mov dx,offset msg3 ; display farewell message
mov cx,msg3_len
talk6: push dx ; save message address
push cx ; and message length
call asc_dsb ; disable serial-port interrupts
; and release interrupt vector
pop cx ; restore message length
pop dx ; and address
mov bx,stdout ; handle for standard output
mov ah,40h ; function 40h = write device
int 21h ; transfer to MS-DOS
mov ax,4c00h ; terminate program with
int 21h ; return code = 0
talk endp
com_stat proc near ; check asynch status; returns
; Z = false if character ready
; Z = true if nothing waiting
push ax
mov ax,asc_in ; compare ring buffer pointers
cmp ax,asc_out
pop ax
ret ; return to caller
com_stat endp
com_in proc near ; get character from serial-
; port buffer; returns
; new character in AL
push bx ; save register BX
com_in1: ; if no char waiting, wait
mov bx,asc_out ; until one is received
cmp bx,asc_in
je com_in1 ; jump, nothing waiting
mov al,[bx+asc_buf] ; character is ready,
; extract it from buffer
inc bx ; update buffer pointer
cmp bx,bufsiz
jne com_in2
xor bx,bx ; reset pointer if wrapped
com_in2:
mov asc_out,bx ; store updated pointer
pop bx ; restore register BX
ret ; and return to caller
com_in endp
com_out proc near ; write character in AL
; to serial port
push dx ; save register DX
push ax ; save character to send
mov dx,com_sts ; DX = status port address
com_out1: ; check if transmit buffer
in al,dx ; is empty (TBE bit = set)
and al,20h
jz com_out1 ; no, must wait
pop ax ; get character to send
mov dx,com_data ; DX = data port address
out dx,al ; transmit the character
pop dx ; restore register DX
ret ; and return to caller
com_out endp
pc_stat proc near ; read keyboard status; returns
; Z = false if character ready
; Z = true if nothing waiting
; register DX destroyed
mov al,in_flag ; if character already
or al,al ; waiting, return status
jnz pc_stat1
mov ah,6 ; otherwise call MS-DOS to
mov dl,0ffh ; determine keyboard status
int 21h
jz pc_stat1 ; jump if no key ready
mov in_char,al ; got key, save it for
mov in_flag,0ffh ; "pc_in" routine
pc_stat1: ; return to caller with
ret ; Z flag set appropriately
pc_stat endp
pc_in proc near ; read keyboard character,
; return it in AL
; DX may be destroyed
mov al,in_flag ; key already waiting?
or al,al
jnz pc_in1 ; yes, return it to caller
call pc_stat ; try to read a character
jmp pc_in
pc_in1: mov in_flag,0 ; clear char-waiting flag
mov al,in_char ; and return AL = character
ret
pc_in endp
pc_out proc near ; write character in AL
; to the PC's display
mov ah,0eh ; ROM BIOS function 0eh =
; "teletype output"
push bx ; save register BX
xor bx,bx ; assume page 0
int 10h ; transfer to ROM BIOS
pop bx ; restore register BX
ret ; and return to caller
pc_out endp
cls proc near ; clear display using
; char attribute in BH
; registers AX, CX,
; and DX destroyed
mov dl,columns ; set DL,DH = X,Y of
mov dh,24 ; lower right corner
mov cx,0 ; set CL,CH = X,Y of
; upper left corner
mov ax,600h ; ROM BIOS function 06h =
; "scroll or initialize
; window"
int 10h ; transfer to ROM BIOS
call home ; set cursor at (0,0)
ret ; and return to caller
cls endp
clreol proc near ; clear from cursor to end
; of line using attribute
; in BH, registers AX, CX,
; and DX destroyed
call getxy ; get current cursor position
mov cx,dx ; current position = "upper
; left corner" of window;
mov dl,columns ; "lower right corner" X is
; max columns, Y is same
; as upper left corner
mov ax,600h ; ROM BIOS function 06h =
; "scroll or initialize
; window"
int 10h ; transfer to ROM BIOS
ret ; return to caller
clreol endp
home proc near ; put cursor at home position
mov dx,0 ; set (X,Y) = (0,0)
call gotoxy ; position the cursor
ret ; return to caller
home endp
gotoxy proc near ; position the cursor
; call with DL = X, DH = Y
push bx ; save registers
push ax
mov bh,0 ; assume page 0
mov ah,2 ; ROM BIOS function 02h =
; set cursor position
int 10h ; transfer to ROM BIOS
pop ax ; restore registers
pop bx
ret ; and return to caller
gotoxy endp
getxy proc near ; get cursor position,
; returns DL = X, DH = Y
push ax ; save registers
push bx
push cx
mov ah,3 ; ROM BIOS function 03h =
; get cursor position
mov bh,0 ; assume page 0
int 10h ; transfer to ROM BIOS
pop cx ; restore registers
pop bx
pop ax
ret ; and return to caller
getxy endp
ctrl_code proc near ; process control code
; call with AL = char
cmp al,cr ; if carriage return
je ctrl8 ; just send it
cmp al,lf ; if linefeed
je ctrl8 ; just send it
cmp al,bsp ; if backspace
je ctrl8 ; just send it
cmp al,26 ; is it cls control code?
jne ctrl7 ; no, jump
mov bh,dattr ; cls control code, clear
call cls ; screen and home cursor
jmp ctrl9
ctrl7:
cmp al,escape ; is it Escape character?
jne ctrl9 ; no, throw it away
call esc_seq ; yes, emulate CRT terminal
jmp ctrl9
ctrl8: call pc_out ; send CR, LF, or backspace
; to the display
ctrl9: ret ; return to caller
ctrl_code endp
esc_seq proc near ; decode Televideo 950 escape
; sequence for screen control
call com_in ; get next character
cmp al,84 ; is it clear to end of line?
jne esc_seq1 ; no, jump
mov bh,dattr ; yes, clear to end of line
call clreol
jmp esc_seq2 ; then exit
esc_seq1:
cmp al,61 ; is it cursor positioning?
jne esc_seq2 ; no jump
call com_in ; yes, get Y parameter
sub al,33 ; and remove offset
mov dh,al
call com_in ; get X parameter
sub al,33 ; and remove offset
mov dl,al
call gotoxy ; position the cursor
esc_seq2: ; return to caller
ret
esc_seq endp
asc_enb proc near ; capture serial-port interrupt
; vector and enable interrupt
; save address of previous
; interrupt handler...
mov ax,3500h+com_int ; function 35h = get vector
int 21h ; transfer to MS-DOS
mov word ptr oldvec+2,es
mov word ptr oldvec,bx
; now install our handler...
push ds ; save our data segment
mov ax,cs ; set DS:DX = address
mov ds,ax ; of our interrupt handler
mov dx,offset asc_int
mov ax,2500h+com_int ; function 25h = set vector
int 21h ; transfer to MS-DOS
pop ds ; restore data segment
mov dx,com_mcr ; set modem-control register
mov al,0bh ; DTR, RTS, and OUT2 bits
out dx,al
mov dx,com_ier ; set interrupt-enable
mov al,1 ; register on serial-
out dx,al ; port controller
in al,pic_mask ; read current 8259 mask
and al,not int_mask ; set mask for COM port
out pic_mask,al ; write new 8259 mask
ret ; back to caller
asc_enb endp
asc_dsb proc near ; disable interrupt and
; release interrupt vector
in al,pic_mask ; read current 8259 mask
or al,int_mask ; reset mask for COM port
out pic_mask,al ; write new 8259 mask
push ds ; save our data segment
lds dx,oldvec ; load address of
; previous interrupt handler
mov ax,2500h+com_int ; function 25h = set vector
int 21h ; transfer to MS-DOS
pop ds ; restore data segment
ret ; back to caller
asc_dsb endp
asc_int proc far ; interrupt service routine
; for serial port
sti ; turn interrupts back on
push ax ; save registers
push bx
push dx
push ds
mov ax,_DATA ; make our data segment
mov ds,ax ; addressable
cli ; clear interrupts for
; pointer manipulation
mov dx,com_data ; DX = data port address
in al,dx ; read this character
mov bx,asc_in ; get buffer pointer
mov [asc_buf+bx],al ; store this character
inc bx ; bump pointer
cmp bx,bufsiz ; time for wrap?
jne asc_int1 ; no, jump
xor bx,bx ; yes, reset pointer
asc_int1: ; store updated pointer
mov asc_in,bx
sti ; turn interrupts back on
mov al,20h ; send EOI to 8259
out pic_eoi,al
pop ds ; restore all registers
pop dx
pop bx
pop ax
iret ; return from interrupt
asc_int endp
_TEXT ends
_DATA segment word public 'DATA'
in_char db 0 ; PC keyboard input char
in_flag db 0 ; <>0 if char waiting
columns db 0 ; highest numbered column in
; current display mode (39 or 79)
msg1 db cr,lf
db 'Display must be text mode.'
db cr,lf
msg1_len equ $-msg1
msg2 db 'Terminal emulator running...'
db cr,lf
msg2_len equ $-msg2
msg3 db 'Exit from terminal emulator.'
db cr,lf
msg3_len equ $-msg3
oldvec dd 0 ; original contents of serial-
; port interrupt vector
asc_in dw 0 ; input pointer to ring buffer
asc_out dw 0 ; output pointer to ring buffer
asc_buf db bufsiz dup (?) ; communications buffer
_DATA ends
STACK segment para stack 'STACK'
db 128 dup (?)
STACK ends
end talk ; defines entry point
──────────────────────────────────────────────────────────────────────────
Figure 7-1. TALK.ASM: A simple terminal-emulator program for IBM
PC─compatible computers. This program demonstrates use of the MS-DOS and
ROM BIOS video and keyboard functions and direct control of the
serial-communications adapter.
The TALK program illustrates the methods that an application should use to
take over and service interrupts from the serial port without running
afoul of MS-DOS conventions.
The program begins with some equates and conditional assembly statements
that configure the program for half- or full-duplex and for the desired
serial port (COM1 or COM2). At entry from MS-DOS, the main routine of the
program (the procedure named talk) verifies that the display is in a text
mode, clears the screen, and calls the asc_enb routine to take over the
serial-port interrupt vector and enable interrupts. The talk procedure
then enters a loop that reads the keyboard and sends the characters out
the serial port and then reads the serial port and puts the characters on
the display──in other words, it causes the PC to emulate a simple CRT
terminal.
The TALK program intercepts and handles control codes (carriage return,
linefeed, and so forth) appropriately. It detects escape sequences and
handles them as a subset of the Televideo 950 terminal capabilities. (You
can easily modify the program to emulate any other cursor-addressable
terminal.) When one of the PC's special function keys is pressed, the
program disables serial-port interrupts, releases the serial-port
interrupt vector, and exits back to MS-DOS.
There are several TALK program procedures that are worth your attention
because they can easily be incorporated into other programs. These are
listed in the following table.
Procedure     Action
──────────────────────────────────────────────────────────────────────────
asc_enb       Takes over the serial-port interrupt vector and enables
              interrupts by writing to the modem-control register of
              the INS8250 and the interrupt-mask register of the
              8259A.
asc_dsb       Restores the original state of the serial-port
              interrupt vector and disables interrupts by writing to
              the interrupt-mask register of the 8259A.
asc_int       Services serial-port interrupts, placing received
              characters into a ring buffer.
com_stat      Tests whether characters from the serial port are
              waiting in the ring buffer.
com_in        Removes characters from the interrupt handler's ring
              buffer and increments the buffer pointers
              appropriately.
com_out       Sends one character to the serial port.
cls           Calls the ROM BIOS video driver to clear the screen.
clreol        Calls the ROM BIOS video driver to clear from the
              current cursor position to the end of the line.
home          Places the cursor in the upper left corner of the
              screen.
gotoxy        Positions the cursor at the desired position on the
              display.
getxy         Obtains the current cursor position.
pc_out        Sends one character to the PC's display.
pc_stat       Gets status for the PC's keyboard.
pc_in         Returns a character from the PC's keyboard.
──────────────────────────────────────────────────────────────────────────
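The ring buffer shared by asc_int, com_stat, and com_in is the piece most
worth reusing. The following C sketch models the same logic on an ordinary
array so that it can be studied away from the hardware. It is an
approximation rather than a translation: the names are hypothetical, and
ring_get returns -1 when the buffer is empty instead of waiting as com_in
does.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096              /* same size as bufsiz in TALK.ASM */

typedef struct {
    volatile size_t in;             /* next slot to fill (producer side)  */
    volatile size_t out;            /* next slot to empty (consumer side) */
    uint8_t buf[RING_SIZE];
} ring_t;

/* Like com_stat: nonzero if at least one character is waiting. */
int ring_has_data(const ring_t *r)
{
    return r->in != r->out;
}

/* Like the body of asc_int: store a character and advance the input
   index, wrapping to zero at the end of the buffer. */
void ring_put(ring_t *r, uint8_t c)
{
    size_t i = r->in;

    r->buf[i] = c;
    if (++i == RING_SIZE)
        i = 0;
    r->in = i;
}

/* Like com_in, but nonblocking: return the next character, or -1 if
   the buffer is empty. */
int ring_get(ring_t *r)
{
    size_t i = r->out;
    int c;

    if (i == r->in)
        return -1;
    c = r->buf[i];
    if (++i == RING_SIZE)
        i = 0;
    r->out = i;
    return c;
}
```

As in the assembly listing, nothing here detects overflow. If the producer
gets a full buffer ahead of the consumer, the two indexes become equal again
and the unread characters are silently lost.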
────────────────────────────────────────────────────────────────────────────
Chapter 8 File Management
The dual heritage of MS-DOS──CP/M and UNIX/XENIX──is perhaps most clearly
demonstrated in its file-management services. In general, MS-DOS provides
at least two distinct operating-system calls for each major file or record
operation. This chapter breaks this overlapping battery of functions into
two groups and explains the usage, advantages, and disadvantages of each.
I will refer to the set of file and record functions that are compatible
with CP/M as FCB functions. These functions rely on a data structure
called a file control block (hence, FCB) to maintain certain bookkeeping
information about open files. This structure resides in the application
program's memory space. The FCB functions allow the programmer to create,
open, close, and delete files and to read or write records of any size at
any record position within such files. These functions do not support the
hierarchical (treelike) file structure that was first introduced in MS-DOS
version 2.0, so they can be used only to access files in the current
subdirectory for a given disk drive.
I will refer to the set of file and record functions that provide
compatibility with UNIX/XENIX as the handle functions. These functions
allow the programmer to open or create files by passing MS-DOS a
null-terminated string that describes the file's location in the
hierarchical file structure (the drive and path), the file's name, and its
extension. If the open or create operation is successful, MS-DOS returns a
16-bit token, or handle, that is saved by the application program and used
to specify the file in subsequent operations.
When you use the handle functions, the operating system maintains the data
structures that contain bookkeeping information about the file inside its
own memory space, and these structures are not accessible to the
application program. The handle functions fully support the hierarchical
file structure, allowing the programmer to create, open, close, and delete
files in any subdirectory on any disk drive and to read or write records
of any size at any byte offset within such files.
Although we are discussing the FCB functions first in this chapter for
historical reasons, new MS-DOS applications should always be written using
the more powerful handle functions. Use of the FCB functions in new
programs should be avoided, unless compatibility with MS-DOS version 1.0
is needed.
Using the FCB Functions
Understanding the structure of the file control block is the key to
success with the FCB family of file and record functions. An FCB is a
37-byte data structure allocated within the application program's memory
space; it is divided into many fields (Figure 8-1). Typically, the
program initializes an FCB with a drive code, a filename, and an extension
(conveniently accomplished with the parse-filename service, Int 21H
Function 29H) and then passes the address of the FCB to MS-DOS to open or
create the file. If the file is successfully opened or created, MS-DOS
fills in certain fields of the FCB with information from the file's entry
in the disk directory. This information includes the file's exact size in
bytes and the date and time the file was created or last updated. MS-DOS
also places certain other information within a reserved area of the FCB;
however, this area is used by the operating system for its own purposes
and varies among different versions of MS-DOS. Application programs should
never modify the reserved area.
For compatibility with CP/M, MS-DOS automatically sets the record-size
field of the FCB to 128 bytes. If the program does not want to use this
default record size, it must place the desired size (in bytes) into the
record-size field after the open or create operation. Subsequently, when
the program needs to read or write records from the file, it must pass the
address of the FCB to MS-DOS; MS-DOS, in turn, keeps the FCB updated with
information about the current position of the file pointer and the size of
the file. Data is always read to or written from the current disk transfer
area (DTA), whose address is set with Int 21H Function 1AH. If the
application program wants to perform random record access, it must set the
record number into the FCB before issuing each function call; when
sequential record access is being used, MS-DOS maintains the FCB and no
special intervention is needed from the application.
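The arithmetic behind random record access is straightforward. The C sketch
below shows how a relative-record number and the record size locate a record
within the file, and how the same record number splits into a block number
and a record within the block. It assumes the usual FCB convention of 128
records per block, and the function names are illustrative.

```c
#include <stdint.h>

#define RECORDS_PER_BLOCK 128u      /* assumed FCB block size, in records */

/* Byte offset of a record from the start of the file. */
uint32_t record_byte_offset(uint32_t rel_record, uint16_t rec_size)
{
    return rel_record * (uint32_t)rec_size;
}

/* Block number that contains the given relative record. */
uint16_t record_block(uint32_t rel_record)
{
    return (uint16_t)(rel_record / RECORDS_PER_BLOCK);
}

/* Position of the record within its block (0 through 127). */
uint8_t record_in_block(uint32_t rel_record)
{
    return (uint8_t)(rel_record % RECORDS_PER_BLOCK);
}
```

With the default 128-byte record size, record 10 begins at byte 1280. With a
512-byte record size, record 3 begins at byte 1536.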
Byte offset
00H ┌───────────────────────────────────────────────────────┐
│ Drive identification │ Note 1
01H ├───────────────────────────────────────────────────────┤
│ Filename (8 characters) │ Note 2
09H ├───────────────────────────────────────────────────────┤
│ Extension (3 characters) │ Note 2
0CH ├───────────────────────────────────────────────────────┤
│ Current block number │ Note 9
0EH ├───────────────────────────────────────────────────────┤
│ Record size │ Note 10
10H ├───────────────────────────────────────────────────────┤
│ File size (4 bytes) │ Notes 3, 6
14H ├───────────────────────────────────────────────────────┤
│ Date created/updated │ Note 7
16H ├───────────────────────────────────────────────────────┤
│ Time created/updated │ Note 8
18H ├───────────────────────────────────────────────────────┤
│ Reserved │
20H ├───────────────────────────────────────────────────────┤
│ Current-record number │ Note 9
21H ├───────────────────────────────────────────────────────┤
│ Relative-record number (4 bytes) │ Note 5
└───────────────────────────────────────────────────────┘
Figure 8-1. Normal file control block. Total length is 37 bytes (25H
bytes). See the notes for Figures 8-1 and 8-3 later in this section.
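The layout in Figure 8-1 can be mirrored in C. The following is only a
sketch: the structure name, member names, and the helper
fcb_set_record_size are hypothetical, and the code assumes a C11 compiler
that adds no padding to a structure made up entirely of unsigned char
members (the static assertions check this at compile time).

```c
#include <assert.h>
#include <stddef.h>

/* Normal FCB, byte for byte as in Figure 8-1.  Multi-byte fields are kept
   as byte arrays so that no alignment padding is introduced. */
struct fcb {
    unsigned char drive;          /* 00H: 0 = default, 1 = A:, 2 = B:  */
    unsigned char name[8];        /* 01H: filename, blank padded       */
    unsigned char ext[3];         /* 09H: extension, blank padded      */
    unsigned char cur_block[2];   /* 0CH: current block number         */
    unsigned char rec_size[2];    /* 0EH: record size                  */
    unsigned char file_size[4];   /* 10H: file size                    */
    unsigned char date[2];        /* 14H: date created/updated         */
    unsigned char time[2];        /* 16H: time created/updated         */
    unsigned char reserved[8];    /* 18H: reserved, do not modify      */
    unsigned char cur_record;     /* 20H: current-record number        */
    unsigned char rel_record[4];  /* 21H: relative-record number       */
};

/* Compile-time checks against the offsets shown in Figure 8-1. */
static_assert(sizeof(struct fcb) == 37, "normal FCB must be 37 bytes");
static_assert(offsetof(struct fcb, rec_size) == 0x0E, "record size at 0EH");
static_assert(offsetof(struct fcb, reserved) == 0x18, "reserved at 18H");
static_assert(offsetof(struct fcb, rel_record) == 0x21, "rel. record at 21H");

/* Hypothetical helper: store a record size with the least significant
   byte at the lower address.  Call it only after a successful open or
   create, because those calls reset the field to 128. */
void fcb_set_record_size(struct fcb *f, unsigned size)
{
    f->rec_size[0] = (unsigned char)(size & 0xFF);
    f->rec_size[1] = (unsigned char)((size >> 8) & 0xFF);
}
```

Calling fcb_set_record_size(&f, 1024) stores the bytes 00H and 04H at
offsets 0EH and 0FH.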
In general, MS-DOS functions that use FCBs accept the full address of the
FCB in the DS:DX registers and pass back a return code in the AL register
(Figure 8-2). For file-management calls (open, close, create, and
delete), this return code is zero if the function was successful and 0FFH
(255) if the function failed. For the FCB-type record read and write
functions, the success code returned in the AL register is again zero, but
there are several failure codes. Under MS-DOS version 3.0 or later, more
detailed error reporting can be obtained by calling Int 21H Function 59H
(Get Extended Error Information) after a failed FCB function call.
When a program is loaded under MS-DOS, the operating system sets up two
FCBs in the program segment prefix, at offsets 005CH and 006CH. These are
often referred to as the default FCBs, and they are included to provide
upward compatibility from CP/M. MS-DOS parses the first two parameters in
the command line that invokes the program (excluding any redirection
directives) into the default FCBs, under the assumption that they may be
file specifications. The application must determine whether they really
are filenames or not. In addition, because the default FCBs overlap and
are not in a particularly convenient location (especially for .EXE
programs), they usually must be copied elsewhere in order to be used
safely. (See Chapter 3.)
──────────────────────────────────────────────────────────────────────────
; filename was previously
; parsed into "my_fcb"
mov dx,seg my_fcb ; DS:DX = address of
mov ds,dx ; file control block
mov dx,offset my_fcb
mov ah,0fh ; function 0fh = open
int 21h
or al,al ; was open successful?
jnz error ; no, jump to error routine
.
.
.
my_fcb db 37 dup (0) ; file control block
──────────────────────────────────────────────────────────────────────────
Figure 8-2. A typical FCB file operation. This sequence of code attempts
to open the file whose name was previously parsed into the FCB named
my_fcb.
Note that the structures of FCBs under CP/M and MS-DOS are not identical.
However, the differences lie chiefly in the reserved areas of the FCBs
(which should not be manipulated by application programs in any case), so
well-behaved CP/M applications should be relatively easy to port into
MS-DOS. It seems, however, that few such applications exist. Many of the
tricks that were played by clever CP/M programmers to increase performance
or circumvent the limitations of that operating system can cause severe
problems under MS-DOS, particularly in networking environments. At any
rate, much better performance can be achieved by thoroughly rewriting the
CP/M applications to take advantage of the superior capabilities of
MS-DOS.
You can use a special FCB variant called an extended file control block to
create or access files with special attributes (such as hidden or
read-only files), volume labels, and subdirectories. An extended FCB has a
7-byte header followed by the 37-byte structure of a normal FCB (Figure
8-3). The first byte contains 0FFH, which could never be a legal drive
code and thus indicates to MS-DOS that an extended FCB is being used. The
next 5 bytes are reserved and are unused in current versions of MS-DOS.
The seventh byte contains the attribute of the special file type that is
being accessed. (Attribute bytes are discussed in more detail in Chapter
9.) Any MS-DOS function that uses a normal FCB can also use an extended
FCB.
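As a rough illustration of the header just described, the following C
sketch fills in a 44-byte extended FCB. The names xfcb_init,
fcb_is_extended, and the XFCB_ constants are hypothetical; the attribute
value is simply passed through, because the attribute bits themselves are
defined in Chapter 9.

```c
#include <string.h>

enum {
    XFCB_SIZE         = 44,    /* 7-byte header + 37-byte normal FCB */
    XFCB_FLAG         = 0xFF,  /* first byte marks an extended FCB   */
    XFCB_ATTR_OFFSET  = 6,     /* seventh byte: attribute            */
    XFCB_DRIVE_OFFSET = 7,     /* normal FCB starts here             */
    XFCB_NAME_LENGTH  = 11     /* 8-character name + 3-character ext */
};

/* Hypothetical helper: zero the whole structure, then set the flag byte,
   attribute, and drive code, and blank-fill the name and extension. */
void xfcb_init(unsigned char xfcb[XFCB_SIZE],
               unsigned char attribute, unsigned char drive)
{
    memset(xfcb, 0, XFCB_SIZE);            /* reserved bytes stay zero */
    xfcb[0] = XFCB_FLAG;
    xfcb[XFCB_ATTR_OFFSET] = attribute;
    xfcb[XFCB_DRIVE_OFFSET] = drive;
    memset(xfcb + XFCB_DRIVE_OFFSET + 1, ' ', XFCB_NAME_LENGTH);
}

/* 0FFH can never be a legal drive code, so the first byte alone
   distinguishes an extended FCB from a normal one. */
int fcb_is_extended(const unsigned char *fcb)
{
    return fcb[0] == XFCB_FLAG;
}
```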
The FCB file- and record-management functions fall into the broad
classifications listed in the table that follows Figure 8-3.
Byte
offset
00H ┌───────────────────────────────────────────────────────┐
│ 0FFH │ Note 11
01H ├───────────────────────────────────────────────────────┤
│ Reserved (5 bytes, must be zero) │
06H ├───────────────────────────────────────────────────────┤
│ Attribute byte │ Note 12
07H ├───────────────────────────────────────────────────────┤
│ Drive identification │ Note 1
08H ├───────────────────────────────────────────────────────┤
│ Filename (8 characters) │ Note 2
10H ├───────────────────────────────────────────────────────┤
│ Extension (3 characters) │ Note 2
13H ├───────────────────────────────────────────────────────┤
│ Current-block number │ Note 9
15H ├───────────────────────────────────────────────────────┤
│ Record size │ Note 10
17H ├───────────────────────────────────────────────────────┤
│ File size (4 bytes) │ Notes 3, 6
1BH ├───────────────────────────────────────────────────────┤
│ Date created/updated │ Note 7
1DH ├───────────────────────────────────────────────────────┤
│ Time created/updated │ Note 8
1FH ├───────────────────────────────────────────────────────┤
│ Reserved │
27H ├───────────────────────────────────────────────────────┤
│ Current-record number │ Note 9
28H ├───────────────────────────────────────────────────────┤
│ Relative-record number (4 bytes) │ Note 5
└───────────────────────────────────────────────────────┘
Figure 8-3. Extended file control block. Total length is 44 bytes (2CH
bytes). See the notes for Figures 8-1 and 8-3 later in this section.
Function Action
──────────────────────────────────────────────────────────────────────────
Common FCB file operations
0FH Open file.
10H Close file.
16H Create file.
Common FCB record operations
14H Perform sequential read.
15H Perform sequential write.
21H Perform random read.
22H Perform random write.
27H Perform random block read.
28H Perform random block write.
Other vital FCB operations
1AH Set disk transfer address.
29H Parse filename.
Less commonly used FCB file operations
13H Delete file.
17H Rename file.
Less commonly used FCB record operations
23H Obtain file size.
24H Set relative-record number.
──────────────────────────────────────────────────────────────────────────
Several of these functions have special properties. For example, Int 21H
Functions 27H (Random Block Read) and 28H (Random Block Write) allow
reading and writing of multiple records of any size and also update the
random-record field automatically (unlike Int 21H Functions 21H and
22H). Int 21H Function 28H can truncate a file to any desired size, and
Int 21H Function 17H used with an extended FCB can alter a volume label
or rename a subdirectory.
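The random-record field counts records from the start of the file, whereas
the sequential functions track position as a current block and a current
record within that block. The C sketch below shows how the two views
relate. It assumes the CP/M-derived convention of 128 records per block,
which is not stated in this section, and all function names are
hypothetical.

```c
enum { RECORDS_PER_BLOCK = 128 };   /* assumed CP/M-style block size */

/* Combine a block number and a record-within-block into a record number
   relative to the start of the file. */
unsigned long fcb_relative_record(unsigned block, unsigned record)
{
    return (unsigned long)block * RECORDS_PER_BLOCK + record;
}

/* Split a relative-record number back into block and record. */
void fcb_split_record(unsigned long rel, unsigned *block, unsigned *record)
{
    *block  = (unsigned)(rel / RECORDS_PER_BLOCK);
    *record = (unsigned)(rel % RECORDS_PER_BLOCK);
}

/* Byte offset of a record, given the record size held in the FCB. */
unsigned long fcb_byte_offset(unsigned long rel, unsigned rec_size)
{
    return rel * rec_size;
}
```

For example, block 2 and record 5 correspond to relative record 261; with
1024-byte records, that record begins at byte offset 267,264.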
Section 2 of this book, "MS-DOS Functions Reference," gives detailed
specifications for each of the FCB file and record functions, along with
assembly-language examples. It is also instructive to compare the
preceding groups with the corresponding groups of handle-type functions
listed later in this chapter under "Using the Handle Functions."
──────────────────────────────────────────────────────────────────────────
Notes for Figures 8-1 and 8-3
1. The drive identification is a binary number: 00=default drive,
01=drive A:, 02=drive B:, and so on. If the application program
supplies the drive code as zero (default drive), MS-DOS fills in the
code for the actual current disk drive after a successful open or
create call.
2. File and extension names must be left justified and padded with
blanks.
3. The file size, date, time, and reserved fields should not be
modified by applications.
4. All word fields are stored with the least significant byte at the
lower address.
5. The relative-record field is treated as 4 bytes if the record size
is less than 64 bytes; otherwise, only the first 3 bytes of this
field are used.
6. The file-size field is in the same format as in the directory, with
the less significant word at the lower address.
7. The date field is mapped as in the directory. Viewed as a 16-bit
word (as it would appear in a register), the field is broken down as
follows:
F E D C B A 9 8 7 6 5 4 3 2 1 0
┌─────────────────────┬─────────────────────┬─────────────────────┐
│ Year │ Month │ Day │
└─────────────────────┴─────────────────────┴─────────────────────┘
Bits Contents
────────────────────────────────────────────────────────────────────────
00H─04H Day (1─31)
05H─08H Month (1─12)
09H─0FH Year, relative to 1980
────────────────────────────────────────────────────────────────────────
8. The time field is mapped as in the directory. Viewed as a 16-bit
word (as it would appear in a register), the field is broken down as
follows:
F E D C B A 9 8 7 6 5 4 3 2 1 0
┌───────────────────┬───────────────────────┬─────────────────────┐
│ Hours │ Minutes │ 2-second increments │
└───────────────────┴───────────────────────┴─────────────────────┘
Bits Contents
────────────────────────────────────────────────────────────────────────
00H─04H 2-second increments (0─29)
05H─0AH Minutes (0─59)
0BH─0FH Hours (0─23)
────────────────────────────────────────────────────────────────────────
9. The current-block and current-record numbers are used together on
sequential reads and writes. This simulates the behavior of CP/M.
10. The Int 21H open (0FH) and create (16H) functions set the
record-size field to 128 bytes, to provide compatibility with CP/M.
If you use another record size, you must fill it in after the open
or create operation.
11. An 0FFH (255) in the first byte of the structure signifies that it
is an extended file control block. You can use extended FCBs with
any of the functions that accept an ordinary FCB. (See also note
12.)
12. The attribute byte in an extended FCB allows access to files with
the special characteristics hidden, system, or read-only. You can
also use extended FCBs to read volume labels and the contents of
special subdirectory files.
──────────────────────────────────────────────────────────────────────────
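The bit assignments in notes 7 and 8 translate directly into shifts and
masks. The following C sketch unpacks the 16-bit date and time words; the
structure and function names are hypothetical.

```c
struct dos_date { unsigned year, month, day; };
struct dos_time { unsigned hours, minutes, seconds; };

/* Date word: bits 9-15 year relative to 1980, bits 5-8 month,
   bits 0-4 day. */
struct dos_date unpack_dos_date(unsigned word)
{
    struct dos_date d;
    d.year  = 1980 + ((word >> 9) & 0x7F);
    d.month = (word >> 5) & 0x0F;
    d.day   = word & 0x1F;
    return d;
}

/* Time word: bits 11-15 hours, bits 5-10 minutes, bits 0-4 seconds
   stored in 2-second increments. */
struct dos_time unpack_dos_time(unsigned word)
{
    struct dos_time t;
    t.hours   = (word >> 11) & 0x1F;
    t.minutes = (word >> 5) & 0x3F;
    t.seconds = (word & 0x1F) * 2;
    return t;
}
```

Applied to the date word 0B43H and the time word 52A1H that appear in
Figure 8-5, these functions yield October 3, 1985, and 10:21:02.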
FCB File-Access Skeleton
The following is a typical program sequence to access a file using the
FCB, or traditional, functions (Figure 8-4):
1. Zero out the prospective FCB.
2. Obtain the filename from the user, from the default FCBs, or from the
command tail in the PSP.
3. If the filename was not obtained from one of the default FCBs, parse
the filename into the new FCB using Int 21H Function 29H.
4. Open the file (Int 21H Function 0FH) or, if writing new data only,
create the file or truncate any existing file of the same name to zero
length (Int 21H Function 16H).
5. Set the record-size field in the FCB, unless you are using the default
record size. Recall that it is important to do this after a successful
open or create operation. (See Figure 8-5.)
6. Set the relative-record field in the FCB if you are performing random
record I/O.
7. Set the disk transfer area address using Int 21H Function 1AH. You can
skip this step if the buffer address has not changed since the last call
to this function. If the application never sets the DTA, the DTA address
defaults to offset 0080H in the PSP.
8. Request the needed read- or write-record operation (Int 21H Function
14H─Sequential Read, 15H─Sequential Write, 21H─Random Read,
22H─Random Write, 27H─Random Block Read, 28H─Random Block Write).
9. If the program is not finished processing the file, go to step 6;
otherwise, close the file (Int 21H Function 10H). If the file was
used for reading only, you can skip the close operation under early
versions of MS-DOS. However, this shortcut can cause problems under
MS-DOS versions 3.0 and later, especially when the files are being
accessed across a network.
──────────────────────────────────────────────────────────────────────────
recsize equ 1024 ; file record size
.
.
.
mov ah,29h ; parse input filename
mov al,1 ; skip leading blanks
mov si,offset fname1 ; address of filename
mov di,offset fcb1 ; address of FCB
int 21h
or al,al ; jump if name
jnz name_err ; was bad
.
.
.
mov ah,29h ; parse output filename
mov al,1 ; skip leading blanks
mov si,offset fname2 ; address of filename
mov di,offset fcb2 ; address of FCB
int 21h
or al,al ; jump if name
jnz name_err ; was bad
.
.
.
mov ah,0fh ; open input file
mov dx,offset fcb1
int 21h
or al,al ; open successful?
jnz no_file ; no, jump
.
.
.
mov ah,16h ; create and open
mov dx,offset fcb2 ; output file
int 21h
or al,al ; create successful?
jnz disk_full ; no, jump
.
.
. ; set record sizes
mov word ptr fcb1+0eh,recsize
mov word ptr fcb2+0eh,recsize
.
.
.
mov ah,1ah ; set disk transfer
mov dx,offset buffer ; address for reads
int 21h ; and writes
.
next: . ; process next record
.
mov ah,14h ; sequential read from
mov dx,offset fcb1 ; input file
int 21h
cmp al,01 ; check for end of file
je file_end ; jump if end of file
cmp al,03 ; partial record read at
je file_end ; end of file (zero padded)
or al,al ; other read fault?
jnz bad_read ; jump if bad read
.
.
.
mov ah,15h ; sequential write to
mov dx,offset fcb2 ; output file
int 21h
or al,al ; write successful?
jnz bad_write ; jump if write failed
.
.
.
jmp next ; process next record
.
file_end: . ; reached end of input
.
mov ah,10h ; close input file
mov dx,offset fcb1
int 21h
.
.
.
mov ah,10h ; close output file
mov dx,offset fcb2
int 21h
.
.
.
mov ax,4c00h ; exit with return
int 21h ; code of zero
.
.
.
fname1 db 'OLDFILE.DAT',0 ; name of input file
fname2 db 'NEWFILE.DAT',0 ; name of output file
fcb1 db 37 dup (0) ; FCB for input file
fcb2 db 37 dup (0) ; FCB for output file
buffer db recsize dup (?) ; buffer for file I/O
──────────────────────────────────────────────────────────────────────────
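Figure 8-4. Skeleton of an assembly-language program that performs file
and record I/O using the FCB family of functions. This code assumes that
the DS and ES registers have already been set to point to the segment
containing the FCBs, filenames, and buffer.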
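The skeleton tests the AL register for the values 01 and 03 after the
sequential read. The C sketch below spells out the status values for
Int 21H Function 14H as they are usually documented in the function
reference; they are not listed in this section, so treat the table as an
assumption to be checked against Section 2. The function names are
hypothetical.

```c
/* Describe the AL value returned by a sequential read (Function 14H). */
const char *fcb_read_status(unsigned char al)
{
    switch (al) {
    case 0x00: return "record read successfully";
    case 0x01: return "end of file, no data read";
    case 0x02: return "segment wrap, DTA too near end of segment";
    case 0x03: return "end of file, partial record padded with zeros";
    default:   return "unexpected status";
    }
}

/* Nonzero if the DTA holds data that the caller may want to process.
   Note that status 03 still delivers a final, zero-padded record. */
int fcb_read_has_data(unsigned char al)
{
    return al == 0x00 || al == 0x03;
}
```

A program that must not lose the tail of a file whose length is not a
multiple of the record size would process the buffer when the status is 03
before treating the file as finished.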
Byte Offset FCB before open FCB contents FCB after open
┌────────────────────┬────────────────────┬────────────────────┐
00H │ 00 │ Drive │ 03 │
├────────────────────┼────────────────────┼────────────────────┤
01H │ 4D │ │ 4D │
02H │ 59 │ │ 59 │
03H │ 46 │ │ 46 │
04H │ 49 │ Filename │ 49 │
05H │ 4C │ │ 4C │
06H │ 45 │ │ 45 │
07H │ 20 │ │ 20 │
08H │ 20 │ │ 20 │
├────────────────────┼────────────────────┼────────────────────┤
09H │ 44 │ │ 44 │
0AH │ 41 │ Extension │ 41 │
0BH │ 54 │ │ 54 │
├────────────────────┼────────────────────┼────────────────────┤
0CH │ 00 │ │ 00 │
0DH │ 00 │ Current block │ 00 │
├────────────────────┼────────────────────┼────────────────────┤
0EH │ 00 │ │ 80 │
0FH │ 00 │ Record size │ 00 │
├────────────────────┼────────────────────┼────────────────────┤
10H │ 00 │ │ 80 │
11H │ 00 │ │ 3D │
12H │ 00 │ File size │ 00 │
13H │ 00 │ │ 00 │
├────────────────────┼────────────────────┼────────────────────┤
14H │ 00 │ │ 43 │
15H │ 00 │ File date │ 0B │
├────────────────────┼────────────────────┼────────────────────┤
16H │ 00 │ │ A1 │
17H │ 00 │ File time │ 52 │
├────────────────────┼────────────────────┼────────────────────┤
18H │ 00 │ │ 03 │
19H │ 00 │ │ 02 │
1AH │ 00 │ │ 42 │
1BH │ 00 │ │ 73 │
1CH │ 00 │ Reserved │ 00 │
1DH │ 00 │ │ 01 │
1EH │ 00 │ │ 35 │
1FH │ 00 │ │ 0F │
├────────────────────┼────────────────────┼────────────────────┤
20H │ 00 │ Current record │ 00 │
├────────────────────┼────────────────────┼────────────────────┤
21H │ 00 │ │ 00 │
22H │ 00 │ Relative-record │ 00 │
23H │ 00 │ number │ 00 │
24H │ 00 │ │ 00 │
└────────────────────┴────────────────────┴────────────────────┘
Figure 8-5. A typical file control block before and after a successful
open call (Int 21H Function 0FH).
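To read the multi-byte fields in Figure 8-5, remember that the least
significant byte is stored at the lower address. The C sketch below copies
the "after open" column into an array and extracts the record size and
file size; the array and function names are hypothetical.

```c
/* Read a 16-bit value stored low byte first. */
unsigned read_le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Read a 32-bit value stored low byte first. */
unsigned long read_le32(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* The "FCB after open" column of Figure 8-5. */
const unsigned char fcb_after_open[37] = {
    0x03,                                            /* 00H drive      */
    0x4D, 0x59, 0x46, 0x49, 0x4C, 0x45, 0x20, 0x20,  /* 01H "MYFILE  " */
    0x44, 0x41, 0x54,                                /* 09H "DAT"      */
    0x00, 0x00,                                      /* 0CH cur. block */
    0x80, 0x00,                                      /* 0EH rec. size  */
    0x80, 0x3D, 0x00, 0x00,                          /* 10H file size  */
    0x43, 0x0B,                                      /* 14H file date  */
    0xA1, 0x52,                                      /* 16H file time  */
    0x03, 0x02, 0x42, 0x73, 0x00, 0x01, 0x35, 0x0F,  /* 18H reserved   */
    0x00,                                            /* 20H cur. rec.  */
    0x00, 0x00, 0x00, 0x00                           /* 21H rel. rec.  */
};
```

Here read_le16(fcb_after_open + 0x0E) gives the default record size of 128
bytes, and read_le32(fcb_after_open + 0x10) gives a file size of 3D80H, or
15,744 bytes. The drive byte has changed from 00 (default drive) to 03
(drive C:).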
Points to Remember
Here is a summary of the pros and cons of using the FCB-related file and
record functions in your programs.
Advantages:
■ Under MS-DOS versions 1 and 2, the number of files that can be open
concurrently when using FCBs is unlimited. (This is not true under
MS-DOS versions 3.0 and later, especially if networking software is
running.)
■ File-access methods using FCBs are familiar to programmers with a CP/M
background, and well-behaved CP/M applications require little change in
logical flow to run under MS-DOS.
■ MS-DOS supplies the size, time, and date for a file to its FCB after
the file is opened. The calling program can inspect this information.
Disadvantages:
■ FCBs take up room in the application program's memory space.
■ FCBs offer no support for the hierarchical file structure (no access to
files outside the current directory).
■ FCBs provide no support for file locking/sharing or record locking in
networking environments.
■ In addition to the read or write call itself, file reads or writes
using FCBs require manipulation of the FCB to set record size and
record number, plus a previous call to a separate MS-DOS function to
set the DTA address.
■ Random record I/O using FCBs for a file containing variable-length
records is very clumsy and inconvenient.
■ You must use extended FCBs, which are incompatible with CP/M anyway, to
access or create files with special attributes such as hidden,
read-only, or system.
■ The FCB file functions have poor error reporting. This situation has
been improved somewhat in MS-DOS version 3 because a program can call
the added Int 21H Function 59H (Get Extended Error Information) after
a failed FCB function to obtain additional information.
■ Microsoft discourages use of FCBs. FCBs will make your program more
difficult to port to MS OS/2 later because MS OS/2 does not support
FCBs in protected mode at all.
Using the Handle Functions
The handle file- and record-management functions access files in a fashion
similar to that used under the UNIX/XENIX operating system. Files are
designated by an ASCIIZ string (an ASCII character string terminated by a
null, or zero, byte) that can contain a drive designator, path, filename,
and extension. For example, the file specification
C:\SYSTEM\COMMAND.COM
would appear in memory as the following sequence of bytes:
43 3A 5C 53 59 53 54 45 4D 5C 43 4F 4D 4D 41 4E 44 2E 43 4F 4D 00
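A C string literal is already an ASCIIZ string, so the byte sequence above
can be reproduced with a short routine. This is a sketch only: the name
asciiz_hex is hypothetical, and the output matches the listing only on a
system that uses the ASCII character set.

```c
#include <stdio.h>
#include <string.h>

/* Format every byte of an ASCIIZ string, including the terminating zero,
   as two hex digits separated by spaces.  Returns the length of the text
   written to out, or 0 if out is too small. */
size_t asciiz_hex(const char *spec, char *out, size_t out_size)
{
    size_t n = strlen(spec) + 1;      /* count the null byte too */
    size_t i, used = 0;

    if (out_size < n * 3 + 1)         /* "XX " per byte plus final NUL */
        return 0;
    for (i = 0; i < n; i++)
        used += (size_t)sprintf(out + used, "%02X ",
                                (unsigned)(unsigned char)spec[i]);
    out[used - 1] = '\0';             /* drop the trailing space */
    return used - 1;
}
```

Note that each backslash in the path must be doubled in C source, as in
"C:\\SYSTEM\\COMMAND.COM".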
When a program wishes to open or create a file, it passes the address of
the ASCIIZ string specifying the file to MS-DOS in the DS:DX registers
(Figure 8-6). If the operation is successful, MS-DOS returns a 16-bit
handle to the program in the AX register. The program must save this
handle for further reference.
──────────────────────────────────────────────────────────────────────────
mov ah,3dh ; function 3dh = open
mov al,2 ; mode 2 = read/write
mov dx,seg filename ; address of ASCIIZ
mov ds,dx ; file specification
mov dx,offset filename
int 21h ; request open from DOS
jc error ; jump if open failed
mov handle,ax ; save file handle
.
.
.
filename db 'C:\MYDIR\MYFILE.DAT',0 ; filename
handle dw 0 ; file handle
──────────────────────────────────────────────────────────────────────────
Figure 8-6. A typical handle file operation. This sequence of code
attempts to open the file designated in the ASCIIZ string whose address is
passed to MS-DOS in the DS:DX registers.
When the program requests subsequent operations on the file, it usually
places the handle in the BX register before the call to MS-DOS. All the
handle functions return with the CPU's carry flag cleared if the operation
was successful, or set if the operation failed; in the latter case, the AX
register contains a code describing the failure.
MS-DOS restricts the number of handles that can be active at any one
time──that is, the number of files and devices that can be open
concurrently when using the handle family of functions──in two different
ways:
■ The maximum number of concurrently open files in the system, for all
active processes combined, is specified by the entry
FILES=nn
in the CONFIG.SYS file. This entry determines the number of entries
to be allocated in the system file table; under MS-DOS version 3, the
default value is 8 and the maximum is 255. After MS-DOS is booted and
running, you cannot expand this table to increase the total number of
files that can be open. You must use an editor to modify the CONFIG.SYS
file and then restart the system.
■ The maximum number of concurrently open files for a single process is
20, assuming that sufficient entries are also available in the system
file table. When a program is loaded, MS-DOS preassigns 5 of its
potential 20 handles to the standard devices. Each time the process
issues an open or create call, MS-DOS assigns a handle from the
process's private allocation of 20, until all the handles are used up
or the system file table is full. In MS-DOS versions 3.3 and later, you
can expand the per-process limit of 20 handles with a call to Int 21H
Function 67H (Set Handle Count).
The handle file- and record-management calls may be gathered into the
following broad classifications for study:
Function Action
──────────────────────────────────────────────────────────────────────────
Common handle file operations
3CH Create file (requires ASCIIZ string).
3DH Open file (requires ASCIIZ string).
3EH Close file.
Common handle record operations
42H Set file pointer (also used to find file size).
3FH Read file.
40H Write file.
Less commonly used handle operations
41H Delete file.
43H Get or modify file attributes.
44H IOCTL (I/O Control).
45H Duplicate handle.
46H Redirect handle.
56H Rename file.
57H Get or set file date and time.
5AH Create temporary file (versions 3.0 and later).
5BH Create file (fails if file already exists;
versions 3.0 and later).
5CH Lock or unlock file region (versions 3.0 and
later).
67H Set handle count (versions 3.3 and later).
68H Commit file (versions 3.3 and later).
6CH Extended open file (version 4).
──────────────────────────────────────────────────────────────────────────
Compare the groups of handle-type functions in the preceding table with
the groups of FCB functions outlined earlier, noting the degree of
functional overlap. Section 2 of this book, "MS-DOS Functions Reference,"
gives detailed specifications for each of the handle functions, along with
assembly-language examples.
Handle File-Access Skeleton
The following is a typical program sequence to access a file using the
handle family of functions (Figure 8-7):
1. Get the filename from the user by means of the buffered input service
(Int 21H Function 0AH) or from the command tail supplied by MS-DOS in
the PSP.
2. Put a zero at the end of the file specification in order to create an
ASCIIZ string.
3. Open the file using Int 21H Function 3DH and mode 2 (read/write
access), or create the file using Int 21H Function 3CH. (Be sure to
set the CX register to zero, so that you don't accidentally make a
file with special attributes.) Save the handle that is returned.
4. Set the file pointer using Int 21H Function 42H. You may set the
file-pointer position relative to one of three different locations:
the start of the file, the current pointer position, or the end of the
file. If you are performing sequential record I/O, you can usually
skip this step because MS-DOS will maintain the file pointer for you
automatically.
5. Read from the file (Int 21H Function 3FH) or write to the file (Int
21H Function 40H). Both of these functions require that the BX
register contain the file's handle, the CX register contain the length
of the record, and the DS:DX registers point to the data being
transferred. Both return the actual number of bytes transferred in the
AX register.
In a read operation, if the number of bytes read is less than the
number requested, the end of the file has been reached. In a write
operation, if the number of bytes written is less than the number
requested, the disk containing the file is full. Neither of these
conditions is returned as an error code; that is, the carry flag is
not set.
6. If the program is not finished processing the file, go to step 4;
otherwise, close the file (Int 21H Function 3EH). Any normal exit
from the program will also close all active handles.
──────────────────────────────────────────────────────────────────────────
recsize equ 1024 ; file record size
.
.
.
mov ah,3dh ; open input file
mov al,0 ; mode = read only
mov dx,offset fname1 ; name of input file
int 21h
jc no_file ; jump if no file
mov handle1,ax ; save token for file
.
.
.
mov ah,3ch ; create output file
mov cx,0 ; attribute = normal
mov dx,offset fname2 ; name of output file
int 21h
jc disk_full ; jump if create fails
mov handle2,ax ; save token for file
.
next: . ; process next record
.
mov ah,3fh ; sequential read from
mov bx,handle1 ; input file
mov cx,recsize
mov dx,offset buffer
int 21h
jc bad_read ; jump if read error
or ax,ax ; check bytes transferred
jz file_end ; jump if end of file
.
.
.
mov ah,40h ; sequential write to
mov bx,handle2 ; output file
mov cx,recsize
mov dx,offset buffer
int 21h
jc bad_write ; jump if write error
cmp ax,recsize ; whole record written?
jne disk_full ; jump if disk is full
.
.
.
jmp next ; process next record
.
file_end: . ; reached end of input
.
mov ah,3eh ; close input file
mov bx,handle1
int 21h
.
.
.
mov ah,3eh ; close output file
mov bx,handle2
int 21h
.
.
.
mov ax,4c00h ; exit with return
int 21h ; code of zero
.
.
.
fname1 db 'OLDFILE.DAT',0 ; name of input file
fname2 db 'NEWFILE.DAT',0 ; name of output file
handle1 dw 0 ; token for input file
handle2 dw 0 ; token for output file
buffer db recsize dup (?) ; buffer for file I/O
──────────────────────────────────────────────────────────────────────────
Figure 8-7. Skeleton of an assembly-language program that performs
sequential processing on an input file and writes the results to an output
file using the handle file and record functions. This code assumes that
the DS and ES registers have already been set to point to the segment
containing the buffers and filenames.
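Because the handle functions are modeled on UNIX/XENIX file access, the
same control flow can be sketched with the standard C library. The code
below is an analogy rather than a translation of Figure 8-7: it uses fread
and fwrite instead of Int 21H Functions 3FH and 40H, the names copy_records
and copy_status are hypothetical, and, unlike the skeleton, it writes only
as many bytes as were actually read.

```c
#include <stdio.h>

enum copy_status { COPY_OK = 0, COPY_READ_ERROR, COPY_DISK_FULL };

/* Copy a file one record at a time.  A short read is taken to mean end of
   file, and a short write is taken to mean that the disk is full. */
enum copy_status copy_records(FILE *in, FILE *out,
                              size_t recsize, unsigned char *buffer)
{
    for (;;) {
        size_t got = fread(buffer, 1, recsize, in);

        if (got == 0)                         /* nothing transferred */
            return ferror(in) ? COPY_READ_ERROR : COPY_OK;
        if (fwrite(buffer, 1, got, out) != got)
            return COPY_DISK_FULL;            /* fewer bytes written */
        if (got < recsize)                    /* short read: last record */
            return ferror(in) ? COPY_READ_ERROR : COPY_OK;
    }
}
```

As with the handle calls, the caller supplies the buffer and the record
length on every transfer, and no separate call is needed to set a transfer
address.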
Points to Remember
Here is a summary of the pros and cons of using the handle file and record
operations in your program. Compare this list with the one given earlier
in the chapter for the FCB family of functions.
Advantages:
■ The handle calls provide direct support for I/O redirection and pipes
with the standard input and output devices in a manner functionally
similar to that used by UNIX/XENIX.
■ The handle functions provide direct support for directories (the
hierarchical file structure) and special file attributes.
■ The handle calls support file sharing/locking and record locking in
networking environments.
■ Using the handle functions, the programmer can open channels to
character devices and treat them as files.
■ The handle calls make the use of random record access extremely easy.
The current file pointer can be moved to any byte offset relative to
the start of the file, the end of the file, or the current pointer
position. Records of any length, up to an entire segment (65,535
bytes), can be read to any memory address in one operation.
■ The handle functions have relatively good error reporting in MS-DOS
version 2, and error reporting has been enhanced even further in MS-DOS
versions 3.0 and later.
■ Microsoft strongly encourages use of the handle family of functions in
order to provide upward compatibility with MS OS/2.
Disadvantages:
■ There is a limit per program of 20 concurrently open files and devices
using handles in MS-DOS versions 2.0 through 3.2.
■ Minor gaps still exist in the implementation of the handle functions.
For example, you must still use extended FCBs to change volume labels
and to access the contents of the special files that implement
directories.
MS-DOS Error Codes
When one of the handle file functions fails with the carry flag set, or
when a program calls Int 21H Function 59H (Get Extended Error
Information) following a failed FCB function or other system service, one
of the following error codes may be returned:
Value Meaning
──────────────────────────────────────────────────────────────────────────
MS-DOS version 2 error codes
01H Function number invalid
02H File not found
03H Path not found
04H Too many open files
05H Access denied
06H Handle invalid
07H Memory control blocks destroyed
08H Insufficient memory
09H Memory block address invalid
0AH (10) Environment invalid
0BH (11) Format invalid
0CH (12) Access code invalid
0DH (13) Data invalid
0EH (14) Unknown unit
0FH (15) Disk drive invalid
10H (16) Attempted to remove current directory
11H (17) Not same device
12H (18) No more files
Mappings to critical-error codes
13H (19) Write-protected disk
14H (20) Unknown unit
15H (21) Drive not ready
16H (22) Unknown command
17H (23) Data error (CRC)
18H (24) Bad request-structure length
19H (25) Seek error
1AH (26) Unknown media type
1BH (27) Sector not found
1CH (28) Printer out of paper
1DH (29) Write fault
1EH (30) Read fault
1FH (31) General failure
MS-DOS version 3 and later extended error codes
20H (32) Sharing violation
21H (33) File-lock violation
22H (34) Disk change invalid
23H (35) FCB unavailable
24H (36) Sharing buffer exceeded
25H─31H (37─49) Reserved
32H (50) Unsupported network request
33H (51) Remote machine not listening
34H (52) Duplicate name on network
35H (53) Network name not found
36H (54) Network busy
37H (55) Device no longer exists on network
38H (56) NetBIOS command limit exceeded
39H (57) Error in network adapter hardware
3AH (58) Incorrect response from network
3BH (59) Unexpected network error
3CH (60) Remote adapter incompatible
3DH (61) Print queue full
3EH (62) Not enough room for print file
3FH (63) Print file was deleted
40H (64) Network name deleted
41H (65) Network access denied
42H (66) Incorrect network device type
43H (67) Network name not found
44H (68) Network name limit exceeded
45H (69) NetBIOS session limit exceeded
46H (70) Temporary pause
47H (71) Network request not accepted
48H (72) Print or disk redirection paused
49H─4FH (73─79) Reserved
50H (80) File already exists
51H (81) Reserved
52H (82) Cannot make directory
53H (83) Fail on Int 24H (critical error)
54H (84) Too many redirections
55H (85) Duplicate redirection
56H (86) Invalid password
57H (87) Invalid parameter
58H (88) Net write fault
──────────────────────────────────────────────────────────────────────────
Under MS-DOS versions 3.0 and later, you can also use Int 21H Function
59H to obtain other information about the error, such as the error locus
and the recommended recovery action.
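A program that logs or reports these codes may find it useful to know
which group of the preceding table a value belongs to. The C sketch below
encodes only the ranges shown in the table; the function names are
hypothetical.

```c
/* Name the group of the error-code table that a value falls into. */
const char *dos_error_group(unsigned code)
{
    if (code >= 0x01 && code <= 0x12)
        return "MS-DOS version 2 error code";
    if (code >= 0x13 && code <= 0x1F)
        return "mapping to critical-error code";
    if (code >= 0x20 && code <= 0x58)
        return "MS-DOS version 3 and later extended error code";
    return "not listed";
}

/* Nonzero for values that the table marks as reserved. */
int dos_error_is_reserved(unsigned code)
{
    return (code >= 0x25 && code <= 0x31)
        || (code >= 0x49 && code <= 0x4F)
        || code == 0x51;
}
```

For example, code 05H (access denied) falls in the version 2 group, and
code 20H (sharing violation) falls in the version 3 extended group.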
Critical-Error Handlers
In Chapter 5, we discussed how an application program can take over the
Ctrl-C handler vector (Int 23H) and replace the MS-DOS default handler, to
avoid losing control of the computer when the user enters a Ctrl-C or
Ctrl-Break at the keyboard. Similarly, MS-DOS provides a
critical-error-handler vector (Int 24H) that defines the routine to be
called when unrecoverable hardware faults occur. The default MS-DOS
critical-error handler is the routine that displays a message describing
the error type and the cue
Abort, Retry, Ignore?
This message appears after such actions as the following:
■ Attempting to open a file on a disk drive that doesn't contain a floppy
disk or whose door isn't closed
■ Trying to read a disk sector that contains a CRC error
■ Trying to print when the printer is off line
The unpleasant thing about MS-DOS's default critical-error handler is, of
course, that if the user enters an A for Abort, the application that is
currently executing is terminated abruptly and never has a chance to clean
up and make a graceful exit. Intermediate files may be left on the disk,
files that have been extended using FCBs are not properly closed so that
the directory is updated, interrupt vectors may be left pointing into the
transient program area, and so forth.
To write a truly bombproof MS-DOS application, you must take over the
critical-error-handler vector and point it to your own routine, so that
your program intercepts all catastrophic hardware errors and handles them
appropriately. You can use MS-DOS Int 21H Function 25H to alter the Int
24H vector in a well-behaved manner. When your application exits, MS-DOS
will automatically restore the previous contents of the Int 24H vector
from information saved in the program segment prefix.
MS-DOS calls the critical-error handler for two general classes of
errors──disk-related and non-disk-related──and passes different
information to the handler in the registers for each of these classes.
For disk-related errors, MS-DOS sets the registers as shown in the
following table. (Bits 3─5 of the AH register are relevant only in MS-DOS
versions 3.1 and later.)
Register Bit(s) Significance
──────────────────────────────────────────────────────────────────────────
AH 7 0, to signify disk error
6 Reserved
5 0 = ignore response not allowed
1 = ignore response allowed
4 0 = retry response not allowed
1 = retry response allowed
3 0 = fail response not allowed
1 = fail response allowed
1─2 Area where disk error occurred
00 = MS-DOS area
01 = file allocation table
10 = root directory
11 = files area
0 0 = read operation
1 = write operation
AL 0─7 Drive code (0 = A, 1 = B, and so
forth)
DI 0─7 Driver error code
8─15 Not used
BP:SI Segment:offset of device-driver
header
──────────────────────────────────────────────────────────────────────────
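As a minimal sketch of how a handler might interpret the AH value described in the preceding table, the following portable C fragment unpacks the bit fields. The names disk_error_info and decode_int24_ah are hypothetical and are not part of MS-DOS; a real handler receives AH in a register rather than as a function argument.

```c
/* Hypothetical helper: unpack the AH bit fields that MS-DOS passes
   to an Int 24H handler for a disk-related error (bit 7 = 0). */

struct disk_error_info {
    int is_disk_error;   /* 1 if bit 7 of AH is 0                 */
    int ignore_allowed;  /* bit 5 (MS-DOS 3.1 and later)          */
    int retry_allowed;   /* bit 4 (MS-DOS 3.1 and later)          */
    int fail_allowed;    /* bit 3 (MS-DOS 3.1 and later)          */
    int area;            /* bits 1-2: 0 = MS-DOS area, 1 = FAT,
                            2 = root directory, 3 = files area    */
    int is_write;        /* bit 0: 0 = read, 1 = write            */
};

struct disk_error_info decode_int24_ah(unsigned char ah)
{
    struct disk_error_info info;

    info.is_disk_error  = (ah & 0x80) == 0;
    info.ignore_allowed = (ah >> 5) & 1;
    info.retry_allowed  = (ah >> 4) & 1;
    info.fail_allowed   = (ah >> 3) & 1;
    info.area           = (ah >> 1) & 3;
    info.is_write       = ah & 1;
    return info;
}
```

For example, an AH value of 1BH decodes as a write error in the file allocation table for which the retry and fail responses are allowed but the ignore response is not.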
For non-disk-related errors, the interrupt was generated either as the
result of a character-device error or because a corrupted memory image of
the file allocation table was detected. In this case, MS-DOS sets the
registers as follows:
Register Bit(s) Significance
──────────────────────────────────────────────────────────────────────────
AH 7 1, to signify a non-disk error
DI 0─7 Driver error code
8─15 Not used
BP:SI Segment:offset of device-driver
header
──────────────────────────────────────────────────────────────────────────
To determine whether the critical error was caused by a character device,
use the address in the BP:SI registers to examine the device attribute
word at offset 0004H in the presumed device-driver header. If bit 15 is
set, then the error was indeed caused by a character device, and the
program can inspect the name field of the driver's header to determine the
device.
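The test described above can be sketched in portable C, treating the device-driver header as an array of bytes. The function name char_device_name is hypothetical, and the offset of the name field (0AH) is taken from the standard device-header layout rather than from the text above; a real handler would read the header through the far pointer in BP:SI.

```c
#include <string.h>

/* Offsets within a device-driver header (standard MS-DOS layout). */
#define DEV_ATTR_OFFSET 0x04    /* attribute word, low byte first   */
#define DEV_NAME_OFFSET 0x0A    /* 8-byte, blank-padded device name */
#define DEV_ATTR_CHAR   0x8000u /* bit 15 set = character device    */

/* Returns 1 and copies the device name (trailing blanks removed) into
   name[9] if the header describes a character device; returns 0 and
   stores an empty string otherwise. hdr must hold at least 18 bytes. */
int char_device_name(const unsigned char *hdr, char name[9])
{
    unsigned attr = (unsigned)hdr[DEV_ATTR_OFFSET]
                  | ((unsigned)hdr[DEV_ATTR_OFFSET + 1] << 8);
    int len = 8;

    name[0] = '\0';
    if ((attr & DEV_ATTR_CHAR) == 0)
        return 0;               /* not a character device */

    memcpy(name, hdr + DEV_NAME_OFFSET, 8);
    while (len > 0 && name[len - 1] == ' ')
        len--;                  /* strip the blank padding */
    name[len] = '\0';
    return 1;
}
```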
At entry to a critical-error handler, MS-DOS has already disabled
interrupts and set up the stack as shown in Figure 8-8. A critical-error
handler cannot use any MS-DOS services except Int 21H Functions 01H
through 0CH (Traditional Character I/O), Int 21H Function 30H (Get MS-DOS
Version), and Int 21H Function 59H (Get Extended Error Information).
These functions use a special stack so that the context of the original
function (which generated the critical error) will not be lost.
┌───────┐─┐
│ Flags │ │
├───────┤ │ Flags and CS:IP pushed
│ CS │ ├─ on stack by original
├───────┤ │ Int 21H call
│ IP │ │
├───────┤═╡◄─SS:SP on entry to
│ ES │ │ Int 21H handler
├───────┤ │
│ DS │ │
├───────┤ │
│ BP │ │
├───────┤ │
│ DI │ │
├───────┤ ├─ Registers at point of
│ SI │ │ original Int 21H call
├───────┤ │
│ DX │ │
├───────┤ │
│ CX │ │
├───────┤ │
│ BX │ │
├───────┤ │
│ AX │ │
├───────┤═╡
│ Flags │ │
├───────┤ │
│ CS │ ├─ Return address for
├───────┤ │ Int 24H handler
│ IP │ │
└───────┘─┘
└───── SS:SP on entry to
Int 24H handler
Figure 8-8. The stack at entry to a critical-error handler.
The critical-error handler should return to MS-DOS by executing an IRET,
passing one of the following action codes in the AL register:
Code Meaning
──────────────────────────────────────────────────────────────────────────
0 Ignore the error (MS-DOS acts as though the original
function call had succeeded).
1 Retry the operation.
2 Terminate the process that encountered the error.
3 Fail the function (an error code is returned to the
requesting process). Versions 3.1 and later only.
──────────────────────────────────────────────────────────────────────────
The critical-error handler should preserve all other registers and must
not modify the device-driver header pointed to by BP:SI. A skeleton
example of a critical-error handler is shown in Figure 8-9.
──────────────────────────────────────────────────────────────────────────
; prompt message used by
; critical-error handler
prompt db cr,lf,'Critical Error Occurred: '
db 'Abort, Retry, Ignore, Fail? $'
keys db 'aArRiIfF' ; possible user response keys
keys_len equ $-keys ; (both cases of each allowed)
codes db 2,2,1,1,0,0,3,3 ; codes returned to MS-DOS kernel
; for corresponding response keys
;
; This code is executed during program's initialization
; to install the new critical-error handler.
;
.
.
.
push ds ; save our data segment
mov dx,seg int24 ; DS:DX = handler address
mov ds,dx
mov dx,offset int24
mov ax,2524h ; function 25h = set vector
int 21h ; transfer to MS-DOS
pop ds ; restore data segment
.
.
.
;
; This is the replacement critical-error handler. It
; prompts the user for Abort, Retry, Ignore, or Fail, and
; returns the appropriate code to the MS-DOS kernel.
;
int24 proc far ; entered from MS-DOS kernel
push bx ; save registers
push cx
push dx
push si
push di
push bp
push ds
push es
int24a: mov ax,seg prompt ; display prompt for user
mov ds,ax ; using function 9 (print string
mov es,ax ; terminated by $ character)
mov dx,offset prompt
mov ah,9
int 21h
mov ah,1 ; get user's response
int 21h ; function 1 = read one character
mov di,offset keys ; look up code for response key
mov cx,keys_len
cld
repne scasb
jnz int24a ; prompt again if bad response
; set AL = action code for MS-DOS
; according to key that was entered:
; 0 = ignore, 1 = retry, 2 = abort,
; 3 = fail
mov al,[di+keys_len-1]
pop es ; restore registers
pop ds
pop bp
pop di
pop si
pop dx
pop cx
pop bx
iret ; exit critical-error handler
int24 endp
──────────────────────────────────────────────────────────────────────────
Figure 8-9. A skeleton example of a replacement critical-error handler.
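The skeleton in Figure 8-9 accepts any of the four responses. Under MS-DOS 3.1 and later, a handler could also consult bits 3 through 5 of AH before honoring a response. The following C fragment is a sketch of that idea only: action_for_key is a hypothetical name, the key and code tables mirror those in the figure, and treating Abort as always acceptable is an assumption of this sketch.

```c
#include <string.h>

/* Action codes returned to MS-DOS in AL. */
enum { ACT_IGNORE = 0, ACT_RETRY = 1, ACT_ABORT = 2, ACT_FAIL = 3 };

/* Returns the action code for a response key, or -1 if the key is not
   recognized or AH reports that response as not allowed.
   Assumes MS-DOS 3.1 or later, where AH bits 3-5 are meaningful. */
int action_for_key(char key, unsigned char ah)
{
    static const char keys[]  = "aArRiIfF";
    static const int  codes[] = { 2, 2, 1, 1, 0, 0, 3, 3 };
    const char *p;
    int code;

    if (key == '\0')
        return -1;              /* strchr would match the terminator */
    p = strchr(keys, key);
    if (p == NULL)
        return -1;              /* not a response key */
    code = codes[p - keys];

    if (code == ACT_IGNORE && !(ah & 0x20)) return -1;  /* bit 5 */
    if (code == ACT_RETRY  && !(ah & 0x10)) return -1;  /* bit 4 */
    if (code == ACT_FAIL   && !(ah & 0x08)) return -1;  /* bit 3 */
    return code;    /* Abort is always accepted in this sketch */
}
```

A handler built this way would prompt again whenever the function returns -1, just as the assembly-language skeleton prompts again after an unrecognized key.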
Example Programs: DUMP.ASM and DUMP.C
The programs DUMP.ASM (Figure 8-10) and DUMP.C (Figure 8-11) are
parallel examples of the use of the handle file and record functions. The
assembly-language version, in particular, illustrates features of a
well-behaved MS-DOS utility:
■ The program checks the version of MS-DOS to ensure that all the
functions it is going to use are really available.
■ The program parses the drive, path, and filename from the command tail
in the program segment prefix.
■ The program uses buffered I/O for speed.
■ The program sends error messages to the standard error device.
■ The program sends normal program output to the standard output device,
so that the dump output appears by default on the system console but
can be redirected to other character devices (such as the line printer)
or to a file.
The same features are incorporated into the C version of the program, but
some of them are taken care of behind the scenes by the C runtime library.
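To make the command-tail parsing concrete before you read the listings, here is a small C sketch of the counting logic used by the argc routine in DUMP.ASM. The name count_args is hypothetical. The sketch assumes that the buffer begins with the length byte found at offset 80H of the program segment prefix and that the text after it ends with a carriage return.

```c
/* Count arguments in an MS-DOS command tail, following the approach
   of the argc routine in DUMP.ASM. tail points to the length byte at
   PSP offset 80H; the text that follows is terminated by a carriage
   return. The count starts at 1 to allow for the program name. */
int count_args(const unsigned char *tail)
{
    int count = 1;
    int inside = 0;                      /* 1 while inside an argument */
    const unsigned char *p = tail + 1;   /* skip the length byte       */

    for (; *p != '\r'; p++) {
        if (*p == ' ' || *p == '\t') {
            inside = 0;                  /* blank or tab ends argument */
        } else if (!inside) {
            inside = 1;                  /* first character of a new   */
            count++;                     /* argument: count it         */
        }
    }
    return count;
}
```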
──────────────────────────────────────────────────────────────────────────
name dump
page 55,132
title DUMP--display file contents
;
; DUMP--Display contents of file in hex and ASCII
;
; Build: C>MASM DUMP;
; C>LINK DUMP;
;
; Usage: C>DUMP unit:\path\filename.ext [ >device ]
;
; Copyright (C) 1988 Ray Duncan
;
cr equ 0dh ; ASCII carriage return
lf equ 0ah ; ASCII line feed
tab equ 09h ; ASCII tab code
blank equ 20h ; ASCII space code
cmd equ 80h ; buffer for command tail
blksize equ 16 ; input file record size
stdin equ 0 ; standard input handle
stdout equ 1 ; standard output handle
stderr equ 2 ; standard error handle
_TEXT segment word public 'CODE'
assume cs:_TEXT,ds:_DATA,es:_DATA,ss:STACK
dump proc far ; entry point from MS-DOS
push ds ; save DS:0000 for final
xor ax,ax ; return to MS-DOS, in case
push ax ; function 4ch can't be used
mov ax,_DATA ; make our data segment
mov ds,ax ; addressable via DS register
; check MS-DOS version
mov ax,3000h ; function 30h = get version
int 21h ; transfer to MS-DOS
cmp al,2 ; major version 2 or later?
jae dump1 ; yes, proceed
; if MS-DOS 1.x, display
; error message and exit
mov dx,offset msg3 ; DS:DX = message address
mov ah,9 ; function 9 = print string
int 21h ; transfer to MS-DOS
ret ; then exit the old way
dump1: ; check if filename present
mov bx,offset cmd ; ES:BX = command tail
call argc ; count command arguments
cmp ax,2 ; are there 2 arguments?
je dump2 ; yes, proceed
; missing filename, display
; error message and exit
mov dx,offset msg2 ; DS:DX = message address
mov cx,msg2_len ; CX = message length
jmp dump9 ; go display it
dump2: ; get address of filename
mov ax,1 ; AX = argument number
; ES:BX still = command tail
call argv ; returns ES:BX = address,
; and AX = length
mov di,offset fname ; copy filename to buffer
mov cx,ax ; CX = length
dump3: mov al,es:[bx] ; copy one byte
mov [di],al
inc bx ; bump string pointers
inc di
loop dump3 ; loop until string done
mov byte ptr [di],0 ; add terminal null byte
mov ax,ds ; make our data segment
mov es,ax ; addressable by ES too
; now open the file
mov ax,3d00h ; function 3dh = open file
; mode 0 = read only
mov dx,offset fname ; DS:DX = filename
int 21h ; transfer to MS-DOS
jnc dump4 ; jump, open successful
; open failed, display
; error message and exit
mov dx,offset msg1 ; DS:DX = message address
mov cx,msg1_len ; CX = message length
jmp dump9 ; go display it
dump4: mov fhandle,ax ; save file handle
dump5: ; read block of file data
mov bx,fhandle ; BX = file handle
mov cx,blksize ; CX = record length
mov dx,offset fbuff ; DS:DX = buffer
mov ah,3fh ; function 3fh = read
int 21h ; transfer to MS-DOS
mov flen,ax ; save actual length
cmp ax,0 ; end of file reached?
jne dump6 ; no, proceed
cmp word ptr fptr,0 ; was this the first read?
jne dump8 ; no, exit normally
; display empty file
; message and exit
mov dx,offset msg4 ; DS:DX = message address
mov cx,msg4_len ; CX = length
jmp dump9 ; go display it
dump6: ; display heading at
; each 128-byte boundary
test fptr,07fh ; time for a heading?
jnz dump7 ; no, proceed
; display a heading
mov dx,offset hdg ; DS:DX = heading address
mov cx,hdg_len ; CX = heading length
mov bx,stdout ; BX = standard output
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
dump7: call conv ; convert binary record
; to formatted ASCII
; display formatted output
mov dx,offset fout ; DS:DX = output address
mov cx,fout_len ; CX = output length
mov bx,stdout ; BX = standard output
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
jmp dump5 ; go get another record
dump8: ; close input file
mov bx,fhandle ; BX = file handle
mov ah,3eh ; function 3eh = close
int 21h ; transfer to MS-DOS
mov ax,4c00h ; function 4ch = terminate,
; return code = 0
int 21h ; transfer to MS-DOS
dump9: ; display message on
; standard error device
; DS:DX = message address
; CX = message length
mov bx,stderr ; standard error handle
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
mov ax,4c01h ; function 4ch = terminate,
; return code = 1
int 21h ; transfer to MS-DOS
dump endp
conv proc near ; convert block of data
; from input file
mov di,offset fout ; clear output format
mov cx,fout_len-2 ; area to blanks
mov al,blank
rep stosb
mov di,offset fout ; convert file offset
mov ax,fptr ; to ASCII for output
call w2a
mov bx,0 ; init buffer pointer
conv1: mov al,[fbuff+bx] ; fetch byte from buffer
mov di,offset foutb ; point to output area
; format ASCII part...
; store '.' as default
mov byte ptr [di+bx],'.'
cmp al,blank ; in range 20h-7eh?
jb conv2 ; jump, not alphanumeric
cmp al,7eh ; in range 20h-7eh?
ja conv2 ; jump, not alphanumeric
mov [di+bx],al ; store ASCII character
conv2: ; format hex part...
mov di,offset fouta ; point to output area
add di,bx ; base addr + (offset*3)
add di,bx
add di,bx
call b2a ; convert byte to hex
inc bx ; advance through record
cmp bx,flen ; entire record converted?
jne conv1 ; no, get another byte
; update file pointer
add word ptr fptr,blksize
ret
conv endp
w2a proc near ; convert word to hex ASCII
; call with AX = value
; DI = addr for string
; returns AX, DI, CX destroyed
push ax ; save copy of value
mov al,ah
call b2a ; convert upper byte
pop ax ; get back copy
call b2a ; convert lower byte
ret
w2a endp
b2a proc near ; convert byte to hex ASCII
; call with AL = binary value
; DI = addr for string
; returns AX, DI, CX modified
sub ah,ah ; clear upper byte
mov cl,16
div cl ; divide byte by 16
call ascii ; quotient becomes the first
stosb ; ASCII character
mov al,ah
call ascii ; remainder becomes the
stosb ; second ASCII character
ret
b2a endp
ascii proc near ; convert value 0-0fh in AL
; into "hex ASCII" character
add al,'0' ; offset to range 0-9
cmp al,'9' ; is it > 9?
jle ascii2 ; no, jump
add al,'A'-'9'-1 ; offset to range A-F,
ascii2: ret ; return AL = ASCII char
ascii endp
argc proc near ; count command-line arguments
; call with ES:BX = command line
; returns AX = argument count
push bx ; save original BX and CX
push cx ; for later
mov ax,1 ; force count >= 1
argc1: mov cx,-1 ; set flag = outside argument
argc2: inc bx ; point to next character
cmp byte ptr es:[bx],cr
je argc3 ; exit if carriage return
cmp byte ptr es:[bx],blank
je argc1 ; outside argument if ASCII blank
cmp byte ptr es:[bx],tab
je argc1 ; outside argument if ASCII tab
; otherwise not blank or tab,
jcxz argc2 ; jump if already inside argument
inc ax ; else found argument, count it
not cx ; set flag = inside argument
jmp argc2 ; and look at next character
argc3: pop cx ; restore original BX and CX
pop bx
ret ; return AX = argument count
argc endp
argv proc near ; get address & length of
; command line argument
; call with ES:BX = command line
; AX = argument #
; returns ES:BX = address
; AX = length
push cx ; save original CX and DI
push di
or ax,ax ; is it argument 0?
jz argv8 ; yes, jump to get program name
xor ah,ah ; initialize argument counter
argv1: mov cx,-1 ; set flag = outside argument
argv2: inc bx ; point to next character
cmp byte ptr es:[bx],cr
je argv7 ; exit if carriage return
cmp byte ptr es:[bx],blank
je argv1 ; outside argument if ASCII blank
cmp byte ptr es:[bx],tab
je argv1 ; outside argument if ASCII tab
; if not blank or tab...
jcxz argv2 ; jump if already inside argument
inc ah ; else count arguments found
cmp ah,al ; is this the one we're looking for?
je argv4 ; yes, go find its length
not cx ; no, set flag = inside argument
jmp argv2 ; and look at next character
argv4: ; found desired argument, now
; determine its length...
mov ax,bx ; save param starting address
argv5: inc bx ; point to next character
cmp byte ptr es:[bx],cr
je argv6 ; found end if carriage return
cmp byte ptr es:[bx],blank
je argv6 ; found end if ASCII blank
cmp byte ptr es:[bx],tab
jne argv5 ; found end if ASCII tab
argv6: xchg bx,ax ; set ES:BX = argument address
sub ax,bx ; and AX = argument length
jmp argvx ; return to caller
argv7: xor ax,ax ; set AX = 0, argument not found
jmp argvx ; return to caller
argv8: ; special handling for argv = 0
mov ax,3000h ; check if DOS 3.0 or later
int 21h ; (force AL = 0 in case DOS 1)
cmp al,3
jb argv7 ; DOS 1 or 2, return null param
mov es,es:[2ch] ; get environment segment from PSP
xor di,di ; find the program name by
xor al,al ; first skipping over all the
mov cx,-1 ; environment variables...
cld
argv9: repne scasb ; scan for double null (can't use
scasb ; SCASW since might be odd addr)
jne argv9 ; loop if it was a single null
add di,2 ; skip count word in environment
mov bx,di ; save program name address
mov cx,-1 ; now find its length...
repne scasb ; scan for another null byte
not cx ; convert CX to length
dec cx
mov ax,cx ; return length in AX
argvx: ; common exit point
pop di ; restore original CX and DI
pop cx
ret ; return to caller
argv endp
_TEXT ends
_DATA segment word public 'DATA'
fname db 64 dup (0) ; buffer for input filespec
fhandle dw 0 ; token from PCDOS for input file
flen dw 0 ; actual length read
fptr dw 0 ; relative address in file
fbuff db blksize dup (?) ; data from input file
fout db 'nnnn' ; formatted output area
db blank,blank
fouta db 16 dup ('nn',blank)
db blank
foutb db 16 dup (blank),cr,lf
fout_len equ $-fout
hdg db cr,lf ; heading for each 128 bytes
db 7 dup (blank) ; of formatted output
db '0  1  2  3  4  5  6  7  '
db '8  9  A  B  C  D  E  F',cr,lf
hdg_len equ $-hdg
msg1 db cr,lf
db 'dump: file not found'
db cr,lf
msg1_len equ $-msg1
msg2 db cr,lf
db 'dump: missing file name'
db cr,lf
msg2_len equ $-msg2
msg3 db cr,lf
db 'dump: wrong MS-DOS version'
db cr,lf,'$'
msg4 db cr,lf
db 'dump: empty file'
db cr,lf
msg4_len equ $-msg4
_DATA ends
STACK segment para stack 'STACK'
db 64 dup (?)
STACK ends
end dump
──────────────────────────────────────────────────────────────────────────
Figure 8-10. The assembly-language version: DUMP.ASM.
──────────────────────────────────────────────────────────────────────────
/*
DUMP.C Displays the binary contents of a file in
hex and ASCII on the standard output device.
Compile: C>CL DUMP.C
Usage: C>DUMP unit:path\filename.ext
Copyright (C) 1988 Ray Duncan
*/
#include <stdio.h>
#include <io.h>
#include <fcntl.h>
#define REC_SIZE 16 /* input file record size */
main(int argc, char *argv[])
{
int fd; /* input file handle */
int status = 0; /* status from file read */
long fileptr = 0L; /* current file byte offset */
char filebuf[REC_SIZE]; /* data from file */
if(argc != 2) /* abort if missing filename */
{ fprintf(stderr,"\ndump: wrong number of parameters\n");
exit(1);
}
/* open file in binary mode,
abort if open fails */
if((fd = open(argv[1],O_RDONLY | O_BINARY) ) == -1)
{ fprintf(stderr, "\ndump: can't find file %s \n", argv[1]);
exit(1);
}
/* read and dump records
until end of file */
while((status = read(fd,filebuf,REC_SIZE) ) != 0)
{ dump_rec(filebuf, fileptr, status);
fileptr += REC_SIZE;
}
close(fd); /* close input file */
exit(0); /* return success code */
}
/*
Display record (16 bytes) in hex and ASCII on standard output
*/
dump_rec(char *filebuf, long fileptr, int length)
{
int i; /* index to current record */
if(fileptr % 128 == 0) /* display heading if needed */
printf("\n\n       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F");
printf("\n%04lX ",fileptr); /* display file offset */
/* display hex equivalent of
each byte from file */
for(i = 0; i < length; i++)
printf(" %02X", (unsigned char) filebuf[i]);
if(length != 16) /* spaces if partial record */
for (i=0; i<(16-length); i++) printf("   ");
/* display ASCII equivalent of
each byte from file */
printf(" ");
for(i = 0; i < length; i++)
{ if(filebuf[i] < 32 || filebuf[i] > 126) putchar('.');
else putchar(filebuf[i]);
}
}
──────────────────────────────────────────────────────────────────────────
Figure 8-11. The C version: DUMP.C.
The assembly-language version of the DUMP program contains a number of
subroutines that you may find useful in your own programming efforts.
These include the following:
Subroutine Action
──────────────────────────────────────────────────────────────────────────
argc Returns the number of command-line arguments.
argv Returns the address and length of a particular command-line
argument.
w2a Converts a binary word (16 bits) into hex ASCII for output.
b2a Converts a binary byte (8 bits) into hex ASCII for output.
ascii Converts 4 bits into a single hex ASCII character.
──────────────────────────────────────────────────────────────────────────
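For readers more comfortable with C, the three conversion routines can be sketched as follows. The function names are hypothetical counterparts of ascii, b2a, and w2a. Like the assembly-language routines, they store characters at a caller-supplied address and advance past them; returning the updated pointer stands in for the way STOSB advances DI.

```c
/* C counterparts of the ascii, b2a, and w2a routines in DUMP.ASM.
   Each writes uppercase hex digits without a terminating null and
   returns a pointer just past the last character written. */

static char hex_digit(unsigned value)        /* value must be 0-15 */
{
    return (char)(value < 10 ? '0' + value : 'A' + (value - 10));
}

char *byte_to_hex(unsigned char value, char *out)
{
    *out++ = hex_digit(value / 16);   /* quotient = first digit   */
    *out++ = hex_digit(value % 16);   /* remainder = second digit */
    return out;
}

char *word_to_hex(unsigned value, char *out)
{
    out = byte_to_hex((unsigned char)((value >> 8) & 0xFF), out);
    return byte_to_hex((unsigned char)(value & 0xFF), out);
}
```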
It is interesting to compare these two equivalent programs. The C program
contains only 77 lines, whereas the assembly-language program has 436
lines. Clearly, the C source code is less complex and easier to maintain.
On the other hand, if size and efficiency are important, the DUMP.EXE file
generated by the C compiler is 8563 bytes, whereas the assembly-language
DUMP.EXE file is only 1294 bytes and runs twice as fast as the C program.
────────────────────────────────────────────────────────────────────────────
Chapter 9 Volumes and Directories
Each file in an MS-DOS system is uniquely identified by its name and its
location. The location, in turn, has two components: the logical drive
that contains the file and the directory on that drive where the filename
can be found.
Logical drives are specified by a single letter followed by a colon (for
example, A:). The number of logical drives in a system is not necessarily
the same as the number of physical drives; for example, it is common for
large fixed-disk drives to be divided into two or more logical drives. The
key aspect of a logical drive is that it contains a self-sufficient file
system; that is, it contains one or more directories, zero or more
complete files, and all the information needed to locate the files and
directories and to determine which disk space is free and which is already
in use.
Directories are simply lists or catalogs. Each entry in a directory
consists of the name, size, starting location, attributes, and last
modification date and time of a file or another directory that the disk
contains. The detailed information about the location of every block of
data assigned to a file or directory is in a separate control area on the
disk called the file allocation table (FAT). (See Chapter 10 for a
detailed discussion of the internal format of directories and the FAT.)
Every disk potentially has two distinct kinds of directories: the root
directory and all other directories. The root directory is always present
and has a maximum number of entries, determined when the disk is
formatted; this number cannot be changed. The subdirectories of the root
directory, which may or may not be present on a given disk, can be nested
to any level and can grow to any size (Figure 9-1). This is the
hierarchical, or tree, directory structure referred to in earlier
chapters. Every directory has a name, except for the root directory, which
is designated by a single backslash (\) character.
MS-DOS keeps track of a "current drive" for the system and uses this drive
when a file specification does not include an explicit drive code.
Similarly, MS-DOS maintains a "current directory" for each logical drive.
You can select any particular directory on a drive by naming in order──
either from the root directory or relative to the current directory──the
directories that lead to its location in the tree structure. Such a list
of directories, separated by backslash delimiters, is called a path. When
a complete path from the root directory is prefixed by a logical drive
code and followed by a filename and extension, the resulting string is a
fully qualified filename and unambiguously specifies a file.
┌────────────┐
│ Drive │
│ identifier │
└─────┬──────┘
│
┌───────┴────────┐
│ Root directory │
│ (volume label) │
└─┬──┬──┬───┬──┬─┘
┌───────────────────┘ │ │ │ └───────────────────┐
│ ┌───────────┘ │ └───────────┐ │
┌────┴───┐ ┌────┴──────┐ ┌───┴────┐ ┌──────┴────┐ ┌───┴────┐
│ File A │ │ Directory │ │ File B │ │ Directory │ │ File C │
└────────┘ └─┬───────┬─┘ └────────┘ └─┬─────────┘ └─┬──────┘
│ │ │ │
│ │ │ │
┌─────┘ │ │ │
│ │ │ │
┌────┴──────┐ ┌──┴─────┐ ┌─────┴──┐ ┌───┴────┐
│ Directory │ │ File D │ │ File E │ │ File F │
└───────────┘ └────────┘ └────────┘ └────────┘
Figure 9-1. An MS-DOS file-system structure.
Drive and Directory Control
You can examine, select, create, and delete disk directories interactively
with the DIR, CHDIR (CD), MKDIR (MD), and RMDIR (RD) commands. You can
select a new current drive by entering the letter of the desired drive,
followed by a colon. MS-DOS provides the following Int 21H functions to
give application programs similar control over drives and directories:
Function Action
──────────────────────────────────────────────────────────────────────────
0EH Select current drive.
19H Get current drive.
39H Create directory.
3AH Remove directory.
3BH Select current directory.
47H Get current directory.
──────────────────────────────────────────────────────────────────────────
The two functions that deal with disk drives accept or return a binary
drive code──0 represents drive A, 1 represents drive B, and so on. This
differs from most other MS-DOS functions, which use 0 to indicate the
current drive, 1 for drive A, and so on.
The first three directory functions in the preceding list require an
ASCIIZ string that describes the path to the desired directory. As with
the handle-based file open and create functions, the address of the ASCIIZ
string is passed in the DS:DX registers. On return, the carry flag is
clear if the function succeeded or set if the function failed, with an
error code in the AX register. The directory functions can fail for a
variety of reasons, but the most common cause of an error is that some
element of the indicated path does not exist.
The last function in the preceding list, Int 21H Function 47H, allows you
to obtain an ASCIIZ path for the current directory on the specified or
default drive. MS-DOS supplies the path string without the drive
identifier or a leading backslash. Int 21H Function 47H is most commonly
used with Int 21H Function 19H to build fully qualified filenames. Such
filenames are desirable because they remain valid if the user changes the
current drive or directory.
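The string handling involved in building a fully qualified filename can be sketched in portable C. The function qualify_filename is hypothetical: it assumes you have already obtained the drive code from Int 21H Function 19H (0 = A) and the current-directory string from Int 21H Function 47H, and it shows only how the pieces are joined.

```c
#include <stdio.h>
#include <string.h>

/* Build "D:\PATH\NAME" from a drive code as returned by Function 19H
   (0 = A, 1 = B, ...), a current-directory string as returned by
   Function 47H (no drive, no leading backslash; empty for the root),
   and a filename. Returns 0 on success, -1 if the drive code is out
   of range or out is too small. */
int qualify_filename(int drive, const char *curdir, const char *name,
                     char *out, size_t outsize)
{
    /* "D:\" + directory + optional "\" + name + terminating null */
    size_t need = 3 + strlen(curdir) + (curdir[0] ? 1 : 0)
                + strlen(name) + 1;

    if (drive < 0 || drive > 25 || need > outsize)
        return -1;

    if (curdir[0] != '\0')
        sprintf(out, "%c:\\%s\\%s", 'A' + drive, curdir, name);
    else
        sprintf(out, "%c:\\%s", 'A' + drive, name);
    return 0;
}
```

The separate case for an empty directory string avoids producing a doubled backslash when the current directory is the root.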
Section 2 of this book, "MS-DOS Functions Reference," gives detailed
information on the drive and directory control functions.
Searching Directories
When you request an open operation on a file, you are implicitly
performing a search of a directory. MS-DOS examines each entry of the
directory to find a match for the filename you have given as an argument;
if the file is found, MS-DOS copies certain information from the directory
into a data structure that it can use to control subsequent read or write
operations to the file. Thus, if you wish to test for the existence of a
specific file, you need only perform an open operation and observe whether
it is successful. (If it is, you should, of course, perform a subsequent
close operation to avoid needless expenditure of handles.)
Sometimes you may need to perform more elaborate searches of a disk
directory. Perhaps you wish to find all the files with a certain
extension, a file with a particular attribute, or the names of the
subdirectories of a certain directory. Although the locations of a disk's
directories and the specifics of the entries that are found in them are of
necessity hardware dependent (for example, interpretation of the field
describing the starting location of a file depends upon the physical disk
format), MS-DOS does provide functions that will allow examination of a
disk directory in a hardware-independent fashion.
In order to search a disk directory successfully, you must understand two
types of MS-DOS search services. The first type is the "search for first"
function, which accepts a file specification──possibly including wildcard
characters──and looks for the first matching file in the directory of
interest. If it finds a match, the function fills a buffer owned by the
requesting program with information about the file; if it does not find a
match, it returns an error flag.
A program can call the second type of search service, called "search for
next," only after a successful "search for first." If the file
specification that was originally passed to "search for first" included
wildcard characters and at least one matching file was present, the
program can call "search for next" as many times as necessary to find all
additional matching files. Like "search for first," "search for next"
returns information about the matched files in a buffer designated by the
requesting program. When it can find no more matching files, "search for
next" returns an error flag.
As with nearly every other operation, MS-DOS provides two parallel sets of
directory-searching services:
Action FCB function Handle function
──────────────────────────────────────────────────────────────────────────
Search for first 11H 4EH
Search for next 12H 4FH
──────────────────────────────────────────────────────────────────────────
The FCB directory functions allow searches to match a filename and
extension, both possibly containing wildcard characters, within the
current directory for the specified or current drive. The handle directory
functions, on the other hand, allow a program to perform searches within
any directory on any drive, regardless of the current directory.
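The way a ? wildcard behaves in an FCB-style name can be modeled with a few lines of C. This is an illustrative sketch, not the MS-DOS implementation: fcb_name_matches is a hypothetical name, and the sketch assumes both names are already uppercase and blank padded to 11 characters.

```c
/* Compare an 11-byte FCB-style name (8-character filename plus
   3-character extension, blank padded, no period) against a pattern
   of the same form in which '?' matches any single character.
   Returns 1 on a match, 0 otherwise. */
int fcb_name_matches(const char pattern[11], const char name[11])
{
    int i;

    for (i = 0; i < 11; i++) {
        if (pattern[i] != '?' && pattern[i] != name[i])
            return 0;           /* literal character differs */
    }
    return 1;
}
```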
Searches that use normal FCBs find only normal files. Searches that use
extended FCBs, or the handle-type functions, can be qualified with file
attributes. The attribute bits relevant to searches are as follows:
Bit Significance
──────────────────────────────────────────────────────────────────────────
0 Read-only file
1 Hidden file
2 System file
3 Volume label
4 Directory
5 Archive needed (set when file modified)
──────────────────────────────────────────────────────────────────────────
The remaining bits of a search function's attribute parameter should be
zero. When any of the preceding attribute bits are set, the search
function returns all normal files plus any files with the specified
attributes, except in the case of the volume-label attribute bit, which
receives special treatment as described later in this chapter. Note that
by setting bit 4 you can include directories in a search, exactly as
though they were files.
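The inclusive nature of the attribute parameter is easier to see in code. The following C function is a simplified model of the rule, not a description of MS-DOS internals. The names are hypothetical, and two points are assumptions of the sketch: that a search with the volume-label bit set reports only volume labels, and that the read-only and archive bits never exclude a file.

```c
#define ATTR_READONLY  0x01   /* bit 0 */
#define ATTR_HIDDEN    0x02   /* bit 1 */
#define ATTR_SYSTEM    0x04   /* bit 2 */
#define ATTR_VOLUME    0x08   /* bit 3 */
#define ATTR_DIRECTORY 0x10   /* bit 4 */
#define ATTR_ARCHIVE   0x20   /* bit 5 */

/* Simplified model: returns 1 if a directory entry with attribute
   byte entry_attr would be reported by a search whose attribute
   parameter is search_attr. */
int search_includes(unsigned char entry_attr, unsigned char search_attr)
{
    const unsigned char special =
        ATTR_HIDDEN | ATTR_SYSTEM | ATTR_DIRECTORY;

    /* Assumption: a volume-label search reports only volume labels,
       and other searches never report them. */
    if (search_attr & ATTR_VOLUME)
        return (entry_attr & ATTR_VOLUME) != 0;
    if (entry_attr & ATTR_VOLUME)
        return 0;

    /* Hidden, system, and directory entries appear only on request;
       read-only and archive entries are treated as normal files. */
    return (entry_attr & special & ~search_attr) == 0;
}
```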
Both the FCB and handle directory-searching functions require that the
disk transfer area address be set (with Int 21H Function 1AH), before the
call to "search for first," to point to a working buffer for use by
MS-DOS. The DTA address should not be changed between calls to "search for
first" and "search for next." When it finds a matching file, MS-DOS places
the information about the file in the buffer and then inspects the buffer
on the next "search for next" call, to determine where to resume the
search. The format of the data returned in the buffer is different for the
FCB and handle functions, so read the detailed descriptions in Section 2
of this book, "MS-DOS Functions Reference," before attempting to interpret
the buffer contents.
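As a sketch of what interpreting the buffer involves for the handle functions, the following C fragment extracts the fields of interest from a 43-byte search buffer. The offsets are assumptions taken from the commonly documented layout and should be checked against Section 2; the structure and function names are hypothetical.

```c
#include <string.h>

/* Fields of interest in the 43-byte buffer filled in by the handle
   search functions (4EH and 4FH). Offsets follow the commonly
   documented layout; bytes 0-20 are reserved for MS-DOS. */
struct find_result {
    unsigned char attribute;   /* offset 21                     */
    unsigned      time;        /* offset 22, packed time        */
    unsigned      date;        /* offset 24, packed date        */
    unsigned long size;        /* offset 26, file size in bytes */
    char          name[13];    /* offset 30, ASCIIZ "NAME.EXT"  */
};

/* Read a 16-bit value stored low byte first. */
static unsigned get_word(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

void parse_find_buffer(const unsigned char *dta, struct find_result *r)
{
    r->attribute = dta[21];
    r->time = get_word(dta + 22);
    r->date = get_word(dta + 24);
    r->size = (unsigned long)get_word(dta + 26)
            | ((unsigned long)get_word(dta + 28) << 16);
    memcpy(r->name, dta + 30, 12);
    r->name[12] = '\0';         /* guarantee termination */
}
```

Assembling the multibyte fields one byte at a time keeps the sketch independent of the compiler's structure packing and byte order.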
Figures 9-2 and 9-3 provide equivalent examples of searches for all
files in a given directory that have the .ASM extension, one example using
the FCB directory functions (Int 21H Functions 11H and 12H) and the
other using the handle functions (Int 21H Functions 4EH and 4FH). (Both
programs use the handle write function with the standard output handle to
display the matched filenames, to avoid introducing tangential differences
in the listings.)
──────────────────────────────────────────────────────────────────────────
start: ; set DTA address for buffer
; used by search functions
mov dx,seg buff ; DS:DX = buffer address
mov ds,dx
mov dx,offset buff
mov ah,1ah ; function 1ah = set DTA address
int 21h ; transfer to MS-DOS
; search for first match...
mov dx,offset fcb ; DS:DX = FCB address
mov ah,11h ; function 11h = search for first
int 21h ; transfer to MS-DOS
or al,al ; any matches at all?
jnz exit ; no, quit
disp: ; go to a new line...
mov dx,offset crlf ; DS:DX = CR-LF string
mov cx,2 ; CX = string length
mov bx,1 ; BX = standard output handle
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
; display matching file
mov dx,offset buff+1 ; DS:DX = filename
mov cx,11 ; CX = length
mov bx,1 ; BX = standard output handle
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
; search for next match...
mov dx,offset fcb ; DS:DX = FCB address
mov ah,12h ; function 12h = search for next
int 21h ; transfer to MS-DOS
or al,al ; any more matches?
jz disp ; yes, go show filename
exit: ; final exit point
mov ax,4c00h ; function 4ch = terminate,
; return code = 0
int 21h ; transfer to MS-DOS
.
.
.
crlf db 0dh,0ah ; ASCII carriage return-
; linefeed string
fcb db 0 ; drive = current
db 8 dup ('?') ; filename = wildcard
db 'ASM' ; extension = ASM
db 25 dup (0) ; remainder of FCB = zero
buff db 64 dup (0) ; receives search results
──────────────────────────────────────────────────────────────────────────
Figure 9-2. Example of an FCB-type directory search using Int 21H
Functions 11H and 12H. This routine displays the names of all files in
the current directory that have the .ASM extension.
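Figure 9-2 writes the 11-character name field exactly as MS-DOS returns it: blank-padded and without a period. If you would rather display the familiar NAME.EXT form, a conversion along the following lines should do. It is shown in C as a minimal sketch; the function name fcb_name_to_string is hypothetical and is not part of MS-DOS or of any C library.

```c
#include <stddef.h>

/* Convert an 11-byte, blank-padded FCB name (8-byte filename followed by
   a 3-byte extension) into an ASCIIZ "NAME.EXT" string.
   out must have room for 13 bytes (8 + '.' + 3 + null).
   Hypothetical helper for illustration; returns out for convenience. */
char *fcb_name_to_string(const char *fcb_name, char *out)
{
    size_t base = 8, ext = 3, i, n = 0;

    while (base > 0 && fcb_name[base - 1] == ' ')     /* trim filename padding  */
        base--;
    while (ext > 0 && fcb_name[8 + ext - 1] == ' ')   /* trim extension padding */
        ext--;

    for (i = 0; i < base; i++)
        out[n++] = fcb_name[i];
    if (ext > 0) {                                    /* period only if needed  */
        out[n++] = '.';
        for (i = 0; i < ext; i++)
            out[n++] = fcb_name[8 + i];
    }
    out[n] = '\0';
    return out;
}
```

After a successful Function 11H or 12H call with a normal FCB, the name field begins at buff+1, so a call such as fcb_name_to_string(buff + 1, text) would yield MYFILE.ASM for the field "MYFILE  ASM".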
──────────────────────────────────────────────────────────────────────────
start: ; set DTA address for buffer
; used by search functions
mov dx,seg buff ; DS:DX = buffer address
mov ds,dx
mov dx,offset buff
mov ah,1ah ; function 1ah = set DTA
int 21h ; transfer to MS-DOS
; search for first match...
mov dx,offset fname ; DS:DX = wildcard filename
mov cx,0 ; CX = normal file attribute
mov ah,4eh ; function 4eh = search for first
int 21h ; transfer to MS-DOS
jc exit ; quit if no matches at all
disp: ; go to a new line...
mov dx,offset crlf ; DS:DX = CR-LF string
mov cx,2 ; CX = string length
mov bx,1 ; BX = standard output handle
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
; find length of filename...
mov cx,0 ; CX will be char count
; DS:SI = start of name
mov si,offset buff+30
disp1: lodsb ; get next character
or al,al ; is it null character?
jz disp2 ; yes, found end of string
inc cx ; else count characters
jmp disp1 ; and get another
disp2: ; display matching file...
; CX already contains length
; DS:DX = filename
mov dx,offset buff+30
mov bx,1 ; BX = standard output handle
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
; find next matching file...
mov ah,4fh ; function 4fh = search for next
int 21h ; transfer to MS-DOS
jnc disp ; jump if another match found
exit: ; final exit point
mov ax,4c00h ; function 4ch = terminate,
; return code = 0
int 21h ; transfer to MS-DOS
.
.
.
crlf db 0dh,0ah ; ASCII carriage return-
; linefeed string
fname db '*.ASM',0 ; ASCIIZ filename to
; be matched
buff db 64 dup (0) ; receives search results
──────────────────────────────────────────────────────────────────────────
Figure 9-3. Example of a handle-type directory search using Int 21H
Functions 4EH and 4FH. This routine also displays the names of all files
in the current directory that have a .ASM extension.
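Figure 9-3 uses only the ASCIIZ filename at offset 30 (1EH) of the buffer. The handle-type search also returns the attribute, time, date, and size of the matched file. The C sketch below shows one way to pick those fields out of a copy of the buffer. Apart from the filename offset, the offsets are taken from the Function 4EH entry in Section 2 rather than from the listing above, and the accessor names are hypothetical.

```c
#include <stdint.h>

/* Offsets within the DTA after a successful Function 4EH or 4FH call.
   Bytes 00H-14H are reserved for use by MS-DOS on the next search. */
enum {
    DTA_ATTR = 0x15,    /* attribute of matched file           */
    DTA_TIME = 0x16,    /* time stamp (16 bits)                */
    DTA_DATE = 0x18,    /* date stamp (16 bits)                */
    DTA_SIZE = 0x1A,    /* file size (32 bits, low word first) */
    DTA_NAME = 0x1E     /* ASCIIZ filename and extension       */
};

/* Read a 16-bit value stored low byte first. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

uint8_t dta_attr(const uint8_t *dta)
{
    return dta[DTA_ATTR];
}

uint32_t dta_size(const uint8_t *dta)
{
    return (uint32_t)le16(dta + DTA_SIZE) |
           ((uint32_t)le16(dta + DTA_SIZE + 2) << 16);
}

const char *dta_name(const uint8_t *dta)
{
    return (const char *)(dta + DTA_NAME);   /* already null-terminated */
}
```

Because the name is already an ASCIIZ string, a C program can pass dta_name(buff) directly to its string functions instead of counting characters as the assembly listing does.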
Moving Files
The rename file function that was added in MS-DOS version 2.0, Int 21H
Function 56H, has the little-advertised capability to move a file from
one directory to another. The function has two ASCIIZ parameters: the
"old" and "new" names for the file. If the old and new paths differ,
MS-DOS moves the file; if the filename or extension components differ,
MS-DOS renames the file. MS-DOS can carry out both of these actions in the
same function call.
Of course, the old and new directories must be on the same drive, because
the file's actual data is not moved at all; only the information that
describes the file is removed from one directory and placed in another
directory. Function 56H fails if the two ASCIIZ strings include different
logical-drive codes, if the file is read-only, or if a file with the same
name and location as the "new" filename already exists.
The FCB-based rename file service, Int 21H Function 17H, works only on
the current directory and cannot be used to move files.
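Because Function 56H cannot move a file across drives, a program may want to compare the drive codes of the two pathnames before issuing the call. The following C fragment is only a sketch of such a check. The function names are hypothetical, and the caller is assumed to supply the current drive letter for pathnames that carry no drive code.

```c
#include <ctype.h>

/* Return the drive letter (uppercase) named at the start of path,
   or current_drive if the path has no "X:" prefix. */
char path_drive(const char *path, char current_drive)
{
    if (isalpha((unsigned char)path[0]) && path[1] == ':')
        return (char)toupper((unsigned char)path[0]);
    return (char)toupper((unsigned char)current_drive);
}

/* Nonzero if both pathnames refer to the same logical drive, which is
   a precondition for moving a file with the rename function. */
int same_drive(const char *old_path, const char *new_path, char current_drive)
{
    return path_drive(old_path, current_drive) ==
           path_drive(new_path, current_drive);
}
```

A nonzero result does not guarantee success; the call can still fail for the other reasons listed above, such as a read-only file or an existing destination name.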
Volume Labels
Support for volume labels was first added to MS-DOS in version 2.0. A
volume label is an optional name of 1 to 11 characters that the user
assigns to a disk during a FORMAT operation. You can display a volume
label with the DIR, TREE, CHKDSK, or VOL command. Beginning with MS-DOS
version 3.0, you can use the LABEL command to add, display, or alter the
label after formatting. In MS-DOS version 4, the FORMAT program also
assigns a semi-random 32-bit binary ID to each disk it formats; you can
display this value, but you cannot change it.
The distinction between volumes and drives is important. A volume label is
associated with a specific storage medium. A drive identifier (such as A)
is associated with a physical device that a storage medium can be mounted
on. In the case of fixed-disk drives, the medium associated with a drive
identifier does not change (hence the name). In the case of floppy disks
or other removable media, the disk accessed with a given drive identifier
might have any volume label or none at all.
Hence, volume labels do not take the place of the logical-drive identifier
and cannot be used as part of a pathname to identify a file. In fact, in
MS-DOS version 2, the system does not use volume labels internally at all.
In MS-DOS versions 3.0 and later, a disk driver can use volume labels to
detect whether the user has replaced a disk while a file is open; this use
is optional, however, and is not implemented in all systems.
MS-DOS volume labels are implemented as a special type of entry in a
disk's root directory. The entry contains a time-and-date stamp and has an
attribute value of 8 (i.e., bit 3 set). Except for the attribute, a volume
label is identical to the directory entry for a file that was created but
never had any data written into it, and you can manipulate volume labels
with Int 21H functions much as you manipulate files. However, a volume
label receives special handling at several levels:
■ When you create a volume label after a disk is formatted, MS-DOS always
places it in the root directory, regardless of the current directory.
■ A disk can contain only one volume label; attempts to create additional
volume labels (even with different names) will fail.
■ MS-DOS always carries out searches for volume labels in the root
directory, regardless of the current directory, and returns only the
volume label, not the normal files as well.
In MS-DOS version 2, support for volume labels is not completely
integrated into the handle file functions, and you must use extended FCBs
instead to manipulate volume labels. For example, the code in Figure 9-4
searches for the volume label in the root directory of the current drive.
You can also change volume labels with extended FCBs and the rename file
function (Int 21H Function 17H), but you should not attempt to remove an
existing volume label with Int 21H Function 13H under MS-DOS version 2,
because this operation can damage the disk's FAT in an unpredictable
manner.
In MS-DOS versions 3.0 and later, you can create a volume label in the
expected manner, using Int 21H Function 3CH and an attribute of 8, and
you can use the handle-type "search for first" function (4EH) to obtain
an existing volume label for a logical drive (Figure 9-5). However, you
still must use extended FCBs to change a volume label.
──────────────────────────────────────────────────────────────────────────
buff db 64 dup (?) ; receives search results
xfcb db 0ffh ; flag signifying extended FCB
db 5 dup (0) ; reserved
db 8 ; volume attribute byte
db 0 ; drive code (0 = current)
db 11 dup ('?') ; wildcard filename and extension
db 25 dup (0) ; remainder of FCB (not used)
.
.
.
; set DTA address for buffer
; used by search functions
mov dx,seg buff ; DS:DX = buffer address
mov ds,dx
mov dx,offset buff
mov ah,1ah ; function 1ah = set DTA
int 21h ; transfer to MS-DOS
; now search for label...
; DS:DX = extended FCB
mov dx,offset xfcb
mov ah,11h ; function 11h = search for first
int 21h ; transfer to MS-DOS
cmp al,0ffh ; search successful?
je no_label ; jump if no volume label
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 9-4. A volume-label search under MS-DOS version 2, using an
extended file control block. If the search is successful, the volume label
is returned in buff, formatted in the filename and extension fields of an
extended FCB.
──────────────────────────────────────────────────────────────────────────
buff db 64 dup (?) ; receives search results
wildcd db '*.*',0 ; wildcard ASCIIZ filename
.
.
.
; set DTA address for buffer
; used by search functions
mov dx,seg buff ; DS:DX = buffer address
mov ds,dx
mov dx,offset buff
mov ah,1ah ; function 1ah = set DTA
int 21h ; transfer to MS-DOS
; now search for label...
; DS:DX = ASCIIZ string
mov dx,offset wildcd
mov cx,8 ; CX = volume attribute
mov ah,4eh ; function 4eh = search for first
int 21h ; transfer to MS-DOS
jc no_label ; jump if no volume label
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 9-5. A volume-label search under MS-DOS version 3, using the
handle-type file functions. If the search is successful (carry flag
returned clear), the volume name is placed at location buff+1EH in the
form of an ASCIIZ string.
────────────────────────────────────────────────────────────────────────────
Chapter 10 Disk Internals
MS-DOS disks are organized according to a rather rigid scheme that is
easily understood and therefore easily manipulated. Although you will
probably never need to access the special control areas of a disk
directly, an understanding of their internal structure leads to a better
understanding of the behavior and performance of MS-DOS as a whole.
From the application programmer's viewpoint, MS-DOS presents disk devices
as logical volumes that are associated with a drive code (A, B, C, and so
on) and that have a volume name (optional), a root directory, and from
zero to many additional directories and files. MS-DOS shields the
programmer from the physical characteristics of the medium by providing a
battery of disk services through Int 21H. Using these services, the
programmer can create, open, read, write, close, and delete files in a
uniform way, regardless of the disk drive's size, speed, number of
read/write heads, number of tracks, and so forth.
Requests from an application program for file operations actually go
through two levels of translation before resulting in the physical
transfer of data between the disk device and random-access memory:
1. Beneath the surface, MS-DOS views each logical volume, whether it is
an entire physical unit such as a floppy disk or only a part of a
fixed disk, as a continuous sequence of logical sectors, starting at
sector 0. (A logical disk volume can also be implemented on other
types of storage. For example, RAM disks map a disk structure onto an
area of random-access memory.) MS-DOS translates an application
program's Int 21H file-management requests into requests for transfers
of logical sectors, using the information found in the volume's
directories and allocation tables. (For those rare situations where it
is appropriate, programs can also access logical sectors directly with
Int 25H and Int 26H.)
2. MS-DOS then passes the requests for logical sectors to the disk
device's driver, which maps them onto actual physical addresses (head,
track, and sector). Disk drivers are extremely hardware dependent and
are always written in assembly language for maximum speed. In most
versions of MS-DOS, a driver for IBM-compatible floppy- and fixed-disk
drives is built into the MS-DOS BIOS module (IO.SYS) and is always
loaded during system initialization; you can install additional
drivers for non-IBM-compatible disk devices by including the
appropriate DEVICE directives in the CONFIG.SYS file.
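The second level of translation can be sketched in C. The fragment below assumes the conventional IBM-compatible numbering, in which sectors within a track are numbered from 1 and the head number advances before the track number. A real driver is free to use another mapping, so treat this as an illustration only. The structure and function names are hypothetical. The sectors-per-track, heads, and hidden-sectors values correspond to the boot sector fields described later in this chapter.

```c
/* Physical disk address: track (cylinder), head, and sector. */
struct chs {
    unsigned track;
    unsigned head;
    unsigned sector;    /* numbered from 1 within a track */
};

/* Map a logical sector number onto a physical address, assuming the
   conventional IBM-compatible layout. hidden is the number of physical
   sectors that precede the logical volume (0 for a floppy disk). */
struct chs logical_to_chs(unsigned long logical_sector,
                          unsigned sectors_per_track,
                          unsigned heads,
                          unsigned long hidden)
{
    unsigned long absolute = logical_sector + hidden;
    struct chs result;

    result.sector = (unsigned)(absolute % sectors_per_track) + 1;
    result.head   = (unsigned)((absolute / sectors_per_track) % heads);
    result.track  = (unsigned)(absolute /
                    ((unsigned long)sectors_per_track * heads));
    return result;
}
```

On a 9-sector, 2-sided floppy disk, logical sector 0 maps to track 0, head 0, sector 1, and logical sector 9 maps to the first sector on the second side of the same track.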
Each MS-DOS logical volume is divided into several fixed-size control
areas and a files area (Figure 10-1). The size of each control area
depends on several factors──the size of the volume and the version of
FORMAT used to initialize the volume, for example──but all of the
information needed to interpret the structure of a particular logical
volume can be found on the volume itself in the boot sector.
┌───────────────────────────────────────────────────────┐
│ Boot sector │
│ Reserved area │
├───────────────────────────────────────────────────────┤
│ File allocation table #1 │
├───────────────────────────────────────────────────────┤
│ Possible additional copies of FAT │
├───────────────────────────────────────────────────────┤
│ Root directory │
├───────────────────────────────────────────────────────┤
│ │
│ Files area │
│ │
└───────────────────────────────────────────────────────┘
Figure 10-1. Map of a typical MS-DOS logical volume. The boot sector
(logical sector 0) contains the OEM identification, BIOS parameter block
(BPB), and disk bootstrap. The remaining sectors are divided among an
optional reserved area, one or more copies of the file allocation table,
the root directory, and the files area.
The Boot Sector
Logical sector 0, known as the boot sector, contains all of the critical
information regarding the disk medium's characteristics (Figure 10-2).
The first byte in the sector is always an 80x86 jump instruction──either a
normal intrasegment JMP (opcode 0E9H) followed by a 16-bit displacement or
a "short" JMP (opcode 0EBH) followed by an 8-bit displacement and then by
an NOP (opcode 90H). If neither of these two JMP opcodes is present, the
disk has not been formatted or was not formatted for use with MS-DOS. (Of
course, the presence of the JMP opcode does not in itself ensure that the
disk has an MS-DOS format.)
Following the initial JMP instruction is an 8-byte field that is reserved
by Microsoft for OEM identification. The disk-formatting program, which is
specialized for each brand of computer, disk controller, and medium, fills
in this area with the name of the computer manufacturer and the
manufacturer's internal MS-DOS version number.
00H ┌───────────────────────────────────────────────┐
│ E9 XX XX or EB XX 90 │
03H ├───────────────────────────────────────────────┤
│ OEM name and version │
│ (8 bytes) │
0BH ├───────────────────────────────────────────────┤─┐
│ Bytes per sector (2 bytes) │ │
0DH ├───────────────────────────────────────────────┤ │
│ Sectors per allocation unit (1 byte) │ │
0EH ├───────────────────────────────────────────────┤ │
│ Reserved sectors, starting at 0 (2 bytes) │ │
10H ├───────────────────────────────────────────────┤ │
│ Number of FATs (1 byte) │ B
11H ├───────────────────────────────────────────────┤ P
│ Number of root-directory entries (2 bytes) │ B
13H ├───────────────────────────────────────────────┤ │
│ Total sectors in logical volume (2 bytes) │ │
15H ├───────────────────────────────────────────────┤ │ MS-DOS
│ Media descriptor byte │ │ version 2.0
16H ├───────────────────────────────────────────────┤ │
│ Number of sectors per FAT (2 bytes) │ │
18H ├───────────────────────────────────────────────┤═╡
│ Sectors per track (2 bytes) │ │
1AH ├───────────────────────────────────────────────┤ │
│ Number of heads (2 bytes) │ │ MS-DOS
1CH ├───────────────────────────────────────────────┤ │ version 3.0
│ Number of hidden sectors (4 bytes) │═╡
20H ├───────────────────────────────────────────────┤ │ MS-DOS
│ Total sectors in logical volume │ │ version 4.0
│ (MS-DOS 4.0 and volume size >32 MB) │ │
24H ├───────────────────────────────────────────────┤═╡
│ Physical drive number │ │
25H ├───────────────────────────────────────────────┤ │
│ Reserved │ │
26H ├───────────────────────────────────────────────┤ │
│ Extended boot signature record (29H) │ │ Additional
27H ├───────────────────────────────────────────────┤ │ MS-DOS 4.0
│ 32-bit binary volume ID │ │ information
2BH ├───────────────────────────────────────────────┤ │
│ Volume label (11 bytes) │ │
36H ├───────────────────────────────────────────────┤ │
│ Reserved (8 bytes) │ │
3EH ├───────────────────────────────────────────────┤─┘
│ Bootstrap │
└───────────────────────────────────────────────┘
Figure 10-2. Map of the boot sector of an MS-DOS disk. Note the JMP at
offset 0, the OEM identification field, the MS-DOS version 2 compatible
BIOS parameter block (bytes 0BH─17H), the three additional WORD fields for
MS-DOS version 3, the double-word number-of-sectors field and 32-bit
binary volume ID for MS-DOS version 4.0, and the bootstrap code.
The third major component of the boot sector is the BIOS parameter block
(BPB) in bytes 0BH through 17H. (Additional fields are present in MS-DOS
versions 3.0 and later.) This data structure describes the physical disk
characteristics and allows the device driver to calculate the proper
physical disk address for a given logical-sector number; it also contains
information that is used by MS-DOS and various system utilities to
calculate the address and size of each of the disk control areas (file
allocation tables and root directory).
The final element of the boot sector is the disk bootstrap routine. The
disk bootstrap is usually read into memory by the ROM bootstrap, which is
executed automatically when the computer is turned on. The ROM bootstrap
is usually just smart enough to home the head of the disk drive (move it
to track 0), read the first physical sector into RAM at a predetermined
location, and jump to it. The disk bootstrap is more sophisticated. It
calculates the physical disk address of the beginning of the files area,
reads the files containing the operating system into memory, and transfers
control to the BIOS module at location 0070:0000H. (See Chapter 2.)
Figures 10-3 and 10-4 show a partial hex dump and disassembly of a
PC-DOS 3.3 floppy-disk boot sector.
──────────────────────────────────────────────────────────────────────────
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 EB 34 90 49 42 4D 20 20 33 2E 33 00 02 02 01 00 .4.IBM  3.3.....
0010 02 70 00 D0 02 FD 02 00 09 00 02 00 00 00 00 00 .p..............
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 12 ................
0030 00 00 00 00 01 00 FA 33 C0 8E D0 BC 00 7C 16 07 .......3.....|..
.
.
.
01C0 0D 0A 44 69 73 6B 20 42 6F 6F 74 20 66 61 69 6C ..Disk Boot fail
01D0 75 72 65 0D 0A 00 49 42 4D 42 49 4F 20 20 43 4F ure...IBMBIO  CO
01E0 4D 49 42 4D 44 4F 53 20 20 43 4F 4D 00 00 00 00 MIBMDOS  COM....
01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA ..............U.
──────────────────────────────────────────────────────────────────────────
Figure 10-3. Partial hex dump of the boot sector (track 0, head 0, sector
1) of a PC-DOS version 3.3 floppy disk. This sector contains the OEM
identification, a copy of the BIOS parameter block describing the medium,
and the bootstrap routine that reads the BIOS into memory and transfers
control to it. See also Figures 10-2 and 10-4.
──────────────────────────────────────────────────────────────────────────
jmp $+54 ; jump to bootstrap
nop
db 'IBM  3.3' ; OEM identification (8 bytes)
; BIOS parameter block
dw 512 ; bytes per sector
db 2 ; sectors per cluster
dw 1 ; reserved sectors
db 2 ; number of FATs
dw 112 ; root directory entries
dw 720 ; total sectors
db 0fdh ; media descriptor byte
dw 2 ; sectors per FAT
dw 9 ; sectors per track
dw 2 ; number of heads
dd 0 ; hidden sectors
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 10-4. Partial disassembly of the boot sector shown in Figure
10-3.
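The same fields can be extracted by a program. The C sketch below checks for the leading JMP and then reads the BIOS parameter block from a copy of the boot sector, using the first 32 bytes of Figure 10-3 as sample data. The structure, its member names, and the function names are hypothetical; only the offsets and values come from Figures 10-2 through 10-4.

```c
#include <stdint.h>

/* First 32 bytes of the boot sector shown in Figure 10-3. */
static const uint8_t fig10_3[32] = {
    0xEB, 0x34, 0x90, 0x49, 0x42, 0x4D, 0x20, 0x20,
    0x33, 0x2E, 0x33, 0x00, 0x02, 0x02, 0x01, 0x00,
    0x02, 0x70, 0x00, 0xD0, 0x02, 0xFD, 0x02, 0x00,
    0x09, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00
};

struct bpb {
    uint16_t bytes_per_sector;      /* offset 0BH */
    uint8_t  sectors_per_cluster;   /* offset 0DH */
    uint16_t reserved_sectors;      /* offset 0EH */
    uint8_t  num_fats;              /* offset 10H */
    uint16_t root_entries;          /* offset 11H */
    uint16_t total_sectors;         /* offset 13H */
    uint8_t  media_descriptor;      /* offset 15H */
    uint16_t sectors_per_fat;       /* offset 16H */
    uint16_t sectors_per_track;     /* offset 18H */
    uint16_t heads;                 /* offset 1AH */
    uint32_t hidden_sectors;        /* offset 1CH */
};

/* Read a 16-bit value stored low byte first. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Nonzero if the sector begins with E9 XX XX or EB XX 90. As the text
   notes, this alone does not prove the disk has an MS-DOS format. */
int has_boot_jump(const uint8_t *boot)
{
    return boot[0] == 0xE9 || (boot[0] == 0xEB && boot[2] == 0x90);
}

/* Copy the BPB fields out of a boot sector image. */
struct bpb parse_bpb(const uint8_t *boot)
{
    struct bpb b;

    b.bytes_per_sector    = le16(boot + 0x0B);
    b.sectors_per_cluster = boot[0x0D];
    b.reserved_sectors    = le16(boot + 0x0E);
    b.num_fats            = boot[0x10];
    b.root_entries        = le16(boot + 0x11);
    b.total_sectors       = le16(boot + 0x13);
    b.media_descriptor    = boot[0x15];
    b.sectors_per_fat     = le16(boot + 0x16);
    b.sectors_per_track   = le16(boot + 0x18);
    b.heads               = le16(boot + 0x1A);
    b.hidden_sectors      = (uint32_t)le16(boot + 0x1C) |
                            ((uint32_t)le16(boot + 0x1E) << 16);
    return b;
}
```

The fields are read byte by byte rather than by overlaying a structure on the sector, because several of them lie at odd offsets and a C compiler may insert padding between structure members.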
The Reserved Area
The boot sector is actually part of a reserved area that can span from one
to several sectors. The reserved-sectors word in the BPB, at offset 0EH in
the boot sector, describes the size of this area. Remember that the number
in the BPB field includes the boot sector itself, so if the value is 1 (as
it is on IBM PC floppy disks), no additional reserved sectors follow the
boot sector.
The File Allocation Table
When a file is created or extended, MS-DOS assigns it groups of disk
sectors from the files area in powers of 2. These are known as allocation
units or clusters. The number of sectors per cluster for a given medium is
defined in the BPB and can be found at offset 0DH in the disk's boot
sector. Below are some example cluster sizes:
Disk type                     Power of 2    Sectors/cluster
──────────────────────────────────────────────────────────────────────────
5.25" 180 KB floppy disk      0             1
5.25" 360 KB floppy disk      1             2
PC/AT fixed disk              2             4
PC/XT fixed disk              3             8
──────────────────────────────────────────────────────────────────────────
The file allocation table (FAT) is divided into fields that correspond
directly to the assignable clusters on the disk. These fields are 12 bits
in MS-DOS versions 1 and 2 and may be either 12 bits or 16 bits in
versions 3.0 and later, depending on the size of the medium (12 bits if
the disk contains fewer than 4087 clusters, 16 bits otherwise).
The first two fields in the FAT are always reserved. On IBM-compatible
media, the first 8 bits of the first reserved FAT entry contain a copy of
the media descriptor byte, which is also found in the BPB in the boot
sector. The second, third, and (if applicable) fourth bytes, which
constitute the remainder of the first two reserved FAT fields, always
contain 0FFH. The currently defined IBM-format media descriptor bytes are
as follows:
                                                         MS-DOS version
                                                         where first
Descriptor    Medium                                     supported
──────────────────────────────────────────────────────────────────────────
0F0H          3.5" floppy disk, 2-sided, 18-sector       3.3
0F8H          Fixed disk                                 2.0
0F9H          5.25" floppy disk, 2-sided, 15-sector      3.0
              3.5" floppy disk, 2-sided, 9-sector        3.2
0FCH          5.25" floppy disk, 1-sided, 9-sector       2.0
0FDH          5.25" floppy disk, 2-sided, 9-sector       2.0
              8" floppy disk, 1-sided, single-density
0FEH          5.25" floppy disk, 1-sided, 8-sector       1.0
              8" floppy disk, 1-sided, single-density
              8" floppy disk, 2-sided, double-density
0FFH          5.25" floppy disk, 2-sided, 8-sector       1.1
──────────────────────────────────────────────────────────────────────────
The remainder of the FAT entries describe the use of their corresponding
disk clusters. The contents of the FAT fields are interpreted as follows:
Value Meaning
──────────────────────────────────────────────────────────────────────────
(0)000H Cluster available
(F)FF0─(F)FF6H Reserved cluster
(F)FF7H Bad cluster, if not part of chain
(F)FF8─(F)FFFH Last cluster of file
(X)XXX Next cluster in file
──────────────────────────────────────────────────────────────────────────
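The table above translates directly into a small classification routine. The C sketch below also includes the rule given earlier for choosing between 12-bit and 16-bit FAT entries under MS-DOS versions 3.0 and later. The enumeration and function names are hypothetical.

```c
/* Meaning of a FAT field, following the table above. */
enum fat_kind {
    FAT_FREE,        /* (0)000H: cluster available          */
    FAT_RESERVED,    /* (F)FF0-(F)FF6H: reserved cluster    */
    FAT_BAD,         /* (F)FF7H: bad cluster                */
    FAT_LAST,        /* (F)FF8-(F)FFFH: last cluster        */
    FAT_NEXT         /* anything else: next cluster in file */
};

/* Width of a FAT entry under MS-DOS 3.0 and later: 12 bits if the
   disk has fewer than 4087 clusters, 16 bits otherwise.
   (Versions 1 and 2 always use 12 bits.) */
int fat_entry_bits(unsigned long cluster_count)
{
    return cluster_count < 4087UL ? 12 : 16;
}

/* Classify one FAT field; bits must be 12 or 16. */
enum fat_kind classify_fat_entry(unsigned value, int bits)
{
    unsigned top = (bits == 16) ? 0xFFFFu : 0x0FFFu;

    value &= top;
    if (value == 0)
        return FAT_FREE;
    if (value >= top - 0x0F && value <= top - 0x09)
        return FAT_RESERVED;
    if (value == top - 0x08)
        return FAT_BAD;
    if (value >= top - 0x07)
        return FAT_LAST;
    return FAT_NEXT;
}
```

Expressing the special values relative to the largest possible entry lets the same comparisons serve both FAT widths.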
Each file's entry in a directory contains the number of the first cluster
assigned to that file, which is used as an entry point into the FAT. From
the entry point on, each FAT slot contains the cluster number of the next
cluster in the file, until a last-cluster mark is encountered.
At the computer manufacturer's option, MS-DOS can maintain two or more
identical copies of the FAT on each volume. MS-DOS updates all copies
simultaneously whenever files are extended or the directory is modified.
If access to a sector in a FAT fails due to a read error, MS-DOS tries the
other copies until a successful disk read is obtained or all copies are
exhausted. Thus, if one copy of the FAT becomes unreadable due to wear or
a software accident, the other copies may still make it possible to
salvage the files on the disk. As part of its procedure for checking the
integrity of a disk, the CHKDSK program compares the multiple copies
(usually two) of the FAT to make sure they are all readable and
consistent.
The Root Directory
Following the file allocation tables is an area known in MS-DOS versions
2.0 and later as the root directory. (Under MS-DOS version 1, it was the
only directory on the disk.) The root directory contains 32-byte entries
that describe files, other directories, and the optional volume label
(Figure 10-5). An entry beginning with the byte value E5H is available
for reuse; it represents a file or directory that has been erased. An
entry beginning with a null (zero) byte is the logical end-of-directory;
that entry and all subsequent entries have never been used.
00H ┌──────────────────────────────┐
│ Filename │ Note 1
08H ├──────────────────────────────┤
│ Extension │
0BH ├──────────────────────────────┤
│ File attribute │ Note 2
0CH ├──────────────────────────────┤
│ Reserved │
16H ├──────────────────────────────┤
│ Time created or last updated │ Note 3
18H ├──────────────────────────────┤
│ Date created or last updated │ Note 4
1AH ├──────────────────────────────┤
│ Starting cluster │
1CH ├──────────────────────────────┤
│ File size, 4 bytes │ Note 5
20H └──────────────────────────────┘
Figure 10-5. Format of a single entry in a disk directory. Total length
is 32 bytes (20H bytes).
──────────────────────────────────────────────────────────────────────────
Notes for Figure 10-5
1. The first byte of the filename field of a directory entry may
contain the following special information:
Value Meaning
────────────────────────────────────────────────────────────────────────
00H Directory entry has never been used; end of occupied
portion of directory.
05H First character of filename is actually E5H.
2EH Entry is an alias for the current or parent directory.
If the next byte is also 2EH, the cluster field
contains the cluster number of the parent directory
(zero if the parent directory is the root directory).
E5H File has been erased.
────────────────────────────────────────────────────────────────────────
2. The attribute byte of the directory entry is mapped as follows:
Bit Meaning
────────────────────────────────────────────────────────────────────────
0 Read-only; attempts to open file for write or to
delete file will fail.
1 Hidden file; excluded from normal searches.
2 System file; excluded from normal searches.
3 Volume label; can exist only in root directory.
4 Directory; excluded from normal searches.
5 Archive bit; set whenever file is modified.
6 Reserved.
7 Reserved.
────────────────────────────────────────────────────────────────────────
3. The time field is encoded as follows:
Bits Contents
────────────────────────────────────────────────────────────────────────
00H─04H Binary number of 2-second increments (0─29,
corresponding to 0─58 seconds)
05H─0AH Binary number of minutes (0─59)
0BH─0FH Binary number of hours (0─23)
────────────────────────────────────────────────────────────────────────
4. The date field is encoded as follows:
Bits Contents
────────────────────────────────────────────────────────────────────────
00H─04H Day of month (1─31)
05H─08H Month (1─12)
09H─0FH Year (relative to 1980)
────────────────────────────────────────────────────────────────────────
5. The file-size field is interpreted as a 4-byte integer, with the
low-order 2 bytes of the number stored first.
──────────────────────────────────────────────────────────────────────────
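Notes 3 and 4 can be expressed as a pair of C functions. This is a sketch with hypothetical names; it assumes the 16-bit time and date words have already been assembled from the directory entry, low byte first.

```c
#include <stdint.h>

struct dos_time { unsigned hours, minutes, seconds; };
struct dos_date { unsigned year, month, day; };

/* Unpack the time word of a directory entry (Note 3). */
struct dos_time decode_dos_time(uint16_t t)
{
    struct dos_time r;

    r.hours   = t >> 11;             /* bits 0BH-0FH                     */
    r.minutes = (t >> 5) & 0x3F;     /* bits 05H-0AH                     */
    r.seconds = (t & 0x1F) * 2;      /* bits 00H-04H, 2-second units     */
    return r;
}

/* Unpack the date word of a directory entry (Note 4). */
struct dos_date decode_dos_date(uint16_t d)
{
    struct dos_date r;

    r.year  = 1980 + (d >> 9);       /* bits 09H-0FH, relative to 1980   */
    r.month = (d >> 5) & 0x0F;       /* bits 05H-08H                     */
    r.day   = d & 0x1F;              /* bits 00H-04H                     */
    return r;
}
```

For the IBMBIO.COM entry in Figure 10-6, the time word is 6000H and the date word is 0E72H, which decode to 12:00:00 on March 18, 1987.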
The root directory has a number of special properties. Its size and
position are fixed and are determined by the FORMAT program when a disk is
initialized. This information can be obtained from the boot sector's BPB.
If the disk is bootable, the first two entries in the root directory
always describe the files containing the MS-DOS BIOS and the MS-DOS
kernel. The disk bootstrap routine uses these entries to bring the
operating system into memory and start it up.
Figure 10-6 shows a partial hex dump of the first sector of the root
directory on a bootable PC-DOS 3.3 floppy disk.
──────────────────────────────────────────────────────────────────────────
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 49 42 4D 42 49 4F 20 20 43 4F 4D 27 00 00 00 00 IBMBIO  COM'....
0010 00 00 00 00 00 00 00 60 72 0E 02 00 54 56 00 00 .......`r...TV..
0020 49 42 4D 44 4F 53 20 20 43 4F 4D 27 00 00 00 00 IBMDOS  COM'....
0030 00 00 00 00 00 00 00 60 71 0E 18 00 CF 75 00 00 .......`q....u..
0040 43 4F 4D 4D 41 4E 44 20 43 4F 4D 20 00 00 00 00 COMMAND COM ....
0050 00 00 00 00 00 00 00 60 71 0E 36 00 DB 62 00 00 .......`q.6..b..
0060 42 4F 4F 54 44 49 53 4B 20 20 20 28 00 00 00 00 BOOTDISK   (....
0070 00 00 00 00 00 00 A1 00 21 00 00 00 00 00 00 00 ........!.......
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 10-6. Partial hex dump of the first sector of the root directory
for a PC-DOS 3.3 disk containing the three system files and a volume
label.
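To tie the dump to the layout in Figure 10-5, the C sketch below reads the attribute, starting cluster, and file size from two of the entries in Figure 10-6: IBMBIO.COM and the volume label BOOTDISK. The bytes are copied from the figure. The constant and function names are hypothetical.

```c
#include <stdint.h>

/* Directory entries copied from Figure 10-6. */
static const uint8_t ibmbio_entry[32] = {
    0x49, 0x42, 0x4D, 0x42, 0x49, 0x4F, 0x20, 0x20,
    0x43, 0x4F, 0x4D, 0x27, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60,
    0x72, 0x0E, 0x02, 0x00, 0x54, 0x56, 0x00, 0x00
};
static const uint8_t label_entry[32] = {
    0x42, 0x4F, 0x4F, 0x54, 0x44, 0x49, 0x53, 0x4B,
    0x20, 0x20, 0x20, 0x28, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xA1, 0x00,
    0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Attribute bits (Note 2 for Figure 10-5). */
enum {
    ATTR_READ_ONLY = 0x01, ATTR_HIDDEN    = 0x02, ATTR_SYSTEM  = 0x04,
    ATTR_VOLUME    = 0x08, ATTR_DIRECTORY = 0x10, ATTR_ARCHIVE = 0x20
};

/* State of an entry, judged from its first byte (Note 1). */
enum entry_state { ENTRY_NEVER_USED, ENTRY_ERASED, ENTRY_IN_USE };

enum entry_state dir_entry_state(const uint8_t *e)
{
    if (e[0] == 0x00) return ENTRY_NEVER_USED;   /* end of directory  */
    if (e[0] == 0xE5) return ENTRY_ERASED;       /* available for use */
    return ENTRY_IN_USE;
}

uint8_t dir_attr(const uint8_t *e)
{
    return e[0x0B];
}

uint16_t dir_first_cluster(const uint8_t *e)
{
    return (uint16_t)(e[0x1A] | ((uint16_t)e[0x1B] << 8));
}

uint32_t dir_file_size(const uint8_t *e)         /* low-order bytes first */
{
    return (uint32_t)e[0x1C] | ((uint32_t)e[0x1D] << 8) |
           ((uint32_t)e[0x1E] << 16) | ((uint32_t)e[0x1F] << 24);
}
```

For IBMBIO.COM these routines yield an attribute of 27H (read-only, hidden, system, and archive), a starting cluster of 2, and a length of 5654H (22,100) bytes; the BOOTDISK entry has attribute 28H, with the volume-label bit set.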
The Files Area
The remainder of the volume after the root directory is known as the files
area. MS-DOS views the sectors in this area as a pool of clusters, each
containing one or more logical sectors, depending on the disk format. Each
cluster has a corresponding entry in the FAT that describes its current
use: available, reserved, assigned to a file, or unusable (because of
defects in the medium). Because the first two fields of the FAT are
reserved, the first cluster in the files area is assigned the number 2.
When a file is extended under versions 1 and 2, MS-DOS searches the FAT
from the beginning until it finds a free cluster (designated by a zero FAT
field); it then changes that FAT field to a last-cluster mark and updates
the previous last cluster of the file's chain to point to the new last
cluster. Under versions 3.0 and later, however, MS-DOS searches the FAT
from the most recently allocated cluster; this reduces file fragmentation
and improves overall access times.
Directories other than the root directory are simply a special type of
file. Their storage is allocated from the files area, and their contents
are 32-byte entries──in the same format as those used in the root
directory──that describe files or other directories. Directory entries
that describe other directories contain an attribute byte with bit 4 set,
zero in the file-length field, and the date and time that the directory
was created (Figure 10-7). The first cluster field points, of course, to
the first cluster in the files area that belongs to the directory. (The
directory's other clusters can be found only by tracing through the FAT.)
All directories except the root directory contain two special directory
entries named . and .. (one period and two periods). MS-DOS puts these
entries in place when it creates a directory, and they cannot be deleted.
The . entry is an
alias for the current directory; its cluster field points to the cluster
in which it is found. The .. entry is an alias for the directory's parent
(the directory immediately above it in the tree structure); its cluster
field points to the first cluster of the parent directory. If the parent
is the root directory, the cluster field of the .. entry contains zero
(Figure 10-8).
──────────────────────────────────────────────────────────────────────────
.
.
.
0080 4D 59 44 49 52 20 20 20 20 20 20 10 00 00 00 00 MYDIR      .....
0090 00 00 00 00 00 00 87 9A 9B 0A 2A 00 00 00 00 00 ..........*.....
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 10-7. Extract from the root directory of an MS-DOS disk, showing
the entry for a subdirectory named MYDIR. Bit 4 in the attribute byte is
set, the cluster field points to the first cluster of the subdirectory
file, the date and time stamps are valid, but the file length is zero.
──────────────────────────────────────────────────────────────────────────
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 2E 20 20 20 20 20 20 20 20 20 20 10 00 00 00 00 .          .....
0010 00 00 00 00 00 00 87 9A 9B 0A 2A 00 00 00 00 00 ..........*.....
0020 2E 2E 20 20 20 20 20 20 20 20 20 10 00 00 00 00 ..         .....
0030 00 00 00 00 00 00 87 9A 9B 0A 00 00 00 00 00 00 ................
0040 4D 59 46 49 4C 45 20 20 44 41 54 20 00 00 00 00 MYFILE  DAT ....
0050 00 00 00 00 00 00 98 9A 9B 0A 2B 00 15 00 00 00 ..........+.....
0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 10-8. Hex dump of the first block of the directory MYDIR. Note the
. and .. entries. This directory contains exactly one file, MYFILE.DAT.
Interpreting the File Allocation Table
Now that we understand how the disk is structured, let's see how we can
use this knowledge to find a FAT position from a cluster number.
If the FAT has 12-bit entries, use the following procedure:
1. Use the directory entry to find the starting cluster of the file in
question.
2. Multiply the cluster number by 1.5.
3. Use the integral part of the product as the offset into the FAT and
move the word at that offset into a register. Remember that a FAT
position can span a physical disk-sector boundary.
4. If the product is a whole number, AND the register with 0FFFH.
5. Otherwise, "logical shift" the register right 4 bits.
6. If the result is a value from 0FF8H through 0FFFH, the file has no
more clusters. Otherwise, the result is the number of the next cluster
in the file.
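The 12-bit procedure above can be sketched in C as follows. This is only an illustration: it assumes the entire FAT has already been read into one byte array (so an entry never straddles a buffer boundary), and the function names fat12_entry and fat12_is_eof are hypothetical.

```c
#include <stdint.h>

/* Return the 12-bit FAT entry for `cluster`.
   Assumption: `fat` points to a complete in-memory copy of the FAT. */
uint16_t fat12_entry(const uint8_t *fat, uint16_t cluster)
{
    /* integral part of cluster * 1.5 */
    uint32_t offset = (uint32_t)cluster + (cluster / 2u);

    /* fetch the little-endian word at that offset */
    uint16_t word = (uint16_t)(fat[offset] | (fat[offset + 1] << 8));

    if (cluster & 1u)                       /* product was not a whole number */
        return (uint16_t)(word >> 4);       /* logical shift right 4 bits */
    return (uint16_t)(word & 0x0FFFu);      /* whole number: keep low 12 bits */
}

/* Nonzero if `entry` marks the last cluster of a file (0FF8H through 0FFFH). */
int fat12_is_eof(uint16_t entry)
{
    return entry >= 0x0FF8u;
}
```

Testing the low bit of the cluster number is equivalent to asking whether the product with 1.5 is a whole number, because only odd cluster numbers leave a fraction.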
On disks with at least 4087 clusters formatted under MS-DOS version 3.0 or
later, the FAT entries use 16 bits, and the extraction of a cluster number
from the table is much simpler:
1. Use the directory entry to find the starting cluster of the file in
question.
2. Multiply the cluster number by 2.
3. Use the product as the offset into the FAT and move the word at that
offset into a register.
4. If the result is a value from 0FFF8H through 0FFFFH, the file has no
more clusters. Otherwise, the result is the number of the next cluster
in the file.
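For comparison, the 16-bit case reduces to a single word fetch. The sketch below makes the same assumption as before (the whole FAT is in memory), and the names fat16_entry and fat16_is_eof are hypothetical.

```c
#include <stdint.h>

/* Return the 16-bit FAT entry for `cluster`.
   Assumption: `fat` points to a complete in-memory copy of the FAT. */
uint16_t fat16_entry(const uint8_t *fat, uint16_t cluster)
{
    uint32_t offset = (uint32_t)cluster * 2u;     /* cluster * 2 */
    return (uint16_t)(fat[offset] | (fat[offset + 1] << 8));
}

/* Nonzero if `entry` marks the last cluster of a file (0FFF8H through 0FFFFH). */
int fat16_is_eof(uint16_t entry)
{
    return entry >= 0xFFF8u;
}
```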
To convert cluster numbers to logical sectors, subtract 2, multiply the
result by the number of sectors per cluster, and add the logical-sector
number of the beginning of the data area (this can be calculated from the
information in the BPB).
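Expressed in C, the conversion is one line of arithmetic. The function name and parameter names here are hypothetical, and the caller is assumed to pass a valid cluster number (2 or greater).

```c
#include <stdint.h>

/* Logical sector of the first sector in `cluster`.
   data_start = logical-sector number where the data (files) area begins. */
uint32_t cluster_to_sector(uint16_t cluster,
                           uint8_t sectors_per_cluster,
                           uint32_t data_start)
{
    /* subtract 2, multiply by sectors per cluster, add start of data area */
    return ((uint32_t)cluster - 2u) * sectors_per_cluster + data_start;
}
```

For instance, with 2 sectors per cluster and a data area beginning at logical sector 0CH, cluster 3 begins at logical sector 0EH.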
As an example, let's work out the disk location of the file IBMBIO.COM,
which is the first entry in the directory shown in Figure 10-6. First, we
need some information from the BPB, which is in the boot sector of the
medium. (See Figures 10-3 and 10-4.) The BPB tells us that there are
■ 512 bytes per sector
■ 2 sectors per cluster
■ 2 sectors per FAT
■ 2 FATs
■ 112 entries in the root directory
From the BPB information, we can calculate the starting logical-sector
number of each of the disk's control areas and the files area by
constructing a table, as follows:
                                             Length      Sector
Area                                         (sectors)   numbers
──────────────────────────────────────────────────────────────────────────
Boot sector                                  1           00H
2 FATs * 2 sectors/FAT                       4           01H─04H
112 directory entries                        7           05H─0BH
  * 32 bytes/entry
  / 512 bytes/sector
Total sectors occupied by bootstrap,         12
  FATs, and root directory
──────────────────────────────────────────────────────────────────────────
Therefore, the first sector of the files area is 12 (0CH).
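The same table can be computed from the BPB fields. The following is a rough sketch; the structure bpb_info and its field names are invented for illustration and hold only the values needed here.

```c
#include <stdint.h>

/* Subset of BPB values used in the calculation (names are illustrative). */
struct bpb_info {
    uint16_t bytes_per_sector;
    uint16_t reserved_sectors;   /* boot sector(s) preceding the first FAT */
    uint8_t  fat_count;
    uint16_t sectors_per_fat;
    uint16_t root_entries;       /* number of 32-byte root directory entries */
};

/* First logical sector of the root directory. */
uint32_t root_dir_start(const struct bpb_info *b)
{
    return b->reserved_sectors + (uint32_t)b->fat_count * b->sectors_per_fat;
}

/* Sectors occupied by the root directory, rounded up to a whole sector. */
uint32_t root_dir_sectors(const struct bpb_info *b)
{
    uint32_t bytes = (uint32_t)b->root_entries * 32u;
    return (bytes + b->bytes_per_sector - 1u) / b->bytes_per_sector;
}

/* First logical sector of the files (data) area. */
uint32_t data_area_start(const struct bpb_info *b)
{
    return root_dir_start(b) + root_dir_sectors(b);
}
```

With the values from our example disk (512 bytes per sector, 1 boot sector, 2 FATs of 2 sectors each, 112 root directory entries), these functions give 5, 7, and 12 respectively, matching the table.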
The word at offset 01AH in the directory entry for IBMBIO.COM gives us the
starting cluster number for that file: cluster 2. To find the
logical-sector number of the first block in the file, we can follow the
procedure given earlier:
1. Cluster number - 2 = 2 - 2 = 0.
2. Multiply by sectors per cluster = 0 * 2 = 0.
3. Add logical-sector number of start of the files area = 0 + 0CH = 0CH.
So the calculated sector number of the beginning of the file IBMBIO.COM is
0CH, which is exactly what we expect knowing that the FORMAT program
always places the system files in contiguous sectors at the beginning of
the data area.
Now let's trace IBMBIO.COM's chain through the file allocation table
(Figures 10-9 and 10-10). This will be a little tedious, but a detailed
understanding of the process is crucial. In an actual program, we would
first read the boot sector using Int 25H, then calculate the address of
the FAT from the contents of the BPB, and finally read the FAT into
memory, again using Int 25H.
From IBMBIO.COM's directory entry, we already know that the first cluster
in the file is cluster 2. To examine that cluster's entry in the FAT, we
multiply the cluster number by 1.5, which gives 0003H as the FAT offset,
and fetch the word at that offset (which contains 4003H). Because the
product of the cluster and 1.5 is a whole number, we AND the word from the
FAT with 0FFFH, yielding the number 3, which is the number of the second
cluster assigned to the file.
──────────────────────────────────────────────────────────────────────────
0 1 2 3 4 5 6 7 8 9 A B C D E F
0000 FD FF FF 03 40 00 05 60 00 07 80 00 09 A0 00 0B ....@..'........
0010 C0 00 0D E0 00 0F 00 01 11 20 01 13 40 01 15 60 ......... ..@..'
0020 01 17 F0 FF 19 A0 01 1B C0 01 1D E0 01 1F 00 02 ................
0030 21 20 02 23 40 02 25 60 02 27 80 02 29 A0 02 2B ! .#@.%'.'..)..+
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 10-9. Hex dump of the first block of the file allocation table
(track 0, head 0, sector 2) for the PC-DOS 3.3 disk whose root directory
is shown in Figure 10-6. Notice that the first byte of the FAT contains
the media descriptor byte for a 5.25-inch, 2-sided, 9-sector floppy disk.
──────────────────────────────────────────────────────────────────────────
getfat proc near ; extracts the FAT field
; for a given cluster
; call AX = cluster #
; DS:BX = addr of FAT
; returns AX = FAT field
; other registers unchanged
push bx ; save affected registers
push cx
mov cx,ax
shl ax,1 ; cluster * 2
add ax,cx ; cluster * 3
test ax,1
pushf ; save remainder in Z flag
shr ax,1 ; cluster * 1.5
add bx,ax
mov ax,[bx]
popf ; was cluster * 1.5 whole number?
jnz getfat1 ; no, jump
and ax,0fffh ; yes, isolate bottom 12 bits
jmp getfat2
getfat1: mov cx,4 ; shift word right 4 bits
shr ax,cx
getfat2: pop cx ; restore registers and exit
pop bx
ret
getfat endp
──────────────────────────────────────────────────────────────────────────
Figure 10-10. Assembly-language procedure to access the file allocation
table (assumes 12-bit FAT fields). Given a cluster number, the procedure
returns the contents of that cluster's FAT entry in the AX register. This
simple example ignores the fact that FAT entries can span sector
boundaries.
To examine cluster 3's entry in the FAT, we multiply 3 by 1.5, which gives
4.5, and fetch the word at offset 0004H (which contains 0040H). Because
the product of 3 and 1.5 is not a whole number, we shift the word right
4 bits, yielding the number 4, which is the number of the third cluster
assigned to IBMBIO.COM.
In this manner, we can follow the chain through the FAT until we come to a
cluster (number 23, in this case) whose FAT entry contains the value
0FFFH, which is an end-of-file marker in FATs with 12-bit entries.
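To check the trace mechanically, the chain can be followed in C over the bytes shown in Figure 10-9. This sketch assumes only those first 64 bytes of the FAT are in hand, so it stops if an entry would lie beyond them; the function name fat12_chain is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* First 64 bytes of the FAT, copied from Figure 10-9. */
const uint8_t fat_fig_10_9[64] = {
    0xFD,0xFF,0xFF,0x03,0x40,0x00,0x05,0x60,0x00,0x07,0x80,0x00,0x09,0xA0,0x00,0x0B,
    0xC0,0x00,0x0D,0xE0,0x00,0x0F,0x00,0x01,0x11,0x20,0x01,0x13,0x40,0x01,0x15,0x60,
    0x01,0x17,0xF0,0xFF,0x19,0xA0,0x01,0x1B,0xC0,0x01,0x1D,0xE0,0x01,0x1F,0x00,0x02,
    0x21,0x20,0x02,0x23,0x40,0x02,0x25,0x60,0x02,0x27,0x80,0x02,0x29,0xA0,0x02,0x2B
};

/* Follow a 12-bit FAT chain starting at cluster `first`.
   Stores up to `max` cluster numbers in `out` and returns how many were stored. */
size_t fat12_chain(const uint8_t *fat, size_t fat_len,
                   uint16_t first, uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t c = first;

    while (n < max && c >= 2u && c < 0x0FF8u) {
        size_t off = (size_t)c + c / 2u;          /* cluster * 1.5, truncated */
        uint16_t word;

        if (off + 1 >= fat_len)                   /* entry is outside our buffer */
            break;
        out[n++] = c;
        word = (uint16_t)(fat[off] | (fat[off + 1] << 8));
        c = (c & 1u) ? (uint16_t)(word >> 4) : (uint16_t)(word & 0x0FFFu);
    }
    return n;
}
```

Starting from cluster 2, the walk over the Figure 10-9 data visits 22 clusters, the last of which is cluster 23.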
We have now established that the file IBMBIO.COM contains clusters 2
through 23 (02H─17H), from which we can calculate that logical sectors 0CH
through 37H are assigned to the file (22 clusters of 2 sectors each,
beginning at sector 0CH). Of course, the last cluster may be only
partially filled with actual data; the number of bytes used in the last
cluster is the remainder of the file's size in bytes (found in the
directory entry) divided by the bytes per cluster, or the whole cluster if
that remainder is zero.
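The remainder calculation for the last cluster might be sketched like this in C. The function name is hypothetical, and the file size in the usage note is a made-up value, not one read from the directory in Figure 10-6.

```c
#include <stdint.h>

/* Bytes of real data in the final cluster of a file.
   A nonempty file whose size is an exact multiple of the cluster size
   fills its last cluster completely. */
uint32_t bytes_in_last_cluster(uint32_t file_size, uint32_t bytes_per_cluster)
{
    uint32_t rem = file_size % bytes_per_cluster;

    if (file_size != 0 && rem == 0)
        return bytes_per_cluster;
    return rem;
}
```

On a disk with 1024-byte clusters (2 sectors of 512 bytes), a hypothetical 22,100-byte file would use 596 bytes of its last cluster.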
Fixed-Disk Partitions
Fixed disks have another layer of organization beyond the logical volume
structure already discussed: partitions. The FDISK utility divides a fixed
disk into one or more partitions consisting of an integral number of
cylinders. Each partition can contain an independent file system and, for
that matter, its own copy of an operating system.
The first physical sector on a fixed disk (track 0, head 0, sector 1)
contains the master boot record, which is laid out as follows:
Bytes Contents
──────────────────────────────────────────────────────────────────────────
000─1BDH Reserved
1BE─1CDH Partition #1 descriptor
1CE─1DDH Partition #2 descriptor
1DE─1EDH Partition #3 descriptor
1EE─1FDH Partition #4 descriptor
1FE─1FFH Signature word (AA55H)
──────────────────────────────────────────────────────────────────────────
The partition descriptors in the master boot record define the size,
location, and type of each partition, as follows:
Byte(s)    Contents
──────────────────────────────────────────────────────────────────────────
00H        Active flag (0 = not bootable, 80H = bootable)
01H        Starting head
02H─03H    Starting cylinder/sector
04H        Partition type
             00H  not used
             01H  FAT file system, 12-bit FAT entries
             04H  FAT file system, 16-bit FAT entries
             05H  extended partition
             06H  "huge partition" (MS-DOS versions 4.0 and later)
05H        Ending head
06H─07H    Ending cylinder/sector
08H─0BH    Starting sector for partition, relative to beginning of
           disk
0CH─0FH    Partition length in sectors
──────────────────────────────────────────────────────────────────────────
The active flag, which indicates that the partition is bootable, can be
set on only one partition at a time.
MS-DOS treats partition types 1, 4, and 6 as normal logical volumes and
assigns them their own drive identifiers during the system boot process.
Partition type 5 can contain multiple logical volumes and has a special
extended boot record that describes each volume. The FORMAT utility
initializes MS-DOS fixed-disk partitions, creating the file system within
the partition (boot record, file allocation table, root directory, and
files area) and optionally placing a bootable copy of the operating system
in the file system.
Figure 10-11 contains a partial hex dump of a master boot record from a
fixed disk formatted under PC-DOS version 3.3. This dump illustrates the
partition descriptors for a normal partition with a 16-bit FAT and an
extended partition.
──────────────────────────────────────────────────────────────────────────
0000 .
.
.
0180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 01
01C0 01 00 04 04 D1 02 11 00 00 00 EE FF 00 00 00 00
01D0 C1 04 05 04 D1 FD 54 00 01 00 02 53 00 00 00 00
01E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA
──────────────────────────────────────────────────────────────────────────
Figure 10-11. A partial hex dump of a master boot record from a fixed disk
formatted under PC-DOS version 3.3. This disk contains two partitions. The
first partition has a 16-bit FAT and is marked "active" to indicate that
it contains a bootable copy of PC-DOS. The second partition is an
"extended" partition. The third and fourth partition entries are not used
in this example.
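A program that has read the master boot record into a 512-byte buffer could pick the descriptors apart along the following lines. This is a sketch under stated assumptions: the structure and function names are hypothetical, only four of the descriptor fields are decoded, and the buffer is assumed to hold the complete sector.

```c
#include <stdint.h>

/* Selected fields of one partition descriptor (names are illustrative). */
struct partition {
    uint8_t  active;         /* 80H = bootable, 0 = not bootable */
    uint8_t  type;           /* 01H, 04H, 05H, 06H, or 00H if unused */
    uint32_t start_sector;   /* relative to beginning of disk */
    uint32_t length;         /* partition length in sectors */
};

/* Assemble a little-endian 32-bit value from 4 bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode descriptor `index` (0 through 3) from a master boot record.
   Returns 0 on success, -1 if the index is out of range or the
   signature word AA55H is missing. */
int read_partition(const uint8_t *mbr, int index, struct partition *out)
{
    const uint8_t *d;

    if (index < 0 || index > 3)
        return -1;
    if (mbr[0x1FE] != 0x55 || mbr[0x1FF] != 0xAA)
        return -1;

    d = mbr + 0x1BE + 16 * index;    /* descriptors start at offset 1BEH */
    out->active       = d[0x00];
    out->type         = d[0x04];
    out->start_sector = le32(d + 0x08);
    out->length       = le32(d + 0x0C);
    return 0;
}
```

Applied to the bytes in Figure 10-11, the first descriptor decodes as active, type 04H, starting at sector 11H, with a length of 0FFEEH sectors; the second decodes as type 05H, starting at sector 10054H, with a length of 5302H sectors.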
────────────────────────────────────────────────────────────────────────────
Chapter 11 Memory Management
Current versions of MS-DOS can manage as much as 1 megabyte of contiguous
random-access memory. On IBM PCs and compatibles, the memory occupied by
MS-DOS and other programs starts at address 0000H and may reach as high as
address 09FFFFH; this 640 KB area of RAM is sometimes referred to as
conventional memory. Memory above this address is reserved for ROM
hardware drivers, video refresh buffers, and the like. Computers that are
not IBM compatible may use other memory layouts.
The RAM area under the control of MS-DOS is divided into two major
sections:
■ The operating-system area
■ The transient-program area
The operating-system area starts at address 0000H──that is, it occupies
the lowest portion of RAM. It holds the interrupt vector table, the
operating system proper and its tables and buffers, any additional
installable drivers specified in the CONFIG.SYS file, and the resident
part of the COMMAND.COM command interpreter. The amount of memory occupied
by the operating-system area varies with the version of MS-DOS used, the
number of disk buffers, the size of installed device drivers, and so
forth.
The transient-program area (TPA), sometimes called the memory arena, is
the remainder of memory above the operating-system area. The memory arena
is dynamically allocated in blocks called arena entries. Each arena entry
has a special control structure called an arena header, and all of the
arena headers are chained together. Three MS-DOS Int 21H functions allow
programs to allocate, resize, and release blocks of memory from the TPA:
Function Action
──────────────────────────────────────────────────────────────────────────
48H Allocate memory block.
49H Release memory block.
4AH Resize memory block.
──────────────────────────────────────────────────────────────────────────
MS-DOS itself uses these functions when loading a program from disk at the
request of COMMAND.COM or another program. The EXEC function, which is the
MS-DOS program loader, calls Int 21H Function 48H to allocate a memory
block for the loaded program's environment and another for the program
itself and its program segment prefix. It then reads the program from the
disk into the assigned memory area. When the program terminates, MS-DOS
calls Int 21H Function 49H to release all memory owned by the program.
Transient programs can also employ the MS-DOS memory-management functions
to dynamically manage the memory available in the TPA. Proper use of these
functions is one of the most important criteria of whether a program is
well behaved under MS-DOS. Well-behaved programs are most likely to be
portable to future versions of the operating system and least likely to
cause interference with other processes under multitasking user interfaces
such as Microsoft Windows.
Using the Memory-Allocation Functions
The memory-allocation functions have two common uses:
■ To shrink a program's initial memory allocation so that there is enough
room to load and execute another program under its control.
■ To dynamically allocate additional memory required by the program and
to release the same memory when it is no longer needed.
Shrinking the Initial Memory Allocation
Although many MS-DOS application programs simply assume they own all
memory, this assumption is a relic of MS-DOS version 1 (and CP/M), which
could support only one active process at a time. Well-behaved MS-DOS
programs take pains to modify only memory that they actually own and to
release any memory that they don't need.
Unfortunately, under current versions of MS-DOS, the amount of memory that
a program will own is not easily predicted in advance. It turns out that
the amount of memory allocated to a program when it is first loaded
depends upon two factors:
■ The type of file the program is loaded from
■ The amount of memory available in the TPA
MS-DOS always allocates all of the largest available memory block in the
TPA to programs loaded from .COM (memory-image) files. Because .COM
programs contain no file header that can pass segment and memory-use
information to MS-DOS, MS-DOS simply assumes the worst case and gives such
a program everything. MS-DOS will load the program as long as there is an
available memory block as large as the size of the file plus 256 bytes for
the PSP and 2 bytes for the stack. The .COM program, when it receives
control, must determine whether enough memory is available to carry out
its functions.
MS-DOS uses more complicated rules to allocate memory to programs loaded
from .EXE files. First, of course, a memory block large enough to hold the
declared code, data, and stack segments must be available in the TPA. In
addition, the linker sets two fields in a .EXE file's header to inform
MS-DOS about the program's memory requirements. The first field,
MIN_ALLOC, defines the minimum number of paragraphs required by the
program, in addition to those for the code, data, and stack segments. The
second, MAX_ALLOC, defines the maximum number of paragraphs of additional
memory the program would use if they were available.
When loading a .EXE file, MS-DOS first attempts to allocate the number of
paragraphs in MAX_ALLOC plus the number of paragraphs required by the
program itself. If that much memory is not available, MS-DOS assigns all
of the largest available block to the program, provided that this is at
least the amount specified by MIN_ALLOC plus the size of the program
image. If that condition is not satisfied, the program cannot be executed.
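The decision just described can be modeled in a few lines of C. This is a simplified model of the rule as stated in the text, not the loader's actual code; all quantities are in paragraphs, and the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Paragraphs granted to a .EXE program, or 0 if it cannot be loaded.
   image_paras  = paragraphs required by the program itself
   min_alloc    = MIN_ALLOC field from the file header
   max_alloc    = MAX_ALLOC field from the file header
   largest_free = size of the largest available block in the TPA */
uint32_t exe_allocation(uint32_t image_paras, uint32_t min_alloc,
                        uint32_t max_alloc, uint32_t largest_free)
{
    uint32_t wanted = image_paras + max_alloc;   /* first choice */
    uint32_t needed = image_paras + min_alloc;   /* bare minimum */

    if (largest_free >= wanted)
        return wanted;              /* the full MAX_ALLOC request fits */
    if (largest_free >= needed)
        return largest_free;        /* take all of the largest block */
    return 0;                       /* program cannot be executed */
}
```

Note that a program linked with a very large MAX_ALLOC value therefore ends up owning the entire largest block, much as a .COM program does.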
After a .COM or .EXE program is loaded and running, it can use Int 21H
Function 4AH (Resize Memory Block) to release all the memory it does not
immediately need. This is conveniently done right after the program
receives control from MS-DOS, by calling the resize function with the
segment of the program's PSP in the ES register and the number of
paragraphs that the program requires to run in the BX register (Figure
11-1).
──────────────────────────────────────────────────────────────────────────
.
.
.
org 100h
main proc near ; entry point from MS-DOS
; DS, ES = PSP address
mov sp,offset stk ; COM program must move
; stack to safe area
; release extra memory...
mov ah,4ah ; function 4Ah =
; resize memory block
; BX = paragraphs to keep
mov bx,(offset stk - offset main + 10FH) / 16
int 21h ; transfer to MS-DOS
jc error ; jump if resize failed
.
.
.
main endp
.
.
.
dw 64 dup (?) ; new stack area
stk equ $ ; new base of stack
end main ; defines entry point
──────────────────────────────────────────────────────────────────────────
Figure 11-1. An example of a .COM program releasing excess memory after
it receives control from MS-DOS. Int 21H Function 4AH is called with ES
pointing to the program's PSP and BX containing the number of paragraphs
that the program needs to execute. In this case, the new size for the
program's memory block is calculated as the program image size plus the
size of the PSP (256 bytes), rounded up to the next paragraph. .EXE
programs use similar code.
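The expression loaded into BX in Figure 11-1 adds 10FH to the image size: 100H for the PSP plus 0FH so that the division by 16 rounds up. The same arithmetic in C, with a hypothetical function name, looks like this:

```c
#include <stdint.h>

#define PSP_BYTES 256u   /* size of the program segment prefix */

/* Paragraphs a .COM program must keep: image plus PSP,
   rounded up to the next 16-byte paragraph. */
uint16_t paragraphs_to_keep(uint16_t image_bytes)
{
    return (uint16_t)(((uint32_t)image_bytes + PSP_BYTES + 15u) / 16u);
}
```

A 200H-byte image needs 30H paragraphs, while an image of 201H bytes spills into one more paragraph and needs 31H.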
Dynamic Allocation of Additional Memory
When a well-behaved program needs additional memory space──for an I/O
buffer or an array of intermediate results, for example──it can call Int
21H Function 48H (Allocate Memory Block) with the desired number of
paragraphs. If a sufficiently large block of unallocated memory is
available, MS-DOS returns the segment address of the base of the assigned
area and clears the carry flag (0), indicating that the function was
successful.
If no unallocated block of sufficient size is available, MS-DOS sets the
carry flag (1), returns an error code in the AX register, and returns the
size (in paragraphs) of the largest block available in the BX register
(Figure 11-2). In this case, no memory has yet been allocated. The
program can use the value returned in the BX register to determine whether
it can continue in a "degraded" fashion, with less memory. If it can, it
must call Int 21H Function 48H again to allocate the smaller memory
block.
When the MS-DOS memory manager is searching the chain of arena headers to
satisfy a memory-allocation request, it can use one of the following
strategies:
■ First fit: Use the arena entry at the lowest address that is large
enough to satisfy the request.
■ Best fit: Use the smallest arena entry that will satisfy the request,
regardless of its location.
■ Last fit: Use the arena entry at the highest address that is large
enough to satisfy the request.
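The difference among the three strategies is easiest to see in code. The sketch below is a model only: it represents the free arena entries as a simple array of sizes in ascending address order, which is an assumption made for illustration and not how MS-DOS stores them, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum fit_strategy { FIRST_FIT, BEST_FIT, LAST_FIT };

/* free_sizes[i] is the size in paragraphs of the i-th free arena entry,
   listed from lowest address to highest.
   Returns the index of the entry chosen, or -1 if none is large enough. */
int choose_block(const uint16_t *free_sizes, size_t count,
                 uint16_t request, enum fit_strategy strategy)
{
    int chosen = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (free_sizes[i] < request)
            continue;                       /* too small for this request */
        if (chosen < 0 ||
            strategy == LAST_FIT ||         /* keep the highest candidate */
            (strategy == BEST_FIT && free_sizes[i] < free_sizes[chosen]))
            chosen = (int)i;
        if (strategy == FIRST_FIT)
            break;                          /* lowest candidate is enough */
    }
    return chosen;
}
```

Given free entries of 40H, 200H, 80H, and 100H paragraphs and a request for 60H paragraphs, first fit selects the 200H entry, best fit the 80H entry, and last fit the 100H entry.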
──────────────────────────────────────────────────────────────────────────
.
.
.
mov ah,48h ; function 48h = allocate mem block
mov bx,0800h ; 800h paragraphs = 32 KB
int 21h ; transfer to MS-DOS
jc error ; jump if allocation failed
mov buff_seg,ax ; save segment of allocated block
.
.
.
mov es,buff_seg ; ES:DI = address of block
xor di,di
mov cx,08000h ; store 32,768 bytes
mov al,0ffh ; fill buffer with -1s
cld
rep stosb ; now perform fast fill
.
.
.
mov cx,08000h ; length to write, bytes
mov bx,handle ; handle for prev opened file
push ds ; save our data segment
mov ds,buff_seg ; let DS:DX = buffer address
mov dx,0
mov ah,40h ; function 40h = write
int 21h ; transfer to MS-DOS
pop ds ; restore our data segment
jc error ; jump if write failed
.
.
.
mov es,buff_seg ; ES = seg of prev allocated block
mov ah,49h ; function 49h = release mem block
int 21h ; transfer to MS-DOS
jc error ; jump if release failed
.
error: .
.
handle dw 0 ; file handle
buff_seg dw 0 ; segment of allocated block
.
.
.
──────────────────────────────────────────────────────────────────────────
Figure 11-2. Example of dynamic memory allocation. The program requests a
32 KB memory block from MS-DOS, fills it with -1s, writes it to disk, and
then releases it.
If the arena entry selected is larger than the size requested, MS-DOS
divides it into two parts: one block of the size requested, which is
assigned to the program that called Int 21H Function 48H, and an unowned
block containing the remaining memory.
The default MS-DOS allocation strategy is first fit. However, under MS-DOS
versions 3.0 and later, an application program can change the strategy
with Int 21H Function 58H.
When a program is through with an allocated memory block, it should use
Int 21H Function 49H to release the block. If it does not, MS-DOS will
automatically release all memory allocations for the program when it
terminates.
Arena Headers
Microsoft has not officially documented the internal structure of arena
headers for the outside world at present. This is probably to deter
programmers from trying to manipulate their memory allocations directly
instead of through the MS-DOS functions provided for that purpose.
Arena headers have identical structures in MS-DOS versions 2 and 3. They
are 16 bytes (one paragraph) and are located immediately before the memory
area that they control (Figure 11-3). An arena header contains the
following information:
■ A byte signifying whether the header is a member or the last entry in
the entire chain of such headers
■ A word indicating whether the area it controls is available or whether
it already belongs to a program (if the latter, the word points to the
program's PSP)
■ A word indicating the size (in paragraphs) of the controlled memory
area (arena entry)
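For readers who want to picture the chain, here is a sketch that walks arena headers in a simulated memory image. Several details are assumptions that do not come from this text but from widely circulated descriptions of the undocumented format: the type byte at offset 0 is taken to be 4DH ('M') for a member and 5AH ('Z') for the last entry, the owner word is at offset 1 (zero meaning unowned), the size word is at offset 3, and the next header lies size + 1 paragraphs above the current one. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_MEMBER 0x4D   /* 'M': more headers follow (assumed value) */
#define ARENA_LAST   0x5A   /* 'Z': last header in chain (assumed value) */

/* Walk the arena chain inside `mem`, an image `mem_paras` paragraphs long,
   beginning with the header at paragraph `first_header`.
   Returns the total number of unowned paragraphs, or -1 if the chain
   is broken or a header looks corrupted. */
long count_free_paragraphs(const uint8_t *mem, size_t mem_paras,
                           size_t first_header)
{
    long free_paras = 0;
    size_t para = first_header;

    for (;;) {
        const uint8_t *h;
        uint16_t owner, size;

        if (para >= mem_paras)
            return -1;                        /* chain runs off the image */
        h = mem + para * 16u;
        if (h[0] != ARENA_MEMBER && h[0] != ARENA_LAST)
            return -1;                        /* corrupted header */

        owner = (uint16_t)(h[1] | (h[2] << 8));   /* 0 = unowned (assumed) */
        size  = (uint16_t)(h[3] | (h[4] << 8));   /* paragraphs controlled */
        if (owner == 0)
            free_paras += size;
        if (h[0] == ARENA_LAST)
            return free_paras;
        para += (size_t)size + 1u;            /* skip header plus its block */
    }
}
```

A real program should not depend on any of this; the supported way to learn how much memory is available is still to request it through Int 21H Function 48H and examine the value returned in BX.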
MS-DOS inspects the chain of arena headers whenever a program calls a
memory-block allocation, resize, or release function, or when a
program is EXEC'd or terminated. If any of the blocks appears to be
corrupted or if the chain is broken, MS-DOS displays the dreaded message
Memory allocation error
and halts the system.
In the example illustrated in Figure 11-3, COMMAND.COM originally loaded
PROGRAM1.COM into the TPA and, because it was a .COM file, COMMAND.COM
allocated it all of the TPA, controlled by arena header #1. PROGRAM1.COM
then used Int 21H Function 4AH (Resize Memory Block) to shrink its memory
allocation to the amount it actually needed to run and loaded and executed
PROGRAM2.EXE with the EXEC function (Int 21H Function 4BH). The EXEC
function obtained a suitable amount of memory, controlled by arena header
#2, and loaded PROGRAM2.EXE into it. PROGRAM2.EXE, in turn, needed some
additional memory to store some intermediate results, so it called Int 21H
Function 48H (Allocate Memory Block) to obtain the area controlled by
arena header #3. The highest arena header (#4) controls all of the
remaining TPA that has not been allocated to any program.
┌─────────────────────────────────────────────────┐◄ Top of RAM
│ Unowned RAM controlled by header #4 │ controlled by MS-DOS
├─────────────────────────────────────────────────┤
│ Arena header #4 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #3; additional │
│ storage dynamically allocated by PROGRAM2.EXE │
├─────────────────────────────────────────────────┤
│ Arena header #3 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #2, │
│ containing PROGRAM2.EXE │
├─────────────────────────────────────────────────┤
│ Arena header #2 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #1, │
│ containing PROGRAM1.COM │
├─────────────────────────────────────────────────┤
│ Arena header #1 │
└─────────────────────────────────────────────────┘◄ Bottom of transient-
program area
Figure 11-3. An example diagram of MS-DOS arena headers and the
transient-program area. The environment blocks and their associated
headers have been omitted from this figure to increase its clarity.
Lotus/Intel/Microsoft Expanded Memory
When the IBM Personal Computer and MS-DOS were first released, the 640 KB
limit that IBM placed on the amount of RAM that could be directly managed
by MS-DOS seemed almost unimaginably huge. But as MS-DOS has grown in both
size and capabilities and the popular applications have become more
powerful, that 640 KB has begun to seem a bit crowded. Although personal
computers based on the 80286 and 80386 have the potential to manage up to
16 megabytes of RAM under operating systems such as MS OS/2 and XENIX,
this is little comfort to the millions of users of 8086/8088-based
computers and MS-DOS.
At the spring COMDEX in 1985, Lotus Development Corporation and Intel
Corporation jointly announced the Expanded Memory Specification 3.0 (EMS),
which was designed to head off rapid obsolescence of the older PCs because
of limited memory. Shortly afterward, Microsoft announced that it would
support the EMS and would enhance Microsoft Windows to use the memory made
available by EMS hardware and software. EMS versions 3.2 and 4.0, released
in fall 1985 and summer 1987, expanded support for multitasking operating
systems.
The LIM EMS (as it is usually known) has been an enormous success. EMS
memory boards are available from scores of manufacturers, and "EMS-aware"
software──especially spreadsheets, disk caches, and terminate-and-stay-
resident utilities──has become the rule rather than the exception.
What Is Expanded Memory?
The Lotus/Intel/Microsoft Expanded Memory Specification is a functional
definition of a bank-switched memory-expansion subsystem. It consists of
hardware expansion modules and a resident driver program specific to those
modules. In EMS versions 3.0 and 3.2, the expanded memory is made
available to application software as 16 KB pages mapped into a contiguous
64 KB area called the page frame, somewhere above the main memory area
used by MS-DOS/PC-DOS (0─640 KB). The exact location of the page frame is
user configurable, so it need not conflict with other hardware options. In
EMS version 4.0, the pages may be mapped anywhere in memory and can have
sizes other than 16 KB.
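Under EMS 3.0 and 3.2, then, the page frame holds four 16 KB physical pages. The address arithmetic can be sketched in C as follows; the function name is hypothetical, and the frame segment D000H used in the usage note is only an example value, because the real location is whatever the user configured.

```c
#include <stdint.h>

#define EMS_PAGE_BYTES  16384u   /* 16 KB per page */
#define EMS_FRAME_PAGES 4u       /* 64 KB page frame / 16 KB pages */

/* 20-bit physical address of physical page `page` (0 through 3) within a
   page frame that begins at real-mode segment `frame_seg`.
   Returns 0 if the page number is out of range. */
uint32_t frame_page_address(uint16_t frame_seg, unsigned page)
{
    if (page >= EMS_FRAME_PAGES)
        return 0;
    return ((uint32_t)frame_seg << 4) + (uint32_t)page * EMS_PAGE_BYTES;
}
```

If the page frame were located at segment D000H, its four physical pages would begin at addresses 0D0000H, 0D4000H, 0D8000H, and 0DC000H.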
The EMS provides a uniform means for applications to access as much as 8
megabytes of memory (32 megabytes in EMS 4.0). The supporting software,
which is called the Expanded Memory Manager (EMM), provides a
hardware-independent interface between application software and the
expanded memory board(s). The EMM is supplied in the form of an
installable device driver that you link into the MS-DOS/PC-DOS system by
adding a line to the CONFIG.SYS file on the system boot disk.
Internally, the Expanded Memory Manager consists of two major portions,
which may be referred to as the driver and the manager. The driver portion
mimics some of the actions of a genuine installable device driver, in that
it includes initialization and output status functions and a valid device
header. The second, and major, portion of the EMM is the true interface
between application software and the expanded-memory hardware. Several
classes of services are provided:
■ Verification of functionality of hardware and software modules
■ Allocation of expanded-memory pages
■ Mapping of logical pages into the physical page frame
■ Deallocation of expanded-memory pages
■ Support for multitasking operating systems
Application programs communicate with the EMM directly, by means of
software Int 67H. MS-DOS versions 3.3 and earlier take no part in (and in
fact are completely oblivious to) any expanded-memory manipulations that
may occur. MS-DOS version 4.0 and Microsoft Windows, on the other hand,
are "EMS-aware" and can use the EMS memory when it is available.
Expanded memory should not be confused with extended memory. Extended
memory is the term used by IBM to refer to the memory at physical
addresses above 1 megabyte that can be accessed by an 80286 or 80386 CPU
in protected mode. Current versions of MS-DOS run the 80286 and 80386 in
real mode (8086-emulation mode), and extended memory is therefore not
directly accessible.
Checking for Expanded Memory
An application program can use either of two methods to test for the
existence of the Expanded Memory Manager:
■ Issue an open request (Int 21H Function 3DH) using the guaranteed
device name of the EMM driver: EMMXXXX0. If the open function succeeds,
either the driver is present or a file with the same name
coincidentally exists on the default disk drive. To rule out the
latter, the application can use IOCTL (Int 21H Function 44H)
subfunctions 00H and 07H to ensure that EMM is present. In either case,
the application should then use Int 21H Function 3EH to close the
handle that was obtained from the open function, so that the handle can
be reused for another file or device.
■ Use the address that is found in the Int 67H vector to inspect the
device header of the presumed EMM. Interrupt handlers and device
drivers must use this method. If the EMM is present, the name field at
offset 0AH of the device header contains the string EMMXXXX0. This
approach is nearly foolproof and avoids the relatively high overhead of
an MS-DOS open function. However, it is somewhat less well behaved
because it involves inspection of memory that does not belong to the
application.
These two methods of testing for the existence of the Expanded Memory
Manager are illustrated in Figures 11-4 and 11-5.
──────────────────────────────────────────────────────────────────────────
.
.
.
; attempt to "open" EMM...
mov dx,seg emm_name ; DS:DX = address of name
mov ds,dx ; of Expanded Memory Manager
mov dx,offset emm_name
mov ax,3d00h ; function 3dh, mode = 00h
; = open, read only
int 21h ; transfer to MS-DOS
jc error ; jump if open failed
; open succeeded, be sure
; it was not a file...
mov bx,ax ; BX = handle from open
mov ax,4400h ; function 44h subfunction 00h
; = IOCTL get device information
int 21h ; transfer to MS-DOS
jc error ; jump if IOCTL call failed
and dx,80h ; bit 7 = 1 if character device
jz error ; jump if it was a file
; EMM is present, be sure
; it is available...
; (BX still contains handle)
mov ax,4407h ; function 44h subfunction 07h
; = IOCTL get output status
int 21h ; transfer to MS-DOS
jc error ; jump if IOCTL call failed
or al,al ; test device status
jz error ; if AL = 0 EMM is not available
; now close handle ...
; (BX still contains handle)
mov ah,3eh ; function 3eh = close
int 21h ; transfer to MS-DOS
jc error ; jump if close failed
.
.
.
emm_name db 'EMMXXXX0',0 ; guaranteed device name for
; Expanded Memory Manager
──────────────────────────────────────────────────────────────────────────
Figure 11-4. Testing for the Expanded Memory Manager by means of the
MS-DOS open and IOCTL functions.
──────────────────────────────────────────────────────────────────────────
emm_int equ 67h ; Expanded Memory Manager
; software interrupt
.
.
.
; first fetch contents of
; EMM interrupt vector...
mov al,emm_int ; AL = EMM int number
mov ah,35h ; function 35h = get vector
int 21h ; transfer to MS-DOS
; now ES:BX = handler address
; assume ES:0000 points
; to base of the EMM...
mov di,10 ; ES:DI = address of name
; field in device header
; DS:SI = EMM driver name
mov si,seg emm_name
mov ds,si
mov si,offset emm_name
mov cx,8 ; length of name field
cld
repz cmpsb ; compare names...
jnz error ; jump if driver absent
.
.
.
emm_name db 'EMMXXXX0' ; guaranteed device name for
; Expanded Memory Manager
──────────────────────────────────────────────────────────────────────────
Figure 11-5. Testing for the Expanded Memory Manager by inspection of the
name field in the driver's device header.
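The comparison at the heart of Figure 11-5 can also be written in C. This sketch only shows the name test: it assumes the caller has already obtained a pointer to offset 0 of the segment named by the Int 67H vector (how that pointer is formed depends on the compiler and memory model), and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEVHDR_NAME_OFFSET 0x0A   /* name field within a device header */

/* Nonzero if the device header at `segment_base` carries the
   guaranteed EMM device name in its 8-byte name field. */
int is_emm_header(const uint8_t *segment_base)
{
    return memcmp(segment_base + DEVHDR_NAME_OFFSET, "EMMXXXX0", 8) == 0;
}
```

As in the assembly version, exactly eight bytes are compared, so the terminating null of the C string literal plays no part in the test.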
Using Expanded Memory
After establishing that the memory-manager software is present, the
application program communicates with it directly by means of the "user
interrupt" 67H, bypassing MS-DOS/PC-DOS. The calling sequence for the EMM
is as follows:
──────────