SD Systems Versafloppy II controller with Z80 or 8080 processor control

In Nov 2003 I had many communications with Bruce Jones about operating the SD Systems Versafloppy II controller with slower and faster CPUs, and with an add-on Z80 DMA chip. These comments should be informative about any use of floppy disk controller chips with Z80s and 8080s. Here are his comments, with some edits by me (Herb Johnson) in []'s. I'll forward any inquiries to Bruce. Bruce also provided software and other discussions about SD Systems hardware: check my SD Systems page for links and specifics.

To return to my S-100 home page, follow this link. Material on this page (c) copyright Herb Johnson 2003, except for material provided by Bruce Jones with his permission, for which he retains copyright. This page last updated June 1 2006. Herb Johnson


A Look At PIO and DMA Floppy Disk Transfers for S-100 Machines
with the SD Systems Versafloppy II

Bruce Jones, Nov 15th 2003. (Edits by Herb Johnson in []'s)

Introduction

This document covers floppy disk system I/O methods, based on my experience
designing hardware and software for S-100 machines using the Digital
Research operating systems CP/M-80 and MP/M-80.

[In these notes, Bruce describes techniques for use of very small routines
to read or write data sectors, using 8080 or Z80 processors running
at as low as 2MHz. Minimum code length for minimum execution time
is critical as the data rates for single and double density floppy
disk data provide little or no margin for delay. The first half of
this note describes the sector data transfer code and timing in detail.]

[The fastest of these routines drop the loop byte counter, so additional
coding techniques are necessary to end the transfer. The second half of
this note deals with those additional techniques.]

Background

The first hardware to run CP/M was Gary Kildall's MDS 800 development
machine. It used a DMA device interfaced to a floppy disk controller for
sector transfers. DMA was the most common mechanism for many types of
peripheral device I/O where contiguous blocks of data were to be processed
at high speed. The introduction of the S-100 machines in 1975 resulted in a
number of disk I/O mechanisms that competed in the marketplace, mostly on
the basis of economy. The Programmed I/O, or PIO, floppy controller became
an entry-level hardware device for controlling common and cheap flexible
media drives. Most of the PIO controllers were designed with a single-user,
single-task computing environment in mind. When used in that environment
they appeared to exhibit the same performance and efficiency as the DMA
mechanism.

Types of FDD Controller [Data Transfers]
----------------------------------------

PIO - Wait State Control (WSC)
The PIO, or Programmed I/O, controller was very typical of many S-100 floppy
drive controller mechanisms. It
generally referred to a controller device that accomplished sector transfers
through the 8080 IN and OUT port
functions. In order to synchronize the CPU with the variable response of the
disk drive, a simple I/O locking
mechanism was implemented, by utilizing the S-100 lines connected to the CPU
chip READY or WAIT control
pin. Some of these controllers provided options of external DMA or interrupt
I/O.

PIO - Intelligent or Buffered Transfer (BT)
Much less common than the above device, this type of PIO controller also
used processor IN and OUT
functions, however, the full sector would be buffered by onboard ram that
was part of the controller. Thus the
CPU need only poll the controller for a ready condition and then perform a
sector block move at maximum
speed. In some cases a simple interrupt routine could be implemented to
signal the end of a sector I/O operation.

PIO - Interrupt Driven (ID)
Almost as common as the WSC controller (in fact, many WSC controllers had
options for interrupt handling), this type of controller required the BIOS
writer to include an efficient interrupt routine that moved bytes to and
from the floppy disk interface. The only advantage over the WSC method is
that the CPU was free to do additional work until the sector transfer began.
In terms of overall CPU availability, ID is on a par with BT.

Direct Memory Access (DMA)
This type of controller freed the CPU of all floppy I/O chores. Except for
the execution of a brief routine that
prepared the DMA for a disk transfer, there was no other disk I/O overhead
for the CPU to perform. Polling the
DMA device or FDC for completion of task and status checking completed the
operation. Like the BT method,
the DMA offered good efficiency in a multi-tasking environment. But even
better than the BT transfers, the
DMA method dispensed with the CPU executed task of moving the sector data to
and from the interface. In
some cases a simple interrupt routine could be implemented to signal the end
of a sector I/O operation.

In summary, all controller types performed equally well in a single-tasking
environment. To the user, for a given diskette format and drive combination,
there was little perceptible difference in operational response. However,
when we consider the CPU as the most valuable system resource, the
efficiency of each method differs. From worst to best they are: WSC, then
BT and ID (about equal), then DMA.

The Challenge of PIO-WSC Floppy I/O [Data Transfers]

Let's look at the CPU tasks needed to perform simple sector transfers using a
PIO-WSC controller:

1. Set the drive select logic to enable the correct diskette drive
2. Send out the track number for I/O
3. Send out the sector number for I/O
4. Send out the read/write command
5. Enable the wait-state logic
6. Input or output the sector data buffer
7. Deactivate the wait-state logic
8. Get the disk controller completion status
9. Retry up to 10 times on an FDC error
10. Return with status good or bad
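
Steps 4 through 10 amount to a bounded retry loop around one sector transfer. The following is a minimal sketch in Python, used here only as executable pseudocode. The function names, and the convention that a status of 0 means "good", are assumptions for illustration and are not taken from any Versafloppy II BIOS. "Up to 10 times" is read here as at most 10 attempts.

```python
def sector_io_with_retries(do_transfer, max_attempts=10):
    """Call do_transfer() until it reports success or attempts run out.

    do_transfer is a hypothetical callable standing in for steps 4-8;
    it returns the FDC completion status, with 0 assumed to mean "good".
    Returns (ok, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        status = do_transfer()
        if status == 0:
            return True, attempt
    return False, max_attempts


# Example: a fake controller that fails twice, then succeeds.
fake_results = iter([0x10, 0x08, 0x00])
ok, attempts = sector_io_with_retries(lambda: next(fake_results))
print(ok, attempts)
```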

[Step 5 sets up use of the FDC chip's byte-by-byte data request signal,
DRQ, which is connected to a CPU wait state generator. The FDC chip will
then force wait states on the FDC IN or OUT instruction, until the
FDC has completed the transfer of that data byte. This rate of transfer
is set by the diskette data density to be read or written.]

Step 6 is timing sensitive. The code to perform this operation is simple,
and has a predetermined, fixed number of CPU clock cycles. Depending on the
frequency, or data density, of the read/write media, the timing window for
every byte of floppy data needed to perform this operation is as follows:

Single Density Byte Transfer Window = 32 micro-seconds
Double Density Byte Transfer Window = 16 micro-seconds

The above timing is according to the Western Digital specification sheet for
the WD179X FDC chips.
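
These windows are simply the time it takes eight bits to pass at the media's data rate. As a rough check in Python, assuming the usual 8-inch drive data rates of 250 kbit/s for single density and 500 kbit/s for double density (the rates are an assumption here; the text above gives only the resulting windows):

```python
def byte_window_us(bit_rate_bps):
    """Time for one 8-bit byte at the given data rate, in microseconds."""
    return 8 * 1_000_000 / bit_rate_bps


# Assumed data rates for 8-inch media (not stated in the text above).
SD_WINDOW_US = byte_window_us(250_000)
DD_WINDOW_US = byte_window_us(500_000)

print(SD_WINDOW_US, DD_WINDOW_US)
```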

[Also, step 6 requires either an interrupt from the FDC chip when the correct
number of data bytes have been transferred; or a zero-test for a byte
counting register.]

[8080 code for PIO data transfer]

The typical 8080 code to transfer a byte [data buffer] follows:

Floppy$byte:                   clocks    2mhz      4mhz
IN   fdcport    ;FDC port      10        5 uS      2.5 uS
MOV  M,A        ;sector buffer  7        3.5 uS    1.75 uS
INX  H          ;next loc.      7        3.5 uS    1.75 uS
DCR  B          ;byte counter   5        2.5 uS    1.25 uS
JNZ  floppy$byte               10        5 uS      2.5 uS

Total                          39       19.5uS     9.75 uS

The 2mhz 8080 will thus handle single-density media, but not
double-density (with this routine). The 4mhz [8080] processor will handle
both media types. If we consider 16 uS as the DD window, then an 8080
with a 2.44 mhz clock would just be able to handle DD transfers.
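
The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch: the per-instruction clock counts are copied from the listing above as given, not checked against a processor data sheet, and the function names are made up for illustration.

```python
# Clock counts exactly as they appear in the listing above.
LOOP_CLOCKS = {"IN": 10, "MOV M,A": 7, "INX H": 7, "DCR B": 5, "JNZ": 10}


def loop_time_us(clocks, cpu_mhz):
    """Time for one pass of the loop, in microseconds."""
    return clocks / cpu_mhz


def min_clock_mhz(clocks, window_us):
    """Slowest CPU clock that still fits one pass inside the byte window."""
    return clocks / window_us


total_clocks = sum(LOOP_CLOCKS.values())
print(total_clocks)                      # clocks per byte
print(loop_time_us(total_clocks, 2))     # uS per byte at 2mhz
print(loop_time_us(total_clocks, 4))     # uS per byte at 4mhz
print(min_clock_mhz(total_clocks, 16))   # mhz needed for a 16 uS window
```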

[note 2/22/04 by Herb: in a related comp.os.cpm discussion on this date,
Randy McLaughlin noted: "There was one other method posted [here] 
earlier by "Alison" that saves 3.5uS on a 2mhz 8080. Instead of an INX H 
followed by a DEC for a counter, if the buffer ends on a page boundary
an INR L works; getting rid of the INX (3.5uS)." A page boundary is a
memory address of the form "XX00H" where "XX" is the page address. Note
additional complications if the sector is larger than 256 bytes (one page).
Also review Bruce's other considerations in this document before coding
this method.]
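
To see why the page-boundary trick works, note that INR L sets the zero flag when L wraps from FFH to 00H, so the low byte of the buffer pointer doubles as the byte counter. The Python sketch below models only that pointer arithmetic for sectors of up to 256 bytes. The helper names and the example page number are assumptions for illustration.

```python
def buffer_start_for_page_end(page, sector_bytes):
    """Start address that puts the last sector byte at xxFFH of `page`."""
    if not 1 <= sector_bytes <= 256:
        raise ValueError("this trick covers at most one 256-byte page")
    return ((page + 1) << 8) - sector_bytes


def bytes_until_l_wraps(start_addr):
    """Count loop passes until INR L leaves L == 0 (zero flag set)."""
    low = start_addr & 0xFF
    count = 0
    while True:
        count += 1                # one byte stored at (HL)
        low = (low + 1) & 0xFF    # INR L
        if low == 0:              # JNZ falls through here
            return count


start = buffer_start_for_page_end(0x20, 128)   # hypothetical page 20H
print(hex(start), bytes_until_l_wraps(start))
```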

[Z80 code for PIO data transfer]

Note that [the 8080 code above] performs worse on a Z80, since the Z80 IN
and OUT instructions are hard-wired to insert one additional clock cycle
over that of the 8080. Thus at 2mhz the Z80 requires 20 uS (40 clocks).
However, the Z80 has a block I/O instruction, INIR, that will successfully
do floppy I/O for double-density media with a 2mhz clock.


Floppy$Input:                    clocks    2mhz      4mhz
LD   HL,buffer  ;sector buf.      n/a
LD   B,count    ;sector size      n/a
LD   C,fdc$port ;FDC I/O port     n/a

INIR            ;the full I/O code 21      10.5 uS   5.25 uS

A string of INIR or OTIR instructions is used to cover [a number of]
actual sector sizes. Here is how simple it can be:

For 128 byte sectors, load 128 into B register, else load 0 [effectively
256] for all others, and enter at the sector length points:

1024$byte$entry:
               INIR
               INIR
512$byte$entry:
               INIR
256$byte$entry:
128$byte$entry:
               INIR
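
The entry-point scheme above relies on the 8-bit B register: starting from 0, one INIR moves 256 bytes before B returns to 0, so each further INIR moves another 256. The following Python sketch models just that counting behaviour to confirm the byte totals. The table of INIR counts per entry point is read off the listing, and the function names are illustrative only.

```python
def inir(b):
    """Model one Z80 INIR: move bytes until B decrements to zero.

    Returns (bytes_moved, final_b).
    """
    moved = 0
    while True:
        b = (b - 1) & 0xFF   # B is an 8-bit register
        moved += 1
        if b == 0:
            return moved, b


# Number of INIR instructions executed from each entry label above.
INIRS_FROM_ENTRY = {1024: 4, 512: 2, 256: 1, 128: 1}


def bytes_for_entry(sector_size):
    """Total bytes moved when entering at the label for sector_size."""
    b = 128 if sector_size == 128 else 0   # 0 acts as 256
    total = 0
    for _ in range(INIRS_FROM_ENTRY[sector_size]):
        moved, b = inir(b)
        total += moved
    return total


for size in (128, 256, 512, 1024):
    print(size, bytes_for_entry(size))
```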

[Improved 8080 and Z80 PIO data transfer code]

A possible solution for 2mhz 8080 double-density floppy transfers:

Floppy$byte:                  clocks    2mhz
IN   fdcport    ;FDC port      10        5 uS
MOV  M,A        ;buffer         7        3.5 uS
INX  H          ;next loc.      7        3.5 uS
JMP  floppy$byte               10        5 uS

Total                          34        17 uS

This routine must be terminated by an end-of-operation interrupt, issued by
the floppy controller. To allow for some interrupt latency, add several
extra bytes to the sector buffer area. Note that at 17 uS it is a marginal
solution. I do not have an 8080 system to test this code; however, this
solution has been tried with a 4 mhz Z80 and it works flawlessly. On a 2mhz
Z80 [this routine] will not work, since the additional clock cycle added to
all I/O instructions results in a routine that consumes 17.5 uS, and this
exceeds the 17 uS window I measured for double-density (see the NOTE later
in this document).
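
The margins discussed here can be tabulated with a short Python sketch. It takes the 34-clock 8080 loop from the listing above, adds the one extra I/O clock for the Z80 as described in the text, and compares each against both the 16 uS data-sheet window and the 17 uS window measured later in this document. The clock counts are those given in this document, and the helper name is an assumption.

```python
def fits(clocks, cpu_mhz, window_us):
    """True if one loop pass completes within the byte window."""
    return clocks / cpu_mhz <= window_us


CLOCKS_8080 = 34            # counterless loop, per the listing above
CLOCKS_Z80 = CLOCKS_8080 + 1  # one extra clock on the I/O instruction

for name, clocks, mhz in (("8080", CLOCKS_8080, 2),
                          ("Z80", CLOCKS_Z80, 2),
                          ("Z80", CLOCKS_Z80, 4)):
    print(name, mhz, clocks / mhz,
          fits(clocks, mhz, 16), fits(clocks, mhz, 17))
```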

[note by Chuck Guzis June 2006: quote

"I'm surprised that you didn't carry this forward with a bit of loop
unrolling [with] this result:

floppy$byte:                  clocks    2mhz
IN   fdcport    ;FDC port      10        5 uS
MOV  M,A        ;buffer         7        3.5 uS
INX  H          ;next loc.      7        3.5 uS
IN   fdcport    ;FDC port      10        5 uS
MOV  M,A        ;buffer         7        3.5 uS
INX  H          ;next loc.      7        3.5 uS
IN   fdcport    ;FDC port      10        5 uS
MOV  M,A        ;buffer         7        3.5 uS
INX  H          ;next loc.      7        3.5 uS
IN   fdcport    ;FDC port      10        5 uS
MOV  M,A        ;buffer         7        3.5 uS
INX  H          ;next loc.      7        3.5 uS
JMP  floppy$byte               10        5 uS

Total                          106       53 uS

...or about 13 uS per byte--and just under the window for DD data transfers.
 Unrolling the transfer loop more produces rapidly decreasing returns.
 The cost is a modest 12 additional bytes." - end quote.]

[Note, however, that every four reads the delay due to the jump occurs
as described by Bruce. That delay can be avoided by unrolling the entire
sector read, if you have the code space to do so. - Herb Johnson]
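
The diminishing returns from unrolling follow from spreading the one 10-clock jump over more bytes. A small Python sketch, using the 24 clocks per IN/MOV/INX group and 10 clocks per JMP from the listings above (the function name is illustrative):

```python
def clocks_per_byte(unroll, body_clocks=24, jump_clocks=10):
    """Average clocks per byte when the loop body is repeated `unroll` times."""
    return (unroll * body_clocks + jump_clocks) / unroll


for n in (1, 2, 4, 8, 16):
    clocks = clocks_per_byte(n)
    print(n, clocks, clocks / 2)   # unroll factor, clocks, uS at 2mhz
```

With no jump at all (the fully unrolled case), the figure approaches 24 clocks, or 12 uS per byte at 2mhz.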

An end-of-operation interrupt can be made on a Versafloppy II by jumpering
E7 to E8 on the Versafloppy board. This places [the FDC chip's] INTQ on the
S-100 VI3 [interrupt #3] line at pin 7.

DMA Transfers for 2mhz 8080 CPUs.
---------------------------------

The surest solution to obtain reliable double-density disk transfers with a
2mhz 8080 CPU is by implementing a DMA disk I/O mechanism. By jumpering E9
to E10 on the Versafloppy II board, a byte-by-byte data request signal
[DRQ from the FDC chip] will
be placed on the S-100 VI2 line (pin 6). This must be connected to a DMA
device, and code must be written to support whatever DMA device is
available. The task of building a general purpose DMA device for the S-100
bus is beyond the scope of this document. One complete solution would be to
replace the Versafloppy II with a Tarbell Double Density Disk Interface and
code for use of the DMA device on [the Tarbell] controller.

DMA Transfers for 2mhz Z80 CPUs.
-------------------------------

[For most Z80 CPU boards,] you could build a simple add-on DMA interface. Refer
to the Zilog "Microprocessor Applications Reference Book"  Vol. 1, 1981 page
2-35 for a typical example. [The Z80 DMA chip uses most of the Z80 lines
and so is a kind of "coprocessor" chip wired parallel to the Z80.]

[For instance,] I have added a Z80-DMA to one of my Z80 CPU
boards, simply by removing the Z80 from its socket, adding two 40 pin
sockets to a PC proto-board (one for the Z80, one for the DMA), plus a few
SSI chips for a port decoder to select the DMA registers. A wire from the
DMA [board] extends ("externally") to the S-100 pin 6 and thus picks up the
Versafloppy II DRQ line [wired to the card's VI2 line]. The port decoder 
is very simple, [as] it need only
decode a single I/O write-only port. There is no need to read the DMA
registers. The proto-board is inserted back into the Z80 socket on the CPU
board.

[Supporting code for PIO-WSC Floppy data transfers]

[This section covers the setup of the read and write code loops "FDCIO:"
and the code which follows completion of those loops via interrupt.
Since an interrupt routine is used to exit the read or write loop,
which otherwise would loop endlessly, something must be done when
the interrupt routine is completed and control returns to that loop.
I chose to have the interrupt routine "INTHDL:" write NOP - no ops - into the
read or write loop so execution will fall through to the next routine
which cleans up after the sector is read or written.]

FDCIO: could be two different code pieces, but since INTHDL has to "no-op"
out the code to break the loop, I decided on just one shared
place for the read and write loops. Note that this overhead is small
compared to the total time for an FDC operation. Also, with sector sizes
of, say 1024 bytes, this code is only entered 1/8 of the time for
BDOS disk reads, due to sector blocking/deblocking (1024/128 = 8).

[Finally, the read or write loop must be "refreshed" before each
use as it is "erased" when the previous read/write is completed.]

Note that the FDCIO I/O routine is a fast, irreducible loop. It reads sector
data from the FDC, and nothing more. This loop has no condition-code checks
for an end case. The read routine entry point actually writes this code into
place before execution. A modified form of the read loop, the write loop,
uses the same code space, and so it must be written into place as well,
before execution. Each entry point sets a code into the IOFLAG byte for
later reference. No matter the loop function at the time of execution (R or
W), it will loop endlessly, without any need to accommodate a particular
sector size (128, 256, 512 or 1024 bytes). The physical end-of-operation
issued by the WD179X FDC chip determines the end point (buffered to the
S-100 VI3 line).

The interrupt handler, INTHDL, now must perform the following functions:

1 - input the FDC chip status port (This stops the INTQ from the WD179X &
thus the VI3 line 'quits')

2 - save that status value at ASTAT for later examination

3 - break the I/O FDCIO loop code, so that execution can move to the next
phase.
     It does this by writing seven 8080 NOPs into the loop. This ensures
that upon return from the interrupt service routine, NO MATTER WHERE
in the [read or write] loop the code is [now] executing, it will execute
some number of NOPs [instead] and [fall through] to the CHEKFD
operation-complete test routine [which follows]. There a test is
performed on the saved status byte to see if the operation was successful,
etc.

4 - RET from the service routine so CHEKFD can be entered.
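
The claim that the loop falls through from any interruption point can be checked with a toy model. The Python sketch below tracks only the program counter over the seven loop bytes (IN, port, MOV M,A, INX H, JMP lo, hi) and lets a simulated INTHDL overwrite them with NOPs between any two instructions. The load address 0100H and the function name are assumptions for illustration. Data movement and the FDC itself are not modelled.

```python
FDCIO = 0x0100          # hypothetical address of the 7-byte loop
CHEKFD = FDCIO + 7      # first byte after the loop
READ_LOOP = [0xDB, 0x67,                      # IN   67H
             0x77,                            # MOV  M,A
             0x23,                            # INX  H
             0xC3, FDCIO & 0xFF, FDCIO >> 8]  # JMP  FDCIO


def run_until_chekfd(interrupt_after_steps, max_steps=1000):
    """Return the instruction count at which execution reaches CHEKFD."""
    mem = {FDCIO + i: b for i, b in enumerate(READ_LOOP)}
    pc = FDCIO
    for step in range(max_steps):
        if pc == CHEKFD:
            return step
        if step == interrupt_after_steps:
            # INTHDL: fill all seven loop bytes with NOP (00H), then RET
            for i in range(7):
                mem[FDCIO + i] = 0x00
        op = mem[pc]
        if op == 0x00:               # NOP
            pc += 1
        elif op == 0xDB:             # IN port (2 bytes)
            pc += 2
        elif op in (0x77, 0x23):     # MOV M,A / INX H
            pc += 1
        elif op == 0xC3:             # JMP addr
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
        else:
            raise ValueError("unexpected opcode %02X" % op)
    raise RuntimeError("loop never exited")


# Interrupt at every possible instruction boundary over two loop passes.
exit_steps = [run_until_chekfd(k) for k in range(8)]
print(exit_steps)
```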

The FDCIO loop has dispensed with the byte-counting function of the [B]
register. This shaves 5 clocks (2.5 uS at 2mhz) from the I/O loop. With this
saving, a 2mhz 8080 might just be able to do PIO floppy transfers with DD
media. Since I don't have an 8080 CPU card any more, I cannot tell. I did
run the code with my Z80 at both 2mhz & 4mhz to test it. 4mhz is fine, as I
noted, but with the Zilog chip-level hard-wired extra clock for all port I/O
of the Z80, the timing is too slow to work with DD at 2mhz.

NOTE

Actual measurements of DD FDC sector transfers showed that the window per
byte is actually 17 uS. But my Tek 2213 'scope may need calibration. :)


Code to setup read and write loop "refresh" and "no-ops"
---------------------------------------------------------

Actual code I used, to indicate what is involved:

WDATA EQU 67H ;FDC DATA I/O PORT

Entry point for FDC read. Note that the full code is written into place
prior to each sector transfer:

	MVI	A,0
	STA	IOFLAG	;SHOW OPERATION IS READ
	MVI	A,0DBH	;8080 IN opcode
	STA	FDCIO
	MVI	A,WDATA	;FDC I/O port
	STA	FDCIO+1
	MVI	A,77H	;8080 MOV M,A opcode
	STA	FDCIO+2
	MVI	A,23H	;8080 INX H opcode
	STA	FDCIO+3
	MVI	A,0C3H	;8080 JMP opcode
	STA	FDCIO+4
	PUSH	H		;save sector buffer pointer
	LXI	H,FDCIO	;make a pointer to loop address
	SHLD	FDCIO+5	;store it here
	POP	H		;recover buffer address
	JMP	FDXFER

Entry point for FDC write.

	MVI	A,1
	STA	IOFLAG	;SHOW OPERATION IS WRITE
	MVI	A,7EH	;8080 MOV A,M opcode (fetch the byte before sending it)
	STA	FDCIO
	MVI	A,0D3H	;8080 OUT opcode
	STA	FDCIO+1
	MVI	A,WDATA	;FDC I/O port
	STA	FDCIO+2
	MVI	A,23H	;8080 INX H opcode
	STA	FDCIO+3
	MVI	A,0C3H	;8080 JMP opcode
	STA	FDCIO+4
	PUSH	H		;save sector buffer pointer
	LXI	H,FDCIO	;make a pointer to loop address
	SHLD	FDCIO+5	;store it here
	POP	H		;recover buffer address
	JMP	FDXFER

FDXFER:
	EI			;be sure FDC can stop this loop

FDCIO:
        IN      WDATA           ;[this is the loop code]
        MOV     M,A             ;[which is refreshed or NOP'ed]
        INX     H               ;[for read or write]
	JMP	FDCIO

;Now go to code that processes FDC status

CHEKFD:

. . . . .

=============

The end-of-operation interrupt code

We have to fill the above input/output looping code with NOPs, so that
when control is returned to the program, it will no longer loop, and will
continue on to enter the I/O status test routine.


WSTAT	EQU	64H ;FDC STATUS PORT


INTHDL:
	IN	WSTAT	;get FDC status to stop the FDC INTs
	STA	ASTAT	;save status
	IN	WSTAT	;to be sure FDC INTs stop
        XRA     A               ;fill input code with NOPs [00H]
	STA	FDCIO
	STA	FDCIO+1
	STA	FDCIO+2
	STA	FDCIO+3
	STA	FDCIO+4
	STA	FDCIO+5
	STA	FDCIO+6
	LDA	ASTAT	;recover FDC ending status

        RET             ;[..and return from interrupt]


ASTAT:	.BYTE	0
IOFLAG:	.BYTE	0

NOTES:

You can replace the fill routines with a block move for the FDCIO routine,
and a simple zero fill routine at the interrupt service point.
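
As an illustration of that idea, the Python sketch below treats memory as a byte array, refreshes the read loop by copying a five-byte template plus the two-byte jump target, and clears it with a zero fill. It shows only the read loop. The template bytes are the opcodes and port used in the listings above, while the function names and the 0100H address are assumptions. On the real machine these would be a short block-move loop and a fill loop in 8080 code.

```python
# IN 67H / MOV M,A / INX H / JMP (jump target is appended at refresh time)
READ_TEMPLATE = bytes([0xDB, 0x67, 0x77, 0x23, 0xC3])
LOOP_LENGTH = 7


def refresh_read_loop(mem, fdcio):
    """Block-move the read loop into place at address fdcio."""
    mem[fdcio:fdcio + 5] = READ_TEMPLATE
    mem[fdcio + 5] = fdcio & 0xFF    # low byte of the JMP target
    mem[fdcio + 6] = fdcio >> 8      # high byte of the JMP target


def zero_fill_loop(mem, fdcio):
    """What the interrupt handler does: turn the loop into seven NOPs."""
    mem[fdcio:fdcio + LOOP_LENGTH] = bytes(LOOP_LENGTH)


mem = bytearray(0x200)
refresh_read_loop(mem, 0x0100)       # hypothetical loop address
print(mem[0x0100:0x0107].hex())
zero_fill_loop(mem, 0x0100)
print(mem[0x0100:0x0107].hex())
```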

Note that this input/output routine needs no byte counter. A sector of data
will be transferred, independent of the sector length.

Remember to add several bytes to the sector buffer, beyond the size of the
largest sector size, in order to allow for interrupt latency.