In Nov 2003 I had many communications with Bruce Jones about operating the SD Systems Versafloppy II controller with slower and faster CPUs, and with an add-on Z80 DMA chip. These comments should be informative about any use of floppy disk controller chips with Z80s and 8080s. Here are his comments, with some edits by me (Herb Johnson) in []'s. I'll forward any inquiries to Bruce. Bruce also provided software and other discussions about SD Systems hardware: check my SD Systems page for links and specifics.
Material on this page (c) copyright Herb Johnson 2003, except for material provided by Bruce Jones with his permission, for which he retains copyright. This page last updated June 1 2006. Herb Johnson
A Look At PIO and DMA Floppy Disk Transfers for S-100 Machines with the SD Systems Versafloppy II

Bruce Jones, Nov 15th 2003. (Edits by Herb Johnson in []'s)

Introduction

This document covers floppy disk system I/O methods, based on my experience designing hardware and software for S-100 machines using the Digital Research operating systems CP/M-80 and MP/M-80.

[In these notes, Bruce describes techniques for using very small routines to read or write data sectors, with 8080 or Z80 processors running as slow as 2MHz. Minimum code length for minimum execution time is critical, as the data rates for single and double density floppy disk data provide little or no margin for delay. The first half of this note describes the sector data transfer code and timing in detail.]

[The fastest of these routines dispenses with the loop byte counter. Consequently, additional coding techniques are necessary to end the transfer. The second half of this note deals with those additional techniques.]

Background

The first hardware to run CP/M was Gary Kildall's MDS 800 development machine. It used a DMA device interfaced to a floppy disk controller for sector transfers. DMA was the most common mechanism for many types of peripheral device I/O, where contiguous blocks of data were to be processed at high speed.

The introduction of the S-100 machines in 1975 resulted in a number of disk I/O mechanisms that competed in the marketplace, mostly on the basis of economy. The Programmed I/O, or PIO, floppy controller became an entry-level hardware device for controlling common and cheap flexible media drives. Most of the PIO controllers were designed with a single-user, single-task computing environment in mind. When used in that environment they appeared to exhibit the same performance and efficiency as the DMA mechanism.
Types of FDD Controller [Data Transfers]
----------------------------------------

PIO - Wait State Control (WSC)

The PIO, or Programmed I/O, controller was very typical of many S-100 floppy drive controller mechanisms. It generally referred to a controller device that accomplished sector transfers through the 8080 IN and OUT port functions. In order to synchronize the CPU with the variable response of the disk drive, a simple I/O locking mechanism was implemented by utilizing the S-100 lines connected to the CPU chip READY or WAIT control pin. Some of these controllers provided options of external DMA or interrupt I/O.

PIO - Intelligent or Buffered Transfer (BT)

Much less common than the above device, this type of PIO controller also used processor IN and OUT functions; however, the full sector would be buffered by onboard RAM that was part of the controller. Thus the CPU need only poll the controller for a ready condition and then perform a sector block move at maximum speed. In some cases a simple interrupt routine could be implemented to signal the end of a sector I/O operation.

PIO - Interrupt Driven (ID)

Almost as common as the WSC controller (in fact, many WSC controllers had options for interrupt handling), this type of controller required the BIOS writer to include an efficient interrupt routine that moved bytes to and from the floppy disk interface. The only advantage over the WSC method is that the CPU was free to do additional work until the sector transfer began. In terms of overall CPU availability, ID is on a par with BT.

Direct Memory Access (DMA)

This type of controller freed the CPU of all floppy I/O chores. Except for the execution of a brief routine that prepared the DMA for a disk transfer, there was no other disk I/O overhead for the CPU to perform. Polling the DMA device or FDC for completion of task and status checking completed the operation. Like the BT method, the DMA offered good efficiency in a multi-tasking environment.
But even better than the BT transfers, the DMA method dispensed with the CPU-executed task of moving the sector data to and from the interface. In some cases a simple interrupt routine could be implemented to signal the end of a sector I/O operation.

In summary, all controller types performed equally well in a single-tasking environment. To the user, for a given diskette format and drive combination, there was little perceptible difference in operational response. However, if we treat the CPU as the most valuable system resource, the methods differ in efficiency. From worst to best they are: WSC, then BT and ID (about equal), then DMA.
Let's look at the CPU tasks needed to perform simple sector transfers using a PIO-WSC controller:

1. Set the drive select logic to enable the correct diskette drive
2. Send out the track number for I/O
3. Send out the sector number for I/O
4. Send out the read/write command
5. Enable the wait-state logic
6. Input or output the sector data buffer
7. Deactivate the wait-state logic
8. Get the disk controller completion status
9. Retry up to 10 times on an FDC error
10. Return with status good or bad

[Step 5 sets up use of the FDC chip's byte-by-byte data request signal, DRQ, which is connected to a CPU wait state generator. The FDC chip will then force wait states on the FDC IN or OUT instruction, until the FDC has completed the transfer of that data byte. This rate of transfer is set by the diskette data density to be read or written.]

Step 6 is timing sensitive. The code to perform this operation is simple, and has a predetermined, fixed number of CPU clock cycles. Depending on the frequency, or data density, of the read/write media, the timing window for every byte of floppy data needed to perform this operation is as follows:

    Single Density Byte Transfer Window = 32 micro-seconds
    Double Density Byte Transfer Window = 16 micro-seconds

The above timing is according to the Western Digital specification sheet for the WD179X FDC chips.

[Also, step 6 requires either an interrupt from the FDC chip when the correct number of data bytes have been transferred, or a zero-test for a byte counting register.]

[8080 code for PIO data transfer]

The typical 8080 code to transfer a byte [data buffer] follows:

    floppy$byte:                        clocks   2mhz      4mhz
        IN   fdcport    ;FDC port         10     5 uS      2.5 uS
        MOV  M,A        ;sector buffer     7     3.5 uS    1.75 uS
        INX  H          ;next loc.         7     3.5 uS    1.75 uS
        DCR  B          ;byte counter      5     2.5 uS    1.25 uS
        JNZ  floppy$byte                  10     5 uS      2.5 uS
                                Total     39     19.5 uS   9.75 uS

The 2mhz 8080 will thus handle single-density media, but not double-density (with this routine).
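The arithmetic behind that table can be checked with a short script. This is only a sketch: the cycle counts are copied from the table above rather than taken from a datasheet, the 32 uS and 16 uS windows are the figures quoted in the text, and the names (COUNTED_LOOP, loop_time_us, keeps_up, min_clock_mhz) are made up for illustration.

```python
# Cycle counts as tabulated in the text above (assumption: not re-checked
# against a CPU datasheet). One pass of the loop moves one byte.
COUNTED_LOOP = {
    "IN fdcport": 10,
    "MOV M,A": 7,
    "INX H": 7,
    "DCR B": 5,
    "JNZ floppy$byte": 10,
}

# Byte transfer windows quoted in the text, in microseconds.
WINDOW_US = {"single": 32.0, "double": 16.0}


def loop_time_us(cycles, clock_mhz):
    """Time for one pass of the loop: clock cycles divided by MHz gives uS."""
    return cycles / clock_mhz


def keeps_up(cycles, clock_mhz, density):
    """True if one pass of the loop fits inside the byte window."""
    return loop_time_us(cycles, clock_mhz) <= WINDOW_US[density]


def min_clock_mhz(cycles, density):
    """Slowest clock at which the loop still fits the byte window."""
    return cycles / WINDOW_US[density]


TOTAL = sum(COUNTED_LOOP.values())

if __name__ == "__main__":
    for mhz in (2.0, 4.0):
        for density in ("single", "double"):
            print(mhz, "mhz", density, "density:",
                  loop_time_us(TOTAL, mhz), "uS per byte,",
                  "ok" if keeps_up(TOTAL, mhz, density) else "too slow")
    print("minimum clock for double density:",
          min_clock_mhz(TOTAL, "double"), "mhz")
```

The same three functions can be reused for any other loop by summing its cycle counts and passing the total in place of TOTAL.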
The 4mhz [8080] processor will handle both media types. If we consider 16 uS as the DD window, then an 8080 with a 2.44 mhz clock would just be able to handle DD transfers.

[note 2/22/04 by Herb: in a related comp.os.cpm discussion on this date, Randy McLaughlin noted: "There was one other method posted [here] earlier by "Alison" that saves 3.5uS on a 2mhz 8080. Instead of an INX H followed by a DEC for a counter, if the buffer ends on a page boundary an INR L works; getting rid of the INX (3.5uS)." A page boundary is a memory address of the form "XX00H" where "XX" is the page address. Note additional complications if the sector is larger than 256 bytes (one page). Also review Bruce's other considerations in this document before coding this method.]

[Z80 code for PIO data transfer]

Note that [the 8080 code above] performs worse on a Z80, since the Z80 IN and OUT instructions are hard-wired to insert one additional clock cycle over that of the 8080. Thus at 2mhz the Z80 requires 20 uS. It is possible to use a Z80 block I/O instruction that will successfully do floppy I/O for double-density media with a 2mhz clock.

    floppy$input:                          clocks   2mhz      4mhz
        LD   HL,buffer    ;sector buf.        n/a
        LD   B,count      ;sector size        n/a
        LD   C,fdc$port   ;FDC I/O port       n/a
        INIR              ;the full I/O code   21    10.5 uS   5.25 uS

A string of INIR or OTIR instructions is used to cover [a number of] actual sector sizes. Here is how simple it can be: for 128 byte sectors, load 128 into the B register, else load 0 [effectively 256] for all others, and enter at the sector length points:

    1024$byte$entry:
        INIR
        INIR
    512$byte$entry:
        INIR
    256$byte$entry:
    128$byte$entry:
        INIR

[Improved 8080 and Z80 PIO data transfer code]

A possible solution for 2mhz 8080 double-density floppy transfers:

    floppy$byte:                     clocks   2mhz
        IN   fdcport    ;FDC port      10     5 uS
        MOV  M,A        ;buffer         7     3.5 uS
        INX  H          ;next loc.      7     3.5 uS
        JMP  floppy$byte               10     5 uS
                             Total     34     17 uS

This routine must be terminated by an end-of-operation interrupt, issued by the floppy controller.
To allow for some interrupt latency, add several extra bytes to the sector buffer area. Note that at 17 uS it is a marginal solution. I do not have an 8080 system to test this code; however, this solution has been tried with a 4 mhz Z80 and it works flawlessly. On a Z80 at 2mhz [this routine] will not work, since the additional clock cycle added to all I/O instructions results in a routine that consumes 17.5 uS, and this exceeds the 17 uS window for double-density.

[note by Chuck Guzis June 2006: quote "I'm surprised that you didn't carry this forward with a bit of loop unrolling [with] this result:

    floppy$byte:                     clocks   2mhz
        IN   fdcport    ;FDC port      10     5 uS
        MOV  M,A        ;buffer         7     3.5 uS
        INX  H          ;next loc.      7     3.5 uS
        IN   fdcport    ;FDC port      10     5 uS
        MOV  M,A        ;buffer         7     3.5 uS
        INX  H          ;next loc.      7     3.5 uS
        IN   fdcport    ;FDC port      10     5 uS
        MOV  M,A        ;buffer         7     3.5 uS
        INX  H          ;next loc.      7     3.5 uS
        IN   fdcport    ;FDC port      10     5 uS
        MOV  M,A        ;buffer         7     3.5 uS
        INX  H          ;next loc.      7     3.5 uS
        JMP  floppy$byte               10     5 uS
                             Total    106     53 uS

...or about 13 uS per byte--and just under the window for DD data transfers. Unrolling the transfer loop more produces rapidly decreasing returns. The cost is a modest 12 additional bytes." - end quote.]

[Note, however, that after every four reads the delay due to the jump occurs, as described by Bruce. That delay can be avoided by unrolling the entire sector read, if you have the code space to do so. - Herb Johnson]

An end-of-operation interrupt can be made on a Versafloppy II by jumpering E7 to E8 on the Versafloppy board. This places [the FDC chip's] INTQ on the S-100 VI3 [interrupt #3] line at pin 7.

DMA Transfers for 2mhz 8080 CPUs.
---------------------------------

The surest way to obtain reliable double-density disk transfers with a 2mhz 8080 CPU is to implement a DMA disk I/O mechanism. By jumpering E9 to E10 on the Versafloppy II board, a byte-by-byte data request signal [DRQ from the FDC chip] will be placed on the S-100 VI2 line (pin 6).
This must be connected to a DMA device, and code must be written to support whatever DMA device is available. The task of building a general purpose DMA device for the S-100 bus is beyond the scope of this document. One complete solution would be to replace the Versafloppy II with a Tarbell Double Density Disk Interface and code for use of the DMA device on [the Tarbell] controller.

DMA Transfers for 2mhz Z80 CPUs.
-------------------------------

[For most Z80 CPU boards,] you could build a simple add-on DMA interface. Refer to the Zilog "Microprocessor Applications Reference Book" Vol. 1, 1981, page 2-35 for a typical example. [The Z80 DMA chip uses most of the Z80 lines and so is a kind of "coprocessor" chip wired parallel to the Z80.]

[For instance,] I have added a Z80-DMA to one of my Z80 CPU boards, simply by removing the Z80 from its socket, adding two 40 pin sockets to a PC proto-board (one for the Z80, one for the DMA), plus a few SSI chips for a port decoder to select the DMA registers. A wire from the DMA [board] extends ("externally") to the S-100 pin 6 and thus picks up the Versafloppy II DRQ line [wired to the card's VI2 line]. The port decoder is very simple, [as] it need only decode a single I/O write-only port. There is no need to read the DMA registers. The proto-board is inserted back into the Z80 socket on the CPU board.
[This section covers the setup of the read and write code loops "FDCIO:" and the code which follows completion of those loops via interrupt. Since an interrupt routine is used to exit the read or write loop, which otherwise would loop endlessly, something must be done when the interrupt routine is completed and control returns to that loop. I chose to have the interrupt routine "INTHDL:" write NOP - no ops - into the read or write loop so execution will fall through to the next routine, which cleans up after the sector is read or written.]

FDCIO could be two different code pieces, but since INTHDL has to "no-op" out the code to break the loop, I decided on just one shared place for the read and write loops. Note that this overhead is small compared to the total time for an FDC operation. Also, with sector sizes of, say, 1024 bytes, this code is only entered 1/8 of the time for BDOS disk reads, due to sector blocking/deblocking (1024/128 = 8). [Finally, the read or write loop must be "refreshed" before each use, as it is "erased" when the previous read/write is completed.]

Note that the FDCIO I/O routine is a fast, irreducible loop. It reads sector data from the FDC, and nothing more. This loop has no condition-code checks for an end case. The read routine entry point actually writes this code into place before execution. A modified form of the read loop, the write loop, uses the same code space, and so it must be written into place as well, before execution. Each entry point sets a code into the IOFLAG byte for later reference.

No matter the loop function at the time of execution (R or W), it will loop endlessly, without any need to accommodate a particular sector size (128, 256, 512 or 1024 bytes). The physical end-of-operation issued by the WD179X FDC chip determines the end point (buffered to the S-100 VI3 line).
The interrupt handler, INTHDL, now must perform the following functions:

1 - input the FDC chip status port (This stops the INTQ from the WD179X & thus the VI3 line 'quits')
2 - save that status value at ASTAT for later examination
3 - break the I/O FDCIO loop code, so that execution can move to the next phase. It does this by writing seven 8080 NOPs into the loop. This ensures that upon return from the interrupt service routine, NO MATTER WHERE in the [read or write] loop the code is [now] executing, it will execute some number of NOPs [instead] and [fall through] to the CHEKFD operation-complete test routine [which follows]. [T]Here a test is performed on the saved status byte to see if the operation was successful etc.
4 - RET from the service routine so CHEKFD can be entered.

The FDCIO loop has dispensed with the byte-counting function of the [B] register. This shaves 5 clocks (2.5 uS at 2mhz) from the I/O loop. With this saving, a 2mhz 8080 might just be able to do PIO floppy transfers with DD media. Since I don't have an 8080 CPU card any more, I cannot tell. I did run the code with my Z80 at both 2mhz & 4mhz to test it. 4mhz is fine, as I noted, but with the Zilog chip-level hard-wired extra clock for all port I/O of the Z80, the timing is too slow to work with DD at 2mhz.

NOTE Actual measurements of DD FDC sector transfers showed that the window per byte is actually 17 uS. But my Tek 2213 'scope may need calibration. :)

Code to setup read and write loop "refresh" and "no-ops"
---------------------------------------------------------

Actual code I used, to indicate what is involved:

    WDATA   EQU  67H        ;FDC DATA I/O PORT

Entry point for FDC read.
Note that the full code is written into place prior to each sector transfer:

            MVI  A,0
            STA  IOFLAG     ;SHOW OPERATION IS READ
            MVI  A,0DBH     ;8080 IN opcode
            STA  FDCIO
            MVI  A,WDATA    ;FDC I/O port
            STA  FDCIO+1
            MVI  A,77H      ;8080 MOV M,A opcode
            STA  FDCIO+2
            MVI  A,23H      ;8080 INX H opcode
            STA  FDCIO+3
            MVI  A,0C3H     ;8080 JMP opcode
            STA  FDCIO+4
            PUSH H          ;save sector buffer pointer
            LXI  H,FDCIO    ;make a pointer to loop address
            SHLD FDCIO+5    ;store it here
            POP  H          ;recover buffer address
            JMP  FDXFER

Entry point for FDC write.

            MVI  A,1
            STA  IOFLAG     ;SHOW OPERATION IS WRITE
            MVI  A,7EH      ;8080 MOV A,M opcode, placed first so that
            STA  FDCIO      ;each OUT sends the byte just fetched
            MVI  A,0D3H     ;8080 OUT opcode
            STA  FDCIO+1
            MVI  A,WDATA    ;FDC I/O port
            STA  FDCIO+2
            MVI  A,23H      ;8080 INX H opcode
            STA  FDCIO+3
            MVI  A,0C3H     ;8080 JMP opcode
            STA  FDCIO+4
            PUSH H          ;save sector buffer pointer
            LXI  H,FDCIO    ;make a pointer to loop address
            SHLD FDCIO+5    ;store it here
            POP  H          ;recover buffer address
            JMP  FDXFER

    FDXFER: EI              ;be sure FDC can stop this loop
    FDCIO:  IN   WDATA      ;[this is the loop code]
            MOV  M,A        ;[which is refreshed or NOP'ed]
            INX  H          ;[for read or write]
            JMP  FDCIO
    ;Now go to code that processes FDC status
    CHEKFD: . . . . .

=============

The end-of-operation interrupt code

We have to fill the above input/output looping code with NOPs, so that when control is returned to the program, it will no longer loop, and will continue on to enter the I/O status test routine.

    WSTAT   EQU  64H        ;FDC STATUS PORT

    INTHDL: IN   WSTAT      ;get FDC status to stop the FDC INTs
            STA  ASTAT      ;save status
            IN   WSTAT      ;to be sure FDC INTs stop
            XRA  A          ;fill input code with NOPs [00H]
            STA  FDCIO
            STA  FDCIO+1
            STA  FDCIO+2
            STA  FDCIO+3
            STA  FDCIO+4
            STA  FDCIO+5
            STA  FDCIO+6
            LDA  ASTAT      ;recover FDC ending status
            RET             ;[..and return from interrupt]

    ASTAT:  .BYTE 0
    IOFLAG: .BYTE 0

NOTES: You can replace the fill routines with a block move for the FDCIO routine, and a simple zero fill routine at the interrupt service point. Note that this input/output routine needs no byte counter. A sector of data will be transferred, independent of the sector length.
Remember to add several bytes to the sector buffer, beyond the largest sector size, in order to allow for interrupt latency.
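The loop-patching scheme described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the code that ran on the hardware: it is a toy interpreter for the five opcodes the read loop uses, the addresses chosen for FDCIO and BUFFER are hypothetical, the FDC is a plain byte stream padded with 0FFH filler, and the interrupt is simply delivered when the program counter reaches a chosen instruction boundary. Wait states and real timing are not modelled.

```python
# Toy model of the self-patching read loop; not a real 8080 emulator.
NOP, IN, MOV_M_A, INX_H, JMP = 0x00, 0xDB, 0x77, 0x23, 0xC3  # opcodes from the text
WDATA = 0x67            # FDC data port, from the text
FDCIO = 0x0100          # assumed address of the loop (hypothetical)
CHEKFD = FDCIO + 7      # first byte after the 7-byte loop
BUFFER = 0x0200         # assumed sector buffer address (hypothetical)
BOUNDARIES = (FDCIO, FDCIO + 2, FDCIO + 3, FDCIO + 4)  # instruction starts


def install_read_loop(mem):
    """Write IN WDATA / MOV M,A / INX H / JMP FDCIO into place."""
    mem[FDCIO:FDCIO + 7] = bytes(
        [IN, WDATA, MOV_M_A, INX_H, JMP, FDCIO & 0xFF, FDCIO >> 8])


def inthdl(mem):
    """Model of the handler's patch: seven NOPs written over the loop."""
    mem[FDCIO:FDCIO + 7] = bytes(7)


def step(mem, cpu, fdc):
    """Execute one instruction of the tiny subset used by the loop."""
    pc = cpu["pc"]
    op = mem[pc]
    if op == NOP:
        cpu["pc"] = pc + 1
    elif op == IN:
        cpu["a"] = next(fdc)            # port number at pc+1 is ignored here
        cpu["pc"] = pc + 2
    elif op == MOV_M_A:
        mem[cpu["hl"]] = cpu["a"]
        cpu["stored"] += 1
        cpu["pc"] = pc + 1
    elif op == INX_H:
        cpu["hl"] = (cpu["hl"] + 1) & 0xFFFF
        cpu["pc"] = pc + 1
    elif op == JMP:
        cpu["pc"] = mem[pc + 1] | (mem[pc + 2] << 8)
    else:
        raise ValueError(f"opcode {op:02X}H is not modelled")


def read_sector(data, interrupt_pc):
    """Run the loop over data, interrupt at interrupt_pc, fall to CHEKFD.

    Returns (buffer contents, number of bytes stored past the sector).
    """
    if interrupt_pc not in BOUNDARIES:
        raise ValueError("interrupts are only taken between instructions")
    mem = bytearray(0x10000)
    install_read_loop(mem)
    cpu = {"pc": FDCIO, "a": 0, "hl": BUFFER, "stored": 0}
    fdc = iter(bytes(data) + b"\xff" * 4)    # filler read during latency
    while cpu["hl"] < BUFFER + len(data):    # transfer the real sector
        step(mem, cpu, fdc)
    while cpu["pc"] != interrupt_pc:         # latency before the interrupt
        step(mem, cpu, fdc)
    inthdl(mem)                              # handler patches, then returns
    for _ in range(7):                       # at most seven NOPs remain
        if cpu["pc"] == CHEKFD:
            break
        step(mem, cpu, fdc)
    assert cpu["pc"] == CHEKFD
    return bytes(mem[BUFFER:BUFFER + len(data)]), cpu["stored"] - len(data)
```

In this model, read_sector(bytes(range(16)), FDCIO + 3) returns the 16 data bytes intact together with an overrun count of 1: one filler byte was stored past the end of the sector before the interrupt arrived. That kind of spill is what the extra buffer bytes are meant to absorb.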