Experimental generic I/O system for Larceny
991018 / lth

$Id$


Introduction.

The generic I/O system allows the creation of ports where I/O channels
supplied by the operating system or the program can be mapped onto
standard Scheme ports, allowing standard I/O procedures like READ to be
performed on all sorts of channels.  In addition, the system provides
operations for byte I/O in addtion to character I/O.

The I/O system is built on top of the existing (as of 0.45) I/O system,
which is an alright design but both Unix-centric and somewhat limited;
as a result there are some restrictions in the current design (and a few
more in the implementation) that will be removed over time.


Ports.

Ports represent byte and character streams.  The generic I/O system
introduces two new procedures, MAKE-INPUT-PORT and MAKE-OUTPUT-PORT,
that create ports where user-defined procedures produce and consume
data.

Larceny distinguishes between byte ports and char ports.  The
distinction was introduced with the port of Petit Larceny to Windows 95
and is not at all well developed, and on Unix and Macintosh systems the
byte and char ports are currently identical.  These two types of ports
will be made clearly distinct at some point, and the use of character
operations on byte ports and vice versa will be prohibited.  

All ports are buffered and are structured to allow very efficient I/O.
Thus, the user provides only procedures for filling and flushing I/O
buffers, leaving the responsibility for single character and byte I/O to
the existing system.  (In a future version of Larceny, procedures like
READ-CHAR will be primitives.)  The operations to be performed for a
port depends on whether the port is an input port or output port, but in
both cases the operations are bundled in a _handler_: a procedure that
takes a selector argument and returns a procedure that performs the
operation on the port.

An input handler handles the following messages:

   (handler 'read)   : iodata * buffer -> { <fixnum> | eof | error }
        Read items (characters/bytes) into the buffer starting at 
        index 0, returning either the number of items read, the symbol
	"eof" if EOF was reached, and the symbol "error" if an error 
	was encountered.

   (handler 'close)  : iodata -> { ok | error }
        Close the port, returning the symbol "ok" if the closing 
        completed or "error" if it didn't.

   (handler 'ready?) : iodata -> boolean
        Return #t if an item is ready to be read on the port, #f if not.

   (handler 'name)   : iodata -> string
        Return the port's name.

An output handler handles the following messages:

   (handler 'write)  : iodata * buffer * count -> { ok | error }
        Write <count> items (characters/bytes) from the buffer starting 
	at index 0, returning the symbol "ok" if the writing completed 
	or "error" if an error was encountered.

   (handler 'close)  : iodata -> { ok | error }
        Close the port, returning the symbol "ok" if the closing 
        completed or "error" if it didn't.

   (handler 'name)   : iodata -> string
        Return the port's name.

In the current system, the buffer is always a string, even though that
does not make sense for byte ports.  In the future, the buffer for byte
ports will be a bytevector.

The "iodata" item is always the user-specifiable datum associated with
the port; see the port constructors below.


Constructors.

(MAKE-INPUT-PORT input-handler datum type)  ->  port

"Input-handler" is an input handler procedure, as described above, and
"datum" is any datum.  "Type" is  a symbol, either "char" or "byte".
The procedure returns an input port of the appropriate type.


(MAKE-OUTPUT-PORT output-handler datum type flush?)  ->  port

"Output-handler" is an output handler procedure, as described above, and
"datum" is any datum.  "Type" is a symbol, either "char" or "byte".
"Flush?" is a boolean.  The procedure returns an input port of the
appropriate type.  If "flush?" is TRUE, then the port will be flushed
following all output operations except WRITE-CHAR and WRITE-BYTE.


Predicates and field access.

(PORT-NAME port)  ->  string

Cause the "name" method of the handler to be called to return a name
for the port.

(PORT-POSITION port)  ->  exact-integer

Returns the current position of the read or write pointer of the port.
The position is 0 when a port is created and reading or writing an item
advanced the position by 1.

(PORT-HANDLER port)  ->  procedure

Returns the handler procedure.

(PORT-DATUM port)  ->  datum

Returns the user-defined handler associated with the port.

(PORT-CONTENT-TYPE port)  ->  symbol

Returns a symbol, "char" or "byte", describing the type of the port.
[Note: not currently implemented.]

(PORT-BUFFER port)  ->  { string | bytevector }

Returns the buffer associated with the port.

(PORT-ERROR-FLAG port)  ->  boolean
(PORT-ERROR-FLAG-SET! port boolean)  ->  unspecified

Returns or sets the port's error flag.  The error flag should be set
before an operation on the port signals an error.  (FIXME: It is unclear
exactly what effect the error flag has on the various operations.)

(PORT-EOF-FLAG port)  ->  boolean
(PORT-EOF-FLAG-SET! port flag)  ->  unspecified

Returns or sets the EOF flag of an input port.

(PORT-READ-POINTER port)  ->  fixnum
(PORT-READ-POINTER-SET! port fixnum)  ->  unspecified

Returns or sets the read pointer of an input port.  The read pointer is
the index in the port's buffer of the next item to read.  When the read
pointer equals the read limit (below), then the buffer is empty.

(PORT-READ-LIMIT port)  ->  fixnum
(PORT-READ-LIMIT-SET! port fixnum)  ->  unspecified

Returns or sets the read limit of an input port.  The read limit is the
index in the port's buffer immediately following the last valid item.

(PORT-FLUSH-FLAG port)  ->  boolean
(PORT-FLUSH-FLAG-SET! port boolean)  ->  unspecified

Returns or sets the discretionary-flush flag of an output port.

(PORT-WRITE-POINTER port)  ->  fixnum
(PORT-WRITE-POINTER-SET! port fixnum)  ->  unspecified

Returns or sets the write pointer of an output port.  The write pointer
is the index in the port's buffer of the next free slot.  When the write
pointer equals the buffer length, then the buffer is full.


I/O operations.

This I/O system introduces byte-port specific operations with the intent
that programs written using these operations will continue to work when
Larceny starts to enforce the distinction between char ports and byte
ports.

(READ-CHAR) -> char | eof-object
(READ-CHAR input-port) -> char | eof-object

(PEEK-CHAR) -> char | eof-object
(PEEK-CHAR input-port) -> char | eof-object

(CHAR-READY?) -> boolean
(CHAR-READY? input-port) -> boolean

(WRITE-CHAR c) -> unspecfied
(WRITE-CHAR c input-port) -> unspecified

These do what you expect them to.

(READ-BYTE) -> byte | eof-object
(READ-BYTE input-port) -> byte | eof-object

(PEEK-BYTE) -> byte | eof-object
(PEEK-BYTE input-port) -> byte | eof-object

(BYTE-READY?) -> boolean
(BYTE-READY? input-port) -> boolean

(WRITE-BYTE b) -> unspecfied
(WRITE-BYTE b input-port) -> unspecified

These read and write bytes (small nonnegative fixnums) rather than
characters but are otherwise identical to the character operations
above.

(READ-CHARACTERS input-port buffer start count)  ->  read-count | eof-object
(READ-BYTES input-port buffer start count)  ->  read-count | eof-object

These procedures read characters and bytes, respectively, into "buffer" 
starting at index "start" for at most "count" items, and returns the
number of items read or eof-object if no items were read.  The buffer is
a string for READ-CHARACTERS and a bytevector for READ-BYTES.

(WRITE-CHARACTERS output-port buffer start count)  ->  unspecified
(WRITE-BYTES output-port buffer start count)  ->  unspecified

These procedures write characters and bytes, respectively, from "buffer"
starting at index "start" for "count" items.  The buffer is a string for
WRITE-CHARACTERS and a bytevector for WRITE-BYTES.


Example.

Here is a simple string port system, with three operations:
OPEN-INPUT-STRING, OPEN-OUTPUT-STRING, and GET-OUTPUT-STRING.

(define-record string-input-datum (string idx))
(define-record string-output-datum (strings))

(define (open-input-string s)
  (make-input-port (lambda (selector)
                     (case selector
                       ((read)   string-io/fill-buffer)
                       ((close)  (lambda (datum) #t))
                       ((ready?) (lambda (datum) #t))
                       ((name)   (lambda (datum) "*string*"))))
                   (make-string-input-datum s 0)
                   'char))

(define (string-io/fill-buffer data buffer)
  (let ((s (string-input-datum-string data))
	(i (string-input-datum-idx data)))
    (let ((n (min (string-length buffer) (- (string-length s) i))))
      (if (= n 0)
	  'eof
	  (do ((j 0 (+ j 1))
	       (k i (+ k 1)))
	      ((= j n) 
               (string-input-datum-idx-set! data (+ i n))
	       n)
	    (string-set! buffer j (string-ref s k)))))))

(define (open-output-string)
  (make-output-port (lambda (selector)
                      (case selector
                        ((write) string-io/flush-buffer)
                        ((close) (lambda (datum) #t))
                        ((name)  (lambda (datum) "*string*"))))
                    (make-string-output-datum '())
                    'char
                    #f))

(define (string-io/flush-buffer data buffer count)
  (string-output-datum-strings-set! 
   data 
   (append! (string-output-datum-strings data)
            (list (substring buffer 0 count))))
  'ok)

(define (get-output-string port)
  (flush-output-port port)
  (let* ((data (port-datum port))
	 (strings (string-output-datum-strings data)))
    (cond ((null? strings)
	   "")
	  ((null? (cdr strings))
	   (car strings))
	  (else
	   (let ((x (apply string-append strings)))
             (string-output-datum-strings-set! data (list x))
	     x)))))


Caveats.

The underlying I/O system may not be thread safe; this must be checked,
and fixed if necessary.


Weaknesses.

* There is no way for the port-handler to ignore a CLOSE operation, but
  sometimes it would be reasonable for it to be able to do so.

* There is no way to signal a sophisticated error condition; the symbol
  ERROR does not carry enough information.  The error handling needs to
  be revisited when we have an exception system.


; eof
