author: Michael I. Bushnell <mib@gnu.org>, 1994-04-05 01:49:57 +0000
committer: Michael I. Bushnell <mib@gnu.org>, 1994-04-05 01:49:57 +0000
commit: a6a55edde2431c62f8044e0bd0e8e403cab1dbfe
tree: 7edc5a9fa19ea7214bda8d740fe2bb6f994fbd4c
parent: 2c676995bf90d093970dd5971db88148140102b9
Initial revision
Diffstat (limited to 'doc')
-rw-r--r-- doc/hurd.texi | 554
1 files changed, 554 insertions, 0 deletions
diff --git a/doc/hurd.texi b/doc/hurd.texi
new file mode 100644
index 00000000..e7cc9e92
--- /dev/null
+++ b/doc/hurd.texi
@@ -0,0 +1,554 @@
+\input texinfo @c -*-texinfo-*-
+@setfilename hurd.texi
+
+@ifinfo
+@format
+START-INFO-DIR-ENTRY
+* Hurd: (hurd). The interfaces of the GNU Hurd.
+END-INFO-DIR-ENTRY
+@end format
+@end ifinfo
+
+@ifinfo
+Copyright @copyright{} 1994 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+@ignore
+Permission is granted to process this file through TeX and print the
+results, provided the printed document carries a copying permission
+notice identical to this one except for the removal of this paragraph
+(this paragraph not being relevant to the printed manual).
+
+@end ignore
+
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided also that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions.
+@end ifinfo
+
+@setchapternewpage odd
+@settitle Hurd Interface Manual
+@titlepage
+@finalout
+@title The GNU Hurd Interface Manual
+@author Michael I. Bushnell
+@page
+
+@vskip 0pt plus 1filll
+Copyright @copyright{} 1994 Free Software Foundation, Inc.
+
+Permission is granted to make and distribute verbatim copies of
+this manual provided the copyright notice and this permission notice
+are preserved on all copies.
+
+Permission is granted to copy and distribute modified versions of this
+manual under the conditions for verbatim copying, provided also that
+the entire resulting derived work is distributed under the terms of a
+permission notice identical to this one.
+
+Permission is granted to copy and distribute translations of this manual
+into another language, under the above conditions for modified versions.
+@end titlepage
+
+@node Top
+@top Introduction
+
+This manual describes the interfaces that make up the GNU Hurd. It is
+assumed that the reader is familiar with the features of the Mach
+kernel, and with using the Hurd interfaces as a user, and all of the
+associated C library calls. It concentrates on requirements and advice
+for the writing of Hurd servers, as well as describing the libraries
+that come with the GNU Hurd.
+
+@menu
+* I/O interface:: The interface for reading and writing
+ I/O channels
+* Shared I/O:: The interface for doing input and output
+ using shared memory
+* File interface:: The interface for modifying file-specific
+ characteristics
+* Filesystem interface:: Interfaces supported to control file-servers
+* Socket interface:: Interfaces used for manipulating sockets
+
+* Ports library:: A library to manage port rights for servers
+* Iohelp library:: A library to implement some common parts
+ of the I/O and shared I/O interfaces.
+* Fshelp library:: A library to implement some common parts
+ of the file interface.
+* Pager library:: A library to implement complex
+ multi-threaded pagers.
+* Diskfs library:: A library to do almost all the work of
+ implementing a disk-based filesystem.
+* Trivfs library:: A library to do the work of handling the
+ file protocol for directory-less
+ filesystems.
+* Mapped data:: Getting memory objects referring to the
+ data of an I/O object.
+@end menu
+
+@node I/O interface
+@chapter I/O interface
+
+The I/O interface is used to interact with almost all servers in the
+GNU Hurd. It provides facilities for reading and writing I/O streams.
+The I/O interface facilities are described in <hurd/io.defs> and
+<hurd/shared.h>. The latter portion of <hurd/io.defs> and all of
+<hurd/shared.h> describe how to implement shared-memory I/O operations,
+and are described later. The present chapter is concerned with
+RPC-based I/O operations.
+
+@menu
+* I/O object ports:: How ports to I/O objects work
+* Simple operations:: Read, write, and seek
+* Open modes:: State bits that affect pieces of
+ operation
+* Asynchronous I/O:: How to get notified when I/O is possible
+* Information queries:: How to implement io_stat and
+ io_server_version
+@end menu
+
+@node I/O object ports
+@section I/O object ports
+
+Each port to an I/O server should be associated with a particular set of
+uids and gids, identifying the user who is responsible for operations on
+the port. Every port to an I/O server should also support either the
+file protocol or the socket protocol; naked I/O ports are not allowed.
+
+In addition, each port is associated with a default file pointer, a set
+of open mode bits, a pid (called the ``owner''), and some underlying
+object which can absorb data (for write) or provide data (for read).
+
+The uid and gid sets associated with a port may not be visibly shared
+with other ports; nor may they ever change. The identification of a set
+of uids and gids with a particular port must be fixed at the moment of
+the port's creation. The other characteristics of an I/O port may be
+shared with other users. The manner in which these characteristics are
+shared is not part of the I/O server interface; however, the file and
+socket interfaces make further requirements about what sharing is
+expected and prohibited from occurring.
+
+In general, users get send-rights to I/O ports by some mechanism that is
+external to the I/O protocol. (For example, file servers give out I/O
+ports in response to the dir_pathtrans and fsys_getroot calls. Socket
+servers give out ports in response to the socket_create and
+socket_accept calls.) However, the I/O protocol provides methods of
+obtaining new ports that refer to the same underlying object as another
+port. In response to all of these calls, all underlying state
+(including, but not limited to, the default file pointer, open mode
+bits, and underlying object) must be shared between the old and new
+ports. In the following descriptions of these calls, this is what is
+meant by saying that the new port is "identical" to the old port. They
+all must return send-rights to a newly-constructed Mach port.
+
+The io_duplicate call simply returns another port which is identical
+to an existing port and has the same uid and gid set.
+
+The io_restrict_auth call should return another port, identical to the
+provided port, but which has a smaller associated uid and gid set. The
+uid and gid sets of the new port should be the intersection of the set
+on the existing port and the lists of uids and gids provided in the
+call.
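The intersection rule for io_restrict_auth can be sketched as follows. This is only a model, not Hurd code: a port is reduced to its two id sets, ids are plain integers, and the function name restrict_auth is hypothetical.

```python
def restrict_auth(port_uids, port_gids, req_uids, req_gids):
    """Return the (uids, gids) pair for the new, restricted port.

    Model only: the existing port's sets are read, never modified,
    since the id sets of a port may not change after creation.
    """
    new_uids = frozenset(port_uids) & frozenset(req_uids)
    new_gids = frozenset(port_gids) & frozenset(req_gids)
    return new_uids, new_gids
```

Ids that the caller lists but the existing port lacks are silently dropped, so the new port can never carry more authority than the old one.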
+
+The io_reauthenticate call is used when users wish to have an entirely
+new set of uids or gids associated with a port. When such a call is
+received, the server must create a new port, and then make the call
+auth_server_authenticate to the auth server. The rendezvous port for
+the auth_server_authenticate call is the I/O port on which the
+io_reauthenticate call was made. The rend_int parameter should be copied from
+the io_reauthenticate call. The I/O server also gives the auth server a
+new port; this should be a newly created port identical to the old port.
+The auth server will return the set of uids and gids associated with the
+user, and guarantees that the new port will go directly to the user that
+possessed the associated authentication port.
+
+@node Simple operations
+@section Simple operations
+
+Users write to I/O ports by calling the io_write RPC. They specify an
+offset parameter; if the object supports writing at arbitrary offsets,
+this should be honored. If -1 is passed as the offset, then the default
+file pointer should be used. The server should return the amount of
+data which was successfully written. If the operation was interrupted
+after some but not all of the data was written, then it is considered to
+have succeeded and should return the amount written. If the port is not
+an I/O port at all, the error EOPNOTSUPP should be returned. If the
+port is an I/O port, but does not happen to support writing, then EBADF
+should be returned.
+
+Users read from I/O ports by calling the io_read RPC. They specify the
+amount of data they wish to read and the offset. The offset has the
+same meaning as for io_write above. The server should return the data
+read. If the call is interrupted after some data has been read (and the
+operation is not idempotent) then the server should return the amount
+read, even if less than the amount requested. The server should return
+as much data as possible, but never more than requested by the user. If
+there is no data, but there might be later, the call should block until
+data becomes available. End-of-file conditions are indicated by
+returning zero bytes. If the call is interrupted after some data has
+been read, but the call is idempotent, then the server may return EINTR
+rather than actually filling the buffer (taking care that any
+modifications of the default file pointer have been reversed).
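The offset convention for io_read and io_write might be modelled like this. It is a toy in-memory object, not the RPC interface: blocking, interruption and error returns are left out, and two points are assumptions rather than statements from the text above, namely that a call with an explicit offset leaves the default file pointer untouched, and that a write past the end zero-fills the gap.

```python
class SeekableObject:
    """Toy seekable I/O object backed by a bytearray (not Hurd code)."""

    def __init__(self, data=b""):
        self.data = bytearray(data)
        self.file_pointer = 0  # the default file pointer

    def io_write(self, buf, offset=-1):
        use_default = offset == -1
        start = self.file_pointer if use_default else offset
        if start > len(self.data):
            # Assumption: a gap before the write is filled with zeros.
            self.data.extend(b"\0" * (start - len(self.data)))
        self.data[start:start + len(buf)] = buf
        if use_default:
            # Assumption: only default-pointer calls advance the pointer.
            self.file_pointer = start + len(buf)
        return len(buf)  # amount successfully written

    def io_read(self, amount, offset=-1):
        use_default = offset == -1
        start = self.file_pointer if use_default else offset
        chunk = bytes(self.data[start:start + amount])  # never more than asked
        if use_default:
            self.file_pointer = start + len(chunk)
        return chunk  # zero bytes signals end-of-file
```

A read that returns fewer bytes than requested is still a success; only an empty result means end-of-file.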
+
+Objects are divided into two categories: seekable and non-seekable.
+Seekable objects are required to accept arbitrary offset parameters in
+the io_read and io_write calls, and to implement the io_seek call.
+Nonseekable objects must ignore the offset parameters to io_read and
+io_write, and should return ESPIPE to the io_seek call.
+
+On seekable objects, io_seek is used to change the default file pointer
+for reads and writes. (See the C library manual for the interpretation
+of the WHENCE and OFFSET arguments, and why the grammatically incorrect
+term `whence' is used.) It returns the new offset as modified by
+io_seek.
+
+The io_readable interface should return the amount of data which can be
+immediately read. For the special technical meaning of "immediately",
+see the description of asynchronous I/O. (*Note: Asynchronous I/O.)
+
+@node Open modes
+@section Open modes
+
+Each port is identified with a set of bits that affect its operation.
+These bits are modified with the io_set_all_openmodes call and fetched
+with the io_get_openmodes call. In addition, the io_set_some_openmodes
+and io_clear_some_openmodes calls do atomic read/writes of the openmodes.
+
+The O_APPEND bit, when set, changes the behavior of io_write when it
+uses the default file pointer on seekable objects. When io_write is
+done on a port with the O_APPEND bit set, it must set the file pointer to
+one more than the "maximum correct value" (described below) before doing
+the write (which would then increment the file pointer as usual). This
+update must be atomically bound to the actual data write with respect to
+other users of io_read, io_write, and io_seek.
+
+A "correct value" for the file pointer is one which, when provided to io_read,
+will successfully return at least one byte of data and not end-of-file.
+The "maximum correct value" referred to in the description of O_APPEND
+is the maximum such correct value. (For ordinary files [see the
+description of the file protocol for more information] this is the same
+as the current file size.)
+
+The O_FSYNC bit, when set, should cause io_write not to delay writing
+data to underlying media in any fashion.
+
+The O_NONBLOCK bit, when set, should prevent read and write from
+blocking. They should copy such data as is immediately available. If
+no data is immediately available they should return EWOULDBLOCK.
+
+The definition of "immediate" is more or less server dependent. Some
+servers (disk based file servers, most notably) regard all data as
+immediately available. The one criterion is that something which must
+happen immediately may not wait for any user-synchronizable event.
+
+The O_ASYNC bit is deprecated; its use is documented in the following
+section. This bit must be shared between all users of the same
+underlying object.
+
+@node Asynchronous I/O
+@section Asynchronous I/O
+
+Users may wish to be notified of when I/O can be done without blocking;
+they use the io_async call to indicate this to the server. In the
+io_async call the user provides a port on which will be sent sig_post
+messages as I/O becomes possible. The server should return a port which
+will be used as a reference port in sig_post messages. Each io_async
+call should generate a new reference port. (See the C library manual
+for information on how to send sig_post messages.)
+
+The server should send one SIGIO signal to each registered async user
+every time I/O becomes possible. I/O is possible if at least one byte
+can be read or written immediately. (The definition of ``immediately''
+must be the same as for the implementation of the O_NONBLOCK flag.)
+Every time io_read or io_write is called, another signal should be sent
+to each user if I/O is still possible.
+
+Some objects may also define "urgent" conditions. Such servers should
+send the SIGURG signal to each registered async user anytime an urgent
+condition appears. After any RPC that has the possibility of clearing
+the urgent condition, the signal should again be sent to all registered
+users if the urgent condition is still present.
+
+A more fine-grained mechanism for doing async I/O is the io_select call.
+The user specifies the kind of access desired, and a send-once right.
+If I/O of the kind the user desires is immediately possible, then the
+server should return so indicating, and destroy the send-once right. If
+I/O is not immediately possible, the server should save the send-once
+right, and send a select_done message as soon as I/O becomes immediately
+possible. (Again, the definition of ``immediate'' must be the same for
+io_select, io_async, and O_NONBLOCK.)
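The two outcomes of io_select can be sketched as below. This is a model under stated assumptions, not the real interface: only read access is tracked, a send-once right is represented by a Python callable, and the names SelectableObject and data_arrived are hypothetical.

```python
class SelectableObject:
    """Toy server-side state for io_select (not the real RPC interface)."""

    def __init__(self):
        self.readable = False
        self.waiters = []  # saved callables standing in for send-once rights

    def io_select(self, reply):
        if self.readable:
            # I/O is immediately possible: say so and drop the right unused.
            return True
        # Otherwise keep the right until I/O becomes possible.
        self.waiters.append(reply)
        return False

    def data_arrived(self):
        self.readable = True
        waiters, self.waiters = self.waiters, []
        for reply in waiters:
            reply("select_done")  # each saved right is used exactly once
```

Clearing the waiter list before notifying keeps each saved right from being used twice, which is the property a send-once right enforces in Mach.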
+
+For compatibility, a deprecated feature (known as icky async I/O) is
+provided. The calls io_mod_owner and io_get_owner are used to set the
+``owner'' of the object; either a pid or a pgrp (negative) is provided.
+Whenever the I/O server is sending messages to all the io_async users,
+if the O_ASYNC bit is set for any user of the object, it should also
+send a signal to the owning pid/pgrp. The ID port for this call should
+be different from all the io_async id ports given to users. Users may
+find out what ID port will be used by calling io_get_icky_async_id.
+
+@node Information queries
+@section Information queries
+
+Users may call io_stat to find out information about the I/O object.
+Most of the fields of a struct stat are meaningful only for files. All
+objects, however, are required to support the fields st_fstype, st_fsid,
+st_ino, st_atime, st_atime_usec, st_mtime, st_mtime_usec, st_ctime,
+st_ctime_usec, st_blksize.
+
+st_fstype, st_fsid, and st_ino must be unique for the underlying object
+across the entire system.
+
+st_atime and st_atime_usec hold the seconds and microseconds,
+respectively, of the system clock at the last time the object was
+read with io_read.
+
+st_mtime and st_mtime_usec hold the seconds and microseconds,
+respectively, of the system clock at the last time the object was
+written with io_write.
+
+st_ctime and st_ctime_usec hold the seconds and microseconds,
+respectively, of the system clock at the last time permanent meta-data
+associated with the object was changed. The exact operations which
+cause such an update are server-dependent, but must include the creation
+of the object.
+
+st_blksize gives the optimal I/O size for io_read and io_write; users
+should endeavor to read and write amounts which are multiples of the
+optimal size, and to use offsets which are multiples of the optimal
+size.
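A user might turn an arbitrary request into one that follows the st_blksize advice roughly as follows. The helper name aligned_span is hypothetical, and the block size in the usage note is only an illustrative value.

```python
def aligned_span(offset, length, blksize):
    """Return (start, size) of the smallest block-aligned span covering
    the byte range [offset, offset + length)."""
    start = (offset // blksize) * blksize              # round down
    end = -(-(offset + length) // blksize) * blksize   # round up
    return start, end - start
```

With a block size of 4096, a request for 100 bytes at offset 5000 becomes one block at offset 4096, while a 2-byte request at offset 4095 straddles a boundary and becomes two blocks at offset 0.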
+
+In addition, objects which are seekable should set st_size to the
+"maximum correct value" described above in the description of the
+O_APPEND flag.
+
+The st_uid and st_gid fields are unrelated to the ``owner'' as described
+above for icky async I/O.
+
+Users may find out the version of the server they are talking to by
+calling io_server_version; this should return strings and integers
+describing the version number of the server, as well as its name.
+
+@node Mapped data
+@section Mapped data
+
+Servers may optionally implement the io_map call; they may do so even if
+they do not implement the facilities described in the following chapter.
+The ports returned by io_map must implement the XP kernel interface and
+be suitable as arguments to vm_map.
+
+Seekable objects must allow access from 0 to the "maximum correct value"
+described for O_APPEND. Whether they provide access beyond such a point
+is server dependent; in addition, the meaning of such an object for a
+non-seekable object is server dependent. However, servers which
+implement the facilities of the next section are bound to certain
+requirements about which addresses in the memory objects provided by
+io_map must be valid. Simply put, any user following the rules
+described in the next chapter should not get any memory faults except as
+explicitly permitted by the next chapter.
+
+@node Shared I/O
+@chapter Shared I/O
+
+I/O servers may, optionally, provide the services described in this
+chapter in addition to the generic services described in the previous
+chapter. These facilities allow users to read and write I/O objects
+without making RPCs to the server in most circumstances.
+
+@menu
+* Rules:: The rules users must obey in using
+ shared I/O.
+* Examples:: Examples of the way different types
+ of servers could implement shared I/O.
+@end menu
+
+@node Rules
+@section Rules
+
+Any server implementing the facilities of this chapter must also support
+the io_map call as described in the previous chapter.
+
+Users of the shared I/O facilities must call io_map_cntl; this will
+return a memory object, called the shared page object. One page of this
+object should be mapped from offset zero into the user's address space.
+At the front of this page is a struct shared_io as described in
+<hurd/shared.h>. Frequent reference will be made to the members of this
+structure in this chapter, without further qualification.
+
+Only one shared user can be active on a given port at a time. If
+io_map_cntl is called for a port on which a user is already active, the
+server should return EBUSY, at which point the user should call
+io_duplicate to obtain a new port, and call io_map_cntl there.
+
+@menu
+* Conch:: How access to the shared page is mediated
+* Access rules:: Where in the io_map memory objects users
+ may peek and poke
+* Behavior modification:: Modifications of behavior
+* Status notifications:: Calls users should make at certain
+ times to keep the server abreast of the
+ current state of the object
+* Violations:: When the rules are broken
+
+@end menu
+
+@node Conch
+@subsection Conch
+
+Access to the shared page is mediated through a facility known as the
+``conch''. The ``lock'' field of the shared page protects the
+conch_status field; this lock must be acquired with spin_lock before
+conch_status may be modified or examined.
+
+If the conch_status field is USER_HAS_CONCH or USER_RELEASE_CONCH, then
+the user has the conch, and may access the shared page after releasing
+the spin lock. If the conch status field is USER_COULD_HAVE_CONCH,
+then the user may immediately set conch_status to USER_HAS_CONCH, and
+proceed to access the shared page after releasing the spin lock. If the
+conch status is USER_HAS_NOT_CONCH, then the user should release the
+spin lock, and call io_get_conch. Upon return from io_get_conch, the
+user should reacquire the spin lock and check the conch status again.
+
+When the user is through accessing the shared page, the user should
+acquire the spin lock and examine the conch_status field. If it has
+been set to USER_RELEASE_CONCH, then the user should release the spin
+lock and call io_release_conch.
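The user side of the conch protocol can be sketched as a small state machine. Everything here is a model: the numeric values of the status constants are placeholders (the real ones live in <hurd/shared.h>), a threading.Lock stands in for the spin lock, and io_get_conch and io_release_conch are passed in as ordinary callables rather than RPCs.

```python
import threading

# Placeholder values; only the names come from the text above.
(USER_HAS_NOT_CONCH, USER_COULD_HAVE_CONCH,
 USER_HAS_CONCH, USER_RELEASE_CONCH) = range(4)


class SharedPage:
    def __init__(self, status):
        self.lock = threading.Lock()  # stands in for the spin lock
        self.conch_status = status


def acquire_conch(page, io_get_conch):
    """Loop until the user holds the conch."""
    while True:
        with page.lock:
            if page.conch_status in (USER_HAS_CONCH, USER_RELEASE_CONCH):
                return
            if page.conch_status == USER_COULD_HAVE_CONCH:
                page.conch_status = USER_HAS_CONCH
                return
        # Status was USER_HAS_NOT_CONCH. The lock is released before the
        # call; afterwards the loop reacquires it and checks again.
        io_get_conch(page)


def finish_access(page, io_release_conch):
    """Call io_release_conch only if the server has asked for the conch."""
    with page.lock:
        must_call = page.conch_status == USER_RELEASE_CONCH
    if must_call:
        io_release_conch(page)
```

The sketch leaves conch_status alone when it is still USER_HAS_CONCH at the end of an access, because the text above only prescribes an action for USER_RELEASE_CONCH.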
+
+The implementation of io_read and io_write must not modify the file
+contents except when the server is holding the conch; users who wish to
+be atomic with respect to those functions should be similarly reticent.
+
+The server should guarantee that at most one user has the conch at a
+time; the server may only have the conch if no user does. The server
+may not modify conch_status or the shared page if the status is
+USER_HAS_CONCH except to set it to USER_RELEASE_CONCH, thus requesting a
+call to io_release_conch.
+
+@node Access rules
+@subsection Access rules
+
+The conch fields file_size, read_size, and prenotify_size affect which
+areas of the data objects may be accessed. In addition, for
+non-seekable objects, the file pointers rd_file_pointer,
+wr_file_pointer, and xx_file_pointer affect which areas may be accessed.
+
+For seekable objects, the read object may be read from offset 0 through
+the minimum of file_size and read_size.
+
+For seekable objects, the write object may be modified from offset 0
+through the prenotify_size.
+
+For nonseekable objects, the read object may be read from
+rd_file_pointer through the minimum of file_size and read_size.
+
+For nonseekable objects, the write object may be modified from
+wr_file_pointer through prenotify_size.
+
+The server may permit access outside these regions, but data will not
+necessarily be preserved for any length of time if so written. If the
+server wishes to deny such access, it should fault with EIO. Servers
+may also issue faults on modifications of the write object for reasons
+such as EDQUOT and ENOSPC, as well as reporting hardware errors with
+EIO. Servers may only fault valid addresses in the read object with EIO
+to indicate hardware failure.
+
+The foo field should be ignored if the value use_foo is clear in the
+shared page; this may result in there being no maximum valid address for
+a particular access. In that case, the object may be accessed to the
+end of its virtual address space.
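The read-side rules above, including the use_foo flags, might be summarized as follows. This is a sketch, not code from the Hurd: the function name readable_range is hypothetical, the fields are passed as plain integers, and the limit is returned exactly as computed because the text does not say whether "through" is inclusive.

```python
def readable_range(seekable, file_size, read_size, rd_file_pointer=0,
                   use_file_size=True, use_read_size=True):
    """Return (start, limit) for the read object.

    limit is None when neither size field is in use, meaning the object
    may be accessed to the end of its virtual address space.
    """
    start = 0 if seekable else rd_file_pointer
    bounds = []
    if use_file_size:
        bounds.append(file_size)
    if use_read_size:
        bounds.append(read_size)
    return start, (min(bounds) if bounds else None)
```

The write side would follow the same pattern with wr_file_pointer and prenotify_size in place of rd_file_pointer and the two read bounds.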
+
+If use_file_size is set, the user may increase the file_size, but may
+not decrease it, to indicate the new "maximum correct value" as
+described for O_APPEND. Normally writes which extend beyond the current
+file_size should extend it to the end of the write.
+
+The xx_file_pointer for seekable objects is the same as the default file
+pointer used by io_read and io_write.
+
+If use_read_size is set and the user wishes to read past read_size, she
+may call io_readsleep, which will return as soon as read_size is
+increased. If read_block_reason is set to RBR_BUFFER_FULL, then the
+read_size will not be increased until the rd_file_pointer is increased.
+
+If use_prenotify_size is set and the user wishes to write past
+prenotify_size, she may call io_prenotify, specifying the maximum offset
+the user intends to write. The server should return when prenotify_size
+has been increased, but is not obligated to extend it as far as the user
+wishes. In addition, io_prenotify may return errors such as ENOSPC,
+indicating that the prenotify_size cannot be increased.
+
+Seekable objects may modify the xx_file_pointer at will (including
+pointing past read_size, file_size, or prenotify_size). Non-seekable
+objects, however, may only increase the rd_file_pointer and
+wr_file_pointer. In addition, they may not modify them to point past
+the valid data as described above. Failing to advance them may prevent
+the read_size or prenotify_size from being increased.
+
+If eof_notify is set, then the user may attempt to have the file_size
+increased by calling io_eofnotify after "noticing" the current file
+size limit. io_eofnotify must return immediately, but need not increase
+the file_size or clear use_file_size. (However, if it is impossible
+for io_eofnotify to ever do anything, then the server should not set
+eof_notify.)
+
+@node Behavior modification
+@subsection Behavior modification
+
+The server flag append_mode is a copy of the O_APPEND open mode bit; if
+it is set, then the user should do writes at file_size and set the file
+pointer appropriately (this applies only if the user would be writing at
+the file pointer in the first place).
+
+@node Status notifications
+@subsection Status notifications
+
+The flag do_sigio requests the user to call io_sigio every time the file
+pointers or the file_size have been changed.
+
+If use_postnotify_size is set, then the user should call io_postnotify
+after writing data that extends past postnotify_size. Writes beyond
+postnotify_size may be buffered internally to the server for arbitrarily
+long periods until io_postnotify is called, regardless of the setting
+of the O_FSYNC bit.
+
+After modifying or reading the object contents, the user should set the
+written or accessed fields respectively. (Users who fail to set these
+fields will not thereby defeat the mtime/atime mechanism.)
+
+If the flag use_eof is set, then users should call io_eofnotify after
+reading up to the file_size and noticing it.
+
+@node Violations
+@subsection Violations
+
+Users who hold the conch for too long while conch_status is set to
+USER_RELEASE_CONCH may have the conch stolen from them and their
+conch_status unilaterally downgraded to USER_HAS_NOT_CONCH. Users who
+hold the spin lock for too long (where this ``too long'' is much much
+shorter than the previous one) will have the spin lock stolen from them.
+
+Users who read or write outside the valid regions described above may
+get memory faults and may not expect data written to be saved in any
+fashion.
+
+Users who write the read object (when it is different from the write
+object) may or may not get faults; they may not expect such data to be
+saved in any fashion.
+
+Users who fail to call io_postnotify may cause data to be buffered for
+arbitrarily long periods.
+
+Users who reduce rd_file_pointer, wr_file_pointer, or file_size will
+have such modifications ignored.
+
+Users may not call any server functions (whether in the I/O protocol or
+another) while holding the conch except for those specified in this
+chapter. Such calls may block or fail silently.
+
+