$Id:$

This is the sgmls release 1.1 SGML parser written by James Clark
jjc@jclark.com, repackaged for FreeBSD.  The original source may be
obtained from ftp://ftp.jclark.com/.

Pieces removed include:

   * Test documents: Compiled on FreeBSD, sgmls passes all tests.

   * sgml-mode.el: The sole file covered by the GNU GPL.  This is
     not installed anyway, and anyone wishing to do serious SGML
     editing would do best to get the psgml package.

   * Makefiles and config files for other operating systems
     (vms, dos, cms).

   * Formatted versions of the man pages.

20-Apr-1995 John Fieber

The original README and TODO follow.

----------------------------------------------------------------------
This is sgmls, an SGML parser derived from the ARCSGML parser
materials which were written by Charles F. Goldfarb.  (These are
available for anonymous ftp from ftp.ifi.uio.no [128.240.88.1] in the
directory SIGhyper/SGMLUG/distrib.)

The version number is given in the file version.c.

The file INSTALL contains installation instructions.

The file NEWS describes recent user-visible changes.

The file sgmls.man contains a Unix manual page; sgmls.txt is the
formatted version of this.

The file sgml-mode.el contains a very simple SGML mode for GNU Emacs.

The files sgmls.c and sgmls.h contain a small library for parsing the
output of sgmls.  This is used by sgmlsasp, which translates the
output of sgmls using an ASP replacement file, and by rast, which
translates the output of sgmls to the format of a RAST result.  The
files sgmlsasp.man and rast.man contain Unix manual pages for sgmlsasp
and rast; sgmlsasp.txt and rast.txt are the formatted versions of
these.

The file LICENSE contains the license which applies to arcsgml and
accordingly to those parts of sgmls derived from arcsgml.  See also
the copyright notice at the beginning of sgmlxtrn.c.  The parts that
were written by me are in the public domain (any files that were
written entirely by me contain a comment to that effect.)
The file sgml-mode.el is covered by the GNU GPL.

Please report any bugs to me.  When reporting bugs, please include the
version number, details of your machine, OS and compiler, and a
complete self-contained file that will allow me to reproduce the bug.

James Clark
jjc@jclark.com

----------------------------------------------------------------------
Warn about mixed content models where #PCDATA can't occur everywhere.

Perhaps there should be a configuration option saying what a control
character is for the purpose of SHUNCHAR CONTROLS.

Should the current character that is printed in error messages be
taken from the file entity or the current entity?

Refine SYS_ action.  If we distinguish DELNONCH in lexmark, lexgrp,
lexsd, we can have separate action that ignores the following
character as well.

Should RSs in CDATA/SDATA entities be ignored as specified in 322:1-2?
Similarly, do the rules on REs in 322:3-11 apply to CDATA/SDATA
entities?  (I don't think they count as being `in content'.)

What should the entity manager do when it encounters code 13 in an
input file?  (Currently it treats it as an RE.)

Document when invalid exclusions are detected.

Option not to perform capacity checking.

Give a warning if the recommendation of 422:1-3 is contravened.

Should an empty CDATA/RCDATA marked section be allowed in the document
type declaration subset?

Include example of use of SGML_PATH in documentation.

Try to detect the situation in 310:8-10 (but see 282:1-2).

Resize hash tables if they become too full.

Say something in the man page about message catalogues.

Consider whether support for SHORTREF NONE requires further changes
(other than disallowing short reference mapping declaration).

Fake /dev/fd/N and /dev/stdin for systems that don't provide them.

Improve the efficiency of the entity manager by not closing and
reopening files.  If we run out of FILEs choose the stream with the
fewest bytes remaining to be read, and read the rest of it into memory.
Each entity level will have its own read buffer.

Support multi-line error messages: automatically indent after
newline.  (We could output to a temporary file first, then copy to
stderr replacing newlines by newline+indent).

Option that says to output out of context things.

Divide up formal public identifier errors.  Give these errors their
own type code.

Consider whether, when OMITTAG is NO, we need to change interpretation
of an empty start-tag (7.4.1.1).

Possibly turn errors 70 and 136 into warnings.

Make things work with NORMSEP > 2.  Would need to keep track of number
of CDATA and SDATA entities in CDATA attributes.

Handle `SCOPE INSTANCE'.

In entgen.c, truncate filenames for OSs that don't do this themselves.

Provide an option that specifies the maximum number of errors; when
this limit is exceeded sgmls would exit.

Document non-portable assumptions in the code.

Option to write out SGML declaration.  In this case make it write out
APPINFO parameter.

Allow there to be catalogs mapping public ids to filenames.
Environment variable SGML_CATALOG containing list of filenames of
catalogs.
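
As an illustration of the multi-line error message item above (copy a
temporary file to stderr, replacing newlines by newline+indent), the
copy step might be sketched roughly as follows.  The function name
copy_indented and its signature are hypothetical and are not part of
sgmls; this is only a sketch of the idea under that assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of sgmls.
   Copy everything from `in` to `out`, writing `indent` after each
   newline that is followed by further characters.
   Returns 0 on success, -1 on a read or write error. */
int copy_indented(FILE *in, FILE *out, const char *indent)
{
    int c;
    int after_newline = 0;   /* nonzero once a newline has just been copied */

    assert(in != NULL && out != NULL && indent != NULL);

    while ((c = getc(in)) != EOF) {
        if (after_newline) {
            /* More text follows the newline, so indent the new line. */
            if (fputs(indent, out) == EOF)
                return -1;
            after_newline = 0;
        }
        if (putc(c, out) == EOF)
            return -1;
        if (c == '\n')
            after_newline = 1;
    }
    return ferror(in) ? -1 : 0;
}
```

A caller would write the message to a file opened with tmpfile(),
rewind it, and then call copy_indented(tmp, stderr, "  ").  Because the
indent is written only when another character follows the newline, a
message's final newline is not followed by trailing spaces.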