Magic Tricks for Precision Timekeeping

                       Revised 19 September 1993

Note: This information file is included in the NTP Version 3
distribution (xntp3.tar.Z) as the file README.magic. This distribution
can be obtained via anonymous ftp from louie.udel.edu in the directory
pub/ntp.

1. Introduction

In most cases it is possible using NTP to synchronize a number of hosts
on an Ethernet or moderately loaded T1 network to a radio clock within a
few tens of milliseconds with no particular care in selecting the radio
clock or configuring the servers on the network. This may be adequate
for the majority of applications; however, modern workstations and high
speed networks can do much better than that, generally to within some
fraction of a millisecond, by using special care in the design of the
hardware and software interfaces.

The timekeeping accuracy of an NTP-synchronized host depends on two
quantities: the delay due to hardware and software processing and the
accumulated jitter due to such things as clock reading precision and
varying latencies in hardware and software queuing. Processing delays
directly affect the timekeeping accuracy, unless minimized by systematic
analysis and adjustment. Jitter, on the other hand, can be essentially
removed, as long as the statistical properties are unbiased, by the low-
pass filtering of the phase-lock loop incorporated in the NTP local
clock model.

This note discusses issues in the connection of external time sources
such as radio clocks and related timing signals to a primary (stratum-1)
NTP time server. Of principal concern are various techniques that can be
utilized to improve time accuracy and frequency stability. Radio clocks
are most often connected to a time
server using a serial asynchronous port. Much of the discussion in this
memorandum has to do with ways in which the delay incurred in this type
of connection can be controlled and ways in which the jitter due to
various causes can be minimized.

However, there are ways other than serial ports to connect a radio
clock, including special purpose hardware devices for some
architectures, and even unusual applications of existing interface
devices, such as the audio codec provided in some systems. Many of these
methods can yield accuracies as good as any attainable with a serial
port. For those radio clocks equipped with an IRIG-B signal output, for
example, a hardware device is available for the Sun SPARCstation; see
the xntpd.8 manual page in the doc directory of the NTP Version 3
distribution for further information. In addition, it is possible to
decode the IRIG-B signal using the audio codec included in the Sun
SPARCstation and a special kernel driver described in the irig.txt file
in the doc directory of the NTP Version 3 distribution. These devices
will not be discussed further in this memorandum.

2. Connection via Serial Port

Most radio clocks produce an ASCII timecode with a precision only to the
millisecond. This results in a maximum peak-to-peak (p-p) jitter in the
clock readings of one millisecond. However, assuming the read requests
are statistically independent of the clock update times, the reading
error is uniformly distributed over the millisecond, so that the average
over a large number of readings will make the clock appear 0.5 ms late.
To compensate for this, it is only necessary to add 0.5 ms to its
reading before further processing by the NTP algorithms.
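
The 0.5-ms figure can be checked with a short simulation. The following
is only a sketch, assuming a timecode truncated to the millisecond and
read instants that are independent of the clock update times; the
function names are illustrative and do not appear in the NTP
distribution.

```python
import random


def mean_lateness_ms(n_reads, resolution_ms=1.0, seed=1):
    """Average lag of a truncated clock reading behind true time, in ms."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reads):
        # Read instant, assumed independent of the clock update times.
        true_ms = rng.uniform(0.0, 1000.0)
        # The timecode shows only the last whole resolution step.
        shown_ms = (true_ms // resolution_ms) * resolution_ms
        total += true_ms - shown_ms
    return total / n_reads


def compensate_ms(reading_ms, resolution_ms=1.0):
    """Add half the resolution to remove the mean reading error."""
    return reading_ms + resolution_ms / 2.0
```

With a large number of reads the simulated mean settles close to half
the resolution, which is the amount compensate_ms() adds back.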

Radio clocks are usually connected to the host computer using a serial
port operating at a typical speed of 9600 baud. The on-time reference
epoch for the timecode is usually the start bit of a designated
character, usually <CR>, which is part of the timecode. The UART chip
implementing the serial port most often has a sample clock of eight to
16 times the basic baud rate. Assuming the sample clock starts midway in
the start bit and continues to midway in the first stop bit, this
creates a processing delay of 10.5 baud times, or about 1.1 ms, relative
to the start bit of the character. The jitter contribution is usually no
more than a couple of sample-clock periods, or about 26 usec p-p. This
is small compared to the clock reading jitter and can be ignored. Thus,
the UART delay can be considered constant, so the hardware contribution
to the total mean delay budget is 0.5 + 1.1 = 1.6 ms.
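
The arithmetic behind these figures can be sketched as follows. The
10.5 bit times and the couple of sample-clock periods come from the
text above; the 8x oversampling default is the lower end of the quoted
range, and the function names are illustrative only.

```python
def uart_delay_ms(baud, bit_times=10.5):
    """Delay from the start bit to mid stop bit, in milliseconds."""
    return bit_times / baud * 1000.0


def uart_jitter_us(baud, oversample=8, periods=2):
    """Peak-to-peak jitter of a few sample-clock periods, in microseconds."""
    return periods / (baud * oversample) * 1e6


def hardware_delay_ms(baud, reading_bias_ms=0.5):
    """Mean hardware delay: timecode reading bias plus UART delay."""
    return reading_bias_ms + uart_delay_ms(baud)
```

At 9600 baud, uart_delay_ms() gives about 1.09 ms, uart_jitter_us()
about 26 usec and hardware_delay_ms() about 1.6 ms.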

In some kernel serial port drivers, in particular, the Sun zs driver,
an intentional delay is introduced in input character processing when the
first character is received after an idle period. A batch of characters
is passed to the calling program when either (a) a timeout in the
neighborhood of 10 ms expires or (b) an input buffer fills up. The
intent in this design is to reduce the interrupt load on the processor
by batching the characters where possible. Obviously, this can cause
severe problems for precision timekeeping. It is possible to patch the
zs driver to eliminate the jitter due to this cause; contact the author
for further details. However, there is a better solution which will be
described later in this note. The problem does not appear to be present
in the Serial/Parallel Controller (SPC) for the SBus, which contains
eight serial asynchronous ports along with a parallel port. The
measurements referred to below were made using this controller.

Good timekeeping depends strongly on the means available to capture an
accurate sample of the local clock or timestamp at the instant the stop
bit of the on-time character is found; therefore, the code path delay
between the character interrupt routine and the first place a timestamp
can be captured is very important, since on some systems such as Sun
SPARCstations, this path can be astonishingly long. The Sun scheduling
mechanisms involve both a hardware interrupt queue and a software
interrupt queue. Entries are made on the hardware queue as the interrupt
is signalled and generally with the lowest latency, estimated at 20-30
microseconds (usec) for a SPARC 4/65 IPC. Then, after minimal
processing, an entry is made on the software queue for later processing
in order of software interrupt priority. Finally, the software interrupt
unblocks the NTP daemon which calculates the current local clock offset
and introduces corrections as required.

Opportunities exist to capture timestamps at the hardware interrupt
time, software interrupt time and at the time the NTP daemon is
activated, but these involve various degrees of kernel trespass and
hardware gimmicks. To gain some idea of the severity of the errors
introduced at each of these stages, measurements were made using a Sun
4/65 IPC and a test setup that results in an error between the host
clock and a precision time source (calibrated cesium clock) no greater
than 0.1 ms. The total delay from the on-time epoch to when the NTP
daemon is activated was measured at 8.3 ms in an otherwise idle system,
but increased on rare occasion to over 25 ms under load, even when the
NTP daemon was operated at the highest available software priority
level. Since 1.6 ms of the total delay is due to the hardware, the
remaining 6.7 ms represents the total code path delay accounting for all
software processing from the hardware interrupt to the NTP daemon.

It is commonly observed that the latency variations (jitter) in typical
real-time applications scale as the processing delay. In the case above,
the ratio of the maximum observed delay (25 ms) to the baseline total
delay (8.3 ms) is about three. It is natural to expect that this
ratio will remain the same or less as the code path between the hardware
interrupt and where the timestamp is captured is reduced. However, in
general this requires trespass on kernel facilities and/or making use of
features not common to all or even most Unix implementations. In order
to assess the costs and benefits of increasingly aggressive insult
to the hardware and software of the system, it is useful to construct a
budget of the code path delay at each of the timestamp opportunity
times. For instance, on Unix systems which include support for the SIGIO
facility, it is possible to intervene at the time the software interrupt
is serviced. The NTP daemon code uses this facility, when available, to
capture a timestamp and save it along with the data in a buffer for
later processing. This reduces the total code path delay from 6.7 ms to
3.5 ms on an otherwise idle system. This reduction applies to all input
processing, including network interfaces and serial ports.
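
A budget of this kind can be kept as a small table. The sketch below
uses only the figures quoted so far for the test system; the names are
illustrative.

```python
HARDWARE_MS = 1.6  # 0.5 ms reading bias + 1.1 ms UART delay, from the text


def code_path_ms(total_ms, hardware_ms=HARDWARE_MS):
    """Code path delay: measured total delay less the hardware delay."""
    return round(total_ms - hardware_ms, 3)


def load_ratio(worst_ms, baseline_ms):
    """Ratio of the worst observed delay to the idle-system delay."""
    return worst_ms / baseline_ms


# Code path delay in ms at each timestamp opportunity measured so far.
budget_ms = {
    "NTP daemon activation": code_path_ms(8.3),  # 8.3 ms total measured
    "SIGIO software interrupt": 3.5,             # quoted directly above
}
```

Here budget_ms gives 6.7 ms for the daemon and 3.5 ms for SIGIO, and
load_ratio(25.0, 8.3) is about three.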

3. The CLK Mode

By far the best place to capture the timestamp is right in the kernel
interrupt routine, but this generally requires intruding in the code
itself, which can be intricate and architecture dependent. The next best
place is in some routine close to the interrupt routine on the code
path. There are two ways to do this, depending on the ancestry of the
Unix operating system variant. Older systems based primarily on the
original Unix 4.3bsd support what is called a line discipline module,
which is a hunk of code with more-or-less well defined interface
specifications that can get in the way, so to speak, of the code path
between the interrupt routine and the remainder of the serial port
processing. Newer systems based on System V STREAMS can do the same
thing using what is called a streams module. Both approaches are
supported in the NTP Version 3 distribution, as described in the README
files in the kernel directory of the distribution. In either case,
header and source files have to be copied to the kernel build tree and
certain tables in the kernel have to be modified. In neither case,
however, are kernel sources required. In order to take advantage of
this, the clock driver must include code to activate the feature and
extract the timestamp. At present, this support is included in the clock
drivers for the Spectracom WWVB clock (WWVB define), the PSTI/Traconex
WWV/WWVH clock (PST define) and a special one-pulse-per-second (pps)
signal (PPSCLK define) described later. If justified, support can be
easily added to most other clock drivers as well. For future reference,
these modules operating with supported drivers will be called the CLK
support.

The CLK line discipline and STREAMS modules operate in the same way.
They look for a designated character, usually <CR>, and stuff a Unix
timestamp in the data stream following that character whenever it is
found. Eventually, the data arrive at the particular clock driver
configured in the NTP Version 3 distribution. The driver then uses the
timestamp as a precise reference epoch, subject to the earlier
processing delays and jitter budget, for future reference. In order to
gain some insight as to the effectiveness of this approach, measurements
were made using the same test setup described above. The total delay
from the on-time epoch to the instant when the timestamp is captured was
measured at 3.5 ms. Thus, the code path delay is this value less the
hardware delay, or 3.5 - 1.6 = 1.9 ms.
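
The stuffing operation can be modelled in user space as follows. This
is a sketch only, not the kernel code: the 8-byte seconds/microseconds
encoding and the function names are assumptions made for illustration,
and the actual modules in the kernel directory may encode the timestamp
differently.

```python
import struct

CR = 0x0D  # designated on-time character


def clk_stuff(chars, mark=CR):
    """Model of the CLK module.

    chars is an iterable of (byte, arrival_us) pairs, where arrival_us
    is the local clock in microseconds when the byte was received. A
    timestamp is inserted in the stream right after each mark byte.
    """
    out = bytearray()
    for byte, arrival_us in chars:
        out.append(byte)
        if byte == mark:
            sec, usec = divmod(arrival_us, 1_000_000)
            # Assumed encoding: two big-endian 32-bit unsigned integers.
            out += struct.pack(">II", sec, usec)
    return bytes(out)


def driver_extract(data, mark=CR):
    """Clock driver side: split the timecode from the stuffed timestamp."""
    pos = data.index(bytes([mark]))
    sec, usec = struct.unpack(">II", data[pos + 1:pos + 9])
    return data[:pos], sec, usec
```

The driver receives the timecode text and the timestamp together, so
the delay in the remainder of the code path no longer affects the
reference epoch.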

While the improvement in accuracy in the baseline case is significant,
there is another factor, at least in Sun systems, that makes it even
more worthwhile. When processing the code path up to the CLK module, the
priority is apparently higher than for processing beyond it. In case of
heavy CPU activity, this can lead to relatively long tails in the
processing delays for the driver, which of course are avoided by
capturing the timestamp early in the code path.

4. The PPSCLK Mode

Many timing receivers can produce a 1-pps signal of considerably better
precision than the ASCII timecode. Using this signal, it is possible to
avoid the 1-ms p-p jitter and 1.6 ms hardware timecode adjustment
entirely. However, a device is required to interface this signal to the
hardware and operating system. In general, this requires some sort of
level converter and pulse generator that can turn the 1-pps signal on-
time transition into a valid character. An example of such a device is
described in the gadget directory of the NTP Version 3 distribution.
Although many different circuit designs could be used as well, this
particular device generates a single 26-usec start bit for each 1-pps
signal on-time transition. This appears to the UART operating at 38.4K
baud as an ASCII DEL (hex FF).
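
The reason a lone start bit reads as hex FF can be seen from a simple
model of UART framing. This is a sketch under stated assumptions: an
idle line at mark (1), eight data bits sent least significant bit first
and sampled at mid-bit; the function names are illustrative.

```python
def bit_time_us(baud):
    """Duration of one bit in microseconds."""
    return 1e6 / baud


def pps_pulse(t_us, width_us=26.0):
    """Line level at time t_us: a single low pulse starting at t = 0."""
    return 0 if 0.0 <= t_us < width_us else 1


def decode_frame(line_level, baud, data_bits=8):
    """Sample a frame whose start bit begins at t = 0."""
    bt = bit_time_us(baud)
    if line_level(bt / 2.0) != 0:
        raise ValueError("no valid start bit")
    value = 0
    for i in range(data_bits):
        # Sample each data bit at its midpoint, LSB first.
        value |= line_level((1.5 + i) * bt) << i
    return value
```

At 38.4K baud one bit time is about 26 usec, so the pulse covers the
start bit only and every data bit samples as 1.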

Now, assuming a serial port can be dedicated to this purpose, a source
of 1-pps character interrupts is available and can be used to provide a
precision reference. The NTP Version 3 daemon can be configured to
utilize this feature by specifying the PPSCLK define, which requires the
CLK module and gadget box described above. The character resulting from
each 1-pps signal on-time transition is intercepted by the CLK module
and a timestamp is inserted in the data stream. An interrupt is created
for the device driver, which reads the timestamp and discards the DEL
character. Since the timestamp is captured at the on-time transition,
the seconds-fraction portion is the offset between the local clock and
the on-time epoch less the UART delay of 273 usec at 38.4K baud. If the
local clock is within +-0.5 second of this epoch, as determined by other
means, the local clock correction is taken as the offset itself, if
between zero and 0.5 s, and the offset minus one second, if between 0.5
and 1.0 s. In the NTP daemon the resulting correction is first processed
by a multi-stage median/trimmed mean filter to remove residual jitter
and then processed by the usual NTP algorithms.
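
The correction rule can be sketched as below. The order of subtraction
for the UART delay is one reading of the text above, and the filter is
a simple single-stage stand-in for the multi-stage median/trimmed mean
filter, whose details are not given here; all names are illustrative.

```python
UART_DELAY_S = 10.5 / 38400  # about 273 usec at 38.4K baud


def pps_correction(frac_s, uart_delay_s=UART_DELAY_S):
    """Map the seconds fraction of a 1-pps timestamp to a correction.

    Values between zero and 0.5 s are taken as is; values between 0.5
    and 1.0 s have one second subtracted.
    """
    offset = (frac_s - uart_delay_s) % 1.0
    return offset if offset < 0.5 else offset - 1.0


def trimmed_mean(samples, keep):
    """Discard the sample farthest from the median until keep remain."""
    ordered = sorted(samples)
    while len(ordered) > keep:
        median = ordered[len(ordered) // 2]
        if ordered[-1] - median > median - ordered[0]:
            ordered.pop()       # high outlier
        else:
            ordered.pop(0)      # low outlier
    return sum(ordered) / len(ordered)
```

For example, a fraction of 0.999 s yields a correction of about -1.3
ms, and trimmed_mean([100, 120, 95, 105, 5000], keep=3) discards the
5000 and 120 samples before averaging.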

The baseline delay between the on-time transition and the timestamp
capture was measured at 400+-10 usec on an otherwise idle test system.
As the UART delay at 38.4K baud is about 270 usec, the difference, 130
usec, must be due to the hardware interrupt latency plus the time to
call the microtime() routine which actually reads the system clock and
microsecond counter. For these measurements the assembly-coded version
of this routine described in the ppsclock directory of the NTP Version 3
distribution was used. This routine reduces the time to read the system
clock from 42-85 usec with the native Sun C-coded routine to about 3
usec using the microtime() assembly-coded routine and can be ignored.
Thus, the 130 usec must be accounted for in interrupt service, register
window, context switching, streams operations and measurement
uncertainty, which is probably not unreasonable. The
difference between this figure and the previously calculated value
of 1.9 ms for the CLK module and serial ASCII timecode is probably due
to the fact that all STREAMS modules other than the CLK module were
removed, since the serial port is not used for ordinary ASCII data.

An interesting feature of this approach is that the 1-pps signal is not
necessarily associated with any particular radio clock and, indeed,
there may be no such clock at all. Some precision timekeeping equipment,
such as cesium clocks, VLF receivers and LORAN-C timing receivers,
produce only a precision 1-pps signal and rely on other mechanisms to
resolve the second of the day and day of the year. It is possible for an
NTP-synchronized host to derive the latter information using other NTP
peers, presumably properly synchronized within +-0.5 second, and to
remove residual jitter using the 1-pps signal. This makes it quite
practical to deliver precision time to local clients when the subnet
paths to remote primary servers are heavily congested. In extreme cases
like this, it has been found useful to increase the tracking aperture
from +-128 ms to as high as +-512 ms.

In the current implementation the radio timecode and 1-pps signal are
separately processed. The timecode capture and CLK support, if provided
by the radio driver, operate the same way whether or not the PPSCLK
support is enabled. If the local clock is reliably synchronized within
+-0.5 s and the 1-pps signal has been valid for some number of seconds,
its offset is used in place of that of whatever synchronization source
has been selected. However, while this procedure delivers a new offset
estimate every second, the local clock is updated only as each valid
update is computed for the peer selected as the source of
synchronization.

However, there is a hazard to the use of the 1-pps signal in this way if
the radio generating the 1-pps signal misbehaves or loses
synchronization with its transmitter. In such a case the radio might
indicate the error, but the system has no way to associate the error
with the 1-pps signal. To deal with this problem the prefer parameter
described in the xntpd.8 man page in the doc directory of the NTP
Version 3 distribution can be used both to cause the clock selection
algorithm to choose a preferred peer, all other things being equal, and
to associate the error indications in such a way that the 1-pps
signal will be disregarded if the peer stops providing valid updates,
such as would occur in an error condition. The prefer parameter can be
used in other situations as well when preference is to be given a
particular source of synchronization.
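
The conditions under which the 1-pps offset replaces the selected
peer's offset can be summarized in a few lines. This is a sketch of the
decision only, not the daemon's code: the 60-second validity threshold
is a placeholder for the unspecified "some number of seconds", the peer
offset is used as a stand-in for how far the local clock is from the
on-time epoch, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PpsState:
    offset_s: float    # latest filtered 1-pps offset, in seconds
    valid_for_s: int   # seconds the 1-pps signal has been valid


def select_offset(peer_offset_s, pps, prefer_peer_valid, min_valid_s=60):
    """Return the offset used to discipline the local clock."""
    use_pps = (
        abs(peer_offset_s) < 0.5            # clock within +-0.5 s
        and pps.valid_for_s >= min_valid_s  # signal valid long enough
        and prefer_peer_valid               # preferred peer still healthy
    )
    return pps.offset_s if use_pps else peer_offset_s
```

If the preferred peer stops providing valid updates, select_offset()
falls back to the peer offset and the 1-pps signal is disregarded.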

5. The PPS Mode

For the ultimate accuracy and lowest jitter, it would be best to
eliminate the UART and capture the 1-pps on-time transition directly
using an appropriate interface. This is in fact possible using a
modified serial port driver and data lead in the serial port interface
cable. In this scheme, described in detail in the ppsclock directory of
the NTP Version 3 distribution, the 1-pps source is connected via the
previously described gadget box to the carrier-detect lead of a serial
port. Happily, this can be the same port used for a radio clock, for
example, or another unrelated serial device. The scheme, referred to
subsequently as the PPS mode, is specific to the SunOS 4.1.x kernel and
requires a special STREAMS module. Instructions on how to build the
kernel are also included in that directory.

Except for special-purpose interface modules, such as the KSI/Odetics
TPRO IRIG-B decoder and the modified audio driver for the IRIG-B signal
mentioned previously, the PPS mode provides the most accurate and
precise timestamp available. There is essentially no latency and the
timestamp is captured within 20-30 usec of the on-time epoch.

The PPS mode requires the PPSPPS define and one of the radio clock
serial ports to be selected as the PPS interface. This is the port which
handles the 1-pps signal; however, the signal path has nothing to do
with the ordinary serial data path; the two signals are not related,
other than by the need to activate the PPS mode and pass the file
descriptor to a common processing routine. Thus, for the port to be
selected for the PPS function, the define for the associated radio clock
needs to have a PPS suffix. In case of multiple radio clocks on a single
time server, the PPS suffix is necessary on only one of them; more than
one PPS suffix would be an error.

The PPS mode works just like the PPSCLK mode in the treatment of the
prefer parameter and indicated peer errors. As in the PPSCLK mode, only
the offset within the second is used and only when the offset is within
+-0.5 s.
However, the precision of the clock adjustments is usually so fine that
the error budget is dominated by the inherent short-term stability of
typical computer local clock oscillators. Therefore, it is advisable to
reduce the poll interval for the preferred peer from the default 64 s to
something less, like 16 s. This is done using the minpoll and maxpoll
parameters of the peer or server command associated with the clock.
These parameters take as arguments the exponent of a power of 2, in
seconds (for example, 4 for 16 s), which becomes the poll interval and,
indirectly, affects the bandwidth of the tracking loop.
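
The value to give minpoll and maxpoll for a desired interval can be
computed as below, assuming the arguments are exponents of 2 as the
xntpd documentation describes; the helper name is illustrative.

```python
def poll_exponent(interval_s):
    """Exponent of 2 for a poll interval given in whole seconds."""
    if interval_s < 1 or interval_s & (interval_s - 1):
        raise ValueError("poll interval must be a power of 2 seconds")
    return interval_s.bit_length() - 1
```

The default 64-s interval corresponds to an exponent of 6 and the
suggested 16-s interval to an exponent of 4.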

6. Results and Conclusions

It is clear from the above that substantial improvements in timekeeping
accuracy are possible with varying degrees of hardware and software
intrusion. While the ultimate accuracy depends on the jitter and wander
characteristics of the computer local oscillator, it is possible to
reduce jitter to a negligible degree simply by processing with the NTP
phase-lock loop and local clock algorithms. The residual jitter using
the PPS mode on a Sun4 IPC is typically in the 40-100 usec range, while
the wander is rarely more than twice that under typical environmental
room conditions.

David L. Mills <mills@udel.edu>
Electrical Engineering Department
University of Delaware
Newark, DE 19716
302 831 8247 fax 302 831 4316

25 August 1993